Users interact with museum application interfaces for many reasons. Various types of users, who want to perform various tasks in various contexts, can access the same Web site. Thus, it is important to have user interfaces able to adapt to these different user requirements to facilitate the accomplishment of the desired goals. Most current interfaces to museum information do not take this variety of user types into account, providing interfaces that some users find confusing when trying to achieve their goals. In this article we discuss the various possible levels of support that can be given to different users during navigation of museum information. In particular, we focus our attention on how to obtain adaptable and adaptive interfaces, using the web site for the Marble Museum, which we have designed and developed, as a source of examples for our discussion of possible solutions.

This article begins by presenting and discussing the distinction between record-oriented and pattern-oriented search. Examples of record-oriented (or item-oriented) questions include: "What (or how many, etc.) glass items made prior to 100 A.D. do we have in our collection?" and "How many paintings featuring dogs do we have that were painted during the 19th century, and who painted them?" Standard database systems are well suited to answering such questions, based on the data in, for example, a collections management system. Examples of pattern-oriented questions include: "How does the (apparent) production of glass objects vary over time between 400 B.C. and 100 A.D.?" and "What other animals are present in paintings with dogs (painted during the 19th century and in our collection)?" Standard database systems are not well suited to answering these sorts of questions (and pattern-oriented questions in general), even though the basic data is properly stored in them.
To answer pattern-oriented questions, the accepted solution is to transform the underlying (relational) data into what is called the data cube or cross-tabulation form (there are other forms as well). We discuss how this can be done for non-numeric data, such as are found widely in museum collections and archives. Further, we discuss and demonstrate two distinct, but related, approaches to exploring for patterns in such cross-tabulated museum data. The two approaches have been implemented as the prototype systems Homer and MOTC. We conclude by discussing initial experimental evidence indicating that these approaches are indeed effective in helping people find answers to their pattern-oriented questions about museum and archive collections.

This article examines the sociotechnological impact of introducing collaborative technologies into the Spurlock Museum, a museum of world history and culture at the University of Illinois. It presents the results of a study that focused on the development of advanced information technology to support asynchronous collaboration between curators and exhibit designers planning a new museum facility. It highlights the importance of constructing a virtual museum in which collections management systems are integrated with on-line exhibit information in a dynamic fashion, and presents the methodologies we used to link the Spurlock's database systems to the Internet to allow more effective collaboration between those individuals planning the new facility. It includes an analysis of the impact these technological innovations have had on the social infrastructure of the Spurlock Museum, and in particular, on the relationship between the Spurlock's curators and exhibit designers.

Although art appreciation/exploration is essentially a private experience, it cannot exist outside of a social context. Digital environments offer great potential for the enhancement of collaborative aspects of both art creation and art exploration.
However, the current notion of a digital environment is vague, and most often associated with the traditional concepts of computer use. Thus, the goal of this article is twofold: (a) to present an analysis of the characteristics of digital environments, and (b) to suggest their potential uses in the building of collaborative pedagogical procedures for the digital medium.

The Internet provides opportunities to conduct surveys more efficiently and effectively than traditional means. This article reviews previous studies that use the Internet for survey research. It discusses the methodological issues and problems associated with this new approach. By presenting a case study, it seeks possible solutions to some of the problems, and explores the potential the Internet can offer to survey researchers.

The issue of reducing the space overhead when indexing large text databases is becoming more and more important as text collections grow in size. Another subject, which is gaining importance as text databases grow and become more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called "approximate text searching," and it is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low-overhead indices (whose most notable exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behavior of these "block addressing" indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
We study the space overhead and retrieval times as functions of the block size. We find that, under reasonable assumptions, it is possible to build an index which is simultaneously sublinear in space overhead and in query time. This surprising analytical conclusion is validated with extensive experiments, obtaining typical performance figures. These results are valid for classical exact queries as well as for approximate searching. We apply our analysis to the Web, using recent statistics on the distribution of document sizes. We show that pointing to documents instead of to fixed-size blocks reduces space requirements but increases search times.

This experiment searches an online library catalog employing author surnames, plus title words of books in citations of eight scholarly works whose authors selected the title words used as being recallable. Searches comprising a surname together with two recallable title words, or one if only one was available, yielded a single-screen miniature catalog (minicat) 99.0% of the time.

In the logical approach to information retrieval (IR), retrieval is considered as uncertain inference. Whereas classical IR models are based on propositional logic, we combine Datalog (function-free Horn clause predicate logic) with probability theory. Therefore, probabilistic weights may be attached to both facts and rules. The underlying semantics extends the well-founded semantics of modularly stratified Datalog to a possible-worlds semantics. By using default independence assumptions with explicit specification of disjoint events, the inference process always yields point probabilities. We describe an evaluation method and present an implementation. This approach allows for easy formulation of specific retrieval models for arbitrary applications, and classical probabilistic IR models can be implemented by specifying the appropriate rules.
In comparison to other approaches, the possibility of recursive rules allows for more powerful inferences, and predicate logic gives the expressiveness required for multimedia retrieval. Furthermore, probabilistic Datalog can be used as a query language for integrated information retrieval and database systems.

The purpose of this study is to analyze the relationship between citation ranking and peer evaluation in assessing senior faculty research performance. Other studies typically derive their peer evaluation data directly from referees, often in the form of rankings. This study uses two additional sources of peer evaluation data: citation content analysis and book review content analysis. Two main questions are investigated: (a) To what degree does citation ranking correlate with data from citation content analysis, book reviews, and peer ranking? (b) Is citation ranking a valid evaluative indicator of research performance of senior faculty members? Citation data, book reviews, and peer rankings were compiled and examined for faculty members specializing in Kurdish studies. Analysis shows that normalized citation ranking and citation content analysis data yield identical ranking results. Analysis also shows that normalized citation ranking and citation content analysis, book reviews, and peer ranking perform similarly (i.e., are highly correlated) for high-ranked and low-ranked senior scholars. Additional evaluation methods and measures that take into account the context and content of research appear to be needed to effectively evaluate senior scholars whose performance ranks relatively in the middle. Citation content analysis data did appear to give some specific and important insights into the quality of research of these middle performers; however, further analysis and research are needed to validate this finding. This study shows that citation ranking can provide a valid indicator for comparative evaluation of senior faculty research performance.
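Correlations between rankings like those reported above are conventionally measured with a rank correlation statistic. As an illustration only (the abstract does not state which statistic the study used), here is a minimal Spearman rank correlation over hypothetical rankings of five scholars:

```python
def spearman_rho(rank_a, rank_b):
    # Spearman rank correlation for two complete rankings with no ties:
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: rank 1 = highest-ranked scholar.
citation_rank = [1, 2, 3, 4, 5]
peer_rank = [1, 3, 2, 4, 5]
rho = spearman_rho(citation_rank, peer_rank)  # close to 1 = high agreement
```

A rho near 1 corresponds to the "highly correlated" behavior the study reports for high- and low-ranked scholars; middle performers would show lower agreement.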
This study investigates the publication rates of successful doctoral students in the fields of analytical chemistry, experimental psychology, and American literature. Publication rates were calculated for a sample of those receiving doctorates every fifth year between 1965 and 1995. An analysis of the data revealed that there were differences in publication rates of the doctoral students that mirrored differences between the fields as a whole. The data also show that the decline in solo authorship observed in many fields is present in this population as well. Participation rates (the percent that had at least one publication) for chemistry and literature showed initial increases followed by plateaus since the early 1980s, while the participation rate in psychology has remained between 40 and 60 percent over the course of the study.

One aim of science evaluation studies is to determine quantitatively the contribution of different players (authors, departments, countries) to the whole system. This information is then used to study the evolution of the system, for instance to gauge the results of special national or international programs. Taking articles as our basic data, we want to determine the exact relative contribution of each coauthor or each country. These numbers are then brought together to obtain country scores, department scores, etc. It turns out, as we will show in this article, that different scoring methods can yield totally different rankings. In addition, a relative increase according to one counting method can go hand in hand with a relative decrease according to another. Indeed, we present examples in which country (or author) c has a smaller relative score in the total counting system than in the fractional counting one, yet this smaller score carries a higher importance than the larger (fractional counting) one. Similar anomalies were constructed for total versus proportional counts and for total versus straight counts.
Consequently, a ranking of countries, universities, research groups, or authors based on one particular accrediting method does not contain an absolute truth about their relative importance. Different counting methods should be used and compared. The differences are illustrated with a real-life example. Finally, it is shown that some of these anomalies can be avoided by using geometric instead of arithmetic averages.

Observed aging curves are influenced by publication delays. In this article, we show how the "undisturbed" aging function and the publication delay combine to give the observed aging function. This combination is performed by a mathematical operation known as convolution. Examples are given, such as the convolution of two Poisson distributions, two exponential distributions, and two lognormal distributions. A paradox is observed between theory and real data.

The KeyWords Plus field in the Science Citation Index database represents an approach to combining citation and semantic indexing in describing document content. This paper explores the similarities and dissimilarities between citation-semantic and analytic indexing. The dataset consisted of over 400 matching records in the SCI and MEDLINE databases on antibiotic resistance in pneumonia. The degree of similarity in indexing terms was found to vary on a scale from completely different to completely identical, with various levels in between. The within-document similarity in the two databases was measured by a variation on the Jaccard Coefficient: the Inclusion Index. The average inclusion coefficient was 0.4134 for SCI and 0.3371 for MEDLINE. The 20 terms occurring most frequently in each database were identified. The two groups of terms shared the same terms that constitute the "intellectual base" for the subject. Conceptual similarity was analyzed through scatterplots of matching and nonmatching terms vs. partially identical and broader/narrower terms.
The study also found that the two databases differed in assigning terms in various semantic categories. Implications of this research and further studies are suggested.

Previous research describing Web page and link classification systems resulting from a content analysis of over 75 Web pages left us with four unanswered questions: (1) What is the most useful application of page types: as descriptions of entire pages or as components that are combined to create pages? (2) Is there a kind of analysis that we can perform on isolated anchors, which can be text, icons, or both together, that is equivalent to the syntactic analysis for embedded and labeled anchors? (3) How explicitly are readers informed about what can be found by traversing a link, especially for the relatively broad categories of expansion and resource links? (4) Is there a relationship between the type of link and whether its target is a whole page or a fragment, or whether its target is in the same site as its source or in a different one? This article examines these questions under the assumption that the author and the reader of Web pages will cooperate in order to have successful communication. Our discussion leads to ideas of how author-provided context and readers' expectations and experience are combining to form new stylistic conventions and genres on the Web.

Researchers evaluated the ability of 4th- and 5th-grade science and social science students to create Dublin Core metadata to describe their own images for inclusion in digital portfolio archives. The Dublin Core was chosen because it provided a systematic, yet minimal, way for students to describe these resources at the item level and relate them to collection-level metadata prepared for digitized primary sources by archivists using Encoded Archival Description (EAD).
Researchers found that while students were able to supply simple elements such as title and subject with relative ease, they had difficulty moving between general and progressively more granular or refined descriptive elements. Students performed poorly in distinguishing between and completing related but distinct metadata elements, such as title, subject, and description. Researchers also found that there are still significant issues that need to be addressed if young users in a variety of learning contexts, especially those who are only recently literate, are to be able to make sense of richer metadata such as EAD that is used to describe collections of primary source material.

Genre conventions emerge across discourse communities over time to support the communication of ideas and information in socially and cognitively compatible forms. Digital genres frequently borrow heavily from the paper world even though the media optimally support different forms, structures, and interactions. This research sought to determine the existence and form of a truly digital genre. Results from a survey of user perceptions of the form and content of web home pages reveal a significant correlation between commonly found elements of home pages and user preferences and expectations of the type. These data support the argument that the personal home page has rapidly evolved into a recognizable form with stable, user-preferred elements and thus may be considered the first truly digital genre.

The recent explosion of the Internet and the World Wide Web has made digital libraries popular. Easy access to a digital library is provided by commercially available Web browsers, which provide a user-friendly interface. To retrieve documents of interest, the user is provided with a search interface that may consist of only one input field and one push button. Most users type in a single keyword, click the button, and hope for the best.
The result of a query using this kind of search interface can consist of a large unordered set of documents, or a ranked list of documents based on the frequency of the keywords. Both lists can contain articles unrelated to the user's inquiry unless a sophisticated search was performed and the user knows exactly what to look for. More sophisticated algorithms for ranking the search results according to how well they meet the user's needs, as expressed in the search input, may help. However, what is desperately needed is software that can analyze the search result and graphically manipulate large hierarchies of data. In this article we describe the design of a language-independent document classification system being developed to help users of the Florida Center for Library Automation analyze search query results. Easy access through the Web is provided, as well as a graphical user interface to display the classification results. We also describe the use of this system to retrieve and analyze sets of documents from public Web sites.

The creation of large, networked, digital document resources has greatly facilitated information access and dissemination. We suggest that such resources can further enhance how we work with information, namely, that they can provide a substrate that supports collaborative work. We focus on one form of collaboration, annotation, by which we mean any of an open-ended number of creative document manipulations that are useful to record and to share with others. Widespread digital document dissemination required technological enablers, such as web clients and servers. The resulting infrastructure is one in which information may be widely shared by individuals across administrative boundaries. To achieve the same ubiquitous availability for annotation requires providing support for spontaneous collaboration, that is, for collaboration across administrative boundaries without significant prior agreements.
Annotation is not more commonplace, we suggest, because the technological needs of spontaneous collaboration are challenging. We have developed a document model, called multivalent documents, which provides a means to address these challenges. In the multivalent document model, a document comprises distributed data and program resources, called layers and behaviors, respectively. Because most document functionality is implemented by behaviors, the model is highly extensible, and can accommodate both new document formats and novel forms of functionality. Among other applications, it is possible to use the model to effect a wide class of annotation types, across different document formats, without any administrative provisions. An implementation of the model has allowed us to develop behaviors that currently support several quite different but common digital document types, and a number of quite different annotation capabilities, some familiar and some novel. A related implementation provides analogous capabilities for geographic data. Such capabilities could have a beneficial impact on the "scholarly information life cycle," i.e., the process by which researchers and scholars create, disseminate, and use knowledge.

The Alexandria Digital Library (ADL) is one of the six digital library projects funded by NSF, DARPA, and NASA. ADL's collection and services focus on information containing georeferences: maps, images, data sets, text, and other information sources with links to geographic locations. During this study period, three different user interfaces were developed and tested by user groups. User feedback was collected through various formal and informal approaches, and the results were fed back into the design and implementation cycle. This article describes the evolution of the ADL system and the effect of user evaluation on that evolution. ADL is an ongoing project; user feedback and evaluation plans for the remainder of the project are described.
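The layer/behavior decomposition of the multivalent document model described above can be conveyed with a minimal sketch. This is an illustration only, not the authors' implementation; the class and function names are invented for the example. Data live in named layers, while functionality is plugged in as behaviors, so an annotation is just a new layer written by a behavior, leaving the base document untouched:

```python
# Illustrative sketch of the multivalent idea: data layers plus
# pluggable behaviors (hypothetical names, not the published system).
class MultivalentDocument:
    def __init__(self, base_text):
        self.layers = {"base": base_text}  # distributed data resources
        self.behaviors = {}                # pluggable program resources

    def add_behavior(self, name, fn):
        self.behaviors[name] = fn

    def invoke(self, name, *args):
        return self.behaviors[name](self, *args)

def annotate(doc, span, note):
    # A hypothetical annotation behavior: record a note against a text
    # span in its own layer; the base layer is never modified.
    doc.layers.setdefault("annotations", []).append((span, note))
    return doc.layers["annotations"]

doc = MultivalentDocument("a small example document")
doc.add_behavior("annotate", annotate)
doc.invoke("annotate", (2, 7), "note the span")
```

Because annotations live in a separate layer, they need no write access to the original resource, which is the property that enables collaboration "without significant prior agreements."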
Digital libraries need to facilitate the use of digital information in a variety of settings. One approach to making information useful is to enable its application to situations unanticipated by the original author. Walden's Paths is designed to enable authors to collect, organize, and annotate information from on-line collections for presentation to their readers. Experiences with the use of Walden's Paths in high-school classrooms have identified four needs/issues: (1) better support for the gradual authoring of paths by teachers, (2) support for student authoring of paths, including the ability for students to collaborate on paths, (3) a more obvious distinction between the content of the original source materials and that added by the path author, and (4) support for maintaining paths over an evolving set of source documents. These observed needs have driven the development of new versions of Walden's Paths. Additionally, the experiences with path authoring have led to a conceptualization of metadocuments, documents whose components include complete documents, as a general domain where issues of collaboration, intellectual property, and maintenance are decidedly different from traditional document publication.

The World Wide Web provides unprecedented access to globally distributed content. The extent and uniform accessibility of the Web has proven beneficial for research, education, commerce, entertainment, and numerous other uses. Ironically, the fact that the Web is an information space without boundaries has also proven to be its biggest flaw. Key aspects of libraries, such as selectivity of content, customization of tools and services relative to collection and patron characteristics, and management of content and services, are noticeably absent. Over the past four years, we have researched the technology and deployment of a digital library architecture that makes it possible to create managed information spaces, digital libraries, within the World Wide Web.
Our work has taken place in the context of NCSTRL, a digital library of computer science research reports. The technical foundation of NCSTRL is Dienst, a protocol and architecture for distributed digital libraries that we developed as part of the DARPA-funded Computer Science Technical Reports Project. At the time of the writing of this paper, the NCSTRL collection consisted of papers from more than 100 research institutions residing in servers distributed across the United States, Europe, and Asia. In addition, the Dienst protocol and implementation have been successfully adopted by a number of other distributed collections. In this paper, we review our experiences with NCSTRL and Dienst, describe some of the lessons we have learned from the deployment experience, and define some directions for the future.

The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between speed performance and translation performance, and in what form the translated result is presented. About 100,000 Web pages translated in the last four months of 1997 are used for a quantitative study of online, real-time Web page translation.

The advent of the heterogeneous digital library provides the opportunity and establishes the need for the design of new user interfaces.
As a single portal to a wide array of information sources, the heterogeneous digital library requires that a variety of cataloging schemas, subject domains, document genres, and institutional biases be accommodated. SenseMaker is a user-centered interface for information exploration in a heterogeneous digital library. It unifies citations and articles from heterogeneous sources by presenting them in a common schema with affordances for quick comparisons of properties. At the same time, SenseMaker users can recover a degree of context by iteratively organizing citations and articles into higher-level bundles based on either metadata or content. Furthermore, SenseMaker enables users to move fluidly from browsing to searching by introducing structure-based searching and structure-based filtering. This paper outlines the SenseMaker interface design and details some of our experimental findings surrounding its use.

This paper presents an efficient spoken-access approach for both Chinese text and Mandarin speech information retrieval. The proposed approach is developed not only to deal with the retrieval of spoken documents, but also to improve the capability of human-computer interaction via voice input for information-retrieval systems. Based on the monosyllabic structure of the Chinese language, the proposed approach can tolerate speech recognition errors by performing speech query recognition and approximate information retrieval at the syllable level. Furthermore, with the help of automatic term suggestion and relevance feedback techniques, the proposed approach is robust in enabling users to interact with IR systems via voice input at each stage of the retrieval process. Extensive experiments show that the proposed approach can improve the effectiveness of information retrieval via speech interaction. The encouraging results suggest that a Mandarin speech interface for information retrieval and digital library systems can, therefore, be developed.
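The idea of tolerating recognition errors by matching at the syllable level, described in the spoken-access abstract above, can be illustrated generically. This sketch is not the paper's algorithm: it simply scores a recognized query against a document by overlapping syllable bigrams (romanized syllables stand in for Mandarin ones), so a single misrecognized syllable lowers the score without zeroing it:

```python
def syllable_bigrams(syllables):
    # The set of adjacent syllable pairs in an utterance.
    return {tuple(syllables[i:i + 2]) for i in range(len(syllables) - 1)}

def overlap_score(query_syls, doc_syls):
    # Fraction of the query's syllable bigrams found in the document;
    # a crude stand-in for syllable-level approximate retrieval.
    q, d = syllable_bigrams(query_syls), syllable_bigrams(doc_syls)
    return len(q & d) / len(q) if q else 0.0

query = ["zi", "xun", "jian", "suo"]        # intended query syllables
recognized = ["zi", "xun", "jian", "shuo"]  # last syllable misrecognized
score = overlap_score(query, recognized)    # degraded but nonzero
```

A word-level exact match would score such a query zero; matching on syllable n-grams is what lets a spoken interface survive recognition errors.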
We attempt to assess the publication impact of a digital library (DL) of aerospace scientific and technical information (STI). The Langley Technical Report Server (LTRS) is a digital library of over 1,400 electronic publications authored by NASA Langley Research Center personnel or contractors, and has been available in its current World Wide Web (WWW) form since 1994. In this article, we examine calendar year 1997 usage statistics for LTRS and the Center for AeroSpace Information (CASI), a facility that archives and distributes hard copies of NASA and aerospace information. We also perform a citation analysis on some of the top publications distributed by LTRS. We find that although LTRS distributed over 71,000 copies of publications (compared with an estimated 24,000 copies from CASI), citation analysis indicates that LTRS has almost no measurable publication impact. We discuss the caveats of our investigation, speculate on possible different models of usage facilitated by DLs, and suggest "retrieval analysis" as a complementary metric to citation analysis. While our investigation fails to establish a relationship between LTRS and increased citations, and raises at least as many questions as it answers, we hope it will serve as an invitation to, and guide for, further research on the use of DLs.

Digital libraries store materials in electronic format. Research and development in digital libraries includes content creation, conversion, indexing, organization, and dissemination. The key technological issues are how to search and display desired selections from and across large collections effectively [Schatz & Chen, 1996]. The digital library research projects (DLI-1) sponsored by NSF/DARPA/NASA have a common theme of bringing search to the net, which is the flagship research effort for the National Information Infrastructure (NII) in the United States. A repository is an indexed collection of objects. Indexing is an important task for searching.
The better the indexing, the better the search results. Developing a universal digital library has been the dream of many researchers; however, there are still many problems to be solved before such a vision is fulfilled. The most critical is supporting cross-lingual retrieval and multilingual digital libraries. Much work has been done on English information retrieval; there is relatively less work on Chinese information retrieval. In this article, we focus on Chinese indexing, which is the foundation of Chinese and cross-lingual information retrieval. The smallest indexing units in Chinese digital libraries are words, while the smallest units in a Chinese sentence are characters. However, Chinese text has no delimiter to mark word boundaries, as there is in English text. In English and other languages using Roman- or Greek-based orthographies, spacing usually indicates word boundaries reliably. In Chinese, characters are placed together without any delimiters indicating the boundaries between consecutive words. In this article, we investigate combination and boundary detection approaches based on mutual information for segmentation. The combination approach combines n-grams to form words with more characters; in this approach, Algorithm 1 does not allow overlapping of n-grams, while Algorithm 2 does. The boundary detection approach detects the segmentation points in a sentence based on the values, and the change in values, of the mutual information. Experiments are conducted to evaluate their performance. An interface to the system is also presented to show how a Chinese web page is downloaded, and the text in the page filtered and segmented into words. The segmented words can be submitted for indexing, or new unknown words can be identified and submitted to a dictionary.
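The boundary detection approach described above can be conveyed in simplified form. This sketch is not the article's exact formulation: it estimates pointwise mutual information for each adjacent character pair from raw corpus counts (ASCII letters stand in for Chinese characters in this toy example) and cuts the sentence wherever the value falls to a threshold or below:

```python
import math
from collections import Counter

def train(corpus):
    # Character unigram and bigram counts from unsegmented sentences.
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(sent[i:i + 2] for i in range(len(sent) - 1))
    return uni, bi

def pmi(a, b, uni, bi):
    # Pointwise mutual information of an adjacent character pair;
    # an unseen pair gets -infinity (a certain boundary).
    if bi[a + b] == 0:
        return float("-inf")
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    p_ab = bi[a + b] / n_bi
    return math.log(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))

def segment(sent, uni, bi, threshold=0.0):
    # High PMI means the pair tends to co-occur as part of one word;
    # cut wherever PMI drops to the threshold or below.
    words, current = [], sent[0]
    for a, b in zip(sent, sent[1:]):
        if pmi(a, b, uni, bi) > threshold:
            current += b
        else:
            words.append(current)
            current = b
    words.append(current)
    return words

uni, bi = train(["ab", "ab", "ab", "cd"])  # toy corpus of "sentences"
words = segment("abcd", uni, bi)           # splits between "ab" and "cd"
```

Real systems estimate these statistics from large corpora and combine PMI values with their change across positions, as the abstract notes, rather than using a single fixed threshold.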
In an effort to assist medical researchers and professionals in accessing information necessary for their work, the AI Lab at the University of Arizona is investigating the use of a natural language processing (NLP) technique called noun phrasing. The goal of this research is to determine whether noun phrasing could be a viable technique to include in medical information retrieval applications. Four noun phrase generation tools were evaluated as to their ability to isolate noun phrases from medical journal abstracts. Tests were conducted using the National Cancer Institute's CANCERLIT database. The NLP tools evaluated were the Massachusetts Institute of Technology's (MIT's) Chopper, the University of Arizona's Automatic Indexer, Lingsoft's NPtool, and the University of Arizona's AZ Noun Phraser. In addition, the National Library of Medicine's SPECIALIST Lexicon was incorporated into two versions of the AZ Noun Phraser to be evaluated against the other tools, as well as against a nonaugmented version of the AZ Noun Phraser. Using the metrics of relative subject recall and precision, our results show that, with the exception of Chopper, the phrasing tools were fairly comparable in recall and precision. It was also shown that augmenting the AZ Noun Phraser with the SPECIALIST Lexicon from the National Library of Medicine resulted in improved recall and precision.

This article discusses the design of a digital library that addresses both content and knowledge management. The design of the digital library features two major distinctions: (1) the system incorporates a two-tier repository system to facilitate content management, and (2) the system incorporates an object-oriented model to facilitate the management of temporal information, and exploits information extraction and deductive inference to derive implied knowledge based on the content of the digital library.
The two-tier repository system relieves the system manager of manually maintaining the hyperlinks among the Web pages when the digital library content is updated. The task of maintaining hyperlinks among Web pages can become cumbersome to the system manager if there are a large number of Web pages and hyperlinks. With respect to knowledge management, this design aims at facilitating temporal information management and deriving implied relations among the objects in the digital library. The motivation behind developing these knowledge-processing utilities is to create a system that complements the capabilities of human beings. Deriving a comprehensive list of implied relations is an exhausting task if the digital library contains a large amount of information and a large number of implied relations. With such knowledge-processing utilities, specialists are released from performing tedious work and can, therefore, spend more time on more productive philosophical activities to derive advanced knowledge. Applying knowledge management utilities effectively can extend the applications of digital libraries to new dimensions. To aid designers of digital library interfaces, we present a framework for the design of information representations in terms of previews and overviews. Previews and overviews are graphic or textual representations of information abstracted from primary information objects. Previews act as surrogates for one or a few objects, and overviews represent collections of objects. A design framework is elaborated in terms of the following three dimensions: (1) what information objects are available to users, (2) how information objects are related and displayed, and (3) how users can manipulate information objects. When utilized properly, previews and overviews allow users to rapidly discriminate objects of interest from those not of interest, and to more fully understand the scope and nature of digital libraries. 
This article presents a definition of previews and overviews in context, provides design guidelines, and describes four example applications. How users meet infrastructure is a key practical and methodological challenge for digital library design. This article presents research conducted by the Social Science Team of the federally funded Digital Libraries Initiative (DLI) project at the University of Illinois. Data were collected from potential and actual users of the DLI testbed (containing the full text of journal articles) through focus groups, interviews and observations, usability testing, user registration and transaction logging, and user surveys. Basic results on the nature and extent of testbed use are presented, followed by a discussion of three analytical foci relating to digital library use as a process of assemblage: document disaggregation and reaggregation; information convergence; and the manner in which users confront new genres and technical barriers in information systems. The article also highlights several important methodological and conceptual issues that frame research on social aspects of digital library use. With new interactive technology, we can increase user satisfaction by designing information retrieval systems that inform the user while the user is on-line interacting with the system. The purpose of this article is to model the information processing operations of a generic user who has just received an informative message from the system and is stimulated by the message into grasping at a higher understanding of his or her information task or problem. 
The model consists of three levels, each of which forms a separate subsystem. In the Perception subsystem, the user perceives the system message in a visual sense; in the Comprehension subsystem, the user must comprehend the system message; and in the Application subsystem, the user must (a) interpret the system message in terms of the user's task at hand, and (b) create and send a new message back to the system to complete the interaction. Because of the information process stimulated by the interaction, the user's new message forms a query to the system that more accurately represents the user's information need than would have been the case if the interaction had not taken place. This article proposes a device to enable clarification of the user's task, and thus his/her information need, at the Application subsystem level of the model. A speedup improvement to the Associative Access (ASSA) method, an information retrieval algorithm, has been suggested by Berkovich and others. The improvement is achieved through a novel technique of vertical counting. The vertical counting approach calculates the number of "ones" in characteristic vectors without performing the shift operation repeatedly. Using this technique, the determination of qualifying records in a database can be several times faster than other implementations. Unfortunately, the performance of the suggested technique has not been studied. This paper focuses on the performance of the vertical approach and analyzes its behavior. It determines the speedup gained from using the Hamming Distance Bit Vertical Counter in the vertical approach and evaluates the parameters that influence the speedup. It also discusses the breaking point that makes the vertical approach faster than the horizontal approach, and lastly determines its time complexity. This article addresses the question of whether the Web can serve as an information source for research. 
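The vertical counting idea described above can be illustrated with a small sketch. This is one standard carry-save realization of column-wise bit counting, not necessarily the exact Hamming Distance Bit Vertical Counter of the paper; the word width and representation are assumptions.

```python
def vertical_count(words, width=8):
    """Count the number of 1s in each bit column across many words using
    carry-save 'counter planes' (AND/XOR only) instead of shifting each
    word bit by bit."""
    planes = []  # planes[k] holds bit k of every column's running count
    for w in words:
        carry = w
        for k in range(len(planes)):
            # ripple the carry through the counter planes
            planes[k], carry = planes[k] ^ carry, planes[k] & carry
            if not carry:
                break
        if carry:
            planes.append(carry)  # a new, higher-order counter plane
    # read off the per-column counts from the planes
    counts = []
    for j in range(width):
        c = 0
        for k, p in enumerate(planes):
            c |= ((p >> j) & 1) << k
        counts.append(c)
    return counts
```

Each characteristic vector is folded into the counter planes with bitwise operations only, so the per-column totals emerge without repeatedly shifting every vector, which is the source of the claimed speedup over the horizontal approach.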
Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998), as a result of the query "informetrics OR informetric." In 807 out of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. In most cases, the list of references extracted from the Web outperformed the commercial bibliographic databases. The results of these comparisons indicate that valuable, freely available data is hidden on the Web, waiting to be extracted from the millions of Web pages. This article reports a study of 45 Ph.D. history students and the effect of a technique of information seeking on their role as experts in training. It is assumed that the primary task of these students is to prove in their thesis that they have crossed over the line separating novice and expert, which they do by producing a thesis that makes both a substantial and original contribution to knowledge. Their information-seeking behavior, therefore, is a function of this primary task. It was observed that many of the Ph.D. students collected "names" of people, places, and things and assembled data about these names on 3 x 5 inch index cards. The "names" were used as access points to the primary and secondary source material they had to read for their thesis. Besides using name collection as an information accessing technique, the larger importance of collecting "names" is what it does for the Ph.D. 
student in terms of their primary task (to produce a thesis that proves they have become experts in their field). The article's thesis is that by inducing certain characteristics of expert thinking, the name collection technique's primary purpose is to push the student across the line into expert thinking. This article addresses a crucial issue in the digital library environment: how to support effective interaction of users with heterogeneous and distributed information resources. In particular, this study compared usability, user preference, effectiveness, and searching behaviors in systems that implement interaction with multiple databases through a common interface, and with multiple databases as if they were one (integrated interaction), in an experiment in the Text REtrieval Conference (TREC) environment. Twenty-eight volunteers were recruited from the graduate students of the School of Communication, Information, & Library Studies at Rutgers University. Significantly more subjects preferred the common interface to the integrated interface, mainly because they could have more control over database selection. Subjects were also more satisfied with the results from the common interface, and performed better with the common interface than with the integrated interface. Overall, it appears that for this population, interacting with databases through a common interface is preferable on all grounds to interacting with databases through an integrated interface. These results suggest that: (1) the general assumption of the information retrieval (IR) literature that an integrated interaction is best needs to be revisited; (2) it is important to allow for more user control in the distributed environment; (3) for digital library purposes, it is important to characterize different databases to support user choice for integration; and (4) certain users prefer control over database selection while still opting for results to be merged. 
This article argues that professional discourses tend to align themselves with dominant ideological and social forces by means of language. In twentieth-century modernity, the use of the trope of "science" and related terms in professional theory is a common linguistic device through which professions attempt social self-advancement. This article examines how professional discourses, in particular those which are foundational for library and information science theory and practice, establish themselves in culture and project history, past and future, by means of appropriating certain dominant tropes in a culture's language. This article suggests that ethical and political choices arise out of the rhetoric and practice of professional discourse, and that these choices cannot be confined to the realm of professional polemics. Interdisciplinary research has been identified as a critical means of addressing some of our planet's most urgent environmental problems. Yet relatively little is known about the processes and impacts of interdisciplinary approaches to environmental sciences. This study used citation analysis and ordinary least squares regression to investigate the relationship between an article's citation rate and its degree of interdisciplinarity in one area of environmental science, viz., forestry. Three types of interdisciplinarity were recognized (authorship, subject matter, and cited literature), and each was quantified using Brillouin's diversity index. Data consisted of more than 750 articles published in the journal Forest Science during the 10-year period 1985-1994. The results indicate that borrowing was the most influential method of interdisciplinary information transfer. Articles that drew information from a diverse set of journals were cited with greater frequency than articles having smaller or more narrowly focused bibliographies. 
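Brillouin's diversity index mentioned above has a closed form, H = (ln N! - sum(ln n_i!)) / N, where the n_i are the counts in each category (authors, subjects, or cited journals) and N is their total. A minimal sketch follows; the natural-log base is an assumption, as the index is sometimes computed with log base 2.

```python
import math

def brillouin_index(counts):
    """Brillouin diversity H = (ln N! - sum(ln n_i!)) / N.
    math.lgamma(n + 1) gives ln(n!) without overflowing for large n."""
    n_total = sum(counts)
    h = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return h / n_total
```

A bibliography drawing on many journals evenly scores higher than one concentrated in a single source, which is the sense in which the index quantifies interdisciplinarity.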
This finding provides empirical evidence that interdisciplinary methods have made a measurable and positive impact on the forestry literature. A citation analysis examined the 28 best articles published in JASIS (Journal of the American Society for Information Science) from 1969 to 1996. Best articles tend to be single-authored works twice as long as the average article published in JASIS. They are cited and self-cited much more often than the average article. The greatest source of references made to the best articles is JASIS itself. The top five best papers focus largely on information retrieval and online searching. Individual differences between users of information systems can influence search performance. In user-centered design it is important to match users with system configurations that will optimize their performance. Two matching strategies were explored in the first experiment: the capitalization match and the compensatory match. Findings suggest that a compensatory match is likely to be encountered more frequently in designing information systems. Having determined an optimal match between users and system configurations, it is necessary to find ways to ensure that users are guided to the appropriate configuration. The second experiment examined user selection of system configurations, and concluded that users do not act to optimize system configuration when they select features. This result suggests that information systems must have mechanisms such as user models to direct users to optimal configurations. These experiments suggest some of the complexities and problems encountered in applying individual differences research to user-centered design of information systems. User problems with large information spaces multiply in complexity when we enter the digital domain. Virtual information environments can offer 3D representations, reconfigurations, and access to large databases that may overwhelm many users' abilities to filter and represent. 
As a result, users frequently experience disorientation in navigating large digital spaces to locate and use information. To date, the research response has been predominantly based on the analysis of visual navigational aids that might support users' bottom-up processing of the spatial display. In the present paper, an emerging alternative is considered that places greater emphasis on the top-down application of semantic knowledge by the user, gleaned from their experiences within the sociocognitive context of information production and consumption. A distinction between spatial and semantic cues is introduced, and existing empirical data are reviewed that highlight the differential reliance on spatial or semantic information as the domain expertise of the user increases. The conclusion is reached that interfaces for shaping information should be built on an increasing analysis of users' semantic processing. This article presents two studies concerning the role of individual differences in searching through a spatial-semantic virtual environment. In the first study, 10 subjects searched for two topics through a spatial user interface of a semantic space. A strong positive correlation was found between associative memory (MA-1) and search performance (r = 0.855, p = 0.003), but no significant correlation was found between visual memory (MV-1) and search performance. In the second study, 12 subjects participated in a within-subject experimental design. The same spatial user interface and a simple textual user interface were used. The effects of spatial ability (VZ-2), associative memory (MA-1), and on-line experience were tested on a set of interrelated search performance scores. A statistically significant main effect of on-line experience was found, F(6, 4) = 6.213, p = 0.049, two-tailed. In particular, on-line experience has a significant effect on the recall scores with the textual interface. 
Individuals experienced in on-line search are more likely to have a higher recall score with the textual interface than less experienced individuals. No significant main effects were found for spatial ability and associative memory. Subjects' comments suggest a potentially complex interplay between individuals' mental models and the high-dimensional semantic model. Qualitative and process-oriented studies are, therefore, called for to reveal the complex interaction between individuals' cognitive abilities, domain knowledge, and direct manipulation skills. A recommendation is made that spatial-semantic models should be adaptable to suit individuals and tasks at various levels. Virtual environments enable a given information space to be traversed in different ways by different individuals, using different routes and navigation tools. However, we urgently need robust user models to enable us to optimize the deployment of such facilities. Research into individual differences suggests that the notion of cognitive style may be useful in this process. Many such styles have been identified. However, it is argued that Pask's work on holist and serialist strategies and associated styles of information processing are particularly promising in terms of the development of adaptive information systems. These constructs are reviewed, and their potential utility in "real-world" situations assessed. Suggestions are made for ways in which they could be used in the development of virtual environments capable of optimizing the stylistic strengths and complementing the weaknesses of individual users. The role of neural networks in handling the essentially fuzzy nature of user models is discussed. Neural networks may be useful in dynamically mapping users' navigational behavior onto user models to enable them to generate appropriate adaptive responses. 
However, their learning capacity may also be particularly useful in the process of improving system performance and in the cumulative development of more robust user models. This study sought to investigate the effects of cognitive style (field dependent and field independent) and on-line database search experience (novice and experienced) on the World Wide Web (WWW) search performance of undergraduate college students (n = 48). It also attempted to find user factors that could be used to predict search efficiency. Search performance, the dependent variable, was defined in two ways: (1) time required for retrieving a relevant information item, and (2) the number of nodes traversed for retrieving a relevant information item. The search tasks were carried out on a University Web site, and included a factual task and a topical search task of interest to the participant. Results indicated that while cognitive style (FD/FI) significantly influenced the search performance of novice searchers, the influence was greatly reduced in those searchers who had on-line database search experience. Based on the findings, suggestions for possible changes to the design of the current Web interface and to user training programs are provided. This article describes how the original ERIC was established as a conventional, centralized information center within the Office of Education in 1964, and how this initial ERIC was transformed into a decentralized national system about 18 months later. 
The history of the two ERICs also illustrates how knowledge and expertise (in this case, that represented by advances in information systems technology and its applications) combined with interpersonal relationships within a bureaucracy, federal funding decisions, and organizational changes to shape the development of a major national information service. The time period covered by the article is from 1959, when planning for the first ERIC began, to June 1967, when the decentralized system became fully operational. Most of the description and analysis, however, is limited to the 1965-66 period, when the decentralized system was conceptualized and implemented. Important developments in ERIC since 1967 are also described. Searching for information on the World Wide Web (WWW) basically comes down to locating an appropriate Web site and to retrieving relevant information from that site. This study examined the effect of a user's WWW experience on both phases of the search process. Twenty-five students from two schools for Dutch pre-university education were observed while performing three search tasks. The results indicate that subjects with WWW experience are more proficient in locating Web sites than are novice WWW users. The observed differences were ascribed to the experts' superior skills in operating Web search engines. However, on tasks that required subjects to locate information on specific Web sites, the performance of experienced and novice users was equivalent, a result that is in line with hypertext research. Based on these findings, implications for training and supporting students in searching for information on the WWW are identified. Finally, the role of the subjects' level of domain expertise is discussed and directions for future research are proposed. PASCAL, whose troublesome artefacts we highlight, also has its strong points (multidisciplinarity, codification of the topic of each article, better coverage of some countries). 
Like other sources, it shows that the current decade is one of crisis in African research. However, developments are highly contrasted, depending on the discipline and the region. In the north of Africa, the Maghreb is witnessing an unprecedented gain in power. Nigerian science is in quite the contrary situation, imploding. In the rest of Africa, classification of countries reveals striking changes in ranking. Basic science declines. The agricultural and medical sciences are stagnating. Conversely, the engineering sciences are growing, particularly north of the Sahara. Only multidimensional analyses can provide overviews of complex relationships among many variables. We have previously illustrated the use of Correspondence Factor Analysis (CFA) in the analysis of publication profiles. In this article, we retrace our activity in patent analysis from the late 1970s to the present day and show how CFA is a particularly useful tool not only for describing the correlations between countries and technological fields but also for highlighting non-linear patenting time trends. As European Union research programmes play an increasingly important role within the research and innovation systems of Member States, the need for appropriate indicators to grasp and analyze this collaborative phenomenon has in recent years become obvious. Such indicators are becoming essential decision-making tools for science policy makers at the national level. EU science policy responds to not one but a number of objectives, while one country's or one laboratory's participation in European S&T cooperation is likely to manifest a number of particularities, and be quite different from another's. Such a complex system makes it possible to elaborate a large variety of indicators. This article proposes several possible types of indicators and shows how they could be useful for weighing research policy strategies at the national and European levels. 
This article proposes a method for characterizing the "activity profiles" of research laboratories. It is based on the "research compass card model" derived from the sociology of science, and which highlights the five complementary contexts in which research activities develop. A test was conducted in a regional setting on 75 labs. It demonstrates that simple indicators are enough to measure levels of involvement in each activity. Seven "activity profiles" based upon the mix by labs of their marked involvement were identified, crossing both institutional and disciplinary barriers. Both the technological and market focus of 228 European biotechnology SMEs are analysed in this paper. Data from the Genetic Engineering catalogue provide a complementary representation compared to the patent publications that are most commonly used. Results of the analysis produce a new view of the development of biotech SMEs. First, no pattern of specialisation by country is observed, even though three types of company with different technological focus can be distinguished in the sample. Second, it is argued that the rapid technological evolution in this domain can hardly be explained by a rapid evolution of the technological basis of the companies, and should consequently be explained primarily by the creation of new SMEs. Third, four different patterns of linkage between technology and market focus are observed, by means of co-word analysis. The patterns that appear in exchanges between researchers, scientific journal publications and the demand for scientific articles often intersect, but the logic behind each type of activity is not necessarily the same. Analyses of requests for scientific articles from document suppliers may help to interpret current developments in electronic publishing. 
This study of article requests to the Institut de l'information scientifique et technique (INIST) shows that, in France, document supply customers fall into three main categories: business, academic libraries, and public research organisations, in descending order. Demand focuses mainly on medicine, pharmacology, biology and chemistry, and the distribution of articles is entirely in accordance with the laws of bibliometrics. A further comparative analysis shows a high reciprocal correlation (except in the physical sciences) between the 50 journals most frequently requested from INIST and the 50 most frequently cited journals according to ISI (Institute for Scientific Information). The titles which did not appear in either one list or the other show that the most frequently cited physics journals are not necessarily requested from the document supplier, and that, conversely, some frequently requested journals are not often cited. It may therefore be assumed that a trade in electronic articles is likely to develop quite rapidly in disciplines which are common to both lists, although this would focus on reputed titles only, but that a different pattern of electronic document exchange would emerge for scientific literature in other disciplines. The introduction of bibliometric indicators to compare the scientific performance of countries soon raised questions about what document types should be counted for comparison. The present study deals with the development of different document types published in journals related to physics and recorded in the Science Citation Index. We first take a look at the evolution of the production and citation of papers by document type, as well as at the specialization of countries in different document types. We then highlight some characteristics of the ISI document type category "Proceedings," followed by an analysis of publishers and the average number of "Proceedings" pages. 
The Science Citation Index's Journal Citation Reports (JCR), published by the Institute for Scientific Information (ISI) and designed to rank, evaluate, categorize, and compare journals, is used in a wide scientific context as a tool for evaluating researchers and research work, through the use of just one of its indicators, the impact factor. With the aim of obtaining an overall and synthetic perspective of impact factor values, we studied the frequency distributions of this indicator using the box-plot method. Using this method we divided the journals listed in the JCR into five groups (low, lower central, upper central, high, and extreme). These groups position the journal in relation to its competitors. Thus, the group designated as extreme contains the journals with high impact factors which are deemed to be prestigious by the scientific community. We used the JCR data from 1996 to determine these groups, firstly for all subject categories combined (all 4779 journals) and then for each of the 183 ISI subject categories. We then substituted the indicator value for each journal by the name of the group in which it was classified. The journal group may differ from one subject category to another. In this article, we present a guide for evaluating journals constructed as described above. It provides a comprehensive and synthetic view of two of the most used sections of the JCR. It makes it possible to make more accurate and complete judgements on and through the journals, and avoids an oversimplified view of the complex reality of the world of journals. It immediately reveals the scientific subject category where the journal is best positioned. Also, whereas it used to be difficult to make intra- and interdisciplinary comparisons, this is now possible without having to consult the different sections of the JCR. We construct this guide each year using indicators published in the JCR by the ISI. 
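The five-group box-plot classification described above can be sketched as follows. The cut-offs used here (the quartiles plus the conventional 1.5*IQR upper fence for the "extreme" group) are an assumed reading of the box-plot method; the authors' exact fences may differ.

```python
import statistics

def boxplot_group(value, values):
    """Assign an impact-factor value to one of five box-plot groups
    relative to the distribution of a whole subject category."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)  # assumed fence for 'extreme'
    if value > upper_fence:
        return "extreme"
    if value > q3:
        return "high"
    if value > median:
        return "upper central"
    if value > q1:
        return "lower central"
    return "low"
```

Replacing each journal's raw impact factor with its group label, computed per subject category, is what makes intra- and interdisciplinary comparisons possible without consulting the raw JCR sections.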
This article aims at a characterization of the cooperation behavior among five large scientific countries (France, Germany, Japan, the United Kingdom, and the United States of America) from 1986 to 1996. It looks at the cooperation profiles of these countries using classical measures such as the Probabilistic Affinity. The results show the major influence which historical, cultural, and linguistic proximities may have on patterns of cooperation, with few changes over the period of time studied. A lack of specific affinities among the three largest European countries is revealed, and this contrasts with the strong linkage demonstrated between the United States and Japan. The ensuing discussion raises some questions as to the process of Europeanization in science. The intensity of bilateral cooperation linkages is then studied with regard to field specialization by country, and this analysis yields no general patterns at the scale studied. Specific bilateral behaviors are also analyzed. Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven by a conceptual representation of the whole document collection, into which the query is merged. Our approach is based on the theory of concept (or Galois) lattices, which, we argue, provides a powerful, well-founded, and computationally tractable framework to model the space in which documents and query are represented and to compute such a transformation. We compared information retrieval using concept lattice-based ranking (CLR) to BMR and HCR. 
The results showed that HCR was outperformed by CLR as well as by BMR, and suggested that, of the two best methods, BMR achieved better performance than CLR on the whole document set, whereas CLR compared more favorably when only the first retrieved documents were used for evaluation. We also evaluated the three methods' specific ability to rank documents that did not match the query, in which case the superiority of CLR over BMR and HCR (and that of HCR over BMR) was apparent. One of the most common models in information retrieval (IR), the vector space model, represents a document set as a term-document matrix where each row corresponds to a term and each column corresponds to a document. Because matrices are used in IR, it is possible to apply linear algebra to this model. This paper describes an application of linear algebra to text clustering, namely, a metric for measuring cluster quality. The metric is based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. The metric compares the singular values of the term-document matrix to the singular values of the matrices for each of the clusters to determine the amount of overlap of the terms across clusters. Because the metric can be difficult to interpret, a standardization of the metric is defined, which specifies the number of standard deviations a clustering of a document set is from an average, random clustering of that document set. Empirical evidence shows that the standardized cluster metric correlates with clustered retrieval performance when comparing clustering algorithms or multiple parameters for the same clustering algorithm. 
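The singular-value comparison behind the cluster-quality metric can be sketched as follows. This illustrates the general idea (comparing the singular values of the whole term-document matrix with those of each cluster's submatrix) rather than the exact published metric, whose normalization may differ.

```python
import numpy as np

def svd_overlap_metric(term_doc, cluster_labels):
    """Compare the total singular-value mass of the whole term-document
    matrix with the combined mass of each cluster's column submatrix.
    When clusters use disjoint terms the matrix is block diagonal and the
    two totals coincide (ratio 1.0); term overlap pulls the ratio below 1."""
    whole = np.linalg.svd(term_doc, compute_uv=False).sum()
    parts = sum(
        np.linalg.svd(term_doc[:, cluster_labels == c], compute_uv=False).sum()
        for c in np.unique(cluster_labels)
    )
    return whole / parts
```

A clustering that separates terms cleanly therefore scores near 1.0; standardizing such a score against random clusterings of the same matrix, as the paper does, makes values comparable across document sets.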
This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment to validate the performance of this image retrieval system against that of human subjects, in an effort to address the scarcity of research evaluating the performance of an algorithm against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms. Authors' motivations for citing documents are addressed through a literature review and an empirical study. Replicating an investigation in psychology, the works of two highly cited authors in the discipline of communication were identified, and all of the authors who cited them during the period 1995-1997 were surveyed. The instrument posed 32 questions about why a certain document was cited, plus questions about the citer's relationship to the cited author and document. Most findings were similar to the psychology study, including a tendency to cite "concept markers" representing a genre of work. Authors in communication were more likely to have an interpersonal connection to cited authors, and to cite literature reviews, their most common reason for citation. 
Three types of judgments about cited works were found to best predict citation: (1) that the work was novel, well-known, and a concept-marker; (2) that citing it might promote the authority of one's own work; and (3) that the work deserved criticism. Suggestions are made for further research, especially regarding the anomalous role of creativity in cited works. This study reports on the first part of a research project that investigated children's cognitive, affective, and physical behaviors as they use the Yahooligans! search engine to find information on a specific search task. Twenty-two seventh-grade science students from a middle school located in Knoxville, Tennessee participated in the project. Their cognitive and physical behaviors were captured using Lotus ScreenCam, a Windows-based software package that captures and replays activities recorded in Web browsers, such as Netscape. Their affective states were captured via a one-on-one exit interview. A new measure called "Web Traversal Measure" was developed to measure children's "weighted" traversal effectiveness and efficiency scores, as well as their quality moves in Yahooligans! Children's prior experience in using the Internet/Web and their knowledge of the Yahooligans! interface were gathered via a questionnaire. The findings provided insights into children's behaviors and success, as reflected in their weighted traversal effectiveness and efficiency scores, as well as their quality moves. Implications for user training and system design are discussed. This paper describes ethnomethodologically informed ethnography (EM) as a methodology for information science research, illustrating the approach with the results of a study in a university library. We elucidate major differences between the practical orientation of EM and the theoretical orientation of other ethnographic approaches in information science research. 
We address ways in which EM may be used to inform systems design and consider the issues that arise in coordinating the results of this research with the needs of information systems designers. We outline our approach to the "ethnographically informed" development of information systems in addressing some of the major problems of interdisciplinary work between system designers and EM researchers. This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic-programming methods that measure longest common subsequence and edit-cost distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved and relevant (R&R) mean measures are calculated from weighted combinations of recall and precision values. Results from these experiments are then compared with those of the available digram, a string-similarity method. The best R&R and E results are given by using digram. Edit-cost distances produce the best E results, and both dynamic-programming methods rank second in finding R&R mean measures. Autonomy of operations combined with decentralized management of data gives rise to a number of heterogeneous databases or information systems within an enterprise. These systems are often incompatible in structure as well as content and, hence, difficult to integrate. Despite this heterogeneity, the unity of overall purpose within a common application domain nevertheless provides a degree of semantic similarity that manifests itself in the form of similar data structures and common usage patterns of existing information systems. 
This article introduces a conceptual integration approach that exploits the similarity in metalevel information in existing systems and performs metadata mining on database objects to discover a set of concepts that serve as a domain abstraction and provide a conceptual layer above existing legacy systems. This conceptual layer is further utilized by an information reengineering framework that customizes and packages information to reflect the unique needs of different user groups within the application domain. The architecture of the information reengineering framework is based on an object-oriented model that represents the discovered concepts as customized application objects for each distinct user group. Because the World Wide Web is a dynamic collection of information, the Web search tools (or "search engines") that index the Web are dynamic. Traditional information retrieval evaluation techniques may not provide reliable results when applied to the Web search tools. This study is the result of ten replications of the classic 1996 Ding and Marchionini Web search tool research. It explores the effects that replication can have on transforming unreliable results from one iteration into replicable and therefore reliable results following multiple iterations. The present article describes the use of Repertory Grid methodology as a means of externalizing an individual's view of information space. The method is described with a single participant, and appears to offer a viable means of exploring the concept of information space. Future work will examine multiple views in an attempt to explore the extent to which the concept is shared among a group of people. 
In studies of information users' cognitive behaviors, it is widely recognized that users' perceptions of their information problem situations play a major role. Time-line interviewing and inductive content analysis are two research methods that, used together, have proven extremely useful for exploring and describing users' perceptions in various situational contexts. This article describes advantages and disadvantages of the methods using examples from a study of users' criteria for evaluation in a multimedia context. Experimental subjects wrote abstracts of articles using a simplified version of the TEXNET abstracting assistance software. In addition to the full text, subjects were presented with either keywords or phrases extracted automatically. The resulting abstracts, and the times taken, were recorded automatically; some additional information was gathered by oral questionnaire. Selected abstracts produced were evaluated on various criteria by independent raters. Results showed considerable variation among subjects, but 37% found the keywords or phrases "quite" or "very" useful in writing their abstracts. Statistical analysis failed to support several hypothesized relations: phrases were not viewed as significantly more helpful than keywords; and abstracting experience did not correlate with originality of wording, approximation of the author abstract, or greater conciseness. Requiring further study are some unanticipated strong correlations, including the following: Windows experience and writing an abstract like the author's; experience reading abstracts and thinking one had written a good abstract; gender and abstract length; gender and use of words and phrases from the original text. Results have also suggested possible modifications to the TEXNET software. 
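The kind of automatic keyword extraction the TEXNET study presented to its subjects can be approximated by a simple frequency-based sketch. This is purely illustrative: the stopword list and ranking below are our assumptions, not TEXNET's actual method.

```python
import re
from collections import Counter

# Minimal illustrative stopword list (an assumption, not TEXNET's).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "with"}

def extract_keywords(text, k=5):
    """Rank candidate keywords by raw frequency, ignoring stopwords.
    A deliberately simple sketch of frequency-based term extraction."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]
```

Real abstracting aids would add phrase detection, stemming, and positional weighting; the point here is only the basic pipeline of tokenize, filter, count, and rank.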
This article reports on a qualitative study exploring: (1) strategies and behaviors of public library users during interaction with an on-line public access catalog; and (2) users' confidence in finding needed information on-line. Questionnaires, interviews, and observations were employed to gather data from 32 public library users. The results showed that search behaviors, confidence, and other feelings varied based on three types of searches: unknown-item searches, area searches, and known-item searches. Term generation was the most important factor in unknown-item search strategies. Speed and convenience played a role in area searches, and simplicity characterized known-item searches. Of the three types, unknown-item searchers experienced the most frustration and doubt; known-item searchers the most disappointment; and area searchers the most confidence and contentment. Knowledge of these differences may prove helpful for librarians and interface designers. As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographic entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files. 
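Two classical approximate string-matching scores of the sort investigated in the authority-work study above can be sketched as follows. These are standard textbook formulations; the study's exact techniques and thresholds are not reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming: the minimum number of
    single-character insertions, deletions, and substitutions needed to
    turn string a into string b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def digram_similarity(a, b):
    """Dice coefficient over character digrams, useful for detecting
    spelling variants such as 'colour'/'color'."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

Approximate word matching, as the abstract describes it, would apply scores like these at the word level within bibliographic strings rather than to whole entries.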
Electronic news has long held the promise of personalized and dynamic delivery of current event news items, particularly for web users. Although electronic versions of print news are now widely available, the personalization of that delivery has not yet been accomplished. In this paper, we present a methodology of associating news documents based on the extraction of feature phrases, where feature phrases identify dates, locations, people, and organizations. A news representation is created from these feature phrases to define news objects that can then be compared and ranked to find related news items. Unlike traditional information retrieval, we are much more interested in precision than recall. That is, the user would like to see one or more specifically related articles, rather than all somewhat related articles. The algorithm is designed to work interactively with the user using regular web browsers as the interface. This article examines information theory from the aspect of its "conduit metaphor." A historical approach and a close reading of certain texts by Warren Weaver and Norbert Wiener shows how this metaphor was used to construct notions of language, information, information theory, and information science, and was used to extend the range of these notions across social and political space during the period of the Cold War. This article suggests that this legacy remains with us today in certain notions of information and information theory, and that this has affected not only social space in general, but in particular, the range and possibilities of information studies. The growth of scientific output in recent years has meant that fewer libraries are able to offer the entire range of journals, with the others being forced to make a selection. 
The objective of the present work is to describe criteria to regulate the selection of these journals so as to provide the researcher with the information that is most used in research. One form of quantifying this information is by way of the citations that papers receive over a period of time following their publication. Obsolescence, expressed in terms of an annual aging factor, does not reflect the real behaviour of most papers. An alternative is the use of "topicality," considered as a latent variable, with the Rasch model as the measuring instrument. We considered 45 physics journals, and found the results of applying the Rasch model to be more satisfactory than those obtained with the annual aging factor. This paper has a dual character dictated by its twofold purpose. First, it is a speculative historiographic essay containing an attempt to fix the present position of library and information science within the context of the probabilistic revolution that has been encompassing all of science. Second, it comprises a guide to practitioners engaged in statistical research in library and information science. The problems of utilizing statistical methods in library and information science are pointed out, stemming from the highly and positively skewed distributions that dominate this discipline. Biostatistics is indicated as the source of solutions for these problems, and the solutions are then traced back to the British biometric revolution of 1865-1950, during the course of which modern inferential statistics were created. 
The thesis is presented that science has been undergoing a probabilistic revolution for over 200 years, and it is stated that this revolution is now coming to library and information science, as general stochastic models replace specific, empirical informetric laws. An account is given of the historical development of the counting distributions and laws of error applicable in statistical research in library and information science, and it is stressed that these distributions and laws are not specific to library and information science but are inherent in all biological and social phenomena. Urquhart's Law is used to give a practical demonstration of the distributions. The difficulties of precisely fitting data to theoretical probability models in library and information science because of the inherent fuzziness of the sets are discussed, and the paper concludes with the description of a simple technique for identifying and dealing with the skewed distributions in library and information science. Throughout the paper, emphasis is placed on the relevance of research in library and information science to social problems, both past and present. The variety of performance measures available for information retrieval systems, search engines, and network filtering agents can be confusing to both practitioners and scholars. Most discussions about these measures address their theoretical foundations and the characteristics of a measure that make it desirable for a particular application. In this work, we consider how measures of performance at a point in a search may be formally compared. Criteria are developed that allow one to determine the percent of time or conditions under which two different performance measures suggest that one document ordering is superior to another ordering, or when the two measures disagree about the relative value of document orderings. As an example, graphs provide illustrations of the relationships between precision and F. 
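The two measures compared in the graphs of the performance-measures article, precision and F, can be computed at a point in a search as follows. These are the standard definitions; the example ranking and relevant-document count are illustrative.

```python
def precision_at(relevant_flags):
    """Precision at a cutoff: fraction of retrieved documents (so far)
    that are relevant. relevant_flags is the ranked list of relevance
    judgments down to the cutoff."""
    return sum(relevant_flags) / len(relevant_flags)

def f_measure_at(relevant_flags, total_relevant, beta=1.0):
    """F measure combining precision and recall at the same cutoff;
    beta weights recall relative to precision (beta=1 gives F1)."""
    retrieved_relevant = sum(relevant_flags)
    p = retrieved_relevant / len(relevant_flags)
    r = retrieved_relevant / total_relevant
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because precision ignores the total number of relevant documents while F does not, the two measures can rank two document orderings differently at the same cutoff, which is exactly the kind of disagreement the article's comparison criteria quantify.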
Research has demonstrated that people engage in multiple types of information-seeking strategies when using information retrieval (IR) systems; unfortunately, current IR systems are designed to support only one type of information-seeking strategy: specifying queries. The limitation of the existing IR systems calls for the need to investigate how to support users as they shift from one information-seeking strategy to another in their attempts to achieve their information-seeking goals. The focus of this study is on the in-depth investigation of shifts in the micro-level of user goals ("interactive intentions") and information-seeking strategies that users engage in within an information-seeking episode. Forty cases of library use were selected from four different types of libraries for this study. The qualitative and quantitative analysis of the data identifies four types of shifts of interactive intentions and three types of information-seeking strategies. The results of the study are discussed to understand the nature of the interactive IR process, and to further suggest their implications for the design of adaptive IR systems. Medical science offers an increasing quantity and quality of information to at-risk groups, but too often people in a state of information poverty lack access to knowledge that would benefit them. Further, people may "know" what approach they should take to improve their health or other living conditions, but for a variety of reasons may not act on that knowledge. Pacific Island immigrants to New Zealand are especially at risk from cervical cancer, but participate less in cervical screening services than New Zealanders of European descent. This study reports on perceived barriers among New Zealand Pacific women to the use of cervical screening and proposes solutions for improved access to these services. Imperatives of cultural topic avoidance, modesty, and religion created significant barriers to the topic of cervical screening. 
Respondents strongly favored sources of information that were mediated through their community groups, but smear-takers were preferably non-Pacific in ethnicity. The respondents' lived experience of "community connectedness" defined them as distinct from recent U.S. studies where participants were seen as highly isolated in their social environments. In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines Genetic Programming and Inductive Logic Programming to induce knowledge represented in various knowledge representation formalisms from noisy databases. The framework is based on a formalism of logic grammars, and it can specify the search space declaratively. An implementation of the framework, LOGENPRO (The Logic grammar based GENetic PROgramming system), has been developed. The performance of LOGENPRO is evaluated on the chess end-game domain. We compare LOGENPRO with FOIL and other learning systems in detail, and find its performance is significantly better than that of the others. This result indicates that the Darwinian principle of natural selection is a plausible noise-handling method that can avoid overfitting and identify important patterns at the same time. Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights into, and allows a better understanding of, the medical domain. The primary purpose of the study was to identify motivations for hyperlinking in scholarly electronic articles. 
Fifteen Indiana University faculty and graduate students who had published at least one scholarly electronic article containing at least one external hyperlink were surveyed. Through a series of qualitative interviews, 19 different hyperlinking motivations, classified into three motivational groups (scholarly, social, and technological) along the dimensional ranges of their properties, were identified. The vast majority of the hyperlinks were attributed to more than one motivation by the authors. The empirical findings of the study demonstrated that scholars use hyperlinks for a variety of purposes, and that their hyperlinking behavior frequently results from a complex interplay of motivations. The authors describe a study of the social dynamics of new media in Scottish households. The evolving project drew on dialogues with multiple household members elicited in group conversations. This approach to interviews captured different and conflicting points of view, a feature shared with certain social approaches to systems design. Analysis of the interview transcripts revealed that there are recurrent narratives and behavioral genres across households (and across sample groups), and that these reflect tactics, stratagems, and plans by means of which respondents navigate social space. The authors' approach contrasts with prevailing "needs and uses" models in information science, in offering a methodological framework based on group narrative and genre analysis that contributes to a theory of social informatics in the household. The author presents the results of additional analyses of shifts of focus in IR interaction. Results indicate that users and search intermediaries work toward search goals in nonlinear fashion. Twenty interactions between 20 different users and one of four different search intermediaries were examined. 
Analysis of discourse between the two parties during interactive information retrieval (IR) shows that changes in topic occur, on average, every seven utterances. These twenty interactions included some 9,858 utterances and 1,439 foci. Utterances are defined as any uninterrupted sound, statement, gesture, etc., made by a participant in the discourse dyad. These utterances are segmented by the researcher according to their intentional focus, i.e., the topic on which the conversation between the user and search intermediary focuses until the focus changes (i.e., shifts of focus). In all but two of the 20 interactions, the search intermediary initiated a majority of shifts of focus. Six focus categories were observed. These were foci dealing with: documents; evaluation of search results; search strategies; the IR system; the topic of the search; and information about the user. We present the results of a study of users' perception of relevance of documents. The aim is to study experimentally how users' perception varies depending on the form in which retrieved documents are presented. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine-spoken query-biased automatically-generated summary, and the difference in users' perception of relevance is studied. The experimental results suggest that the effectiveness of advanced multimedia information retrieval applications may be affected by the low level of users' perception of relevance of retrieved documents. The purpose of this research was to study current policies and practices of scholarly journals on evaluating manuscripts for publication that had been previously published electronically. Various electronic forms were considered: a manuscript having been e-mailed to members of a listserv, attached to a personal or institutional home page, stored in an electronic preprint collection, or published in an electronic proceedings or electronic journal. 
Factors that might affect the consideration of such manuscripts were also examined, including characteristics of the journal, the previously published work, and the submitted manuscript. A sample of 202 scholarly journals in the sciences, social sciences, and arts and humanities was selected for study. A questionnaire and cover letters were sent to the journal editors in the summer and fall of 1997, with an overall return rate of 57.4%. Results are reported for all journals, with comparisons being made between journals edited in the United States and outside the United States, by journal impact factor, and by discipline. The findings suggest that editorial policies regarding prior electronic publication are in an early stage of development. Most journal editors do not have a formal policy regarding the evaluation of work previously published in electronic form, nor are they currently evaluating such a policy. Editors disagreed widely on the importance of the various factors that might affect their decision to consider a work previously published electronically. The form or type of prior electronic publication was an important variable. Although some editors currently have a fairly rigid and negative posture towards work previously published electronically, most are willing to consider certain forms of such work for publication in their journals. Probably the most significant results of the study were the many differences in practices among scholarly disciplines. The findings of this study reveal how the Internet and the World Wide Web are currently affecting manuscript consideration policies of scholarly journals at this early stage of Web and Internet publishing. Queries submitted to the Excite search engine were analyzed for subject content based on the cooccurrence of terms within multiterm queries. More than 1000 of the most frequently cooccurring term pairs were categorized into one or more of 30 developed subject areas. 
Subject area frequencies and their cooccurrences with one another were tallied and analyzed using hierarchical cluster analysis and multidimensional scaling. The cluster analyses revealed several anticipated and a few unanticipated groupings of subjects, resulting in several well-defined high-level clusters of broad subject areas. Multidimensional scaling of subject cooccurrences revealed similar relationships among the different subject categories. Applications that arise from a better understanding of the topics users search and their relationships are discussed. Web site usability has become an important area of research as Web sites proliferate and problems with use are noted. Generally, aspects of Web sites that have been investigated focus on such areas as overall design and navigation. The exploratory study reported on here investigates one specific component of a Web site, the index structure. By employing index usability metrics developed by Liddy and Jorgensen (1993; Jorgensen & Liddy, 1996) and modified to accommodate a hypertext environment, the study compared the effectiveness and efficiency of 20 subjects who used one existing index (the A-Z index on the FedStats Web site at http://www.fedstats.gov) and three experimental variants to complete five researcher-generated tasks. User satisfaction with the indexes was also evaluated. The findings indicate that a hypertext index with multiple access points for each concept, all linked to the same resource, led to greater effectiveness and efficiency of retrieval on almost all measures. Satisfaction measures were more variable. The study offers insight into potential improvements in the design of Web-based indexes and provides preliminary assessment of the validity of the measures employed. A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented in this article. Data were collected from 25 searches using the INSPEC database. 
The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results discuss issues that relate to query expansion, retrieval effectiveness, the correspondence of the on-line-to-off-line relevance judgments, and the selection of terms for query expansion by users (interactive query expansion). The main conclusions drawn from the results of the study are that: (1) one-third of the terms presented to users in a list of candidate terms for query expansion were identified by the users as potentially useful for query expansion. (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationships identified between the five best terms selected by the users for query expansion and the initial query terms were that: (a) 34% of the query expansion terms had no relationship or other type of correspondence with a query term; (b) 66% of the remaining query expansion terms had a relationship to the query terms. These relationships were: narrower term (46%), broader term (3%), related term (17%). (4) The results provide evidence for the effectiveness of interactive query expansion. The initial search produced on average three highly relevant documents; the query expansion search produced on average nine further highly relevant documents. The conclusions highlight the need for more research on: interactive query expansion, the comparative evaluation of automatic vs. interactive query expansion, the study of weighted Web-based or Web-accessible retrieval systems in operational environments, and user studies in searching ranked retrieval systems in general. The notions of aging, obsolescence, impact, growth, utilization, and their relations are studied. 
It is shown how to correct an observed citation distribution for growth, once the growth distribution is known. The relation of this correction procedure to the calculation of impact measures is explained. More interestingly, we have shown how the influence of growth on aging can be studied over a complete period as a whole. Here, the difference between the so-called average and global aging distributions is the main factor. Our main result is that growth can influence aging but that it does not cause aging. A short overview of some classical articles on this topic is given, and results of these earlier works are placed in the framework set up in this article. Research findings from organizational theory tend to support the position that management uses Information Technology (IT) to maintain the existing organizational hierarchy and control. Another body of research from information technology advocates suggests that IT's inherent capabilities transform organizational hierarchy and control outside of management's control. In addition, advocates of change toward a more responsive type of government promote the adoption of IT as a change mechanism. This article explores these conflicting positions. The authors examine one instance of the development of a form of network organization within the federal government, and the processes of IT change that have occurred over the past 20 years. The agency selected for study is the Federal Emergency Management Agency. With new interactive technology, information science can use its traditional information focus to increase user satisfaction by designing information retrieval systems (IRSs) that inform the user about her task, and help the user get the task done, while the user is on-line interacting with the system. By doing so, the system enables the user to perform the task for which the information is being sought. 
In previous articles, we modeled the information flow and coding operations of a user who has just received an informative IRS message, dividing the user's processing of the IRS message into three subsystem levels. In this article, we use Kintsch's proposition-based construction-integration theory of discourse comprehension to further detail the user coding operations that occur in each of the three subsystems. Our enabling devices are designed to facilitate a specific coding operation in a specific subsystem. In this article, we describe an IRS device made up of two separate parts that enables the user's (1) decoding and (2) encoding of an IRS message in the Comprehension subsystem. This research investigated conceptual alteration in the translation of medical article titles between English and Chinese, with a twofold purpose: one was to further justify the findings from a pilot study, and the other was to further investigate how concepts were altered in translation. The research corpus of 800 medical article titles in English and Chinese was selected from two English medical journals and two Chinese medical journals. The analysis was based on the pairing of concepts in English and Chinese and their conceptual similarity/dissimilarity via translation between English and Chinese. Two kinds of conceptual alteration were discussed: one was apparent conceptual alteration, obvious in the addition or omission of concepts in translation. The other was latent conceptual alteration, which was not obvious and could only be recognized by the differences between the original and translated concepts. The findings from the pilot study were verified by the findings from this research. Additional findings, for example the addition/omission of single-word and multiword concepts in the general and medical domains, and implicit vs. explicit information, were also discussed. 
The findings provided useful insights into future studies on cross-language information retrieval via medical translation between English and Chinese, and other languages as well. Many aspects determine the quality of scientific journals. The impact factor is one of these quantitative parameters. However, the impact factor has a strong dependence on the journal's discipline. This dependence precludes a direct comparison between different journals without introducing external considerations. In this paper, a renormalized impact factor, F-r, inspired by the definition of dimensionless physical parameters, is proposed. F-r allows a direct comparison among journals classified into different categories and, furthermore, the analysis of the time evolution of the journal's role in its field. Peer review is a basic component of the scientific process, but its performance has seldom been evaluated systematically. To determine whether pre-approval characteristics of research projects predicted the performance of projects, we conducted a retrospective cohort study of all 2744 single-centre research projects financed by the Spanish Health Research Fund since 1988 and completed before 1996. Peer review scores of grant applications were significant predictors of the performance of funded projects, and the likelihood of production was also higher for projects with a basic research component, longer duration, higher budget, or a financed research fellow. Funding agencies should monitor their selection process and assess the performance of funded projects to design future strategies in supporting health sciences research. Authorship and citation patterns in major journals in operational research (OR) are analysed. As a forerunner of interdisciplinary specialties applying mathematical or quantitative methods to social problems, OR has recently been in severe competition with new challengers with respect to applicable methods and real implementation. 
Through the analyses of authorship and citation patterns, this paper discusses the behaviours of journal editors and contributors with regard to the competition and the reform policy of OR journals. In this paper, I will argue that the process of ageing in scientific publications on the one hand, and the process of obsolescence and forgetting to which all kinds of phenomena, people and events are exposed on the other, develop at the same speed. Whereas in the literature on the subject it is stated that the speed of the ageing of scientific literature is exponential, it is shown that the decay from 'age 4' is best described by an inverse function, as was already brought to light in reference to the forgetting of people and events as measured by the frequencies of calendar years in large text corpora. The empirical bases are SCI data as presented by Nakamoto and various files of reference data collected by the author. It is shown that the decay curve of the reference frequencies from 'age 4' backwards is independent of time. A journal co-citation analysis of fifty journals and other publications in the information retrieval (IR) discipline was conducted over three periods spanning the years 1987 to 1997. Relevant data retrieved from the Science Citation Index (SCI) and Social Science Citation Index (SSCI) are analysed according to the highly cited journals in various disciplines, especially in the Library & Information Science area. The results are compared with previous research that covered data only from the Social Science Citation Index (SSCI). The analysis reveals that there is no distinct difference between these two sets of results. The results of the current study show that the IR speciality is multi-disciplinary, with broad relations to other specialities. The field of IR is a mature field, as the journals used for research communication remained quite stable during the study period. 
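The exponential-versus-inverse question raised in the ageing study above can be illustrated numerically. The following is a minimal, self-contained sketch (synthetic counts and simple least-squares fits; the function names and data are illustrative, not the paper's actual estimation procedure) comparing a decreasing exponential model with an inverse-function model on a reference-age distribution from 'age 4' backwards:

```python
import math

def fit_exponential(ages, counts):
    # log-linear least squares: log(count) = log(A) + slope * age, slope < 0
    n = len(ages)
    ys = [math.log(c) for c in counts]
    mx, my = sum(ages) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(ages, ys))
             / sum((x - mx) ** 2 for x in ages))
    A = math.exp(my - slope * mx)
    return lambda t: A * math.exp(slope * t)

def fit_inverse(ages, counts):
    # least squares through the origin for count = C / age
    C = sum(c / t for t, c in zip(ages, counts)) / sum(1 / t ** 2 for t in ages)
    return lambda t: C / t

def sse(model, ages, counts):
    # sum of squared errors of a fitted model against the observed counts
    return sum((model(t) - c) ** 2 for t, c in zip(ages, counts))

# toy reference-age counts from 'age 4' backwards, generated by an inverse law
ages = list(range(4, 25))
counts = [1000 / t for t in ages]

exp_model = fit_exponential(ages, counts)
inv_model = fit_inverse(ages, counts)
print(sse(inv_model, ages, counts) < sse(exp_model, ages, counts))  # True
```

On data that actually decays as 1/age, the inverse fit wins clearly; run on real reference data, the same comparison operationalizes the paper's claim.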
The paper is a bibliometric study of the publication patterns and impact of South African scientists in 1981-96, with special emphasis on the period 1992-96. The subject fields surveyed are Physics; Chemistry; Plant and Animal Sciences; and Biochemistry/Microbiology. Scientists were selected from the ten universities of the Eastern Cape, Western Cape and KwaZulu Natal, which vary considerably with respect to standards of education, quantity of publications, development and overall progress. The general purpose is two-fold: 1) to observe the publication and citation trends during 1981-96, a period which covers significant policy changes in the country, in particular the end of apartheid in 1994; within this context, 2) to investigate the patterns used by scientists from these different institutions in 1992-96 in publishing the results of their research in the form of conference papers or (inter)national journals. The study collected two sets of data, through a scientometric analysis of the Science Citation Index and a questionnaire. With the exception of Physics, the results demonstrate a decreasing South African world share, in particular for Plant & Animal Sciences publications, and a similar decline in citations starting in 1986/87. Further, the citation impact relative to the world, after a substantial drop in 1985-93 probably representing the international embargo period, reaches in 1994-96 the same level as observed in 1985-89. Also, the study shows that there is a direct relation between academic position, research experience and productivity among South African scientists in the four scientific disciplines. Numbers of patents cannot indicate the state of research, nor can the contents of patent documentation indicate the true technological features achieved. Patent statistics, though so used, are not a good indicator of the economic returns to investments in research. 
Use of these statistics for understanding the degree of competition and the competition-driven research strategy is attractive. A patent document is part of the public knowledge in such a way as to restrict the growth of future public knowledge. This portent cast by a current application on the future content of research and on the number and areas of research is a competition-defining aspect. This effect on lagged future applications, and the acceptance of patent disclosure as intentionally strategic data, are the most significant characteristics of patent statistics. The present paper applied this understanding and generated a number of indices derived from databases on patenting. These are indicators of Competition, Technology Pool, Language Technology Pool, Modified Competition, Market Attractiveness and the Strength of the Patent Market. Values of these indicators for biotechnological research and for several countries have been derived as examples. The paper introduces an economic method (interindustry relations analysis) into studies of autopoietic systems and shows its application to scientometrics, which can also be regarded as the analysis of autopoietic systems. The merit of the application is discussed, and the outline of the proof of a related theorem is suggested in the appendix. In an earlier study,(1) a methodology was described for identifying Frontier Areas in a research field, i.e., areas which experienced in a particular time period a significant increase in research output in comparison to a preceding time period. The application of this methodology was shown by identifying Frontier Areas of research in Physics in 1995. Comparison was done with respect to the outputs in different areas in 1990. Profiles of countries active in the identified Frontier Areas were then constructed. In this paper, an attempt is made to reveal the active research topics/themes within these Frontier Areas in 1990 and 1995. 
The active research topics thus uncovered are classified as Frontier Topics. Countries active in these Frontier Topics are distinguished in each time period. Associations among countries and Frontier Topics are observed using the multivariate technique of correspondence analysis. Dynamics are observed by analysing the changes in the profiles of the countries in the two time periods. Results and implications of this study for decision-making and as a policy tool are highlighted. This study applies a method of author co-citation analysis to examine the intellectual structure of political communication study. Fifty-one influential authors were selected from active members of the Political Communication Divisions of the International Communication Association (ICA), the National Communication Association (NCA), and the American Political Science Association (APSA). The results of the multidimensional scaling analysis and cluster analysis of these 51 selected authors' co-citation patterns show that intellectual fragmentation exists in political communication research: scholars with different academic backgrounds exhibit specialties, using particular research approaches to study certain subjects in the field; scholars do not have much information exchange, and thus they are intellectually separate and confined within the boundaries of each fragment. The findings of this quantitative study complement and cross-validate the assessments made by other, traditional qualitative reviews of the field. We present a characterization of bibliometric output in Colombia resulting from research projects financed by COLCIENCIAS between 1983 and 1994 in the following programs: Health Sciences; Basic Science; Energy and Mining; Agricultural Sciences; Technological, Industrial and Quality Development; Marine Sciences; Social Sciences; Education; Environment and Habitat; Electronics, Telecommunications and Information Systems. 
In the case of periodicals, we establish: patterns of production by author; patterns of publication in national vs. international journals; the effect of international collaboration in projects on publication in international journals; patterns of bibliometric production by field of research using UNESCO classifications; a list of the journals most frequently used by Colombian researchers as vehicles to communicate their results; patterns of bibliometric production from Colombian institutions; the geographical distribution of bibliometric output; and finally, a review of the mean number of authors of articles for some fields of science and technology. We also present production patterns for books and for B.Sc., M.Sc. and Ph.D. theses, using the UNESCO codes of the projects. We comment on the formation of human resources. A dominant feature of these patterns is a low index of publications per project and a strong tendency for publications to concentrate on a few actors (researchers, institutions, origins of publication, journals, human resources). It is also found that there exists a strong concentration of bibliometric output in the program of Basic Sciences, in fields such as phytochemistry and solid state physics (super- and semiconductors). There is sufficient evidence to prove the potential of immobilized enzymes to be commercially successful in many industries, but a survey of products in biotechnology and some reports indicate their limited success. To visualize the factual status, the present study looks into trends and profiles of this field using scientometric methods. The salient results show a steady decline in outputs in the form of patents and publications since 1993, along with a decline in the number of groups from academia and industry. Among the countries involved there is also a decline, though the USA and Japan show some strength in basic and applied research, respectively. 
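The concentration of publications on a few actors reported in the Colombian study above is the kind of skew often summarized with a Gini coefficient over per-actor publication counts. A stdlib-only sketch with hypothetical data (the study itself does not state that it used this statistic):

```python
def gini(values):
    # Gini coefficient: 0 = perfectly even distribution,
    # values approaching 1 = output concentrated on a few actors
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))  # 1-indexed rank * count
    return (2 * cum) / (n * total) - (n + 1) / n

# hypothetical publication counts per researcher: many occasional authors,
# a few prolific ones
papers_per_author = [1] * 40 + [2] * 8 + [10, 15, 30]
print(round(gini(papers_per_author), 2))  # 0.5
```

The same statistic applies equally to counts per institution or per journal, matching the several "actor" dimensions the study enumerates.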
The emphasis on validity as publication content was investigated in dissertations and journal articles. The time of first publication, longitudinal publication profile, ratio of articles to dissertations, and time lag between dissertations and articles emphasizing validity were compared within and among various fields. A three-decade gap separated the first field adopting validity-related content in its dissertations from the latest fields that did so. The longitudinal data suggested three groups of fields (Agricultural Sciences, Applied Sciences and Social Sciences) which showed consistent differences among groups and consistent similarities within groups in their emphasis on validity-related content. Adoption of validity-related content in dissertations always preceded adoption of validity-related content in journal articles. On average, less than 4% of journal articles included validity-related content across fields. These findings support the hypothesis that validity has been introduced and disseminated within fields following the patterns predicted by diffusion of innovations theory. It is argued that this pattern is inconsistent with an efficient and interdisciplinary utilization of available knowledge. Policy recommendations are made for developing strategic communication and education programs for academicians and journal reviewers. In honor of the centennial of the American Astronomical Society, we asked 53 senior astronomers to select what they thought were the most important papers published in the Astronomical Journal or the Astrophysical Journal during this century. This selection of important papers gives us the opportunity to determine whether important papers invariably produce high citation counts. We compared those papers with control papers that appeared immediately before and after the important papers. 
We found that the important papers published before 1950 produced 11 times as many citations on average as the controls, and those published after 1950, 5.1 times as many. Of the important papers, 92% produced more citations than the average for the control papers. Therefore, important papers almost invariably produce many more citations than others, and citation counts are good measures of importance or usefulness. An appraisal of the 53 papers is that three are primarily useful collections of data or descriptions, 46 are fundamental studies giving important results, and four are both useful and fundamental. The lifetimes of all 53 important papers average 2.5 times longer than those of the controls. The ages of the authors of these important papers ranged from 23 to 70, with a mean of 39 +/- 11 years, indicating that astronomers can write important papers at any age. The contribution of Turkish researchers to the positive sciences is increasing. Turkish scientists published more than 5100 articles in 1998 in scientific journals indexed by the Institute for Scientific Information's Science Citation Index, which elevated Turkey to 25th place in the world rankings in terms of total contribution to science. In this paper, we report the preliminary findings on the bibliometric characteristics (authors and affiliations, medical journals and their impact factors, among others) of a total of 8442 articles published between 1988 and 1997 by scientists affiliated with Turkish institutions and indexed in the MEDLINE database. This paper analyses communications between statistical methodology and applied statistics in terms of the similarity and dissimilarity in their authorship and citation patterns, and further the communication distance between them in terms of mutual citation and the time lag therein. Hypotheses are presented on their difference and distance and are verified on data from the Journal of the Royal Statistical Society, the journal of the oldest statistical society in the world. 
The data analysis reveals that they are indeed different and distant from each other to a certain extent, but less distinctly so than initially conjectured in the hypotheses. This article compares empirically the major factors affecting blinded and sighted reviewers in the selection of research proposals to be funded in a "scientifically small" country. Fisher's Z-test shows that applicant characteristics (rank of the undergraduate school where the applicant studied, professional age of the applicant, and academic recognition of the applicant) are the major factors leading to the significantly different evaluation scores between blinded and sighted reviewers. This means that "open" evaluation of research proposals is clearly biased. Policy implications of the findings and future research directions are discussed. The development of publication activity and citation impact in the Scandinavian countries is studied for the 1980-1997 period. Besides the analysis of trends in publication and citation patterns and of national publication profiles, an attempt is made to find statistical evidence of the relation between international co-authorship and both research profile and citation impact in the Nordic countries. A coherent Scandinavian cluster has been found, and the Nordic countries have strong co-authorship links with highly developed countries in Western Europe and North America. It was found that international co-authorship, in general, results in publications with higher citation rates than purely domestic papers. International collaboration does not, however, have the same influence on the publication profiles and citation impact of each analysed country. This paper aims to contribute to a better understanding of patent citation analysis in general and its application to novel fields of science and technology in particular. 
It introduces the subject matter by discussing an empirical problem, the relationship of nano-publications and nano-patents as representations of nano-science and nano-technology. Drawing on a variety of sources, different interpretations of patent citations are presented. Then, the nature of patent citations is further investigated by comparing them to citations in the scientific literature. After characterizing citation linkages as indicators of reciprocal relationships between science and technology, patent citations in nano-science and technology are analyzed in terms of interfield and organizational knowledge-flows. As a basis for policy decisions, governments are increasingly using analyses of systems of innovation. Fundamental to the systems of innovation approach is the recognition that innovation processes are essentially interactive activities. The present paper illustrates the use and limitations of bibliometrics in analysing the knowledge production and knowledge flows in a section of an innovation system, focusing on life science subject fields relevant to innovation processes in biotechnology. Bibliometrics can in this context be used to identify the actors in a research-intensive innovation system and the scientific profiles of actors, as well as networks and collaboration patterns. The article reports the results of the first part of an extensive informetric analysis of the Welfare topic, carried out in 1998-1999. The aim was to analyse the structure of the literature of international Welfare research, to provide a detailed picture of its basic theoretical and empirical concepts and the mutual relations existing between these concepts. The approach is novel in that, through the application of quantitative (i.e., bibliometric) techniques, it tries to reduce subjectivity in domain analysis and in the mapping of the developments and segmentation in special topical areas. 
The analysis used the technique of co-ordinated online searches in a cluster of international bibliographic databases in DIALOG. The 13 identified sub-topics have been analysed in detail, over three time intervals. By measuring trends and developments in the number of publications, term occurrences, similarity between the subject terms and the formation of clusters among the subject segments, the analysis provides a comprehensive review of such a complex research field as the Welfare State. The study, whose primary aim is to improve the methodology of quantitative analysis in the so-called "soft" sciences, should increase the interest among social scientists, scholars of the humanities, and library and information science researchers in using databases as analytical tools and in applying modern text mining techniques for the extraction of knowledge from bibliographic data. The paper examines the applicability of informetric methods to trace the pattern of debate about the three main critical issues of the modern Welfare State in Denmark: economic aspects, legitimacy and functionality. The methodology of issue tracking is used to follow the development of these issues over time through national databases of various types covering information about the research, implementation, press and legislation aspects. The approach taken is novel in that it implements and tests issue tracking in this area of the social sciences, and tries to reduce subjectivity in the analysis of trends influencing social policy and public opinion. The study aims to show how the emerging data and text mining techniques can be applied to integrate downloaded bibliographic data with other types of information in a strategic mix. We present some features that characterise the mobility and interaction of researchers within a given S&T environment. The variable of interest is the number of research proposals submitted for funding. 
The model is applied to the case of Colombia and the following results are exhibited: a) a "flux matrix" that characterises the "interactions", as a function of time, between researchers and COLCIENCIAS (the national S&T funding agency). Some properties of the matrix are established, and a "probability" for a researcher who has previously submitted a proposal to re-enter is calculated as a function of time. It is found that this probability is approximately time-independent, at least for the next 7 years after a researcher's first appearance; b) patterns of interaction between researchers/institutions and COLCIENCIAS, seen through the number of presented proposals. The interaction assumes the well-known form encountered in these kinds of distributions: a small set of actors (researchers/institutions) is responsible for most of the interaction; c) a temporal pattern for mean researcher age is established, and it is found that by the end of the observed period researchers start to interact at ages that are significantly greater than those observed at the beginning. This article is an empirical study of two science and health policy controversies - "to screen or not to screen" with ultrasound in pregnancy and with mammography for breast cancer. In each case, conflicting experimental results have been published. Which of the results have been accepted within the medical science community? The article is also a theoretical and methodological study of three views of science - an institutional view, an interests view, and a semiotic view. How might each approach scientific publications as evidence? Could they be eclectically combined in a more complex view of science discourse? The first-citation distribution, i.e. the cumulative distribution of the time period between publication of an article and the time it receives its first citation, has never been modelled by using well-known informetric distributions. An attempt at this is given in this paper. 
For the diachronous ageing distribution we use a simple decreasing exponential model. For the distribution of the total number of received citations we use a classical Lotka function. The combination of these two tools yields new first-citation distributions. The model is then tested by applying nonlinear regression techniques. The obtained fits are very good and comparable with older experimental results of Rousseau and of Gupta and Rousseau. However, our single model is capable of fitting all first-citation graphs, concave as well as S-shaped; in the older results one needed two different models for this. Our model is the function Phi(t_1) = gamma*(1 - a^(t_1))^(alpha-1). Here gamma is the fraction of the papers that eventually get cited, t_1 is the time of the first citation, a is the ageing rate and alpha is Lotka's exponent. The combination of a and alpha in one formula is, to the best of our knowledge, new. The model hence provides estimates for these two important parameters. The paper addresses the potential of Internet mailing lists to enhance academic research with respect to Gibbons' distinction between Mode I and Mode II knowledge production (Gibbons et al., 1994). We examine threaded email messages in a selection of Self-Organization and Science & Technology Studies oriented Internet mailing lists to illustrate the internal dynamics involved in the electronic production of knowledge. Of particular interest is the EuroCon-Knowflow mailing list, which houses the electronic communication of the Self-Organization of the European Information Society (SOEIS) research group. The research focuses upon the discussion threads of mailing lists. The use of threaded messages as our hermeneutic units of analysis provides the basis for a reflection upon three key theoretical positions: Medium Theory, Actor-Network Theory, and Self-Organization Theory. With respect to the latter, we measure for self-organized criticality by comparing the frequency and size of threaded messages. 
Using this and other methods as operationalized modes of theorizing, we reveal network dynamics particular to the Internet mailing list. Internationally co-authored publications may be regarded as an indicator of scientific cooperation between countries and are of interest in science policy. In this study, the extent of international collaboration in Indian science has been estimated from SCI data for 1990 and 1994. We find an increase in collaboration, both in terms of output and the extent of the network, and a significantly higher impact (IF) associated with internationally co-authored papers in several disciplines. However, there was no significant increase in the IF of collaborative papers over time, whereas Indian papers in general showed a statistically significant, though small, increase in average impact from 1990 to 1994. The bulk of Indian scientific co-operation was with the developed Western nations and Japan, but it was often the smaller countries with a few co-authored papers that showed higher average impact. Co-operation with South Asian countries, initially low, doubled in four years. By a combination of multivariate data analysis techniques, the relative positions of India's partners in scientific collaboration have been mapped with respect to the fields of co-operation. A third cohort of (mostly) young astronomers, who earned their PhDs around a median date of 1994 and who have recently applied for election to membership in the International Astronomical Union from the USA or for tenure-track faculty positions, has been added to earlier samples (median years of PhD 1982 and 1962.5), and the samples examined for demographic trends. The three groups are of similar size (304, 269, and 268 astronomers from earliest to latest). The third, youngest, cohort includes more foreign-born and/or foreign-trained scientists than either of the earlier ones (about 1/2 vs. about 1/4) and more women (about 15% vs. about 10% for the two earlier groups). 
The median length of time from BA or BS to PhD, which had lengthened from 4 to 6 years, has apparently leveled off at 6 years. And, compared to the previous "young" sample, the present one includes many more job seekers and many fewer IAU aspirants. This paper focuses on the measurement of the scientific and technological performance of Korea and Taiwan, in what has been the most successful technological catch-up in the developing-economies context. The performance measures are based on publication data for scientific knowledge production and patent data for technological capabilities. In addition, this analysis also sheds light on the features of the innovation systems of these two countries, focusing on the linkages between the public and private sectors in scientific and technological knowledge creation. By examining the scientific and technological performance and the changing structure of the innovation system, it provides empirical evidence on the positive interaction between scientific and technological activities. Iranian scientific publications in the Science Citation Index for two five-year periods, 1985-1989 and 1990-1994, were compared. Distributions of various attributes of the publication output for the two periods were obtained primarily through the Rank command of the Dialog Online System. Results include: productivity by publication year and by ranked order of the most productive Iranian authors; influence or impact of the most productive Iranian authors, obtained by ranking them as cited authors; collaboration of Iranian scientists with scientists from other countries; and the journals Iranian scientists published in and the journals they cite in their papers. The subject areas of Iran's scientific publications were examined vis-a-vis the world's publication output and that of the Third World Countries (TWC). 
This paper addresses two related issues regarding the validity of bibliometric indicators for the assessment of national performance within a particular scientific field: firstly, the representativeness of a journal-based subject classification; and secondly, the completeness of the database coverage. Norwegian publishing in microbiology was chosen as a case, using the standard ISI product National Science Indicators on Diskette (NSIOD) as a source database. By applying an "author-gated" retrieval procedure, we found that only 41 percent of all publications in NSIOD-indexed journals that experts classified as microbiology were included under the NSIOD category Microbiology. Thus, the set of defining core journals alone is clearly not sufficient to delineate this complex biomedical field. Furthermore, a subclassification of the articles into different subdisciplines of microbiology revealed systematic differences with respect to representation in NSIOD's Microbiology field; fish microbiology and medical microbiology are particularly underrepresented. In a second step, the individual publication lists from a sample of Norwegian microbiologists were collected and compared with the publications by the same authors retrieved bibliometrically. The results showed that a large majority (94%) of the international scientific production in Norwegian microbiology was covered by the NSIOD database. Thus, insufficient subfield delineation, and not lack of coverage, appeared to be the main methodological problem in the bibliometric analysis of microbiology. Two key features of science are its rapid growth and its continuous differentiation. The establishment of new journals can be seen as an expression of both growth and differentiation. In this study of the network among management journals, the focus is on forms of differentiation, i.e., the relationship between stratification and specialization in a network of journals. 
The question asked in this study is whether the different positions of American and European journals correspond with different levels of specialization. A tendency toward such a structuration of the journal network would indicate an interregional integration of management research. Articles published in six of the most influential American and European journals, covering the period from 1981 to 1998, have been downloaded. The findings in this study indicate that even though the European journals formed a periphery in relation to the American journals, in terms of clearly asymmetrical exchange relations, it was the European journals that seemed to be more comprehensive in scope. The tendency during the investigated period indicated differentiation in terms of segmentation rather than specialization. The article covers the period 1989-1998. It investigates the results and meaningfulness of applying the Social Science Citation Index (SSCI, ISI, USA) to publication and citation studies of nine selected Social Science research areas in Scandinavia by analysing the international visibility, the research profiles, and the relative citation impact. The study demonstrates that the areas Economics, Political Science, Sociology & Anthropology, Social Policy, Language & Linguistics and, for Denmark and Finland, Information & Library Science, as well as, for Sweden, Management studies, are well anchored internationally, with a visibility in line with common S&T domains. The journal article world share of the region is increasing rapidly. Other small European countries, like the Netherlands, are even more substantially represented as regards citation analyses. The conclusion is that the SSCI, although biased towards Anglo-American publications, actually makes room for valid bibliometric and scientometric analyses of research published by Scandinavian and other smaller countries with English as a second language in journals regarded as international by ISI. 
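The relative citation impact analysed in the Scandinavian SSCI study above is commonly computed as a country's citations per paper in a field divided by the world's citations per paper in the same field. A minimal sketch with hypothetical figures (the study's exact normalization may differ):

```python
def relative_citation_impact(country_cites, country_papers,
                             world_cites, world_papers):
    # citations per paper for the country, normalized by the world baseline;
    # values above 1 indicate above-world-average impact in the field
    return (country_cites / country_papers) / (world_cites / world_papers)

# hypothetical field-level figures: 4.2 vs. 3.5 citations per paper
rci = relative_citation_impact(420, 100, 3_500_000, 1_000_000)
print(round(rci, 2))  # 1.2
```

Because both numerator and denominator are field-specific, the ratio sidesteps the discipline-dependence of raw citation counts, which is why such normalized indicators recur throughout these studies.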
This paper makes the assumption that Norwegian patenting in the US reflects a quasi-universe of Norwegian technological capabilities. Based on this assumption, the paper combines a "patent-bibliometrics" and a "technometrics" approach to study other relevant bodies of knowledge these capabilities build upon. In order to study interactions at the "science-technology-innovation interface", the paper maps the citation patterns that radiate from the patent population (1990-96) to other areas of technology (patent citations) and to science bases (citations to Non-Patent Literature, or NPL). The study identifies important technology-technology links that involve machinery, process engineering and chemicals, and significant science-technology links that involve pharmaceuticals and instruments. The mapping of author networks at academic departments is the focus of this study. Papers from two departments at two different universities, but within the same field of research, were analyzed in terms of co-authorship and direct and indirect citations among the authors. Considerable overlap was found between the co-authorship and the citation-based networks. The paper also introduces the idea of socio-bibliometric maps that can be used to make social interpretations of bibliometric networks. The nodes of the networks were labeled by sex and seniority, and supervisor-student links were also indicated. When reading the maps and tabulating the links, it could be concluded that the two departmental networks were structured differently by sex and seniority. The emergence of patent bibliometrics as a new branch of scientometrics necessitates a deeper understanding of the relationship between patents and papers. As this connection is established through the linkage between patents and research papers, one must have a clear idea of the similarities and differences between patent and paper citations. 
This paper will investigate to what extent one can not only apply bibliometric methods to patents but also extend the existing interpretative framework for citations in research papers to the field of patent citations. After pointing out some parallels in the debates about the nature of citations in patents and scientific articles, the paper outlines those parts of bibliometric theory covering scientific citations that could be relevant to patent citations too. Then it highlights the specialties and peculiarities of patent citations. One major conclusion is that the general nature of a common framework for both scientific and patent citations would severely limit its usefulness, but research on academic citations might still be a great source of inspiration for the study of patent citations. To analyse the relationship between research group size and scientific productivity within the highly cooperative research environment characteristic of contemporary biomedical science, an investigation of Norwegian microbiology was undertaken. By an author-gated retrieval from ISI's database National Science Indicators on Diskette (NSIOD) of journal articles published by Norwegian scientists involved in microbiological research during the period 1992-1996, a total of 976 microbiological and 938 non-microbiological articles, by 3,486 authors, were obtained. Functional research groups were defined bibliometrically on the basis of co-authorship, yielding a total of 180 research groups varying in size from one author/one article to 180 authors/83 articles (all authors associated with a group during the whole five-year period were included, hence the large group size). 
Most Norwegian microbiological research (73% of the microbiology articles) appears to be performed by specialist groups (with 70% or more of their production in microbiology), the remainder being published by groups with a broader biomedical research profile (who were responsible for 95% of the non-microbiological articles). The productivity (articles per capita) showed only moderate (Poisson-distributed) variability between groups, and was remarkably constant across all subfields, at about 0.1 article per author per year. No correlation between group size and productivity was found. This paper compares external research collaboration in small science systems. The design involves studying research collaboration in an independent country (Iceland) and a region of a large country (Newfoundland, Canada). The objective of the paper is firstly to gain a deeper understanding of external research collaboration in small science systems by using both quantitative and qualitative methods, and secondly to examine if it is justifiable to compare small regions and small independent countries in terms of their scientific activities. The two science systems are compared with respect to their publication patterns in order to explore how comparable they are in their scientific profiles. External collaboration rates for both science systems are then measured and compared, and it is shown that research collaboration plays an important part in the two science systems. The role of research collaboration is examined further with a combination of bibliometric analysis and interview data. It was found that scientists in small science systems do not collaborate only because they lack economic resources; an important reason for their collaboration was the availability of research material that was in demand by scientists in the wider scientific world. 
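The bibliometric definition of functional research groups used in the Norwegian microbiology study, i.e. grouping authors on the basis of co-authorship, amounts to taking connected components of the co-author graph and then computing articles per author per year. A minimal sketch (the author names and article data are invented for illustration):

```python
# Sketch: research groups as connected components of the co-authorship
# graph (union-find), with per-capita productivity as articles per
# author per year. The data below are invented for illustration.

from collections import defaultdict

def coauthor_groups(articles):
    """articles: list of author-name lists. Returns groups as frozensets."""
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for authors in articles:
        for a in authors:
            parent.setdefault(a, a)
        for a, b in zip(authors, authors[1:]):
            union(a, b)

    groups = defaultdict(set)
    for a in parent:
        groups[find(a)].add(a)
    return [frozenset(g) for g in groups.values()]

def productivity(group, articles, years):
    """Articles per author per year for one group."""
    n = sum(1 for authors in articles if set(authors) & group)
    return n / (len(group) * years)

articles = [["Berg", "Dahl"], ["Dahl", "Eide"], ["Foss"]]
groups = sorted(coauthor_groups(articles), key=len, reverse=True)
print([sorted(g) for g in groups])  # [['Berg', 'Dahl', 'Eide'], ['Foss']]
print(round(productivity(groups[0], articles, years=5), 3))  # 0.133
```

Note that, as in the study, every author ever linked to the component over the whole period is counted, which is why such groups can grow large.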
The aim of the study is to examine the empirical use of Bradford's law for decisions concerning information systems in problem-based fields, where journals from different disciplines cover relevant information. Results of a comparison of the cores in different fields can be used as a base for tailoring an information system. In the study, Bradford's law is applied to five databases on the topic "Information retrieval and seeking" in order to compare the size and titles of the core journals. These databases give different views of the same interdisciplinary topic. Problems include relevance judgements, which change the shape of the graphs, and the consistency of concepts in the analysis. The results show that Bradford analyses can be useful tools in developing information systems. This article presents and discusses interviews with 50 grade-6 primary school students about their experience of using the Web to find information for a class project. The children discuss the quantity and quality of textual and image information on the Web versus traditional print sources, and the reasons why they made very little use of any moving images and sound clips on the Web. They also discuss how they searched for information on the Web and the ways in which this differs from looking for information in printed sources. The children overall demonstrate a sophistication both in their appreciation of the Web's strengths and weaknesses as an information source, and in their information retrieval strategies. In their reaction to the Web compared with traditional print sources, they can be categorized as technophiles, traditionalists, or pragmatists. The results from this research study suggest that although the Web can make an important contribution to information retrieval by school students, for the time being, at any rate, a role also remains both for other electronic sources such as CD-ROMs and for print materials that are targeted specifically at young users. 
The Web needs both a more straightforward interface and more information specifically aimed at the young before it can seriously threaten its rivals. We discuss the implementation of a cartographic user interface to bibliographic and other information subspaces in astronomy. This includes a front end to two of the five premier scholarly journals in astronomy. We present a range of comparative assessments, in operational frameworks, of this approach to accessing and retrieving astronomical information. Finally, we discuss the particular role that such cartographic user interfaces can play in Web-based information seeking, and contrast this with widely used, currently available search technologies. Information retrieval (IR) is driven by a process that decides whether a document is about a query. Recent attempts spawned from a logic-based information retrieval theory have formalized properties characterizing "aboutness," but no consensus has yet been reached. The proposed properties are largely determined by the underlying framework within which aboutness is defined. In addition, some properties are only sound within the context of a given IR model, but are not sound from the perspective of the user. For example, a common form of aboutness, namely overlapping aboutness, implies precision-degrading properties such as compositional monotonicity. Therefore, the motivating question for this article is: independent of any given IR model, and examined within an information-based, abstract framework, what are commonsense properties of aboutness (and its dual, nonaboutness)? We propose a set of properties characterizing aboutness and nonaboutness from a commonsense perspective. Special attention is paid to the rules prescribing conservative behavior of aboutness with respect to information composition. The interaction between aboutness and nonaboutness is modeled via normative rules. 
The completeness, soundness, and consistency of the aboutness proof systems are analyzed and discussed. A case study based on monotonicity shows that many current IR systems are either monotonic or nonmonotonic. An interesting class of IR models, namely those that are conservatively monotonic, is identified. Advances in information technology have dramatically changed information seeking, and necessitate an examination of traditional conceptions of library collection. This article addresses the task and reveals four major presumptions associated with collections: tangibility, ownership, a user community, and an integrated retrieval mechanism. Some of these presumptions have served only to perpetuate misconceptions of collection. Others seem to have become more relevant in the current information environment. The emergence of nontraditional media, such as the World Wide Web (WWW), poses two specific challenges: to question the necessity of finite collections, and to contest the boundaries of a collection. A critical analysis of these issues results in a proposal for an expanded concept of collection that considers the perspectives of both the user and the collection developer, invites rigorous user-centered research, and looks at the collection as an information-seeking context. We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information about Web pages easily available from Web proxies and crawlers. Identification of mirrored hosts can improve Web-based information retrieval in several ways: first, by identifying mirrored hosts, search engines can avoid storing and returning duplicate documents. Second, several new information retrieval techniques for the Web make inferences based on the explicit links among hypertext documents; mirroring perturbs their graph model and degrades performance. 
Third, mirroring information can be used to redirect users to alternate mirror sites to compensate for various failures, and can thus improve the performance of Web browsers and proxies. We evaluated four classes of "top-down" algorithms for detecting mirrored host pairs (that is, algorithms that are based on page attributes such as URL, IP address, and hyperlinks between pages, and not on the page content) on a collection of 140 million URLs (on 230,000 hosts) and their associated connectivity information. Our best approach, which combines five algorithms, achieved a precision of 0.57 for a recall of 0.86 considering 100,000 ranked host pairs. Relative own-language preference depends on two parameters: the publication share of the language, and the self-citing rate. Openness of language L with respect to language J depends on three parameters: the publication share of language L, the publication share of language J, and the citation share of language J among all citations given by language L. It is shown that the relative own-language preference and the openness of one language with respect to another can be represented by a partial order. This partial order can be represented by a polygonal line (for the relative own-language preference) or a three-dimensional solid (for openness), somewhat in the same spirit as the Lorenz curve for concentration and evenness. Any function used to measure relative own-language preference or openness of one language with respect to another should at least respect the corresponding partial orders. This is a minimum requirement for such measures. Depending on the use one wants to make of these measures, other requirements become necessary. A logarithmic dependence on the language share(s) seems a natural additional requirement. This would correspond with the logarithmic behavior of psychophysical sensations. We give examples of normalized functions satisfying this additional requirement. 
It is further investigated if openness partial orders can lead to measures for relative own-language preference. The article ends with some examples related to language use in some sociological journals. The Protein Annotators' Assistant (or PAA) (http://www.ebi.ac.uk/paa/) is a software system which assists protein annotators in the task of assigning functions to newly sequenced proteins. Working backward from SwissProt, a database which describes known proteins, and a prior sequence similarity search that returns a list of known proteins similar to a query, PAA suggests keywords and phrases which may describe functions performed by the query. In a preprocessing step, a database is built from the protein names that appear in the SwissProt database, and against each protein are listed keywords and phrases that are extracted from the corresponding text records. Common words, either in general English usage or from the biological domain, are removed as the phrases are assembled. This process is assisted by the use of a simple stemming algorithm, which extends the list of stop-words (i.e., reject words), together with a list of accept-words. At runtime, the search algorithm, invoked by a user via a Web interface, takes a list of protein names and clusters the named proteins around keywords/phrases shared by members of the list. The assumption is that if these proteins have a particular keyword/phrase in common, and they are related to a query protein, then the keyword/phrase may also describe the query. Overall, PAA employs a number of IR techniques in a novel setting and is thus related to text categorization, where multiple categories may be suggested, except that in this case none of the categories are specified in advance. The need for information science (IS) curricula was recognized in the early 1960s; Drexel's IS curriculum was one of the earliest. 
I recall the background, need, and Drexel's response as learned from experience as student and faculty member in the program, and comment specifically on the students, faculty, and coursework. I suggest that disciplines emerge from interdisciplinary sources in response to a perceived need. As they emerge as disciplines, subdisciplines branch off. The fundamental framework remains. This paper analyzes the communication between science and technology studies (STS) journals, to illustrate patterns of differentiation and integration within scientific fields. First the STS field is delineated, using journal-journal citations as the empirical base. A strong and increasing differentiation is found between 'qualitative STS', 'quantitative STS' (scientometrics), and 'S&T policy studies'. Given this process of differentiation, the relations between the three sub-fields of STS are analyzed in terms of mutual flows of information, the joint information base, and research topics. Are differentiation and codification of sub-fields visible? The findings suggest that the relations between qualitative and quantitative STS are one-sided, and that integration between the sub-fields is almost completely lacking. However, the relations between scientometrics and S&T policy studies are much stronger and more substantial, and the same is the case for scientometrics and information science. It is shown how Bradford curves, i.e. cumulative rank-frequency functions, as used in informetrics, can describe the fragment size distribution of percolation models. This interesting fact is explained by arguing that some aspects of percolation can be interpreted as a model for the success-breeds-success or cumulative advantage phenomenon. We claim, moreover, that the percolation model can be used as a model to study (generalised) bibliographies. 
This article shows how ideas and techniques studied and developed in informetrics and scientometrics can successfully be applied in other fields of science, and vice versa. This paper deals with performance measures and performance indicators in the Impala electronic document ordering and delivery system for research libraries in Belgium and compares these with some international standards, e.g., the ProLib/Pi study commissioned by the European Commission. Performance measures: costs (clearinghouse principle); number of ILL requests made to other libraries; number of ILL requests made to other libraries without success; number of ILL requests made to other libraries with success; number of ILL requests received from other libraries; number of ILL requests received from other libraries and not satisfied; number of ILL requests received from other libraries that were satisfied; frequently asked titles. Performance indicators: success rate; borrowing-lending ratio per library; response times, split into several segments of the ILL procedure. The article concludes with some indications for quality measurement in electronic document delivery, where Impala will be able to measure the real supply times as perceived by the end user. Two common ways to measure the "output" of a researcher (or research group) are to count numbers of publications and to count citations (references to these publications in publications of others). These simple methods are flawed because they cannot easily take into account the differences in publication and citation habits in different scientific communities. An alternative approach is to view citations as hypertext links, and to use or adapt hypertext metrics to compare the scientific output of researchers with that of others in areas with similar publication and citation patterns. 
We show how hypertext metrics, introduced by Botafogo, Rivlin and Shneiderman, can be modified in order to identify comparable research fields based on their publication and citation patterns. An author's performance can then be compared to that of others in research fields with a similar pattern. N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is known to be Zipf's law (i.e., an inverse power law). For N-grams, however, we show that the rank-frequency distribution is P_N(r) = C/(psi_N(r))^beta, where psi_N is the inverse function of f_N(x) = x(ln x)^(N-1). Here we assume that the rank-frequency distribution of the symbols follows Zipf's law with exponent beta. In 1988 Le Pair postulated the existence of a citation gap for technological research. Several cases were studied, which confirmed his hypothesis. In the same period the use of bibliometric indicators for policy purposes increased. Here we saw the citation gap causing a disadvantage for application-oriented research groups. This is not merely an injustice; it also leads to suboptimum use of available funds, to the detriment of science as a whole. In addition, it may, in the long term, undermine the reputation of scientometrics as a science in its own right. Using percentage performance shares of individual member states, the European Union can be assessed as if it were a network publication system. The prediction of systemness (based on the Markov property of the distribution) can be tested against the predictions of trend lines for individual nations. The publication performance of the EU can also be compared to that of the USA and Japan. The results suggest that a comparison with (global) world trade is important for understanding developments between the various R&D systems. Predictions for the 1999 indicator values are also provided. 
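The N-gram rank-frequency law P_N(r) = C/(psi_N(r))^beta, with psi_N the inverse of f_N(x) = x(ln x)^(N-1), has no closed-form inverse for N > 1, but psi_N is easy to evaluate numerically because f_N is increasing for x > 1. A sketch (the constants C and beta are arbitrary here, and the bisection bounds are an implementation choice):

```python
# Sketch: evaluating P_N(r) = C / psi_N(r)**beta, where psi_N is the
# inverse of f_N(x) = x * (ln x)**(N-1), obtained by bisection.
# C, beta, and the bracketing interval are illustrative choices.

import math

def f_N(x, N):
    return x * math.log(x) ** (N - 1)

def psi_N(r, N, lo=1.000001, hi=1e12):
    """Invert f_N numerically; f_N is increasing for x > 1."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f_N(mid, N) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def P_N(r, N, C=1.0, beta=1.0):
    return C / psi_N(r, N) ** beta

# Sanity check: for N = 1, f_1(x) = x, so psi_1(r) = r and the law
# reduces to ordinary Zipf, P_1(r) = C / r**beta.
print(round(P_N(1000.0, 1), 6))  # ≈ 0.001
```

For N = 2 one can verify the inversion directly: f_N(psi_N(r, 2), 2) returns r up to the bisection tolerance.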
This paper presents an overview of recent R&D policy developments in Flanders and Belgium. Special attention is paid to evaluation and monitoring, which are seen as central elements of the Flemish Government's more dynamic science and technology policy. The paper describes the process of setting up the necessary instruments to perform bibliometric studies and the application of these instruments for drawing a profile of the natural, life and technical sciences research carried out in Flanders. Although the total publication output, weighted by population or regional wealth, is still lower than that of other small, highly industrialised countries, the international visibility of this research is comparable, if not slightly higher. This article focuses on issues concerning science and technology relationships posed by the emergence of a new drug discovery method, namely, combinatorial chemistry and biology. We assess the scientific content of combinatorial chemistry and biology using citations in patents to scientific journals and compare this research platform with biotechnology. We also identify the institutional affiliation of all the authors of the cited papers, which leads us to an analysis of knowledge spillovers between the main participants in the research network. Finally, we examine the relevance of localisation in the process of knowledge exchange with regard to EU countries and the US. The result of the analysis provides evidence to support the view that the inventive capacity of a country is dependent upon the basic research which is carried out, especially in universities and public research centres located in the inventor's country. In a bibliometric study of nine research departments in the field of biotechnology and molecular biology, indicators of research capacity, output and productivity were calculated, taking into account the researchers' participation in scientific collaboration as expressed in co-publications. 
In a quantitative approach, rankings of departments based on a number of different research performance indicators were compared with one another. The results were discussed with members from all nine departments involved. Two publication strategies were identified, denoted as a quantity-of-publication and a quality-of-publication strategy, and two strategies with respect to scientific collaboration were outlined, one focusing on multi-lateral and a second on bi-lateral collaborations. Our findings suggest that rankings of departments may be influenced by specific publication and management strategies, which in turn may depend upon the phase of development of the departments or their personnel structure. As a consequence, differences in rankings cannot be interpreted merely in terms of quality or significance of research. It is suggested that the problem of assigning papers resulting from multi-lateral collaboration to the contributing research groups has not yet been solved properly, and that more research is needed into the influence of a department's state of development and personnel structure upon the values of bibliometric indicators. A possible implication at the science policy level is that different requirements should hold for departments of different age or personnel structure. On the basis of the measured time-dependent distribution of references in recent scientific publications, we formulate a novel model of the ageing of recent scientific literature. The framework of this model is given by a basic set of mathematical expressions that allows us to understand and describe large-scale growth and ageing processes in science over a long period of time. In addition, a further and striking consequence results in a self-consistent way from our model. 
After the Scientific Revolution in 16th-century Europe, the 'Scientific Evolution' begins, and the driving processes of growth and ageing unavoidably lead, just as in our biological evolution, to a fractal differentiation of science. A fractal structure means a system built up of subsystems characterised by a power-law size distribution. Such a distribution implies that there is no preference of size or scale. Often this phenomenon is regarded as a fingerprint of self-organisation. These findings are in agreement with earlier empirical findings concerning the clustering of scientific literature. Our observations reinforce the idea of science as a complex, largely self-organising 'cognitive eco-system'. They also refute Kuhn's paradigm model of scientific development. In the past 30 years various scientometric analyses have provided input data for research policy objectives of research institutions in the Netherlands. In this article we discuss several pioneering studies performed on behalf of the research councils for physics (FOM) and technical sciences (STW), which have played an important role in the early development of scientometrics in this country. The motives for these studies, the results and the influence on research policy are discussed. Relations to present themes in scientometric investigations are drawn. LUC's research council stimulates research by allocating a part of its funds based on results. The output-financing scheme is presented and its role in the university's research policy is explained. It is shown how this works in practice. An important aspect is that not only articles in JCR-covered journals are included but also other publications. This scheme together with a full-scale scientometric study forms two important aspects (short term versus medium term) of the university's research evaluation exercise. Its success is largely due to a general acceptance by the scientists. 
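The power-law size distribution that characterises a fractal structure of the kind described above can be checked, in the simplest (admittedly crude) way, by a least-squares fit of log(size) against log(rank); a straight line with slope -alpha indicates scale-free behaviour. A sketch on synthetic data, generated to follow the power law exactly:

```python
# Sketch: crude check for a power-law size distribution of subsystems,
# via a least-squares slope on log-log scales. The sizes below are
# synthetic, generated as size = 1000 * rank**(-alpha).

import math

def loglog_slope(sizes):
    """Fit log(size) = a + b*log(rank) by least squares; return slope b."""
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

alpha = 1.5
sizes = [1000.0 * rank ** (-alpha) for rank in range(1, 51)]
print(round(loglog_slope(sizes), 3))  # -1.5, the exponent is recovered
```

On real cluster-size data the fit is noisier, and the absence of a characteristic scale shows up as an approximately straight log-log plot over several orders of magnitude.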
Patent citations to the research literature offer a way of identifying and comparing contributions of scientific and technical knowledge to technological development. This case study applies this approach through a series of analyses of citations to Dutch research papers listed on Dutch-invented and foreign patents granted in the US during the years 1987-1996. First, we examined the general validity and utility of these data as input for quantitative analyses of science-technology interactions. The findings provide new empirical evidence in support of the general view that these citations reflect genuine links between science and technology. The results of the various analyses reveal several important features of industrially relevant Dutch science: (1) the international scientific impact of research papers that are also highly cited by patents; (2) the marked rise in citations to Dutch papers on foreign-invented patents; (3) the large share of author-inventor self-citations in Dutch-invented patents; (4) the growing relevance of the life sciences; (5) an increase in the importance of scientific co-operation. We also find significant differences between industrial sectors as well as major contributions of large science-based multinational enterprises, such as Philips, in domestic science-technology linkages. The paper concludes by discussing general benefits and limitations of this bibliometric approach for macro-level analysis of science bases in advanced industrialised countries like the Netherlands. Bibliometric analyses of research in developing countries are interesting for various reasons. The situation of Cuba is rather exceptional. The Cuban Journal of Agricultural Science (CJAS) is the only Cuban research journal indexed by the Institute for Scientific Information's Web of Science (WoS). We explore the possibilities of a citation analysis for Cuban research publications in general and for those in CJAS in particular. 
For the period 1988-1999, we find that this journal represents 14% of the Cuban research publications cited in the WoS. We remark that the number of self-citations is relatively high and even increases since 1995. The results are classified by discipline, and we use a co-citation matrix to discuss the different observed citation patterns. This paper investigates the impact of large multinational firms on the Dutch technology infrastructure. More specifically, it asks how the structure of the knowledge flows network matters for the diffusion of technological knowledge in the Dutch economy. Patent citation analysis based on European Patent applications is used to quantify this network. The paper finds that there are large differences between firms in terms of the density of their 'ego-network' and the amount of knowledge spillovers to the Dutch economy that they generate. The aim of this work was to provide a rational frame for the design of scientific policies in MR infrastructure implementation. To this end, we have investigated the relationships between MR instruments, their scientific productivity or medical performance, and several socio-economic, R&D or health care indicators in a Spanish and European context. The distribution of MR spectroscopy instruments among Spanish Autonomous Communities suggests that the allocation policy resulted from a compromise between the pull of demand based on regional strength in R&D activities and the push of convergence criteria to bring underdeveloped regions up to a national standard. On the whole, the average value for Spanish MR spectroscopy equipment (1.6 units per TRDP) was within the average value of 1.7 found in 6 European countries. The scientific productivity of these spectrometers in Spain (10.3 publications per unit) compares with the ratio (12.4) found in the United Kingdom and was above the six countries' average (8.3). 
Larger differences in productivity were observed between Spanish Autonomous Communities, suggesting the existence of important lacunae in the distributive side of the allocation policy. Consistent with its socio-sanitary importance, the regional distribution of MR imaging equipment in Spain correlated with the number of sanitary personnel and regional population or wealth. The average number of installed units per million inhabitants in Spain (3.3) is very close to the average found in five European countries, and the diagnostic procedures per installed unit are close to the five countries' average value of 3400/year. However, the scientific productivity of MR imaging equipment in Spain (1.6 publications per installed unit in the five-year period) was very low as compared with other European countries (3.7 on average). Higher diagnostic demand or lower publication pressures could explain these differences equally well. Our results suggest that increases in the scientific productivity and medical performance of MR instrumentation in Spanish Autonomous Communities may not necessarily involve a net increase in the number of MR instruments but rather improvements in the global socio-economic throughputs derived from the organisation of R&D and medical service policies. This paper studies the main bibliometric figures in order to analyse the "state of the art" and the evolution of research in physics in Catalonia (Spain) between 1981 and 1998 via the National Citation Report (NCR) for Catalonia compiled by ISI (Institute for Scientific Information). The main indicators and parameters used are: bibliometric size, rate of citation, citedness of papers, concentration of scientific categories, journals and types of paper, index of immediacy, international collaboration, and the distribution of papers and citations by research centres and universities. In this study we carried out a content analysis of Web pages containing the search term "S&T indicators", 
which were located by an extensive search of the Web. Our results clearly show that the Web is a valuable information source on this topic. Major national and international institutions and organizations publish the full text of their reports on the Web, or allow free downloading of these reports in non-html formats. In addition to direct information, a number of pages listing and linking to major reports, programs and organizations were also located. The paper investigates Indian organic chemistry research activity during 1971-1989 using Chemical Abstracts. It attempts to quantify the national contribution to world efforts, and to identify areas of relative strength and weakness. It also models the growth of Indian organic chemistry output relative to world organic chemistry output as a whole and in sub-fields where the activity indices for the world and India are similar. A novel method of displaying the publication and citation characteristics of the outputs of researchers, using a graphical "footprint", has been developed. Its first application has been to compare the publication and citation characteristics of a small group of top UK and US academic chemical engineers. The footprint demonstrates the Relationship Factors of publications in a number of related disciplines, as defined by ISI's Journal Citation Reports. The technique has been used to compare both individual academics and each national group as a whole. The results clearly show that US academic chemical engineers are far more interdisciplinary in their output than their UK counterparts. The technique has a number of potential applications, including tracking changes in a discipline over time, tracking individual academics' output over time, and comparing different disciplines for their interdisciplinarity. As a corollary of former studies, high performance in Brazilian Management Sciences during the period 1981 to 1995 is put to scrutiny. 
Information on the 66 papers registered to this field in the ISI databases for this time interval was retrieved, edited, and processed so as to elicit patterns. Occurrences of highly cited papers seemed haphazard, but the presence of collaborative work consistently emerged as an important driving factor for good performance. International collaboration showed the strongest impact on the chances of citation, but any form of collaboration seemed to have some effect, even those represented by single authors with double allegiance. The simple addition of authors, nonetheless, had no effect, and thus collaboration involving authors of common institutional affiliation showed the same performance as single-authored papers. Cluster analysis allowed the identification of patterns of performance, and the groups of best performers showed higher levels of international collaboration. The institutional composition of the clusters is presented. Different approaches are introduced for studying the growth of scientific knowledge, as reflected through publications and authors. Selected growth models are applied to the cumulated growth of publications and authors in theoretical population genetics from 1907 to 1980. The criteria by which growth models are to be selected for possible application to the growth of literature are studied. It is concluded that the power model is the only model among those studied that adequately explains the cumulative growth of publication and author counts in theoretical population genetics. This study assesses the ways in which citation searching of scholarly print journals is and is not analogous to backlink searching of scholarly e-journal articles on the WWW, and identifies problems and issues related to conducting and interpreting such searches. Backlink searches are defined here as searches for Web pages that link to a given URL. Backlink searches were conducted on a sample of 39 scholarly electronic journals.
Search results were processed to determine the number of backlinking pages, total backlinks, and external backlinks made to the e-journals and to their articles. The results were compared to findings from a citation study performed on the same e-journals in 1996. A content analysis of a sample of the files backlinked to e-journal articles was also undertaken. The authors identify a number of reliability issues associated with the use of "raw" search engine data to evaluate the impact of electronic journals and articles. No correlation was found between backlink measures and ISI citation measures of e-journal impact, suggesting that the two measures may be assessing something quite different. Major differences were found between the types of entities that cite, and those that backlink, e-journal articles, with scholarly works comprising a very small percentage of backlinking files. These findings call into question the legitimacy of using backlink searches to evaluate the scholarly impact of e-journals and e-journal articles (and by extension, e-journal authors). Effective automation of the information retrieval task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate the benefits of combining the results of different IR schemes to improve performance, in the process called "data fusion." There are many successful data fusion experiments reported in the literature, but there are also cases in which it did not work well. Thus, it would be quite valuable to have a theory that can predict, in advance, whether fusion of two or more retrieval schemes will be worth doing. In a previous study (Ng & Kantor, 1998), we identified two predictive variables for the effectiveness of fusion: (a) a list-based measure of output dissimilarity, and (b) a pair-wise measure of the similarity of performance of the two schemes.
In this article we investigate the predictive power of these two variables in simple symmetrical data fusion. We use the IR systems participating in the TREC 4 routing task to train a model that predicts the effectiveness of data fusion, and use the IR systems participating in the TREC 5 routing task to test that model. The model asks, "when will fusion perform better than an oracle who uses the best scheme from each pair?" We explore statistical techniques for fitting the model to the training data and use the receiver operating characteristic curve of signal detection theory to represent the power of the resulting models. The trained prediction methods predict whether fusion will beat the oracle at levels much higher than could be achieved by chance. The aim of this article is to test whether the results obtained from a specific piece of bibliographic research can be applied to a real search environment and enhance the utility of an information retrieval session for end users of all levels. In this respect, a Web-based Bibliometric Information Retrieval System (BIRS) has been designed and created, with facilities to assist end users in gaining a better understanding of their search domain, formulating and expanding their search queries, and visualizing the bibliographic research results.
There are three specific features in the system design of BIRS: the information visualization feature (cocitation maps), which guides end users in identifying the important research groups and capturing detailed information about the intellectual structure of the search domain; the multilevel browsing feature, which allows end users to move to different levels of interesting topics; and the common user interface feature, which enables end users to search all kinds of databases regardless of searching system, working platform, or database producer and supplier, such as different Web search engines, library OPACs, or on-line databases. A preliminary user evaluation study of BIRS revealed that users generally found it easy to form and expand their queries, and that BIRS helped them acquire useful background information about the search domain. They also singled out the information visualization, multilevel browsing, and common user interface features as novel characteristics exhibited by BIRS. The method described permits visual analysis of information retrieval experiment results in classic control and treatment group protocols. It is an additional analysis technique for the evaluation of IR procedures that may be conducted within the rich terrain of human visual acuity, supported by two well-known statistical measures. An empirical investigation of information retrieval (IR) using the MEDLINE database was carried out to study user behaviour and performance and to investigate the reasons for suboptimal searches. The experimental subjects were drawn from two groups of final-year medical students who differed in their knowledge of the search system (i.e., novice and expert users). The subjects carried out four search tasks and their recall and precision performance was recorded. Data were captured on the search strategies used, duration, and logs of submitted queries.
Differences were found between the groups for the performance measure of recall in only one of the four experimental tasks. Overall performance was poor. Analysis of strategies, timing data, and query logs showed that there were many different causes for search failure or success. Poor searchers either gave up too quickly, employed few search terms, used only simple queries, or used the wrong search terms. Good searchers persisted longer, used a larger, richer set of terms, constructed more complex queries, and were more diligent in evaluating the retrieved results. However, individual performances were not correlated with all of these factors. Poor performers frequently exhibited several factors of good searcher behaviour and failed for just one reason. Overall, end-user searching behaviour is complex, and it seems that just one factor can cause poor performance, whereas good performance can result from suboptimal strategies that compensate for some difficulties. The implications of the results for the design of IR interfaces are discussed. This study aimed at reevaluating the Success search strategy. It was conducted through in-depth interviews with 15 professional information searchers in a three-round Delphi-type study. The panelists were asked to critically analyze the rationale of structured searching that forms the theoretical basis of the Success strategy, and to evaluate its key principles and guidelines. Success is a strategy for structured searching. Its rationale is based on the argument that information searching is a sequence of interrelated actions aimed at accomplishing the search assignment. Every action determines the course of the searching, and thus affects its final result. The searcher's reasoning emerges as a key to the success of the searching process. Consequently, it is essential to adopt structured search strategies that are based on rational search procedures and techniques.
The Success strategy is grounded in the principle of planning the search according to five basic successive phases, namely assignment, resources, words, method, and evaluation, and seven generic guidelines: (1) define the assignment, (2) locate resources, (3) choose search words, (4) select a methodology, (5) execute the search, (6) evaluate the results, and (7) if necessary, repeat the search by refining previous decisions. Proliferating Web-user interface studies prompt a need for theoretical approaches. This study presents a two-factor model that can guide Website design and evaluation. According to the model, there are two types of Website design factors: hygiene and motivator. Hygiene factors are those whose presence makes a Website functional and serviceable, and whose absence causes user dissatisfaction (thus dissatisfiers). Motivator factors, on the other hand, are those that add value to the Website by contributing to user satisfaction (thus satisfiers). An empirical study was conducted in two phases. In Phase I, 44 core features and 12 categories of features were identified by a total of 16 subjects as Web design factors. In Phase II, 79 different subjects distinguished hygiene and motivator factors in the context of a particular Website (CNN.com). The results showed that the two-factor model provides a means for Web-user interface studies. In addition, subjects in Phase II commented that, as time passes or familiarity with certain design factors increases, their identification of what are hygiene and what are motivator factors might change, prompting further investigation and possible expansion of the model. Suggestions for Website design and evaluation, and further research directions, are provided. Evaluation in information retrieval (IR) has focussed largely on noninteractive evaluation of text retrieval systems. This is increasingly at odds with how people use modern IR systems: in highly interactive settings to access linked, multimedia information.
Furthermore, this approach ignores potential improvements through better interface design. In 1996, the Commission of the European Union Information Technologies Programme funded a 3-year working group, Mira, to discuss and advance research in the area of evaluation frameworks for interactive and multimedia IR applications. Led by Keith van Rijsbergen, Steve Draper, and myself from Glasgow University, this working group brought together many of the leading researchers in the evaluation domain from both the IR and human-computer interaction (HCI) communities. This article presents my personal view of the main lines of discussion that took place throughout Mira: importing and adapting evaluation techniques from HCI, evaluating at different levels as appropriate, evaluating against different types of relevance, and the new challenges that drive the need for rethinking the old evaluation approaches. The article concludes that we need to consider more varied forms of evaluation to complement engine evaluation. Information technologies, particularly the personal computer and the World Wide Web, are changing the ways that scientists communicate. The traditional print-based system that relies on the refereed scientific journal as the key delivery mechanism for research findings is undergoing a transformation to a system much more reliant on electronic communication and storage media. This article offers a new paradigm for communication in science, and suggests how digital media might bring new roles and functionalities to participants. The argument is made that behavioral and organizational determinants are factors as important as technological capabilities in shaping the future. Current research on the influence of electronic communication technologies such as electronic mail, the World Wide Web, electronic journals, bibliographic databases, and on-line card catalogs suggests that they broaden academic research communities and change the ways researchers work.
However, it is less well-understood how these changes take place. One explanation is that the mechanism for change is generational: doctoral students transform research disciplines as they apply new electronic communication skills they "grew up with." This article examines this explanation and related claims through evidence from a study of 28 graduate students and their advisors in four disciplines at eight U.S. research universities. Although all the doctoral students used electronic communication technologies in various ways, their work practices reinforced existing patterns of work and resource use in their disciplines. Students used electronic communication to (1) mimic the electronic communication patterns of their advisor, (2) differentiate or specialize their research with respect to their advisor or research specialty, (3) enhance the social connections and material resources their advisor or institution provided to them, and/or (4) ease or improve "hands-on" research techniques (textual analysis, wet lab work, programming, statistical analysis) that their advisor or research group delegated to them. This article summarizes the preliminary findings from a recent study of scientists in four disciplines with regard to computer-mediated communication (CMC) use and effects. Based on surveys from 333 scientists, we find that CMC use is central to both professional and research-related aspects of scientific work, and that this use differs by field. We find that e-mail use focuses on coordination activities, and its biggest effect is helping to integrate scientists into professional networks. We do not find gender differences in use, but there is some evidence that e-mail is having a differential, positive effect for women. Furthermore, CMC use is positively associated with scientific productivity and collaboration. The shift towards the use of electronic media in scholarly communication appears to be an inescapable imperative. 
However, these shifts are uneven, both with respect to field and with respect to the form of communication. Different scientific fields have developed and use distinctly different communicative forums, both in the paper and electronic arenas, and these forums play different communicative roles within the field. One common claim is that we are in the early stages of an electronic revolution, that it is only a matter of time before other fields catch up with the early adopters, and that all fields will converge on a stable set of electronic forums. A social shaping of technology (SST) perspective helps us to identify important social forces, centered around disciplinary constructions of trust and of legitimate communication, that pull against convergence. This analysis concludes that communicative plurality and communicative heterogeneity are durable features of the scholarly landscape, and that we are likely to see field differences in the use of, and meaning ascribed to, communications forums persist, even as overall use of electronic communications technologies increases, both in science and in society as a whole. Five hundred twenty-seven full bibliographic records containing URLs were downloaded from SCISEARCH as part of an exploration of the extent of Web publication of electronic research-related information (E-RRI) in the sciences, and were classified as to resource type, subject area, and degree of intellectual property protection. Four hundred eighty-five records represented nonduplicate descriptions of data compilations (194), software (153), Websites (73), electronic documents (49), and digitized images (17). The greatest concentration of E-RRI was found in molecular biology (QP=123), general natural history and biology (QH=84), and medicine (R=74).
Roughly two-thirds of the 410 accessible Webpages (67%) permitted totally free and unrestricted public access to and use of the information; 11% requested citation of a related journal article as acknowledgment of use; the remainder stated conditions for use or relied on a statement of copyright as an indication of ownership. The World Wide Web appears to have become a significant channel for scientists to distribute databases, software, and other information related to their published research. This article reviews the evolution and growth of electronic journals, and describes the various emerging models of editorial peer review in an electronic environment. There is some debate about whether the traditional model of editorial peer review should be altered. Medicine, physics, and psychology are taking different approaches to editorial peer review in an electronic environment. This article summarizes several studies of peer review in an electronic environment. Studies to date have focused on attitudes toward electronic publications and citation patterns of electronic journals. Future trends and levels of acceptance of the new models are discussed. The community of specialists in scientific information and communication must begin to give more vigorous attention to its expectations for ethical conduct by authors, reviewers, editors, and publishers in the realm of electronic publishing, and to consider how those expectations may differ from the standards applied in a print-based environment. The author suggests five areas that deserve special attention: ensuring the accuracy and truthfulness of content; building trust among editors, reviewers, and authors; sustaining a level of civility in electronic forums for scientific and technical debate; protecting the intellectual property of authors and publishers; and preserving publishers' (and scientific journals') independence from government interference with content.
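The access-condition shares reported earlier for the 410 accessible E-RRI Webpages can be turned back into approximate page counts with a few lines of arithmetic. This is only a back-of-envelope sketch: the abstract reports percentages, not per-category counts, so the counts below are derived estimates, not figures from the study.

```python
# Derive approximate per-category counts from the reported shares of the
# 410 accessible E-RRI Webpages (67% free, 11% citation requested,
# remainder stated conditions or relied on copyright).
total = 410
free_share = 0.67        # totally free and unrestricted access
citation_share = 0.11    # citation of a related article requested
other_share = 1 - free_share - citation_share  # conditions/copyright

print(f"free: ~{round(total * free_share)} pages")
print(f"citation requested: ~{round(total * citation_share)} pages")
print(f"conditions/copyright: ~{round(total * other_share)} pages ({other_share:.0%})")
```

The remainder works out to roughly a fifth of the accessible pages, consistent with the abstract's "two-thirds / 11% / remainder" breakdown.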
An analysis of 766 publications by prolific authors in scientific journals indicates that prolific authors produce about 25% of the total scientific output in the periodical literature of laser science and technology. The average productivity per author is about 2. Prolific authors from most of the countries belonged to either academic or research institutions, except in the USA and Japan. Prolific authors on average made more impact than non-prolific authors; however, the situation varied from country to country. The aim of this article is to describe some methods for comparing maps of science and to show the possibilities that these methods offer for further research in this interesting area. The aim of this article is to demonstrate, on the scientific field of economics, the search for fundamental articles. Co-word analysis and co-citation analysis make it possible to visualize the structure of a scientific field on maps of science. We can then find the fundamental themes on the maps. After finding the articles belonging to these fundamental themes, we can discuss their fundamentality as well. This paper analyses the research activity conducted by Puerto Rican scientists in science and technology in the period 1990 to 1998. The Science Citation Index (SCI) database was used to analyse scientific production by geographic area, type of institution, document typology, language coverage, visibility of publications, subjects addressed, and collaboration between local and international authors and institutions. Scientific production was observed to nearly double over the period studied and was found to be concentrated in the academic sector, primarily in the city of San Juan, specifically in the University of Puerto Rico's Rio Piedras, Medical Sciences, and Mayaguez campuses. Puerto Rican scientific production in the period studied was greater than that of any other Caribbean country and the sixth largest in all of Latin America.
Papers are mainly published in highly visible journals, and scientific articles are the vehicle most commonly used to reach the scientific community. Co-operation indices between authors and institutions are high, and the principal areas in which research is published are Medicine, Chemistry, Life Sciences, and Physics. In this paper we assess the utility of the curriculum vita (CV) as a data source for examining the career paths of scientists and engineers. CVs were obtained in response to an email message sent to researchers working in the areas of biotechnology and microelectronics. In addition, a number of CVs were obtained "passively" from a search of the Internet. We discuss the methodological issues and problems of this data collection strategy and the results from an exploratory analysis using OLS regression and event history analysis. In sum, despite difficulties with coding and variation in CV formats, this collection strategy seems to us to hold much promise. We counted the references in about 200 research papers in each of 16 journals in six physical sciences. The counts show that for average papers, the number of references is a linear function of the paper length; in fact, it is the same function for journals in different sciences. The fact that the various physical sciences all give the same reference frequencies for papers of the same length and impact factor tells us that citation counts in those sciences can be intercompared. There is a dependence upon impact factor, and a general relation is derived. In addition, the number of references increases by about 1.5% per year, probably due to the increase in the literature pertinent to any paper. The average paper lengths differ among the six sciences, and three possible explanations for that difference are given. This research was conducted on a sample of 840 respondents who represent half of the Croatian population of young scientists.
There are three main features which define the publication productivity of young scientists. 1) Despite the worsened position of R&D, they publish more scientific papers than the young generations of scientists at the beginning of the nineties. 2) Differences between a highly productive minority, which produces on average half of all scientific publications, and a low-productive majority are already apparent in young scientists. 3) The productivity of young scientists is formed according to productivity patterns typical of particular scientific fields and disciplines. With regard to the explanation of productivity, the following has been found. a) An expansion of the set of predictors resulted in an improvement in the explanation of the productivity of young scientists compared with previous surveys. b) Among the factors which contribute significantly to the explanation of the quantity of scientific publications, the most powerful predictor is attendance at conferences abroad, followed by scientific qualifications and some gatekeeping variables. c) Besides certain similarities, scientific fields also show a specific structure of determinants of young scientists' productivity. The semiconductor is the key element of the information industry. The present study investigated the growth of the semiconductor literature based on the INSPEC database. Well-established bibliometric techniques, such as the Bradford-Zipf plot and Lotka's law, have been employed to further explore the characteristics of the semiconductor literature. Quantitative results on literature growth, form of publication, research treatment, publishing country and language, and author productivity and affiliation are reported. Moreover, from the Bradford-Zipf plot, 25 core journals in semiconductors were identified and analyzed. In a recent article, a set of indicators drawing upon patent statistics has been proposed, meant to describe and compare firm and national research competence.
However, this article has raised more questions about the validity of such indicators, as well as about their use. We have thus examined these issues so as to clarify the nature of the problems involved in the construction of competence and competitive indicators of firms and nations and their subsequent implementation on databases. Bibliographies of "reference books", namely Encyclopedias, Comprehensive Treatises, and Advanced Textbooks, constitute a valuable source of information about seminal papers in various branches of science. Examples are given mainly for chemistry, but other areas might be treated similarly. This article investigates the potentialities of a proposed information environment (PROPIE) for user interaction and the value-adding of electronic documents (e-documents). The design of PROPIE was based on a thorough review of user needs and requirements in interacting with information through well-documented findings, and on a focus group with 12 participants to identify features that were deemed desirable in future interactions.
The design was also based on a review of developments in various user interface (UI) technologies, visualization, and interaction techniques, and on a consideration of new forms of information structuring and organization that pose important implications for the design of more advanced UIs. To this end, a set of interface mockups was developed to demonstrate the potential of the environment in supporting the design of a new generation of electronic journals (e-journals). An empirical evaluation of various aspects of the environment was conducted to obtain representative users' feedback with regard to interacting with e-journals. Twenty-two participants from a variety of academic backgrounds took part in the evaluation. This article reports the results, which have general implications for the design of e-documents. This article describes the theoretical approach behind the InCommonSense system. This approach makes use of writing conventions on the Web. The theory behind InCommonSense is based on research findings from the fields of linguistics, psychology, HCI, sociology, and information retrieval. The theoretical background for finding and reusing conventions in hypertext is discussed, and the possibilities of extending the system to improve existing hypertext systems and of creating new futuristic hypertext meanings (in particular on the Web) are examined. Information Science Abstracts (ISA) is the oldest abstracting and indexing (A&I) publication covering the field of information science. A&I publications play a valuable "gatekeeping" role in identifying changes in a discipline by tracking its literature. This article briefly reviews the history of ISA as well as the history of attempts to define "information science" since the American Documentation Institute changed its name to ASIS in 1970. A new working definition of the term for ISA is derived from both the historical review and current technological advances.
The definition departs from previous document-centric definitions and concentrates on the Internet-dominated industry of today. Information science is a discipline drawing on important concepts from a number of closely related disciplines that become a cohesive whole focusing on information. The relationships between these interrelated disciplines are portrayed on a "map" of the field, in which the basic subjects are shown as a central "core" with related areas surrounding it. Although the importance of transforming information into knowledge was recognized early, the stronger shift towards knowledge processing has occurred recently, moving the processes of knowledge gathering, organization, representation, and dissemination into the center of research attention. In the first part of this article an attempt is made to provide more insight into the reasons that prompted the current shift from information to knowledge processing, encompassing both social contextualization and recent technological advances. Thereafter, knowledge production, viewed as a five-step process, is briefly discussed. In the last section, a highly interdisciplinary perspective and the primacy of the user are distinguished as necessary prerequisites for (a) solving the basic set of problems addressed by knowledge processing, and (b) improving user-system interaction. We report on our findings regarding authors' use of theory in 1,160 articles that appeared in six information science (IS) journals from 1993-1998. Our findings indicate that theory was discussed in 34.1% of the articles (0.93 theory incidents per article; 2.73 incidents per article when considering only those articles employing theory).
The majority of these theories were from the social sciences (45.4%), followed by IS (29.9%), the sciences (19.3%), and the humanities (5.4%). New IS theories were proposed by 71 authors. When compared with previous studies, our results suggest an increase in the use of theory within IS. However, clear discrepancies were evident in terms of how researchers working in different subfields define theory. Results from citation analysis indicate that IS theory is not heavily cited outside the field, except by IS authors publishing in other literatures. Suggestions for further research are discussed. The essence of scientometrics is precise measurement. Yet the measurements made in scientometric research are steeped in ambiguity. This article explores the nature of ambiguity in measurement, and probes for mechanisms that allow regularities to be discovered in an environment in which ambiguity is pronounced. This study explores the tendency of authors to recite themselves and others in multiple works over time, using the insights gained to build citation theory. The set of all authors whom an author cites is defined as that author's citation identity. The study explains how to retrieve citation identities from the Institute for Scientific Information's files on Dialog and how to deal with idiosyncrasies of these files. As the author's oeuvre grows, the identity takes the form of a core-and-scatter distribution that may be divided into authors cited only once (unicitations) and authors cited at least twice (recitations). The latter group, especially those recited most frequently, are interpretable as symbols of a citer's main substantive concerns. As illustrated by the top recitees of eight information scientists, identities are intelligible, individualized, and wide-ranging. They are ego-centered without being egotistical. They are often affected by social ties between citers and citees, but the universal motivator seems to be the perceived relevance of the citees' works.
Citing styles in identities differ: "scientific-paper style" authors recite heavily, adding to the core; "bibliographic-essay style" authors are heavy on unicitations, adding to the scatter; "literature-review style" authors do both at once. Identities distill aspects of citers' intellectual lives, such as orienting figures, interdisciplinary interests, bidisciplinary careers, and conduct in controversies. They can also be related to past schemes for classifying citations in categories such as positive-negative and perfunctory-organic; indeed, one author's frequent recitation of another, whether positive or negative, may be the readiest indicator of an organic relation between them. The shape of the core-and-scatter distribution of names in identities can be explained by the principle of least effort. Citers economize on effort by frequently reciting only a relatively small core of names in their identities. They also economize by frequent use of perfunctory citations, which require relatively little context, and infrequent use of negative citations, which require contexts that are more laborious to set. In addition to stipulating economic rights, the copyright laws of most nations grant authors a series of "moral rights." The development of digital information and the new possibilities for information processing and transmission have given added significance to moral rights. This article briefly explains the content and characteristics of moral rights, and assesses the most important aspects of legislation in this area. The basic problems of the digital environment with respect to moral rights are discussed, and some suggestions are made for the international harmonization of the rules controlling these rights. This study reports the results of Part II of a research project that investigated the cognitive and physical behaviors of middle school students in using Yahooligans! Seventeen students in the seventh grade searched Yahooligans!
to locate relevant information for an assigned research task. Sixty-nine percent partially succeeded, while 31% failed. Children had difficulty completing the task mainly because they lacked an adequate level of research skills and approached the task by seeking specific answers. Children's cognitive and physical behaviors varied by success levels. Similarities and differences in children's cognitive and physical behaviors were found between the research task and the fact-based task they performed in the previous study. The present study considers the impact of prior experience in using the Web, domain knowledge, topic knowledge, and reading ability on children's success. It reports the overall patterns of children's behaviors, including searching and browsing moves, backtracking and looping moves, and navigational styles, as well as the time taken to complete the research task. Children expressed their information needs and provided recommendations for improving the interface design of Yahooligans! Implications for formal Web training and system design improvements are discussed. This article reports on a model and patterns of use of a library catalog that can be accessed through the Internet. Three categories of users are identified: individuals who perform a search of the catalog, tourists who look only at opening pages of the library catalog's site, and Web spiders that come to the site to obtain pages for indexing the Web. A number of types of use activities are also identified; these can be grouped into the presearch phase (which takes place before any searching begins), the search phase, the display phase (in which users display the results of their search), and phases in which users make errors, ask the system for help or assistance, and take other actions. An empirical investigation of patterns of use of a university Web-based library catalog was conducted for 479 days.
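The three-way split of sessions described above (searchers, tourists, spiders) suggests a simple log-classification heuristic; the rules below are illustrative assumptions, not the study's actual method:

```python
def classify_session(user_agent: str, performed_search: bool) -> str:
    """Heuristic assignment of a log session to one of the three
    user categories described in the study. The rules are invented
    for illustration; real session identification from server logs
    is considerably more involved."""
    agent = user_agent.lower()
    if "bot" in agent or "spider" in agent or "crawler" in agent:
        return "spider"      # fetching pages for Web indexing
    if performed_search:
        return "searcher"    # performed at least one catalog search
    return "tourist"         # viewed only the opening pages
```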
During that period, the characteristics of about 2.5 million sessions were recorded and analyzed, and usage trends were identified. Of the total, 62% of the sessions were for users who performed a search, 27% were from spiders, and 11% were for tourists. During the study period, the average search session lasted about 5 minutes when the study began and had increased to about 10 minutes 16 months later. An average search consisted of about 1.5 presearch actions lasting about 25 seconds, about 5.3 display actions, and 2.5 searches per session. The latter two categories are in the range of 35-37 seconds per session each. There were major differences in usage (number of searches, search time, number of display actions, and display time), depending upon the database accessed. Most on-line news sources are electronic versions of "ink-on-paper" newspapers. These are versions that have been filtered, from the mass of news produced each day, by an editorial board with a given community profile in mind. As readers, we choose the filter rather than choose the stories. New technology, however, provides the potential for personalized versions to be filtered automatically from this mass of news on the basis of user profiles. People read the news for many reasons: to find out "what's going on," to be knowledgeable members of a community, and because the activity itself is pleasurable. Given this, we ask the question, "How much filtering is acceptable to readers?" In this study, an evaluation of user preference for personal editions versus community editions of on-line news was performed. A personalized edition of a local newspaper was created for each subject based on an elliptical model that combined the user profile and community profile as represented by the full edition of the local newspaper. The amount of emphasis given the user profile and the community profile was varied to test the subjects' reactions to different amounts of personalized filtering. 
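The study's "elliptical model" for combining profiles is not specified in the abstract; as a hedged sketch, a linear blend with a single emphasis parameter captures the idea of varying the amount of personalized filtering:

```python
def blend_scores(user_score: float, community_score: float,
                 alpha: float) -> float:
    """Weighted combination of user-profile and community-profile
    relevance for a news story. alpha=0 reproduces the community
    edition, alpha=1 a fully personalized edition. This linear form
    is an illustrative stand-in for the study's elliptical model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * user_score + (1 - alpha) * community_score
```

Sweeping `alpha` across subjects is one way to realize the experimental manipulation the abstract describes: varying the emphasis given to the user profile versus the community profile.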
The task was simply, "read the news," rather than any subject-specific information retrieval task. The results indicate that users prefer the coarse-grained community filters to fine-grained personalized filters. The dichotomous bipolar approach to relevance has produced an abundance of information retrieval (IR) research. However, relevance studies that include consideration of users' partial relevance judgments are moving to a greater relevance clarity and congruity to impact the design of more effective IR systems. The study reported in this paper investigates the various regions across a distribution of users' relevance judgments, including how these regions may be categorized, measured, and evaluated. An instrument was designed using four scales for collecting, measuring, and describing end-user relevance judgments. The instrument was administered to 21 end-users who conducted searches on their own information problems and made relevance judgments on a total of 1059 retrieved items. Findings include: (1) overlapping regions of relevance were found to impact the usefulness of precision ratios as a measure of IR system effectiveness, (2) both positive and negative levels of relevance are important to users as they make relevance judgments, (3) topicality was used more to reject rather than accept items as highly relevant, (4) utility was used more to judge items highly relevant, and (5) the nature of the relevance judgment distribution suggested a new IR evaluation measure, the median effect. Findings suggest that the middle region of a distribution of relevance judgments, also called "partial relevance," represents a key avenue for ongoing study. The findings provide implications for relevance theory and the evaluation of IR systems. This article is concerned with the problem of representing moving images for information retrieval. Of primary concern is evaluating the representativeness of different types of surrogates for various tasks.
The basic factor considered is the ability of a surrogate to enable users to make the same distinctions that they would make given the actual video. To explore this issue, four types of video surrogates were created and compared under two tasks. Multidimensional scaling (MDS) was used to map the dimensional dispersions of users' judgments of similarity between videos and similarity between surrogates. Congruence between these maps was used to evaluate the representativeness of each type of surrogate. Congruence was greater for image-based surrogates than for text-based surrogates overall, while congruence between text-based surrogates and videos was greatest when a specific task was introduced. This paper deals with knowledge sharing over the Internet. After defining the concept, we will discuss work aimed at creating a technical system to implement it and at measuring the quality of results obtained. However, the reader will quickly see that the text is organized to address the theme of this special issue of Scientometrics. Models, methods and measures characterize scientometric research. What problems arise in attempting to develop them for the Internet? In order to answer this question, it is important to distinguish between two schools of practice in the scientometric research field: the first derives from applied statistics and is called bibliometrics; the second derives from cognitive sociology and is called infometrics (Turner, 1994). Electronic publishing developments and new information technology in general will affect the main functions of scientific communication. Most changes, however, will be primarily technological but not conceptual. Publication via journals of high reputation is in most fields of science crucial to receive professional recognition. That will remain so in the "electronic era". A much more revolutionary change in science will be the increasing availability and sharing of research data.
Since the mid-1990s, a new research field, webometrics, has emerged, investigating the nature and properties of the Web by drawing on modern informetric methodologies. The article attempts to point to selected areas of webometric research that demonstrate interesting progress and space for development, as well as to some currently less promising areas. Recent investigations of search engine coverage and performance are reviewed as a frame for selected quality and content analyses. Problems with measuring Web Impact Factors (Web-IF) are discussed. Concluding the article, new directions of webometrics are outlined for performing knowledge discovery and issue tracking on the Web, partly based on bibliometric methodologies used in bibliographic and citation databases. In this framework, graph theoretic approaches, including path analysis, transversal links, "weak ties" and "small-world" phenomena, are integrated. Despite the promising introduction of bibliometric maps of science in a science policy context in the nineteen seventies, they have not been very successful yet. It seems, however, that only now are they becoming acknowledged as a useful tool. This is mainly due to the developments and integration of hypertext and graphical interfaces. Because of this, the strength of such navigation tools becomes obvious. Communication through the Internet enables the field expert (as a kind of peer review) as well as the user (from a science policy context) to contribute to the quality of the map and the interface. Moreover, the interface can provide suggestions to answer policy-related questions, which is the initial purpose of such maps.
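The Web Impact Factor discussed above is usually computed, following Ingwersen's proposal, as the number of pages linking to a site divided by the number of pages in the site (variants differ in whether self-links are counted); a minimal sketch:

```python
def web_impact_factor(inlinking_pages: int, site_pages: int) -> float:
    """Simple Web Impact Factor: pages linking to a site divided by
    the number of pages in the site. Whether self-links count among
    the inlinking pages varies between Web-IF definitions; both
    counts here would come from a search-engine survey, whose
    coverage problems are one measurement issue the article raises."""
    if site_pages <= 0:
        raise ValueError("site must contain at least one page")
    return inlinking_pages / site_pages
```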
To learn how e-prints are cited, used, and accepted in the literature of physics and astronomy, the philosophies, policies, and practices of top-tier physics and astronomy journals regarding e-prints from the Los Alamos e-print archive, arXiv.org, were examined. Citation analysis illustrated that e-prints were cited with increasing frequency by a variety of journals in a wide range of physics and astronomy fields from 1998 to 1999. The peak e-print citation rate, observed at 3 years, was comparable to that of print journals, suggesting a similarity in the citation patterns of e-prints and printed articles. The number of citations made to 37 premier physics and astronomy journals and their impact factors have remained constant since arXiv.org's inception in 1991, indicating that e-prints have yet to make an impact on the use of the printed literature. The degree of acceptance stated by the journals' editors and the policies given in the journals' instructions to authors concerning the citing of e-prints and subsequent publication of papers that have appeared as e-prints differed from journal to journal, ranging from emphatically unacceptable to "why not?" Even though the use of the traditional literature has not changed since arXiv.org began and the policies concerning e-print citation and publication were inconsistent, the number of citations (35,928) and citation rates (34.1%) to 12 arXiv.org archives were found to be large and increasing. It is, therefore, evident that arXiv.org e-prints have evolved into an important facet of the scholarly communication of physics and astronomy. In the author's previous article (JASIS, 50, 1999, 1284-1294) it was shown that the quantity of nonindexed indirect-collective references in the representative elite general physics journal, The Physical Review, now alone exceeds many times over the quantity of references taken into account by the ISI as "citations" and listed in the Science Citation Index.
The present article reports the findings of a new ICR investigation carried out in a representative sample of the elite journal literature of physics: the January 1997 issues of 44 source journals covering the domain of physics, i.e., 2,662 scientific communications in 38 normal and 6 letter journals. The methods of the investigation were most rigorous, and consequently, only the indisputable minimum of the literature phenomenon examined was revealed. It is demonstrated that the ICR phenomenon is present in all the source journals processed, which are of a bibliometrically very heterogeneous nature, in both the normal and the letter journals. The frequency of the generally occurring ICR phenomenon is very high: it is found in 17.2% of the sample. There is very little scattering in the rate of frequency: it is 17.0% in the group of normal journals and 17.9% in the letter journals. The bibliometrically very heterogeneous representative sample is thus very homogeneous regarding the presence and frequency of the ICR phenomenon. On the basis of these facts it can be stated that the quantity of nonindexed indirect-collective references in the elite physics journal literature alone now exceeds many times over the quantity of references listed in the Science Citation Index. The meaning of this fact and its logical consequences must be taken into consideration in the evaluation of results of sciento- and other "-metrics" studies based only on the reference stock of the Citation Indexes. Learning users' interest categories is challenging in a dynamic environment like the Web because they change over time. This article describes a novel scheme to represent a user's interest categories, and an adaptive algorithm to learn the dynamics of the user's interests through positive and negative relevance feedback.
We propose a three-descriptor model to represent a user's interests. The proposed model maintains a long-term interest descriptor to capture the user's general interests and a short-term interest descriptor to keep track of the user's more recent, faster-changing interests. An algorithm based on the three-descriptor representation is developed to achieve high accuracy in recognizing long-term interests, and to adapt quickly to changing interests in the short term. The model is also extended to multiple three-descriptor representations to capture a broader range of interests. Empirical studies confirm the effectiveness of this scheme in accurately modeling a user's interests and adapting appropriately to various levels of change in the user's interests. In studying actual Web searching by the public at large, we analyzed over one million Web queries by users of the Excite search engine. We found that most people use few search terms, few modified queries, view few Web pages, and rarely use advanced search features. A small number of search terms are used with high frequency, and a great many terms are unique; the language of Web queries is distinctive. Queries about recreation and entertainment rank highest. Findings are compared to data from two other large studies of Web queries. This study provides an insight into the public's practices and choices in Web searching. Research on Web searching is at an incipient stage. This provides a unique opportunity to review the current state of research in the field, identify common trends, develop a methodological framework, and define terminology for future Web searching studies. In this article, the results from published studies of Web searching are reviewed to present the current state of research. The analysis of the limited Web searching studies available indicates that research methods and terminology are already diverging.
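Query-log statistics of the kind reported for the Excite study above (terms per query, term frequencies, unique terms) can be sketched as follows; the queries are invented for illustration:

```python
from collections import Counter

# Illustrative query log (not Excite data).
queries = ["sports scores", "free music", "sports", "java tutorial"]

# Flatten queries into individual search terms.
terms = [t for q in queries for t in q.split()]
freq = Counter(terms)

# Average number of terms per query.
terms_per_query = len(terms) / len(queries)

# Terms appearing exactly once, the long tail the study describes.
unique_terms = [t for t, n in freq.items() if n == 1]
```

Even in this toy log, a single term recurs while most appear only once, mirroring the finding that a small number of terms are used with high frequency while a great many are unique.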
A framework is proposed for future studies that will facilitate comparison of results. The advantages of such a framework are presented, and the implications for the design of Web information retrieval systems are discussed. Additionally, the searching characteristics of Web users are compared and contrasted with those of users of traditional information retrieval and online public access systems to discover whether there is a need for more studies that focus predominantly or exclusively on Web searching. The comparison indicates that Web searching differs from searching in other environments. An important problem in the indexing of natural language text is how to identify those words and phrases that reflect the content of the text. In general, automatic indexing has dealt with this problem by removing instances of a few hundred common words known as stop words, and treating the remaining words as though they were content bearing. This approach is acceptable for some applications, such as statistical estimates of the similarity of queries and documents for the purpose of document retrieval. However, when the indexing terms are to be examined by a human as a means of accessing the literature, it greatly improves efficiency if most of the noncontent-bearing words and phrases can be eliminated from the indexing. Here we present three statistical techniques for identifying content-bearing phrases within a natural language database. We demonstrate the effectiveness of the methods on test data, and show how all three methods can be combined to produce a single improved method. This study investigates end-users' image queries by comparing the features of the queries to those identified in previous studies by Enser and McGregor (1992), Jorgensen (1995), and Fidel (1997). Twenty-nine college students majoring in art history were recruited.
They were required to complete a term paper including at least 20 images. The participants' image queries were collected by pre- and postsearch questionnaires, and three human reviewers mapped these queries onto the previously identified features. Enser and McGregor's categories of Unique and Nonunique, and Jorgensen's classes of Location, Literal Object, Art Historical Information, People, and People-Related Attributes received high degrees of matching by the three reviewers. This finding can be applied to add more detail to Enser and McGregor's four categories (Unique, Nonunique, Unique with refiners, and Nonunique with refiners) and to regroup Jorgensen's 12 classes of image attributes. Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationships and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas the cosine and Jaccard coefficients, as well as the chi-square statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the chi-square statistic is the least affected by the frequency of terms.
Third, although the cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms. We explore the intellectual subject structure and research themes in software engineering through the identification and analysis of a core journal literature. We examine this literature via two expert perspectives: that of the author, who identified significant work by citing it (journal cocitation analysis), and that of the professional indexer, who tags published work with subject terms to facilitate retrieval from a bibliographic database (subject profile analysis). The data sources are SCISEARCH (the on-line version of Science Citation Index) and INSPEC (a database covering software engineering, computer science, and information systems). We use data visualization tools (cluster analysis, multidimensional scaling, and PFNets) to show the "intellectual maps" of software engineering. Cocitation and subject profile analyses demonstrate that software engineering is a distinct interdisciplinary field, valuing practical and applied aspects, and spanning a subject continuum from "programming-in-the-small" to "programming-in-the-large." This continuum mirrors the software development life cycle by taking the operating system or major application from initial programming through project management, implementation, and maintenance. Object orientation is an integral but distinct subject area in software engineering. Key differences concern the importance of management and programming: (1) cocitation analysis emphasizes project management and systems development; (2) programming techniques/languages are more influential in subject profiles; (3) cocitation profiles place object-oriented journals separately and centrally, while the subject profile analysis locates these journals with the programming/languages group.
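The association measures compared above can all be derived from a 2x2 term-co-occurrence contingency table; a sketch for five of them (the likelihood ratio is omitted for brevity), with the table cells as assumed document counts:

```python
import math

def association_measures(a, b, c, d):
    """Association between terms x and y from a 2x2 contingency
    table: a = documents containing both terms, b = x only,
    c = y only, d = neither."""
    n = a + b + c + d
    cosine = a / math.sqrt((a + b) * (a + c))
    jaccard = a / (a + b + c)
    # Pointwise mutual information, base 2.
    mi = math.log2(n * a / ((a + b) * (a + c)))
    # Yule's coefficient of colligation Y.
    yule_y = ((math.sqrt(a * d) - math.sqrt(b * c)) /
              (math.sqrt(a * d) + math.sqrt(b * c)))
    # Chi-square statistic for a 2x2 table.
    chi2 = (n * (a * d - b * c) ** 2 /
            ((a + b) * (c + d) * (a + c) * (b + d)))
    return {"cosine": cosine, "jaccard": jaccard, "mi": mi,
            "yule_y": yule_y, "chi2": chi2}
```

Varying `a` relative to `n` in such a sketch makes the study's findings easy to see: cosine and Jaccard reward frequent co-occurrence, while mutual information and Yule's Y can assign high scores to rare term pairs.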
A citation analysis of undergraduate term papers in microeconomics revealed a significant decrease in the frequency of scholarly resources cited between 1996 and 1999. Book citations decreased from 30% to 19%, newspaper citations increased from 7% to 19%, and Web citations increased from 9% to 21%. Web citations checked in 2000 revealed that only 18% of the URLs cited in 1996 led to the correct Internet document. For 1999 bibliographies, only 55% of URLs led to the correct document. The authors recommend (1) setting stricter guidelines for acceptable citations in course assignments; (2) creating and maintaining scholarly portals for authoritative Web sites with a commitment to long-term access; and (3) continuing to instruct students in how to critically evaluate resources. Domain visualization is one of the new research fronts resulting from the proliferation of information visualization, aiming to reveal the essence of a knowledge domain. Information visualization plays an integral role in modeling and representing the intellectual structures associated with scientific disciplines. In this article, the domain of computer graphics is visualized based on author cocitation patterns derived from an 18-year span of the prestigious IEEE Computer Graphics and Applications (1982-1999). This domain visualization utilizes a series of visualization and animation techniques, including author cocitation maps, citation time lines, animation of a high-dimensional specialty space, and institutional profiles. This approach not only augments traditional domain analysis and the understanding of scientific disciplines, but also produces a persistent and shared knowledge space in which researchers can keep track of the development of knowledge more effectively. The results of the domain visualization are discussed and triangulated in the broader context of the computer graphics field.
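Cocitation counting of the kind underlying the cocitation maps above can be sketched as pair counting over reference lists; the journal names are illustrative, not data from either study:

```python
from collections import Counter
from itertools import combinations

# Reference lists of three citing papers (illustrative names).
reference_lists = [
    ["CACM", "IEEE Software", "TSE"],
    ["IEEE Software", "TSE"],
    ["CACM", "TSE"],
]

# Two items are cocited once for each paper citing both of them.
cocitation = Counter()
for refs in reference_lists:
    for pair in combinations(sorted(set(refs)), 2):
        cocitation[pair] += 1
```

The resulting counts form the cocitation matrix that cluster analysis, MDS, or PFNet techniques then project into an "intellectual map."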
HTTP server log files provide Web site operators with substantial detail regarding the visitors to their sites. Interest in interpreting this data has spawned an active market for software packages that summarize and analyze it, providing histograms, pie graphs, and other charts summarizing usage patterns. Although useful, these summaries obscure useful information and restrict users to passive interpretation of static displays. Interactive visualizations can be used to provide users with greater abilities to interpret and explore Web log data. By combining two-dimensional displays of thousands of individual access requests, color and size coding for additional attributes, and facilities for zooming and filtering, these visualizations provide capabilities for examining data that exceed those of traditional Web log analysis tools. We introduce a series of interactive visualizations that can be used to explore server data across various dimensions. Possible uses of these visualizations are discussed, and difficulties of data collection, presentation, and interpretation are explored. Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments.
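Extracting fixed-length passages from a document can be sketched as a sliding word window; the window length and 50% overlap below are illustrative choices, not the article's parameters:

```python
def overlapping_passages(words, length=100, stride=50):
    """Split a tokenized document into overlapping fixed-length
    word windows. With stride < length, consecutive passages
    overlap, so relevant material near a window boundary still
    falls wholly inside some passage."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    passages = []
    for start in range(0, max(len(words) - stride, 1), stride):
        passages.append(words[start:start + length])
    return passages
```

Each window can then be scored like a miniature document, so a long document is ranked by its best passage rather than its diluted whole.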
We introduce a new type of passage: overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents. A problem, raised by Wallace (JASIS, 37, 136-145, 1986), on the relation between a journal's median citation age and its number of articles is studied. Leaving open the problem as such, we give a statistical explanation of this relationship when replacing "median" by "mean" in Wallace's problem. The cloud of points found by Wallace is explained in the sense that the points are scattered over an area in the first quadrant limited by a curve of the form y = E/x^2 + 1, where E is a constant. This curve is obtained by using the Central Limit Theorem in statistics and, hence, has no intrinsic informetric foundation. The article closes with some reflections on explanations of regularities in informetrics, based on statistical, probabilistic or informetric results, or on a combination thereof. This article describes our efforts in supporting information retrieval from OCR-degraded text. In particular, we report our approach to an automatic cataloging and searching contest for books in multiple languages. In this contest, 500 books in English, German, French, and Italian published during the 1770s to 1970s were scanned into images and OCRed into digital text. The goal is to use only automatic methods to extract information for sophisticated searching. We adopted the vector space retrieval model, an n-gram indexing method, and a special weighting scheme to tackle this problem.
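Character n-gram indexing of the kind adopted above is largely language independent and degrades gracefully under OCR errors, since a single corrupted character affects only n of the index terms; a minimal sketch (n=3 is an assumption, not the contest entry's actual parameter):

```python
def char_ngrams(text, n=3):
    """Character n-grams as language-independent index terms.
    A one-character OCR error corrupts at most n grams, so most
    of a word's grams still match at query time."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

Documents and queries are then represented as n-gram frequency vectors and compared in the usual vector space fashion, with no word segmentation or language-specific stemming required.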
Although the performance of this approach is slightly inferior to that of the best approach, which is mainly based on regular expression matching, one advantage of our approach is that it is less language dependent and less layout sensitive, and thus readily applicable to other languages and document collections. Problems of OCR text retrieval for some Asian languages are also discussed in this article, and solutions are suggested. In this article, we evaluate the retrieval performance of an algorithm that automatically categorizes medical documents. The categorization, which consists in assigning an International Code of Disease (ICD) to the medical document under examination, is based on well-known information retrieval techniques. The algorithm, which we proposed, operates in a fully automatic mode and requires no supervision or training data. Using a database of 20,559 documents, we verify that the algorithm attains levels of average precision in the 70-80% range for category coding and in the 60-70% range for subcategory coding. We also carefully analyze the cases of those documents whose categorization is not in accordance with the one provided by the human specialists. The vast majority of them represent cases that can only be fully categorized with the assistance of a human subject (because, for instance, they require specific knowledge of a given pathology). For a slim fraction of all documents (0.77% for category coding and 1.4% for subcategory coding), the algorithm makes assignments that are clearly incorrect. However, this fraction corresponds to only one-fourth of the mistakes made by the human specialists. Structured thesauri encode equivalence, hierarchical, and associative relationships and have been developed as indexing/retrieval tools. Despite the fact that these tools provide a rich semantic network of vocabulary terms, they are seldom employed for automatic query expansion (QE) activities.
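Automatic query expansion from a structured thesaurus can be sketched as follows; the mini-thesaurus is invented for illustration, with the standard relation tags SYN, NT, RT, and BT standing for the equivalence, narrower (hierarchical), related (associative), and broader relationships just described:

```python
# Hypothetical mini-thesaurus; real thesauri encode these same
# relation types over thousands of vocabulary terms.
thesaurus = {
    "car": {"SYN": ["automobile"], "NT": ["sedan"],
            "RT": ["traffic"], "BT": ["vehicle"]},
}

def expand_query(terms, relations=("SYN", "NT")):
    """Append thesaurus terms linked to each query term by the
    chosen relation types. Which relations help recall without
    hurting precision is exactly the empirical question at issue."""
    expanded = list(terms)
    for t in terms:
        for rel in relations:
            expanded.extend(thesaurus.get(t, {}).get(rel, []))
    return expanded
```

Restricting `relations` to different subsets makes it easy to compare, say, SYN/NT expansion against RT/BT expansion on the same queries.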
This article reports on an experiment that examined whether thesaurus terms related to a query in a specified semantic way (as synonyms and partial synonyms (SYNs), narrower terms (NTs), related terms (RTs), and broader terms (BTs)) could be identified as having a more positive impact on retrieval effectiveness when added to a query through automatic QE. The research found that automatic QE via SYNs and NTs increased relative recall with a decline in precision that was not statistically significant, and that automatic QE via RTs and BTs increased relative recall with a decline in precision that was statistically significant. Recall-based and precision-based ranking orders for automatic QE via semantically encoded thesaurus terminology were identified. Mapping results found between end-user query terms and the ProQuest(R) Controlled Vocabulary (1997) (the thesaurus used in this study) are reported, and future research foci related to the investigation are discussed. We investigate the modeling of changes in user interest in information filtering systems. A new technique for tracking user interest shifts based on a Bayesian approach is developed. The interest tracker is integrated into the profile-learning module of a filtering system. We present an analytical study to establish the rate of convergence for profile learning with and without the user interest tracking component. We examine the relationship among the degree of shift, the cost of detection error, and the time needed for detection. To study the effect of different patterns of interest shift on system performance, we also conducted several filtering experiments. Generally, the findings show that the Bayesian approach is a feasible and effective technique for modeling user interest shift. Compression of databases not only reduces space requirements but can also reduce overall retrieval times.
In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications, such as databases of integers or scientific databases, specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach, which we call RAY, in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme. An attempt is made to find statistical evidence of the relation between international co-authorship and citation impact. It was found that international co-authorship, on average, results in publications with higher citation rates than purely domestic papers. No correlation has been found, however, between the strength of co-authorship links and the relative citation eminence of the resulting publications. International co-authorship links in chemistry, as represented by the well-known Salton's measure, displayed a characteristic pattern reflecting geopolitical, historical, linguistic, etc. relations among countries. A new indicator, representing also the asymmetry of co-authorship links, was used to reveal the main "attractive" and "repulsive" centres of co-operation. We present a model to assess the systemness of an innovation system. Patent and citation data with an institutional address in Catalonia (1986-1996) were analyzed in terms of relational linkages, and the development of these distributions over time was evaluated using methods from systems dynamics. Relational linkages are extremely scarce.
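Salton's measure mentioned above is a cosine-type normalization of joint output: for countries i and j with c_i and c_j papers and c_ij co-authored papers, the link strength is c_ij / sqrt(c_i * c_j). A minimal sketch (the counts below are illustrative):

```python
import math

def salton_measure(c_ij, c_i, c_j):
    """Salton's (cosine) measure of co-authorship link strength
    between countries i and j: joint papers normalized by the
    geometric mean of the two countries' paper counts. Note the
    measure is symmetric, which is why the study needed a separate
    indicator to capture the asymmetry of links."""
    if c_i <= 0 or c_j <= 0:
        raise ValueError("both countries need a positive paper count")
    return c_ij / math.sqrt(c_i * c_j)
```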
A transition at the system's level could be indicated around 1990 when using institutional addresses, but not when using cognitive categories. The institutional restructuring has led to changes in the pattern of linkages (co-authorship, etc.), but the reproduction of the system's knowledge base has remained differentiated. We conclude that although a system in several other respects, Catalonia cannot (yet) be considered as a (knowledge-based) innovation system. The existence of a mechanism for the integration could not be indicated at the regional level. In this study, we examine the scientific output of Brazilian psychiatry, based on the database of the Institute for Scientific Information (ISI), publications in the 10 most important psychiatric journals, and publications in major Brazilian journals. The number of Brazilian publications (i.e., those carrying at least one Brazilian address) in psychiatry in the ISI database increased by 168% during the 15-year period under study (1981-1995). Despite this growth, the relative contribution of publications in psychiatry to the country's publications in medical sciences did not change over the 15-year period. This fraction, around 2%, remained at less than one-third of the average contribution of psychiatry journals to publications in medicine worldwide. The impact inferred from number of citations (1981-1992) shows that Brazilian articles in psychiatry were cited less than the world average in this field. In the 10 psychiatry journals with the highest impact, Brazilian authors published only 48 articles in the 1981-1995 period, representing only 0.2% of the articles in those journals. Like their American and British counterparts, Brazilian psychiatrists also published primarily in domestic journals: 87.1% of the publications by Brazilians appeared in the two major Brazilian psychiatric journals, compared with only 12.9% in foreign journals. 
Among publications in psychiatry in the ISI database, the number of articles co-authored by Brazilians with scientists from other countries increased 12.3-fold from 1981-1985 to 1991-1995, representing at the end 50% of all publications by Brazilian psychiatrists in international journals. Despite all cuts in funding for Brazilian science during the last decades, all of the articles in our sample originated in public universities, and only 10 universities were responsible for around 70% of the publications by Brazilian psychiatrists in our survey period. We conclude that Brazilian psychiatric research is a subject worthy of particular concern, especially if we take into account the country's modest scientific performance and the socio-economic consequences of mental disorders in the Brazilian population. Implied by the norm of universalism in modern science, known from Merton's CUDOS norm set, is the demand that scientific careers should be open to talents, independent of personal attributes such as race, religion, class, and gender. In spite of a large number of studies related to CUDOS norms, very few deal with the class origin of researchers. Based on a survey among a sample of 788 Danish researchers, this article investigates class bias, compared to gender bias, in researcher recruitment and careers, and researcher assessments of the impartiality and objectivity of evaluations and the reward system. The data demonstrate very strong class bias, and also confirm the well-known gender bias in recruitment, class bias being the strongest. This is shown to be mainly because of bias in the educational system, however. Concerning later career attainment, bias is also found, but much weaker, and most pronounced concerning social origin. Regarding researcher assessments of impartiality, there are no indications of strong mistrust among researchers in general; nor are there significant differences in degree of trust in the reward system, conditioned by class origin or gender. 
In conclusion, the analysis does not lend strong support to an assumption of deviance from norms of universalism. The main purposes of this article are to uncover interesting features in real-world citation networks, and to highlight important substructures. In particular, it applies lattice theory to citation analysis. On the applied side, it shows that lattice substructures exist in real-world citation networks. It is further shown that, through its relations with co-citations and bibliographic coupling, the diamond (a four-element lattice) is a basic structural element in citation analysis. Finally, citation compactness is calculated for the four- and five-element lattices. The small size of institutes and publication clusters is a problem when determining citation indices. To improve the citation indexing of small sets of publications (less than 50 or 100 publications), a method is proposed. In addition, a method for error calculation is given for large sets of publications. Here, the classical methods of citation indexing remain valid. The development of the field of bibliometric and scientometric research is analysed by quantitative methods to answer the following questions: (1) Is bibliometrics evolving from a soft science field towards rather hard (social) sciences (Schubert-Maczelka hypothesis)? (2) Can bibliometrics be characterised as a social science field with stable characteristics (Wouters-Leydesdorff hypothesis)? (3) Is bibliometrics a heterogeneous field, the sub-disciplines of which have their own characteristics? Are these sub-disciplines more and more consolidating, and are predominant sub-disciplines impressing their own characteristics upon the whole field (Glanzel-Schoepflin hypothesis)? The Price Index per paper, the percentage of references to serials, the mean reference age, and the mean reference rate are calculated based on all articles and their respective references in Scientometrics in 1980, 1989, and 1997. 
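The relation between co-citation and bibliographic coupling that underlies the diamond substructure can be illustrated on a toy citation graph. The graph below is invented for illustration and is not data from the article:

```python
def bibliographic_coupling(refs, a, b):
    """Number of references shared by papers a and b."""
    return len(refs[a] & refs[b])

def co_citation(refs, a, b):
    """Number of papers that cite both a and b."""
    return sum(1 for r in refs.values() if a in r and b in r)

# Toy citation graph: refs[p] is the set of papers that p cites.
refs = {
    "d": {"a", "b"},   # d cites both a and b ...
    "a": {"c"},        # ... and a, b both cite c,
    "b": {"c"},        # so d, a, b, c form a four-element "diamond".
    "c": set(),
}
```

Here `a` and `b` are bibliographically coupled through `c` and co-cited by `d`; the four papers form exactly the four-element diamond lattice the abstract identifies as a basic structural element.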
The articles are classified in six categories. The findings suggest that the field is in fact heterogeneous, and each sub-discipline has its own characteristics. While the contribution of these sub-disciplines in Scientometrics was still well balanced in 1980, papers dealing with case studies and methodology became dominant by 1997. Before India became an independent country, its scientists and policy makers could foresee the importance of science in its development, and accordingly a number of research and development (R&D) institutions were established. However, during these five decades of independence, the choice between basic sciences and technology was always a subject of debate. It will be appropriate now to examine the changing patterns of Science and Technology (S&T) manpower growth to find out the ground reality. The present study pertains to the analysis of S&T outturn data in various fields of scientific research that can provide a base for S&T planning and policy making. These S&T indicators will be helpful in estimating future requirements, which in turn can be useful to a great extent in science and technology policy formulation. These estimates and future projections are based on mathematical modelling of the data pertaining to the outturn of highly qualified Scientific and Technical (S&T) personnel in India from different faculties over the period 1990-1998. From the trend analysis it is evident that research is no longer perceived as an interesting career except in the fields of engineering and medicine. The findings further suggest that there is a noticeable shift from basic sciences to technology. We show that scientific production can be described by two variables: rate of production (rate of publications) and career duration. For mathematical logicians, we show that the time pattern of production is random and Poisson distributed, contrary to the theory of cumulative advantage. 
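The reference-based indicators named above (Price Index, mean reference age) are simple functions of each article's reference list. A minimal sketch, using the conventional five-year window for the Price Index:

```python
def price_index(pub_year, ref_years, window=5):
    """Price Index of an article: share of its references that are
    no more than `window` years older than the article itself."""
    recent = sum(1 for y in ref_years if pub_year - y <= window)
    return recent / len(ref_years)

def mean_reference_age(pub_year, ref_years):
    """Average age, in years, of an article's references."""
    return sum(pub_year - y for y in ref_years) / len(ref_years)
```

A high Price Index (heavy reliance on recent literature) is commonly read as a "hard science" trait, which is what makes the indicator relevant to the Schubert-Maczelka hypothesis above.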
We show that the exponential distribution provides excellent goodness-of-fit to rate of production and a reasonable fit to career duration. The good fits to these distributions can be explained naturally from the statistics of exceedances. Thus, more powerful statistical tests and a better theoretical foundation are obtained for rate of production and career duration than has been the case for Lotka's Law. Based on a set of information science papers, this study demonstrates that "all author" citation counts should be preferred when visualizing the structure of research fields. "First author" citation studies distort the picture in terms of most influential researchers, while the subfield structure tends to be just about the same for both methods. This article reports the results of a study that investigated effects of four user characteristics on users' mental models of information retrieval systems: educational and professional status, first language, academic background, and computer experience. The repertory grid technique was used in the study. Using this method, important components of information retrieval systems were represented by nine concepts, based on four IR experts' judgments. Users' mental models were represented by factor scores that were derived from users' matrices of concept ratings on different attributes of the concepts. The study found that educational and professional status, academic background, and computer experience had significant effects in differentiating users on their factor scores. First language had a borderline effect, but the effect was not significant at the α = 0.05 level. Specific differing views regarding IR systems among different groups of users are described and discussed. Implications of the study for information science and IR system designs are suggested. A linguistic model for an Information Retrieval System (IRS) defined using an ordinal fuzzy linguistic approach is proposed. 
The ordinal fuzzy linguistic approach is presented, and its use for modeling the imprecision and subjectivity that appear in the user-IRS interaction is studied. The user queries and IRS responses are modeled linguistically using the concept of fuzzy linguistic variables. The system accepts Boolean queries whose terms can be weighted simultaneously by means of ordinal linguistic values according to three possible semantics: a symmetrical threshold semantics, a quantitative semantics, and an importance semantics. The first one defines a new threshold semantics used to express qualitative restrictions on the documents retrieved for a given term. It is monotone increasing in index term weight for threshold values on the right of the mid-value, and monotone decreasing for threshold values on the left of the mid-value. The second one is a new semantic proposal introduced to express quantitative restrictions on the documents retrieved for a term, i.e., restrictions on the number of documents that must be retrieved containing that term. The last one is the usual semantics of relative importance, which has an effect when the term is in a Boolean expression. A bottom-up evaluation mechanism for queries is presented that coherently integrates the use of the three semantics and satisfies the separability property. The advantage of this IRS with respect to others is that users can simultaneously express linguistically different semantic restrictions on the desired documents, incorporating more flexibility into the user-IRS interaction. This article examines some consequences for information control of the tendency of occurrences of content-bearing terms to appear together, or clump. Properties of previously defined clumping measures are reviewed and extended, and the significance of these measures for devising retrieval strategies is discussed. 
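The symmetrical threshold semantics can be caricatured in crisp (Boolean) form on a numeric [0, 1] weight scale. This is a deliberate simplification of the paper's ordinal linguistic machinery, intended only to show the symmetry around the mid-value:

```python
def threshold_match(weight, threshold, mid=0.5):
    """Crisp sketch of the symmetrical threshold semantics: a threshold
    at or above the mid-value asks for documents indexed *at least* that
    strongly by the term; a threshold below the mid-value asks for
    documents indexed *at most* that strongly. The real model grades
    satisfaction with ordinal linguistic values instead of True/False."""
    if threshold >= mid:
        return weight >= threshold   # monotone increasing branch
    return weight <= threshold       # monotone decreasing branch
```

So a query term weighted "high" retrieves heavily indexed documents, while one weighted "low" deliberately retrieves weakly indexed ones, which is the qualitative restriction the abstract describes.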
A new type of clumping measure, which extends the earlier measures by permitting gaps within a clump, is defined, and several variants are examined. Experiments are carried out that indicate the relation between the new measure and one of the earlier measures, as well as the ability of the two types of measure to predict compression efficiency. While researchers have explored the value of structured thesauri as controlled vocabularies for general information retrieval (IR) activities, they have not identified the optimal query expansion (QE) processing methods for taking advantage of the semantic encoding underlying the terminology in these tools. The study reported on in this article addresses this question, and examined whether QE via semantically encoded thesauri terminology is more effective in the automatic or the interactive processing environment. The research found that, regardless of end-users' retrieval goals, synonyms and partial synonyms (SYNs) and narrower terms (NTs) are generally good candidates for automatic QE, and that related terms (RTs) are better candidates for interactive QE. The study also examined end-users' selection of semantically encoded thesauri terms for interactive QE, and explored how retrieval goals and QE processes may be combined in future thesauri-supported IR systems. Many people fail to properly evaluate Internet information. This is often due to a lack of understanding of the issues surrounding evaluation and authority, and, more specifically, a lack of understanding of the structure and modi operandi of the Internet and the Domain Name System. The fact that evaluation is not being properly performed on Internet information means both that questionable information is being used recklessly, without adequate assessment of its authority, and that good information is being disregarded, because trust in the information is lacking. Both scenarios may be resolved by ascribing proper amounts of cognitive authority to Internet information. 
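A gap-tolerant clump count of the kind described can be sketched as follows; `max_gap` is the permitted distance between occurrences inside one clump. The paper's actual measures and variants are more elaborate than this sketch:

```python
def count_clumps(positions, max_gap):
    """Group sorted term-occurrence positions into clumps, allowing a
    gap of up to max_gap between consecutive occurrences of the term;
    a larger gap starts a new clump."""
    if not positions:
        return 0
    clumps = 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev > max_gap:
            clumps += 1
    return clumps
```

A strongly clumping term yields few clumps relative to its total occurrences, which is the kind of regularity the abstract relates to both retrieval strategies and compression efficiency.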
Traditional measures of authority present in a print environment are lacking on the Internet, and, even when occasionally present, are of questionable veracity. A formal model and evaluative criteria are herein suggested and explained to provide a means for accurately ascribing cognitive authority in a networked environment; the model is unique in its representation of overt and covert affiliations as a mechanism for ascribing proper authority to Internet information. Using the financial industry as a context, the following study seeks to address the issue of the classification of electronic bookmarks in a multi-user system by investigating what factors influence how individuals develop categories for bookmarks and how they choose to classify bookmarks within those organizational categories. An experiment was conducted in which a sample of 15 participants was asked to bookmark and to categorize 60 web sites within Internet Browser folders of their own creation. Based on the data collected during this first component of the study, individual, customized questionnaires were composed for each participant. Whereas some of the questions within these surveys focused on particular classificatory decisions regarding specific bookmarks, others looked at how the participant defined, utilized, and structured the category folders that comprised his or her classification system. The results presented in this paper focus on issues investigated in Kwasnik's (Journal of Documentation, 1991, 47, 389-398) study of the factors that inform how individuals organize their personal, paper-based documents in office environments. Whereas classificatory attributes culled from questionnaire responses nominally resembled those identified by Kwasnik, it was found that a number of these factors assumed distinctive definitions in the electronic environment. 
The present study suggests that the application of individual instances of classificatory attributes and the distinction between Content and Context Attributes emphasized by Kwasnik play a minimal role in the development of a multi-user classification system for bookmarks. Literature-based discovery has resulted in new knowledge. In the biomedical context, Don R. Swanson has generated several literature-based hypotheses that have been corroborated experimentally and clinically. In this paper, we propose a two-step model of the discovery process in which hypotheses are generated and subsequently tested. We have implemented this model in a Natural Language Processing system that uses biomedical Unified Medical Language System (UMLS) concepts as its unit of analysis. We use the semantic information that is provided with these concepts as a powerful filter to successfully simulate Swanson's discoveries of connecting Raynaud's disease with fish oil and migraine with a magnesium deficiency. Classical assumptions about the nature and ethical entailments of authorship (the standard model) are being challenged by developments in scientific collaboration and multiple authorship. In the biomedical research community, multiple authorship has increased to such an extent that the trustworthiness of the scientific communication system has been called into question. Documented abuses, such as honorific authorship, have serious implications in terms of the acknowledgment of authority, allocation of credit, and assigning of accountability. Within the biomedical world it has been proposed that authors be replaced by lists of contributors (the radical model), whose specific inputs to a given study would be recorded unambiguously. The wider implications of the 'hyperauthorship' phenomenon for scholarly publication are considered. 
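Swanson-style open discovery, of the kind the two-step model above generates hypotheses for, can be caricatured as an A-B-C transitive search over term links. The link table below is invented for illustration and is not output of the described system:

```python
def abc_candidates(links, a):
    """Candidate discovery terms C: reachable from starting term A via
    an intermediate term B, but not already linked to A directly."""
    direct = links.get(a, set())
    found = set()
    for b in direct:
        found |= links.get(b, set())
    return found - direct - {a}

# Hypothetical literature-derived links between concepts:
links = {
    "raynaud disease": {"blood viscosity"},
    "blood viscosity": {"fish oil"},
}
```

In the real system, the UMLS semantic types of the concepts act as the filter that prunes the (usually enormous) candidate set down to plausible hypotheses for the testing step.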
The TREC benchmarking exercise for information retrieval (IR) experiments has provided a forum and an opportunity for IR researchers to evaluate the performance of their approaches to the IR task and has resulted in improvements in IR effectiveness. Typically, retrieval performance has been measured in terms of precision and recall, and comparisons between different IR approaches have been based on these measures. These measures are in turn dependent on the so-called "pool depth" used to discover relevant documents. Whereas there is evidence to suggest that the pool depth size used for TREC evaluations adequately identifies the relevant documents in the entire test data collection, we consider how it affects the evaluations of individual systems. The data used come from the Sixth TREC conference, TREC-6. By fitting appropriate regression models we explore whether different pool depths confer advantages or disadvantages on different retrieval systems when they are compared. As a consequence of this model fitting, a pair of measures for each retrieval run, which are related to precision and recall, emerges. For each system, these give an extrapolation for the number of relevant documents the system would have been deemed to have retrieved if an indefinitely large pool size had been used, and also a measure of the sensitivity of each system to pool size. We concur that even on the basis of analyses of individual systems, the pool depth of 100 used by TREC is adequate. Search key resolution power is analyzed in the context of a request, i.e., among the set of search keys for the request. Methods of characterizing the resolution power of keys automatically are studied, and the effects search keys of varying resolution power have on retrieval effectiveness are analyzed. It is shown that it often is possible to identify the best key of a query, while discrimination between the remaining keys presents problems. 
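The pooling mechanism whose depth the study examines is straightforward: only documents in the union of the top-ranked results of all submitted runs are judged for relevance. A minimal sketch:

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each submitted run's
    ranking; only pooled documents receive relevance judgments, and
    unpooled documents are treated as non-relevant."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool
```

A system retrieving relevant documents that no run ranked within the pool depth gets no credit for them, which is exactly the potential per-system bias the regression analysis above investigates.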
It is also shown that query performance is improved by suitably using the best key in a structured query. The tests were run with InQuery on a subcollection of the TREC collection, which contained some 515,000 documents. In this article we investigate the use of signature files in Chinese information retrieval systems and propose a new partitioning method for Chinese signature files based on the characteristics of Chinese words. Our partitioning method, called Partitioned Signature File for Chinese (PSFC), offers faster search efficiency than the traditional single signature file approach. We devise a general scheme for controlling the trade-off between false drops and storage overhead while maintaining the search space reduction in PSFC. An analytical study is presented to support the claims of our method. We also propose two new hashing methods for Chinese signature files so that the signature file will be more suitable for dynamic environments while the retrieval performance is maintained. Furthermore, we have implemented PSFC and the new hashing methods, and we evaluated them using a large-scale real-world Chinese document corpus, namely, the TREC-5 (Text REtrieval Conference) Chinese collection. The experimental results confirm the features of PSFC and demonstrate its superiority over the traditional single signature file method. This paper presents a citation analysis of the cognitive structure of current cardiovascular research. The methods used are co-citation analysis, bibliographic coupling, and quantitative analysis of title words. Tables and graphs reveal: (1) the journal co-citation structure; (2) the cognitive content and the bibliometric structure of clusters based on co-citation; (3) the cognitive content and the bibliometric structure of clusters based on bibliographic coupling. A predominance of different research aspects of coronary artery disease was found in clusters based on co-citations as well as in clusters based on bibliographic coupling. 
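The single signature file baseline that PSFC improves on works by superimposed coding: each word sets a few bit positions in a fixed-width bit string, and a document matches a query only if its signature covers the query signature. A generic sketch (PSFC's partitioning and the Chinese-specific hashing methods are beyond this illustration; the hash construction here is an assumption):

```python
import hashlib

def signature(words, bits=64, k=3):
    """Superimposed-coding signature: each word hashes to k bit
    positions, which are OR-ed into one bit string."""
    sig = 0
    for w in words:
        for i in range(k):
            h = int(hashlib.md5(f"{w}:{i}".encode()).hexdigest(), 16)
            sig |= 1 << (h % bits)
    return sig

def maybe_contains(doc_sig, query_sig):
    """True if the document *may* contain all query words. A True
    answer can be a false drop and must be verified against the text;
    a False answer is always correct."""
    return doc_sig & query_sig == query_sig
```

The false drop vs. storage trade-off the abstract mentions is controlled by `bits` and `k`: wider signatures and fewer bits per word reduce accidental bit collisions at the cost of space.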
The aim of this study was to draw attention to the possible existence of "quirks" in bibliographic databases and to discuss their implications. We analysed the time-trends of "publication types" (PTs) relating to clinical medicine in the most frequently searched medical database, MEDLINE. We counted the number of entries corresponding to 10 PTs indexed in MEDLINE (1963-1998) and drew up a matrix of [10 PTs x 36 years], which we analysed by correspondence factor analysis (CFA). The analysis showed that, although the "internal clock" of the database was broadly consistent, there were periods of erratic activity. Thus, observed trends might not always reflect true publication trends in clinical medicine but quirks in MEDLINE indexing of PTs. There may be, for instance, different limits for retrospective tagging of entries relating to different PTs. The time-trend for Reviews of Reported Cases differed substantially from that of other publication types. Despite the quirks, quite rational explanations could be provided for the strongest correlations among PTs. The main factorial map revealed how the advent of the Randomised Controlled Trial (RCT) and the accumulation of a critical mass of literature may have increased the rate of publication of research syntheses (meta-analyses, practice guidelines, etc.). The RCT is now the "gold standard" in clinical investigation and is often a key component of formal "systematic reviews" of the literature. Medical journal editors have largely contributed to this situation and thus helped to foster the birth and development of a new paradigm, "evidence-based medicine", which assumes that expert opinion is biased and therefore relies heavily, virtually exclusively, on critical analysis of the peer-reviewed literature. Our exploratory factor analysis, however, leads us to question the consistency of MEDLINE's indexing procedures and also the rationale for MEDLINE's choice of descriptors. 
Databases have biases of their own, some of which are not independent of expert opinion. User-friendliness should not make us forget that outputs depend on how the databases are constructed and structured. Brazilian immunology dates from the end of the 19th century. The aim of the present paper was to analyze the impact of this field on contemporary Brazilian biomedical research. For this, a 15-year period (1981-1995) was studied. Production of immunological articles in Brazil represented in 1995 a percentage of 8.66 of total papers in biomedical sciences in this country. This level was achieved by an exponential increase in 1991 in the number of papers in immunology, followed by a steady increase in the subsequent years. This growth was only observed in articles published in international immunology journals listed by ISI; a similar increase did not occur when the most representative Brazilian journal in biomedical sciences was analyzed. The production in immunology in the last five years (1991-1995) represented 60.69% of total articles in this field published in the whole 15-year period. When quality was assessed based on the impact factor of the journals where the articles appeared, 52.71% of total immunology papers had been published in journals with impact factors varying between 7.29 and 3.24. A higher degree of international co-authorship was seen both in articles published in international journals and in presentations at international congresses compared to national ones. The main countries collaborating with Brazil were the USA, England, and France. Within Brazil, immunology research was not equally distributed. Around 80% of the articles were produced by four states (Sao Paulo, Rio de Janeiro, Minas Gerais, and Bahia), Sao Paulo being responsible for more than half of those articles. This geographic distribution closely resembles the distribution of the Brazilian Society of Immunology (SBI) membership. 
The main field of study throughout the period was immunoparasitology. In his book "Scientific Progress", Rescher (1978; German ed. 1982, French ed. 1993) developed a principle of decreasing marginal returns of scientific research, which is based, inter alia, on a law of logarithmic returns and on Lotka's law in a certain interpretation. In the present paper, the historical precursors and the meaning of the principle are sketched out. Some empirical case studies concerning the principle, spread over the literature, are reported on. New bibliometric data are used about 19th-century mathematics and physics. They confirm Rescher's principle apart from the early phases of the disciplines, where a square root law seems to be more applicable. The implication of the principle that the returns of different quality levels grow the more slowly the higher the level is valid. However, the time-derivative ratio between (logarithmized) investment in terms of manpower and returns in terms of first-rate contributors seems not to be linear, but rather to fluctuate vividly, pointing to the cyclical nature of scientific progress. With regard to Rescher's principle, in the light of bibliometric indicators no difference occurs between a natural science like physics and a formal science like mathematics. From mathematical progress of the 19th century, constant or increasing returns in the form of new formulas, theorems, and axioms are observed, which leads to a complementary interpretation of the principle of decreasing marginal returns as a principle of scientific "mass production". We show that scientific production can be described by two variables: rate of production (rate of publications) and career duration. For 19th-century physicists, we show that the time pattern of production is random and Poisson distributed, contrary to the theory of cumulative advantage. We show that the exponential distribution provides excellent goodness-of-fit to rate of production and career duration. 
The good fits to these distributions can be explained naturally from the statistics of exceedances. Thus, more powerful statistical tests and a better theoretical foundation are obtained for rate of production and career duration than has been the case for Lotka's Law. In this paper we examine, by means of a citation analysis, which factors influence the impact of articles published in demography journals between 1990 and 1992. Several quantifiable characteristics of the articles (characteristics with respect to authors, visibility, content, and journals) are strongly related to their subsequent impact in the social sciences. Articles are most frequently cited when they deal with empirical, ahistorical research focusing on populations in the developed world, when they are prominently placed in a journal issue, when they are written in English, and when they appear in core demography journals. Furthermore, although eminent scholars are likely to be cited on the basis of their reputation, the effect of reputation appears to be small in demography. Publication and citation data for the thirty journals listed in the Dermatology & Venereal Diseases category of the 1996 edition of the Journal Citation Reports (JCR) on CD-ROM and seven dermatology journals not listed in the JCR-1996 were retrieved online from DIMDI and analysed with respect to short- and long-term impact factors, ratios of cited to uncited papers, as well as knowledge export and international visibility. The short-term impact factors (calculated according to the rules applied in the JCR) are very similar to their JCR counterparts: thus there are only minor changes in the rankings according to JCR impact factors and those calculated on the basis of online data. The non-JCR journals rank within the upper (two titles) and the lower third of the 37 journals (one title being at the upper end of the last third and the other four titles being at the very end of the list). 
Ranking the journals according to their long-term impact factors results in no major changes of a journal's position. Normalized mean citation rates, which give a more direct impression of a journal's citedness in relation to the average citedness of its subfield, are also shown. Ratios of cited to uncited papers parallel in general the impact factors, i.e., journals with higher (constructed) impact factors have a higher percentage of cited papers. For each journal, the Gini concentration coefficient was calculated as a measure of the unevenness of the citation distribution. In general, journals with higher (constructed) impact factors have higher Gini coefficients, i.e., the higher the impact factor, the more uneven the citation distribution. Knowledge export and international visibility were measured by determining the distinct categories to which the citing journals have been assigned ("citing subfields") and the distinct countries to which the citing authors belong ("citing countries"), respectively. Each journal exhibits a characteristic profile of citing subfields and citing countries. Normalized rankings based on knowledge export and international visibility (relating the number of published papers to the number of distinct subfields and distinct countries) are to a large extent different from the impact factor rankings. It is concluded that the additional data given, especially the data on knowledge export and international visibility, are necessary ingredients of a comprehensive description of a journal's significance and its position within its subject category. This study examined the research performance of Korean physicists, comparing Korean-authored papers versus internationally co-authored papers, indexed in SCI, 1994-1998, and using the number of citations received by internationally co-authored papers covered by the SCI CD-ROM. 
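The Gini concentration coefficient used above to capture the unevenness of a journal's citation distribution can be computed directly from the per-paper citation counts. A minimal sketch:

```python
def gini(citations):
    """Gini concentration coefficient of a citation distribution:
    0 when every paper is equally cited, approaching 1 when the
    citations are concentrated on very few papers."""
    vals = sorted(citations)
    n, total = len(vals), sum(vals)
    if total == 0:
        return 0.0
    # Standard formula from the cumulative weighted sum of sorted values.
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return 2.0 * cum / (n * total) - (n + 1.0) / n
```

For example, four papers cited [0, 0, 0, 10] give a coefficient of 0.75, matching the abstract's observation that high-impact journals, whose citations pile up on a few highly cited papers, tend to show higher Gini coefficients.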
For the study, 4,665 papers published by researchers affiliated with the physics departments or physics-associated laboratories at Korean universities and indexed by SCI were analyzed. Korean-authored papers tended to be published in Korean, Japanese, and UK journals, while internationally co-authored papers were more likely to appear in German, Dutch, and Swiss journals. Among the 18 authorship countries (on the basis of first author), 93 internationally co-authored papers by U.S. researchers had the highest citation rate, an average of 15.9 citations per paper. For the eight countries that published over 5 papers, there was no correlation between the average number of citations per paper and the total number of citations. However, an ANOVA indicated a significant difference between the average numbers of citations per paper according to country (F = 5.84, p < 0.0005). In other words, papers by U.S. and French researchers tended to be cited more frequently than papers by Italian, Japanese, Korean, Russian, and German researchers. This article focusses on third-party funding of research in German universities. The central question is whether funding data can function as suitable indicators for the measurement of the research performance of university departments. After a brief description of the importance and the extent of third-party funding in the German system of research funding, the quality of the data is discussed and the funding indicator is compared with bibliometric indicators. In summary, one can say that in subjects where external funding of research is usual, the funding indicator points in the same direction as other indicators do. Because of the peer review process involved in grant awarding, a funding indicator is in many subjects a suitable indicator to evaluate R&D impacts. 
Mooers' Law, widely referenced in the literature of Library and Information Science, has generally been misinterpreted as concluding that customers will tend not to use Information Retrieval systems that are too difficult or frustrating, when in fact the law addresses the reluctance of customers to use any type of IR system, regardless of its faults or merits, within an environment in which having information requires more effort than not having it. An expansion of Mooers' original law is proposed, based upon a "Scale of Information Retrieval Environments," which includes not only those types of environments addressed by Mooers, but also those in which a premium is placed upon having information, as well as those in which the effort required from having information vs. not having it is fairly evenly balanced. It is empirically shown that, even using the normal or total counting procedure, Lotka's law breaks down when articles with a large number of authors (i.e., more than a hundred) are included in the bibliography. The explanation of this phenomenon is that the conditions for an application of the basic success-breeds-success model are no longer fulfilled. Studying articles with many authors means dealing with items (the articles) having multiple sources (the authors); hence Egghe's generalized success-breeds-success model, leading to not necessarily decreasing distributions, explains the observed irregularities. A visual term discrimination value analysis method is introduced using a document density space within a distance-angle-based visual information retrieval environment. The term discrimination capacity is analyzed by comparing the distance- and angle-based visual representations with and without a specified term, thereby allowing the user to see the impact of the term on individual documents within the density space. Next, the concept of a "term density space" is introduced for term discrimination analysis. 
Using this concept, a term's capacity for discriminating itself from other terms in the space can also be visualized within the visual space. Applications of these methods facilitate more effective assignment of term weights to index terms within documents and may assist searchers in the selection of search terms. In this article we report our research on building FEATURES, an intelligent web search engine that is able to perform real-time adaptive feature (i.e., keyword) and document learning. Not only does FEATURES learn from the user's document relevance feedback, but it also automatically extracts and suggests indexing keywords relevant to a search query and learns from the user's keyword relevance feedback, so that it is able to speed up its search process and to enhance its search performance. We design two efficient and mutually beneficial learning algorithms that work concurrently, one for feature learning and the other for document learning. FEATURES employs these algorithms together with an internal index database and a real-time meta-searcher to perform adaptive real-time learning to find desired documents with as little relevance feedback from the user as possible. The architecture and performance of FEATURES are also discussed. Visualizations of subspaces on the World Wide Web can provide users the ability to identify relevant information from a set of Web pages, while gaining new insights or understanding of the space. This study tested three classes of visualization techniques, distortion, zoom, and expanding outline, to better understand which classes of visualization techniques may better represent the underlying structure. The effects of different visualization techniques on user performance in information searching tasks and the effects of different sizes of the Web spaces were studied. Eighty participants were asked to search for information with and without a visualization tool over the large Web space. 
The factors that may have caused cognitive overloads are further discussed. This paper provides an overview of empirical investigations of people's use of relevance criteria. A laboratory experiment and a naturalistic study were conducted to explore the patterns of movement in the use of relevance criteria during real-time document evaluation processes. The expectation was that the subjects would apply different frames of reference for relevance decision making as they moved from one stage of the document evaluation to another, and that change would occur not only for particular criteria but also for classes of criteria. The examination of such change was performed at two levels: a micro-level analysis, centering on the use of individual criteria, and a macro-level analysis, concentrating on the use of criteria classes. General results of the two studies are included in this paper. Both projects provided information on criteria patterns, which challenge the conventional meaning-based approach to criteria classification. Building on these findings, a classification scheme and criteria taxonomy that may enable generalization of criteria across different contexts are proposed. The amount of spatial data collected from satellites, aerial photography, and land-based stations continues to grow at astounding rates. In this article, the role of exploratory data analysis (EDA) for spatial data mining is reviewed, and a case study addressing environmental risk assessments in New York State is presented to illustrate the feasibility and usability of augmenting seriation for spatial data analysis. For this project, seriation, a univariate EDA technique, is augmented with a class of multimedia tools, including iconic matrices, choropleth mapping, graphic interactions, and sound, to exploit spatial datasets to better understand the relationships among spatial, temporal, and human variables. 
Additional software enhancements such as three-dimensional matrices, multivariate choropleth mapping, and tonal sonification are proposed to further improve these user-based tools for spatial data analysis. Here we analyze all 1128 publications produced by scientists during their employment at the University of Texas Institute for Geophysics, a geophysical research laboratory founded in 1972 that currently employs 23 Ph.D.-level scientists. We assess research performance using as bibliometric indicators such statistics as publications per year, citations per paper, and cited half-lives. To characterize the research style of individual scientists and to obtain insight into the origin of certain publication-counting discrepancies, we classified the 1128 publications into four categories that differed significantly with respect to statistics such as lifetime citation rates, fraction of papers never cited after 10 years, and cited half-life. The categories were: mainstream (prestige journal) publications, with 32.6 lifetime citations per paper, 2.4% never cited, and a 6.9-year half-life; archival (other refereed) publications, with 12.0 lifetime citations per paper, 21.5% never cited, and a 9.5-year half-life; articles published as proceedings of conferences, with 5.4 lifetime citations per paper, 26.6% never cited, and a 5.4-year half-life; and "other" publications (news articles, book reviews, etc.), with 4.2 lifetime citations per paper, 57.1% never cited, and a 1.9-year half-life. Because determining cited half-lives is highly similar to a well-studied phenomenon in earthquake seismology, which was familiar to us, we thoroughly evaluate five different methods for determining the cited half-life and discuss the robustness and limitations of the various methods. Unfortunately, even when data are numerous, the various methods often obtain very different values for the half-life. Our preferred method determines half-life from the ratio of citations appearing in back-to-back 5-year periods. 
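Assuming annual citations decay roughly exponentially, the ratio method described above pins down the half-life from counts in two successive equal windows; this is a minimal sketch with invented citation counts, not the authors' exact implementation:

```python
import math

def half_life_from_ratio(c_first5, c_second5, window=5.0):
    """Cited half-life from citations in two back-to-back windows.

    Assumes citations decay exponentially, so the ratio of counts in
    successive equal windows determines the decay rate:
        c_second5 / c_first5 = 2 ** (-window / half_life)
    """
    if c_second5 <= 0 or c_second5 >= c_first5:
        raise ValueError("method needs decaying counts: 0 < c_second5 < c_first5")
    return window * math.log(2) / math.log(c_first5 / c_second5)

# A paper cited 80 times in years 1-5 and 40 times in years 6-10
# loses half its citation rate every 5 years:
print(half_life_from_ratio(80, 40))  # 5.0
```

The guard clause reflects a limitation the abstract hints at: when counts do not decay between the two windows, the method yields no finite half-life at all.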
We also evaluate the reliability of the citation count data used for these kinds of analyses and conclude that citation count data are often imprecise. All observations suggest that reported differences in cited half-lives must be quite large to be significant. The British controversy over the validity of Urquhart's and Garfield's Laws during the 1970s constitutes an important episode in the formulation of the probability structure of human knowledge. This controversy took place within the historical context of the convergence of two scientific revolutions - the bibliometric and the biometric - that had been launched in Britain. The preceding decades had witnessed major breakthroughs in understanding the probability distributions underlying the use of human knowledge. Two of the most important of these breakthroughs were the laws posited by Donald J. Urquhart and Eugene Garfield, who played major roles in establishing the institutional bases of the bibliometric revolution. For his part, Urquhart began his realization of S. C. Bradford's concept of a national science library by analyzing the borrowing of journals on interlibrary loan from the Science Museum Library in 1956. He found that 10% of the journals accounted for 80% of the loans and formulated Urquhart's Law, by which the interlibrary use of a journal is a measure of its total use. This law underlay the operations of the National Lending Library for Science and Technology (NLLST), which Urquhart founded. The NLLST became the British Library Lending Division (BLLD) and ultimately the British Library Document Supply Centre (BLDSC). In contrast, Garfield did a study of 1969 journal citations as part of the process of creating the Science Citation Index (SCI), formulating his Law of Concentration, by which the bulk of the information needs in science can be satisfied by a relatively small, multidisciplinary core of journals. 
This law became the operational principle of the Institute for Scientific Information created by Garfield. A study at the BLLD under Urquhart's successor, Maurice B. Line, found low correlations of NLLST use with SCI citations, and publication of this study started a major controversy, during which both laws were called into question. The study was based on the faulty use of the Spearman rank-correlation coefficient, and the controversy over it was instrumental in causing B. C. Brookes to investigate bibliometric laws as probabilistic phenomena and begin to link the bibliometric with the biometric revolution. This paper concludes with a resolution of the controversy by means of a statistical technique that incorporates Brookes' criticism of the Spearman rank-correlation method and demonstrates the mutual supportiveness of the two laws. This article presents European documentalist, critical modernist, and Autonomous Marxist influenced post-Fordist views regarding the management of knowledge in mid- and late twentieth century Western modernity and postmodernity, and the complex theoretical and ideological debates, especially concerning issues of language and community. The introduction and use for corporate, governmental, and social purposes of powerful information and communication technologies created conceptual and political tensions and theoretical debates. In this article, knowledge management, including the specific recent approach known as "Knowledge Management," is discussed as a social, cultural, political, and organizational issue, including the problematic feasibility of capturing and representing knowledge that is "tacit," "invisible," and imperfectly representable. "Social capital" and "affective labor" are discussed as elements of "tacit" knowledge. 
Views of writers in the European documentalist, critical modernist, and Italian Autonomous Marxist influenced post-Fordist traditions, such as Otlet, Briet, Heidegger, Benjamin, Marazzi, and Negri, are discussed. This research is part of an ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure, or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree coordinate id is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) for ranking web pages is applied in this new coordinate space. The algorithm had to be modified to fit different topological web graph structures, and it was not successful in the case of general web graphs, so new web ranking algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source for web page ranking. The author believes that understanding the underlying web page as a graph will help design better web ranking algorithms, enhance retrieval and web performance, and recommends using graphs as part of a visual aid for browsing engine designers. Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. 
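The (out-degree, in-degree) coordinates at the heart of the web-graph study above can be computed directly from a link table; the tiny graph here is invented for illustration and is not taken from that study:

```python
def degree_coordinates(links):
    """Map each page to its (out-degree, in-degree) coordinate.

    `links` is a set of (source, target) pairs; the out-degree od counts
    outgoing links from a page, the in-degree id counts links pointing to it.
    """
    pages = {p for edge in links for p in edge}
    coords = {p: [0, 0] for p in pages}
    for src, dst in links:
        coords[src][0] += 1  # outgoing link from src
        coords[dst][1] += 1  # incoming link to dst
    return {p: tuple(c) for p, c in coords.items()}

# a links to b and c, b links to c, c links back to a:
links = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")}
coords = degree_coordinates(links)
print(coords["a"], coords["b"], coords["c"])  # (2, 1) (1, 1) (1, 2)
```

A distance metric between two web graphs can then be defined over these coordinate multisets, which is the kind of comparison the study describes.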
On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate action times with reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn identifies the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications. This study is a follow-up to a published Correspondence Factorial Analysis (CFA) of a dataset of over 6 million bibliometric entries (Dore et al., JASIS, 47(8), 588-602, 1996), which compared the publication output patterns of 48 countries in 18 disciplines over a 12-year period (1981-1992). It analyzes, by methods suitable for investigating short time series, how these output patterns evolved over the 12-year span. Three types of approach are described: (1) the chi-square distances of the publication output patterns from the center of gravity of the multidimensional system - which represents an average world pattern - were calculated for each country and for each year. We noted whether the patterns moved toward or away from the center with time; (2) individual annual output patterns were introduced as supplementary variables into an existing global overview covering the whole time-span [CFA map of (countries x disciplines)]. 
We observed how these patterns moved about within the map year by year; (3) the matrix (disciplines x time) was analyzed by CFA to derive time trends for each country. CFA revealed the "inner clocks" governing publication trends. The time scale that best fitted the data was not a linear but an elastic scale. Although different countries laid emphasis on publication in different disciplines, the overall tendency was toward greater uniformity in publication patterns with time. Multiple authorship is a topic of growing concern in a number of scientific domains. When, as is increasingly common, scholarly articles and clinical reports have scores or even hundreds of authors - what Cronin (in press) has termed "hyperauthorship" - the precise nature of each individual's contribution is often masked. A notation that describes collaborators' contributions and allows those contributions to be tracked in, and across, texts (and over time) offers a solution. Such a notation should be useful, easy to use, and acceptable to communities of scientists. Drawing on earlier work, we present a proposal for an XML-like "contribution" mark-up, and discuss the potential benefits and possible drawbacks. Theories of aboutness and theories of subject analysis and of related concepts such as topicality are often isolated from each other in the literature of information science (IS) and related disciplines. In IS it is important to consider the nature and meaning of these concepts, which are closely related to theoretical and metatheoretical issues in information retrieval (IR). A theory of IR must specify which concepts should be regarded as synonymous and explain how the meaning of the nonsynonymous concepts should be defined. Many international comparisons of publication performance at the macro level are based on direct counts of citation frequencies in the Science Citation Index. 
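An XML-like "contribution" mark-up of the kind proposed above might, for example, take the following form; the element and attribute names here are illustrative assumptions, not the authors' actual notation:

```xml
<article id="example-0001">
  <contributors>
    <contributor id="c1" name="A. Author">
      <!-- each contribution element records one trackable role -->
      <contribution role="conception"/>
      <contribution role="data-collection"/>
    </contributor>
    <contributor id="c2" name="B. Author">
      <contribution role="statistical-analysis"/>
      <contribution role="writing"/>
    </contributor>
  </contributors>
</article>
```

Because each role is an explicit, machine-readable attribute, contributions could be aggregated within and across texts, and over time, by standard XML tooling, which is exactly the tracking the proposal calls for.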
However, these comparisons may reveal a significant negative language bias for non-English-speaking countries, or other selection biases, which can be illustrated by the relation between the research budgets of scientific institutions and SCI publications. Against this background, a two-dimensional representation, specifying both the international alignment of the national publications and the journal-standardized citation impact, proves to be a more appropriate indicator base for assessing the citation performance of countries such as Germany. In the light of a ten-country benchmark, time series of these indicators for the nineties show a considerable impact of German unification, with a recent trend towards an adaptation of publication behaviour in East Germany towards the Western patterns. The present study compares the international publication productivity of Latin American countries in the fields of business administration and economics from 1995 to 1999. Only four countries - Argentina, Brazil, Chile, and Mexico - have a substantial research production in these areas. Among these countries, Chile showed the most favorable results according to various indicators of publication productivity. There is an evident need for the most scrupulous assessment possible of the fruits of research (in the context considered here, namely publications), with a qualitative, hence in-depth, analysis of the single products of R&D. But this would require time and competences which not all policy makers have at their disposal. Hopefully, quantitative procedures, apparently objective and easy to apply, would be able to surmount these difficulties. The diffusion of the quantitative evaluation of research is, in other words, the policy makers' adaptive response to the need to increase controls on the efficiency of public spending in R&D - since public investment clearly could not be determined at the outset on the basis of the market's spontaneous, decentralised balancing mechanisms. 
An essential step towards the prevention of the distortions most likely to result from quantitative evaluation is the adoption of quantitative procedures for evaluating the editorial policies of scientific journals - or, rather, of journals which claim to be scientific. Such procedures must be designed to highlight any distortions caused by the non-optimal editorial policies of journals. With quantitative evaluation, in fact, journals play a crucial role in the formation of public science policies. They thus have to be subjected to specific monitoring to make sure that their conduct fits the prerequisites necessary for them to perform their semi-official activity as certifiers of the quality of the products of research. The phenomena of the production, divulgation and fruition of scientific discovery are, of course, so complex that it is necessary to weigh them not with a single indicator, however helpful it may be, but with a constellation of indicators. We received confirmation of the reliability of the impact factor as an instrument to monitor the quality of research and as a means of evaluating the research itself. This is a reassuring result for the current formulation of public policies and confirms the substantial honesty of the competition mechanisms of the scientific enterprise. In this paper we examine the validity of Lotka's law and Zipf's law for research output in 15 top journals of economics in the period 1977 to 1997. Our data for individual authors satisfy a general form of Lotka's law. We find increasing competition over time among economists on the individual level. However, publications in top journals are concentrated heavily when the institutional level is under consideration. Research output of institutions can be fit adequately by Zipf's law. An analysis of 3174 papers published in journals in the field of laser science and technology indicates that only 401 papers were single-authored and the remaining 2773 were co-authored. 
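The general form of Lotka's law tested in the economics study above says that the number of authors producing n papers falls off as a power law, f(n) = C / n^a (classically a = 2). A minimal sketch of estimating the exponent, using counts invented to follow the law exactly:

```python
import math

def lotka_exponent(freq):
    """Least-squares estimate of the exponent a in f(n) = C / n**a.

    `freq` maps productivity n (papers per author) to the number of
    authors with exactly n papers; the fit is linear in log-log space.
    """
    xs = [math.log(n) for n in freq]
    ys = [math.log(f) for f in freq.values()]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return -slope  # a power law f = C * n**(-a) has log-log slope -a

# Counts drawn exactly from f(n) = 3600 / n**2 (the classical a = 2):
freq = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144}
print(round(lotka_exponent(freq), 2))  # 2.0
```

On real author counts the fitted exponent deviates from 2, and, as the earlier abstract on hyperauthored articles notes, the law can break down altogether when papers with very many authors enter the bibliography.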
Of the 2773 papers, only 687 were written in local (inter-departmental), domestic (inter-institutional) and international collaboration. As reflected by the values of the collaborative coefficient and the co-authorship index, it is observed that the proportion of mega-authored papers was higher for Japan, France, Italy, and the Netherlands, while the proportion of single-authored papers was higher for Canada, China, and Australia. Most of the collaborative papers had bilateral domestic and international collaboration. Domestic collaboration was higher for the USA, Japan, France and Australia, while international collaboration was higher for China, Israel, the Netherlands, and Switzerland. Given extensive research collaboration in modern science, both at the national and international level, one might wonder whether the network of researchers within each discipline is now sufficiently meshed that a large proportion of contributors to peer-reviewed journals in a given field could either share joint publications or, more realistically, be connected through chains of co-authorships. Such is not yet the case in the fields of probability and statistics, however, as shown here using a large database covering 9 renowned journals from each of these two areas over the period 1986-1995. The differentiation of scientific fields into sub-fields can be studied on the level of the 'scientific content' of the sub-field, that is, on the level of the products, as well as on the level of the 'social structures' of the sub-field, that is, on the level of the producers of the content. By comparing the behavior of the constructs with the behavior of the constructors, we are able to demonstrate the analytical distinction between a cognitive and a social approach in an empirical way. 
This will be illustrated using the case of integration and differentiation in Science and Technology Studies (STS). Elsewhere, using relations between documents, I showed how STS is characterized by strong differentiation tendencies. In this paper I address the question of the extent to which this differentiation is also reflected in the social structure of the STS field. Can STS scholars and STS research groups be classified in terms of the sub-fields? Or do researchers and institutes carry an integrative role in the STS field? Are the relations between the sub-fields of STS maintained by individual researchers or research institutes, and to what extent? The analysis in this paper reveals that this is generally not the case. Although we are able to distinguish analytically between the cognitive and social dimensions of the development of the research field, we find similar patterns of differentiation on the social level too. At the same time, this differentiation differs in some respects from the cognitive differentiation pattern. Consequently, the social and the cognitive dimensions of the STS field are not independent - as no serious STS scholar would argue - but also not identical - as radical constructivists claim - but are strongly interacting. Further analysis may reveal the leading dynamics, that is, answer the question of whether the 'social' follows the 'cognitive', the other way around, or whether the dynamics has the pattern of 'co-evolution'. We apply methods and concepts of statistical physics to the study of science & technology (S&T) systems. Specifically, our research is motivated by two concepts of fundamental importance in modern statistical physics: scaling and universality. We try to identify robust, universal characteristics of the evolution of S&T systems that can provide guidance for forecasting the impact of changes in funding. 
We quantify the production of research in a novel fashion inspired by our previous study of the growth dynamics of business firms. We study the production of research from the point of view both of inputs (R&D funding) and of outputs (publications and patents) and find the existence of scaling laws describing the growth of these quantities. We also analyze the R&D systems of different countries to test the "universality" of our results. We hypothesize that the proposed methods may be particularly useful for fields of S&T (or for levels of aggregation) for which either not enough information is available, or for which evolution is so fast that there is not enough time to collect enough data to make an informed decision. Competition is one of the most essential features of science. A new journal indicator - the "number of Matthew citations in a journal" - was found that reflects certain aspects of this competition. The indicator mirrors the competition of countries in scientific journals for recognition in terms of seemingly "redistributed" citations. The indicator shows, as do other journal indicators, an extremely skewed distribution over an ensemble of 2712 SCI journals. Half of all Matthew citations are contained in 144 so-called Matthew core journals. In this paper, a new typology of scientific journals, including the Matthew core journals, is introduced. For a few selected journals, graphs are presented showing national impact factors as well as the absolute number of Matthew citations gained or lost by the countries publishing in the journal. Scientific competition among countries for recognition is strongest in the Matthew core journals; they are the most competitive markets for scientific publications. Conclusions are drawn for national science policy, for the journal acquisition policy of national libraries, and for the publication behaviour of individual scientists. 
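The notion above of a small core of journals holding half of all Matthew citations can be made operational: rank journals by their count and take the smallest prefix that reaches the target share. A sketch with invented counts (not the study's data):

```python
def core_journals(counts, share=0.5):
    """Smallest set of journals that together hold `share` of all counts.

    `counts` maps journal name -> number of (Matthew) citations; returns
    the core as a list, most-cited journal first.
    """
    total = sum(counts.values())
    core, covered = [], 0
    for journal, c in sorted(counts.items(), key=lambda kv: -kv[1]):
        if covered >= share * total:
            break
        core.append(journal)
        covered += c
    return core

# Six journals, 900 citations in all; two journals already cover half:
counts = {"J1": 400, "J2": 250, "J3": 120, "J4": 80, "J5": 30, "J6": 20}
print(core_journals(counts))  # ['J1', 'J2']
```

The same greedy-prefix construction underlies Bradford-style core computations generally: the more skewed the distribution, the smaller the core relative to the whole ensemble.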
The use of a map as a metaphor for a scientific field is an established idea, and using it as an interface to bibliometric data seems to have great potential. Nevertheless, our own implementation of such an interface ran up against limits that prevented users from comprehending what they were looking at. As a result, the map was not used to its fullest potential. The implementation described in this paper as a high-level (conceptual) design addresses the problems noted by users. It combines both top-down and bottom-up access to the bibliometric data, something we see as vital to mapping internal knowledge onto the external depiction and vice versa. As such, it becomes a more complete tool to explore the mapped scientific field and to find and retrieve relevant information. The main objective of this study is the elaboration of national characteristics in international scientific co-authorship relations. An attempt is made to find statistical evidence of symmetry and asymmetry in co-publication links, and of the relation between international co-authorship and both national research profiles and citation impact. Four basic types can be distinguished in the relative specialisation of the domestic and internationally co-authored publications of the 50 most active countries in 1995/96, according to the significance of the difference between the two profiles. Co-publication maps reveal structural changes in international co-authorship links in the last decade. Besides stable links and coherent clusters, new nodes and links have also been found. Not all links between individual countries are symmetric. Specific (unidirectional) co-authorship affinity could also be detected for several countries. As expected, international co-authorship, on average, results in publications with higher citation rates than purely domestic papers. 
However, the influence of international collaboration on the national citation impact varies considerably between countries (and, within one individual country, between fields). In some cases there is, however, no citation advantage for one or even for both partners. This paper considers the status of information science as science through an exploration of one of the leading journals in the field - the Journal of the American Society for Information Science (JASIS) - from its initial publication as American Documentation (AD) in 1950 through the closing issue of its Silver Anniversary year in December 1999. It is a bibliometric examination of AD/JASIS articles. Based on our analysis of articles published in AD and JASIS from 1950 to 1999, we find that there has been a slow but perhaps inevitable shift from the single nonfunded researcher and author to a much wider research and publishing participation among authors, regions, corporate authors, and countries. This suggests not only cross-fertilization of ideas, but also more complex research questions. A small trend toward greater external funding further reinforces this hypothesis. Information science may no longer be "little" science, but it is also not "big" science. The purpose of this paper is to apply concepts from diffusion of innovations research to the study of the international diffusion of a formerly national scientific journal, Annales Zoologici Fennici. The study was conducted using bibliometric methodology. The diffusion of the journal was described through citations of the journal and through the development of the national distribution of its contributors. The compatibility of the journal, as well as the decrease of complexity, were found to have an influence on diffusion. Bibliometric methods were able to represent the international diffusion of a scientific journal. 
Results are presented on journal growth dynamics at both the micro and macro levels, showing that journal development clearly follows researcher behaviour and growth characteristics. At the subject discipline level, the journal system is highly responsive to research events. Overall journal growth characteristics clearly show the predominance of 3.3% compound annual growth under a number of different socio-political climates. It is proposed that this represents a lower limit to journal growth rates and that this growth is the outcome of a self-organizing information system that reflects the growth and specialization of knowledge. Potential models are suggested which could form attractive further theoretical lines of enquiry. This paper explores the interrelationships between science and technology in the emerging area of nano-science and technology. We track patent citation relations at the sectoral-disciplinary, the organizational, and the combined industrial/organizational levels. Then we investigate the geographic location and organizational affiliation of inventor/authors. Our main finding is that there are only a small number of citations connecting nano-patents with nano-science papers, although nano-science and technology appear to be relatively well connected in comparison with other fields. Further explorations suggest that nano-science and technology are still mostly separate spheres, even though there are overlaps, as an analysis of title words shows. Another observation is that university-assigned patents seem to cite papers more frequently than other patents. The present paper focuses on some important requirements for understanding patent search reports in view of their use for statistical analysis. It is pointed out and illustrated that the comprehensiveness and the quality of a given search report may vary significantly as a function of the patent office drawing up the report. 
These differences imply consequences with respect to the safe use and interpretation of the data. The authors stress that a sound analysis based on patent citation data can only be performed in a meaningful way if the analyst has a minimum knowledge of the underlying search reports. Interdisciplinarity has become of increasing interest in science in the past few years. This paper is a case study in the area of Chemistry, in which a series of different bibliometric indicators for measuring interdisciplinarity are presented. The following indicators are analysed: a) ISI multi-classification of journals in categories, b) patterns of citations and references outside a category, and c) multi-assignation of documents in Chemical Abstracts sections. Convergence between the different indicators is studied. Depending on the size of the unit analysed (area, category or journal), the most appropriate indicators are determined. An analysis carried out on the 4,326 periodicals in the social sciences included in the most recent 1991 printed edition of the UNESCO DARE database showed that 64% of the world's production is published by High Income Economy countries (IEC). Only 0.7% of Low IEC journals in the UNESCO database were also present in the Social Sciences Citation Index (SSCI) for the same year, while corresponding figures for the Middle and High IEC were 2.3% and 97.0%, respectively. With the notable exception of the United States, all countries had fewer journals in SSCI than in the UNESCO database. Methods were developed to allow quality assessment of academic research in linguistics in all sub-disciplines. Data were obtained from samples of respondents from Flanders and the Netherlands, as well as a world-wide sample, who evaluated journals, publishers, and scholars. Journals and publishers were ranked by several methods. First, we weighted the number of times journals or publishers were rated as 'outstanding', 'good', or 'occasionally/not at all good'. 
To reduce the influence of unduly positive or negative biases of respondents, the most extreme ratings were trimmed. A second weight reflects the (international) visibility of journals and publishers. Here, journals or publishers nominated by respondents from various countries or samples received a greater weight than journals or publishers nominated by respondents from one country or one sample only. Thirdly, a combined index reflects both quality and international visibility. Its use is illustrated on the output of scholars in linguistics. Limitations and potentials for application of bibliometric methods in output assessments are discussed. We argue in favour of artificial neural networks for exploratory data analysis, clustering, and mapping. We propose the Kohonen self-organizing map (SOM) for clustering and mapping according to a multi-map extension, consequently called Multi-SOM. First the Kohonen SOM algorithm is presented. Then the following improvements are detailed: the way of naming the clusters, the division of the map into logical areas, and the map generalization mechanism. The multi-map display, founded on the inter-map communication mechanism, is presented, and the notion of the viewpoint is introduced. The interest of Multi-SOM is presented for visualization, exploration or browsing, and moreover for scientific and technical information analysis. A case study in patent analysis on transgenic plants illustrates the use of Multi-SOM. We also show that the inter-map communication mechanism provides support for monitoring the plants on which patented genetic technology works; this constitutes the first map. The other four related maps provide information about the plant parts that are concerned, the target pathology, the transgenic techniques used for making these plants resistant, and finally the firms involved in genetic engineering and patenting. A method of analysis is also proposed for the use of this computer-based multi-map environment. 
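The core of the Kohonen SOM training underlying Multi-SOM can be sketched in a few lines of numpy. The map size, learning-rate schedule, and toy input vectors below are illustrative assumptions, not the authors' actual configuration:

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Kohonen SOM: each map node holds a weight vector that is
    pulled toward the inputs, with a neighbourhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every map node, used for the neighbourhood function.
    grid = np.array([[r, c] for r in range(rows) for c in range(cols)],
                    dtype=float).reshape(rows, cols, 2)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit (BMU): node whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU on the map grid.
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=2)
            h = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights

# Toy "document" vectors forming two clusters.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(1, 0.1, (20, 3))])
som = train_som(data)
```

After training, nearby map nodes hold similar weight vectors, which is what makes the division of the map into labelled logical areas possible.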
Finally, we discuss some critical remarks about the proposed approach in its current state, and we conclude on the advantages that it provides for a knowledge-oriented watch analysis of science and technology. In connection with this remark we introduce in conclusion the notion of knowledge indicators. As part of a larger project to investigate knowledge flows between fields of science, we studied the differences in speed of knowledge transfer within and across disciplines. The age distribution of references in three selections of articles was analysed, including almost 800,000 references in journal publications of the United Kingdom in 1992, 700,000 references in publications of Germany in 1992, and more than 11 million references in the world total of publications in 1998. The rate of citing documented knowledge from other disciplines appears to differ sharply among disciplines. For most of the disciplines the same ratios are found in the three data sets. Exceptions show interesting differences in the interdisciplinary nature of a field in a country. We find a general tendency of a citation delay in the case of knowledge transfer between different fields of science: citations to work within the same discipline show less of a time lag than citations to work in a foreign discipline. Between disciplines, typical differences in the speed of incorporating knowledge from other disciplines are observed, which appear to be relatively independent of time and place: for each discipline the same pattern is found in the three data sets. The discipline-specific characteristics found in the speed of interdisciplinary knowledge transfer may be a point of departure for further investigations. Results may contribute to explanations of differences in citation rates of interdisciplinary research. The neuroscience research front on Retrograde Amnesia is taken as an example to demonstrate the capabilities of co-citation mapping in combination with peer review. 
In an interview with a well-known expert in the field, the co-citation map was confirmed as a good representation of the speciality. The expert was able to identify and comment on different regions of the map, and he could validate important documents in the cluster core and research front as well as the main actors at the institutional and national levels. The bibliometric data inspired the expert to outline the cognitive and social "history" of the speciality. One of the main objectives of technology analyses is to understand how investing in technological innovation can have commercial benefits. However, empirical studies of the relationship between investments in technology and subsequent economic performance are relatively scarce. This paper provides such an analysis by demonstrating how quantitative R&D and technology indicators may be used to forecast company stock price performance. The purpose of the analysis is to utilize a unique patent database, and the science and technology indicators developed from the data therein, to explore this issue of technological competence and economic performance. The underlying concept behind this study is that the quality of a company's technology is reflected in its patent portfolio. Previous research has shown that a company with a large percentage of influential patents is much more likely to be technologically successful than a company with weaker patents. The analysis presented here reveals that such a company is also more likely to be successful in capital markets. Empirical evidence presented in this paper shows that the utmost care must be taken in interpreting bibliometric data in a comparative evaluation of national research systems. 
From the results of recent studies, the authors conclude that the value of impact indicators of research activities at the level of an institution or a country depends strongly upon whether one includes or excludes research publications in SCI-covered journals written in languages other than English. Additional material was gathered to show the distribution of SCI papers among publication languages. Finally, the authors make suggestions for further research on how to deal with this type of problem in future national research performance studies. We present a model in which scientists compete with each other in order to acquire status for their publications in a two-step process: first, to get their work published in better journals, and second, to get this work cited in these journals. On the basis of two Maxwell-Boltzmann-type distribution functions of source publications we derive a distribution function of citing publications over source publications. This distribution function corresponds very well to the empirical data. In contrast to all observations so far, we conclude that this distribution of citations over publications, which is a crucial phenomenon in scientometrics, is not a power law, but a modified Bessel function. Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. 
The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR. Using novel informatics techniques to process the output of Medline searches, we have generated a list of viruses that may have the potential for development as weapons. Our findings are intended as a guide to the virus literature to support further studies that might then lead to appropriate defense and public health measures. This article stresses methods that are more generally relevant to information science. Initial Medline searches identified two kinds of virus literatures: the first concerning the genetic aspects of virulence, and the second concerning the transmission of viral diseases. Both literatures taken together are of central importance in identifying research relevant to the development of biological weapons. Yet, the two literatures had very few articles in common. We downloaded the Medline records for each of the two literatures and used a computer to extract all virus terms common to both. The fact that the resulting virus list includes most of an earlier, independently published list of viruses considered by military experts to have the highest threat as potential biological weapons served as a test of the method; the test outcome showed a high degree of statistical significance, thus supporting an inference that the new viruses on the list share certain important characteristics with viruses of known biological warfare interest. Relevance has been a difficult concept to define, let alone measure. In this paper, a simple operational definition of relevance is proposed for a Web-based library catalog: whether or not, during a search session, the user saves, prints, mails, or downloads a citation. If one of those actions is performed, the session is considered relevant to the user. 
An analysis is presented illustrating the advantages and disadvantages of this definition. With this definition and good transaction logging, it is possible to ascertain the relevance of a session. This was done for 905,970 sessions conducted with the University of California's Melvyl online catalog. Next, a methodology was developed to try to predict the relevance of a session. A number of variables were defined that characterize a session, none of which used any demographic information about the user. The values of the variables were computed for the sessions. Principal components analysis was used to extract a new set of variables out of the original set. A stratified random sampling technique was used to form ten strata such that each stratum of 90,597 sessions contained the same proportion of relevant to nonrelevant sessions. Logistic regression was used to ascertain the regression coefficients for nine of the ten strata. Then, the coefficients were used to predict the relevance of the sessions in the missing stratum. Overall, 17.85% of the sessions were determined to be relevant. The predicted proportion of relevant sessions for all ten strata was 11%, a 6.85% difference. The authors believe that the methodology can be further refined and the prediction improved. This methodology could also have significant application in improving user searching and in predicting electronic commerce buying decisions without the use of personal demographic data. The popularity of digital images is rapidly increasing due to improving digital imaging technologies and convenient availability facilitated by the Internet. However, finding user-intended images on the Internet is nontrivial. The main reason is that Web images are usually not annotated with semantic descriptors. In this article, we present an effective approach to, and a prototype system for, image retrieval from the Internet using Web mining. The system can also serve as a Web image search engine. 
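The prediction pipeline of the Melvyl session study described above (principal components to derive new variables, then logistic regression on the result) can be sketched as follows. The synthetic session features, component count, and training schedule are hypothetical stand-ins for the real transaction-log variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for session feature vectors (e.g. counts of displays,
# searches, errors per session) with a binary "relevant" label.
n = 1000
X = rng.normal(size=(n, 6)) * np.array([3.0, 2.0, 1, 1, 1, 1])
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(float)

# --- Principal components analysis: project the centred features onto the
# top eigenvectors of their covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
evals, evecs = np.linalg.eigh(cov)           # eigh returns ascending order
top = evecs[:, np.argsort(evals)[::-1][:3]]  # keep 3 components
Z = Xc @ top

# --- Logistic regression fitted by plain gradient descent on log-loss.
A = np.hstack([np.ones((n, 1)), Z])          # add intercept column
w = np.zeros(A.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-A @ w))
    w -= 0.1 * A.T @ (p - y) / n             # gradient step

pred = (1 / (1 + np.exp(-A @ w)) > 0.5).astype(float)
accuracy = (pred == y).mean()
```

In the study itself the regression would be fitted on nine strata and evaluated on the held-out tenth; here a single fit on synthetic data illustrates the mechanics.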
One of the key ideas in the approach is to extract the text information on the Web pages to semantically describe the images. The text description is then combined with other low-level image features in the image similarity assessment. Another main contribution of this work is that we apply data mining on the log of users' feedback to improve image retrieval performance in three aspects. First, the accuracy of the document space model of image representation obtained from the Web pages is improved by removing clutter and irrelevant text information. Second, the user space model of users' representation of images is constructed and then combined with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation. Third, the relationship between low-level and high-level features is discovered, which is extremely useful for assigning the low-level features' weights in similarity assessment. A fundamental aspect of content-based image retrieval (CBIR) is the extraction and representation of a visual feature that is an effective discriminant between pairs of images. Among the many visual features that have been studied, the distribution of color pixels in an image is the most common. The standard representation of color for content-based indexing in image databases is the color histogram. Vector-based distance functions are used to compute the similarity between two images as the distance between points in the color histogram space. This paper proposes an alternative real-valued representation of color based on the information-theoretic concept of entropy. A theoretical presentation of image entropy is accompanied by a practical description of the merits and limitations of image entropy compared to color histograms. Specifically, the L-1 norm for color histograms is shown to provide an upper bound on the difference between image entropy values. 
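The two color representations being compared, the histogram vector and the single entropy value derived from it, can be illustrated with a minimal sketch; the bin count and the toy "images" below are our own assumptions:

```python
import math

def histogram(pixels, bins=8, max_val=256):
    """Normalized color histogram of a flat list of pixel intensities."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_val] += 1
    total = len(pixels)
    return [c / total for c in counts]

def entropy(hist):
    """Shannon entropy (bits) of a normalized histogram: one real number
    summarizing the color distribution of the whole image."""
    return -sum(p * math.log2(p) for p in hist if p > 0)

def l1_distance(h1, h2):
    """Vector-based histogram comparison needs the full histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two toy "images": one spread evenly over intensities, one concentrated.
flat = list(range(256)) * 4            # roughly uniform: 1024 pixels
dark = [v % 64 for v in range(1024)]   # only the low intensity bins occupied

h_flat, h_dark = histogram(flat), histogram(dark)
d = l1_distance(h_flat, h_dark)        # histogram comparison: two vectors
e_flat, e_dark = entropy(h_flat), entropy(h_dark)  # entropy: one number each
```

The uniform image attains the maximum entropy log2(8) = 3 bits, while the concentrated one scores lower, which is the kind of single-number discrimination the entropy representation exploits.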
Our initial results suggest that image entropy is a promising approach to image description and representation. A novel idea of a media agent is briefly presented, which can automatically build a personalized semantic index of Web media objects for each particular user. Because the Web is a rich source of multimedia data and the text content on Web pages is usually semantically related to the media objects on the same pages, the media agent can automatically collect the URLs and related text, and then build an index of the multimedia data, on behalf of the user whenever and wherever she accesses these multimedia data or their container Web pages. Moreover, the media agent can also use an off-line crawler to build the index for those multimedia objects that are relevant to the user's favorites but have not yet been accessed by the user. When the user wants to find these multimedia data once again, the semantic index facilitates text-based search for her. Content-based image retrieval is based on the idea of extracting visual features from images and using them to index images in a database. The comparisons that determine similarity between images depend on the representations of the features and the definition of an appropriate distance function. Most of the research literature uses vectors as the predominant representation, given the rich theory of vector spaces. While vectors are an extremely useful representation, their use in large databases may be prohibitive given their usually large dimensions and costly similarity functions. In this paper, we propose similarity measures and an indexing algorithm based on information theory that permit an image to be represented as a single number. When used in conjunction with vectors, our method displays improved efficiency when querying large databases. The explosive growth of digital image collections on Web sites calls for an efficient and intelligent method of browsing, searching, and retrieving images. 
In this article, an artificial neural network (ANN)-based approach is proposed to explore a promising solution to Web image retrieval (IR). Compared with other image retrieval methods, this new approach has the following characteristics. First of all, content-based features are combined with text-based features to improve retrieval performance. Instead of solely relying on low-level visual features and high-level concepts, we also take into consideration textual features, which are automatically extracted from image names, alternative names, page titles, surrounding texts, URLs, etc. Secondly, the Kohonen neural network model is introduced and integrated into the image retrieval process. Due to its self-organizing property, cognitive knowledge is learned, accumulated, and solidified during the unsupervised training process. The architecture is presented to illustrate the main conceptual components and mechanism of the proposed image retrieval system. To demonstrate the superiority of the new IR system over other IR systems, the retrieval result of a test example is also given in the article. Based on an analysis of the 377 documents that cited Griffith's publications in the ISI citation databases, it has been found that Griffith, with his collaborators, made pioneering and significant contributions to the fields of bibliometrics and scholarly communication among scientists. His research work has also greatly influenced people from all over the world conducting research in psychology, bibliometrics, information science, and social studies of science in the past several decades. Characteristics of publication activity and co-authorship in the neurosciences are analysed. The present study aims at describing the common, as well as the distinguishing, features of productivity and co-publication patterns of four types of authors. For this purpose, authors are classified according to their anterior and posterior records. 
The role of the author types in the process of documented scientific communication, the relation between co-authorship and publication activity, as well as collaboration between the four types, is studied. The rich body of literature examining communication flows in the research context, an area where Professor Belver Griffith made major contributions, has very direct relevance to the relatively newly emerging recognition in the business community of the importance of knowledge creation and deployment to the competitive performance of an organization. This essay examines and delineates some of those lessons: specifically, the tension between open and rich communications and the need to protect intellectual property; the importance of environmental awareness and serendipity, and achieving the correct balance with efficient use of information-searching time; the importance of end-user training; and crafting the balance in knowledge management between codification and personalization. The relation between philosophy of science and epistemology is studied using the author co-citation technique. Co-citation links among 62 authors - a representative list of various styles and approaches to rationality - were established using the Arts and Humanities Citation Index. Multidimensional scaling results in a two-dimensional map of authors, where the axes represent the subject (philosophy of science to epistemology) and the method (qualitative to quantitative), respectively. The authors on the map can be clustered into more or less coherent groups at different levels of resolution. This paper describes the results of a survey conducted among Russian Foundation for Basic Research (RFBR) grant-holders. The aim of this paper is to examine the attitude of grant-holders to the new multi-channel funding system and to assess its significance for Russian scientists involved in research in the natural and applied sciences. 
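The multidimensional-scaling step of an author co-citation analysis, such as the philosophy-of-science study above, can be sketched with classical (Torgerson) MDS; the five author labels and co-citation counts below are invented purely for illustration:

```python
import numpy as np

# Hypothetical symmetric co-citation counts among five authors.
authors = ["A", "B", "C", "D", "E"]
cocite = np.array([
    [20, 15,  2,  1,  1],
    [15, 18,  3,  1,  2],
    [ 2,  3, 25, 12, 10],
    [ 1,  1, 12, 22, 11],
    [ 1,  2, 10, 11, 19],
], dtype=float)

# Convert similarity to dissimilarity: high co-citation -> small distance.
D = 1.0 - cocite / cocite.max()
np.fill_diagonal(D, 0.0)

# Classical MDS: double-center the squared distances, then use the top two
# eigenvectors of the centred matrix as 2-D map coordinates.
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:2]
coords = evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

def map_dist(i, j):
    """Distance between two authors on the resulting 2-D map."""
    return float(np.linalg.norm(coords[i] - coords[j]))
```

Authors who are frequently co-cited (A and B here) should land close together on the map, while weakly linked pairs end up far apart, which is what lets clusters be read off the plot.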
It is a first attempt to get a fair and general picture of what scientists think about competitive funding. In 1999, 1440 questionnaires were distributed by mail. The response rate was 31.8%. The results of the survey clearly show that proposal writing has become a substantial part of research activity in Russia. Each respondent received more than 5 grants during 1993-1997. The RFBR and foreign funding agencies, particularly ISF, INTAS, and the Civilian Research and Development Foundation, evaluated Russian scientists' performance equally: about 69% of RFBR grant-holders were awarded a grant from foreign agencies. The present findings are being used, as a practical matter, to guide and inform the Ministry of Science and Technology Policy, which is responsible for the promotion of R&D in Russia, in organizing special training for students and postdocs on proposal writing. The expansion in the number of journals being published really took off in the nineteenth century. Between the beginning and end of that century, the problems of dealing with the consequent spread of literature grew rapidly. The reactions of scientists to this included a move towards increasing specialisation in their research, and a higher level of organisation of their communication activities. In particular, ways of assisting information retrieval were developed then which became extremely important in the twentieth century. Two of these developments are examined here - the provision of abstracts for scientists and of popular articles for non-scientists. Parallels can be found between these two activities, as well as differences due to the different target audiences. It is noted that both appeared in a print environment: an electronic environment may affect their futures differently. Recent advances in the power and capabilities of personal computers have brought the algorithms and representational methods of Geographic Information Systems (GIS) to the desktop. 
Information that has relationships between elements may be represented spatially, especially if some distance metric can be brought to bear. This paper discusses information cartography, the use of spatial methods for the display of non-geographic data. Among Belver C. Griffith's many contributions to disciplinary communication is the idea that science and scholarship at large constitute a social system to be investigated empirically. This paper reports findings of an author co-citation analysis of the field of human behavioral ecology that expands Griffith's concept of the social system of scientific communication to fit a socioecological framework. Cluster analysis and multidimensional scaling techniques are used to characterize the research specialty at large and to portray five respondents' individual resource maps. The techniques reveal co-citation relationships among authors whose work the respondents had referenced in recent articles. Survey data on searching and handling behaviors for an aggregated sample of 180 cited references are correlated with core-periphery zones of the individual maps. Findings that types of socially mediated communication and distinctive information-foraging behaviors correlate with different zones of a bibliographic microhabitat support an interpretation that active specialty members conform to foraging-efficiency principles as predicted by prey-choice models from optimal foraging theory. This article describes ways of automatically generating 15 kinds of personal profiles of authors from bibliographic data on their publications in databases. Nicknamed CAMEOs, the profiles can be used for retrieval of documents by human searchers or computerized agents. They can also be used for mapping an author's subject matter (in terms of descriptors, identifiers, and natural language) and studying his or her publishing career. 
Finally, they can be used to map the intellectual and social networks evident in citations to and from authors and in co-authorships. In collaborative information-finding systems, evaluations provided by users assist other users with similar needs. This article examines the problem of getting users to provide evaluations, thus overcoming the so-called "free-riding" behavior of users. Free riders are those who use the information provided by others without contributing evaluations of their own. This article reports on an experiment conducted using the "AntWorld" system, a collaborative information-finding system for the Internet, to explore the effect of added motivation on users' behavior. The findings suggest that for the system to be effective, users must be motivated either by the environment or by incentives within the system. The findings suggest that relatively inexpensive extrinsic motivators can produce modest but significant increases in cooperative behavior. Different users of a Web-based information system will have different goals and different ways of performing their work. This article explores the possibility that we can automatically detect usage patterns without demographic information about the individuals. First, a set of 47 variables was defined that can be used to characterize a user session. The values of these variables were computed for approximately 257,000 sessions. Second, principal component analysis was employed to reduce the dimensions of the original data set. Third, a two-stage, hybrid clustering method was proposed to categorize sessions into groups. Finally, an external criteria-based test of cluster validity was performed to verify the validity of the resulting usage groups (clusters). The proposed methodology was demonstrated and tested for validity using two independent samples of user sessions drawn from the transaction logs of the University of California's MELVYL(R) on-line library catalog system (www.melvyl.ucop.edu). 
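The clustering step of the session-categorization pipeline just described can be sketched as below. The abstract does not specify which two-stage hybrid scheme was used, so this sketch shows one common variant, a coarse pass on a subsample whose centroids seed a full k-means run, with synthetic session vectors standing in for the 47 real variables:

```python
import numpy as np

def kmeans(X, centroids, iters=50):
    """Plain Lloyd's algorithm starting from the given centroids."""
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2),
            axis=1)
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(0)
# Synthetic "sessions": three behavioural groups in a 5-variable space
# (imagine the variables are already PCA-reduced session features).
X = np.vstack([rng.normal(m, 0.3, (100, 5)) for m in (0.0, 2.0, 4.0)])

# Stage 1: coarse clustering on a subsample to obtain seed centroids.
sample = X[rng.choice(len(X), 60, replace=False)]
seeds = sample[rng.choice(len(sample), 3, replace=False)].copy()
_, seeds = kmeans(sample, seeds)

# Stage 2: refine the centroids on the full data set from those seeds.
labels, centroids = kmeans(X, seeds)
n_groups = len(np.unique(labels))
```

A validity check against external criteria, as in the study, would then compare these groups across independent session samples.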
The results indicate that there were six distinct categories of use in the MELVYL system: knowledgeable and sophisticated use, unsophisticated use, highly interactive use with good search performance, known-item searching, help-intensive searching, and relatively unsuccessful use. Their characteristics were interpreted and compared qualitatively. The analysis shows that each group had distinct patterns of use of the system. The enormous number of images of works of art available on the Internet promises great potential for research in art history and related image-using disciplines. Yet without adequate indexing, this resource is very difficult to explore. It is certainly necessary to identify each image with the basic unique information relating to it, but more is needed. Since it is largely the concepts related to works of art that interest art historians, it is these concepts that need to be reflected in indexing terminology. As these concepts have changed over time, a sizeable verbal infrastructure is needed to support the indexing of large image databases to achieve maximum use by art historians. This article reports on a quantitative categorical analysis of metadata elements in the Dublin Core, VRA Core, REACH, and EAD metadata schemas, all of which can be used for organizing and describing images. The study found that each of the examined metadata schemas contains elements that support the discovery, use, authentication, and administration of images, and that the number and proportion of elements supporting functions in these classes vary per schema. The study introduces a new schema-comparison methodology and explores the development of a class-oriented functional metadata schema for controlling images across multiple domains. To support browsing-based subject access to image collections, it is necessary to provide users with networks of subject terms that are organized in an intuitive, richly interconnected manner. 
A principled approach to this task is to organize the subject terms by their relationship to activity contexts that are commonly understood among users. This article describes a methodology for creating networks of subject terms by manually representing a large number of common-sense activities that are broadly related to image subject terms. The application of this methodology to the Library of Congress Thesaurus for Graphic Materials produced 768 representations that supported users of a prototype browsing-based retrieval system in searching large, indexed photograph collections. Keyword search of multimedia collections lacks precision, and automatic parsing of unrestricted natural language annotations lacks accuracy. We propose a structure for natural language descriptions of the semantic content of visual materials that requires descriptions to be (modified) keywords, phrases, or simple sentences, with components that are grammatical relations common to many languages. This structure makes it easy to implement a collection's descriptions as a relational database, enabling efficient search via the application of well-developed database-indexing methods. Description components may be elements from external resources (a thesaurus, ontology, database, or knowledge base). This provides a rich superstructure for the meaningful retrieval of images by their semantic contents. This article presents exploratory research evaluating a conceptual structure for the description of the visual content of images. The structure, which was developed from empirical research in several fields (e.g., computer science, psychology, and information studies), classifies visual attributes into a "Pyramid" containing four syntactic levels (type/technique, global distribution, local structure, composition) and six semantic levels (generic, specific, and abstract levels of both object and scene, respectively). 
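The ten Pyramid levels named above can be captured in a small lookup structure. The sample attribute terms in the comments are our own illustrative guesses, not data from the study:

```python
# The Pyramid's four syntactic and six semantic levels, as listed above.
SYNTACTIC = [
    "type/technique",       # e.g. "photograph", "watercolor" (illustrative)
    "global distribution",  # e.g. overall color or texture ("mostly blue")
    "local structure",      # e.g. lines, edges, basic shapes
    "composition",          # e.g. "centered", "diagonal layout"
]
SEMANTIC = [
    "generic object",       # e.g. "a dog"
    "specific object",      # e.g. "Lassie"
    "abstract object",      # e.g. "loyalty"
    "generic scene",        # e.g. "a beach"
    "specific scene",       # e.g. "Copacabana"
    "abstract scene",       # e.g. "tranquility"
]
PYRAMID = SYNTACTIC + SEMANTIC

def level_kind(level: str) -> str:
    """Classify a Pyramid level as syntactic or semantic."""
    if level in SYNTACTIC:
        return "syntactic"
    if level in SEMANTIC:
        return "semantic"
    raise ValueError(f"not a Pyramid level: {level}")
```

An indexer (or a classifier over description terms) would assign each attribute term to one of these ten levels, which is the classification task the experiments below evaluate.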
Various experiments are presented which address the Pyramid's ability to achieve several tasks: (1) classification of terms describing image attributes generated in a formal and an informal description task, (2) classification of terms that result from a structured approach to indexing, and (3) guidance in the indexing process. Several descriptions, generated by naive users and indexers, are used in experiments that include two image collections: a random Web sample and a set of news images. To test descriptions generated in a structured setting, an Image Indexing Template (developed independently over several years of this project by one of the authors) was also used. The experiments performed suggest that the Pyramid is conceptually robust (i.e., it can accommodate a full range of attributes), and that it can be used to organize visual content for retrieval, to guide the indexing process, and to classify descriptions obtained manually and automatically. The use of primitive content features of images for classification and retrieval has matured over the past decade. However, human beings often prefer to locate images using words. This article proposes a number of methods to utilize image primitives to support term assignment for image classification. Further, the authors propose to release code for image analysis in a common tool set for other researchers to use. Of particular interest to the authors is the expansion of work by researchers in image indexing to include image-content-based feature extraction capabilities in their work. A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. This article investigates the role of data management in multimedia digital libraries, and its implications for the design of database management systems. 
The notions of content abstraction and content independence are introduced, which clearly expose the unique challenges (for database architecture) of applications involving multimedia search. A blueprint of a new class of database technology is proposed, which supports the basic functionality for the management of both content and structure of multimedia objects. In this article we address the problem of benchmarking image browsers. Image browsers are systems that help the user in finding an image from scratch, as opposed to query by example (QBE), where an example image is needed. The existence of different search paradigms for image browsers makes it difficult to compare image browsers. Currently, the only admissible way of evaluation is by conducting large-scale user studies. This makes it difficult to use such an evaluation as a tool for improving browsing systems. As a solution, we propose an automatic image browser benchmark that uses structured text annotation of the image collection for the simulation of the user's needs. We apply such a benchmark on an example system. Content-based retrieval (CBR) promises to greatly improve capabilities for searching for images based on semantic features and visual appearance. However, developing a framework for evaluating image retrieval effectiveness remains a significant challenge. Difficulties include determining how matching at different description levels affects relevance, designing meaningful benchmark queries of large image collections, and developing suitable quantitative metrics for measuring retrieval effectiveness. This article studies the problems of developing a framework and testbed for quantitative assessment of image retrieval effectiveness. 
In order to better harness the extensive research on CBR and improve capabilities of image retrieval systems, this article advocates the establishment of common image retrieval testbeds consisting of standardized image collections, benchmark queries, relevance assessments, and quantitative evaluation methods. A simple stochastic model, based upon mixtures of non-homogeneous Poisson processes, is proposed to describe the citation process in the presence of ageing/obsolescence. Particular emphasis is placed upon investigation of the first-citation distribution, where it is shown that in the presence of ageing there will inevitably be never-cited items. Conditions are given which show how the model is capable of modelling the various shapes of first-citation distributions reported in the literature. In particular, the essential link between the first-citation distribution and the obsolescence distribution is established. Fish and aquaculture research in the People's Republic of China over the six years 1994-1999 has been mapped using data from six databases - three abstracting services and three citation indexes. The results are compared with fish science research in India. During the six years China has published 2035 papers (roughly 4.5-5% of the world output) and India 2454. More than 95% of China's papers are journal articles, compared to 82.8% of Indian papers. About 78% of China's journal paper output has appeared in 143 domestic journals compared to 70% from India in 113 Indian journals. Less than one-eighth of the journal articles published by Chinese researchers are published in journals indexed in SCI, compared to 30% of journal articles by Indian researchers. Less than a dozen papers from each of these countries have appeared in journals of impact factor greater than 3.0. Fish research institutes and fishery colleges are the major contributors of the Chinese research output in this area.
In India academic institutions are the leading contributors (61%), followed by central government institutions (> 25%). Qingdao, Wuhan, Beijing and Shanghai are the cities and Shandong, Hubei and Fujian are the provinces contributing a large number of papers. As we do not have addresses of all authors in most of the papers, we are unable to estimate the extent of international collaboration. Although China's research output and its citation impact are less than those of India, China's fish production and export earnings are far higher than those of India. Probably China is better at bridging the gap between know-how (research) and do-how (technology and creation of employment and wealth). China is quite strong in extension. The output of female researchers in Iceland, relative to that of males, can be investigated because typically their "surnames" end in "dottir" whereas the names of males end in "son". Over the 21 years from 1980 to 2000, there has been a rise in the female-to-male output ratio from 8% to about 30%. It is higher in the life sciences (biomedical research, biology and clinical medicine) but lower where there is also foreign co-authorship, suggesting that females are less able to make overseas contacts through travel. There appears to be no difference in the quality of female and male research output, as measured either by journal impact categories or by citations. This study evaluates the distribution of papers published by European Union (EU) authors in ophthalmological journals from 1995 to 1997. The impact of ophthalmological research in the EU is compared with that produced in other countries, and trends of research are highlighted through keyword analysis. Data on articles published in ophthalmological journals (ISI Subject Category) were downloaded. Mean Impact Factor, source country population and gross domestic product were analyzed. Special-purpose software for keyword elaboration was utilized.
Worldwide, 11,219 papers were published in ophthalmological journals: 34.8% came from the EU (UK, Germany, France, Italy and the Netherlands ranking at the top) and 40.7% from the US. The mean Impact Factor of EU papers was 0.8, compared with 1.5 for the US. Despite the limitations of the existing methods, bibliometric findings are useful for the monitoring of research trends. The keyword analysis shows that the leading fields of research were retinal pathologies among diseases and keratoplasty among surgical procedures. It also suggests that keywords are overused, and urges minimization of this as well as standardization among journal editors. Extensive citation analysis with the Science Citation Index (SCI) has become possible through expanded search capabilities introduced by STN International a few years ago. STN enhanced its retrieval language with some important features, originally developed for statistical analysis of patents. Most important are an expanded select command and several functions to list the search results. The publications to be evaluated may be selected either in the SCI, or in a number of other bibliographic databases offered by the host. With the help of these features, the basic methods to appropriately measure the impact of scientific activities are demonstrated. Furthermore, possible shortcomings as well as the risks when interpreting the results of such studies are discussed. In this paper we analyse the growth in scientific results of the natural sciences in terms of infinite dynamical system theory. We use functional differential equations to model the evolution of science in its sociological aspect. Our model includes the time-to-build of fundamental notions in science (the time required to understand them). We show that the delay parameter describing the time required to learn and to apply past scientific results to new discoveries plays a crucial role in generating cyclic behaviour via the Hopf bifurcation scenario.
Our model extends the de Solla Price model by including death of results as well as by incorporating the time-to-build notion. We also discuss the concepts of knowledge and its accumulation used in economic growth theory. Although the word "naukometriya" (first translated as sciencemetrics) was coined by V. V. Nalimov (1910-1997) in 1969, this field was not his main concern. In the work of this multifaceted and intriguing scientist and scholar, scientometrics was of central concern only for a short period of time. Nevertheless, it is no coincidence that Nalimov is regarded as one of the founding fathers of scientometrics. In this article, we discuss the development of Nalimov's style of scientometric research within the context of his distinctive approach to the sciences, social sciences and humanities in their entirety: his probabilistic philosophy of science and the world. This article is devoted to the scientometric research of Professor V. V. Nalimov (1910-1997) of Moscow State University. His first scientometric article was published in 1959: mathematical models of world science growth were examined and logical grounds for the applicability of these models were given. In his further works, V. V. Nalimov continued to stress the importance of quantitative studies of science development. In 1969, the monograph on scientometrics by V. V. Nalimov and his co-author Z. M. Mulchenko was published. This book reflected his earlier publications on scientometrics and the solutions of new tasks. In 1970, Nalimov published articles on the comparison of science and the biosphere, the geographic distribution of scientific information, and changes in the demand for scientific staff. In later articles on the philosophy of science, he stressed the necessity of combining the scientometric approach with work on the logic of science development.
One of the latest works by Nalimov was an analysis of articles published by The Journal of Transpersonal Psychology; here the scientometric approach was used to study the origin and development of a new scientific branch. The paper presents a comparative analysis of publications, co-authored by Polish and foreign researchers, selected from seven annual files of the Science Citation Index and Social Sciences Citation Index (CD-ROM Editions 1992-1998). Information obtained from SCI and SSCI was elaborated, completed, coded and entered into two "international files" designed for analytical purposes. It was found that the number of internationally co-authored papers was many times higher in science (18,982 records) than in the social sciences (342 records). The share of these "international papers" in the "Polish files" increased over the period under review, and for those derived from SCI it was also higher (39.1-46.0%) than in the case of SSCI (22.4-37.0%). Results of the analysis include the countries of foreign partners and the affiliations of domestic co-authors, as well as the subject structure of both international files. Observed differences in the scale of international co-operation in science and in the social sciences are discussed. Mathematics research in India, as reflected by papers indexed in MathSci 1988-1998, is quantified and mapped. Statistics, quantum theory and general topology are the three subfields contributing the most to India's output in mathematics research, followed by special functions, economics and operations research, and relativity and gravitational theory. The Indian Statistical Institute and the Tata Institute of Fundamental Research are the two leading publishers of research papers. Unlike in many other fields, Calcutta publishes the largest number of papers in mathematics, followed by Mumbai, New Delhi, Chennai and Bangalore. West Bengal, Uttar Pradesh, Maharashtra, Tamil Nadu and Delhi are the leading states.
Researchers from 257 institutions spread over 134 cities/towns have published 17,308 papers in the 11 years. About 92% of these papers have appeared in 877 journals published from 62 countries. Journals published in the USA, UK and the Netherlands are popular with Indian mathematicians. Of the 36 journals that have published at least a hundred papers, 20 are Indian journals, of which only two are indexed in Journal Citation Reports. In all, about 38.5% of papers have been published in Indian journals, as against about 70% in agriculture, 55% in life sciences, 33.5% in medicine and 20% in physics. In the later years, there has been a moderate shift to non-Indian journals. Close to 78% of papers have come from universities and colleges and 13% from institutions under science-related departments. Almost all papers in high-impact journals are physics-related, and most of them have come from institutions under the Department of Atomic Energy. Over 15% of the 9760 papers published during 1993-1998 are internationally coauthored. In all of science, as seen from the Science Citation Index, 14% of Indian papers were internationally coauthored in 1991 and 17.6% in 1998. The USA, Canada, and Germany are the important collaborating nations, followed by France, Italy, Japan and the UK. Relative concentration theory studies the degree of inequality between two vectors (a_1,...,a_N) and (alpha_1,...,alpha_N). It extends concentration theory in the sense that, in the latter theory, one of the above vectors is (1/N,...,1/N) (N coordinates). When studying relative concentration one can consider the vectors (a_1,...,a_N) and (alpha_1,...,alpha_N) as interchangeable (equivalent) or not. In the former case this means that the relative concentration of (a_1,...,a_N) versus (alpha_1,...,alpha_N) is the same as the relative concentration of (alpha_1,...,alpha_N) versus (a_1,...,a_N). We deal here with a symmetric theory of relative concentration.
In the other case one wants to consider (a_1,...,a_N) as having a different role from (alpha_1,...,alpha_N), and hence the results can be different when interchanging the vectors. This leads to an asymmetric theory of relative concentration. In this paper we elaborate both models. As they extend concentration theory, both models use the Lorenz order and Lorenz curves. For each theory we present good measures of relative concentration and give applications of each model. A comparative study was carried out to determine the quality of research papers published during 1996 in two leading Russian psychiatric journals: Social and Clinical Psychiatry - SCP (27 papers) and the Journal of Neuropathology and Psychiatry S.S. Korsakov - JNP (33 papers). A newly created "Checklist for the formalised assessment of medical papers", based on the principles of evidence-based medicine, was used for the analysis. A paper was defined as a scientific study if the suggested hypothesis had been verified by methods that made it possible to minimise systematic errors and to take random errors into consideration, and if the conclusions and arguments answered the suggested goals and were based on the data obtained. One-third of all papers in both journals appeared to be purely descriptive. The analysis showed that only 2 papers in SCP (7%) and 5 papers in JNP (15%) could be defined as scientific studies. 12% of papers met the requirements of scientific standards to a certain extent, but 77% of the papers published in 1996 fell short of the standards of scientific research. This paper is dedicated to the memory of Prof. Nalimov. The paper shows some possibilities of bibliometric methods applied to the Subject Index to "Chemical Abstracts" (CA) and to the Permuterm Subject Index to "Science Citation Index". Previously, a new method for measuring scientific productivity was demonstrated for authors in mathematical logic and some subareas of 19th-century physics.
The purpose of this article is to apply this new method to other fields to support its general applicability. We show that the method yields the same results for modern physicists, biologists, psychologists, inventors, and composers. That is, each individual's production is constant over time, and the time-period fluctuations follow the Poisson distribution. However, productivity (e.g., papers per year) varies widely across individuals. We show that the distribution of productivity does not follow the normal (i.e., bell curve) distribution, but rather follows the exponential distribution. Thus, most authors produce at the lowest rate and very few authors produce at the higher rates. We also show that the career duration of individuals follows the exponential distribution. Thus, most authors have a very short career and very few have a long career. The principal advantage of the new method is that the detailed structure of author productivity, such as trends, can be examined. Another advantage is that it gives information science studies guidance on the length of the time interval to examine and on estimating when an author's entire body of work has been recorded. An attempt has been made to answer the question: Why do most bibliometric and scientometric laws reveal the character of non-Gaussian distributions, i.e., have unduly long "tails"? We tried to apply the approach of the so-called "Universal Law" discovered by G. Stankov (1997, 1998). The basic principle we have used here is that of the reciprocity of energy and space. A new "wave concept" of scientific information has been propounded, in whose terms the well-known bibliometric and scientometric distributions find a rather satisfactory explanation. One of the corollaries drawn is that alpha = 1 is the most reasonable value for the family of Zipf laws applied to information or social phenomena.
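The constant-rate productivity model summarized above lends itself to a short simulation. The sketch below is illustrative only: the mean rate (1.5 papers/year) and mean career length (8 years) are invented parameters, not figures from the study, and the Poisson draw uses Knuth's textbook method so the script needs nothing beyond the Python standard library.

```python
import random
import statistics

random.seed(7)

def poisson_draw(lam):
    """Draw one Poisson-distributed count via Knuth's method (stdlib only)."""
    threshold = 2.718281828459045 ** (-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def simulate_careers(n_authors=5000, mean_rate=1.5, mean_career=8.0):
    """Each author gets a constant personal rate (exponentially distributed
    across authors) and an exponentially distributed career length; the
    yearly paper counts are Poisson around the personal rate."""
    totals = []
    for _ in range(n_authors):
        rate = random.expovariate(1.0 / mean_rate)          # papers per year
        years = max(1, int(random.expovariate(1.0 / mean_career)))
        totals.append(sum(poisson_draw(rate) for _ in range(years)))
    return totals

totals = simulate_careers()
# Right-skewed output: most authors produce little, a few produce a lot,
# so the median sits below the mean.
print(statistics.median(totals), round(statistics.mean(totals), 2))
```

Plotting a histogram of `totals` reproduces the qualitative finding of the abstract: a heavy concentration of authors at the lowest output levels with a long upper tail.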
This article discusses the history and emergence of non-library commercial and noncommercial information services on the World Wide Web. These services are referred to as "expert services," while the term "digital reference" is reserved for library-related on-line information services. Following suggestions in the library and information literature regarding quality standards for digital reference, the researchers make clear the importance of developing a practicable methodology for the critical examination of expert services, and of considering their relevance to library and other professional information services. A methodology for research in this area and initial data are described. Two hundred forty questions were asked of 20 expert service sites. Findings include performance measures such as response rate, response time, and verifiable answers. Sites responded to 70% of all questions, and gave verifiable answers to 69% of factual questions. Performance was generally highest for factual questions. Because expert services are likely to continue to fill a niche for factual questions in the digital reference environment, further research and the development of digital reference services may appropriately be turned toward source questions. This is contrary to current practice and to the emergence of digital reference services reported in the related literature thus far. Although technological determinism is an inadequate description of change, it remains common, if implicit, in much information science literature. Recent developments in science and technology studies offer a social constructivist alternative, in which technology is seen not as autonomous, but as the result of interests. However, the stability of these interests can be argued to privilege social factors in the same way as technological determinism privileges technological factors.
A second alternative is to shift to a relativist stance and analyze discourse as interaction, rather than as a neutral carrier of information, or communication. The focus of the discourse analyses of interview interactions presented in this article is on two aspects of discursive structure, the indexical category of "research," and interest management, which refers to the ways that participants manage their own and others' stakes in particular accounts. The article concludes by noting how formal scholarly communication acts as a "category entitlement" in interviews, and how technological determinism works as a dilemma for this entitlement that participants (including researchers) negotiate at the very local level of their interactions and accounts. It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate, but disclose no statistically significant differences in recall rate and time requirements. 
Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents. Identifying the users and impact of research is important for research performers, managers, evaluators, and sponsors. It is important to know whether the audience reached is the audience desired. It is useful to understand the technical characteristics of the other research/development/applications impacted by the originating research, and to understand other characteristics (names, organizations, countries) of the users impacted by the research. Because of the many indirect pathways through which fundamental research can impact applications, identifying the user audience and the research impacts can be very complex and time consuming. The purpose of this article is to describe a novel approach for identifying the pathways through which research can impact other research, technology development, and applications, and to identify the technical and infrastructure characteristics of the user population. A novel literature-based approach was developed to identify the user community and its characteristics. The research performed is characterized by one or more articles accessed by the Science Citation Index (SCI) database, because the SCI's citation-based structure makes it easy to perform citation studies. Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet.
This article attempts to provide such evidence through an evaluation of Ingwersen's (1998) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications. Over a 35-year period, Irving H. Sher played a critical role in the development and implementation of the Science Citation Index and other ISI products. Trained as a biochemist, statistician, and linguist, Sher brought a unique combination of talents to ISI as Director of Quality Control and Director of Research and Development. His talents as a teacher and mentor evoked loyalty. He was a particularly inventive, self-taught programmer. In addition to the SCI, Social Sciences Citation Index, and Arts and Humanities Citation Index, Sher was involved with the development of the first commercial SDI system, the Automatic Subject Citation Alert, now called Research Alert, and Request-A-Print cards. Together we developed the journal impact factor and the Journal Citation Reports. Sher was also the inventor of the SYSTABAR system of coding references and of Sherhand.
He was involved in key reports on citation-based historiography and on forecasting Nobel prizes, and served as a referee for JASIS over a 20-year period. This experiment explores the effectiveness of retrieving the listing of a known-item book from the 3.6-million-entry online catalog at the library of the University of Michigan using various combinations of the author's name plus first and last title words. The principal finding was that 98.9% of the time a 1- to 20-line miniature catalog (minicat) was displayed that contained either the entry sought or a not-in-database (NID) reply when the search comprised all three words. The present study adopts an interpretive and situated approach to observe and assess learners' problem solving using hypermedia. The evaluation is conducted in context by considering the use of Perseus in particular learning situations. The study reflects on the design and use of hypermedia learning systems from both the learners' and the researcher's perspective. The characteristics of Perseus are discussed together with some design recommendations for future consideration. Drawing from the study, conclusions are set out that highlight some implications for designers. It should be noted that the list of suggested features is not meant to be either definitive or exhaustive. The list is indicative of which design considerations should be addressed to improve Perseus hypermedia learning systems in the future. The researcher hopes that the findings of the study can help designers develop and refine better intellectual tools with which to augment learners' performance. The equivalence of semantic networks with spreading activation and vector spaces with dot product is investigated under ranked retrieval. Semantic networks are viewed as networks of concepts organized in terms of abstraction and packaging relations. It is shown that the two models can be effectively constructed from each other.
A formal method is suggested to analyze the models in terms of their relative performance in the same universe of objects. The vectors used in IR, whether to represent the documents or the terms, are high-dimensional, and their dimensions increase as one approaches real problems. The algorithms used to manipulate them, however, consume enormously increasing amounts of computational capacity as the dimension grows. We used the Kohonen algorithm and a fuzzification module to perform a fuzzy clustering of the terms. The degrees of membership obtained were used to represent the terms and, by extension, the documents, yielding a smaller number of components but still endowed with meaning. To test the results, we use a topological classification of sets of transformed and untransformed vectors to check that the same structure underlies both. The distribution of bibliographic records in on-line bibliographic databases is examined using 14 different search topics. These topics were searched using the DIALOG database host, and using as many suitable databases as possible. The presence of duplicate records in the searches was taken into consideration in the analysis, and the problem with lexical ambiguity in at least one search topic is discussed. The study answers questions such as how many databases are needed in a multifile search for particular topics, and what coverage will be achieved using a certain number of databases. The distribution of the percentages of records retrieved over a number of databases for 13 of the 14 search topics roughly fell into three groups: (1) high concentration of records in one database with about 80% coverage in five to eight databases; (2) moderate concentration in one database with about 80% coverage in seven to 10 databases; and (3) low concentration in one database with about 80% coverage in 16 to 19 databases.
The study conforms with earlier results, but shows that the number of databases needed for searches with varying complexities of search strategies is much more topic-dependent than previous studies would indicate. The effects of link annotations on user search performance in hypertext environments having deep (layered) and shallow link structures were investigated in this study. Four environments were tested: layered-annotated, layered-unannotated, shallow-annotated, and shallow-unannotated. A single document was divided into 48 sections, and layered and unlayered versions were created. Additional versions were created by adding annotations to the links in the layered and unlayered versions. Subjects were given three queries of varying difficulty and then asked to find the answers to the queries that were contained within the hypertext environment to which they were randomly assigned. Correspondence between the wording of links and queries was used to define difficulty level. The results of the study confirmed previous research that shallow link structures are better than deep (layered) link structures. Annotations had virtually no effect on the search performance of the subjects. The subjects performed similarly in the annotated and unannotated environments, regardless of whether the link structures were shallow or deep. An analysis of question difficulty suggests that the wording in links has primacy over the wording in annotations in influencing user search behavior. Fields of technoscience like biotechnology develop in a network mode: disciplinary insights from different backgrounds are recombined as competing innovation systems are continuously reshaped. The ongoing process of integration at the European level generates an additional network of transnational collaborations.
Using the title words of scientific publications in five core journals of biotechnology, multivariate analysis is used to distinguish between the intellectual organization of the publications in terms of title words and the institutional network in terms of the addresses of documents. The interaction between the representation of intellectual space in terms of words and co-words and the potentially European network system is compared with the document sets with American and Japanese addresses. The European system can also be decomposed in terms of the contributions of member states. Whereas a European vocabulary can be made visible at the global level, this communality disappears upon decomposition. The network effect at the European level can be considered institutional more than cognitive. In an earlier article about methods for the recognition of machine- and hand-written cursive letters, we presented a model showing the possibility of processing, classifying, and hence recognizing such scripts as images. The practical results we obtained encouraged us to extend the theory to an algorithm for word recognition. In this article, we introduce our ideas, describe our achievements, and present our results of testing words for recognition without segmentation. This would lead to the possibility of applying the methods used in this work, together with other previously developed algorithms, to process whole sentences and, hence, written and spoken texts, with the goal of automatic recognition. Personal observations and reflections on scientific collaboration and its study, past, present, and future, containing new material on motives for collaboration and on some of its salient features. Continuing methodological problems are singled out, together with suggestions for future research. In this paper, our objective is to delineate some of the problems that could arise in using research output for performance evaluation.
Research performance in terms of the Impact Factor (IF) of papers, say of scientific institutions in a country, could depend critically on co-authored papers in a situation where internationally co-authored papers are known to have significantly different (higher) impact factors compared to purely indigenous papers. Thus, international collaboration not only serves to increase the overall output of research papers of an institution; the contribution of such papers to the average Impact Factor of the institutional output could also be disproportionately high. To quantify this effect, an index of gain in impact through foreign collaboration (GIFCOL) is defined such that it ensures comparability between institutions with differing proportions of collaborative output. A case study of major Indian institutions is undertaken, in which cluster analysis is used to distinguish between intrinsically high-performance institutions and those that gain disproportionately in the perceived quality of their output as a result of international collaboration.

This study covers a ten-year period, 1990-1999, of the publishing careers of nine authors who appear among the top-20 most productive authors in the field of ophthalmology. In this paper we discuss findings from a study of the publishing careers of elite researchers in the field of ophthalmology. The paper highlights the extent and nature of the journals in which these elite researchers publish their work. Data derived from the study include indications of multidisciplinary involvement or 'work-space' interests, publication characteristics, and collaborative engagement with others. We provide insights into the workings of author productivity, characteristics of papers such as the numbers per paper of pages, references, and authors, and initial findings about their collaboration patterns. These findings, showing
(ir)regularities or patterns in publishing careers, may be of interest to researchers and practitioners because they provide a view that might not otherwise be apparent to the field or to the authors themselves.

This article discusses the methodological problems of integrating scientometric methods into a qualitative study. Integrative attempts of this kind are poorly supported by the methodologies of both the sociology of science and scientometrics. Therefore it was necessary to develop a project-specific methodological approach that linked scientometric methods to theoretical considerations. The methodological approach is presented and used to discuss general methodological problems concerning the relation between (qualitative) theory and scientometric methods. This discussion enables some conclusions to be drawn as to the relations that exist between scientometrics and the sociology of science.

Coming together to get publishable research results is not always a simple task. There can be geographical, cultural, disciplinary, and political barriers, which have to be overcome. The Berlin Wall was such a barrier. After its fall in November 1989, Berlin scientists changed their collaboration behaviour. Research groups in East Berlin went West to look for partners, and vice versa. The numbers of papers in life science journals with co-authors working in Berlin and co-authors in other places are discussed against the background of the international trend towards more and more collaboration in science.

The collaboration model of Kretschmer was applied to the co-authorship network of Indian medicine with the aim of observing changes in structure over a period of 30 years. The idea of Liang, in her "Distribution of Major Scientific and Cultural Achievements in Terms of Age," was put in relation to the collaboration model of Kretschmer.
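Several of the abstracts above (the Berlin study, the application of Kretschmer's model) work from co-authorship counts as the basic unit of collaboration analysis. As a generic illustration only, not any of the cited models, a co-authorship network can be tallied from paper author lists; the sample data are invented:

```python
from collections import Counter
from itertools import combinations

def coauthorship_network(papers):
    """Count joint papers for each pair of co-authors.

    `papers` is a list of author-name lists; the result maps each
    alphabetically sorted author pair to its number of joint papers.
    """
    edges = Counter()
    for authors in papers:
        # De-duplicate names within a paper, then count every pair once.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges

# Invented sample: three papers with overlapping author sets.
papers = [["Li", "Wang"], ["Li", "Wang", "Zhang"], ["Wang", "Zhang"]]
network = coauthorship_network(papers)
print(network[("Li", "Wang")])  # 2 joint papers
```

Tracking such a network over successive time windows is the kind of input a structural-change analysis of collaboration would start from.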
To examine whether primary-citation indexing can be taken as an unbiased representation of all-author indexing, we compared the cited first-author counts (straight counts) with the cited all-author counts (complete counts) in two psychological journals over two publication years. Although rather high correlations were found between straight counts and complete counts, the correlations differ between journals of the same discipline, between different publication years of the same journal, and according to the seniority of cited authors. No effect of alphabetical name ordering was found. Results are discussed against the background of the possible use of weighting procedures for all-author indexing.

This paper is a scientometric study of the age structure of scientific collaboration in Chinese computer science. The analysis reveals some special age structures in scientific collaboration in Chinese computer science. Most collaborations are composed of scientists younger than thirty-six (Younger) or older than fifty (Elder). For two-dimensional collaboration formed by first and second authors, Younger-Elder and Younger-Younger are the predominant age structures. For three-dimensional collaboration formed by first, second, and third authors, Younger-Younger-Elder and Younger-Younger-Younger are the most important age structures. Collaboration between two authors older than 38 amounts to only 6.4 percent of all two-person collaborations. Collaboration between two middle-aged scientists is seldom seen. Why do such age structures exist in Chinese computer science? We suggest a tentative explanation based on analyses of the age composition of all authors, the age distributions of the authors in different ranks, and the name-ordering of authors in articles written by professors and their students.

Bibliographic information systems have to address the needs of users by providing "value-added components."
For instance, users would benefit from knowing the social and cognitive structures of research fields. Research suggests that a relationship exists between actors' positions in scientific networks and the innovativeness of the themes they examine. The present study confirms and expands these results through a technique that relates the cognitive and social structures of a research field (socio-cognitive analysis). The results from two social science fields suggest that well-integrated actors are engaged in the consolidation of the mainstream, whereas new ideas are most likely to be introduced and pursued by social climbers, i.e., actors who are starting to form a social network of collaboration.

Time series of collaboration trends, as indicated through co-authorships, are examined from 1800 to the present in mathematics, logic, and physics. In physics, the share of co-authored papers expands in the second half of the 19th century; in mathematics, in the first decades of the 20th century; in logic, in the second half of the 20th century. Subdisciplines of mathematics and physics, and areas of logic, show large differences in their respective propensities to collaborate. None of the existing explanatory approaches accounts for this heterogeneity; the most salient feature is a propensity to collaborate in fields where theoretical and applied research are combined.

The publication output of India and Bulgaria on the epidemiology of neoplasms, as reflected in Medline on CD-ROM for 1966-1999, was scientometrically analyzed. Indians have published 347 papers in 24 domestic journals but 444 papers in 169 journals from 21 countries. Bulgarians have published 88 papers in 6 Bulgarian journals but 63 papers in 39 journals from 13 countries. Some 17 journals from 8 countries contained papers by both Indian and Bulgarian authors. Oncology dominated, with 46 different journals. Indians have published papers in foreign journals of 30 thematic profiles, but Bulgarians in only 12.
The collaboration of Indians and Bulgarians resulted from joint bilateral projects and/or postgraduate studies abroad.

In this paper, the specifics of the research subject within the natural sciences and the humanities are assumed to be well known. These specifics set limits to communication between scholars and natural scientists. In particular, this leads to critical situations in cases where both participants have to collaborate on common interdisciplinary research. The modern conception of the complex system, as a subject of investigation for both the natural sciences and the humanities, has an integrating function in this context. The term 'complex system' is now recognized as a transdisciplinary matter of research. Despite the well-known differences between the two fields of modern science, one can under this condition find a number of mechanisms that also generate common properties of both.

The growing importance of collaboration in research and the still underdeveloped state of the art of research on collaboration have encouraged scientists from 16 countries to establish a global interdisciplinary research network under the title "Collaboration in Science and in Technology" (COLLNET), with Berlin as its virtual centre, which was set up on January 1st, 2000. The network comprises prominent scientists who at present work mostly in the field of quantitative science studies. The intention is to work together in co-operation on both theoretical and applied aspects.

There has been increased growth in the use of hypermedia to deliver learning and teaching material. However, much remains to be learned about how different learners perceive such systems. Therefore, it is essential to build robust learning models to illustrate how hypermedia features are experienced by different learners. Research into individual differences suggests that cognitive styles have a significant effect on student learning in hypermedia systems.
In particular, Witkin's Field Dependence has been extensively examined in previous studies. This article reviews the published findings from empirical studies of hypermedia learning. Specifically, the review classifies the research into five themes: nonlinear learning, learner control, navigation in hyperspace, matching and mismatching, and learning effectiveness. A learning model, developed from an analysis of the findings of the previous studies, is presented. Finally, implications for the design of hypermedia learning systems are discussed.

This study investigated Simon's behavioral decision-making theories of bounded rationality and satisficing in relation to young people's decision making on the World Wide Web, and considered the role of personal preferences in Web-based decisions. It employed a qualitative research methodology involving group interviews with 22 adolescent females. Data analysis took the form of iterative pattern coding using QSR NUD*IST Vivo qualitative data analysis software. Data analysis revealed that the study participants did operate within the limits of bounded rationality. These limits took the form of time constraints, information overload, and physical constraints. Data analysis also uncovered two major satisficing behaviors: reduction and termination. Personal preference was found to play a major role in Web site evaluation in the areas of graphic/multimedia and subject content preferences. This study has related implications for Web site designers and for adult intermediaries who work with young people and the Web.

Although many different visual information retrieval systems have been proposed, few have been tested, and where testing has been performed, the results were often inconclusive. Further, there is very little evidence of benchmarking systems against a common standard.
An approach for testing novel interfaces is proposed that uses bottom-up, stepwise testing to allow evaluation of a visualization itself, rather than restricting evaluation to the system instantiating it. This approach not only makes it easier to control variables; the tests are also easier to perform. The methodology is presented through a case study in which a new visualization technique is compared to more traditional ways of presenting data.

This study of Information Science development in Romania provides a comprehensive, comparative account of the cultural influences that prevail in the acquisition, processing, and distribution of data, information, and knowledge in that nation. The chronology of the most important events in the history of Information Science in Romania, and the historical context in which the present development of the Information Society in Romania can be understood, are presented.

In the author's three previous articles dealing with the ICR phenomenon (JASIS, 49, 1998, 477-481; 50, 1999, 1284-1294; JASIST, 52, 2001, 201-211), the nature, life course, and importance of this phenomenon of the scientific literature were demonstrated. It was shown that the quantity of nonindexed indirect-collective references in The Physical Review alone now exceeds many times over the quantity of formal references listed in the Science Citation Index as "citations." It was shown that the ICR phenomenon is present in all 44 elite physics journals of a representative sample of this literature. The bibliometrically very heterogeneous sample is very homogeneous regarding the presence and frequency of the ICR phenomenon. However, no real connection could be found at the journal level of the sample between the simple degree of documentedness and the presence and frequency of the ICR phenomenon. The present article reports the findings of the latest ICR investigation, carried out at the level of the communications of the representative sample.
Correlation calculations were carried out in the stock of all 458 communications containing the ICR phenomenon as a statistical population, and within this population also in the groups of communications of the "normal" and the "letter" journals and the "short communications." The correlation analysis did not find notable statistical correlation between the simple and specific degrees of documentedness of a communication and the number of works cited in it by ICR act(s), either in the total population or in the selected groups. There is no correlation, either statistical or real (i.e., cause-and-effect), between the documentedness of scientific communications made by their authors and the presence and intensity of the ICR method used by their authors. However, in reality there exists a very strong connection between these two statistically independent variables: both depend on the referencing author, on his/her subjectivity and barely limited subjective free will. This subjective free will shapes the stock of the formal-direct references of scientific communications, thereby placing the achievements cited in this way, and their creators, into the (indexed) showcase of present Big Science. The same free will decides on the use or nonuse of the ICR method and, in the case of use, on the intensity with which the method is used.

A study of the "real world" Web information searching behavior of 206 college students over a 10-month period showed that, contrary to expectations, the users adopted a more passive or browsing approach to Web information searching and became more eclectic in their selection of Web hosts as they gained experience. The study used a longitudinal transaction log analysis of the URLs accessed during 5,431 user days of Web information searching to detect changes in information searching behavior associated with increased experience of using the Web. The findings have implications for the design of future Web information retrieval tools.
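The longitudinal study above infers behavioral change from raw URL transaction logs. As an illustrative sketch only (not the study's actual method; the log layout, period scheme, and sample data are all invented), increasingly eclectic host selection can be quantified by counting the distinct Web hosts a user visits per time period:

```python
from urllib.parse import urlparse

def hosts_per_period(log, period_of):
    """Group accessed URLs by period and count distinct hosts.

    `log` is an iterable of (timestamp, url) pairs; `period_of`
    maps a timestamp to a period label. A growing count over
    successive periods would indicate a broadening host selection.
    """
    periods = {}
    for ts, url in log:
        periods.setdefault(period_of(ts), set()).add(urlparse(url).netloc)
    return {p: len(hosts) for p, hosts in sorted(periods.items())}

# Invented toy log: (day number, URL) pairs, grouped into 30-day periods.
log = [(1, "http://a.edu/x"), (2, "http://a.edu/y"),
       (40, "http://b.com/z"), (41, "http://c.org/w")]
print(hosts_per_period(log, lambda day: day // 30))  # {0: 1, 1: 2}
```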
Four focus groups were held with young Web users (10 to 13 years of age) to explore design criteria for Web portals. The focus group participants commented upon four existing portals designed with young users in mind: Ask Jeeves for Kids, KidsClick, Lycos Zone, and Yahooligans! This article reports their first impressions of using these portals, their likes and dislikes, and their suggestions for improvements. Design criteria for children's Web portals are elaborated based upon these comments under four headings: portal goals, visual design, information architecture, and personalization. An ideal portal should cater for both educational and entertainment needs, use attractive screen designs based especially on effective use of color, graphics, and animation, provide both keyword search facilities and browsable subject categories, and allow individual user personalization in areas such as color and graphics.

Conventional wisdom holds that queries to information retrieval systems will yield more relevant results if they contain multiple topic-related terms and use Boolean and phrase operators to enhance interpretation. Although studies have shown that the users of Web-based search engines typically enter short, term-based queries and rarely use search operators, little information exists concerning the effects of term and operator usage on the relevancy of search results. In this study, search engine users formulated queries on eight search topics. Each query was submitted to the user-specified search engine, and relevancy ratings for the retrieved pages were assigned. Expert-formulated queries were also submitted and provided a basis for comparing relevancy ratings across search engines. Data analysis based on our research model of the term and operator factors affecting relevancy was then conducted.
The results show that the difference in the number of terms between expert and nonexpert searches, the percentage of matching terms between those searches, and the erroneous use of nonsupported operators in nonexpert searches explain most of the variation in the relevancy of search results. These findings highlight the need for designing search engine interfaces that provide greater support in the areas of term selection and operator usage.

Users' individual differences and tasks are important factors that influence the use of information systems. Two independent investigations were conducted to study the impact of differences in users' cognition and search tasks on Web search activities and outcomes. Strong task effects were found on search activities and outcomes, whereas interactions between cognitive and task variables were found on search activities only. These results imply that the flexibility of the Web and Web search engines allows different users to complete different search tasks successfully. However, the search techniques used and the efficiency of the searches appear to depend on how well the individual searcher fits with the specific task.

This article compares search effectiveness when using query-based Internet search (via the Google search engine), directory-based search (via Yahoo), and phrase-based query reformulation-assisted search (via the Hyperindex browser) by means of a controlled, user-based experimental study. The focus was to evaluate aspects of the search process. Cognitive load was measured using a secondary digit-monitoring task to quantify the effort of the user in various search states; independent relevance judgements were employed to gauge the quality of the documents accessed during the search process, and time was monitored as a function of search state. Results indicated that directory-based search does not offer increased relevance over the query-based search (with or without query formulation assistance), and also takes longer.
Query reformulation does significantly improve the relevance of the documents through which the user must trawl, particularly when the formulation of query terms is more difficult. However, the improvement in document relevance comes at the cost of increased search time, although this difference is quite small when the search is self-terminated. In addition, the advantage of query reformulation seems to occur as a consequence of providing more discriminating terms rather than by increasing the length of queries.

This article reviews selected literature related to the credibility of information, including (1) the general markers of credibility, and how different source, message, and receiver characteristics affect people's perceptions of information; (2) the impact of the information medium on the assessment of credibility; and (3) the assessment of credibility in the context of information presented on the Internet. The objective of the literature review is to synthesize the current state of knowledge in this area, develop new ways to think about how people interact with information presented via the Internet, and suggest next steps for research and practical applications. The review examines empirical evidence, key reviews, and descriptive material related to credibility in general, and in terms of on-line media. A general discussion of credibility and persuasion and a description of recent work on the credibility and persuasiveness of computer-based applications is presented. Finally, the article synthesizes what we have learned from various fields, and proposes a model as a framework for much-needed future research in this area.

In the Web, making judgments of information quality and authority is a difficult task for most users because, overall, there is no quality control mechanism. This study examines the problem of the judgment of information quality and cognitive authority by observing people's searching behavior in the Web.
Its purpose is to understand the various factors that influence people's judgment of quality and authority in the Web, and the effects of those judgments on selection behaviors. Fifteen scholars from diverse disciplines participated, and data were collected combining verbal protocols during the searches, search logs, and postsearch interviews. It was found that the subjects made two distinct kinds of judgment: predictive judgment and evaluative judgment. The factors influencing each judgment of quality and authority were identified in terms of characteristics of information objects, characteristics of sources, knowledge, situation, ranking in search output, and general assumptions. Implications for Web design that will effectively support people's judgments of quality and authority are also discussed.

Changes in the topography of the Web can be expressed in at least four ways: (1) more sites on more servers in more places, (2) more pages and objects added to existing sites and pages, (3) changes in traffic, and (4) modifications to existing text, graphic, and other Web objects. This article does not address the first three factors (more sites, more pages, more traffic) in the growth of the Web. It focuses instead on changes to an existing set of Web documents. The article documents changes to an aging set of Web pages, first identified and "collected" in December 1996 and followed weekly thereafter. Results are reported through February 2001. The article addresses two related phenomena: (1) the life cycle of Web objects, and (2) changes to Web objects. These data reaffirm that the half-life of a Web page is approximately 2 years. There is variation among Web pages by top-level domain and by page type (navigation, content). Web page content appears to stabilize over time; aging pages change less often than they once did.

We introduce a technique for creating novel, enhanced thumbnails of Web pages.
These thumbnails combine the advantages of plain thumbnails and text summaries to provide consistent performance on a variety of tasks. We conducted a study in which participants used three different types of summaries (enhanced thumbnails, plain thumbnails, and text summaries) to search Web pages to find several different types of information. Participants took an average of 67, 86, and 95 seconds to find the answer with enhanced thumbnails, plain thumbnails, and text summaries, respectively. As expected, there was a strong effect of question category. For some questions, text summaries outperformed plain thumbnails, while for other questions, plain thumbnails outperformed text summaries. Enhanced thumbnails (which combine the features of text summaries and plain thumbnails) had more consistent performance than either text summaries or plain thumbnails, having, for all categories, either the best performance or performance that was statistically indistinguishable from the best.

As the Web has become a major channel of information dissemination, many newspapers expand their services by providing electronic versions of news information on the Web. However, most investors find it difficult to search for the financial information of interest in the huge Web information space (the information overload problem). In this article, we present a personal agent that utilizes user profiles and user relevance feedback to search for Chinese Web financial news articles on behalf of users. A Chinese indexing component is developed to index the continuously fetched Chinese financial news articles. User profiles capture the basic knowledge of user preferences based on the sources of news articles, the regions of the news reported, the categories of industries related, the listed companies, and user-specified keywords. User feedback captures the semantics of the user-rated news articles.
The search engine ranks the top 20 news articles that users are most interested in and reports them to the user daily or on demand. Experiments are conducted to measure the performance of the agent based on the inputs from user profiles and user feedback. They show that simply using the user profiles does not increase the precision of the retrieval. However, user relevance feedback helps to increase the performance of the retrieval as the user interacts with the system, until it reaches the optimal performance. Combining both user profiles and user relevance feedback produces the best performance.

This work presents a new stemming algorithm that stores the stemming information in tree structures. This storage allows us to enhance the performance of the algorithm through the reduction of the search space and the overall complexity. The final result of the stemming algorithm is a normalized concept, understanding this process as the automatic extraction of the generic form (or lexeme) for a selected term.

In this paper, we develop a new model for a process that generates Lotka's Law. We show that four relatively mild assumptions create a process that fits five different informetric distributions: rate of production, career duration, randomness and Poisson distribution over time, as well as Lotka's Law. By simulation, we obtain good fits to three empirical samples that exhibit the extreme range of the observed parameters. The overall error is 7% or less. An advantage of this model is that the parameters can be linked to observable human factors. That is, the model is not merely descriptive, but also provides insight into the causes of differences between samples. Furthermore, the differences can be tested with powerful statistical tools.

Relevance judgment has traditionally been considered a personal and subjective matter. A user's search and the search result are treated as an isolated event.
To consider the collaborative nature of information retrieval (IR) in a group/organization or even societal context, this article proposes a method that measures relevance based on group/peer consensus. The method can be used in IR experiments. In this method, the relevance of a document is decided by group consensus, or more specifically, by the number of users (or experiment participants) who retrieve it for the same search question. The more users who retrieve it, the more relevant the document will be considered. A user's search performance can be measured by a relevance score based on this notion. The article reports the results of an experiment using this method to compare the search performance of different types of users. Related issues with the method and future directions are also discussed.

A recently proposed stochastic model to describe the citation process in the presence of obsolescence is used to answer the question: If a paper has not been cited by time t after its publication, what is the probability that it will ever be cited?

In the vector space model for information retrieval, term vectors are pair-wise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model.
Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that the incorporation of term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.

A statistical overview is given of the first 50 volumes of the journal Scientometrics. Authorship and co-authorship characteristics, as well as citation and reference patterns of the journal, are analysed. Geographic and thematic maps of its papers are presented. A brief outlook on future prospects and challenges is attempted.

The biological arms race could have been considered a closed chapter in Cold War history. However, the growth of different terrorist groups and organisations has increased the threat of biological weapon (BW) use. The goal of this pilot scientometric project was to trace changes in biodefense research and the activities of its main players, Russia and the US. Data were collected from the SCI via the Dialog information system for 1991-2000, the period covering the post-Soviet era. In-depth content analysis was performed on selected papers from the 2870 publications identified as BW-related. During the period examined, the publication flow increased by 250 percent. The main contributors to this literature were shown to be the US, Russia, the UK, France, and Germany.
The results presented in this paper are of interest to security analysts (following the attacks in the US of 11th September 2001), to public health care policy researchers, and to politicians.

In the 1970s Mexico started to consolidate its S&T system by training human resources and actively preventing brain drain, mainly by motivating researchers through economic incentives. Considering Bradford's Law, an analysis was carried out of significant Mexican research in the health sciences, i.e., papers published in journals with a high impact factor, which grants them a degree of credibility and importance. Significant papers produced in Mexico give a measure of the country's productivity, and these papers' citations measure the country's international impact.

Background: Citation analysis for evaluative purposes typically requires normalization against some control group of similar papers. The selection of this control group is an open question. Objectives: To gain a better understanding of control group requirements for credible normalization. Approach: Citation analysis was performed on the prior publications of two proposing research units, to help estimate team research quality. The citations of each unit's publications were compared to the citations received by thematically and temporally similar papers. Results: Identification of thematically similar papers was very complex and labor intensive, even with relatively few control papers selected. Conclusions: A credible citation analysis for determining performer or team quality should have the following components: multiple technical experts, to average out individual bias and subjectivity; a process for comparing performer or team output papers with a normalization base of similar papers; a process for retrieving a substantial fraction of candidate normalization-base papers; and manual evaluation of many candidate normalization-base papers to obtain high thematic similarity and statistical representation.
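The normalization described in the structured abstract above reduces, in its simplest form, to comparing a unit's citation counts against a baseline of thematically and temporally similar papers. A minimal sketch, assuming normalization is a plain ratio of means (the abstract does not specify the formula, and all numbers are invented):

```python
from statistics import mean

def normalized_citation_score(team_citations, baseline_citations):
    """Ratio of a team's mean citations to the mean citations of a
    control group of thematically and temporally similar papers.

    A value above 1.0 suggests the team's papers are cited more than
    comparable work. Note that selecting the baseline papers is the
    hard, labor-intensive step the study reports on; this function
    only performs the final arithmetic.
    """
    return mean(team_citations) / mean(baseline_citations)

print(normalized_citation_score([12, 8, 20], [5, 10, 15]))  # ~1.33
```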
The study analyses the distribution of productivity of authors in theoretical population genetics (TPG), as reflected in their publication output from 1881 to 1980, from two different approaches. The internal dynamics of the TPG specialty affecting the distribution of author productivity is studied using a time cross-sectional approach. The productivity distribution of authors in 10 ten-year blocks and in three phases of the development of TPG (1921-50, 1951-65, and 1966-80) is studied using a cohort-type approach. The extent of cumulative advantage acquired by the prolific group of authors over time in TPG is also studied. The paper also analyzes the regularity in the distribution of productivity of various cohorts having the same length of activity but different periods of participation. The web site is an important communication medium for universities in many countries. There are numerous reasons to expect that their characteristics will vary along national lines, the most immediate being differences in technological level and the organisation of higher education. In a world where the web is seen in many places as an important source of information, it has the potential to overcome national boundaries, but are there still technological barriers? This paper reports on the results of a survey of the sizes of 670 web sites of higher education institutions in countries associated with the European Union, as estimated by AltaVista. It finds that there are still enormous national differences of up to three orders of magnitude. A related issue addressed is the extent to which AltaVista's coverage of university web sites is reliable and consistent across Europe. Large but uneven differences were identified between the main engine and national variations. Despite such methodological problems and cultural reasons for national variations in web site development, a clear pattern emerges, with the richer countries in Europe having much larger web sites. 
This is a problem for those wishing to use the Internet to increase international collaboration. Ahmed Hassan Zewail, the Nobel laureate (1999) in chemistry, has collaborated with 103 colleagues and published 246 papers during 1976 to 1994 in: femtochemistry (62), reaction rates and IVR (56), general reviews (49), coherence and optical dephasing phenomena (27), solids: magnetic resonance and optical studies (13), liquids and biological systems (9), local modes in large molecules (9), molecular structure from rotational coherence (8), solar energy concentrators (7), and other studies (6). The authorship pattern included three-authored papers (87), followed by two-authored (78), four-authored (38), one-authored (30), five-authored (8), and six-authored (5). The highest collaborations were with P. M. Felker (39), M. Damns (19), and L. R. Khundkar (16). The core journals publishing his papers were J. Chem. Phys. (77), Chem. Phys. Lett. (53), J. Phys. Chem. (33), and Nature (6), out of 33 journal channels and 32 chapters in books. This paper proposes a new index for international scientific co-authorship, based on a simple model of domestic and international co-authorships. The index draws a modestly different picture of international collaboration concerning a country's size effect on the likelihood of internationally co-authored papers. Can change in citation patterns among journals be used as an indicator of structural change in the organization of the sciences? Aggregated journal-journal citations for 1999 are compared with similar data in the Journal Citation Reports 1998 of the Science Citation Index. In addition to indicating local change, probabilistic entropy measures enable us to analyze changes in distributions at different levels of aggregation. The results of various statistics are discussed and compared by elaborating the journal-journal mappings. The relevance of this indicator for science and technology policies is further specified. 
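The probabilistic entropy measures mentioned in the journal-citation abstract above are typically Shannon-type entropies of a citation distribution; a drop in entropy between two years signals that the distribution has become more concentrated. A generic sketch (illustrative only, not the study's code):

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a citation distribution given as raw counts.
    Uniform distributions maximize entropy; concentration lowers it."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Comparing, say, the entropy of a journal's outgoing citation distribution in 1998 and 1999 then indicates whether its citing behaviour became more or less dispersed at that level of aggregation.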
The main objectives of this study are: (a) to find the applicability of selected growth models to the growth of publications in six sub-disciplines of the social sciences, namely anthropology, economics, history, political science, psychology, and sociology, in the world; and (b) to verify the criteria for selecting the most appropriate growth model suggested by Egghe and Rao (1992). The objective of this paper is to suggest a methodology for studying the quantitative profile of a research university, with a view to getting an idea of the performance and impact of research produced in each department, and to comparing the impact of research across departments. This article presents a socio-cognitive perspective in relation to information science (IS) and information retrieval (IR). The differences between traditional cognitive views and the socio-cognitive or domain-analytic view are outlined. It is claimed that, given elementary skills in computer-based retrieval, people are basically interacting with representations of subject literatures in IR. The kind of knowledge needed to interact with representations of subject literatures is discussed. It is shown how different approaches or "paradigms" in the represented literature imply different information needs and relevance criteria (which users typically cannot express very well, which is why IS cannot primarily rely on user studies). These principles are exemplified by comparing behaviorism, cognitivism, psychoanalysis, and neuroscience as approaches in psychology. The relevance criteria implicit in each position are outlined, and empirical data are provided to support the theoretical claims. It is further shown that the most general level of relevance criteria is implied by epistemological theories. The article concludes that the fundamental problems of IS and IR are based in epistemology, which therefore becomes the most important allied field for IS. This article studies concentration aspects of bibliographies. 
In particular, we study the impact of incompleteness of such a bibliography on its concentration values (i.e., its degree of inequality of production of its sources). Incompleteness is modeled by sampling from the complete bibliography. The model is general enough to comprise truncation of a bibliography as well as a systematic sample on sources or items. In all cases we prove that the sampled (or incomplete) bibliography has a higher concentration value than the complete one. These models hence shed some light on the measurement of production inequality in incomplete bibliographies. This two-part study investigates the effect of a 30-minute presentation of Carol Kuhlthau's Information Search Process (ISP) model on students' perceptions of research and research paper anxiety. An experiment was designed to collect both quantitative and qualitative data during a semester. An upper division undergraduate course, Technical and Professional Writing, with four sections participated in the experiment in fall 1999. A survey instrument, the Research Process Survey (RPS), was developed to collect data about students' feelings and thoughts at the onset of their course research project (pretest) and at the completion of the project (posttest). A standard anxiety test (STAI Y-1) was adopted to measure anxiety levels during pretest and posttest sessions and at two additional points between. Two of the four sections heard a guest presentation of the ISP model as treatment after the pretest; the other two sections heard a different guest speak about career experiences as a technical writer (a placebo talk). The results of this experiment are reported in two articles according to the nature of the collected data. This article reports on the results of the quantitative analysis. Four hypotheses were proposed to examine the effects on awareness of cognitive aspects, awareness of affective aspects, level of anxiety, and satisfaction with research. One hypothesis was supported. 
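A concentration value of the kind discussed in the bibliographies abstract above can be computed, for example, as a Gini coefficient over the sources' production counts. A minimal sketch; the Gini coefficient is one common choice among the family of concentration measures the literature uses:

```python
def gini(production):
    """Gini coefficient of the inequality of production among sources
    (0 = perfectly even production, values near 1 = highly concentrated)."""
    xs = sorted(production)          # ascending order of production counts
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

Recomputing such a coefficient on a truncated or sampled bibliography illustrates the paper's claim that incompleteness raises the measured concentration.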
A significant change in anxiety levels was measured (p = .05). Although statistical tests did not reject three null hypotheses, positive trends in change as a result of the ISP model were identified. A second article reports on the results of a qualitative analysis of the texts that participants wrote about a memorable past research experience and about the current research experience. This is Part II of an experimental study investigating students' perceptions of research and research paper anxiety. The study integrates quantitative and qualitative designs to collect complementary data. The participants were students in four sections of an upper division undergraduate course on technical and professional writing during the fall of 1999. A survey instrument used the Critical Incident Technique to solicit writings in students' own words about a memorable past research and writing experience at the beginning of the semester and about the current research and writing at the end of the semester. The quantitative part of the survey measured students' perceptions about research using a questionnaire with a five-point Likert scale, and students' anxiety levels using a standard state anxiety test (STAI Y-1). The first article, Part I, provides a detailed description of the experimental design and reports on quantitative results. This article reports on a content analysis of students' writings about their experiences of the two research projects. Analysis of the data confirmed Kuhlthau's Information Search Process (ISP) model and revealed additional affective and cognitive aspects related to research and writing. This study introduces methods for evaluating search engine performance over a time period. Several measures are defined, which as a whole describe search engine functionality over time. The necessary setup for such studies is described, and the use of these measures is illustrated through a specific example. 
The set of measures introduced here may serve as a guideline for search engines in testing and improving their functionality. We recommend setting up a standard suite of measures for evaluating search engine performance. This study investigates the use of criteria to assess relevant, partially relevant, and not-relevant documents. Study participants identified passages within 20 document representations that they used to make relevance judgments; judged each document representation as a whole to be relevant, partially relevant, or not relevant to their information need; and explained their decisions in an interview. Analysis revealed 29 criteria, discussed positively and negatively, that were used by the participants when selecting passages that contributed to or detracted from a document's relevance. These criteria can be grouped into six categories: abstract (e.g., citability, informativeness), author (e.g., novelty, discipline, affiliation, perceived status), content (e.g., accuracy/validity, background, novelty, contrast, depth/scope, domain, citations, links, relevant to other interests, rarity, subject matter, thought catalyst), full text (e.g., audience, novelty, type, possible content, utility), journal/publisher (e.g., novelty, main focus, perceived quality), and personal (e.g., competition, time requirements). Results further indicate that multiple criteria are used when making relevant, partially relevant, and not-relevant judgments, and that most criteria can make either a positive or a negative contribution to the relevance of a document. The criteria most frequently mentioned by study participants were content criteria, followed by criteria characterizing the full text document. These findings may have implications for relevance feedback in information retrieval systems, suggesting that systems accept and utilize multiple positive and negative relevance criteria from users. 
Systems designers may want to focus on supporting content criteria, followed by full text criteria, as these may provide the greatest cost benefit. Although no unified definition of the concept of search strategy in Information Retrieval (IR) exists so far, its importance is manifest: nonexpert users, directly interacting with an IR system, apply a limited portfolio of simple actions; they do not know how to react in critical situations; and they often do not even realize that their difficulties are due to strategic problems. A user interface to an IR system should therefore provide some strategic help, focusing the user's attention on strategic issues and providing tools to generate better strategies. Because neither the user nor the system can autonomously solve the information problem, but they complement each other, we propose a collaborative coaching approach in which the two partners cooperate: the user retains control of the session and the system provides suggestions. The effectiveness of the approach is demonstrated by a conceptual analysis, a prototype knowledge-based system named FIRE, and its evaluation through informal laboratory experiments. Most popular search engines are not designed for answering natural language questions. However, when we asked hundreds of natural language questions of nine leading search engines, all retrieved at least one correct answer on more than three-quarters of the questions. We identified the best-performing search engines overall for factual natural language questions. We found performance differences depending on the domain of the factual question asked. Other aspects of questions also predicted significantly different performance: the number of words in the question, the presence of a proper noun, and whether the question is time dependent. An additional analysis tested for differential performance by specific search engines on these four question factors. The analysis found no evidence for such interactions. 
New statistical formulas were developed for identifying two- and three-character words in Chinese text. The formulas were constructed by performing stepwise logistic regression using a sample of sentences that had been manually segmented. For identifying two-character words, the relative frequency of the adjacent characters and the document frequency of the overlapping bigrams were found to be significant factors. These provide information about the immediate neighborhood or context of the character string. Contextual information was also found to be significant in predicting three-character words. Local information (the number of times the bigram or trigram occurs in the document being segmented) and the position of the bigram/trigram in the sentence were not found to be useful in identifying words. The new formulas, called contextual information formulas, were found to be substantially better than the mutual information formula in identifying two- and three-character words. Using the contextual information formulas for both two- and three-character words gave significantly better results than using the formula for two-character words alone. The method can also be used for identifying multiword terms in English text. In this article we report on a series of experiments designed to investigate the combination of term and document weighting functions in information retrieval. We describe a series of weighting functions, each of which is based on how information is used within documents and collections, and use these weighting functions in two types of experiments: one based on combination of evidence for ad hoc retrieval, the other based on selective combination of evidence within a relevance feedback situation. We discuss the difficulties involved in predicting good combinations of evidence for ad hoc retrieval, and suggest the factors that may lead to the success or failure of combination. 
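The mutual information baseline that the new contextual information formulas are compared against scores a candidate two-character word by how much more often its characters co-occur than chance would predict. A generic sketch of that baseline, with Latin characters standing in for Chinese ones; this is not the paper's regression-based formula:

```python
import math
from collections import Counter

def pmi_scores(text):
    """Pointwise mutual information, log2(p(xy) / (p(x) p(y))), for every
    adjacent character pair in the text. High-scoring pairs co-occur far
    more often than chance and are candidate two-character words."""
    char_freq = Counter(text)
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    bigram_freq = Counter(bigrams)
    n_chars = len(text)
    n_bigrams = len(bigrams)
    scores = {}
    for bg, count in bigram_freq.items():
        p_xy = count / n_bigrams
        p_x = char_freq[bg[0]] / n_chars
        p_y = char_freq[bg[1]] / n_chars
        scores[bg] = math.log2(p_xy / (p_x * p_y))
    return scores
```

The contextual information formulas described above improve on this by adding, via stepwise logistic regression, the relative frequency of the adjacent characters and the document frequency of the overlapping bigrams as factors.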
We also demonstrate how, in a relevance feedback situation, the relevance assessments can provide a good indication of how evidence should be selected for query term weighting. The use of relevance information to guide the combination process is shown to reduce the variability inherent in combination of evidence. A Delphi study conducted in Israel during 1998-2000 examined the views of library and information science (LIS) experts on the future of the profession in light of the changes in information technology. The study focused on three areas: (a) the transition from the traditional to the virtual library; (b) the transition from the technical to the user-centered approach; and (c) the skills and roles of LIS professionals. The study found that most experts believe that the traditional library will continue to operate alongside the virtual library. Most of the experts agree that in the future, libraries will place a larger emphasis on customer services. LIS professionals will be specialists in locating, filtering, and evaluating information, and will be primary instructors in the use of new information technologies. This study's conclusions closely match those of the Kaliper project (1998-2000), which examined change in the curricula of LIS schools. The Impact Factor introduced by Eugene Garfield is a fundamental citation-based measure of the significance and performance of scientific journals. It is perhaps the most popular bibliometric product, used in bibliometrics itself as well as outside the scientific community. First, a concise review of the background and history of the ISI Impact Factor and the basic ideas underlying it is given. A cross-citation matrix is used to visualise the construction of the Impact Factor and several related journal citation measures. Both strengths and flaws of the Impact Factor are discussed. 
Several attempts made by different authors to introduce more sophisticated journal citation measures are described, along with the reasons why many indicators aiming at a correction of the methodological limitations of the Impact Factor were not successful. The next section is devoted to the analysis of basic technical and methodological aspects. In this context, the most important sources of possible biases and distortions in the calculation and use of journal citation measures are studied. Thereafter, the main characteristics of application contexts are summarised. The last section is concerned with questions of the statistical reliability of journal citation measures. It is shown that, in contrast to a common misbelief, statistical methods can be applied to discrete 'skewed' distributions, and that the statistical reliability of these statistics can be used as a basis for the application of journal impact measures in comparative analyses. Finally, the question of the sufficiency or insufficiency of a single, howsoever complex measure for characterising the citation impact of scientific journals is discussed. The impact factor is a quasi-qualitative indicator which provides a measurement of the prestige and international visibility of journals. Although the use of impact factor-based indicators for science policy purposes has increased over the last two decades, several limitations have been pointed out and should be borne in mind. The use of the impact factor should be treated carefully when applied to the analysis of peripheral countries, whose national journals are hardly covered by the ISI databases. Our experience in the use of impact factor-based indicators for the analysis of Spanish scientific production is shown. The usefulness of impact factor measures in macro, meso, and micro analyses is displayed. In addition, the main advantages, such as the great accessibility of the impact factor and its ready-to-use nature, are pointed out. 
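The construction of the ISI Impact Factor reviewed above follows Garfield's two-year definition: citations received in year Y by items the journal published in the two preceding years, divided by the number of citable items it published in those years. A minimal sketch with made-up counts:

```python
def impact_factor(year, citations_by_pub_year, items_by_pub_year):
    """Two-year (Garfield) impact factor for `year`.

    citations_by_pub_year[y]: citations received in `year` by papers published in y
    items_by_pub_year[y]:     citable items the journal published in y
    """
    prev = (year - 1, year - 2)
    cites = sum(citations_by_pub_year[y] for y in prev)
    items = sum(items_by_pub_year[y] for y in prev)
    return cites / items
```

Variants with longer citation windows, as discussed in the surrounding abstracts, simply widen the set of publication years in the numerator and denominator.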
Several limitations, such as the need to avoid inter-field comparisons or the convenience of using a fixed journal set for international comparisons, are also stressed. It is worth noting that the use of the impact factor in the research evaluation process has strongly influenced the publication strategy of scientists. A study undertaken in 1996 of Australia's performance in the high-impact journals of a few selected fields of science has produced empirical data for examining the factors that influence peers in their choice of the 'highly rated' journals in their field. A number of characteristics were used to compare the selected journals with those having the highest impact factor, as listed in ISI's Journal Citation Reports. This paper ranked journals on three impact factors: ISI's impact factor for two consecutive years, and one calculated for a five-year window. The data suggest that the type of impact measure was less important in journal selection than the long-term validity of the rankings. The group of experts was less likely to include in their 'top 20' journals that were only highly ranked for a short period. Of the more descriptive journal characteristics analysed, the age of the journal appeared significant. The experts' selections also appeared biased against journals that were relatively new, regardless of how high their impact factor was. Journal citation impact factors, which are frequently used as a surrogate measure of research quality, do not correlate well with UK researchers' subjective views of the relative importance of journals as media for communicating important biomedical research results. The correlation varies with the subfield: it is almost zero in nursing research but is moderate in more "scientific" subfields such as multiple sclerosis research, characterised by many authors per paper and appreciable foreign co-authorship. 
If research evaluation is to be based on journal-specific indicators, then these must cover different aspects of the process whereby research impacts on other researchers and on healthcare improvement. In an evaluation of physics research programs in the Netherlands, held in 1996, assessments of research by expert panels were supplemented with bibliometric analysis. This latter analysis included the calculation of several bibliometric indicators, some of which take journal impact measures as a baseline. The final outcomes of this evaluation provided an opportunity to re-examine the results of this assessment from the perspective of the degree of interdisciplinarity of the programs involved. In this paper we discuss the results of this latter analysis, in particular with respect to the relation between several citation-based indicators and interdisciplinary research in Dutch physics. This paper discusses the development and application of journal impact indicators in a number of bibliometric studies commissioned by Dutch organizations and institutions and conducted in our institute during the past five years. An outline is given of the research questions addressed in these studies and their policy context. For each study the appropriateness of the use of journal impact indicators produced by the Institute for Scientific Information (ISI) is evaluated. Alternative journal impact measures were developed which are shown to be more appropriate in the particular research and policy contexts than the ISI measures. These measures were considered highly useful by the users. The studies have revealed methodological flaws in the ISI journal impact factors. The assessment of the publications of research teams working in different subfields raises concerns because of the different scientometric features of the subfields. For equalizing the differences in the Garfield (Impact) Factors of journals, several methods applied in practice have been described. 
A new indicator, the Specific Impact Contribution (SIC), is introduced; it relates the citation share of a given team (or journal) in the total citations of the teams (or journals) evaluated to its share in publications. It is shown that the normalized Garfield Factors and the normalized SIC values are identical measures within any selected set of journals. Consequently, the Garfield Factor of a journal can appropriately be taken as an indicator characterizing the contribution of the information channel as a whole. Several languages for querying and transforming XML, including XML-QL, Quilt, and XQL, have been proposed. However, these languages do not support ranked queries based on textual similarity, in the spirit of traditional IR. Several extensions to these XML query languages to support keyword search have been made, but the resulting languages cannot express IR-style queries such as "find books and CDs with similar titles". In some of these languages keywords are used merely as boolean filters without support for true ranked retrieval; others permit similarity calculations only between a data value and a constant, and thus cannot express the above query. WHIRL avoids both problems, but assumes relational data. We propose ELIXIR, an expressive and efficient language for XML information retrieval that extends XML-QL with a textual similarity operator that can be used for similarity joins, so ELIXIR is sufficiently expressive to handle the sample query above. ELIXIR thus qualifies as a general-purpose XML IR query language. Our central contribution is an efficient algorithm for answering ELIXIR queries that rewrites the original ELIXIR query into a series of XML-QL queries to generate intermediate relational data, and uses WHIRL to efficiently evaluate the similarity operators on this intermediate data, yielding an XML document with nodes ranked by similarity. 
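The Specific Impact Contribution defined above is the ratio of two shares, and can be sketched directly from that definition. A minimal illustration with hypothetical counts:

```python
def specific_impact_contribution(team_citations, team_publications,
                                 total_citations, total_publications):
    """SIC: the team's (or journal's) share of all citations in the evaluated
    set, divided by its share of all publications. SIC > 1 means the team
    attracts more citations than its publication volume alone would suggest."""
    citation_share = team_citations / total_citations
    publication_share = team_publications / total_publications
    return citation_share / publication_share
```

Within a fixed journal set, normalizing these SIC values and normalizing the journals' Garfield Factors yield identical measures, which is the equivalence the abstract reports.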
Our experiments demonstrate that our prototype scales well with the size of the query and the XML data. EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for such languages are discussed, and it is shown that EquiX meets the necessary criteria. Both a graph-based abstract syntax and a formal concrete syntax are presented for EquiX queries. In addition, the semantics is defined and an evaluation algorithm is presented. The evaluation algorithm is polynomial under combined complexity. EquiX combines pattern matching, quantification, and logical expressions to query both the data and the meta-data of XML documents. The result of a query in EquiX is a set of XML documents. A DTD describing the result documents is derived automatically from the query. XML is nowadays considered the standard meta-language for document markup and data representation. XML is widely employed in Web-related applications as well as in database applications, and there is also growing interest in it from the literary community, which is developing tools to support document-oriented retrieval operations. The purpose of this article is to show the basic new requirements of this kind of application and to present the main features of a typed query language, called Tequyla-TX, designed to support them. XML represents both the content and the structure of documents. Taking advantage of the document structure promises to greatly improve retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. 
We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach. Despite the fact that several models to structure text documents and to query on this structure have been proposed in the past, a standard has emerged only relatively recently with the introduction of XML and its proposed query language XQL, on which we focus in this article. Although there exist some implementations of XQL, efficiency of the query engine is still a problem. We show in this article that an already existing model, Proximal Nodes, which was defined with the goal of efficiency in mind, can be used as an efficient query engine behind an XQL front-end. The advances in storage and communications enable users to store massive amounts of data, and to share it seamlessly with their peers. With the advent of XML, we expect a significant portion of this data to be in XML format. We describe here the architecture and implementation of an XML repository that promotes a novel navigation paradigm for XML documents based on content and context. Support for these capabilities is achieved by bringing to bear the organizational power of information retrieval to the domain of semistructured documents. File systems remain the preferred storage infrastructure for the home and business desktop environments. We have built a system, XMLFS, based on the ideas stated above. XMLFS presents a storage abstraction that manifests itself to the client as a familiar file system. 
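The extension of plain terms to structural terms described above can be pictured as indexing (path, term) pairs instead of bare terms, so that the same word under different element paths becomes a different indexing unit with its own document frequency. A toy sketch using nested dicts in place of parsed XML; this is illustrative only, not the authors' algorithm:

```python
import math

def structural_terms(node, path=""):
    """Flatten a nested-dict 'document' into (path, term) structural terms."""
    terms = []
    if isinstance(node, dict):
        for tag, child in node.items():
            terms.extend(structural_terms(child, path + "/" + tag))
    else:
        for token in str(node).lower().split():
            terms.append((path, token))
    return terms

def structural_idf(docs_terms):
    """Inverse document frequency computed over structural terms."""
    N = len(docs_terms)
    df = {}
    for terms in docs_terms:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(N / d) for t, d in df.items()}
```

Term frequencies and inverse document frequencies over such units can then be adapted to whatever logical document boundaries are decided at query time, which is the adaptation the article proposes.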
However, it breaks the tight coupling between the directory hierarchical structure and the file system. XMLFS automatically creates a directory organization of any XML document collection based on content and context. Each user can navigate through the file system according to his/her domain of interest at that point in time. Our result is a first step towards a solution to the discovery and navigation problems presented by the collective repositories of XML documents in peer-to-peer environments. We show several problems with marking up free text, text that is either natural language or semigrammatical but unstructured. These problems prevent well-formed XML from marking text for readily available meaning. A solution is proposed to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. This contribution focuses on the application of bibliometric techniques to research activities in China, based on data extracted from the Science Citation Index (SCI) and related Citation Indexes, produced by the Institute for Scientific Information (ISI). The main conclusion is that bibliometric analyses based on the ISI databases in principle provide useful and valid indicators of the international position of Chinese research activities, provided that these analyses deal properly with the relatively large number of national Chinese journals covered by the ISI indexes. It is argued that it is important to distinguish between a national and an international point of view. In order to assess Chinese research activities from a national perspective, it is appropriate to use scientific literature databases with a good coverage of Chinese periodicals, such as the Chinese Science Citation Database (CSCD), produced at the Chinese Academy of Sciences. Assessment of the position of Chinese research from an international perspective should be based on the ISI databases, but it is suggested that national Chinese journals be excluded from this analysis. 
In addition, it is proposed to compute an indicator of international publication activity, defined as the percentage of articles in journals processed for the ISI indexes, with the national Chinese journals removed, relative to the total number of articles published either in national Chinese or in other journals, regardless of whether these journals are processed for the ISI indexes or not. This indicator can only be calculated by properly combining the CSCD and ISI indexes. The author surveyed a set of ten scholarly journals that publish the mainstream of papers in the field of Scientometrics, Informetrics, and Bibliometrics (SIB). The survey is limited to the research articles published in the field during the two-decade period 1981-2000. Each journal was examined issue by issue for the institutional affiliations of contributing authors. Institutional rankings for the total period and for the two decade-long periods, 1981-1990 and 1991-2000, were determined by awarding credit to the authors' institutions based on authorship. In the composite of ten journals, the University of Sheffield (England), the University of North Carolina (USA), the University of Leiden (Netherlands), the City University of London (England), the National Institute of Science, Technology and Development Studies (India), the University of Sussex (England), the University of Illinois (USA), the University of Michigan (USA), the Hungarian Academy of Sciences Library (Hungary), and Indiana University (USA) emerged as the ten most productive institutions for the period 1981-2000. The stochastic model proposed recently by the author to describe the citation process in the presence of obsolescence is further investigated to illustrate the nth-citation distribution and the distribution of the total number of citations. The particular case where the latent rate has a gamma distribution is analysed in detail and is shown to agree well with empirical data. 
The number of Brazilian publications in the Institute for Scientific Information database, ISI, increased significantly in the last 20 years, comprising more than 1 percent of the database in the last two years. The relationship between size and recognition of Brazilian science, estimated by the number of ISI-indexed publications, p, and citations, c, obeyed a power law, c = k p^n. The value of n, a known indicator of such a relationship, was 1.42 +/- 0.04, significantly higher than that found for the whole set of ISI-indexed world publications. The recent growth of Brazilian publication was not solely due to international collaboration, since over the last six years international collaboration, estimated as the percentage of Brazilian publications having at least one foreign address, reached a constant value of ca. 30%. International collaboration increased the impact of Brazilian publications. Although the most frequent collaborating countries are those that produce the largest percentage of the world's science, Brazilian collaboration with Argentina and Chile exhibits impacts comparable to the major science producers. The authorship and citation patterns in the journal Management Science (MS) are analysed. The purpose of the analysis is to examine the competitive relation of MS with OR (operational research or operations research). The analysis is focused on the use of mathematical methods, because MS entered the management research area by using mathematical methods developed by OR and because the use of mathematical methods in real management is facing difficulties. The relationship of MS with information systems (IS) and organisation research (Org) is analysed in regard to the competition of MS with OR. The analysis reveals the intermediate character of MS; that is, MS is less prone to mathematical methods and is more inclined towards IS and Org than OR is. We propose enhancing the traditional literature review through "research profiling". 
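An exponent such as the n in c = k p^n above is conventionally estimated by ordinary least squares on the log-log scale. A minimal sketch on synthetic data (the values below are generated for illustration, not the Brazilian counts):

```python
import math
import random

def fit_power_law(pubs, cites):
    """Fit c = k * p**n by least squares on log-transformed data,
    the usual way a size-recognition exponent n is estimated."""
    xs = [math.log(p) for p in pubs]
    ys = [math.log(c) for c in cites]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    n = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    k = math.exp(my - n * mx)
    return k, n

# synthetic country data generated with exponent 1.42 plus noise
random.seed(0)
pubs = [random.randint(10, 5000) for _ in range(200)]
cites = [0.5 * p ** 1.42 * math.exp(random.gauss(0, 0.1)) for p in pubs]
k, n = fit_power_law(pubs, cites)
print(round(n, 2))  # recovers an exponent close to 1.42
```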
This broad scan of contextual literature can extend the span of science by better linking efforts across research domains. Topical relationships, research trends, and complementary capabilities can be discovered, thereby facilitating research projects. Modern search engine and text mining tools enable research profiling by exploiting the wealth of accessible information in electronic abstract databases such as MEDLINE and Science Citation Index. We illustrate the potential by showing sixteen ways that "research profiling" can augment a traditional literature review on the topic of data mining. The cumulative distribution of the age of the most-recent-reference distribution is the "dual" variant of the first-citation distribution. The latter has been modelled in previous publications by different authors, but the former has not. This paper studies a model of this cumulative most-recent-reference distribution which is different from the first-citation distribution. This model is checked on JASIS and JACS data, with success. The model involves the determination of 3 parameters and is a transformation of the lognormal distribution. However, we also show that the first-citation model (involving only 2 parameters and easier to handle), developed in an earlier paper, gives enough freedom to give close fits to the most-recent-reference data as well. We discuss the internationalisation and the visibility of Chinese journals covered by the Institute for Scientific Information (ISI). Attention is focused on physics and chemistry journals. For these journals the country of origin of published papers and their citation patterns are analysed. As an indicator of internationality we further consider the composition of their editorial boards. It is concluded that even those Chinese journals that are covered by ISI are still rather 'local' and suffer from a low visibility in the world. Yet we are optimistic about the future of Chinese science and its scientific journals. 
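A curve of the lognormal family used for such age-of-reference and first-citation data can be sketched as follows. The three parameters here (a saturation fraction plus the lognormal mu and sigma) illustrate the model family, not the exact transformation fitted in the paper:

```python
import math

def lognormal_cdf_curve(t, gamma_frac, mu, sigma):
    """Illustrative three-parameter curve of the kind fitted to
    first-citation / most-recent-reference data: a fraction gamma_frac
    of items ever exhibits the event, and the event time follows a
    lognormal distribution with parameters mu and sigma.
    (A sketch of the model family, not the paper's exact transform.)"""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return gamma_frac * 0.5 * (1.0 + math.erf(z))  # scaled lognormal CDF

# fraction of papers whose most recent reference is at most t years old
curve = [lognormal_cdf_curve(t, 0.9, 1.0, 0.6) for t in (1, 2, 5, 10, 20)]
print([round(v, 3) for v in curve])
```

The curve rises monotonically and saturates at gamma_frac, which is what distinguishes it from a plain cumulative distribution reaching 1.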
Health systems are reforming their structures and services world-wide. Both developed and developing countries are searching for better organisation and functioning schemes for their health systems. The social service delivery system in developing countries is severely limited in its ability to respond and adjust to changing circumstances by institutional, organisational, and structural factors. As a result, different countries of the Latin American and Caribbean regions have developed a diversity of reform models. While international agencies and non-government academic organisations have been funding some of the health system reform initiatives among developing countries, no clear picture exists as to the results or impact of this support. Indicators related to knowledge administration, published results or shared experiences are needed to establish a diagnosis of the existing situation and to support decision-making processes in terms of policy and research funding. This work presents the results of a bibliometric and webometric analysis of the production and distribution of the literature generated on health system reforms, as produced in or about Latin America and the Caribbean, for the period 1980-1999. Results indicated the limitations and low quality of local and regional databases in representing productivity in the field. Data were obtained regarding the patterns of production and distribution of documents over time; the main countries and areas involved in health system reform processes; and the institutions behind the initiatives. The implications of the results derived from this research for health policy makers, researchers, librarians, database producers, and information scientists are discussed by the author. 
The noticeable tendency of young Croatian scientists towards a professional and foreign exodus can be explained not so much by their social, professional or material differentiation as by a subjective experience of their own life and professional situation; in other words, dissatisfaction of the researchers with their own professional and social status. The hierarchy of motives of a potential young scientists' drain indicates a complex pattern. It combines push and pull motives in which economic factors take precedence, together with a desire for better conditions for scientific work and opportunities for promotion, thus demonstrating a motivational duality: integration into the general motivational migration pattern while displaying a specifically scientific motivation. The aim of this study was to assess the influence of civil war during the recent disintegration of the former Yugoslavia on scientific output, as measured by changes in the numbers of articles published in peer-reviewed journals. The articles published in journals indexed in the Science Citation Index (SCI) were retrieved for the former Yugoslav republics. According to the census of 1991, the republics' populations were as follows: Serbia 9.7 million inhabitants, Croatia 4.7, Bosnia and Herzegovina (B&H) 4.3, Macedonia 2.0, Slovenia 1.9, and Montenegro 0.6. The annual numbers of articles from each were determined from 1988 to 2000. This period includes three prewar years, 5 years of civil war from 1991 to 1995, and the NATO military interventions in B&H (1995) and F.R. Yugoslavia (1999), which includes Serbia and Montenegro. In the late 1980s, Serbia produced more than 900 scientific articles per year and was well ahead, with twice as many publications as Slovenia. The number of publications from Croatia fell between that of Serbia and Slovenia. In the prewar period, the remaining republics had a relatively small scientific presence. 
The output from B&H decreased sharply during the war, from 50 articles in 1991, and continued to decrease. During the postwar period only 18 to 27 papers per year were published. In 1995, the output from Serbia dropped 33% in comparison to 1991. Slovenia produced more publications that year while Croatia was stagnant, and the 3 most productive republics had a similar output. In 1998, Serbia produced 1543 publications, Slovenia 1116, Croatia 1103, Macedonia 100, B&H 25, and Montenegro 12. The number of articles from Serbia dropped in 1999 and 2000 by 10.2% and 27.9%, respectively, in comparison to 1998. For the same two years, the number of publications increased in Croatia (37.3% and 12.5%), Slovenia (10.9% and 52.8%), Macedonia (5% and 6%) and Montenegro (75% and 66%). The concentration of scientific research in well-established universities caused an uneven distribution of scientific output among the various republics. Thus, the annual output of scientific papers per 100,000 inhabitants in 1990 varied greatly among the republics. In Montenegro it was 1.79, B&H 1.95, Macedonia 2.36, Serbia 11.92, Croatia 18.40 and Slovenia 29.63. In 2000, the annual output per 100,000 inhabitants in these republics was 3.41, 0.61, 5.24, 11.34, 26.00 and 76.84, respectively. The scientific production in B&H and in Serbia was affected not only by the devastated economy, damaged communications, and the hardship of everyday life during the war and postwar years, but also because many scientists left the country, and the scientists in Serbia were isolated from the international scientific community. This study applies reader-response criticism to examine the relationship between and among designers, text, and users of the Galter Health Sciences Library Web site. It asks such questions as "How do Web site designers construct their subject?" or "Whom do the Web designers think their users are?" 
The study ascertains the intentions of the designers of the GHSL Web site; examines the meanings made by the users through interviews; compares the similarities and differences of the designers' intentions with their organization of knowledge represented in the GHSL Web site; and compares the similarities and differences between the designers' intentions and the views of the users. Users move from one state (or task) to another in an information system's labyrinth as they try to accomplish their work, and the amount of time they spend in each state varies. This article uses continuous-time stochastic models, mainly based on semi-Markov chains, to derive user state transition patterns (both in rates and in probabilities) in a Web-based information system. The methodology was demonstrated with 126,925 search sessions drawn from the transaction logs of the University of California's MELVYL(R) library catalog system (www.melvyl.ucop.edu). First, user sessions were categorized into six groups based on their similar use of the system. Second, by using a three-layer hierarchical taxonomy of the system Web pages, user sessions in each usage group were transformed into a sequence of states. All the usage groups but one have third-order sequential dependency in state transitions; the sole exception has fourth-order sequential dependency. The transition rates as well as the transition probabilities of the semi-Markov model provide a background for interpreting user behavior probabilistically, at various levels of detail. Finally, the differences in derived usage patterns between usage groups were tested statistically. The test results showed that different groups have distinct patterns of system use. Knowledge of the extent of sequential dependency is beneficial because it allows one to predict a user's next move in a search space based on the past moves that have been made. It can also be used to help customize the design of the user interface to the system to facilitate interaction. 
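The semi-Markov machinery described above boils down to estimating, from session traces, where users go next and how long they stay in each state. A first-order sketch on hypothetical traces (the study itself found third- and fourth-order dependencies, which would require conditioning on longer state histories):

```python
from collections import defaultdict

def estimate_semi_markov(sessions):
    """Estimate first-order transition probabilities and mean holding
    times from session traces of (state, seconds_spent) pairs.
    A first-order sketch only; higher-order dependency needs longer
    histories as the conditioning context."""
    counts = defaultdict(lambda: defaultdict(int))
    hold = defaultdict(list)
    for trace in sessions:
        for (state, dt), (nxt, _) in zip(trace, trace[1:]):
            counts[state][nxt] += 1   # observed transition state -> nxt
            hold[state].append(dt)    # time spent before leaving state
    probs = {s: {n: c / sum(row.values()) for n, c in row.items()}
             for s, row in counts.items()}
    mean_hold = {s: sum(ts) / len(ts) for s, ts in hold.items()}
    return probs, mean_hold

# hypothetical traces: index access -> search -> screen display -> record display
sessions = [
    [("index", 4), ("search", 10), ("screen", 6), ("record", 20)],
    [("index", 2), ("search", 8), ("screen", 5), ("screen", 7), ("record", 15)],
]
probs, mean_hold = estimate_semi_markov(sessions)
print(probs["screen"])  # from "screen": two moves to "record", one loop to "screen"
```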
The group CL6, labeled "knowledgeable and sophisticated usage," and the group CL7, labeled "unsophisticated usage," both had third-order sequential dependency and shared the same most frequently occurring search pattern: screen display, record display, screen display, and record display. The group CL8, called "highly interactive use with good search results," had fourth-order sequential dependency, and its most frequently occurring pattern was the same as CL6 and CL7 with one more screen display action added. The group CL13, called "known-item searching," had third-order sequential dependency, and its most frequently occurring pattern was index access, search with retrievals, screen display, and record display. The groups CL14, called "help-intensive searching," and CL18, called "relatively unsuccessful," both had third-order sequential dependency, and for both groups the most frequently occurring pattern was index access, search without retrievals, index access, and again, search without retrievals. This article reports the results of a survey of reference librarians in public and academic libraries of various sizes in the United States, asking them about their experiences with and attitudes towards the use of digital and networked technologies and resources in reference work. A total of 648 responded. In general, respondents were positive and optimistic in their outlook, but not unreservedly so. Among the strongest findings were a correlation between recent experience at doing digital reference and positive attitudes towards it, a clear set of opinions about what such services would be best and worst at, and differing perspectives and patterns of responses between academic and public librarians. In addition, responses to questions about the characteristics of librarians, their current and planned reference services, and some of their professional choices in doing reference work are reported. 
The proliferation of digital information resources and electronic databases challenges libraries and demands that libraries develop new mechanisms to facilitate and better inform user selection of electronic databases and search tools. We developed a prototype Web-based database selection expert system based on reference librarians' database selection strategies. This system allows users to simultaneously search all available databases to identify those most relevant to their queries, using free-text keywords or phrases taken directly from their research topics. This article reports on (1) the initial usability test and evaluation of the Selector: the test design, methodology used, and performance results; (2) a summary of search query analyses; (3) user satisfaction measures; (4) the use of the findings for further modification of the Selector; and (5) the findings of using randomly selected subjects to perform a usability test with predefined searching scenarios. Future prospects of this research are also discussed in the article. This article presents the research findings of a study on task complexity and information-seeking activities in real-life work tasks. The focus was on perceived task complexity, which was determined according to the task performers' prior knowledge about the task ahead. This view of task complexity is closely related to research considering task uncertainty and analyzability. The information-seeking activities considered were the need to acquire different types of information and the subsequent use of different types of sources. The research data were mainly collected by (1) self-recorded journals that were filled out by municipal administrators in the course of performing their ordinary work duties (altogether 78 task diaries), and (2) subsequent interviews. The results indicated that there is a relatively strong relationship between types of information and types of sources. 
Increasing task complexity made experts more attractive as a source than other people and all types of documentary sources. This article argues that individual Web sites form hyperlink affiliations with others for the purpose of strengthening their individual trust, expertness, and safety. It describes the hyperlink-affiliation network structure of Korea's top 152 Web sites. The data were obtained from their Web sites for October 2000. The results indicate that financial Web sites, such as credit card and stock Web sites, occupy the most central position in the network. A cluster analysis reveals that the structure of the hyperlink-affiliation network is influenced by the financial Web sites with which others are affiliated. These findings are discussed from the perspective of Web site credibility. Information and communication technologies (ICTs) have opened up new opportunities for the Nigerian print media to improve their products and services. This study explores the socio-economic factors associated with the adoption and use of ICTs by the media. Of a total of 54 socio-economic factors considered, exactly 50% were found to have a significant influence on the adoption and success of ICT applications. The factors that have the greatest positive influence on the adoption of ICTs include organizational goal, profitability, organizational image, communication in the organization, productivity, and openness of workers to change. They also constitute success factors in the use of these technologies. The factors that constrained adoption and successful application include a high rate of inflation, the unfavourable exchange rate of the naira to the dollar, low wage levels, huge costs, low gross national product, inadequate funding, and an unstable political situation. These constraining factors are indicators of economic weakness and political uncertainty. 
It seems that the significance of such factors, which are completely external to a business organization, was often underestimated in studies of organizational performance in developing countries. Subject content analysis of Web query terms is essential to understand Web searching interests. Such analysis includes exploring search topics and observing changes in their frequency distributions with time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach to automatically classifying Web query terms into broad subject categories. Because a query is short in length and simple in structure, its intended subject(s) of search is difficult to judge. Our approach, therefore, combines the search processes of real-world search engines to obtain highly ranked Web documents based on each unknown query term. These documents are used to extract cooccurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs in Taiwan were collected and tested. They contained over 5 million queries from different periods of time. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories in response to changes in users' search interests can be systematically observed in real time. The approach has also shown potential for use in various information retrieval applications, and provides a basis for further Web searching studies. 
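The categorization approach described above, which expands a short query with terms co-occurring in highly ranked documents and then scores subject categories against that expanded term set, can be sketched as follows. The toy corpus, the category feature sets, and the simple overlap score are all illustrative assumptions standing in for real search-engine results and a learned ranking function:

```python
def categorize_query(query, corpus, category_features):
    """Sketch of query categorization: expand a short query with terms
    co-occurring in matching documents, then score each category by
    the overlap between the expanded term set and its feature set.
    Corpus, categories, and scoring are toy stand-ins."""
    q = query.lower().split()
    expanded = set(q)
    for doc in corpus:  # stand-in for top-ranked search-engine results
        words = doc.lower().split()
        if any(term in words for term in q):
            expanded.update(words)  # co-occurring terms join the feature set
    scores = {cat: len(expanded & feats) / len(feats)
              for cat, feats in category_features.items()}
    return max(scores, key=scores.get), scores

corpus = [
    "jaguar speed and habitat in the rainforest",
    "jaguar engine specs and horsepower review",
]
features = {
    "animals": {"habitat", "rainforest"},
    "cars": {"engine", "horsepower", "dealer"},
}
best, scores = categorize_query("jaguar habitat", corpus, features)
print(best)  # the expanded query covers all "animals" features, so it wins
```

The document-expansion step is what makes a two-word query classifiable at all; the bare query shares almost no vocabulary with the category feature sets.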
This study applied the Biglan model of disciplinary differences to the information-seeking behavior patterns of 5,175 undergraduates responding to questions on the College Student Experiences Questionnaire (CSEQ). The Biglan model categorizes academic disciplines along three dimensions: (1) hard-soft, (2) pure-applied, and (3) life-nonlife systems. Using t-tests, this model proved to be valid for distinguishing differences in undergraduates' information-seeking behavior patterns among various academic disciplines. The results indicate that the Biglan model has implications for the redesign of academic library services and use as a valid theoretical framework for future library and information science research. Recent studies show that humans engage in multitasking behaviors as they seek and search information retrieval (IR) systems for information on more than one topic at the same time. For example, a Web search session by a single user may consist of searching on single topics or multitasking. Findings are presented from four separate studies of the prevalence of multitasking information seeking and searching by Web, IR system, and library users. The incidences of multitasking identified in the four different studies included: (1) users of the Excite Web search engine who completed a survey form, (2) Excite Web search engine users filtered from an Excite transaction log from 20 December 1999, (3) mediated on-line database searches, and (4) academic library users. Findings include: (1) multitasking information seeking and searching is a common human behavior, (2) users may conduct information seeking and searching on related or unrelated topics, (3) Web or IR multitasking search sessions are longer than single-topic sessions, (4) the number of topics per Web search ranged from 1 to more than 10, with a mean of 2.11 topic changes per search session, and (5) many Web search topic changes were from hobbies to shopping and vice versa. 
A more complex model of human seeking and searching levels that incorporates multitasking information behaviors is presented, and a theoretical framework for human information coordinating behavior (HICB) is proposed. Multitasking information seeking and searching is developing as a major research area that draws together IR and information-seeking studies toward a focus on IR within the context of human information behavior. Implications for models of information seeking and searching, IR/Web systems design, and further research are discussed. This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time-consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and are therefore suitable for use in training and evaluation of automatic keyphrasing systems. 
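The core of keyphrase extraction in the spirit of Kea can be sketched as TF x IDF ranking of candidate n-grams against a background collection. Kea itself additionally uses a first-occurrence feature and a naive Bayes model trained on author keyphrases, which this simplified sketch omits:

```python
import math
import re
from collections import Counter

def score_keyphrases(doc, background, max_len=2):
    """Rank candidate n-grams (up to max_len words) from doc by
    TF x IDF, with document frequencies taken from a background
    collection. A simplified stand-in for Kea's feature set."""
    def ngrams(text):
        words = re.findall(r"[a-z]+", text.lower())
        return [" ".join(words[i:i + n])
                for n in range(1, max_len + 1)
                for i in range(len(words) - n + 1)]
    tf = Counter(ngrams(doc))
    docs = [set(ngrams(d)) for d in background]
    def idf(phrase):
        df = sum(phrase in d for d in docs)
        return math.log((len(docs) + 1) / (df + 1))  # smoothed IDF
    return sorted(tf, key=lambda p: tf[p] * idf(p), reverse=True)

doc = "keyphrase extraction ranks candidate phrases; keyphrase quality matters"
background = ["phrases occur in many documents", "ranking candidate documents"]
print(score_keyphrases(doc, background)[:3])  # "keyphrase" ranks first
```

Phrases frequent in the document but rare in the background float to the top, which is the basic intuition behind automatic keyphrase assignment.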
In this article we demonstrate the use of an integrative approach to visualizing and tracking the development of scientific paradigms. This approach is designed to reveal the long-term process of competing scientific paradigms. We assume that a cluster of highly cited and cocited scientific publications in a cocitation network represents the core of a predominant scientific paradigm. The growth of a paradigm is depicted and animated through the rise of citation rates and the movement of its core cluster towards the center of the cocitation network. We study two cases of competing scientific paradigms in the real world: (1) the causes of mass extinctions, and (2) the connections between mad cow disease and a new variant of a brain disease in humans (vCJD). Various theoretical and practical issues concerning this approach are discussed. Drawing its insights from traditions in scientometrics, this paper proposes to gather data by computing the number of patents granted rather than applications for them. Given the precedent set by the Frascati manual, which prompted countries to compute data on the R&D of institutional sectors and to exclude the contribution of households, it is recommended to distinguish clearly between independent and institutional inventors. This paper also suggests computing data for multiple addresses in order to reveal patterns of collaboration. Finally, it is important to consider that patents are not only an indicator of novelty but also an indicator of intellectual property; hence, the paper suggests measuring both dimensions. We describe the steps involved in constructing authors' citation identities (whom they cite) and citation images (who cites them). Familiarity with the intellectual, social, and institutional connections of these authors over time helps inform the analysis and augment the specificity of citation counts. Our study shows that authors' writing and referencing styles constitute a form of watermark for their scholarly output. 
Determining the core of a field's literature, i.e. its 'most important' sources, has been and still is an important problem in bibliometrics. In this article an exact definition of a core of a bibliography or a conglomerate is presented. The main ingredients for this definition are fuzzy set theory, Lorenz curves and concentration measures. If one prefers a strict delineation, the fuzzy core can easily be defuzzified. The method we propose does not depend on the subjective notion of 'importance'. It is, moreover, completely reproducible. The method and the resulting core are also independent of the mathematical function (Lotka, Zipf, Bradford, etc.) that may be used to describe the relation between the set of sources and that of items. Theoretically, we first classified interdisciplinary research into three types: 1) creating a new field from inside the scientific community, 2) collaboration among disciplines led by societal needs outside the scientific community, and 3) making suggestions for the public. To analyse the dynamics of knowledge combination in interdisciplinary research, we developed an indicator using the "relative transmission" concept, as well as sub-domain maps using quantification factor analysis (1974). We used a sub-domain matrix, which shows the structure of individual researchers' submissions crossing over several sub-domains. Using the researcher as the unit of analysis, we analyzed the dynamics of the blending and separation of sub-domains in interdisciplinary studies. Biophysics and environmental science, which are categorized as types 1) and 2) above, were chosen for analysis. The total numbers of researchers are 516 and 346 in the two fields, respectively, and the observation years are 1966, 1972, 1979, and 1984 for biophysics and 1977 and 1983 for environmental science. 
The results showed that the structure of groups of sub-domains in environmental science reflects the classical disciplines, with little combination of knowledge observed, whereas in biophysics the sub-domains are not fixed and dynamic knowledge integration was observed. These tendencies are considered to reflect the characteristics of type 2) and type 1), respectively. In a recent study, de Lange and Glanzel introduced a model for the bibliometric analysis of the extent of multinational co-authorship links. They showed that this model can be considered a generalisation of the 'fractionation approach' of Nederhof and Moed. The authors analysed international collaboration links (the Multilateral Collaboration Index) as a function of the share of internationally co-authored papers. The measurement of the deviation of individual countries from (sub-)field peculiarities proved, however, complicated. The intensifying international collaboration and, in several fields, the substantial growth of the number of multinational papers (involving three or more countries) in the 90s necessitate a detailed analysis of co-publication distributions, that is, of the distributions of partner countries in a given country's publication output. The main objective of the study is to elaborate such a measure to be used in addition to the share of international publications and the Multilateral Collaboration Index. In addition, a detailed analysis of the national citation impact of domestic, bilateral and multilateral papers in the major science fields is conducted. The model we develop, and the statistical analysis it allows, support the practical conclusion that the ratio of the number of international links to international papers turns out to be roughly proportional to the ratio of full to fractional publication counts. The Parliament, the highest legislative body in India, plays a significant role in formulating national policies. 
It is, therefore, pertinent to examine the concern that Members of Parliament and different political parties show, and the priorities they accord, to S&T-related issues. This can be judged statistically through the number of questions raised on the floor of the House. The study presents such an analysis, taking as an example the S&T questions raised in 1992 during the Tenth Parliament. The analysis has been done by dividing the S&T-related issues into 14 socio-economic areas, such as environmental sciences, biotechnology, energy, food and agriculture, health, natural resources, telecommunications, human resource development, etc., and eight policy areas, such as technology policy, international collaborations in S&T, etc. The raising of S&T questions jointly by MPs and different political parties through inter-party and intra-party sponsorships has also been studied. Such an analysis may provide an important basis for managers and policy makers in formulating the S&T policy of a country. The topic of fuzzy set theory was examined using the occurrence of phrases in bibliographic records. Records containing the word fuzzy were downloaded from over 100 databases, and from these records phrases were extracted surrounding the word fuzzy. A methodology was developed to trim this list of phrases to a list of high-frequency phrases relevant to fuzzy set theory. This list of phrases was in turn used to extract records from the original downloaded set which were (algorithmically) relevant to fuzzy set theory. This set of records was then analysed to show the development of the topic of fuzzy set theory, the distribution of the fuzzy phrases over time and the frequency distribution of the fuzzy phrases. In addition, the field of the bibliographic record in which the phrase occurred was examined, as well as the first appearance of a particular fuzzy phrase. 
Developing the probability function to describe rank-size Zipfian phenomena, i.e., a form like P(R = r) ~ c/r^alpha (alpha > 0) with a rank-type random variable R, has been an important problem in scientometrics and informetrics. In this article a new rank-size distribution of Zipf's law is presented and applied to an actual distribution of scientific productivities in Chinese universities. An earlier publication and citation analysis of Scandinavian clinical and social medicine 1988-96 reported that Sweden and Denmark in particular lose publication and citation world shares in many medical fields. In welfare systems such observations are alarming, and follow-up studies and monitoring are thus carried out in selected medical fields. One such typical field is psychiatry. It was decided to broaden the scope of analysis also to include the Netherlands, with the European Union, USA and the world as comparative baselines. The period covered is 1981-98. This paper reports the findings and their implications for research policy. As in many other scientific fields, psychiatric research output converges with respect to US vs. EU publication world shares. Both Denmark and Sweden suffer from stagnation in absolute publication numbers over the period and lose visibility dramatically in terms of world and EU shares. Finland and the Netherlands show steep growth rates. In terms of citations the picture is identical. Sweden's EU citation share declines dramatically from 13% to 6.5% during the period. The gap between EU and US citation impact widens, with the USA on top. Among the analysed Northern EU countries only the Netherlands demonstrates an above-average impact. Other European players, like Belgium and Ireland, increasingly take part in psychiatric research and show much higher citation impact than the Scandinavian welfare countries. 
We describe the Chinese Scientometric Indicators (CSI), an indicator database derived from the Chinese Science Citation Database (CSCD). Its design is supported by the Natural Sciences Foundation of China (NSFC). In this indicator database, data of a statistical nature are organized and categorized, leading to ranked lists and providing bases for comparisons among Chinese institutions and regions. Our project has investigated the processes of mediated information retrieval (IR) searching during human information-seeking processes to characterize aspects of this process, including information seekers' changing situational contexts; information problems; uncertainty reduction; successive searching; cognitive styles; and cognitive and affective states. The research has involved observational, longitudinal data collection in the United States and United Kingdom. Three questionnaires were used for pre- and postsearch interviews: reference interview, information-seeker postsearch, and search intermediary postsearch questionnaires. In addition, the Sheffield team employed a fourth set of instruments in a follow-up interview some 2 months after the search. A total of 198 information seekers participated in a mediated on-line search with a professional intermediary using the Dialog Information Service. Each mediated search process was audiotaped and search transaction logs were recorded. The findings are presented in four parts. Part I presents the background, theoretical framework, models, and research design used during the research. Part II is devoted to exploring changes in information seekers' uncertainty during the mediated process. Part III provides results related to successive searching. Part IV reports findings related to cognitive styles, individual differences, age, and gender. 
Additional articles that discuss further findings from this complex research project, including: (1) an integrated model of information seeking and searching, (2) assessment of mediated searching, and (3) intermediary-information seeker communication, are in preparation and will be published separately. This article explores the relationship between the concept of uncertainty in information seeking, within a model of the problem-solving process proposed by Wilson (1999a), and variables derived from other models and from the work of Ellis and Kuhlthau. The research has involved longitudinal data collection in the United States and United Kingdom employing three interview schedules (incorporating self-completed questionnaires) used for the presearch interview and for postsearch interviews with the information seeker and the search intermediary. In addition, the Sheffield team employed a fourth set of instruments in a follow-up interview some 2 months after the search. Related search episodes, with a professional search intermediary using the Dialog Information Service and other sources, were audiotaped, and search transaction logs were recorded. The mediated search clients were faculty and research students engaged in either personal or externally supported research projects. The article concludes that the problem-solving model is recognized by such researchers as describing their activities and that the uncertainty concept, operationalized as here, serves as a useful variable in understanding information-seeking behavior. It also concludes that Ellis's concept of "search characteristics" and Kuhlthau's information-seeking stages are independent of the problem stage, and that a set of affective variables, based on those of Kuhlthau, appear to signify a generalized positive or negative affective orientation towards the course of the information problem solution. 
Our project has investigated the processes of mediated information retrieval (IR) searching during human information-seeking processes to characterize aspects of this process, including information seekers' changing situational contexts; information problems; uncertainty reduction; successive searching; cognitive styles; and cognitive and affective states. The research has involved observational, longitudinal data collection in the United States and the United Kingdom. Three questionnaires were used for pre- and postsearch interviews: reference interview, information-seeker postsearch, and search intermediary postsearch questionnaires. In addition, the Sheffield team employed a fourth set of instruments in a follow-up interview some 2 months after the search. A total of 198 information seekers participated in a mediated on-line search with a professional intermediary using the Dialog Information Service. Each mediated search process was audiotaped and search transaction logs recorded. The findings are presented in four parts. Part I presents the background, theoretical framework, models, and research design used during the research. Part II is devoted to exploring changes in information seekers' uncertainty during the mediated process. Part III provides results related to successive searching. Part IV reports findings related to cognitive styles, individual differences, age, and gender. Additional articles that discuss further findings from this complex research project, including: (1) an integrated model of information seeking and searching, (2) assessment of mediated searching, and (3) intermediary-information seeker communication, are in preparation and will be published separately. This is the fourth in a series resulting from a joint research project directed by Professor Tom Wilson in the United Kingdom and Dr. Amanda Spink in the United States. 
The analysis reported here sought to test a number of hypotheses linking global/analytic cognitive styles and aspects of researchers' problem-solving and related information-seeking behavior. One hundred and eleven postdoctoral researchers were assessed for Witkin's field dependence/independence using Riding's Cognitive Styles Analysis and for Pask's holist/serialist biases using items from Ford's Study Processes Questionnaire. These measures were correlated with the researchers' perceptions of aspects of their problem-solving and information-seeking behavior, and with those of the search intermediary who performed literature searches on their behalf. A number of statistically significant correlations were found. Field-independent researchers were more analytic and active than their field-dependent counterparts. Holists engaged more in exploratory and serendipitous behavior, and were more idiosyncratic in their communication than serialists. This article describes an information retrieval, visualization, and manipulation model. After terms or phrases have been input for a query, the system designed on this model offers the user multiple ways to exploit the retrieval set via an interactive interface. The retrieved data are clustered into thematic concepts related to the query, represented on screen as a grid of nodes. Users of the system may manipulate the retrieval set to explore document-document, document-concept, and concept-concept relationships in the retrieval set that might otherwise be masked, by altering (a) the discrete grid size of the display, (b) the influence, or weight, of various document terms and properties, and (c) mixed levels of granularity. As these factors are reweighted, the display is updated in real time to expose unanticipated document relationships and shifts in cluster membership. The article outlines the mathematical model and then describes an information-retrieval application built on the model to search structured and full-text files. 
The application, written in Java, uses a small test collection of Dialog and Swiss-Prot documents. The core of any document retrieval system is a mechanism that ranks the documents in a large collection in order of the likelihood with which they match the preferences of any person who interacts with the system. Given a broader interpretation of "recommending" than is commonly accepted, such a preference ordering may be viewed as a recommendation, made by the system to the information seeker, that is itself typically derived through synthesis of multiple preference orderings expressed as recommendations by indexers, information seekers, and document authors. The ERIn (Evaluation-Recommendation-Information) model, a decision-theoretic framework for understanding information-related activity, highlights the centrality of recommending in the document retrieval process, and may be used to clarify the respects in which indexing, rating, and citation may be considered analogous, as well as to make explicit the points at which content-based, collaboration-based, and context-based flavors of document retrieval systems vary. We present the application of our knowledge visualization tool, VxInsight(R), to enable domain analysis for science and technology management within the enterprise. Data mining from sources of bibliographic information is used to define subsets of information relevant to a technology domain. Relationships between the individual objects (e.g., articles) are identified using citations, descriptive terms, or textual similarities. Objects are then clustered using a force-directed placement algorithm to produce a terrain view of the many thousands of objects. A variety of features that allow exploration and manipulation of the landscapes and give detail on demand enable quick and powerful analysis. Examples of domain analyses used in S&T management at Sandia are given. 
A new citation search strategy is proposed for Information Retrieval (IR) based on the principle of polyrepresentation (Ingwersen, 1992, 1996). The strategy exploits logical overlaps between a range of cognitively different interpretations of the same documents in a structured manner, i.e., so-called cognitive overlaps of representations. The strategy is essentially a 'cycling strategy' starting with documents retrieved by a subject search, from which new documents are identified automatically by following the network of citations in scientific papers backwards and forwards in time. In contrast to earlier citation search strategies, the proposed strategy does not require known relevant documents (seed documents) as a starting point, but may be based on a subject search. A pilot study is reported in which the ability of the strategy to retrieve additional relevant documents is analysed. Results show that a very large number of documents can be retrieved by the strategy, and that these may be segmented into a number of distinct 'overlap levels'. It is demonstrated that the combined core of the higher-level overlaps contains a higher relevance density than found in the original retrieval results. Based on these results it is suggested that the documents be displayed in order of their presence in higher-level overlaps, so as to maximise the chances that as many relevant documents as possible will be presented first to a user. Traditional means of analysis of research outputs have focussed on citations to papers in journals in other journal publications. But these only chronicle the early stages whereby research in biomedicine is converted into health improvement through better patient care and through preventive measures. New evaluation methods, still based on the concept of citation of research in other documents, are needed and are now being developed. 
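The overlap-level segmentation in the polyrepresentation strategy above can be sketched as a simple set computation: each document's level is the number of cognitively different representations that retrieved it, and documents are presented from the highest level down. The representation names and document ids below are hypothetical:

```python
from collections import Counter

def overlap_levels(representations):
    """Group document ids by overlap level: the number of
    representations (subject search, forward citations, backward
    citations, ...) in which each document appears."""
    level_of = Counter()
    for docs in representations:
        for d in docs:
            level_of[d] += 1
    levels = {}
    for doc, lvl in level_of.items():
        levels.setdefault(lvl, set()).add(doc)
    return levels

subject  = {"d1", "d2", "d3"}   # documents from the subject search
cited_by = {"d2", "d3", "d4"}   # found by following citations forward
cites    = {"d3", "d4", "d5"}   # found by following citations backward
levels = overlap_levels([subject, cited_by, cites])
# Ranking by descending level puts "d3" (all three representations) first.
```

This mirrors the abstract's suggestion that higher-level overlaps, having higher relevance density, should be displayed to the user first.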
These include the use of textbooks in medical education and the analysis of governmental regulations and health policies, which can influence both the availability of new drugs and the control of toxic substances in food and the environment. There is also an interest in the way that newspapers report biomedical research advances. Readers include politicians, healthcare professionals, the general public (who are increasingly becoming active consumers of healthcare products) and other researchers who may value the immediacy of the reporting. Newspaper articles tend to focus on fashionable topics and to offer premature hopes of cures for diseases, but they can also provide a valuable service in showing the importance of animal experiments to biomedical progress. It would be useful to create an international database of newspaper citations through a consortium of partners in different countries who would agree on a common protocol and exchange information on a regular basis. This paper gives an overview of quantitative approaches used to study the science/technology linkage. Our discussion is informed by a number of theoretical approaches that have emerged over the past few years in the area of innovation studies, emphasizing the exchange among actors in innovation systems and a shift in the division of labour between publicly funded basic research and industrial development of technology. We review the more quantitative literature on efforts made to study such linkage phenomena, to which theorizing in the science policy area has attributed great importance. We then introduce a typology of three approaches to studying the science/technology linkage - patent citation, industrial science, and university patenting. For each approach, we discuss merits and possible disadvantages. We then illustrate the approaches using results from studies of the Finnish innovation system. 
Finally, we list key limitations of the informetric methods and point to possible hybrid approaches that could remedy some of them. In this paper we have analyzed the pattern of cooperation links among the fifty most prolific institutions (hereafter called "elite institutions") in India. The network of relationships among these institutions is sparse, and more than two thirds of the cells in the collaboration matrix are empty. The network is centralized, but no single institution dominates it; rather, a small set of institutions does. We have constructed a measure (the Bonacich eigenvector centrality index) to assess the position of each institution in the network. Barring a few notable exceptions, the scientific size of an institution is directly related to its position in the network. We have graphically depicted the network of relationships among these institutions above a certain threshold of cooperation strength. The network, incorporating 50 nodes and 171 arcs, provides a synoptic view of bilateral relations among the institutions, but it is quite complex. We have therefore developed a block model of the network to assess the macro-level features of cooperation links among the institutions. The block model indicates the isolation and marginality of certain clusters (or blocks) of institutions. The total scientific output of mainstream articles for the 15 most productive African countries for the period 1991 to 1997 was 45,080, with South Africa and Egypt publishing 15,725 and 10,433, respectively. The production of these two top-ranked countries varied little from 1991-1997, while that of others such as the Maghreb countries increased by 75-102%. Total contributions were mainly in the fields of Clinical Medicine (36%), Biology (17%), Chemistry (14%), and Biomedical Research (12%). Papers in international collaboration predominated in Biomedical Research, Biology, Earth and Space Science, and Physics. 
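The eigenvector centrality index used in the elite-institutions study above can be computed by power iteration on the collaboration matrix; this is a minimal sketch with a toy four-institution network, not the study's data, and it omits the attenuation parameter of the more general Bonacich measure:

```python
def eigenvector_centrality(adj, iterations=100):
    """Eigenvector centrality by power iteration on A + I; the
    identity shift guarantees convergence even on bipartite graphs,
    where plain power iteration can oscillate."""
    n = len(adj)
    c = [1.0] * n
    for _ in range(iterations):
        nxt = [c[i] + sum(adj[i][j] * c[j] for j in range(n))
               for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        c = [v / norm for v in nxt]
    return c

# Toy collaboration matrix: institution 0 co-publishes with all others.
adj = [[0, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 0, 0, 0]]
centrality = eigenvector_centrality(adj)   # node 0 scores highest
```

On this star-shaped network the hub's score converges to 1/√2 ≈ 0.707 and each leaf's to 1/√6 ≈ 0.408, matching the abstract's observation that position in the network tracks an institution's connectedness.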
Institutions in the US were the principal collaborators, followed closely by those in France. We analyse the statistical properties of a database of musical notes for the purpose of designing an information retrieval system as part of the Musifind project. In order to reduce the amount of musical information we convert the database to the intervals between notes, which will make the database easier to search. We also investigate a further simplification by creating equivalence classes of musical intervals, which also increases the resilience of searches to errors in the query. The Zipf, Zipf-Mandelbrot, Generalized Waring (GW) and Generalized Inverse Gaussian-Poisson (GIGP) distributions are tested against these various representations, with the GIGP distribution providing the best overall fit for the data. There are many similarities with text databases, especially those with short bibliographic records. There are also some differences, particularly in the highest-frequency intervals, which occur with a much lower frequency than the highest-frequency "stopwords" in a text database. This provides evidence to support the hypothesis that traditional text retrieval methods will work for a music database. Three measures of international composition on journal editorial boards - the number of countries represented on the board, the number of international members, and the proportion of international board members - were correlated with impact factor and total citation data in the 1999 Journal Citation Reports for 153 business, political science, and genetics journals. With a few exceptions the relationship between international editorial board composition and citation measures was non-linear, leading to the conclusion that international membership on the editorial board cannot generally be used as a marker of better journal quality. Yet further investigation is warranted due to positive correlations between some editorial board and citation measures for non-U.S. 
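The two Musifind simplifications described above (notes to intervals, then intervals to equivalence classes) can be sketched as follows; the five-class scheme is an illustrative assumption, not necessarily the classes the project used:

```python
def to_intervals(midi_notes):
    """Represent a melody by successive pitch intervals in semitones;
    interval sequences are invariant under transposition."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

def interval_classes(intervals):
    """Collapse exact intervals into coarse equivalence classes
    (repeat, step up/down, leap up/down) so that a query survives
    small pitch errors. Hypothetical scheme for illustration."""
    def cls(i):
        if i == 0:
            return "R"                      # repeated note
        if abs(i) <= 2:
            return "u" if i > 0 else "d"    # step (1-2 semitones)
        return "U" if i > 0 else "D"        # leap (3+ semitones)
    return [cls(i) for i in intervals]

theme   = [60, 62, 64, 60]   # C D E C, as MIDI note numbers
shifted = [62, 64, 66, 62]   # the same melody a whole tone higher
# Both melodies reduce to the identical interval sequence [2, 2, -4].
```

Transposition invariance is exactly why interval representations make the database "easier to search": a query hummed in any key maps to the same sequence.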
business and political science journals. The aim of this study was to examine the extent to which the field of bibliometrics and scientometrics makes use of sources outside the field. The research was carried out by examining the references of articles published in Scientometrics in two calendar years, 1990 and 2000. The results show that in 2000, 56.9% (and 47.3% in 1990) of the references originated from three fields: scientometrics and bibliometrics; library and information science; and the sociology, history and philosophy of science. When comparing the two periods, there is also a considerable increase in journal self-citation (i.e., references to the journal Scientometrics) and in the percentage of references to journals. The present study views author self-citations as a blend of experience and cognition of authors, and argues for placing emphasis on the consciousness and cognition of authors when assessing author self-citations. Like the citation network of scientific publications, the Web is also a graph where pages are connected together by hypertext links or "sitations". In the new research field of Webometrics, scholars have investigated equivalencies between citationist concepts established in bibliometrics and hyperlink networks. This paper focuses on the possible analogy between co-citation and co-sitation to structure Web universes. It reports an experiment in the field of bibliometrics and scientific indicators. Several technical aspects that must be dealt with are reviewed. Co-sitation seems a promising way to delineate topics on the Web. However, the analogy with traditional co-citation is deeply misleading: many precautions must be taken in the interpretation of the results. The purpose of this paper is to present the preliminary results of a bibliometric analysis of AIDS documents as produced on Sub-Saharan Africa. AIDSLINE 1980-2000 was used to conduct the literature search. 
In this paper, an analysis was made only of the records retrieved under 'Central Africa'. Bibexcel (version 2001) and Microsoft Excel (2000) were used as software tools to conduct the analysis of the records. Seven countries and 1052 records were identified. The main participating countries were the Democratic Republic of the Congo (527 documents) and Cameroon (271). Results indicated a high pattern of collaboration through multiple authorship. Most documents were published in English (84.50%) and French (14.73%). Over 57% corresponded to journal articles. The subject content of the documents was mainly focused on epidemiology, complications, and prevention & control issues relating to 'HIV Infections' and 'Acquired Immunodeficiency Syndrome'. The countries behind this productivity were Cameroon, USA, Democratic Republic of the Congo, France, and Belgium. A comparison of results among Central African countries and among other developing countries is made by the authors. In order to formulate firm-level, national or regional technology policy, it is necessary to have indicators that can measure technological competence. This paper develops a set of indicators using patent statistics to compare the "knowledge base" of individuals, laboratories, firms or nations. These indicators are then applied to the patent applications in France, Germany and the U.K. in the biotechnology sectors. The paper shows that France is lagging behind Germany and the U.K. in technology stocks (or its patent applications) in all biotechnology fields. However, it is the leader in the technology network supporting the foods industry. It has a comparative advantage in terms of either technology stock counts or networks in Genetic Engineering, Pharmaceuticals, Foods, Chemicals, Cell Culture and Biocatalysis. Germany is leading in many sectors, but in all sectors in which it is a leader, it is a specialized leader, i.e., its technology networks need to be more extensive. 
It has a comparative advantage in terms of either technology stock counts or networks in all sectors except Genetic Engineering, Pharmaceuticals, Agriculture and Cell Culture. The U.K. is the leader in the important field of Genetic Engineering and in terms of the entire technology networks in the biotechnology sectors. It has a comparative advantage in terms of either technology stock counts or networks in Genetic Engineering, Pharmaceuticals, Agriculture and Purification. In this paper we report on the results of an exploratory study of knowledge exchange between disciplines and subfields of science, based on bibliometric methods. The goal of this analysis is twofold. Firstly, we consider knowledge exchange between disciplines at a global level, by analysing cross-disciplinary citations in journal articles, based on the world publication output in 1999. Among other findings, a central position of the Basic Life Sciences within the Life Sciences and of Physics within the Exact Sciences is shown. Limitations of analyses of interdisciplinary impact at the journal level are discussed. A second topic is a discussion of measures which may be used to quantify the rate of knowledge transfer between fields and the importance of work in a given field for other disciplines. Two measures are applied, which appear to be proper indicators of the impact of research on other fields. These indicators of interdisciplinary impact may be applied at other institutional levels as well. Counts of links to the websites of Australasian universities were calculated from the output of a specially designed crawler that covered universities in the UK, Australia and New Zealand. These figures were compared to those from the commercial search engines AltaVista and AllTheWeb. Web Impact Factors (WIFs) for Australasian universities were then calculated by dividing link counts from the three countries by academic staff numbers at each target university. 
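The WIF calculation just described is a simple ratio; a minimal sketch, with entirely hypothetical link counts and staff numbers, is:

```python
def web_impact_factor(inlinks, academic_staff):
    """WIF as described: count of links to a university's site
    divided by its number of academic staff, normalising link
    attraction for institution size."""
    return inlinks / academic_staff

# Hypothetical figures for two target universities (not real data).
wif_a = web_impact_factor(12500, 950)
wif_b = web_impact_factor(4300, 610)
# University A attracts proportionally more links per staff member.
```

Dividing by staff numbers rather than by page counts is a deliberate design choice in this line of work: page counts are themselves noisy crawler-dependent quantities, whereas staff numbers are a stable size measure.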
WIFs were compared with each other and also with a conventional measure of research output for Australia. It was discovered that the crawler-generated link counts were roughly proportional to those from AltaVista and AllTheWeb for Australia, albeit with some outliers in the data. WIFs correlated quite well with research output for Australia, but the relationship was not clear enough to be able to differentiate between the characters of the WIFs from the three different sources. However, a newly introduced measurement, the normalised propensity to link, suggests that the New Zealand university web is more insular than that of Australia. This paper introduces a citation-based 'systems approach' for analyzing the various institutional and cognitive dimensions of scientific excellence within national research systems. The methodology, covering several aggregate levels, focuses on the most highly cited research papers in the international journal literature. The distribution of these papers across institutions and disciplines enables objective comparisons of their (possible) international-level scientific excellence. By way of example, we present key results from a recent series of analyses of the research system in the Netherlands in the mid 1990s, focussing on the performance of the universities across the various major scientific disciplines within the context of the entire system's scientific performance. Special attention is paid to the contribution to the world's top 1% and top 10% most highly cited research papers. The findings indicate that these high-performance papers provide a useful analytical framework - in terms of transparency and of cognitive and institutional differentiation, as well as in their scope for domestic and international comparisons - providing new indicators for identifying 'world class' scientific excellence at the aggregate level. 
The average citation scores of these academic 'Centres of Scientific Excellence' appear to be an inadequate predictor of their production of highly cited papers. However, further critical reflection and in-depth validation studies are needed to establish the true potential of this approach for science policy analyses and the evaluation of research performance. In this paper, we develop and discuss a method to design a linkage scheme that links the systems of science and technology through the use of patent citation data. After conceptually embedding the linkage scheme in the current literature on science-technology interactions and associations, the methodology and algorithms used to develop the linkage scheme are discussed in detail. The method is subsequently tested on and applied to subsets of USPTO patents. The results point to highly skewed citation distributions, enabling us to discern between those fields of technology that are highly science-interactive and those fields where technology development is highly independent of the scientific literature base. A new index - Relative Publication Growth (RPG) - was suggested for characterizing the annual increase of publications in different selected periods. It has been revealed that the mean citedness of papers ("Chance for Citedness") increases in parallel with increasing RPG and a growing mean number of references in papers. The number of citations attainable by a paper published in a given journal may be estimated by multiplying the respective Journal Citedness Factor (JCF) by the Garfield Factor (GF) of the journal concerned. The JCF values may represent the aging of information, whereas the GFs represent the potential frequency of citations. The great importance of titles being highly informative is almost unanimously accepted in the literature, assuming that the more informative titles are, the more effectively they serve their functions. 
The most common measure of title "informativeness" has been the number of "significant" (i.e., non-trivial) words included in it, and one of the factors which might be associated with it is the length of the paper, measured by its number of pages. The present study attempted to test, in a large group of journals from different areas and over six decades, the hypothesis that a paper with more pages will have more "significant" words in its title. Large samples of original research papers were drawn from each decade year of twenty-four leading journals selected from the sciences, the social sciences and the humanities. For each paper, the number of "significant" words in the title was correlated with the number of pages. Findings indicate a difference between the scientific journals on the one hand, and the social sciences and humanities journals on the other. A moderate positive correlation was found in most scientific journals for many periods. In the social sciences journals, and to a greater extent, in the humanities journals, a significant positive correlation was limited to only a few periods, while the rest showed a very low correlation, or even a negative one. The different findings for the sciences are perhaps attributable to their unique inherent features. With the primary goal of exploring whether citation analysis using scientific papers found on the Web as a data source is a worthwhile means of studying scholarly communication in the new digital environment, the present case study examines the scholarly communication patterns in XML research revealed by citation analysis of ResearchIndex data and SCI data. Results suggest that citation analysis using scientific papers found on the Web as a data source has both advantages and disadvantages when compared with citation analysis of SCI data, but is nonetheless a valid method for evaluating scholarly contributions and for studying the intellectual structure in XML research. 
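The correlation described above (significant title words against page count) reduces to counting non-stopword title tokens and computing a Pearson coefficient; in this minimal sketch the stopword list, titles, and page counts are all illustrative assumptions:

```python
import math

STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "for", "to"}

def significant_words(title):
    """Count non-trivial words in a title (stopword list is illustrative)."""
    return sum(1 for w in title.lower().split() if w not in STOPWORDS)

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy sample: hypothetical titles paired with hypothetical page counts.
titles = ["On the theory of citation networks",
          "A note on impact",
          "Statistical models for collaborative authorship in large databases"]
pages = [18, 4, 31]
r = pearson([significant_words(t) for t in titles], pages)
```

A real replication would of course draw the large decade-by-decade journal samples the study describes; the sketch only shows the shape of the computation.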
This paper continues the study of the age structure of scientific collaboration in Chinese computer science. Based on an extended database, a new method is used to analyze the nature and preference of collaboration. Observed values of two-, three-, and four-dimensional collaboration were compared with their expected values. The investigation covered co-authors' combination patterns and name orderings in their papers, with particular attention to the age of the first author. Science is the core sector of present-day knowledge production. Yet the mechanisms of science as an industry are poorly understood. The economic theory of science is still in its infancy, and philosophy of science has only sparsely addressed the issue of economic rationality. Research, however, is costly. Inefficient use of resources consumed by the scientific industry is as detrimental to the collective advancement of knowledge as are deficiencies in method. Economic inefficiency encompasses methodological inadequacy. Methods are inadequate if they tend to misallocate time and effort. If one omits the question of how inputs are transformed into outputs in self-organised knowledge production, this means neglecting an essential aspect of the collective rationality of science. A self-organised tendency towards efficiency comes to the fore as soon as science is described as an economy in which researchers invest their own attention in order to obtain the attention of others. Viewed like this, scientific communication appears to be a market where information is exchanged for attention. Scientific information is measured in terms of the attention it earns. Since scientists demand scientific information as a means of production, the attention that a theory attracts is a measure of its value as a capital good. On the other hand, the attention a scientist earns is capitalised into the asset called reputation. 
Elaborating the ideas introduced in Franck (1998) and (1999), the paper describes science as a highly developed market economy. Science conceived as a capital market covers the specific conditions under which scientists, while maximising their reputation, optimise output in the eyes of those competent to judge. Attention is not just any resource. It is the resource whose efficient use is called intelligence. Science, as an industry transforming attention into cognitive output, is bound to miss the hallmark of rationality if it does not pass a test of collective intelligence. The paper closes by considering the prospective outcome of such a test. The paper presents the results of an examination of gender differences in scientific productivity on a sample of 840 respondents, half the young scientific population in Croatia. In the last decade gender differences in the scientific productivity of young researchers have increased, which may be the result of introducing a more competitive scientific system. Young female researchers publish an average of two scientific papers less than their male counterparts in five years, and their publications reach 70.6% of males' publication productivity in the same period. In the case of both sexes, about 15% of researchers publish about half of all research papers, but even the most productive women publish less than their male counterparts. Socio-demographic, educational and qualification-related predictors contribute more or less equally to the number of scientific publications by women and men. It is not until we introduce structural variables that a strong sex differentiation appears, because these factors are much more powerful in explaining the production of women. They show that female scientists' publication productivity is more strongly influenced by their position in the social organization of science. There are also considerable sex differences in the case of individual productivity predictors. 
International contacts determine the number of papers by female scientists most of all. Attendance at scientific conferences abroad is the most powerful predictor of male productivity too, but reviewing colleagues' papers and academic degree are also very important. The Erdos number (EN) for collaborative papers among mathematicians was defined as the topological distance in the graph depicting co-authorship relations, i.e., EN = 1 for all co-authors of Paul Erdos; EN = 2 for their co-authors who did not publish jointly with Erdos; etc. A refinement of this notion uses resistance distances, leading to rational Erdos numbers (REN), which (as their name indicates) are rational numbers. For acyclic graphs, EN = REN, but for graphs with circuits these numbers differ. Further refinements are possible by weighting the edges of the co-authorship graph according to the number of jointly authored papers. An analysis of 1223 papers published by India (347 papers) and China (876 papers) at conferences and in journals during 1993 and 1997 in the field of laser S&T indicates that China's output was twice that of India. However, the Activity Indices of the two countries in 1993 and 1997 were almost the same. Chinese scientists preferred to publish in domestic journals, while Indian scientists published in foreign journals. The number of papers by Indian scientists in SCI-covered journals and in journals with high Normalized Impact Factors was greater than for China; thus India was better connected to mainstream science than China. Indian papers also made a greater impact than Chinese papers, as reflected by normalized impact per paper, the proportion of papers in high-quality journals, and the publication effective index, and they received more citations per paper. Team research appears to be stronger in China than in India, as reflected by the number of mega-authored papers produced by the two countries. 
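The resistance-distance refinement in the Erdos-number abstract above can be made concrete with a small sketch. The toy graphs, function names, and the use of NumPy's Laplacian pseudoinverse are illustrative assumptions, not details taken from the paper; the resistance distance is computed as R(i,j) = L+[i,i] + L+[j,j] - 2*L+[i,j].

```python
import numpy as np
from collections import deque

def erdos_numbers(adj, source=0):
    """Ordinary EN: shortest-path (BFS) distance from the source node."""
    n = len(adj)
    dist = [None] * n
    dist[source] = 0
    q = deque([source])
    while q:
        u = q.popleft()
        for v in range(n):
            if adj[u][v] and dist[v] is None:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def resistance_distances(adj, source=0):
    """REN-style measure: effective resistance from the source to each node."""
    A = np.array(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    Lp = np.linalg.pinv(L)                # Moore-Penrose pseudoinverse
    return [Lp[source, source] + Lp[i, i] - 2 * Lp[source, i]
            for i in range(len(adj))]

# Path graph 0-1-2 (acyclic): EN and REN coincide ([0, 1, 2]).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Triangle 0-1-2-0 (contains a circuit): the parallel path lowers the
# effective resistance, so REN for a co-author drops from 1 to 2/3.
tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(erdos_numbers(path), resistance_distances(path))
print(erdos_numbers(tri), resistance_distances(tri))
```

On the path graph both measures agree, while on the triangle the co-authors of node 0 get a rational value of 2/3 rather than 1, matching the abstract's claim that EN = REN only for acyclic graphs.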
This paper reports an evaluation of Spanish educational research journals using reputation measures inferred from survey data. Univariate and multivariate patterns are offered. Specifically, cluster analysis and non-parametric multidimensional scaling prove to be useful methods for investigating the complexity of this scientometric question, the evaluation of periodical series. Publications originating from Turkey in the SSCI were analyzed for changes over the thirty-year span between 1970 and 1999. There has been a high rate of increase in the number of publications, and most of these publications were in the form of articles and review papers. The rate of increase was lower than that for science publications, but the rankings among other countries in the sciences and social sciences were comparable. The analysis of impact factors and citations received by published work showed a decline across years. Many of the high-impact publications were joint work with foreign authors. The low level of impact was attributed in part to the difficulty international scholars face in belonging to research networks. The contribution of Turkish researchers to the sciences is increasing. Turkish scientists published more than 6,000 articles in 1999 in scientific journals indexed by the Institute for Scientific Information's Science Citation Index, which puts Turkey in 25th place in the world rankings in terms of total contribution to science. The number of biomedical publications authored by Turkish scientists is increasing faster than that of engineering and other non-medical sciences, which might be one of the main causes of the steep rise in Turkey's rankings witnessed in recent years. More specifically, researchers affiliated with Hacettepe University produce almost a quarter of all the biomedical publications of Turkey that appear in the international biomedical literature. 
In this paper, we report the bibliometric characteristics (authors and affiliations, medical journals and their impact factors, among others) of a total of 1,434 articles published between 1988 and 1997 by scientists affiliated with Hacettepe University Faculty of Medicine and indexed in MEDLINE, a well-known biomedical bibliographic database. We present some results of an evaluation of the research performance of Spanish senior university researchers in Geology. We analyse to what extent the productivity of individual researchers is influenced by the level of consolidation of the team they belong to. The methodology is based on the combination of a mail survey carried out among a defined set of researchers and a bibliometric study of their scientific output. Differences among researchers have been investigated with regard to team size and composition, patterns of publication in domestic and foreign journals, productivity, co-authorship of papers, and impact of publications. Results indicate that not belonging to a research team is a handicap when publishing in top international journals. Researchers belonging to consolidated teams are more productive than their colleagues in non-consolidated teams, and these in turn are more productive than researchers without a team. Team size does not appear to be as important for scientific productivity as the number of researchers within the team who have reached a stable job position. Analysis of the impact factor of journals has not revealed differences among researchers with regard to the visibility of their papers. The dominant method currently used to improve the quality of Internet search systems is often called "digital democracy." 
Such an approach implies using the majority opinion of Internet users to determine the most relevant documents: for example, citation-index usage for sorting search results (google.com), or enriching a query with terms that are frequently asked in relation to the query's theme. "Digital democracy" is an effective instrument in many cases, but it has an unavoidable shortcoming, which is a matter of principle: the average intellectual and cultural level of Internet users is very low; everyone knows what kind of information is dominant in Internet query statistics. Therefore, when one searches the Internet by means of "digital democracy" systems, one gets answers that reflect an underlying assumption that the user's intellectual potential is very low and that his cultural interests are not demanding. Thus, it is more correct to use the term "digital ochlocracy" for Internet search systems based on "digital democracy." Based on the well-known mathematical mechanism of linear programming, we propose a method to solve the indicated problem. Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article makes a first attempt at this by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total count here) and one for the number of papers with n authors, n an element of N. Based on a convolution model developed earlier by Egghe, interpreted and reworked here for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the empirical ones found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law. 
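The interplay of the two Lotka laws in the preceding abstract can be illustrated with a minimal Monte Carlo sketch: each author's fractional score is the sum of 1/k over their papers, k being each paper's author count, with both papers-per-author and authors-per-paper drawn Lotka-style (probability proportional to 1/n^2). The sampler, the truncation at five, and the sample sizes are assumptions for illustration only; this does not reproduce Egghe's analytical convolution model.

```python
import random
from collections import Counter

def lotka_sample(nmax=5):
    """Draw n in 1..nmax with probability proportional to 1/n**2 (Lotka, exponent 2)."""
    weights = [1 / n**2 for n in range(1, nmax + 1)]
    return random.choices(range(1, nmax + 1), weights=weights)[0]

random.seed(1)
scores = Counter()
for _ in range(10_000):                                   # simulated authors
    papers = lotka_sample()                               # papers per author
    score = sum(1 / lotka_sample() for _ in range(papers))  # 1/k credit per paper
    scores[round(score, 2)] += 1

# The fractional scores pile up at values such as 0.33, 0.5, 1.0, 1.5, ...,
# producing the characteristic irregular ("spiky") frequency distribution.
for s, freq in sorted(scores.items())[:10]:
    print(s, freq)
```

Even this crude simulation shows why fractional frequency distributions look irregular: the spikes at simple fractions are a direct consequence of the 1/k crediting, not a breakdown of Lotka's law.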
The potential impact of the Internet on the public's demand for the services and resources of public libraries is an issue of critical importance. The research reported in this article provides baseline data concerning the evolving relationship between the public's use of the library and its use of the Internet. The authors developed a consumer model of the American adult market for information services and resources, segmented by use (or nonuse) of the public library and by access (or lack of access) to, and use (or nonuse) of, the Internet. A national Random Digit Dialing telephone survey collected data to estimate the size of each of six market segments, and to describe their usage choices between the public library and the Internet. The analyses presented in this article provide estimates of the size and demographics of each of the market segments; describe why people are currently using the public library and the Internet; identify the decision criteria people use in their choices of which provider to use; identify areas in which libraries and the Internet appear to be competing and areas in which they appear to be complementary; and identify reasons why people choose not to use the public library and/or the Internet. The data suggest that some differentiation between the library and the Internet is taking place, which may very well have an impact on consumer choices between the two. Longitudinal research is necessary to fully reveal trends in these usage choices, which have implications for all types of libraries in planning and policy development. There are signs that information architecture is coalescing into a field of professional practice. However, if it is to become a profession, it must develop a means of educating new information architects. Lessons from other fields suggest that professional education typically evolves along a predictable path, from apprenticeships to trade schools to college- and university-level education. 
Information architecture education may develop more quickly to meet the growing demands of the information society. Several pedagogical approaches employed in other fields may be adopted for information architecture education, as long as the resulting curricula provide an interdisciplinary approach and balance instruction in technical and design skills with consideration of theoretical concepts. Key content areas are information organization, graphic design, computer science, user and usability studies, and communication. Certain logistics must be worked out, including where information architecture studies should be housed, what kinds of degrees should be offered, and at what levels. The successful information architecture curriculum will be flexible and adaptable in order to meet the changing needs of students and the marketplace. The article presents a matrix that can serve as a tool for designing the information architecture of a Web portal in a logical and systematic manner. The information architect begins by inputting the portal's objective, target user, and target content. The matrix then determines the most appropriate information architecture attributes for the portal by filling in the Applied Information Architecture portion of the matrix. The article discusses how the matrix works using the example of a children's Web portal providing access to museum information. This article suggests that Information Architecture (IA) design is primarily an inductive process. Although top-level goals, user attributes, and available content are periodically considered, the process involves bottom-up design activities. IA is inductive partly because it lacks internal theory, and partly because it is an activity that supports emergent phenomena (user experiences) arising from basic design components. 
The nature of IA design is well described by Constructive Induction (CI), a design process that involves locating the best representational framework for the design problem, identifying a solution within that framework, and translating it back to the design problem at hand. The future of IA, whether it remains inductive or develops a body of theory (or both), is considered. A system whose information architecture allows users to navigate easily and recover quickly from mistakes is often described as highly usable. But usability in systems design goes beyond a good interface and efficient navigation. In this article we describe two database systems in a law enforcement agency. One system is a legacy, text-based system with cumbersome navigation (RMS); the newer system has a graphical user interface with simplified navigation (CopNet). It is hypothesized that law enforcement users will evaluate CopNet more highly than RMS, but that experts in the older system will evaluate it more highly than others will. We conducted two user studies. The first examined what users thought of RMS and CopNet, and compared RMS experts' evaluations with those of nonexperts. We found that all users evaluated CopNet as more effective, easier to use, and easier to navigate than RMS, and this was especially noticeable for users who were not experts in the older system. The second, follow-up study examined use behavior after CopNet had been deployed for some time. The findings revealed that evaluations of CopNet were not associated with its use. If the newer system had a better interface and was easier to navigate than the older, legacy system, why were law enforcement personnel reluctant to switch? We discuss reasons why switching to a new system is difficult, especially for those who are most adept at using the older system. Implications for system design and usability are also discussed. Information interaction is the process that people use in interacting with the content of an information system. 
Information architecture is a blueprint of, and navigational aid to, the content of information-rich systems. As such, information architecture performs an important supporting role in information interactivity. This article elaborates a model of information interactivity that crosses the "no-man's land" between user and computer, articulating a model that includes user, content, and system, and illustrating the context for information architecture. Creating an information architecture for a bilingual Web site presents particular challenges beyond those that exist for single-language and multilanguage sites. This article reports work in progress on the development of a content-based bilingual Web site to facilitate the sharing of resources and information between Speech and Language Therapists. The development of the information architecture is based on a combination of two aspects: an abstract structural analysis of existing bilingual Web designs focusing on the presentation of bilingual material, and a bilingual card-sorting activity conducted with potential users. Issues for bilingual developments are discussed, and some observations are made regarding the use of card-sorting activities. This study examines patterns of authorship in nineteen Egyptian journals of agricultural science. Multiple authorship was found to be the predominant trend in the field, and co-authored papers accounted for some 79 percent of the sample. The most common form of multiple authorship involved three people. Considerable variation was found among sub-fields, and co-authorship was found to be most common in the social-science-related agricultural disciplines. The author found no significant differences in patterns of collaboration in the agricultural sciences between Egypt and the two other developing countries for which comparative data were available, India and Pakistan. 
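The authorship measures reported in the Egyptian study above reduce to simple counts over the list of author numbers per paper: the share of papers with more than one author, and the modal team size. A minimal sketch with invented sample data:

```python
from collections import Counter

# Invented authors-per-paper counts; the real study sampled nineteen journals.
author_counts = [1, 3, 3, 2, 4, 3, 1, 2, 3, 5]

co_authored = sum(1 for k in author_counts if k > 1)
share = co_authored / len(author_counts)            # proportion co-authored
modal_team = Counter(author_counts).most_common(1)[0][0]  # most common team size

print(f"co-authored share: {share:.0%}, most common team size: {modal_team}")
```

With these invented data the co-authored share is 80% and the modal team size is three, mirroring the form (though not the values) of the reported results.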
This paper analyses trends and volatility in patenting in the USA by Japan in the electronics/electrical and motor vehicle/transport equipment industries. The number of patents has increased steadily, with the two industries accounting in 1997 for one-half of Japanese patents in the USA. The electronics/electrical industry has been a much stronger performer, with a 30% share of US patents, compared with 20% for motor vehicle/transport equipment. Using monthly data for 1975-1997, the time-varying nature of the volatility of patents registered in the USA is examined. The asymmetric AR(1)-GJR(1,1) model is found to be suitable for motor vehicle/transport equipment, whereas the AR(1)-GARCH(1,1) and AR(1)-GJR(1,1) models provide interesting results for electronics/electrical. Although Derwent Biotechnology Abstracts has been used in a variety of bibliometric studies, it has never undergone a systematic examination of its reliability and validity. The objective of this paper is to assess its quality for bibliometric studies attempting to analyse the evolution of biotechnology research, to map leading organizations, and to study the interaction between science and technology. The first part reviews the tools used in bibliometric studies of biotechnology and describes the Derwent Biotechnology Abstracts database. The second part is a case study of plant genetic research, with special emphasis on Canada. An analysis of 952 publications by Indian scientists abstracted in the Journal of Current Laser Abstracts during 1970-1994 indicates that laser research in India picked up during 1978-1994 and reached its peak in 1980. The Indian output in the field of laser research forms an integral part of mainstream science, as reflected by the pattern of publications and their citations in the international literature. 
Laser research performed in India improved considerably during 1985-1994 compared with 1970-1984, as seen from different impact indicators such as citations per paper, the proportion of high-quality papers, and the publication effective index. The publication output is concentrated among a few institutions, and there is a similarity in the activity and attractivity profiles of the highly productive institutions. India's citation rate per paper for highly productive authors is on a par with the world citation rate per paper. The study indicates that the proportion of mega-authored papers increased during 1990-1994 and that international collaboration is mainly with the USA. An analysis of gender differences in psychology in India provides a quantitative and qualitative assessment of the R&D output contributed by psychologists, indicating trends in growth, skewness, relatedness, and co-authorship patterns of productivity. This study explores the intellectual structure and interdisciplinary breadth of Knowledge Management in its early stage of development. The intellectual structure is established by a principal component analysis applied to an author co-citation frequency matrix. The author co-citation frequencies were derived from the 1994-1998 academic literature and captured by the single search phrase "Knowledge Management." Four factors were labeled Knowledge Management, Organizational Learning, Knowledge-based Theories, and The Role of Tacit Knowledge in Organizations. The interdisciplinary breadth surrounding Knowledge Management occurs mainly in the discipline of management. Empirical evidence suggests that the discipline of Computer Science is not, as originally hypothesized, a key contributor. Certain similarities between the types of data reported in retrospective citation analyses and lifetime/survival/reliability models are noted. 
Graphical techniques much used in reliability analyses are exploited to throw further light on observed citation age distributions, which are then compared and contrasted with previously reported studies. These simple techniques allow systematic departures of empirical data from assumed theoretical models to be highlighted and the models to be compared. On the basis of the measured frequency distribution of China's inter-regional co-authored papers covered by the Chinese Science Citation Database, this paper shows the pattern of China's inter-regional research collaboration (IRRC) and analyzes how the collaborative pattern was formed. A new method is used to calculate the expected value matrix based on an observed value matrix of IRRC, which is asymmetric and has no diagonal elements. The results fall into three groups: 1) regional scientific productivity affects both collaborative preference and the ranking of authors' names; 2) geographical proximity is an important factor determining the pattern of IRRC; 3) when Salton's measure is used, regional mean collaborative strength increases as regional productivity increases and as the distance between two regions decreases. Results of searches using a variety of query formulations with several Internet search engines show that strategies intended to give narrower and more precise results may not improve precision even though they result in fewer hits. Searches were performed by students in graduate information retrieval courses using different formulations for the same topic. This article is the fifth in a series of articles from our study examining information-seeking behavior in relation to information-retrieval (IR) interaction. This article focuses on the examination of the interaction variables within Saracevic's (1989) triadic IR model. 
The analysis involved an examination of the information-searching behavior of academic researchers during a mediated interaction with an IR system, concentrating particularly on the interaction between the information seeker, the search intermediary, and the IR system. To explore the variables during mediated search interaction, two small-scale studies of mediated on-line searching were conducted at the University of Sheffield. The studies involved mainly qualitative analysis of interview transcripts and on-line search results, together with quantitative analysis of questionnaire results. The studies specifically investigated: (1) aspects of the mediated search process, (2) relevant information sources, and (3) interaction measures derived from search logs and tape transcripts, and related interaction measures. Findings include: (1) a number of different types of interaction were identified, (2) the presearching interactions between information seeker and intermediary helped the information seeker to identify their idea and problem, and (3) most information seekers in this study were at the problem definition or problem resolution stage following the search process. From this research, it is clear that the interaction did affect the search process. The intermediary helped the users to identify their search terms more clearly and to focus on the references obtained. In most cases, the users and intermediary considered the communication process very effective, and the interactions that took place during the on-line search were found to affect the users' perceptions of the problem, personal knowledge, and relevance judgments. The interaction process helped the users to obtain very useful results with help from the intermediary. In general, the users gave a positive evaluation of the retrieved answers in terms of focus, completeness, novelty, and degree of nonrelevancy. 
We report findings from a recent study of how public libraries are using on-line community networks to facilitate the public's information seeking and use in everyday situations. These networks have been lauded for their potential to strengthen physical communities through increasing information flow about local services and events, and through facilitating civic interaction. However, little is known about how the public uses such digital services and what barriers they encounter. This article presents findings from a 2-year study that comprised a national survey with public library staff, followed by extensive case studies in three states. At each site, data were collected using on-line surveys, field observation, in-depth interviews, and focus groups with Internet users, human service providers, and library staff. The on-line surveys and the follow-up interviews with respondents were based on sense-making theory. In our article we discuss: (1) how the public is using networked community information systems and the Internet for daily problem solving, (2) the types of barriers users encounter, and (3) the benefits for individuals and physical communities from public library-community networking initiatives and the emergence of "information communities." This article presents a case study of the information-seeking behavior of 7-year-old children in a semistructured situation in their school library media center. The study focuses on how young children who are in the process of learning to read cope with searching for information in a largely textual corpus, and how they make up for their deficit in textual experience. Children's search strategies are examined and discussed in the context of computer versus shelf searching, textual versus visual searching, and in comparison with adult search dimensions previously established. As the Web becomes more popular, the interest in effective navigation is increasing. 
Menu design is becoming a central issue of human-computer interface design as the focus of computer applications moves from the computer as a machine to the human as a user. The purpose of this study was to investigate the effect of three different Web menu designs (a simple selection menu, a global and local navigation menu, and a pull-down menu) on users' information-seeking performance and attitudes. Three cyber-shopping-mall Web sites were developed for the experiment. These Web sites had the same content and a constant information structure, but each had a different menu design. The results showed different effects of menu design on both searching performance and browsing performance. More specifically, participants' searching performance was superior in the pull-down menu condition compared to the global and local navigation menu and simple selection menu conditions. Browsing task performance was fastest with the global and local navigation menu. However, there were no significant differences among the three menu designs in terms of users' perception of the appeal of the Web site or disorientation. This article presents a genetic relevance optimization process performed in an information retrieval system. The process uses genetic techniques for solving multi-modal problems (niching) and query reformulation techniques commonly used in information retrieval. The niching technique allows the process to reach different relevance regions of the document space. Query reformulation techniques represent domain knowledge integrated into the structure of the genetic operators to improve the convergence conditions of the algorithm. Experimental analysis performed using a TREC subcollection validates our approach. The authors investigate the influence of index term distributions and indexing exhaustivity levels on the document space within a visual information retrieval environment called DARE. 
Using combinations of three levels of term distribution (shallow, observed, steep) and indexing exhaustivity (low, observed, high), hypothetical document sets were generated and projected onto the DARE environment. The results from the simulated document sets demonstrate the importance of term distribution and exhaustivity characteristics for the density of document spaces and their implications for retrieval, particularly when different term weighting schemes are used. The results also demonstrate how different combinations of exhaustivity and term distribution may result in similar document space density characteristics. This article reports findings from a study of the geographic distribution of foreign authors in the Journal of the American Society for Information Science and Technology (JASIST) and the Journal of Documentation (JDoc). Bibliographic data about foreign authors and their geographic locations over a 50-year publication period (1950-1999) are analyzed per 5-year period for both JASIST and JDoc. The distribution of foreign authors by geographic location was analyzed for overall trends in JASIST and JDoc. UK and Canadian authors are the most frequent foreign authors in JASIST; authors from the United States and Canada are the most frequent foreign authors in JDoc. The top 10 geographic locations with the highest numbers of foreign authors and the top 10 most productive foreign authors were also identified and compared for their characteristics and trends. This article presents a concrete example of a work task: a doctor treating a patient suffering from schizophrenia. The article outlines how work task, work situation, perceived work situation, task complexity, information need, information seeking and topicality, situational relevance, relevance assessment (including a discussion of system relevance and algorithmic relevance), and work task fulfillment may be understood. Relevance is defined as something serving as a tool to a goal. 
"Tool" is understood in the widest possible sense, including ideas, meanings, theories, and documents as tools. This study explores Malaysian computer science and information technology publication productivity. A total of 547 unique Malaysian authors, affiliated with 52 organizations in Malaysia, contributed 461 publications between 1990 and 1999, as indicated by data collected from three Web-based databases. The majority (378, or 69.1%) of authors wrote one publication. The productive authors, the number of their papers, and the position of their names in the articles are listed to indicate their productivity and degree of involvement in their research publications. Researchers from the universities contributed about 428 (92.8%) publications. The three most productive institutions together account for a total of 258 (56.0%) publications. The publications comprise 197 (42.7%) journal articles, 263 (57.1%) conference papers, and 1 (0.2%) monograph chapter. The results indicate that the scholars published in a few core proceedings but contributed to a wide variety of journals. Thirty-nine fields of research undertaken by the scholars are also revealed. The possible reasons for the amount and pattern of contributions are related to the size of the researcher population in the country, the availability of refereed scholarly journals, and the total expenditure allocated to information, computers, and communication technology (ICCT) research in Malaysia. Can the inclusion of new journals in the Science Citation Index be used as an indication of structural change in the database, and how can this change be compared with reorganizations of relations among previously included journals? Change in the number of journals (n) is distinguished from change in the number of journal categories (m). Although the number of journals can be considered as given at each moment in time, the number of journal categories is based on a reconstruction that is time-stamped ex post. 
The reflexive reconstruction needs an update when new information becomes available in a subsequent year. Implications of this shift towards an evolutionary perspective are specified. All known previous Web link studies have used the Web page as the primary indivisible source document for counting purposes. Arguments are presented to explain why this is not necessarily optimal and why other alternatives have the potential to produce better results. This is despite the fact that individual Web files are often the only choice if search engines are used for raw data, and are the easiest basic Web unit to identify. The central issue is that of defining the Web "document": that which should comprise the single indissoluble unit of coherent material. Three alternative heuristics are defined for the educational arena, based upon the directory, the domain, and the whole university site. These are then compared by implementing them on a set of 108 UK university institutional Web sites, under the assumption that a more effective heuristic will tend to produce results that correlate more highly with institutional research productivity. It was discovered that the domain and directory models were able to reduce the impact of anomalous linking behavior between pairs of Web sites, with the latter being the method of choice. Reasons are then given as to why a document model on its own cannot eliminate all anomalies in Web linking behavior. Finally, the results from all models give a clear confirmation of the very strong association between the research productivity of a UK university and the number of incoming links from its peers' Web sites. Knowledge management (KM), or knowledge sharing in organizations, is based on an understanding of knowledge creation and knowledge transfer. In implementation, KM is an effort to benefit from the knowledge that resides in an organization by using it to achieve the organization's mission. 
The transfer of tacit or implicit knowledge to explicit and accessible formats, the goal of many KM projects, is challenging, controversial, and endowed with ongoing management issues. This article argues that effective knowledge management in many disciplinary contexts must be based on understanding the dynamic nature of knowledge itself. The article critiques some current thinking in the KM literature and concludes with a view towards knowledge management programs built around knowledge as a dynamic process. This article examines the nature of Knowledge Management: how it differs from Data Management and Information Management, and its relationship to the development of Expert Systems and Decision Support Systems. It also examines the importance of Communities of Practice and Tacit Knowledge for Knowledge Management. The discussion is organized around five explicit questions. One: What is "knowledge"? Two: Why are people, especially managers, thinking about Knowledge Management? Three: What are the enabling technologies for Knowledge Management? Four: What are the prerequisites for Knowledge Management? Five: What are the major challenges for Knowledge Management? Virtual teams are becoming a preferred mechanism for harnessing, integrating, and applying knowledge that is distributed across organizations and in pockets of collaborative networks. In this article we recognize that knowledge application, among the three phases of knowledge management, has received little research attention. Paradoxically, this phase contributes most to value creation. Extending communication theory, we identify four challenges to knowledge integration in virtual team environments: constraints on transactive memory, insufficient mutual understanding, failure in sharing and retaining contextual knowledge, and inflexibility of organizational ties. We then propose knowledge management system (KMS) approaches to meet these challenges. 
Finally, we identify promising avenues for future research in this area. Knowledge management is discussed in the context of "articulation" work, that is, routine interactions in groups of local practice. In such situations, knowledge is largely acquired and maintained by learning from the appropriate behavior of others by means of "organizational ethology." This phenomenon is described as "mundane knowledge management." The concepts of mundane knowledge management and organizational ethology are explored in a case study of a project to promote virtual enterprise formation. Evaluation of the project prototype, a platform for online cooperative work, suggests that unless design provides adequate social and technical cues for the work at hand, the mundane knowledge that sustains cooperative work may be compromised by ethological breakdown. A major professional concern for people undertaking knowledge management initiatives is interpreting the field for managers and others in their organization. This exploratory research sought to investigate the dynamics of knowledge in organizations: how (if at all) it was perceived, interpreted, utilized, and integrated into the functions, processes, and outputs of the organization. Three organizations with differing functions and outputs were studied: a law firm, an educational institution, and a suburban local council, each between 100 and 200 employees in size. Semistructured interviews (both individual and small group) were carried out with people at all levels to gather perceptions of the dynamics of knowledge in each organization. It was found that knowledge structures and cultures differed substantially between organizations, and were heavily influenced by the commercial environment and the governing structures. People at all levels had substantial awareness of the nature of knowledge within the organization, and there were a significant number of initiatives targeted at improving the way that knowledge was used. 
The concept of knowledge itself was quite unproblematic, although it was considerably more complex and nuanced than most definitions allow for. Information services had an important, although not a central, role in knowledge dynamics. These findings raise a number of questions about the suitability of much knowledge management theory. One form of knowledge management is the use of measures: to foster learning, to transform individual tacit understanding into shared explicit sensemaking, to evaluate and improve processes and customer service, and even to rationalize and control organizational activities and workers. This article summarizes and applies four theoretical approaches (organizational learning, sensemaking, quality management, and critical theory) to explore how measures are constructed, interpreted, and used within organizational settings as forms of knowledge management. The primary principles, the role of communication, and the role of measures are summarized for each approach. The article ends by discussing some implications of measures in general, and of this multitheoretic conceptualization of measures in particular, for knowledge management. This article engages one of the most important concepts in Knowledge Management, namely, the concept of "social capital," focusing upon the problem of measure and value in capitalism, specifically within the period and conditions of post-Fordist production. The article engages work that has emerged out of the Italian Workerist and Autonomist Marxist movements (as well as French post-structuralist theory) since the 1960s, and it particularly focuses upon the work of the contemporary Italian philosopher and political activist Antonio Negri. In doing so, it presents a more politically "Left" development of the concept of social capital than is often possible within the largely Management-defined discourses common to Knowledge Management. 
At the same time, however, the article points to the importance of Knowledge Management as a symptom of a turn in political economy, even though Knowledge Management, because of its provenance, has been unable to fully explore social capital as a shift in capitalist notions of value. In this article we study directed acyclic graphs. We introduce the head and tail order relations and study some of their properties. Recalling the notions of generalized bibliographic coupling and generalized co-citation, and introducing a new property, called the l-property, we arrive at a characterization of lattices. As document citation networks are concrete realizations of directed acyclic graphs, all our results are directly applicable to citation analysis. An investigation into the pattern of international interlinking between Asia-Pacific university Web sites is described. AltaVista advanced searches were used for the data collection, and network diagrams were used to portray the results from four perspectives. It was found that each of the four angles allowed novel interpretations of the data, but that Australia and Japan were nevertheless clearly at the heart of the Web in the region, with Australia being a particularly common target of links and Japan having a more balanced profile of ingoing and outgoing hyperlinks. Interestingly, one of the perspectives mimicked an official grouping of less wealthy countries in the region whilst another contained the more developed countries, with Singapore and Thailand appearing in both. It was hypothesised that the nature of the larger Web sites covered was qualitatively different from that of smaller ones, making the deduction of relationships between the hosting institutions difficult from the link counts alone. We analyse to what extent the research collaboration and performance of individual scientists are influenced by the level of consolidation of the team they belong to. 
A case study of Spanish senior university researchers in Geology is performed. The methodology is based on the combination of a mail survey carried out among a defined set of researchers and a bibliometric study of their scientific output. Results provide support for the hypothesis that consolidation of research teams would result in a greater facility to establish contacts and collaborations with colleagues that could benefit all members of the team, fostering their participation in funded projects and favouring their potential to publish in international mainstream journals. Social network analysis is an important research tradition in structural sociology and has contributed much to our understanding of inter- and intraorganizational relations. Of particular significance is the contribution of social network analysis to the definition of community. Communities, whether traditional or scientific, can be effectively thought of as a series of positions and roles. This paper proposes four hypotheses about a select group of management scholars (laureates) and the network ties that connect them. Laureates were asked to identify individuals who had influenced their intellectual development and work in the management discipline. An invisible college in the traditional sense did not exist; rather, a complex series of intellectual neighborhoods were identified. These neighborhoods, as contrasted with true communities or colleges, were small, uncoordinated, and fragmented. This article examines the contrast between Brazil's impressive scientific development over the last 25 years and its lagging innovative capacity. As Brazil is the world's eighth largest economy, an evaluation of this link offers insights not only for Brazil but also for parallel developments in other emerging countries. The growth rate of Brazilian scientific production has exceeded the international average, showing a six-fold increase in the last twenty-five years. 
However, Brazil's innovation capability is still unsatisfactory and, in contrast to its scientific production, is failing to grow significantly. This hiatus between the generation of science and innovation is also typical of other emergent countries. A possible remedy is suggested: interaction with universities and research centers is particularly necessary for Brazilian companies, far more so than in industrialized countries. By comparing the citation patterns of Korean researchers in physics and mechanical engineering, this study identifies the extent to which type of publication source (Korean non-SCI, Korean SCI, and international SCI) and type of authorship (purely Korean authors, Korean-foreign co-authors, and foreign-Korean co-authors) influence the choice of sources cited by Korean scientists. Koreans publishing physics or mechanical engineering papers in international SCI journals are more likely to cite articles published in journals of the science mainstream countries (the U.S., the U.K., the Netherlands, and Germany) than articles published in national journals, while Koreans publishing in Korean journals tend to cite articles published in national journals. In terms of authorship, articles published in mainstream journals are more highly cited by internationally co-authored papers than by Korean-authored papers in both disciplines. A comparison has been carried out between the scientific production of Turkish physicists in the periods 1961-1971 and 1994-2000, considering articles (written singly or in collaboration with scientists of different nationalities) that have received at least ten citations. The results show that in 30 years, appreciable increases have occurred in the number of authors making significant contributions and in the number of papers based on research carried out in Turkey. 
This article aimed to report the Journal Impact Factor (J-IF) and Journal Immediacy Index (J-II) of 68 Thai academic journals over the past five years (1996 to 2000), using the calculation method given by the Institute for Scientific Information (ISI). This was the first time that citation indexes of Thai academic journals were established. With respect to the journal impact factor, the results showed that only six journals had been cited continuously during the past five years, this being 8.8% of the journals selected in this work. It was also noticeable that articles published in older journals tended to have a greater opportunity to be cited and a higher journal impact factor. The average impact factor of the 68 journals was relatively low, at 0.069, suggesting that the probability of an article published in a national journal being cited was only 6.9%. In terms of the immediacy index, it was found that the average immediacy index value was 0.063, which was again very low. No significant relationship between journal age and the immediacy index could be observed. 47% of the journals never produced an immediacy index in the past five years, suggesting that articles in Thai academic journals were hardly cited within the same year they were published. Usage and user feedback about a state digital library, in which the developers/designers, content providers, different types of libraries and their staffs, and a variety of user groups represent a loose federation of separate organizations with diverse expectations and needs, are investigated. Through corroboratory evidence from usage statistics of Internet-based database services available through the digital library, responses to a state-wide-administered library survey, and a Web-based survey of end users, the authors identify contributing factors for the organizational usability of state digital libraries. 
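The ISI calculations used above for the Thai journals reduce to two simple ratios: the impact factor relates citations in a year to items published in the two preceding years, and the immediacy index relates same-year citations to same-year items. A sketch of the standard definitions (the data layout here is my assumption, not the authors' code):

```python
def impact_factor(citations, publications, year):
    """ISI J-IF for `year`: citations in `year` to items published in the two
    preceding years, divided by the number of items published in those years.
    citations: {(citing_year, cited_year): count}; publications: {year: items}."""
    cites = citations.get((year, year - 1), 0) + citations.get((year, year - 2), 0)
    items = publications.get(year - 1, 0) + publications.get(year - 2, 0)
    return cites / items if items else 0.0

def immediacy_index(citations, publications, year):
    """ISI J-II for `year`: citations in `year` to items published in `year`,
    divided by the number of items published in `year`."""
    items = publications.get(year, 0)
    return citations.get((year, year), 0) / items if items else 0.0
```

On this definition, an average J-IF of 0.069 means that an article from the previous two years drew, on average, 0.069 citations in the measurement year.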
The authors refine and enhance an organizational usability model for the unique environment of state digital libraries and identify three modes of interaction (influence, communication, activity) and the challenges each presents: addressing diverse player needs and expectations; unequal awareness and training in using state digital libraries; and the lack of sufficient communication channels among players. In addition, the findings highlight the double-edged impact of physical libraries on the state digital library. The term and notion of the "half-life" index-number of literature obsolescence, as well as their borrowing from nuclear physics and adaptation into the literature of literature obsolescence, have up to now been attributed to the librarian Burton and the physicist Kebler and to their famous 1960 journal article. In this article it is documented that (1) Burton and Kebler in their 1960 article were not the first to use the term literature "half-life"; (2) it was not Burton and Kebler who borrowed the conception of "half-life" from nuclear physics, nor was it they who adapted it into the literature of literature obsolescence; (3) in their 1960 article Burton and Kebler first made critical and later ambiguous statements, and finally attributed only "some validity" to the idea of literature half-life; (4) Burton and Kebler stated and produced an argument to show that there is an essential difference between the nature of radioactive "half-life" and that of literature "half-life," and they therefore disapproved of the use of the latter term; (5) in his next article, published in 1961 and hitherto entirely overlooked, Burton proposed the statistical term "median age" in place of the term literature "half-life." For all these reasons it is unfounded and erroneous to continue to attribute the term and conception of literature "half-life" to Burton and Kebler. 
In each of 41 research journals in the physical, life, and social sciences there is a linear relationship between the average number of references and the normalized paper lengths. For most of the journals in a given field, the relationship is the same within statistical errors. For papers of average length in different sciences the average number of references is the same within +/-17%. Because papers of average length in various sciences have the same number of references, we conclude that citation counts to them can be intercompared within that accuracy. However, review journals are different: after scanning 18 review journals we found that their papers average twice the number of references of research papers of the same length. Using citations, papers, and references as parameters, a relatedness factor (RF) is computed for a series of journals. Sorting these journals by the RF produces a list of journals most closely related to a specified starting journal. The method appears to select a set of journals that are semantically most similar to the target journal. The algorithmic procedure is illustrated for the journal Genetics. The inter-journal citation data needed to calculate the RF were obtained from the 1996 ISI Journal Citation Reports on CD-ROM. Out of the thousands of candidate journals in the JCR, 30 were selected. Some of them are different from the journals in the JCR category for genetics and heredity. The new procedure is unique in that it takes varying journal sizes into account. This article proposes evaluation methods based on the use of nondichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. 
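One simple way to credit systems for retrieving highly relevant documents is to replace binary relevance with graded relevance directly inside the precision and recall ratios. The sketch below is my illustration of that idea under an assumed grade scale in [0, 1]; it is not the authors' exact measures:

```python
def generalized_precision(retrieved, grades):
    """Sum of relevance grades of the retrieved documents, per document retrieved.
    grades: doc id -> relevance grade in [0, 1] (0 or absent = not relevant)."""
    if not retrieved:
        return 0.0
    return sum(grades.get(d, 0.0) for d in retrieved) / len(retrieved)

def generalized_recall(retrieved, grades):
    """Grade mass retrieved, as a fraction of the total grade mass of all
    relevant documents in the collection."""
    total = sum(grades.values())
    return sum(grades.get(d, 0.0) for d in retrieved) / total if total else 0.0
```

With binary grades (0/1) both functions reduce to ordinary precision and recall, so the graded versions are a strict generalization.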
The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) generalized recall and precision based directly on multiple-grade relevance assessments (i.e., not dichotomizing the assessments). We demonstrate the use of the traditional and the novel evaluation measures in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best-match retrieval system (InQuery) in a text database consisting of newspaper articles. To gain insight into the retrieval process, one should use both graded relevance assessments and effectiveness measures that enable one to observe the differences, if any, between retrieval methods in retrieving documents of different levels of relevance. In modern times of information overload, one should pay attention, in particular, to the capability of retrieval methods to retrieve highly relevant documents. This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is first presented. Experiments showed that for each document an average of 33% of keywords unknown to a lexicon of 123,226 terms could be identified by this algorithm. Of these unregistered words, only 8.3% are illegal. Keywords extracted from each document are further filtered for term association analysis. Association weights larger than a threshold are then accumulated over all the documents to yield the final term-pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically. It also achieves a similar percentage level of term relatedness. In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. 
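The accumulation step in the thesaurus approach above (per-document association weights, thresholded, then summed over all documents into term-pair similarities) can be sketched as follows. The per-document weight used here is an assumed placeholder, not the paper's actual association formula:

```python
from collections import Counter
from itertools import combinations

def build_thesaurus(docs_keywords, threshold=0.1):
    """docs_keywords: one keyword list per document.
    For each document, assign every keyword pair an association weight
    (here simply 1/number-of-terms, an assumed choice), discard pairs whose
    weight falls below `threshold`, and accumulate the surviving weights
    over all documents into final term-pair similarities."""
    similarity = Counter()
    for keywords in docs_keywords:
        terms = sorted(set(keywords))
        if len(terms) < 2:
            continue
        weight = 1.0 / len(terms)      # assumed per-document association weight
        if weight <= threshold:        # filter weak associations per document
            continue
        for a, b in combinations(terms, 2):
            similarity[(a, b)] += weight   # accumulate across documents
    return similarity
```

Because weights are simply accumulated per pair, the whole pass is a single scan over the collection, which is what makes this style of thesaurus generation fast.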
Machine-Readable Dictionaries (MRD) and Machine Translation (MT) systems are important resources for query translation in CLIR. We investigate the application of MT systems and MRD to Arabic-English and English-Arabic CLIR. The translation ambiguity associated with these resources is the key problem. We present three methods of query translation using a bilingual dictionary for Arabic-English CLIR. First, we present the Every-Match (EM) method. This method yields ambiguous translations because many extraneous terms are added to the original query. To disambiguate query translation, we present the First-Match (FM) method, which considers the first match in the dictionary as the candidate term. Finally, we present the Two-Phase (TP) method. We show that good retrieval effectiveness can be achieved without complex resources using the Two-Phase method for Arabic-English CLIR. We also empirically evaluate the effectiveness of the Arabic-English MT approach using short, medium, and long queries of TREC7 and TREC9 topics and collections. The effect of query length on the quality of MT-based CLIR is investigated. English-Arabic CLIR is evaluated via MRD and English-Arabic MT. The post-translation query expansion approach is used to deemphasize the extraneous terms introduced by the MRD and MT for English-Arabic CLIR. This article reports on a study that uses a new analysis and display tool to examine the influences of understanding of the system and goals on end-user Internet searching. Thirty-one public library users were observed searching the Web and/or a Web-based on-line catalog. The study identified four user categories, distinguished by the number of search approaches employed. These included linking, use of search engines, URL use, on-line catalog searching, and searching within a specific Web-site domain. 
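The Every-Match and First-Match dictionary translation methods described above are straightforward to sketch against a bilingual dictionary; the data structure and the sample entries below are illustrative assumptions, not the authors' implementation:

```python
def every_match(query_terms, dictionary):
    """EM: replace each source term with ALL of its dictionary translations.
    This inflates the query with extraneous senses, hence its ambiguity."""
    return [t for term in query_terms for t in dictionary.get(term, [term])]

def first_match(query_terms, dictionary):
    """FM: keep only the first-listed translation of each term,
    a crude but cheap disambiguation."""
    return [dictionary.get(term, [term])[0] for term in query_terms]
```

Terms absent from the dictionary pass through untranslated in both methods, which is the usual fallback for proper names and unregistered words.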
The results indicate that experience and motivation, elements of situational goals and mental models, work in tandem to determine search approaches, Web sites visited, and sources used. People who sought information for job-related or educational purposes were highly motivated and thus persistent. Those who had a great deal of Internet experience used an array of tools, while those with immature mental models of the Internet relied more heavily on the Web on-line catalog or off-line sources. People seeking information for recreational or personal use were not highly motivated. Whether experienced or not, they relied on serendipity, linking, and other approaches that were not cognitively demanding. When searching became too difficult, they abandoned the Internet as an information source. This article presents the third part of a research project that investigated the information-seeking behavior and success of seventh-grade science students in using the Yahooligans! Web search engine/directory. In Parts 1 and 2, children pursued fully assigned tasks in the engine. In the present study, children fully generated their own tasks. Children's information seeking was captured from the cognitive, physical, and affective perspectives using both quantitative and qualitative inquiry methods. Their information-seeking behavior and success on the fully self-generated task were compared to the behavior and success they exhibited on the two fully assigned tasks. Children were more successful on the fully self-generated task than on the two fully assigned tasks. Children preferred the fully self-generated task to the two fully assigned tasks due to their ability to find the information sought and their satisfaction with the search results, rather than the nature of the task in itself (i.e., its self-generated aspect). Children were more successful when they browsed than when they searched by keyword on all three tasks. Yahooligans! 
design, especially its poor keyword searching, contributed to the breakdowns children experienced. Implications for system design improvement and Web training are discussed. The data systems, policies and procedures, corporate culture, and public face of an agency or institution make up its organizational interface. This case study describes how user interfaces for the Bureau of Labor Statistics Web site evolved over a 5-year period along with the larger organizational interface, and how this co-evolution has influenced the institution itself. Interviews with BLS staff and transaction log analysis are the foci of this analysis, which also included user information-seeking studies and user interface prototyping and testing. The results are organized into a model of organizational interface change and related to the information life cycle. Chemical Abstracts Service recently unveiled citation searching in Chemical Abstracts. With Chemical Abstracts and Science Citation Index both now available for citation searching, this study compares the duplication and uniqueness of citing references for works of chemistry researchers for the years 1999-2001. The two indexes cover very similar source material, so one would expect the citation results to be very similar. This analysis of SciFinder Scholar and Web of Science shows some important differences in the databases as they are currently offered. Authors and institutions using citation counts as measures of scientific productivity should take note. This article presents an overview of some of the methodology used in a project that examined children's understanding of library information and how those perspectives change in the first 5 years of formal schooling. 
Because our understanding of information is reflected in the manner in which we classify, or typify, that information, children were invited to shelve (i.e., classify) terms representative of library books and then to label those categories. The resulting shelf categories help us to see library information from a child's perspective. Data collection using group dialog, visual imagery, narrative, cooperative learning techniques, and hands-on manipulatives is described for one session of a project in which children used induction to form concepts related to knowledge organization in a hypothetical library. Analysis for this session included the use of hierarchical clustering and multidimensional scaling to examine and compare children's constructions for qualitative differences across several grade levels. Following the description of data collection methods and analysis, a discussion focuses on the reasons for using these particular methods of data collection with a child population. This article describes how diaries were implemented in a study of the use of archives and archival finding aids by history graduate students. The issues concerning diary use as a data collection technique are discussed, as well as the different types of diaries. Finding information on the Web can be a much more complex search process than previously experienced on many pre-Web information retrieval systems, given that finding content online does not have to happen via a query typed into a search field. Rather, the Web allows for a myriad of search strategies. Although there are numerous studies of Web search techniques, these studies often limit their focus to just one part of the search process and are not based on the behavior of the general user population, nor do they include information about the users. 
To remedy these shortcomings, this project looks at how people find information online in the context of their other media use and their general Internet use patterns, in addition to drawing on information about their demographic background and social support networks. This article describes the methodology in detail, and suggests that a mix of survey instruments and in-person observations can yield the type of rich data set that is necessary to understand in depth the differences in people's information retrieval behavior online. Digital libraries allow information access to be integrated into work processes rather than separated from them, but they also have the potential to overwhelm users with excessive or irrelevant information, impairing their performance rather than improving it. With the opportunity to create new models of what a library is and how it can be used comes the challenge of improving our understanding of its patrons, their work, and the circumstances under which they perform it. In this article we offer an overview of our experiences using observational methods to learn about one class of users, expert clinicians treating patients in hospital settings. We describe the evolution of our understanding of the users and their informational tasks, and how this evolving understanding is guiding our efforts to create digital library technology. The multidisciplinary composition of our team has enriched our observations and improved the validity of our analysis and interpretations. The multiple observation methods we have employed, including "think-aloud" scenarios in the laboratory, participant observation in the field, key informant interviews, and focus group sessions, have enabled us to enrich the data set, gain greater insight, and verify findings with informants. 
The relatively tight cycle of observation, analysis, development, and repeat observation has enabled us to iteratively and more rapidly refine our "user model" and "task model," improving, we hope, the usefulness of the technologies we are developing. Complementary, socially grounded, user-centered methodologies are being used to design new information systems to support biodiversity informatics. Each of the methods (interviews, focus groups, field observations, immersion, and lab testing) has its own strengths and weaknesses. The methods vary in their ability to reveal the automatic processes of experts (which need to be learned by novices), in data richness, and in their ability to help interpret complex information needs and processes. When applied in concert, the methods provide a much clearer picture of the use of information while performing a real-life information-mediated task. This picture will be used to help inform the design of a new information system, the Biological Information Browsing Environment (BIBE). The groups being studied are high school students, teachers, and volunteer adult groups performing biodiversity surveys. In this task the people must identify and record information about many species of flora and fauna. Most of the information tools they use for training and during the survey are designed to facilitate the difficult species identification task. This article explores the role of scenarios (or use-oriented design representations) in the Afya project as a participatory action research (PAR) tool for studying information seeking and use across the "digital divide." With the aim of improving access to health information and services for Black women, the Afya project has involved forging community-level partnerships with SisterNet, a local grassroots group of Black women devoted to improving their physical, emotional, intellectual, and spiritual health. 
In the context of community health care, scenarios in the Afya project, as a socially grounded planning and design methodology, have taken the form of personal narratives of Black women that capture their social experiences and typical problematic health situations. Scenarios of Black women point towards the need to foster social justice by nurturing equitable and participative social activities around technological development and use associated with health information services. Scenarios also suggest specific action-oriented strategies for empowering Black women to build social and digital technologies that we hope will make the provision of health care in our community more just. The accrual of symbolic capital is an important aspect of academic life. Successful capital formation is commonly signified by the trappings of scholarly distinction or acknowledged status as a public intellectual. We consider and compare three potential indices of symbolic capital: citation counts, Web hits, and media mentions. Our findings, which are domain specific, suggest that public intellectuals are notable by their absence within the information studies community. We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options, and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intra-word level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicate detection, reordering, and word omission and insertion, among others. 
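Word-level name formation rules of the kind just listed (abbreviation, word omission, reordering) can be illustrated with a toy matcher; this sketch conveys the idea only and is not Matchsimile's algorithm, which also handles intra-word approximate matching:

```python
def words_match(query_word, text_word):
    """A word matches exactly, or as an initial-style abbreviation ('J.' ~ 'Joao')."""
    q, t = query_word.lower().rstrip("."), text_word.lower().rstrip(".")
    return q == t or (len(q) == 1 and t.startswith(q))

def name_matches(query_name, text_name, max_omissions=1):
    """Match person names allowing abbreviations, word reordering, and the
    omission of up to `max_omissions` query words from the text name."""
    remaining = text_name.split()
    omitted = 0
    for qw in query_name.split():
        for i, tw in enumerate(remaining):
            if words_match(qw, tw):
                del remaining[i]   # each text word may satisfy only one query word
                break
        else:
            omitted += 1           # this query word is absent from the text name
    return omitted <= max_omissions
```

A production matcher would additionally score partial matches and tolerate spelling errors inside words; here the point is only that name matching is a word-level alignment problem on top of word-level comparison rules.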
This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications. A model of browsing-based conceptual information retrieval is proposed employing two different types of dictionaries, a global dictionary and a local dictionary. A global dictionary with the authorized terms is utilized to capture the commonly acknowledged conceptual relation between a query and a document by replacing their keywords with the dictionary terms. The documents are ranked by their conceptual closeness to a query, and are arranged in the form of a user's personal digital library, or pDL. In a pDL a user can browse the arranged documents based on a suggestion about which documents are worth examining. This suggestion is made using the information in a local dictionary that is organized so as to reflect a user's interest and the association of keywords with the documents. Experiments for testing the retrieval performance of utilizing the two types of dictionaries were also performed using standard test collections. Web links have been studied by information scientists for at least six years, but only in the past two years has clear evidence emerged to show that counts of links to scholarly Web spaces (universities and departments) can correlate significantly with research measures, giving some credence to their use for the investigation of scholarly communication. This paper reports on a study to investigate the factors that influence the creation of links to journal Web sites. An empirical approach is used: collecting data and testing for significant patterns. The specific questions addressed are whether site age and site content are inducers of links to a journal's Web site as measured by the ratio of link counts to Journal Impact Factors, two variables previously discovered to be related.
A new methodology for data collection is also introduced that uses the Internet Archive to obtain an earliest known creation date for Web sites. The results show that both site age and site content are significant factors for the disciplines studied: library and information science, and law. Comparisons between the two fields also show disciplinary differences in Web site characteristics. Scholars and publishers should be particularly aware that richer content on a journal's Web site tends to generate links and thus traffic to the site. The study reported here tested the efficacy of an information retrieval system output summary and visualization scheme for undergraduates taking a Vietnam War history course who were in Kuhlthau's Stage 3 of researching a history essay. The visualization scheme consisted of (a) the undergraduate's own visualization of his or her essay topic, drawn by the student on the bottom half of a sheet of paper, and (b) a visualization of the information space (determined by index term counting) on the top half of the same page. To test the visualization scheme, students enrolled in a Vietnam War history course were randomly assigned to either the visualization scheme group, who received a high recall search output, or the nonvisualization group, who received a high precision search output. The dependent variable was the mark awarded to the essay by the course instructor. There was no significant difference between the mean marks for the two groups. We were pleasantly surprised by this result given the bad reputation of high recall as a practical search strategy. We hypothesize that a more proactive visualization system is needed that takes the student through the process of using the visualization scheme, including steps that induce student cognition about task-subject objectives.
In this article, we examine the conceptual models that help us understand the development and sustainability of scholarly and professional communication forums on the Internet, such as conferences, pre-print servers, field-wide data sets, and collaboratories. We first present and document the information processing model that is implicitly advanced in most discussions about scholarly communications: the "Standard Model." Then we present an alternative model, one that considers information technologies as Socio-Technical Interaction Networks (STINs). STIN models provide a richer understanding of human behavior with online scholarly communication forums. They also provide a more complete understanding of the conditions and activities that support the sustainability of these forums within a field than the Standard Model does. We illustrate the significance of STIN models with examples of scholarly communication forums drawn from the fields of high-energy physics, molecular biology, and information systems. The article also includes a method for modeling electronic forums as STINs. This article introduces an effective rotation and scale invariant log-polar wavelet texture feature for image retrieval. The proposed feature is an attempt to enhance existing content-based image retrieval systems, which largely present difficulty in coping with images with changes in orientation and scale. The underlying feature extraction process involves a log-polar transform followed by an adaptive row-shift invariant wavelet packet transform. The log-polar transform converts a given image into a rotation and scale invariant but row-shifted image, which is then further processed through an adaptive row-shift invariant wavelet packet transform operation to generate adaptively selected subbands of rotation and scale invariant wavelet coefficients, based on an information cost function. An energy signature is computed for each subband of these wavelet coefficients.
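The energy-signature step just described can be sketched very simply: each subband of wavelet coefficients is reduced to a single scalar energy, and the most dominant energies form the feature vector. The toy coefficient subbands and the mean-squared-energy definition below are illustrative assumptions, not the authors' exact formulation:

```python
# Minimal sketch of computing an energy signature per wavelet subband and
# keeping only the most dominant signatures as the retrieval feature vector.
# The subbands below are toy data, not real wavelet packet output.

def energy_signature(subband):
    """Mean squared magnitude of one subband of coefficients."""
    return sum(c * c for c in subband) / len(subband)

def feature_vector(subbands, k=None):
    """Energies of all subbands, optionally keeping only the k most dominant."""
    energies = sorted((energy_signature(s) for s in subbands), reverse=True)
    return energies[:k] if k else energies

subbands = [[0.1, -0.2, 0.05], [1.5, -1.2, 0.9], [0.01, 0.02, -0.01]]
fv = feature_vector(subbands, k=2)   # two most dominant energy signatures
```

Keeping only the top-k energies is one plausible reading of "only the most dominant log-polar wavelet energy signatures are selected"; the actual selection criterion may differ.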
To reduce feature dimensionality, only the most dominant log-polar wavelet energy signatures are selected for the feature vector for image retrieval. The overall feature extraction process is quite efficient and involves only O(n log n) complexity. Experimental results show that this rotation and scale invariant wavelet feature is quite effective for image retrieval and outperforms the traditional wavelet packet signatures. Research and education in library and information science in Slovakia is presented as an example of the history, present state, and future of information science research and collaboration in central European countries. Professional experience in the region since 1990 is outlined and particular contributions to other countries are proposed. Closer collaboration of central and east European countries in information research is needed. The wider context of professional experience covers rapid changes in the local information environment and different ways of recognizing and solving problems. Structural changes are shown in professional areas such as curricular revision, new categories and terminology, new research methods, and new ways to conduct open professional discourse and exchange of ideas. Many professional events, projects, organizations, foundations, and published works are named as examples of these changes. Several research projects of the Department of LIS in the last 10 years are described in detail, including a new project on the interaction of man and the information environment. In conclusion, future directions of common research are outlined with emphasis on digital libraries, cognitive and social aspects of the information process, and diverse cultural approaches to information transfer and use. In a recent paper in this journal, Loet Leydesdorff and Gaston Heimeriks (2001, Journal of the American Society for Information Science and Technology, 52, 1262-1294)
argue that biotechnology develops in a self-organizational mode, through interaction between the intellectual structure and the institutional network of the research field. This claim is empirically supported by a multivariate analysis of documents from core biotechnology journals. One unexpected finding in this paper is the relationship between the title words of documents and the region of their origin. This claim requires examination because, as will be shown, it seems to be an artifact of the method used. If this is so, it undermines the authors' theoretical claim that the production of knowledge is a self-organizing process. Before the Web, the story of online information services was largely one of over-estimates and unmet expectations. This study examines sustained use and non-use of online services within organizations in a way that overcomes limitations of the traditional approaches that repeatedly led to exuberant usage projections. By adopting an open-systems view, we see that firms in highly technical and highly institutional environments have many more incentives to gather data and go online than do firms in low-tech, unregulated industries. But firms make important choices about partnering and outsourcing that can shift informational activities across organizational boundaries. Our analysis focuses on the informational environments of firms in three industries: law, real estate and biotech/pharmaceuticals. This environmental model provides richer conceptualizations about the use of information and communication technologies, including Internet technologies, and better projections about future use. In support of our analysis, we briefly discuss insights from an ongoing intranets study informed by an informational environments perspective. The role of information retrieval (IR) in support of decision making and knowledge management has become increasingly significant. 
Confronted by various problems in traditional keyword-based IR, many researchers have been investigating the potential of natural language processing (NLP) technologies. Despite widespread application of NLP in IR and high expectations that NLP can address the problems of traditional IR, research and development of an NLP component for an IR system still lacks support and guidance from a cohesive framework. In this paper, we propose a theoretical framework called NLPIR that aims at integrating NLP into IR and at generalizing the broad application of NLP in IR. Some existing NLP techniques are described to validate the framework, which not only can be applied to current research, but is also envisioned to support future research and development in IR that involves NLP. Personal preferences in the development of categorical folders for bookmarks are examined in terms of both the choice and definition of folder domain and the overall structure of the folder system. Study participants from the financial industry were asked to organize the same set of finance-related bookmarks from a given list, as opposed to describing their organizational approaches using their own personal bookmarks, so that the organizational systems could be compared across the sample. The selection of folder domain is influenced by contextual factors such as intended use and relevance to current projects. Similarly, the structure of the overall folder system was determined in part by participants' navigational preferences. While the majority of participants created folders that cover the topics of finance, government, accounting, news, law, and tax, the actual definitions of these folders and the criteria for inclusion vary across the sample. Furthermore, these criteria cannot be readily discerned from the folder system itself. Variation in folder domain and definition could adversely affect the utility of bookmark management systems for multiple users that involve some degree of standardization.
The same variation in interpretation of seemingly identical folders suggests that systems with automatic categorization would not provide users with enough flexibility in how they could organize and access their bookmarks. Many web queries have geospatial dimensions. While online shopping is built on the premise that distance and location are irrelevant (with the possible exception of shipping charges), tourism and onsite inspection of goods have a geospatial dimension in which distance and location are relevant factors. Current search engines build indices based on keyword occurrence and frequency and negotiate queries using these indices. This approach is fast, robust, and generic, but when queries are related to physical locations and distances rather than cyberdistances, it leaves the user to sort through pages of results. In this paper, we describe an algorithm that assigns location coordinates dynamically to web sites based on the URL. A prototype search system was built that uses this algorithm to re-rank the results of search engines for queries with a geospatial dimension. We found that over 80% of the URLs tested could be assigned correct location coordinates. This work contributes to retrieval on the web by providing an alternative ranking order for search engine results, so that users with queries with a geospatial dimension can more readily use the results of general search engines rather than special-purpose applications. New applications of genetic algorithms to information retrieval, most of them specifically to relevance feedback, have been appearing recently. The evolution of the possible solutions is guided by fitness functions that are designed as measures of the goodness of the solutions. These functions are naturally the key to achieving a reasonable improvement, and the choice of function is what most distinguishes one experiment from another.
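To make the role of the fitness function concrete, here is a hedged sketch of an order-based fitness of the kind discussed in the relevance-feedback literature: it rewards candidate rankings that retrieve relevant documents early. The logarithmic rank discount is our own illustrative choice, not a specific published function:

```python
# Sketch of an order-based fitness function for relevance feedback:
# a retrieved ranking scores higher when relevant documents appear earlier.
# The 1/log2(rank + 1) discount is illustrative; published functions differ
# in their choice of rank weighting.

import math

def order_based_fitness(ranking, relevant):
    """Sum of rank-discounted gains over the relevant documents retrieved."""
    return sum(1.0 / math.log2(rank + 1)
               for rank, doc in enumerate(ranking, 1) if doc in relevant)

relevant = {"d1", "d4"}
early = order_based_fitness(["d1", "d4", "d2", "d3"], relevant)
late = order_based_fitness(["d2", "d3", "d1", "d4"], relevant)
```

A purely set-based fitness (plain recall) would score both rankings above equally; the order-based version prefers the ranking that places the relevant documents first, which is exactly the distinction the fitness-function choice controls.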
In previous work, we found that, among the functions implemented in the literature, the ones that yield the best results are those that take into account not only which documents are retrieved, but also the order in which they are retrieved. Here, we therefore evaluate the efficacy of a genetic algorithm with various order-based fitness functions for relevance feedback (some of them of our own design), and compare the results with the Ide dec-hi method, one of the best traditional methods. In optimizing information flows in networks, it would be useful to predict aspects of the network traffic. Yet, the notion of predicting network traffic does not appear in the relevant literature reporting analysis of network traffic. This literature is both well developed and skeptical about the value of traditional time series analysis on network data. It has consistently reported three "traffic invariants" in the analysis of network and Internet traffic. This study uses such time series analysis on a day's worth of Internet log data and finds poor support for one of the invariants. In the preliminary analysis, evidence of nonlinearity was discovered in these data, and the analysis presented here examines this question further. This study posits that nonlinear events may be a traffic invariant, although this hypothesis would have to be investigated further. The appearance of nonlinear structures is important to the question of predicting network traffic because there are currently no methods to predict time series with nonlinear structures. The discovery of nonlinear structures, then, may mean that developing a predictive model is impossible with current techniques. On the other hand, these nonlinearities may result from interactions with OSI layers other than the one studied. This research applies Lotka's Law to metadata on open source software development. Lotka's Law predicts the proportion of authors at different levels of productivity.
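Lotka's inverse power law states that the number of authors producing n works is f(n) = C / n^a, with a close to 2 in Lotka's original data. A minimal sketch of estimating a and C by least squares on the log-log form, using made-up productivity counts:

```python
# Sketch of fitting Lotka's law f(n) = C / n**a to observed productivity
# counts by linear regression on log f(n) = log C - a * log n.
# The counts used below are synthetic, exact-power-law data for illustration.

import math

def fit_lotka(counts):
    """counts: {n_works: n_authors}. Returns (a, C) for f(n) = C / n**a."""
    xs = [math.log(n) for n in counts]
    ys = [math.log(f) for f in counts.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    a = -slope                          # slope of the log-log line is -a
    C = math.exp(mean_y + a * mean_x)   # intercept gives log C
    return a, C

# Exact Lotka data with a = 2, C = 100: f(1)=100, f(2)=25, f(3)=100/9, f(4)=6.25
a, C = fit_lotka({1: 100, 2: 25, 3: 100 / 9, 4: 6.25})
```

On real author-count data the points scatter around the log-log line, and studies such as the one described here test how well the fitted law matches the observed distribution.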
Open source software development harnesses the creativity of thousands of programmers worldwide, is important to the progress of the Internet and many other computing environments, and yet has not been widely researched. We examine metadata from the Linux Software Map (LSM), which documents many open source projects, and Sourceforge, one of the largest resources for open source developers. The authoring patterns found are comparable to those in prior studies of Lotka's Law for scientific and scholarly publishing. Lotka's Law was found to be effective in understanding software development productivity patterns, and to offer promise in predicting the aggregate behavior of open source developers. The majority of a university faculty's work deals with information or knowledge: finding, getting, reading, and using it. We analyzed the demographic portion of a library use survey given to faculty and staff of the University of Tennessee at Knoxville (UTK) to obtain a profile of information-related activities carried out by university faculty. Journal articles were the predominant document type that faculty both read and authored. Faculty averaged 4.2 journal subscriptions per person, of which 84% were paid for personally. Twenty-five percent of the faculty had obtained some funds for information products, and of those funded, the median amount provided was $500. Faculty spent 24 minutes per day using e-mail and 78 minutes per week using non-e-mail computer networks. Faculty reported publishing 3.0 journal articles per year, and 31% of the faculty had won an award for professional contributions in the previous 2 years. The paper discusses techniques to emphasize patterns in citation data and to study their dynamics. In this context, the STATIS and STATIS dual methods are presented. The methods are a generalization of principal component analysis from a dynamical point of view.
STATIS and its dual are applied in order to illustrate the dynamics of 'citing-cited patterns' using citation data from sixteen major journals in the statistics field. While most of us who study intellectual and technical advancement believe that innovations tend to swarm, the details of this process are not well understood. The aggregate-level behavior of US patents is examined as a way to better infer the process that generates innovation. The amount of swarming decreases as the observational period increases, which indicates that the process of innovation is not perfectly self-similar. Instead, the effects of innovations are mostly contained within specialized areas, and do not often trigger further advances in other fields. An analysis of 1,317 papers published in the first fifty volumes (1978 to 2001) of the international journal Scientometrics indicates the heterogeneity of the field, with emphasis on scientometric assessment. The study indicates that the US share of papers is constantly on the decline while that of the Netherlands, India, France, and Japan is on the rise. The research output is highly scattered, as indicated by the average number of papers per institution. The scientometric output is dominated by single-authored papers; however, multi-authored papers are gaining momentum. A similar pattern has been observed for domestic and international collaboration. This paper describes the results of a scientometric study of the Institute of Molecular and Cell Biology (IMCB). The purpose of the study is to evaluate the research performance of IMCB in the first ten years since its establishment. Research inputs and three research outputs (publications, graduate students, and patents filed) are examined. The findings indicate that in the ten years, IMCB produced 395 research papers, 33 book chapters, 24 conference papers, and 4 monographs, graduated 46 PhDs and 14 MScs, and filed 10 patents.
In its quest to become world-class, IMCB has been very selective in where its researchers publish: 95.6% of the articles were published in ISI journals. The articles received an average of 25 to 35 citations per article, and the percentage of uncited articles is 11.6%. Four articles received more than 200 citations, and 18 received between 100 and 200 citations. Co-word analysis was applied to keywords assigned to MEDLINE documents contained in sets of complementary but disjoint literatures. In strategic diagrams of disjoint literatures, based on internal density and external centrality of keyword-containing clusters, intermediate terms (linking the disjoint partners) were found in regions of below-median centrality and density. Terms representing the disjoint literature themes were found in close vicinity in strategic diagrams of intermediate literatures. Based on centrality-density ratios, characteristic values were found which allow a rapid identification of clusters containing possible intermediate and disjoint partner terms. Applied to the already investigated disjoint pairs Raynaud's Disease-Fish Oil and Migraine-Magnesium, the method readily detected known and unknown (but relevant) intermediate and disjoint partner terms. Application of the method to the literature on prions led to manganese as a possible disjoint partner term. It is concluded that co-word clustering is a powerful method for literature-based hypothesis generation and knowledge discovery. A classification of countries is made according to their respective ranks on the scales of "publications per million persons" and "GDP per capita (ppp)". The result is a clustering of countries which share a common cultural attitude toward scientific research. We introduce a new technique for quantifying and monitoring the effect a given set of time series has on the evolution of a single time series.
The technique relies on the causal nature of this effect, and expresses the result in terms of partial and cross elasticities. As an application, we consider the case where the single time series consists of the number of patents filed over time, in a given category, and where the set of time series consists of the numbers of scientific articles published over time, for each one of a number of science domains. Finally, we use a quiver map for visualizing the elasticities, and as a case study we illustrate our methodology on patents in the field of biotechnology. The purpose of this study was to investigate the relationship between journals' productivity and their citations in the field of semiconductors. Journal samples were gathered from the INSPEC database, 1978 to 1997, while the data on citation frequency, impact factor, cited half-life, and citing half-life were obtained from the Science Citation Index Journal Citation Reports 1997 CD-ROM edition. A total of 1,877 journals publishing articles on semiconductors were retrieved. The nature of the data on journal productivity, impact factor, cited half-life, and citing half-life is explored. Among these journals, only the 672 journals covered in JCR were compared. Moreover, statistical tests of the more productive journals, with cumulative publications on semiconductors greater than 100, were also conducted on the basis of all articles they published annually (for 1997). The results of the study showed that there is a significant correlation between journal productivity and citation frequency and between journal productivity and impact factor. However, there are no associations between journal productivity and cited half-life or between journal productivity and citing half-life. Simonton's (1997) model of creative productivity, based on a blind variation-selection process, predicts that scientific impact can only be evaluated retrospectively, after recognition has been achieved.
We test this hypothesis using bibliometric data from the Human Factors journal, which gives an award for the best paper published each year. If Simonton's model is correct, award-winning papers would not be cited much more frequently than non-award-winning papers, showing that scientific success cannot be judged prospectively. The results generally confirm Simonton's model. Receipt of the award increases the citation rate of articles, but accounts for only 0.8% to 1.2% of the variance in the citation rate. Consistent with Simonton's model, the influence of the award on citation rate may reflect a selection process by an elite group of reviewers who are representative of the larger peer group that eventually determines the citation rate of the article. Consistent with Simonton's model, author productivity accounts for far more variance in the authors' total citation rate (58.9%) and in the citation rate of the authors' most cited article (12.6%) than does award receipt. This study investigates the role of self-citation in the scientific production of Norway (1981-1996). More than 45,000 publications have been analysed. Using a three-year citation window, we find that 36% of all citations represent author self-citations. However, this percentage decreases when citations are traced for longer periods. We find the highest share of self-citations among the least cited papers. There is a strong positive correlation between the number of self-citations and the number of authors of the publications. Still, only a minor part of the overall increase in citation rates that can be found for multi-authored papers is due to self-citations. Also, the share of self-citations shows significant variations among different scientific disciplines. The results are relevant for the discussion concerning the use of citation indicators in research assessments. Usenet newsgroups provide a popular means of scientific communication.
We demonstrate striking order in the diversity of biology newsgroups: submissions to newsgroups obey a form of Zipf's law, a simple power law for the frequency of posts as a function of contributors' rank by number of posts. We show that a simple stochastic process, due to Gunther et al. (1992, 1996), Levitin and Schapiro (1993), and Schapiro (1994), accounts for this pattern and reproduces many of the properties of newsgroups. This model successfully predicts the relative contribution from each poster in terms of the size of the newsgroup (the number of posters and total posts). The increasing use of bibliometric indicators in science policy calls for a reassessment of their robustness and limits. The perimeter of journal inclusion within ISI databases will determine variations in the classic bibliometric indicators used for international comparison, such as world shares of publications or relative impacts. We show in this article that when this perimeter is adjusted using a natural criterion for inclusion of journals, the journal impact, the variation of the most common country indicators (publication and citation shares; relative impacts) with the perimeter chosen depends on two phenomena. The first is a bibliometric regularity rooted in the main features of competition in the open space of science, which can be modeled by bibliometric laws, the parameters of which are "coverage-independent" indicators. But this regularity is obscured for many countries by a second phenomenon, the presence of a sub-population of journals that does not reflect the same international openness: the nationally-oriented journals. As a result, indicators based on standard SCI or SCISearch perimeters are jeopardized to a certain extent by this sub-population, which creates large irregularities.
These irregularities often lead to an over-estimation of the share and an under-estimation of the impact for countries with a national editorial tradition, while the impact of a few mainstream countries arguably benefits from the presence of this sub-population. Reprint requests are commonly used to obtain a copy of an article. This study aims to correlate the number of reprint requests from a 10-year sample of articles with the number of citations. The database contained 28 articles published over a 10-year period (1992-2001). For each separate article, the number of citations and the number of reprint requests were retrieved. In total, 303 reprint requests were analysed. Reviews (median 9, range 1-95) and original articles (median 8, range 1-36) attracted the most reprint requests. There was an excellent correlation between the number of requests for and citations to an article (two-tailed non-parametric Spearman rank test r = 0.55; 95% confidence interval 0.18-0.78, P < 0.005). Articles that received the most reprint requests are cited more often. A generalization of both the Botafogo-Rivlin-Shneiderman compactness measure and the Wiener index is presented. These new measures for the cohesion of networks can be used when a dissimilarity value is given between nodes in a network or a graph. It is illustrated how a set of weights between connected nodes can be transformed into a set of dissimilarity measures for all nodes. The new compactness measure for the cohesion of weighted graphs has several desirable properties related to the disjoint union of two networks. Finally, an example is presented of the calculation of the new compactness measures for a co-citation and a bibliographic coupling network. The widespread use of on-line publishing of text promotes storage of multiple versions of documents and mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others.
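The Spearman rank test used in the reprint-request study above can be sketched as a Pearson correlation of rank vectors, with ties assigned their average rank. The request and citation figures below are invented for illustration, not the study's data:

```python
# Sketch of the Spearman rank correlation: rank both variables (ties get
# their average rank), then compute the Pearson correlation of the ranks.
# The request/citation figures below are made up for illustration.

def ranks(values):
    """Ranks (1-based), with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

requests = [9, 8, 3, 1, 12]      # hypothetical reprint requests per article
citations = [40, 35, 10, 2, 60]  # hypothetical citation counts
rho = spearman(requests, citations)
```

Because ranks discard magnitudes, the coefficient captures exactly the monotone association the study reports: articles requested more often tend to be cited more often, regardless of the absolute counts.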
We evaluate two families of methods for searching a collection to find documents that are coderivative, that is, are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identification of coderivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared to the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents. However, for fingerprinting, parameters must be carefully chosen, and even so the identity measure is clearly superior. With advances in computer graphics, a number of innovative approaches to information visualization have been developed (e.g., Card et al., 1991). Some of these approaches create a mapping between information and corresponding structure in a virtual world. The resulting virtual worlds can be fully three-dimensional (3D) or they can be implemented as a series of 2D bird's-eye "snapshots" that are traversed as if they were in 3D, using operations such as panning and zooming interactively (2.5D). This paper reports a study that contrasted 3D and 2.5D performance for people with differing levels of spatial and structure-learning ability. Four data collection methods were employed: search task scoring; subjective questionnaires; navigational activity logging and analysis; and administration of tests for spatial and structure-learning abilities. Analysis of the results revealed statistically significant effects of user abilities and information environment designs.
Overall, this research did not find a performance advantage for using a 3D rather than a 2.5D virtual world. In addition, users in the lowest quartile of spatial ability had significantly lower search performance in the 3D environment. The findings suggest that individual differences in traits such as spatial ability may be important in determining the usability and acceptability of 3D environments. The goal of this study was to discover the feasibility of adding haptic and auditory displays to traditional visual geographic information systems (GIS). The experiment was conducted with 51 participants to explore the differences in user performance (task completion time and accuracy) and user satisfaction with a multimodal GIS, which was implemented with a haptic display, an auditory display, and a combined display. The experiment consisted of a series of 36 tasks in which the participants were asked to identify the highest- or middle-valued state among nine U.S. states on maps. The results showed that haptic displays produce faster and more accurate performance than auditory displays and combined displays for more complex tasks. In terms of user satisfaction, the participants preferred the combined display even though they performed best with the haptic display. Rule-based information filtering systems maintain user profiles, where a profile consists of a set of filtering rules expressing the user's information filtering policy. Filtering rules may refer to various attributes of the data items subject to the filtering process. In personal rule-based filtering systems, each user has his/her own personal filtering rules. In stereotype rule-based filtering systems, a user is assigned to a group of similar users (his/her stereotype) from which he/she inherits the stereotype's filtering profile. This study compares the effectiveness of the two alternative rule-based filtering methods: stereotype-based rules versus personal rules.
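A rule-based filtering profile of the kind just described can be sketched as an ordered list of attribute rules. The attribute names, the example rules, and the "first matching rule decides" policy below are illustrative assumptions, not the systems studied:

```python
# Sketch of a rule-based filtering profile: a profile is an ordered list of
# rules, each a predicate over one item attribute plus an accept/reject
# action. A stereotype profile is simply such a rule list shared by a group.
# (Attribute names and rules below are illustrative.)

def make_rule(attribute, predicate, accept):
    return {"attribute": attribute, "predicate": predicate, "accept": accept}

def filter_item(item, rules, default=False):
    """Apply rules in order; the first rule whose predicate matches decides."""
    for rule in rules:
        value = item.get(rule["attribute"])
        if value is not None and rule["predicate"](value):
            return rule["accept"]
    return default

# A hypothetical stereotype profile for a "finance analyst" user group:
stereotype_rules = [
    make_rule("topic", lambda t: t in {"finance", "tax"}, True),
    make_rule("source", lambda s: s == "spam-digest", False),
]

keep = filter_item({"topic": "finance", "source": "newswire"}, stereotype_rules)
drop = filter_item({"topic": "sports"}, stereotype_rules)
```

Under this sketch, the personal-versus-stereotype comparison reduces to whose rule list is installed for a user: one the user wrote, or one inherited from the group the user is assigned to.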
We compared filtering effectiveness when using personal rules with that achieved when using stereotype-based rules. Although, intuitively, personal filtering rules seem to be more effective because each user has his or her own tailored rules, our comparative study reveals that stereotype filtering rules yield more effective results. We believe that this is because users find it difficult to evaluate their filtering preferences accurately. The results imply that by using a stereotype it is possible not only to overcome the problem of user effort required to generate a manual rule-based profile, but even to provide a better initial user profile at the same time. Author self-citation has long been of interest to those working in informetrics for what it reveals about the publishing behavior of individuals and their relationships within academic networks. While this research has produced interesting insights, it typically assumes that self-citation is either a neutral form of reporting, not unlike references to others' work, or an unsavory kind of academic egotism. By examining self-citation in a wider context of self-mention, however, the phenomenon can be seen as part of a more comprehensive rhetorical strategy for emphasizing a writer's personal contribution to a piece of research and strengthening his or her knowledge claims, research credibility, and wider standing in the discipline. These meanings are not easily revealed through quantitative bibliometric methods and require careful text analyses and discourse-based interviews with academics. In this paper I explore the use of self-citation and authorial mention in a corpus of 240 research articles and 800 abstracts in eight disciplines. Through an analysis of these texts and interviews with expert informants I show how self-mention is used and the ways these uses reflect both the promotional strategies of individuals and the epistemological practices of their disciplines. 
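The rule-based filtering profiles compared earlier on this page can be sketched as predicates over item attributes. The attribute names (`topic`, `length`) and the specific rules are hypothetical examples, not the systems studied.

```python
def make_profile(rules):
    """A filtering profile: accept a data item only if every rule passes."""
    def accept(item):
        return all(rule(item) for rule in rules)
    return accept

# A stereotype profile shared by a whole group of similar users.
# The attributes ("topic", "length") are hypothetical examples.
stereotype_profile = make_profile([
    lambda item: item["topic"] in {"information retrieval", "bibliometrics"},
    lambda item: item["length"] < 5000,
])

# A personal profile tailors the rules to a single user instead.
personal_profile = make_profile([
    lambda item: item["topic"] == "bibliometrics",
])
```

The study's point is that users write the personal variant badly because they misjudge their own preferences, so the coarser shared stereotype often filters better.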
An online survey targeted at politically interested Internet users assesses whether traditional media use is decreasing, increasing or remaining the same since users first started using the Web, bulletin boards/electronic mailing lists and chat rooms. Associations are made between media use gratifications, political attitudes and demographics and traditional media use, and further analysis determines whether these factors predict changes in the amount of time online users spend with traditional media. This study's findings are compared with those of a similar study conducted in 1996. News magazines and radio news took the hardest hit from the Internet in 2000, but in 1996 television news suffered the most. Generally, in both years the Internet had not altered media use patterns. In 1996 and 2000 more users claimed that the time they spent seeking political information from traditional media sources had stayed the same than had changed. However, the trend indicates that those Internet users whose media patterns have changed are abandoning traditional media at a much greater rate than they are increasing their use. This paper describes various problems that may occur in quantitative research evaluation. It is shown that problems already arise when trying to define such seemingly simple scientometric elements as "personnel" or "budget". This has major consequences for the construction of indicators. Furthermore, it is demonstrated that different data sources as well as different data and indicators result in different, sometimes even contradictory outcomes. To justify public investment in R&D activities, especially those conducted by private companies, the funding must be shown to change firms' behavior in ways that could not be realized without public funds. This paper studies the "additionality" of Japanese R&D programmes by analyzing the patent applications of five case study projects. 
Changes and continuations in research themes between the results of each project and the results in the five years before and after the project were measured using a similarity index. Also, the similarities between research groups in a project were measured. These show how each project was constituted by researchers with various types of knowledge. As a result, although all projects contained core research groups who continued their research in the project, the effect of mobilizing other researchers into new fields was shown to vary depending on the characteristics of the projects. This article assesses Brazilian psychiatric research output and compares the numbers of articles published between 1981 and 1995 in Brazilian domestic journals and in international journals. Of the total number of articles analyzed, 87.2% were published in domestic journals. These will probably never reach the international scientific community. Of the articles published in Brazil, 56.8% were review and opinion articles, while of the articles published in international journals, 69.8% were research articles. Publications in both Brazilian and international journals included few prospective research studies and research reports dealing with bipolar disorder and cocaine use. On the other hand, alcohol use disorder and major depressive disorder were the most commonly studied clinical fields in both domestic and international psychiatric journals. We are currently experiencing an era facing increasing global environmental and societal problems (e.g., climate change, habitat destruction and economic recession). Scientific research projects are often expected to address and counter the effects of inequity and globalisation, and to prioritise cooperation supported by cooperative research. This paper investigates whether publication of research that is carried out in least developed countries is done in cooperation with research institutes from these countries. 
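The theme-continuity measurement described above can be illustrated with a simple set-overlap index. The article does not specify which similarity index it used, so plain Jaccard similarity and the example themes are assumptions for illustration only.

```python
def jaccard(a, b):
    """Set-overlap similarity in [0, 1]; an illustrative stand-in
    for the article's (unspecified) similarity index."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical research themes before and during a project:
before = {"catalysis", "polymer synthesis", "membranes"}
during = {"catalysis", "membranes", "fuel cells"}
continuity = jaccard(before, during)  # 2 shared themes of 4 distinct
```

A core group that keeps working on the same themes scores high; researchers mobilized into new fields score low, which is the pattern the project comparison measures.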
The study uses the Current Contents database of peer-reviewed publications from more than 7,000 journals in all sciences (Biology and environmental sciences; Physical, chemical and earth sciences; Engineering, computing and technology; Life sciences; Clinical medicine; Arts and humanities; Social and behavioral sciences) published between 1 January 1999 and 3 November 2000. From a total of 1,601,196 papers published, 2,798 articles reporting research activities carried out in the 48 least developed countries were selected using title information as an indicator. Collaborative relationships between the research institutions involved were then analysed within and between countries and sciences. Our results show that publications of research carried out in the least developed countries do not have co-authorship of local research institutes in 70% of the cases, and that a majority of the papers are published by research institutes from the most industrialised countries in the world. We sent questionnaires to authors of papers in the above-mentioned database to detect possible causes of this frequent lack of local co-authorship in the essential academic currency that publications represent. 'Neo-colonial science' is identified as one of them. In addition, there is a large discrepancy between what the surveyed scientists say they find important in international collaboration and joint publishing, and the way they act on it. However, the interpretation given to the fact that institutional co-authorship is underrepresented for local research institutions in the least developed countries is less important than the fact itself, and future research should concentrate on scientific ways to counterbalance this adverse trend. The main purpose of this study was to analyze the Italian journals indexed in the 2000 edition of the Journal Citation Reports (JCR) published by the Institute for Scientific Information (ISI) (Philadelphia, USA). 
The performance and the visibility of these journals were evaluated in terms of Impact Factor (IF), mean IF from citing and cited journals, and self-citing and self-cited rates. Seventy-three Italian journals were indexed in the JCR, 14 of which achieved an IF equal to or higher than one. Most citing journals were European and American, thus showing a fairly good visibility of the articles published in the 14 journals analyzed. The self-citing and self-cited rates showed a wide variation. The journal that appeared to perform best was the Journal of High Energy Physics, an electronic publication whose success seemingly confirms Internet circulation as an effective means to enhance the visibility and, consequently, the quality, in terms of citations, of a journal. Italy's low overall expenditure on research & development (R&D) and low number of researchers compared to countries with longstanding high publishing standards and traditions are no doubt partly to blame for its poor performance in scientific publishing. A two-level hierarchic system of fields and subfields of the sciences, social sciences and arts & humanities is proposed. The system was specifically designed for scientometric (evaluation) purposes, with the ultimate goal of classifying every single document into a well-defined category. This goal was achieved using a three-step iterative process. The basic concepts and some preliminary results are presented. This study investigates the citation patterns in the journal Medical Principles and Practice from its inception in 1989 through 2000 (volumes 1-9). The data set includes 4,740 references appended to the 221 original research articles. All of the citations were entered into a ProCite database for analysis. 
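The self-citing and self-cited rates used to evaluate the Italian journals above are standard ratios over a journal-to-journal citation matrix. A minimal sketch, with toy citation counts assumed for illustration:

```python
def self_citing_rate(C, j):
    """Share of journal j's outgoing references that cite j itself."""
    out = sum(C[j])
    return C[j][j] / out if out else 0.0

def self_cited_rate(C, j):
    """Share of the citations journal j receives that come from j itself."""
    incoming = sum(row[j] for row in C)
    return C[j][j] / incoming if incoming else 0.0

# C[i][j] = citations from journal i to journal j (toy data):
C = [
    [10, 5, 5],   # journal 0 cites itself 10 times out of 20 references
    [2, 4, 4],
    [3, 1, 6],
]
```

The two rates share a numerator but differ in denominator (references made versus citations received), which is why a journal can have a modest self-citing rate yet a high self-cited rate, or vice versa.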
Specifically, this study addresses: (1) bibliometric patterns of cited works in terms of publication format, subject scatter, authorship characteristics, age of citations, geographic distribution, and language distribution; (2) productivity of journal titles; (3) the role of self-citation; and (4) how selected bibliometric indicators apply. Some of the findings include: journal articles are most frequently cited; English language publications dominate the literature; there is a trend of multiple authorship; and the pattern of aging is below the norm for medical literature. The results of the study can provide a benchmark against which to measure the user behavior of a particular group of researchers, as well as to inform collection development and management decisions. This paper focuses on the dichotomy between the multifaceted and multidimensional nature of contemporary R&D activity and unidimensional approaches to the measurement of its performance. While publications in refereed journals and citations are the most preferred indicators of research performance, there are also other indicators, such as chapters in edited books, research reports, patents, algorithms, prototypes and designs, which cannot be overlooked. Even when multiple indicators are used, they are used in isolation, with the result that one gets only partial views of a multidimensional manifold. Here, a major problem is how to construct a composite measure of research performance without assigning arbitrary weights to different measures of research output. This problem is particularly important for cross-institutional and cross-national comparisons of research performance. In this paper we demonstrate the feasibility of constructing a multi-objective measure of research performance using the Partial Order Scoring (POSCOR) algorithm developed by Hunya (1976). 
The algorithm is briefly described and applied to empirical data on the research outputs of 1,460 research units in different socio-cultural, institutional and disciplinary settings. The potentialities and limitations of using the POSCOR algorithm in scientometric analysis are briefly discussed. This paper reports the results of an empirical study on the impact of three proximity measures, geographical distance, thematic distance and socio-economic distance, among the set of 45 scientifically most advanced countries, on their cooperation network. In network data, individuals (viz. countries) are linked to one another and the relationships are nested and embedded in groups, with the result that the statistical assumptions of independence underlying ordinary least squares regression are systematically violated. Hence, we used a non-parametric regression procedure, the Quadratic Assignment Procedure (QAP), for regressing the matrix of transnational cooperation on the matrices of the three proximity measures: geographic proximity, thematic proximity and socio-economic proximity. The results indicate that all three proximity measures have the expected negative effect on transnational cooperation. Geographic proximity has a greater impact than the other proximity measures. A survey of linguistic dimensions of Web site hosting and interlinking of the universities of sixteen European countries is described. The results show that English is the dominant language both for linking pages and for all pages. In a typical country approximately half the pages were in English and half in one or more national languages. Normalised interlinking patterns showed three trends: 1) international interlinking throughout Europe in English, and additionally in Swedish in Scandinavia; 2) linking between countries sharing a common language; and 3) countries extensively hosting international links in their own major languages. 
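The QAP described earlier on this page can be sketched as a permutation test: correlate the off-diagonal cells of two square matrices, then rebuild the statistic under random joint row-and-column permutations of one matrix. This is a minimal single-predictor sketch, not the full matrix-regression procedure the study ran.

```python
import random

def qap_test(X, Y, trials=2000, seed=0):
    """QAP sketch: observed correlation of two square matrices, plus a
    p-value from jointly permuting rows and columns of Y, which keeps
    the dyadic dependence structure intact under the null."""
    n = len(X)
    def cells(M, order):  # off-diagonal cells under a node ordering
        return [M[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = sum((x - ma) ** 2 for x in a) ** 0.5
        sb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (sa * sb) if sa and sb else 0.0
    ident = list(range(n))
    observed = corr(cells(X, ident), cells(Y, ident))
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = ident[:]
        rng.shuffle(p)  # permute rows AND columns of Y identically
        if abs(corr(cells(X, ident), cells(Y, p))) >= abs(observed):
            hits += 1
    return observed, hits / trials
```

Permuting rows and columns together relabels the countries without breaking ties between dyads, which is exactly the independence violation that rules out ordinary least squares here.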
This provides evidence for the multilingual character of academic use of the Web in Western Europe, at least outside the UK and Eire. Evidence was found that Greece was significantly linguistically isolated from the rest of the EU, but that outsiders Norway and Switzerland were not. We propose a novel approach to incorporating term similarity and inverse document frequency into a logical model of information retrieval. The logic's ability to handle expressive representations and its use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented, and experiments against test collections are presented. Relevance feedback consists of automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probability propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested in preliminary experiments with different standard document collections. When people search for documents, they ultimately want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. 
Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research. Humans can make hasty but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. 
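The LSI mapping discussed above amounts to a truncated SVD of the term-by-document matrix. A minimal sketch with a toy vocabulary (the terms and documents are invented for illustration; the article's point is that this SVD step is costly and replaceable by the Haar transform):

```python
import numpy as np

def lsi_doc_vectors(td, k):
    """Project a term-by-document matrix into a k-dimensional concept
    space via truncated SVD, the unitary transform underlying LSI."""
    U, s, Vt = np.linalg.svd(td, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim row per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy matrix: rows are the terms "cat", "feline", "dog"; columns are docs.
td = np.array([[1.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
docs = lsi_doc_vectors(td, k=2)
```

In the concept space the "cat" document and the "feline" document end up close despite sharing the co-occurring term only indirectly, while the "dog" document stays orthogonal: exactly the escape from lexical matching the abstract describes.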
The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning. A number of studies have examined the problems of query expansion in monolingual Information Retrieval (IR), and query translation for cross-language IR. However, no link has been made between them. This article first shows that query translation is a special case of query expansion. There is also another set of studies on inferential IR. Again, no relationship has been established with query translation or query expansion. The second claim of this article is that logical inference is a general form that covers query expansion and query translation. This analysis provides a unified view of different subareas of IR. We further develop the inferential IR approach in two particular contexts: using fuzzy logic and probability theory. The evaluation formulas obtained are shown to strongly correspond to those used in other IR models. This indicates that inference is indeed the core of advanced IR. Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. 
For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document scores, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks, based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in those techniques, and to take a higher-level, systematic approach to using hyperlinks for retrieval. This study characterizes the usage and acceptance of electronic preprints (e-prints) in the literature of chemistry. A survey of authors of e-prints appearing in the Chemistry Preprint Server (CPS) at http://preprints.chemweb.com indicates use of the CPS as a convenient vehicle for dissemination of research findings and for receipt of feedback before submitting to a peer-reviewed journal. Reception of CPS e-prints by editors of top chemistry journals is very poor. Only 6% of responding editors allow publication of articles that have previously appeared as e-prints. Concerns focus on the lack of peer review and the uncertain permanence of e-print storage. Consequently, it was not surprising to discover that citation analysis yielded no citations to CPS e-prints in the traditional literature of chemistry. 
Yet data collected and posted by the CPS indicate that the e-prints are valued, read, and discussed to a notable extent within the chemistry community. Thirty-two percent of the most highly rated, viewed, and discussed e-prints eventually appear in the journal literature, indicating the validity of the work submitted to the CPS. This investigation illustrates the ambivalence with which editors and authors view the CPS, but also gives an early sense of the potential that free and rapid information dissemination, coupled with open, uninhibited discussion and evaluation, has to expand, enrich, and vitalize the scholarly discourse of chemical scientists. In this article we further develop the theory for a stochastic model of the citation process in the presence of obsolescence to predict the future citation pattern of individual papers in a collection. More precisely, we investigate the conditional distribution, and its mean, of the number of citations to a paper after time t, given the number of citations it has received up to time t. In an important parametric case it is shown that the expected number of future citations is a linear function of the current number, this being interpretable as an example of a success-breeds-success phenomenon. Information retrieval performance evaluation is commonly based on the classical recall- and precision-based figures or graphs. However, important information indicating causes of variation may remain hidden beneath the average recall and precision figures. Identifying significant causes of variation can help researchers and developers to focus on the opportunities for improvement that underlie the averages. This article presents a case study showing the potential of a statistical repeated-measures analysis of variance for testing the significance of factors in retrieval performance variation. The TREC-9 Query Track performance data are used as a case study, and the factors studied are retrieval method, topic, and their interaction. 
The results show that retrieval method, topic, and their interaction are all significant. A topic-level analysis is also made to examine the nature of variation in the performance of retrieval methods across topics. The observed retrieval performances of expansion runs are truly significant improvements for most of the topics. Analyses of the effect of query expansion on document ranking confirm that expansion affects ranking positively. Can maps of science tell us anything about paradigms? The author reviews his earlier work on this question, including Kuhn's reaction to it. Kuhn's view of the role of bibliometrics differs substantially from the kinds of reinterpretations of paradigms that information scientists are currently advocating. But these reinterpretations are necessary if his theory is ever to be empirically tested and further progress is to be made in understanding the growth of scientific knowledge. A new Web tool is discussed that highlights rapidly changing specialties and may lead to new ways of monitoring revolutionary change in real time. It is suggested that revolutionary and normal science be seen as extremes on a continuum of rates of change rather than, as Kuhn originally asserted, as an all-or-none proposition. This article discusses the rationale for creating historiographs of scholarly topics using a new program called HistCite(TM), which produces a variety of analyses to help the historian identify key events (papers), people (authors), and journals in a field. By creating a genealogic profile of a field's evolution, the program aids the scholar in evaluating the paradigm involved. Research fronts, defined as clusters of documents that tend to cite a fixed, time-invariant set of base documents, are plotted as time lines for visualization and exploration. 
Using a set of documents related to the subject of anthrax research, this article illustrates the construction, exploration, and interpretation of time lines for the purpose of identifying and visualizing temporal changes in research activity through journal articles. Such information is useful for presentation to members of expert panels used for technology forecasting. In their 1998 article "Visualizing a discipline: An author cocitation analysis of information science, 1972-1995," White and McCain used multidimensional scaling, hierarchical clustering, and factor analysis to display the specialty groupings of 120 highly-cited ("paradigmatic") information scientists. These statistical techniques are traditional in author cocitation analysis (ACA). It is shown here that a newer technique, Pathfinder Networks (PFNETs), has considerable advantages for ACA. In PFNETs, nodes represent authors, and explicit links represent weighted paths between nodes, the weights in this case being cocitation counts. The links can be drawn to exclude all but the single highest counts for author pairs, which reduces a network of authors to only the most salient relationships. When these are mapped, dominant authors can be defined as those with relatively many links to other authors (i.e., high degree centrality). Links between authors and dominant authors define specialties, and links between dominant authors connect specialties into a discipline. Maps are made with one rather than several computer routines and in one rather than many computer passes. Also, PFNETs can, and should, be generated from matrices of raw counts rather than Pearson correlations, which removes a computational step associated with traditional ACA. White and McCain's raw data from 1998 are remapped as a PFNET. It is shown that the specialty groupings correspond closely to those seen in the factor analysis of the 1998 article. 
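The link-pruning rule just described for PFNET-style author maps, keeping only the single highest cocitation count per author, can be sketched directly. This is a simplification of the full Pathfinder triangle-inequality pruning, and the author names and counts are invented for illustration.

```python
def strongest_links(cocite):
    """Keep, for each author, only the link with the single highest
    cocitation count (a simplification of full Pathfinder pruning)."""
    links = set()
    for author, counts in cocite.items():
        partner = max(counts, key=counts.get)  # the most salient partner
        links.add(frozenset((author, partner)))
    return links

def degree(links):
    """Dominant authors are those with relatively many retained links."""
    d = {}
    for link in links:
        for author in link:
            d[author] = d.get(author, 0) + 1
    return d

# Toy symmetric cocitation counts among four authors:
cocite = {
    "A": {"B": 10, "C": 7, "D": 4},
    "B": {"A": 10, "C": 2, "D": 1},
    "C": {"A": 7, "B": 2, "D": 3},
    "D": {"A": 4, "B": 1, "C": 3},
}
```

After pruning, author "A" retains links to all three others and emerges as the dominant (high degree centrality) node, illustrating how specialties coalesce around dominant authors in the mapped network.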
Because PFNETs are fast to compute, they are used in AuthorLink, a new Web-based system that creates live interfaces for cocited author retrieval on the fly. Knowledge domain visualization is a visual exploratory approach to the study of the development of a knowledge domain. In this study, we focus on practical issues concerning the modeling and visualization of scientific revolutions. We study the growth patterns of specialties derived from citation and cocitation data on string theory in physics. Special attention is given to the two superstring revolutions since the 1980s. The superstring revolutions are visualized, animated, and analyzed using the general framework of Thomas Kuhn's structure of scientific revolutions. The implications of taking this approach are discussed. This article reports research on analyzing and visualizing the impact of governmental funding on the amount and citation counts of research publications. For the first time, grant and publication data appear interlinked in one map. We start with an overview of related work and a discussion of available techniques. A concrete example, grant and publication data from Behavioral and Social Science Research, one of four extramural research programs at the National Institute on Aging (NIA), is analyzed and visualized using the VxInsight(R) visualization tool. The analysis also illustrates current problems related to the quality and availability of data, data analysis, and processing. The article concludes with a list of recommendations on how to improve the quality of grant-publication maps and a discussion of research challenges for indicator-assisted evaluation and funding of research. This article examines the knowledge structure of the field of space communications using bibliometric mapping techniques based on textual analysis. A new approach aimed at simultaneously visualizing the configuration of its scientific and technological knowledge bases is presented. 
This approach enabled us to overcome various limits of existing bibliometric methods dealing with science and technology relationships. The bibliometric map revealed weak cognitive interactions between science and technology at the worldwide level, although it brought out the systemic nature of the process of knowledge production on either side. We extended the mapping approach to the R&D activities of the Triad countries in order to characterize their specialization profiles and cognitive links on both sides in comparison with the structure of the field at the worldwide level. Results showed different patterns in the way the Triad countries organize their scientific and technological activities within the field. This paper reports results from a project that sought to investigate the relationship between study approaches and Web-based information seeking. Factor analyses were applied to data from over 500 queries submitted in response to three different search tasks to identify clusters of variables associated with three Web-based search strategies: Boolean, best-match, and combined. A consistent pattern emerged across the nine analyses in relation to a number of study approach variables. Boolean searching was consistently associated with a reproductive (as opposed to meaning-oriented) approach, anxiety (in the form of fear of failure), and high levels of active interest. Best-match searching was associated with the converse of all these measures. Combined searching was differentiated from both Boolean and best-match by its association with poor time management. There was also some evidence of changes in strategy in relation to task complexity. A model is introduced that seeks to explain these results. This project was exploratory in nature, and the pattern of findings is proposed as prima facie evidence to support the notion that study approaches can influence choice of search strategies. 
The results are considered essentially as hypotheses for further systematic study, for which suggestions are made. Three new metrics are introduced that measure the range of use of a university Web site by its peers through different heuristics for counting links targeted at its pages. All three give results that correlate significantly with the research productivity of the target institution. The directory range model, which is based upon summing the number of distinct directories targeted by each other university, produces the most promising results of any link metric yet. Based upon an analysis of changes between models, it is suggested that range models measure essentially the same quantity as their predecessors but are less susceptible to spurious causes of multiple links and are therefore more robust. Users' queries for visual information in American history were studied to identify the image attributes important for retrieval and the characteristics of users' queries for images. The queries were collected from 38 faculty and graduate students of American history in 1999 in a local setting. Pre- and post-test questionnaires and interviews were employed to gather users' requests and search terms. The Library of Congress American Memory photo archive was used to search for images. Thirty-eight natural language statements, 185 search terms provided by the participants, and 219 descriptors indicated by the participants in relevant retrieved records were analyzed to find the distribution of subject content in users' queries. Over half of the search requests fell into the category "general/nameable needs." It was also found that most image content was described in terms of the kind of person, thing, event, or condition, depending on location or time. Title, date, and subject descriptors were mentioned as appropriate representations of image subject content. 
The results of this study suggest the principal categories of search terms for users in American history, pointing to directions for the development of indexing tools and system design for image retrieval systems. In a critical analysis of the recent development and deployment of the North American Industry Classification System, this article focuses on the discourse surrounding the creation of the system's "information" category. A close reading of the relevant government documents suggests that the category functions simultaneously to position information as a major sector of the economy and to organize data about information as a commodity. The discourse also points to the continuing complexities involved in conceptualizing information as a measurable object. In this article, we propose a method for the comparative analysis of concentration in author productivity distributions. We define the notion of concentration on the basis of two viewpoints (absolute and relative concentration) and select G (Gini's index) and V (the number of authors) as suitable measures for these two viewpoints. We then discuss the statistical peculiarity of author productivity data (i.e., most statistical measures change systematically according to changes in the sample size) and explain our method using developmental profiles, which takes into account the sample size dependency of statistical measures. Finally, by applying it to actual data, we demonstrate the usefulness of the proposed method. In this paper, we present five user experiments on incorporating behavioral information into the relevance feedback process. In particular, we concentrate on ranking terms for query expansion and selecting new terms to add to the user's query. Our experiments are an attempt to widen the evidence used for relevance feedback from simply the relevant documents to include information on how users are searching. 
We show that this information can lead to more successful relevance feedback techniques. We also show that the presentation of relevance feedback to the user is important in the success of relevance feedback. Author cocitation analysis (ACA), a special type of cocitation analysis, was introduced by White and Griffith in 1981. This technique is used to analyze the intellectual structure of a given scientific field. In 1990, McCain published a technical overview that has been largely adopted as a standard. Here, McCain notes that Pearson's correlation coefficient (Pearson's r) is often used as a similarity measure in ACA and presents some advantages of its use. The present article criticizes the use of Pearson's r in ACA and sets forth two natural requirements that a similarity measure applied in ACA should satisfy. It is shown that Pearson's r does not satisfy these requirements. Real and hypothetical data are used in order to obtain counterexamples to both requirements. It is concluded that Pearson's r is probably not an optimal choice of a similarity measure in ACA. Still, further empirical research is needed to show whether, and if so to what extent, the use of similarity measures in ACA that fulfill these requirements would lead to objectively better results in full-scale studies. Further, problems related to incomplete cocitation matrices are discussed. The article highlights the importance of conceptual and theoretical work in the design of information retrieval systems. Two epistemological positions leading to different solutions in digital library (DL) design are examined: the information transfer perspective and the social constructionist knowledge production perspective. The first section of the paper explores how the information transfer perspective affects the principles by which documents are organized in DLs. The second section analyzes the basic assumptions of the knowledge production perspective. 
The third section discusses how social constructionist ideas affect the design principles and information architecture of DLs. The authors suggest that, in the electronic information environment, traditional noun-based approaches can be replaced by solutions that combine verbs and nouns to visualize the structure of conversations concerning a particular issue or topic. Finally, the potentials and problems of building constructionist digital libraries are discussed. This paper revises David Ellis's information-seeking behavior model of social scientists, which includes six generic features: starting, chaining, browsing, differentiating, monitoring, and extracting. The paper uses social science faculty researching stateless nations as the study population. The description and analysis of the information-seeking behavior of this group of scholars is based on data collected through structured and semi-structured electronic mail interviews. Sixty faculty members from 14 different countries were interviewed by e-mail. For reality-check purposes, face-to-face interviews with five faculty members were also conducted. Although the study confirmed Ellis's model, it found that a fuller description of the information-seeking process of social scientists studying stateless nations should include four additional features besides those identified by Ellis. These new features are: accessing, networking, verifying, and information managing. In view of this, the study develops a new model, which, unlike Ellis's, groups all the features into four interrelated stages: searching, accessing, processing, and ending. This new model is fully described and its implications for research and practice are discussed. How and why the scholars studied here differ from other academic social scientists is also discussed. 
Results from recent advances in link metrics have demonstrated that the hyperlink structure of national university systems can be strongly related to the research productivity of the individual institutions. This paper uses a page categorization to show that restricting the metrics to subsets more closely related to the research of the host university can produce even stronger associations. A partial overlap was also found between the effects of applying advanced document models and separating page types, but the best results were achieved through a combination of the two. Type/Token-Taken informetrics is a new part of informetrics that studies the use of items rather than the items themselves. Here, items are the objects that are produced by the sources (e.g., journals producing articles, authors producing papers, etc.). In linguistics a source is also called a type (e.g., a word), and an item a token (e.g., the use of words in texts). In informetrics, types that occur often, for example, in a database will also be requested often, for example, in information retrieval. The relative use of these occurrences will be higher than their relative occurrences themselves; hence the name Type/Token-Taken informetrics. This article studies the frequency distribution of Type/Token-Taken informetrics, starting from that of Type/Token informetrics (i.e., source-item relationships). We also study the average number mu* of item uses in Type/Token-Taken informetrics and compare this with the classical average number mu in Type/Token informetrics. We show that mu* is always greater than or equal to mu, and that mu* is an increasing function of mu. A method is presented to actually calculate mu* from mu and a given exponent a of Lotka's frequency distribution in Type/Token informetrics. We leave open the problem of developing non-Lotkaian Type/Token-Taken informetrics. 
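The relation between mu and mu* described in the abstract above can be checked numerically. The following is an illustrative sketch only, not the article's own derivation: it computes the classical Type/Token average mu and the use-weighted (size-biased) Type/Token-Taken average mu* under a truncated Lotka distribution f(j) proportional to j^(-a), where f(j) is the number of sources producing j items. The exponent values and truncation point are arbitrary choices for illustration.

```python
# Illustrative sketch: compare the classical Type/Token average mu with the
# use-weighted Type/Token-Taken average mu* under a truncated Lotka
# distribution f(j) ~ j**(-a).

def lotka_averages(a: float, max_items: int = 10_000):
    """Return (mu, mu_star) for a truncated Lotka frequency distribution.

    f(j)  : (unnormalised) number of sources producing j items.
    mu    = E[j]         : average items per source (Type/Token).
    mu*   = E[j^2]/E[j]  : average when items are sampled by use, i.e.
                           the size-biased mean (Type/Token-Taken).
    """
    f = [j ** (-a) for j in range(1, max_items + 1)]
    total_sources = sum(f)
    total_items = sum(j * fj for j, fj in enumerate(f, start=1))
    total_use = sum(j * j * fj for j, fj in enumerate(f, start=1))
    mu = total_items / total_sources
    mu_star = total_use / total_items
    return mu, mu_star

for a in (2.0, 2.5, 3.0):
    mu, mu_star = lotka_averages(a)
    print(f"a={a}: mu={mu:.3f}, mu*={mu_star:.3f}")
    assert mu_star >= mu  # holds for any distribution (variance >= 0)
```

The inequality mu* >= mu follows because mu* is a size-biased mean: E[j^2]/E[j] - E[j] = Var(j)/E[j] >= 0.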
Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated on a large text corpus built from an on-line encyclopedia. Garbage in, garbage out is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely on incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior on the Web. This paper proposes an effective term suggestion approach to interactive Web search. Conventional approaches to making term suggestions involve extracting co-occurring keyterms from highly ranked retrieved documents. 
Such approaches must deal with term extraction difficulties and interference from irrelevant documents, and, more importantly, have difficulty extracting terms that are conceptually related but do not frequently co-occur in documents. In this paper, we present a new, effective log-based approach to relevant term extraction and term suggestion. Using this approach, the relevant terms suggested for a user query are those that co-occur in similar query sessions from search engine logs, rather than in the retrieved documents. In addition, the suggested terms in each interactive search step can be organized according to their relevance to the entire query session, rather than to the most recent single query as in conventional approaches. The proposed approach was tested using a proxy server log containing about two million query transactions submitted to search engines in Taiwan. The experimental results show that the proposed approach can provide organized and highly relevant terms, and can exploit the contextual information in a user's query session to make more effective suggestions. This paper presents a novel user interface that provides global visualizations of large document sets in order to help users formulate the query that corresponds to their information needs and to access the corresponding documents. An important element of the approach we introduce is the use of concept hierarchies (CHs) to structure the document collection. Each CH corresponds to a facet of the documents users can be interested in. Users browse these CHs in order to specify and refine their information needs. Additionally, the interface is based on OLAP principles, and multi-dimensional analysis operators are provided to users to allow them to explore a document collection. Sequential patterns refer to frequently occurring patterns related to time or other sequences, and have been widely applied to solving decision problems. 
For example, they can help managers determine which items were bought after some items had been bought. However, since fuzzy sequential patterns described by natural language are one type of fuzzy knowledge representation, they are helpful in building a prototype fuzzy knowledge base in a business. Moreover, each fuzzy sequential pattern consisting of several fuzzy sets described by natural language is well suited to the thinking of human subjects and will help to increase the flexibility for users in making decisions. Additionally, since the comprehensibility of fuzzy representation by human users is a criterion in designing a fuzzy system, the simple fuzzy partition method is preferable. In this method, each attribute is partitioned by its various fuzzy sets with pre-specified membership functions. The advantage of the simple fuzzy partition method is that the linguistic interpretation of each fuzzy set is easily obtained. The main aim of this paper is to propose a fuzzy data mining technique to discover fuzzy sequential patterns by using the simple partition method. Two numerical examples are utilized to demonstrate the usefulness of the proposed method. The information available in languages other than English in the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users are English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it is predicted that there will be only a 60% increase in Internet users among English speakers versus a 150% growth among non-English speakers over the next five years. By 2005, 57% of Internet users will be non-English speakers. 
A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million from January to June in 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002. China had become the second largest global at-home Internet population in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911-1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy the needs of the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats, and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49). However, research in crossing language boundaries, especially across European and Oriental languages, is still in the initial stage. In this proposal, we focus on cross-lingual semantic interoperability by developing automatic generation of a cross-lingual thesaurus based on an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult the thesaurus to identify other relevant vocabularies. 
In the problem of searching across language boundaries, a cross-lingual thesaurus, generated by co-occurrence analysis and a Hopfield network, can be used to generate additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture the unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents. Therefore, English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we develop an automatic thesaurus using the Hopfield network, based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases. Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. 
HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems that require human mediation to remain current. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall. The purpose of this study is to map the semiconductor literature using journal co-citation analysis. The journal sample was gathered from the INSPEC database from 1978 to 1997. In the co-citation analysis, the data compiled were counts of the number of times two journal titles were jointly cited in later publications. It is assumed that the more two journals are cited together, the closer the relationship between them. The journal set used was the 30 most productive journals in the field of semiconductors. Counts of co-citations to the set of semiconductor journals were retrieved from the SciSearch database, accessed through Dialog. Cluster analysis and multi-dimensional scaling were employed to create two-dimensional maps of journal relationships in the cross-citation networks. The following results were obtained through this co-citation study: The 30 journals fall fairly clearly into three clusters. The major cluster of journals, containing 17 titles, is in the subject of physics. The second cluster, consisting of 9 journals, includes journals primarily on materials science. The remaining cluster represents research areas in the discipline of electrical and electronic engineering. All co-cited journals share similar co-citation profiles, reflected in high positive Pearson correlations. 
Two hundred and ninety-six pairs (68%) correlate at greater than 0.70. This shows that there is a strong relationship between semiconductor journals. Five individual journals in five paired sets with co-citation frequencies over 100,000 include Physical Review B, Condensed Matter; Physical Review Letters; Applied Physics Letters; Journal of Applied Physics; and Solid State Communications. The paper describes the need for and importance of collaboration in scientific research. It discusses the present status of India's collaboration with China in S&T, analyses the collaborative research between India and China as reflected in co-authored papers, in particular its nature, strong and weak areas, and its impact in different subject fields, and indicates the potential areas in S&T for future collaboration. In a recent paper EGGHE & ROUSSEAU (2002) readdressed the problem of defining the "core" of a subject's literature by focussing on the productivity of the contributing sources as measured by their influence on an overall concentration value. Here we generalise Egghe & Rousseau's empirical approach, based upon the Gini index, to a more theoretical setting. This allows a simple visualisation of the geometry of the procedure and a complete analysis in certain classic cases. We conclude that, without additional empirical support, the approach does not appear to offer real improvement on more established and intuitively appealing schemes. Citation networks are a core topic of informetrics and science studies. This article proposes to bridge the cited and citing sides of citation transactions by using a disaggregated form, the "referencing-structure" function (RSF). The RSF may also be seen as the "retrieval-structure" which, in a stylized co-citation or co-word model, gives the maximum retrieval that can be expected from the bibliometric characteristics of the field (retrieval and recall features are key issues in co-citation studies). 
The usual citation and reference distributions may be derived from aggregates or cuts, respectively, of the RSF. The RSF representation also generates new points of view on the citing-cited distributions, such as the "iso-retrieval function." A rank version of the RSF is also introduced. Part I is devoted to the definition and construction of the RSF, and to the general interpretation of its various aspects in the context of co-citation studies. Generalization to other co-item studies (co-word, hyperlink "co-sitations") is discussed briefly. We also introduce a general form kindred to the Weibull distribution that can be used to fit cuts of the function. The forthcoming Part II will detail empirical fits, using a few experimental files. Many academic journals of China began to be published in English when China opened its door to the world more than 20 years ago. Tsinghua University began publishing an academic journal, Tsinghua Science and Technology, in 1996. We made statistical analyses of the regional distribution of the authors and of the references cited by the articles of Tsinghua Science and Technology from 1996 to 2002. The results show that although the authors are mainly from Tsinghua University, the number of authors from other regions, especially the number of overseas authors, has been increasing in recent years; the average number of articles cited by each article increased from 6.9 in 1996 to 13.4 in 2002. The results suggest that we must learn from the successful experiences of well-known journals. Attracting high-level articles and realizing the internationalization of the journal will help us to develop the journal. Using statistical methods, the author analyzed the citation rate of articles published in Chinese Science Bulletin (CSB) between 1995 and 1999 in the Science Citation Index Expanded (SCIE) databases. Results indicated that: 1. The majority of authors who published in CSB were Chinese; 2. 
The articles were basically cited by the authors themselves in the first year after publication; 3. The peak of the total citation rate appeared in the third year after publication, and the peak of the non-self-citation rate was further delayed. There are relatively high self-citation rates for articles from CSB, and most of these citations are from Chinese scientific journals. This indicates that our citation environment is limited to a closed circle. The author, therefore, proposed a strategy for changing the current conditions of Chinese scientific journals to raise their influence. The graph structures of three national university publicly indexable Webs from Australia, New Zealand, and the UK were analyzed. Strong scale-free regularities for page indegrees, outdegrees, and connected component sizes were in evidence, resulting in power laws similar to those previously identified for individual university Web sites and for the AltaVista-indexed Web. Anomalies were also discovered in most distributions and were tracked down to root causes. As a result, resource-driven Web sites and automatically generated pages were identified as representing a significant break from the assumptions of previous power law models. It follows that attempts to track average Web linking behavior would benefit from using techniques to minimize or eliminate the impact of such anomalies. Efficient construction of inverted indexes is essential to the provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. 
We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data. As the demand for global information increases significantly, multilingual corpora have become a valuable linguistic resource for applications in cross-lingual information retrieval and natural language processing. In order to cross the boundaries that exist between different languages, dictionaries are the most typical tools. However, the general-purpose dictionary is less sensitive to both genre and domain. It is also impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesauri for large applications. Corpus-based approaches, which do not have the limitations of dictionaries, provide a statistical translation model with which to cross the language boundary. There are many domain-specific parallel or comparable corpora that are employed in machine translation and cross-lingual information retrieval. Most of these are corpora between Indo-European languages, such as English/French and English/Spanish. The Asian/Indo-European corpus, especially the English/Chinese corpus, is relatively sparse. The objective of the present research is to construct an English/Chinese parallel corpus automatically from the World Wide Web. In this paper, an alignment method is presented which is based on dynamic programming to identify the one-to-one Chinese and English title pairs. The method includes alignment at the title level, word level, and character level. The longest common subsequence (LCS) is applied to find the most reliable Chinese translation of an English word. As one word in one language may translate into two or more repeated words in another language, the edit operation of deletion is used to resolve redundancy. A score function is then proposed to determine the optimal title pairs. 
Experiments have been conducted to investigate the performance of the proposed method using the daily press release articles of the Hong Kong SAR government as the test bed. The precision of the result is 0.998 while the recall is 0.806. The release articles and speech articles published by the Hongkong & Shanghai Banking Corporation Limited are also used to test our method; the precision is 1.00 and the recall is 0.948. This project analyzed 541,920 user queries submitted to and executed in an academic Website during a four-year period (May 1997 to May 2001) using a relational database. The purpose of the study is three-fold: (1) to understand Web users' query behavior; (2) to identify problems encountered by these Web users; (3) to develop appropriate techniques for optimization of query analysis and mining. The linguistic analyses focus on query structures, lexicon, and word associations using statistical measures such as the Zipf distribution and mutual information. A data model with the finest granularity is used for data storage and iterative analyses. Patterns and trends of querying behavior are identified and compared with previous studies. This article focuses on analysing students' information needs in terms of conceptual understanding of the topic they propose to study and its consequences for the search process and outcome. The research subjects were 22 undergraduates of psychology attending a seminar for preparing a research proposal for a small empirical study. They were asked to make searches in the PsycINFO database for their task at the beginning and end of the seminar. Pre- and postsearch interviews were conducted in both sessions. The students were asked to think aloud in the sessions. This was recorded, as were the transaction logs. The results show that during the preparation of research proposals different features of the students' conceptual structure were connected to search success. 
Students' ability to cover their conceptual construct with query terms was the major feature affecting search success during the whole process. In the beginning, the number of concepts and the proportion of subconcepts in the construct also contributed indirectly, via search tactics, to retrieving partly useful references. Students' ability to extract new query terms from retrieved items improved search results. Subject classifications and thesauri have become more important than ever in the Web environment. Efforts made to organize information into subject classifications, or taxonomies, offer users the opportunity to substantially improve the effectiveness of their search and retrieval activities. This article continues earlier research on the development of a new definition of the field of information science and the creation of a "map" of the field showing subjects central to it and their relationships to those on the periphery. A case study describes the creation of a new classification structure (taxonomy) for the Information Science Abstracts (ISA) database, aiming to reflect and accommodate the rapid and continued technological and market changes affecting the information industry today and into the future. Based on a sample of some 3,000 ISA abstracts, two validation experiments were conducted by a three-member team comprising a database editor, a reference librarian, and an abstractor-indexer, who represent three of the major communities within the information science field. In the first experiment, the sample of abstracts was classified according to the proposed new taxonomy; after analysis of the data and revision of the taxonomy, it was revalidated and fine-tuned in a second experiment. Indexer consistency measures obtained in this study were significantly higher than those found in previous studies. The taxonomy resulting from this research employs the concepts, definition, and map of information science previously developed. 
It presents them in an organized hierarchical view of the field and thus makes a significant contribution to information science. A search for information can be viewed as a series of decisions made by the searcher. Two dimensions of the search environment affect a user's decisions: the user's knowledge, and the configuration of the information retrieval system. Drawing on previous findings on users' lack of search or domain knowledge, this article investigates what the user needs to know to make informed search decisions at the United States Bureau of Labor Statistics (BLS) Web site, which provides statistical information on labor and related topics. Its extensive Web site is a rich collection of statistical information, ranging from individual statistics such as the current Consumer Price Index (CPI), to a large statistical database called LABSTAT that can be queried to construct a table or time series on the fly. Two models of the search environment and the query process in LABSTAT are presented. They provide complementary views of the decision points at which help may be needed, and also suggest useful help content. Extensive examples based on the industry concept illustrate how the information could assist users' search decisions. The article concludes with a discussion of the role of help facilities in Web searching, and the interesting question of how to initiate the provision of help. The fields of informetrics and scientometrics have suffered from the lack of a powerful test to detect the differences between two samples. We show that the Mann-Whitney test is a good test on the publication productivity of journals and of authors. Its main limitation is a lack of power on small samples that have small differences. This is not the fault of the test, but rather reflects the fact that small, similar samples have little to distinguish between them. 
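As an illustration of the two-sample test discussed in the abstract above, the following minimal sketch implements the Mann-Whitney U statistic with the large-sample normal approximation (no tie correction). The two productivity samples are hypothetical numbers chosen for illustration, not data from the study.

```python
# Minimal sketch of the Mann-Whitney U test (normal approximation, no tie
# correction) for comparing two samples, e.g. annual paper counts of two
# groups of journals. The sample data below are made up.
import math

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the large-sample normal approximation."""
    # U counts, over all pairs, how often an x-value beats a y-value
    # (ties count as one half).
    u = sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u, p

productive = [14, 9, 11, 16, 12, 10, 13, 15]   # hypothetical counts
less_productive = [3, 5, 2, 6, 4, 7, 3, 5]
u, p = mann_whitney_u(productive, less_productive)
print(f"U={u}, p={p:.4f}")
```

With samples this cleanly separated the test easily rejects; the abstract's point is that when two small samples overlap heavily, no test can distinguish them.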
Productivity and impact of the Spanish Council for Scientific Research scientists in Natural Resources and Chemistry by gender and professional category are analysed. Scientific publications were downloaded from the Science Citation Index, years 1994-1999. A total of 260 Natural Resources scientists (24% of females) and 219 Chemistry ones (38% of females) were studied. Productivity tended to increase as professional category improved in the two areas. Within each category no significant differences in productivity were found between genders, but the outliers with the highest production were mostly males. Distribution of females by professional categories and number of years at the institution were analysed to detect possible gender discrimination in the promotion system. A more positive picture emerges in Chemistry than in Natural Resources, since a process of feminization of that area has started in the lowest professional categories and females' progression to the upper ranks is expected in the near future. This contribution examines the feasibility of building state-of-the-art patent indicators with historical patent documents available in electronic form from the German Patent Office since the introduction of the Patent Law for the German Empire in 1877. The paper is divided into two parts: a methodological discussion and a case study on the chemical sector in Germany. The development of the technology sector defined matches remarkably well with stylised facts that institutional analyses of the chemical sector have provided us with so far. Moreover, the possibility of varying the level of aggregation in the analysis of technological areas discloses empirical evidence for the path-dependent development in the chemical sector after the advent of organic chemistry and its application in the chemical synthesis of dyestuffs. 
Our findings enhance institutional and historical contributions about technological change in the chemical sector and suggest new research questions for innovation studies. Recent studies have reported on a steady decline of Sweden's relative citation impact in almost all science fields, above all in the life sciences. The authors attempt to shed light on the observed decline in Swedish neuroscience through a detailed citation analysis at different levels of aggregation. Thus national citation data are decomposed to the institutional, departmental and individual level. Both the decomposition of national science indicators and changing collaboration patterns in Swedish neuroscience reveal interesting details on the 'anatomy' of a decline. Today science policy makers in many countries worry about a brain drain, i.e., about permanently losing their best scientists to other countries. However, such a brain drain has proven to be difficult to measure. This article reports a test of bibliometric methods that could possibly be used to study the brain drain on the micro-level. An investigation of elite mobility must solve the three methodological problems of delineating a specialty, identifying a specialty's elite and identifying international mobility and migration. The first two problems were preliminarily solved by combining participant lists from elite conferences (Gordon conferences) and citation data. Mobility was measured by using the address information of publication databases. The delineation of specialties has been identified as the crucial problem in studying elite mobility on the micro-level. Policy concerns of a brain drain were confirmed by measuring the mobility of the biomedical Angiotensin specialty. Previous research has shown that Web-link-based metrics can correlate with traditional research assessment at the university level. In this study, we test whether the same is true for the computer science departments in the UK. 
The relevant Web Impact Factors (WIFs) were calculated from the link data collected both from Alta Vista and the special academic crawler of the University of Wolverhampton. The numbers of staff members and Web pages in each computer science department were used as denominators for the WIF calculation. The number of inlinks to the computer science departments correlated significantly with their research productivities, and WIFs with numbers of staff members as denominators correlated significantly with their Research Assessment Exercise (RAE) ratings. The number of staff members was confirmed to be a better indicator of departmental size than the number of Web pages within the department's domain. Evaluation studies of scientific performance conducted during the past years more and more focus on the identification of research of the 'highest quality', 'top' research, or 'scientific excellence'. This shift in focus has led to the development of new bibliometric methodologies and indicators. Technically, it meant a shift from bibliometric impact scores based on average values, such as the average impact of all papers published by some unit to be evaluated, towards indicators reflecting the top of the citation distribution, such as the number of 'highly cited' or 'top' articles. In this study we present a comparative analysis of a number of standard and new indicators of research performance or 'scientific excellence', using techniques applied in studies conducted by CWTS in recent years. It will be shown that each type of indicator reflects a particular dimension of the general concept of research performance. Consequently, the application of one single indicator only may provide an incomplete picture of a unit's performance. It is argued that one needs to combine the various types of indicators in order to offer policy makers and evaluators valid and useful assessment tools. 
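The shift from average-based scores to top-of-distribution indicators described in the preceding abstract can be illustrated with a toy calculation; the citation counts and the 15-citation cutoff below are invented for the sketch, not actual CWTS thresholds:

```python
# Hypothetical citation counts for the papers of one research unit.
citations = [0, 1, 1, 2, 3, 3, 4, 5, 8, 12, 15, 40, 95]

# Average-based indicator: mean citations per paper. A few highly
# cited papers dominate the mean.
mean_impact = sum(citations) / len(citations)

# Top-of-distribution indicator: the number of papers at or above a
# "highly cited" threshold (an arbitrary cutoff of 15 for this sketch).
threshold = 15
highly_cited = sum(1 for c in citations if c >= threshold)

print(round(mean_impact, 2), highly_cited)  # two views of the same unit
```

The two numbers answer different questions about the same citation distribution, which is why the abstract argues for combining indicator types rather than relying on a single one.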
The author evaluates two major models of technological competitiveness of nations, and proposes a synthesized model based on their strengths and then complements it with additional measures. The paper addresses definitional distinctions between the terms "competition" and "innovation", discusses the differing views on whether certain statistics are either input or output indicators, and reconsiders the relevance of the unit of analysis. This article depicts some features of the geography of science and technology outputs in the EU, with particular attention to regional "co-location" of these two pillars of the "knowledge-based society". Economists have, for a decade, paid great attention to local "spillovers", noting that industrial firms often draw advantages from the presence of nearby academic centres. The presence in the same areas of strong academic and technological resources is both a condition and a result of science-technology interactions. Concentrating on publications and patents as proxies of the science and technology level in regions, we built a typology of regions according to their commitment to the two knowledge-based activities and then analysed the co-locations of science and technology from several points of view. A fine-grained lattice, mainly based on the standard Nuts3 level, was used. Co-location, at the EU level, is not a general rule. A strong potential for spillover/interaction does exist in the top-class regions, which concentrate a high proportion of European S&T output. But for regions with a small/medium level of S&T activity, a divergence of orientations appears between a science-oriented family and a technology-oriented family, indicating an imbalance between local S&T resources. If we look at the S-oriented regions, whilst controlling for underlying factors, such as population and regional economic product, a significant geographic linkage between T and S appears. 
This suggests a trajectory of science-based technological development. A careful examination of S&T thematic alignments and specialisation is necessary to develop the hypothesis that fostering academic resources could increase the technological power along a growth path. Network Denial-of-Service (DoS) attacks, which exhaust server resources and network bandwidth, can cause the target servers to be unable to provide proper services to the legitimate users and in some cases render the target systems inoperable and/or the target networks inaccessible. DoS attacks have now become a serious and common security threat to the Internet community. Public Key Infrastructure (PKI) has long been incorporated in various authentication protocols to facilitate verifying the identities of the communicating parties. The use of PKI has, however, an inherent problem as it involves expensive computational operations such as modular exponentiation. An improper deployment of the public-key operations in a protocol could create an opportunity for DoS attackers to exhaust the server's resources. This paper presents a public-key based authentication and key establishment protocol coupled with a sophisticated client puzzle, which together provide a versatile solution for possible DoS attacks and various other common attacks during an authentication process. Besides authentication, the protocol also supports a joint establishment of a session key by both the client and the server, which protects the session communications after the mutual authentication. The proposed protocol has been validated using a formal logic theory and has been shown, through security analysis, to be able to resist, besides DoS attacks, various other common attacks. This article discusses the accelerating trend of ownership rights in digital property, specifically copyright. 
This trend is in contrast to the stated legislative purpose of copyright law to be neutral as to the technology that either owners employ to embody the copyrighted work or that others employ to facilitate access and use of the work. Recent legislative initiatives as well as interpretive court decisions have undermined this important concept. There is an ascendancy of digital ownership rights that threatens to undermine the concept of technological neutrality, which in essence guarantees that ownership as well as "use" rights apply equally to analog and digital environments. The result of this skewing is twofold: an unstable environment with respect to the access and use rights of individuals, institutions, and other users of copyrighted material, and the incentive of copyright owners to present works to the public in digital formats alone, where ownership rights are strongest. This article attempts to plot that digital ascendancy and demonstrate the undermining of neutrality principles. User studies demonstrate that nondomain experts do not use the same information-seeking strategies as domain experts. Because of the transformation of integrated library systems into Information Gateways in the late 1990s, both nondomain experts and domain experts have had available to them the wide range of information-seeking strategies in a single system. This article describes the results of a study to answer three research questions: (1) do nondomain experts enlist the strategies of domain experts? (2) if they do, how did they learn about these strategies? and (3) are they successful using them? Interviews, audio recordings, screen captures, and observations were used to gather data from 14 undergraduate students who searched an academic library's Information Gateway. 
The few times that the undergraduates in this study enlisted search strategies that were characteristic of domain experts, it usually took perseverance, trial-and-error, serendipity, or a combination of all three for them to find useful information. Although this study's results provide no compelling reasons for systems to support features that make domain-expert strategies possible, there is a need for system features that scaffold nondomain experts from their usual strategies to the strategies characteristic of domain experts. We chronicle the use of acknowledgments in 20th-century scholarship by analyzing and classifying more than 4,500 specimens covering a 100-year period. Our results show that the intensity of acknowledgment varies by discipline, reflecting differences in prevailing sociocognitive structures and work practices. We demonstrate that the acknowledgment has gradually established itself as a constitutive element of academic writing, one that provides a revealing insight into the nature and extent of subauthorship collaboration. Complementary data on rates of coauthorship are also presented to highlight the growing importance of collaboration and the increasing division of labor in contemporary research and scholarship. In this article we suggest a user-subjective approach to Personal Information Management (PIM) system design. This approach advocates that PIM systems relate to the subjective value-added attributes that the user gives to the data stored in the PIM system. These attributes should facilitate system use: help the user find the information item again, recall it when needed, and use it effectively in the next interaction with the item. 
Derived from the user-subjective approach are three generic principles, which are described and discussed: (a) The subjective classification principle, stating that all information items related to the same subjective topic should be classified together regardless of their technological format; (b) The subjective importance principle, proposing that the subjective importance of information should determine its degree of visual salience and accessibility; and (c) The subjective context principle, suggesting that information should be retrieved and viewed by the user in the same context in which it was previously used. We claim that these principles are only sporadically implemented in operating systems currently available on personal computers, and demonstrate alternatives for interface design. To better understand how individuals and groups derive satisfaction from information, it is important to identify the information source preferences they apply in information seeking and decision making. Four informal propositions drove the structure and underlying logic of this study, forming a preliminary outline of a theory of information source preference profiles and their influence on information satisfaction. This study employed Social Judgment Analysis (SJA) to identify the information judgment preferences held by professional groups for six selected information sources: word of mouth, expert oral advice, Internet, print news, nonfiction books, and radio/television news. The research was designed as a hypothesis-generating exploratory study employing a purposive sample (n = 90) and generated four empirically supported, testable hypotheses about user satisfaction with information sources. The SJA judgment functions revealed the influences of volume and polarity (i.e., positive versus negative information) on information satisfaction. 
By advancing the understanding of how information source preferences can be identified empirically, and of their influence on information satisfaction, this research reflects a first, small step toward understanding "satisficing." Satisficing behaviors result in early termination of information search processes when individuals, facing incomplete information, are sufficiently satisfied to assume risks and execute decisions. This article introduces the concept of relevance as viewed and applied in the context of IR evaluation, by presenting an overview of the multidimensional and dynamic nature of the concept. The literature on relevance reveals how the relevance concept, especially in regard to the multidimensionality of relevance, is many faceted, and does not just refer to the various relevance criteria users may apply in the process of judging relevance of retrieved information objects. From our point of view, the multidimensionality of relevance explains why some will argue that no consensus has been reached on the relevance concept. Thus, the objective of this article is to present an overview of the many different views and ways by which the concept of relevance is used, leading to a consistent and compatible understanding of the concept. In addition, special attention is paid to the type of situational relevance. Many researchers perceive situational relevance as the most realistic type of user relevance, and therefore situational relevance is discussed with reference to its potential dynamic nature, and as a requirement for interactive information retrieval (IIR) evaluation. The explosion of the field of molecular biology is paralleled by the growth in usage and acceptance of Web-based genomic and proteomic databases (GPD) such as GenBank and Protein Data Bank in the scholarly communication of scientists. 
Surveys, case studies, analysis of bibliographic records from Medline and CAPlus, and examination of "Instructions to Authors" sections of molecular biology journals all confirm the integral role of GPD in the scientific literature cycle. Over the past 20 years the place of GPD in the culture of molecular biologists was observed to move from tacit implication to explicit knowledge. Originally, journals suggested deposition of data in GPD, but by the late 1980s, the majority of journals mandated deposition of data for a manuscript to be accepted for publication. A surge subsequently occurred in the number of articles retrievable from Medline and CAPlus using the keyword "GenBank." GPD were not found to be a new form of publication, but rather a fundamental storage and retrieval mechanism for vast amounts of molecular biology information that support the creation of scientific intellectual property. For science to continue to advance, scientists unequivocally agreed that GPD must remain free of peer-review and available at no charge to the public. The results suggest that the existing models of scientific communication should be updated to incorporate GPD data deposition into the current continuum of scientific communication. Multidimensional data analysis or On-line analytical processing (OLAP) offers a single subject-oriented source for analyzing summary data based on various dimensions. We demonstrate that the OLAP approach gives a promising starting point for advanced analysis and comparison among summary data in informetrics applications. At the moment there is no single precise, commonly accepted logical/conceptual model for multidimensional analysis. This is because the requirements of applications vary considerably. We develop a conceptual/logical multidimensional model for supporting the complex and unpredictable needs of informetrics. Summary data are considered with respect to some dimensions. 
By changing dimensions the user may construct other views on the same summary data. We develop a multidimensional query language whose basic idea is to support the definition of views in a way that is natural and intuitive for lay users in the informetrics area. We show that this view-oriented query language has great expressive power and its degree of declarativity is greater than in contemporary operation-oriented or SQL (Structured Query Language)-like OLAP query languages. Collaboration is often a critical aspect of scientific research, which is dominated by complex problems, rapidly changing technology, dynamic growth of knowledge, and highly specialized areas of expertise. An individual scientist can seldom provide all of the expertise and resources necessary to address complex research problems. This paper describes collaboration among a group of scientists, and considers how their experiences are socially shaped. The scientists were members of a newly formed distributed, multi-disciplinary academic research center that was organized into four multi-disciplinary research groups. Each group had 14 to 34 members, including faculty, postdoctoral fellows and students, at four geographically dispersed universities. To investigate challenges that emerge in establishing scientific collaboration, data were collected about members' previous and current collaborative experiences, perceptions regarding collaboration, and work practices during the center's first year of operation. The data for the study include interviews with members of the center, observations of videoconferences and meetings, and a center-wide sociometric survey. Data analysis has led to the development of a framework that identifies forms of collaboration that emerged among scientists (e.g., complementary and integrative collaboration) and associated factors that influenced collaboration, including personal compatibility, work connections, incentives, and infrastructure. 
These results may inform the specification of social and organizational practices that are needed to establish collaboration in distributed, multi-disciplinary research centers. There has been considerable interest in recent years in providing automated information services, such as information classification, by means of a society of collaborative agents. These agents augment each other's knowledge structures (e.g., the vocabularies) and assist each other in providing efficient information services to a human user. However, when the number of agents present in the society increases, exhaustive communication and collaboration among agents result in a large communication overhead and increased delays in response time. This paper introduces a method to achieve selective interaction with a relatively small number of potentially useful agents, based on simple agent modeling and acquaintance lists. The key idea presented here is that the acquaintance list of an agent, representing a small number of other agents to be collaborated with, is dynamically adjusted. The best acquaintances are automatically discovered using a learning algorithm, based on the past history of collaboration. Experimental results are presented to demonstrate that such dynamically learned acquaintance lists can lead to high quality of classification, while significantly reducing the delay in response time. While collaboration is associated with higher article citation rates, a body of research has suggested that this is, in part, related to access to a larger social network and the increased visibility of research this entails, rather than simply a reflection of greater quality. We examine the role of networks in article citation rates by investigating article publication by the nine New Zealand Government-owned Crown Research Institutes (CRIs), drawing on the Science Citation Index. 
We analyse an aggregate data set of all CRI publications with duplicates removed, and, in addition, investigate each CRI. We find that a greater number of authors, countries and institutions involved in co-publication increases expected citation rates, although there are some differences between the CRIs. However, the type of co-publication affects the expected citation rates. We discover a 'periphery effect' whereby greater levels of co-publication with domestic institutions decrease expected citation rates. We conclude that scientists working on the periphery looking to increase the visibility of their research should strive to link their research to the international research community, particularly through co-publication with international authors. The performance of Brazilian male and female scientists in three scientific fields was assessed through their publications in the Science Citation Index from 1997-2001. Information on their sex and their ages, positions, and fellowship status was obtained from a census on all Brazilian scientists. The results showed that women participated most in immunology, moderately in oceanography and least in astronomy. Men and women published similar numbers of papers, and they were also of similar potential impact; they were also equally likely to collaborate internationally. Nevertheless, the findings suggest that some sexual discrimination may still be occurring in the Brazilian peer-review process. We investigate the relationship between the science intensity of technology domains and a country's performance within these domains. The number of references in patents to scientific articles is considered as an approximation of the science intensity of a technology domain, whereas a country's technological performance is measured in terms of its technological productivity (i.e. number of patents per capita). We use USPTO patent data for eight European countries in ten technological domains. A variance analysis (ANOVA) is applied. 
Country as an independent variable does not explain a significant portion of the observed variance in science intensity (p=0.25). Technology domain, however, explains a significant portion of the observed variance (p<0.001). Moreover, in science-intensive fields we find a positive relation between the science linkage intensity of these fields and the technological productivity of the respective countries involved. These findings seem to suggest the relevance of designing innovation policies, aimed at fostering interaction between knowledge-generating actors and technology producers, in a field-specific manner. The Honour Index (HoI), a method to evaluate research performance within different research fields, was derived from the impact factor (IF). It can be used to rate and compare different categories of journals. The HoI was used in this study to determine the scientific productivity of stem cell research in the Asian Four Dragons (Hong Kong, Singapore, South Korea and Taiwan) from 1981 to 2001. The methodology applied in this study represents a synthesis of universal indicator studies and bibliometric analyses of subfields at the micro-level. We discuss several comparisons, and summarise the developmental trend in stem cell research over the two decades. This paper investigates how the use of different statistical methods and study design characteristics affected the number of citations in psychiatric journals. Original research articles (N=448) from four psychiatric journals were reviewed. Aspects measured included the use of statistical methodology, presentation of results, description of procedures, country of the corresponding author and number of the authors. The use of statistical methods was not strongly associated with the further utilisation of an article. The effect was low compared to the impact of the correspondence address or the number of authors. 
Extended description of statistical procedures and an experimental study design had a positive effect on the number of citations received. Two computer-based style programs were used to analyse the Abstracts, Introductions and Discussions of 80 educational psychology journal articles. Measures were made of the overall readability of the texts as well as of sentence lengths, difficult and unique words, articles, prepositions and pronouns. The results showed that the Abstracts scored worst on most of these measures of readability, the Introductions came next, and the Discussions did best of all. However, although the mean scores between the sections differed, the authors wrote in stylistically consistent ways across the sections. Thus readability was variable across the sections but consistent within the authors. Neuroscience is one of the most active research fields in many countries including China, an economically and scientifically emerging country, where rapid development has occurred since the 1970s. In this study, a MEDLINE-based bibliometric analysis of Chinese international output in neuroscience was conducted for the period from 1984 through 2001. An attempt was made to identify the pattern of the growth and to obtain some quantitative indicators for the literature studied in order to review the development of neuroscience in China during the period. The Internet has fostered a faster, more interactive and effective model of scholarly publishing. However, as the quantity of information available is constantly increasing, its quality is threatened, since the traditional quality control mechanism of peer review is often not used (e.g., in online repositories of preprints, and by people publishing whatever they want on their Web pages). 
This paper describes a new kind of electronic scholarly journal, in which the standard submission-review-publication process is replaced by a more sophisticated approach, based on judgments expressed by the readers: in this way, each reader is, potentially, a peer reviewer. New ingredients, not found in similar approaches, are that each reader's judgment is weighted on the basis of the reader's skills as a reviewer, and that readers are encouraged to express correct judgments by a feedback mechanism that estimates their own quality. The new electronic scholarly journal is described in both intuitive and formal ways. Its effectiveness is tested by several laboratory experiments that simulate what might happen if the system were deployed and used. People working in physical proximity have access to information about one another. Much of this information is unavailable when people collaborate remotely using groupware. Being aware of other members of a team in a collaborative environment involves knowing both what people are doing and what is happening to the shared information space or artifact. Increasing the amount of information about the group in a computer-mediated system may increase the group's ability to complete the task. This article reports on a study that examined group performance on a task that was computer mediated with and without awareness information. In particular, the study examined how an awareness tool impacts the quality of the work effort and the communications between group members in the completion of a collaborative authoring task. The study found that the use of an awareness tool decreased the quality of the work effort. The number of communications also decreased when the tool was used. Although the results contradict some of the theoretical predictions, an examination of the data suggests theoretical support for a more complex interaction. 
There was evidence that the awareness tool may have reduced the users' need to communicate, and this reduction in communications may have caused the reduction in the quality of the work effort. There is also data to suggest that the existence of the awareness tool may have negatively influenced the effort of some participants, and it was that effort reduction that caused the reduction in the quality of the product. Hierarchical text classification, or simply hierarchical classification, refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we have found that the existing hierarchical classification experiments used a variety of measures to evaluate performance. These performance measures often assume independence between categories and do not consider documents misclassified into categories that are similar or not far from the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed performance measures consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naive Bayes classifiers showed that SVM classifiers perform better than Naive Bayes classifiers on the Reuters-21578 collection according to the extended measures. A new classifier-centric measure called blocking measure is also defined to examine the performance of subtree classifiers in a top-down level-based hierarchical classification method. Little research has compared youngsters' use of CD-ROM and the Internet for information-seeking purposes. Nevertheless, the area has recently been addressed within a largely qualitative project more generally devoted to young people's information universes. 
Home access to the Internet was seen to be more limited than that to CD-ROM, although the former was consulted to tackle needs of a greater number of types. The strategies employed to exploit each form of information resource were essentially similar. No attempts were reported to check the credibility of any information retrieved from electronic sources. The Internet was, however, used more frequently beyond the informants' own homes than was CD-ROM. There was also greater employment of the Internet by adults acting on the youngsters' behalf. As Internet use for school purposes rose in accordance with age, that of CD-ROM declined. When youngsters themselves compared the two resources as information-seeking tools, CD-ROM software was criticized for its lack of detailed material and the Internet for the problems in locating what was desired. Project findings have implications in a range of areas, including the marketing of CD-ROM packages, research and development and practices within schools. We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments. This study reports an analysis of American Chemical Society electronic journal downloads at Cornell University by individual IP addresses. 
While the majority of users (IPs) limited themselves to a small number of both journals and article downloads, a small minority of heavy users had a large effect on total journal downloads. There was a very strong relationship between the number of article downloads and the number of users, implying that a user population can be estimated just by knowing the total use of a journal. Aggregate users (i.e., library proxy servers and public library computers) can be regarded as a sample of the entire user population. Analysis of article downloads by format (PDF versus HTML) suggests that individuals are using the system like a networked photocopier, for the purpose of creating print-on-demand copies of articles. "Transliteration" in linguistics means the system of conveying, as nearly as possible, by means of one set of letters or characters the pronunciation of words in languages written and printed in a totally different script. This term may be applied to a transcription in Latin letters of Greek, Hebrew, or the Slavonic languages written in the Cyrillic alphabet. We present in this article Greeklish, a Windows application that automatically produces English to Greek transliteration and back-transliteration (retransliteration). This transliteration is based on an algorithm with a table of associations between the two character sets. This table can be modified by the user so that it can cover personal preferences or formal present and future rules. The novelty of this system is its speed of operation, its simplicity, and its ease of use. Our examples use a Greek to Latin (English) alphabet mapping, but the Greeklish application can easily use any X to Latin mapping, where X is any non-Latin alphabet. The input and output information of a national project of Japan for nano-technology will be analysed. In 1996 the Japanese government stipulated a guideline to evaluate national technology projects on economic criteria as well as technological ones. 
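The table-driven transliteration described above (an algorithm applying a user-modifiable table of associations between two character sets) can be sketched roughly as follows. The mapping shown is a small invented subset, not the actual Greeklish table, and the longest-match-first rule is an assumption about how multi-character correspondences would be resolved.

```python
# Sketch of table-driven transliteration: each source character (or group of
# characters) is replaced via a user-editable association table, trying
# longer table entries before shorter ones. The table below is a tiny
# illustrative subset of a Greek-to-Latin mapping, not the real Greeklish one.

MAPPING = {
    "θ": "th", "χ": "ch", "ψ": "ps",
    "α": "a", "β": "v", "ε": "e", "λ": "l", "ο": "o", "σ": "s",
}

def transliterate(text, table):
    """Replace every table key found in `text` with its associated value;
    characters not covered by the table pass through unchanged."""
    keys = sorted(table, key=len, reverse=True)  # longest match first
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:  # no table entry matched at position i
            out.append(text[i])
            i += 1
    return "".join(out)

latin = transliterate("θεσσαλο", MAPPING)
```

Back-transliteration could reuse the same function with an inverted table (e.g. `{"th": "θ", ...}`), where the longest-match-first rule keeps "th" from being read as "t" followed by "h"; editing the table is all that is needed to express personal preferences or alternative rules.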
In addition to the criteria intrinsic to economy but extrinsic to technology and unfriendly to technologists, however, another view more intrinsic to technology may be useful as well. This study will attempt to complement the governmentally stipulated evaluation method with a bibliometric one. Considering the interdisciplinary approach as a merit of national projects, this study will analyse how interdisciplinary information was used as input and was published as output by the project. Focussing on the publication behaviours of the project, information flow from technology to science, or a development pattern of science pulled by technology, will be discussed. Finally, the matching of the evaluation criteria to technology development and the friendliness of evaluation methods to technologists will be discussed. This article analyses changes in publication patterns over a twenty-year period at Norwegian universities. Based on three surveys among academic staff (1982, 1992, and 2001) covering all kinds of publications, the following general conclusions are drawn: (a) co-authorship has become more common, (b) the extent of publishing directed towards an international audience has increased, (c) the scientific article in an international journal has enhanced its position as the dominating type of publication, and (d) the number of publications per academic staff member has increased. The largest changes have taken place within the social sciences, which to an increasing extent approach the publication pattern of the natural sciences. On the other hand, the large productivity differences between individual researchers have remained remarkably stable over the two decades in all fields of learning. Age effects in scientific production are a consolidated stylised fact in the literature. At the level of the individual scientist, productivity declines with age following a predictable pattern. 
The problem of the impact of age structure on scientific productivity at the level of institutes is much less explored. The paper examines evidence from the Italian National Research Council. The path of hiring of junior researchers over the history of the institution is reconstructed. We find that age structure has a depressing effect on productivity and derive policy implications. The dynamics of growth of research institutes is introduced as a promising research field. Seismology has several features that suggest it is a highly internationalized field: the subject matter is global, the tools used to analyse seismic waves are dependent upon information technologies, and governments are interested in funding cooperative research. We explore whether an emerging field like seismology has a more internationalized structure than the older, related field of geophysics. Using aggregated journal-journal citations, we first show that, within the citing environment, seismology emerged from within geophysics as its own field in the 1990s. The bibliographic analysis, however, does not show that seismology is more internationalized than geophysics: in 2000, seismology had a lower percentage of all articles co-authored on an international basis. Nevertheless, social network analysis shows that the core group of cooperating countries within seismology is proportionately larger and more distributed than that within geophysics. While the latter exhibits an established network with a hierarchy, the formation of a field in terms of new partnership relations is ongoing in seismology. In recent years, the topic of knowledge production has been widely investigated in the advanced countries. However, the process by which knowledge is produced in the developing countries has not been fully explored or characterized. 
In Korea, the science and engineering fields strongly reflect systems of knowledge production in the universities and demonstrate the dynamics of systems of innovation for knowledge production. Using a case study with data on knowledge production in the field of information and telecommunications, the following general conclusions are drawn. First, knowledge production has taken place via domestic and foreign collaboration. Secondly, there has been an increasing trend towards the diversification of knowledge sources, such as university-industry and university-public research institute collaboration. Finally, the establishment of a nation's knowledge base is influenced by governmental research and development policies. Interdisciplinary research has been encouraged through the policies of many governmental and institutional funding agencies in Korea. This paper measures the degree of interdisciplinarity in individual and collaborative research and analyzes the factors affecting it. It also examines the flow of knowledge among different disciplines in science and engineering research, using a database obtained from research proposals submitted to the Korea Science and Engineering Foundation (KOSEF). The analysis indicated that 54.6% of collaborative research proposals were interdisciplinary, while 35.8% of individual research proposals were interdisciplinary. The analysis of the knowledge inflow/outflow structure showed that Natural science served as a link between Life science and Engineering. This study analyzes the age profile of scientific employees and its relation to personnel costs and scientific productivity within eight faculties at the University of Vienna. The age demography can overall be divided into two main categories: category one faculties show an increased number of younger researchers 
(Catholic and Protestant Theology, Law, Economics and Information Sciences, and Medicine); category two faculties show an increased number of older researchers (Social Sciences, Humanities, and Science). In addition, it can be demonstrated that the personnel costs for full professors are higher within four faculties (Catholic and Protestant Theology, Law, and Economics and Information Sciences). Inevitably, this leads to savings for habilitated and non-habilitated researchers at these faculties. The faculty of Medicine represents a well-balanced use of personnel costs. Three faculties (Social Sciences, Humanities, and Science) have to pay dramatically more for their older habilitated and non-habilitated personnel. For the entire university and two faculties, Medicine and Humanities, a positive and significant relationship between age and average weekly teaching performance is shown. This study suggests that institutions with a high percentage of older researchers, mainly in the categories of habilitated and non-habilitated personnel, must change their policy to become more flexible and attractive for talented young people. Since this cannot be achieved through the introduction of new laws alone, each faculty must establish a scientific plan combined with a reorganization of its personnel structure and personnel costs. The Web has become an important means of academic information exchange and can be used to give new insights into patterns of informal scholarly communication. This study develops new methods to examine patterns of university Web linking, focusing on Mainland China and Taiwan, and including language considerations. Multiple exploratory investigations into Web links were conducted between universities in these two places. Firstly, inlinks were counted to each university Web site from its national peers using four alternative Web document models. 
The results were shown to correlate significantly with research productivity in Taiwan but not in the Mainland, although in the latter case less reliable institutional data could have been the cause. For Taiwan, this is the first evidence of a scholarly association with academic linking for a non-English-speaking region. It was then ascertained that the same link counts associated more strongly with scientific than with social-scientific research productivity in Taiwan. This confirms the general assumption of greater Web use by the hard sciences. We then investigated Taiwan-Mainland university cross-links and found that although English is extensively used on the Web, there was no evidence that it was the language of preference for informal scholarly communication between the two areas. This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based on facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page for each criterion and applied to a large collection of scholarly and nonscholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, non-parametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model on test datasets. In addition, a set of problematic pages that were difficult to classify because of their similarity to scholarly research was gathered and classified using the models. The resulting models could be used in the selection process to automatically create a digital library of Web-based scholarly research works. 
In addition, the technique can be extended to create a digital library of any type of structured electronic information. Bibliographic databases contain surrogates for a particular subset of the complete set of literature; some databases are very narrow in their scope, while others are multidisciplinary. These databases overlap in their coverage of the literature to a greater or lesser extent. The topic of Fuzzy Set Theory is examined to determine the overlap of coverage in the databases that index this topic. It was found that about 63% of records in the data set are unique to only one database, and the remaining 37% are duplicated in anywhere from two to 12 different databases. The overlap distribution is found to conform to a Lotka-type plot. The records with maximum overlap are identified; however, further work is needed to determine the significance of the high level of overlap in these records. The unique records are plotted using a Bradford-type form of data presentation and are found to conform (visually) to a hyperbolic distribution. The extent and causes of intra-database duplication (records duplicated within a single database) are also examined. Finally, the overlap in the top databases in the dataset was examined, and a high correlation was found between overlapping records and overlapping DIALOG OneSearch categories. To understand the human experience of libraries and the implications this understanding has for library use and service, education, and design, 118 undergraduate students were asked to list three personally memorable incidents concerning library use. Following this, they were asked to write a short narrative of one of these experiences. Incidents reported by participants ranged from preschool to college age, and content analysis indicated that a majority took place at two or more grade levels, sometimes as early as the participant's first (preschool) visit to a library. 
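The overlap tally described above (unique records versus records duplicated in two to 12 databases) amounts to counting, for each record, how many databases index it, then tabulating those counts. A minimal sketch, with invented record identifiers and database names:

```python
# Sketch of a database-overlap tally: for each record, count the number of
# databases that index it, then tabulate how many records occur at each
# overlap level. Record IDs and database names are invented for illustration.

from collections import Counter

def overlap_distribution(db_records):
    """db_records: dict mapping database name -> set of record IDs.
    Returns a Counter mapping overlap level n -> number of records
    that appear in exactly n databases."""
    appearances = Counter()
    for records in db_records.values():
        for rec in records:
            appearances[rec] += 1
    return Counter(appearances.values())

dbs = {"A": {"r1", "r2", "r3"}, "B": {"r2", "r3"}, "C": {"r3"}}
dist = overlap_distribution(dbs)
unique = dist[1]                                  # records in one database
duplicated = sum(v for n, v in dist.items() if n > 1)
```

Plotting `dist` on log-log axes is what would reveal the Lotka-type shape reported in the study; in practice the hard part is record matching (deciding that two surrogates in different databases describe the same document), which the sketch assumes has already been done.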
Phenomenological analysis of individual narratives produced a thematic structure for each of the four grade levels represented in the data: elementary school and younger, middle school, high school, and college/adult. Themes common across all four levels include Atmosphere, Size and Abundance, Organization/Rules and Their Effects on Me, and What I Do in the Library. A theme of Memories was unique to narratives that took place during elementary and younger age levels. Although all remaining themes were noted across age levels, the relative importance of various themes and subthemes was different at different ages. Implications of the thematic structure for library practice are discussed. The current research explores how intermediaries seek information from patrons, in particular by analyzing intermediaries' elicitation utterances through three dimensions (linguistic forms, utterance purposes, and communicative functions) to determine whether any dimension appeared consistently enough to be called an "elicitation style." Five intermediaries from four academic libraries (three national university libraries, one private university library) and one research institute library participated in the study. Thirty patrons with 30 genuine search requests were recruited; thus, 30 patron/intermediary information retrieval interactions (encounters) were collected. Video/audio data were taped, and dialogues between patron and intermediary were transcribed. Statistical analysis revealed three types of elicitation styles among the five intermediaries, labeled (1) situationally oriented, (2) functionally oriented, and (3) stereotyped. This study seeks an explanation for the different elicitation styles. Qualitative analysis was applied to investigate "inquiring minds." 
An inquiring mind is defined here as a mentality or tendency to elicit certain threads of questions, influenced by professional beliefs, individual characteristics, tasks, goals, and interactional contexts in conversation. The results of the qualitative analysis specified three modes of inquiring minds among the intermediaries, namely: (1) information problem detection, (2) query formulation process, and (3) database instructions. The papers presented at the 2002 Tri-Society Symposium on Chemical Information highlight questions we should consider as we develop new paradigms for information storage and retrieval systems. These new knowledge management systems will require novel approaches for data discovery, collection development, and the changing role of the librarian. This introductory essay discusses new and challenging integrated tools for data manipulation, the confusing and embryonic differential pricing and package deals for journal materials, and the changing role of the librarian in this rapidly transforming industry. A "new model" academic chemistry library is proposed at the University of Illinois at Urbana-Champaign (UIUC) in which primary access to journals is electronic, replacing traditional print access, binding, and shelving. Print journals will continue to be purchased and archived unbound in a remote storage facility following unbound display and access for twelve months. The new model, initially proposed by administrative chemistry faculty, was assessed in a feasibility study which looked at the stability, quantity, and quality of electronic journals; it also included a survey of chemistry faculty, a review of internal management data, and an analysis of use of chemistry journals, both print and electronic. The feasibility study found support for the model in every area, but with a few caution flags and speed bumps predicted along the way. 
Little data are available that can help librarians solve issues surrounding print versus online journals management, including ascertaining when print journals are no longer needed. This study examines the short-term effects of online availability on the use of print chemistry journals. The Duke University Chemistry Library gained access to Elsevier titles via ScienceDirect in February 2000. By comparing reshelving data for the print journals from 1999, 2000, and 2001, this study identifies the short-term changes in journal use that can be attributed to the introduction of ScienceDirect. In the first two years after ScienceDirect was introduced, use of print journals nearly halved. The diminished use of the print collection has important implications for collection management in sci-tech libraries. Reader awareness of article corrections can be of critical importance in the physical and biomedical sciences. Comparison of errata and corrigenda in online versions of high-impact physical sciences journals across titles and publishers yielded surprising variability. Of 43 online journals surveyed, 17 had no links between original articles and later corrections. When present, hyperlinks between articles and errata showed patterns in presentation style but lacked consistency. Variability in the presentation, linking, and availability of online errata indicates that practices are not evenly developed across the field. Comparison of finding tools showed excellent coverage of errata by Science Citation Index, lack of indexing in INSPEC, and lack of retrieval with SciFinder Scholar. The development of standards for the linking of original articles to errata is recommended. An overview of the development of electronic resources over the past three decades is provided, discussing key features, disadvantages, and benefits of traditional online databases and CD-ROM and Web-based resources. 
This analysis of gains and losses as information resources have shifted to the Internet provides a basis for identifying key issues in the decision to shift collections and resources toward purely digital formats. Ownership of content, licensing terms, and the proliferation of user interfaces are especially important and still unresolved concerns. The project proposes and tests a comprehensive and systematic model of user evaluation of Web search engines. The project contains two parts. Part I describes the background and the model, including a set of criteria and measures and a method for implementation. It includes a literature review for two periods. The early period (1995-1996) portrays the settings for developing the model, and the later period (1997-2000) places two applications of the model among contemporary evaluation work. Part II presents one of the applications, which investigated the evaluation of four major search engines by 36 undergraduates from three academic disciplines. It reports results from statistical analyses of quantitative data for the entire sample and among disciplines, and content analysis of verbal data containing users' reasons for satisfaction. The proposed model aims to provide systematic feedback to engine developers or service providers for system improvement and to generate useful insight for system design and tool choice. The model can be applied to evaluating other compatible information retrieval systems or information retrieval (IR) techniques. It intends to contribute to developing a theory of relevance that goes beyond topicality to include value and usefulness for designing user-oriented information retrieval systems. This paper presents an application of the model described in Part I to the evaluation of Web search engines by undergraduates. 
The study observed how 36 undergraduates used four major search engines to find information for their own individual problems and how they evaluated these engines based on actual interaction with them. User evaluation was based on 16 performance measures representing five evaluation criteria: relevance, efficiency, utility, user satisfaction, and connectivity. Non-performance (user-related) measures were also applied. Each participant searched his/her own topic on all four engines and provided satisfaction ratings for system features and interaction, along with reasons for satisfaction. Each also made relevance judgements of retrieved items in relation to his/her own information need and participated in post-search interviews to provide reactions to the search results and overall performance. The study found significant differences in precision, relative recall, user satisfaction with output display, time saving, value of search results, and overall performance among the four engines, and also significant engine-by-discipline interactions on all these measures. In addition, the study found significant differences in user satisfaction with response time among the four engines, and a significant engine-by-discipline interaction in user satisfaction with search interface. None of the four search engines dominated in every aspect of the multidimensional evaluation. Content analysis of verbal data identified a number of user criteria and users' evaluative comments based on these criteria. Results from both the quantitative analysis and the content analysis provide insight for system design and development, and useful feedback on the strengths and weaknesses of search engines for system improvement. This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). 
Using nouns and verbs to identify similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute for human assessors. In large-scale experiments containing 140 questions over 17,877 documents, the results show that the models using informative words outperform the pure heuristic voting-only strategy by news reporters. This model can easily be further applied to summarize multilingual news from multiple sources. Interdisciplinarity is considered the best way to tackle practical research topics, since synergy between traditional disciplines has proved very fruitful. Studies on interdisciplinarity from all possible perspectives are increasingly demanded. Different interdisciplinarity measures have been used in case studies but, up to now, no general interdisciplinarity indicator useful for science policy purposes has been accepted. The bibliometric methodology presented here provides a general overview of all scientific disciplines, with special attention to their interrelation. This work aims to establish a tentative typology of disciplines and research areas according to their degree of interdisciplinarity. Interdisciplinarity is measured through a series of indicators based on Institute for Scientific Information (ISI) multi-assignment of journals to subject categories. Research areas and categories are described according to the quantity of their links (number of related categories) and their quality (with close or distant categories, diversity, and strength of links). High levels of interrelation between categories are observed. Four different types of categories are found through cluster analysis. This differentiates "big" interdisciplinarity, which links distant categories, from "small" interdisciplinarity, in which close categories are related. 
The location of specific categories in the clusters is discussed. In their article "Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient," Ahlgren, Jarneving, and Rousseau fault traditional author cocitation analysis (ACA) for using Pearson's r as a measure of similarity between authors, because it fails two tests of stability of measurement. The instabilities arise when rs are recalculated after a first coherent group of authors has been augmented by a second coherent group with whom the first has little or no cocitation. However, AJ&R neither cluster nor map their data to demonstrate how fluctuations in rs will mislead the analyst, and the problem they pose is remote from both theory and practice in traditional ACA. By entering their own rs into multidimensional scaling and clustering routines, I show that, despite the fluctuations in rs, clusters based on r are much the same for the combined groups as for the separate groups. The combined groups, when mapped, appear as polarized clumps of points in two-dimensional space, confirming that differences between the groups have become much more important than differences within the groups: an accurate portrayal of what has happened to the data. Moreover, r produces clusters and maps very like those based on other coefficients that AJ&R mention as possible replacements, such as a cosine similarity measure or a chi-square dissimilarity measure. Thus, r performs well enough for the purposes of ACA. Accordingly, I argue that qualitative information revealing why authors are cocited is more important than the cautions proposed in the AJ&R critique. I include notes on topics such as handling the diagonal in author cocitation matrices, lognormalizing data, and testing r for significance. Egghe has propounded the notion of Type/Token-Taken (T/TT) informetrics. 
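The comparison above between Pearson's r and a cosine measure can be made concrete on cocitation count vectors (each author represented by counts of cocitation with a fixed set of other authors). The vectors below are invented for illustration; on such data the two coefficients typically agree closely, which is the point at issue.

```python
# Sketch comparing Pearson's r and the cosine measure on two authors'
# cocitation count vectors. The counts are invented toy data.

import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine(x, y):
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

# Cocitation counts of authors A and B with the same four other authors
a = [10, 8, 0, 1]
b = [9, 7, 1, 0]
r = pearson_r(a, b)
c = cosine(a, b)
```

The only structural difference is that r centers each vector on its mean while cosine does not, so on non-negative count data with a similar profile the two tend to rank author pairs alike, consistent with the argument that r-based clusters resemble cosine-based ones.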
In this note we show how his ideas relate to ones that are already well known in informetrics, and resolve some of the specific problems posed. The Swedish innovation system is analysed in terms of the interaction between academia, government, and the private sector. For each of 21 Swedish regions we analyse the distribution of research activities, doctoral employment, and publication output, as well as the flow of doctoral graduates and the distribution of co-authorship links across regions and sectors. The three main urban regions have about 75 percent of all R&D activities and outputs. They also have a more balanced supply of academic, governmental, and private research activities than the smaller regions, and the interactions among sectors within these regions are more intense. The inter-regional flow of PhDs is also to the advantage of the big regions. So far, decentralization of the academic sector does not seem to have had a similar decentralizing effect on private R&D. Unless this imbalance changes, smaller regions will continue to be net exporters of skill and knowledge to the big regions. Although the systemic changes towards innovation networking between university, industry, and governmental actors have only recently found a place on the international policy and literature agenda, networking between organizations and people (for national survival, production, and growth) has been deeply rooted in the Israeli system since even before the establishment of the State of Israel in 1948. Internal and international constraints fostered the formation of personal links, as did institutional settings that promoted networking. This paper reviews the interaction of societal, organizational, and cultural features that render innovation networks in Israel successful. The research focuses on the impacts of the Israeli Magnet Program on Israeli R&D growth and performance. 
The implications of innovation networks for a late-developing country like Turkey are reviewed in the contexts of catching-up and cross-regional collaboration between the Israeli and Turkish industries and academics. The interplay and cross-fertilization between science and technology, but also the specific role of science for technological development, have received ample attention in both the research and the policy communities. It is in this context that the concepts of "absorptive capacity" and "knowledge spillovers" play an important role. We operationalize the science-technology link by quantifying and modeling bibliographic references to the scientific literature as they occur in patents. This approach allows exploring the associative patterns between science creation (as emerging from the scientific literature) and technology development (as emerging from the patent literature). In the current paper, we focus on an analysis of the geographic distribution of the science citation patterns in patents, singling out two (different) fields of technological development, namely biotechnology and information technology. In both fields, the science citation flows from the European, Japanese and US science bases into USPTO and EPO-patents are explored and modeled. Intensive geographic citation flows between the regions are identified, pointing (amongst others) to the strength of both the US and the European science bases as sources for technological activity and creativity around the world. Firms operating in science-based technological fields reflect some of the complexities of the science-technology interaction. The present study attempts to investigate these interactions by analyzing patent citations, publications and patent outputs of multinational corporations (MNCs) in 'thin film' technology. In particular we explore different characteristics of knowledge production and knowledge utilization of these firms. 
The results indicate no correlation between intensity of research activity and patents produced by the MNCs. The relationship between scientific and technological knowledge generation, as well as the linkage between science and technology, appears to be firm-specific rather than dependent on a technological or industrial sector. The dispersion of journal sources for the majority of patent citations of scientific literature, as well as for the majority of scientific outputs, is narrow. Basic journals play an important role in patent citations as well as in addressing the research of MNCs in thin-film technology. The challenges to conducting valid and complete outcome evaluations of cooperative research activities, like the National Science Foundation Industry/University Cooperative Research Centers (IUCRC) Program, are daunting. The current study tries to make a small but important contribution to this area by attempting to develop quantitative estimates of one center benefit: R&D cost avoidance. Cost avoidance is operationalized as R&D costs industrial members would have incurred but did not, because they participated in university-based industrial consortia, minus the costs of belonging to the consortia. Data were collected from a total of 18 industrial sponsors from three IUCRCs on 35 different research projects. Findings indicate that some firms do avoid R&D costs by participating in an IUCRC, but the prevalence of this benefit varies across centers and across firms. The implications of these findings for policy, practice, and future research are discussed. This paper explores issues related to the impact of science-industry relationships on the knowledge production of academic research groups, in particular on the alleged shift to the more applied research end under the influence of business partners' needs. Our findings from a case study of the Belgian Katholieke Universiteit Leuven (K.U. 
Leuven) show a significant steady growth over time in the publications produced by academic research groups involved in university-industry linkages, closely related to factors both internal and external to the university that have stimulated academic entrepreneurial behaviour. At an aggregated level for 1985-2000, basic research publications appear to be more present than applied ones, both in total numbers and in growth rates. Our findings show that applied and basic research publications generally rose together in the same year. No clear and generalised evidence was found of a shift towards the applied research end determined by involvement in U-I linkages; weak indications of such a shift appeared only in groups that already have a high applied-to-basic orientation. These results suggest that the academic research groups examined have developed a record of applied publications without affecting their basic research publications, and that, rather than a differentiation between applied and basic research publications, it is the combination of the two that consolidates a group's R&D potential. Accordingly, critical assessments of the university side of the emerging 'Triple Helix' need to take into account the dynamic nature of the research dimension. This paper presents work directed at capturing the entrepreneurial and collaborative activity of university researchers. The Triple Helix points to the emergence of the entrepreneurial university as well as to an increasing overlay of activities in universities, industry and government. This study explores ways in which patent-based metrics could be utilized in a Triple Helix context, and how hybrid indicators could be developed by combining patent data with survey data. More specifically, it aims to develop indicators that connect the technological inventiveness of university researchers to both funding organizations and users, as well as to entrepreneurial activities by academics. 
The paper develops a simplified model of the innovation process to benchmark the relevance of the indicators to the Triple Helix. An analysis of Finnish academic patents illustrates that patent data can already provide useful indicators but, on its own, cannot provide information about how academic patents are interconnected with government or industry through funding or utilization links. An exclusive analysis of patents can point to patent concentrations on certain universities, to inventors and assignees, or to potential gaps in translating applied science into industrial technology. However, the patent data had to be combined with an inventor survey in order to relate academic patents more to their Triple Helix environment. The survey indicated that most patented academic inventions are connected to (often publicly funded) scientific research by the inventors and tend to be utilized in large firms rather than in start-up companies founded by academic entrepreneurs. Growing income and wage inequality in a range of countries has raised concern. High-technology development may be contributing to this inequality, by encouraging higher wages at the upper end of the income distribution. Most studies of the possibility of this effect have used generic, aggregated data. In this paper, we introduce the possibility of linking wage inequality directly to specific industrial strategies using the Theil Index of inequality. This measure portrays the portion of wage inequality that is attributable to wages in specific industries. We illustrate this concept with data from U.S. states. The paper presents a methodology for studying the interactions between science and technology. Our approach rests mostly on patent citation and co-word analysis. In particular, this study aims to delineate intellectual spaces in thin-film technology in terms of science/technology interaction. The universe of thin-film patents can be viewed as the macro-level and starting point of our analysis. 
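The between-industry Theil decomposition mentioned above attributes a portion of overall wage inequality to wage differences between specific industries. A minimal sketch of that between-group component follows; the employment and wage figures are hypothetical, not the U.S. state data used in the paper:

```python
from math import log

def theil_between(groups):
    """Between-group component of the Theil T index.

    groups: list of (n_workers, mean_wage) pairs, one per industry.
    Returns the share of overall wage inequality attributable to
    wage differences *between* the industries.
    """
    n_total = sum(n for n, _ in groups)
    mu = sum(n * w for n, w in groups) / n_total  # overall mean wage
    return sum((n / n_total) * (w / mu) * log(w / mu) for n, w in groups)

# Hypothetical state data: (employment, mean wage) for three industries.
industries = [(1000, 30000), (200, 90000), (300, 45000)]
print(round(theil_between(industries), 4))
```

A high-wage, high-technology industry (the second entry) contributes a large positive term, which is how industrial strategy can be linked directly to measured inequality; when all industries pay the same mean wage the between-group component is zero.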
Applying a bottom-up approach, intellectual spaces at the micro-level are defined by tracing prominent concepts in publications, patents, and their citations of scientific literature. In another step, co-word analysis is used to generate meso-level topics and sub-topics. Overlapping structures and specificities that emerge are explored in the light of theoretical understanding of science-technology interactions. In particular, one can distinguish prominent concepts among patent citations that either co-occur in both thin-film publications and patents or reach out to one of the two sides. Future research may address the question of to what extent directionality can be read into this. The aim of this mainly methodological paper is to present an approach for researching the triple helix of university-industry-government relations as a heterogeneous and multi-layered communication network. The layers included are: the formal scholarly communication in academic journals, the communication network based on project collaborations, and finally the communication of information over the 'virtual' network of web links. The approach is applied to typical 'Mode 2' fields such as biotechnology, using a variety of data sources. We present some of the initial findings, which indicate the different structures and functions of the three layers of communication. This paper reports on a new approach to studying the linkage between science and technology. Unlike most contributions to this area, we do not trace citations of scientific literature in patents but instead explore citations of patents in the scientific literature. Our analysis is based on papers recorded in the 1996-2000 annual volumes of the CD-Edition of the Science Citation Index (SCI) of the Institute for Scientific Information (ISI) and patent data provided by the US Patent and Trademark Office. Almost 30,000 US patents were cited by scientific research papers. 
We analysed the citation links by scientific fields and technological sectors. Chemistry-related subfields tended to cite patents more than other scientific areas. Among technological sectors, the chemical sector clearly dominates, followed by drugs and medical patents as the most frequently cited categories. Further analyses included a country ranking based on the inventor addresses of the cited patents, a more detailed inspection of the ten most cited patents, and an analysis of class-field transfers. The paper concludes with suggestions for future research. One of them is to compare our 'reverse' citation data with 'regular' patent citation data within the same classification system, to see whether citations occur, irrespective of their directionality, in the same fields of science and technology. Another question is how reverse citation linkages should be interpreted. The aim of this paper is to propose the Vector Space Model as a new methodological approach that allows us to represent the relationships between the elements of the Triple Helix Model (University, Industry, Government) in a spatial model, using the webpages of the National Research Councils of Germany and Spain as examples. Outlinks of the Biomedicine and Biology centres of these national councils were analysed with the intention of representing these relationships graphically through the Vector Space Model, which allows for Multidimensional Scaling in three dimensions. Results show a map with the differences and similarities between the Spanish and German cases. It may be concluded that these results could become a qualitative indicator of a scientific and technical reality. University-industry-government relations provide a networked infrastructure for knowledge-based innovation systems. This infrastructure organizes the dynamic fluxes locally, and the knowledge base remains emergent under these conditions. 
Whereas the relations between the institutions can be measured as variables, the interacting fluxes generate a probabilistic entropy. The mutual information among the three institutional dimensions provides us with an indicator of this entropy. When this indicator is negative, self-organization can be expected. The self-organizing dynamic may temporarily be stabilized in the overlay of communications among the carrying agencies. The various dynamics of Triple Helix relations at the global and national levels, in different databases, and in different regions of the world, are distinguished by applying this indicator to scientometric and webometric data. Survey coding is the task of assigning a symbolic code from a predefined set of such codes to the answer given in response to an open-ended question in a questionnaire (aka survey). This task is usually carried out to group respondents according to a predefined scheme based on their answers. Survey coding has several applications, especially in the social sciences, ranging from the simple classification of respondents to the extraction of statistics on political opinions, health and lifestyle habits, customer satisfaction, brand fidelity, and patient satisfaction. Survey coding is a difficult task, because the code that should be attributed to a respondent based on the answer she has given is a matter of subjective judgment, and thus requires expertise. It is thus unsurprising that this task has traditionally been performed manually, by trained coders. Some attempts have been made at automating this task, most of them based on detecting the similarity between the answer and textual descriptions of the meanings of the candidate codes. 
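The three-dimensional mutual information indicator for Triple Helix relations described above can be sketched in a few lines: it combines the single, pairwise, and joint entropies of the three institutional dimensions, and a negative value signals self-organization. The joint distribution below is a toy illustration, not scientometric data:

```python
from math import log2

def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_info_3d(joint):
    """Three-dimensional mutual information T_uig among the
    university (u), industry (i) and government (g) dimensions;
    negative values indicate a self-organizing configuration."""
    def marginal_entropy(axes):
        m = {}
        for key, p in joint.items():
            sub = tuple(key[a] for a in axes)
            m[sub] = m.get(sub, 0.0) + p
        return entropy(m.values())

    h_u, h_i, h_g = (marginal_entropy([a]) for a in (0, 1, 2))
    h_ui = marginal_entropy([0, 1])
    h_ug = marginal_entropy([0, 2])
    h_ig = marginal_entropy([1, 2])
    h_uig = entropy(joint.values())
    return h_u + h_i + h_g - h_ui - h_ug - h_ig + h_uig

# g = u XOR i: pairwise independent dimensions that are jointly
# determined, a classic case of negative three-way mutual information.
xor_joint = {(u, i, u ^ i): 0.25 for u in (0, 1) for i in (0, 1)}
print(mutual_info_3d(xor_joint))
```

For three fully independent dimensions the indicator is zero; the XOR-style configuration above yields a negative value, the situation the abstract associates with self-organization of the overlay of communications.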
We take a radically new stand, and formulate the problem of automated survey coding as a text categorization problem, that is, as the problem of learning, by means of supervised machine learning techniques, a model of the association between answers and codes from a training set of precoded answers, and applying the resulting model to the classification of new answers. In this article we experiment with two different learning techniques, one based on naive Bayesian classification and the other based on multiclass support vector machines, and test the resulting framework on a corpus of social surveys. The results we have obtained significantly outperform those achieved by previous automated survey coding approaches. Author cocitation analysis was used to explore ongoing changes in the intellectual structure of the hybrid problem area of developmental dyslexia for the period 1994-1998, and to address ambiguities in results raised by an earlier study by these researchers covering the years 1976-1993. 
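The supervised learning formulation of survey coding described above can be illustrated with a tiny multinomial naive Bayes coder. This is a generic sketch with made-up precoded answers, not the authors' system or corpus:

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayesCoder:
    """Minimal multinomial naive Bayes: learn code/answer
    associations from precoded answers, then code new answers."""

    def fit(self, answers, codes):
        self.codes = set(codes)
        self.prior = Counter(codes)          # code frequencies
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, code in zip(answers, codes):
            words = text.lower().split()
            self.word_counts[code].update(words)
            self.vocab.update(words)
        return self

    def predict(self, answer):
        def score(code):
            total = sum(self.word_counts[code].values())
            s = log(self.prior[code] / sum(self.prior.values()))
            for w in answer.lower().split():
                # Laplace smoothing over the training vocabulary
                s += log((self.word_counts[code][w] + 1) /
                         (total + len(self.vocab)))
            return s
        return max(self.codes, key=score)

# Hypothetical precoded answers to an open-ended satisfaction question.
train = [("very happy with the service", "POS"),
         ("happy and satisfied", "POS"),
         ("terrible slow service", "NEG"),
         ("slow and rude staff", "NEG")]
coder = NaiveBayesCoder().fit(*zip(*train))
print(coder.predict("satisfied with the staff"))  # → POS
```

A production system would add feature selection and, as in the article, could swap in a multiclass support vector machine; the training-then-classification workflow stays the same.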
Results suggest that: (1) discrepancies between the structure of the sociometric (personal) and author cocitation networks reflect real differences, not temporal factors; (2) differences between cocitation patterns and reports in the literature, and corresponding delays in the visibility of emerging perspectives, are likely due to the "inertia" of aggregate cocitation data and/or to shifts by neuroscience-vision researchers to publication in more prominent journals; (3) a sharp rise in link density for the neuroscience-vision subgroup indicates increased cohesiveness and growing maturation of this emerging perspective; (4) shifts in subgroup membership, link density, patterns of coauthorship, and multiple factor loadings suggest possible convergence between other subgroups in the network and identify individuals who may play boundary-spanning roles within it; and (5) changing patterns of cocitation throughout the network suggest the increasing influence of studies relating to the neurobiological mechanisms underlying dyslexia. The possible contributions of such boundary spanners in addressing the substantial information and communication challenges posed by the increasingly interdisciplinary character of scholarship in general are also discussed. Topicality, while demonstrably an empirically manageable variable of investigation, engenders aspects of cognitive complexity that may, or may not, be easily managed during user interactions with IR systems. If an item retrieved from an IR system is considered to be on topic by a user, the meaning of that judgment may imply other underlying criteria. What makes an item on topic for users is the subject of this investigation. Although topicality has generated a great deal of attention in the information science literature, the meaning of topicality to IR system users has not been fully understood in designing more effective approaches to information search and retrieval. 
This investigation takes an inductive approach to the deductive extraction of characteristics that describe and explain how items retrieved from interactions with IR systems can be considered on topic. This study examines the literature of a multidisciplinary field, later-life migration, and evaluates the effectiveness of 12 bibliographic databases in indexing that literature. Five journals (three in social gerontology, one in rural sociology, and one in regional science) account for 40% of the papers published in this area. The disciplines that publish the most work on later-life migration are not necessarily those that provide the best index coverage, however. Moreover, four multidisciplinary databases each provide better index coverage than any single-subject index. The relatively low degree of overlap among the 12 databases suggests that scholars working on topics such as later-life migration must continue to rely on a wide range of bibliographic tools, both disciplinary and multidisciplinary. Web citations have been proposed as comparable to, or even replacements for, bibliographic citations, notably in assessing the academic impact of work in promotion and tenure decisions. We compared bibliographic and Web citations to articles in 46 journals in library and information science. For most journals (57%), Web citations correlated significantly with both bibliographic citations listed in the Social Sciences Citation Index and the ISI's Journal Impact Factor. Many of the Web citations represented intellectual impact, coming from other papers posted on the Web (30%) or from class reading lists (12%). Web citation counts were typically higher than bibliographic citation counts for the same article. Journals with more Web citations tended to have Web sites that provided tables of contents on the Web, while less cited journals did not have such publicity. The number of Web citations to journal articles increased from 1992 to 1997. 
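The correlation test behind the Web-citation comparison above is a standard product-moment correlation over per-article counts. A minimal sketch follows; the counts are invented, not the 46-journal dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation between two count series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-article counts: Web citations vs. SSCI citations.
web = [14, 3, 22, 7, 1, 30]
ssci = [6, 1, 9, 4, 0, 11]
print(round(pearson(web, ssci), 3))
```

Note that the toy Web counts run higher than the bibliographic counts, mirroring the abstract's observation; a rank-based coefficient such as Spearman's would be the more robust choice for skewed citation distributions.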
Technologies that were assumed to be critical or emerging in Materials, Manufacturing, and Industrial Engineering were compiled from different sources. These were compared with recent data and trends based on publications as well as patents in these fields. Some of these technologies were found to be non-critical or non-emergent. Top-ten lists of critical and emerging technologies were derived using simple statistical tools and easily accessible databases. The present methodology is proposed as an effective procedure for priority setting in science and technology policy making. This paper uses bibliographic coupling analysis to plot a patent citation map. It explores current research and development in the high-tech electronic companies in Taiwan, and the relationships between companies and industries. The 58 high-tech electronic companies under study obtained 4,162 U.S. patents between 1998 and 2000, and cited 24,852 patents during these years. Based on the data from the bibliographic coupling analysis, the paper categorizes these companies into six major groups: semiconductor, peripheral, scanner, notebook/monitor, system, and IC design/packaging. The research also uses multidimensional scaling to plot a patent citation map, graphically displaying the associations among the groups. The results show a higher similarity among companies in the semiconductor sector, whereas the distinctions between industries grow more and more blurred, even overlapping in some cases. Nanotechnology and the sciences associated with it have attracted much attention. Experts from various fields believe that nanotechnology will be one of the key technologies affecting almost every aspect of the economy. While there are considerable efforts underway that aim to commercialise nanotechnology (carried out by start-up companies as well as large internationally operating firms), most of the activity seems to focus on research and development activities. 
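Bibliographic coupling, the technique behind the patent map described above, measures the similarity of two citing documents by the number of earlier items they both cite. A small sketch with hypothetical patent numbers (not the Taiwan dataset):

```python
def coupling_matrix(cited):
    """Bibliographic coupling: two patents are coupled with strength
    equal to the number of earlier patents they both cite.

    cited: dict mapping patent number -> set of cited patent numbers.
    Returns a dict of pairwise coupling strengths.
    """
    patents = sorted(cited)
    return {(a, b): len(cited[a] & cited[b])
            for i, a in enumerate(patents) for b in patents[i + 1:]}

# Hypothetical citing patents and their reference sets.
cited = {
    "US6000001": {"US5000001", "US5000002", "US5000003"},
    "US6000002": {"US5000002", "US5000003", "US5000009"},
    "US6000003": {"US5000009"},
}
strengths = coupling_matrix(cited)
print(strengths[("US6000001", "US6000002")])  # → 2
```

The resulting strength matrix is exactly the kind of input that multidimensional scaling would turn into a two-dimensional patent citation map, with strongly coupled companies plotted close together.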
There have been a number of technology studies and investment reports that describe the opportunities associated with this emerging area. Over the years there have also been a number of bibliometric and patent studies that examined the field. This paper provides an overview of measuring nanotechnology with commonly used indicators from bibliometric and patent analyses. The purpose of this study is to map the semiconductor literature by author co-citation analysis, in order to highlight the major subject specializations in semiconductors and to identify authors and their relationships within these specialties and within the field. Forty-six of the most productive authors were included in the sample list. Author samples were gathered from the INSPEC database from 1978 to 1997. The relatively low author co-citation frequencies indicate that there is little connection among authors who publish in semiconductor journals and that there are large differences among authors' research areas. The six pairs of authors co-cited more than 100 times are M. Cardona and G. Lucovsky; T. Ito and K. Kobayashi; M. Cardona and G. Abstreiter; A. Y. Cho and H. Morkoc; C. R. Abernathy and W. S. Hobson; and H. Morkoc and I. Akasaki. The Pearson correlation coefficient of author co-citation varies widely, i.e., from -0.17 to 0.92. This shows that some authors with high positive correlations are related in certain ways and co-cited, while other authors with high negative correlations may be rarely or never related and co-cited. Cluster analysis and multi-dimensional scaling are employed to create two-dimensional maps of author relationships in the co-citation networks. It is found that the authors fall fairly clearly into three clusters. The first cluster covers authors in physics and its applications. The authors in the second group are experts in electrical and electronic engineering. The third group includes specialists in materials science. 
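The raw input to an author co-citation analysis like the one above is a pairwise count of how often two authors appear together in the same reference list. A toy sketch, reusing some of the author names from the abstract but with invented reference lists (not the INSPEC sample):

```python
from itertools import combinations

def cocitation_counts(reference_lists, authors):
    """Count how often each pair of authors is cited together
    (co-cited) in the same reference list."""
    counts = {pair: 0 for pair in combinations(sorted(authors), 2)}
    for refs in reference_lists:
        present = sorted(set(refs) & set(authors))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical reference lists of five citing papers.
papers = [["Cardona", "Lucovsky", "Cho"],
          ["Cardona", "Lucovsky"],
          ["Cho", "Morkoc"],
          ["Morkoc", "Akasaki"],
          ["Cardona", "Abstreiter"]]
authors = {"Cardona", "Lucovsky", "Cho", "Morkoc", "Akasaki", "Abstreiter"}
cc = cocitation_counts(papers, authors)
print(cc[("Cardona", "Lucovsky")])  # → 2
```

From such a count matrix one would then compute Pearson correlations between author profiles and feed the result to cluster analysis and multidimensional scaling, as the study does.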
Because of its interdisciplinary nature and diverse subjects, the semiconductor literature lacks a strong group of core authors. The field consists of several specialties around a weak center. The scientific production resulting from the cooperative research efforts between Chile and Spain, measured by the number of mainstream joint publications and considering disciplines, application fields, types of journal, impact factors, and the research institutions involved, was analyzed for the 1991-2000 period. Databases from several institutions, such as the Institute for Scientific Information (ISI) in the USA and other national organizations, were employed to quantify the number of publications and to determine the profile of the mutual-collaboration research groups of both countries. It was possible to establish the strong points of the mutual work in some disciplines and the formation of a critical mass of researchers, showing that scientific cooperation between emerging-economy countries, like Chile, and developed nations, such as Spain, is possible and leads to mutual benefits. Using sample data from the MathSciNet database from 1985 to 2000, we constructed a database and computer searching system for China's international cooperation in publication with other countries (or regions), and applied standard international measures of cooperation. The paper gives a systematic scientometric measure and evaluation of international mathematical research, especially for China. It also presents a matrix model of the cooperation network. During the 16 years, the trend toward cooperation in international mathematical research increased substantially. The number of internationally co-authored papers increased at a rate of 6.99% per year in the world and at 15.91% per year in China. According to Garfield (1980), most scientists can name an example of an important discovery that had little initial impact on contemporary research. He uses Mendel's work as a classic example. 
Delayed recognition is sometimes used by scientists as an argument against citation-based indicators that rely on citation windows defined for a short- or medium-term initial period beginning with the paper's publication year. This study is focussed on a large-scale analysis of the citation history of all papers indexed in the 1980 annual volume of the Science Citation Index. The objective is two-fold: to analyse whether the share of delayed recognition papers is significant, and whether such papers are typical of the work of their authors at that time. In a first step, the background of advanced bibliometric models of stochastic citation processes and first-citation distributions by Glanzel, Egghe, Rousseau and Burrell is described briefly. The second part is devoted to the bibliometric analysis of first-citation statistics and of the phenomenon of citation delay. In a third step, finally, delayed recognition publications are studied individually. Their topics and the citation patterns of other papers by the same authors have been studied to uncover principles of regularity or exceptionality of delayed recognition publications. Since their arrival in the 1960s, electronic databases have been an invaluable tool for informetricians. Databases and their delivery mechanisms have provided both the source of raw data and the analytical tools for many informetric studies. In particular, the citation databases produced by the Institute for Scientific Information have been the key source of data for a whole range of citation-based research. However, there are also many problems and challenges associated with the use of online databases. Most of the problems arise because databases are designed primarily for information retrieval purposes, and informetric studies represent only a secondary use of the systems. 
The sorts of problems encountered by informetricians include errors or inconsistencies in the data itself; problems with the coverage, overlap and changeability of the databases; and problems and limitations in the tools provided by database hosts such as DIALOG. For some informetric studies, the only viable solution to these problems is to download the data and perform offline correction and data analysis. The application of online analytical processing (OLAP) technology to bibliographic databases is addressed. We show that OLAP tools can be used by librarians for periodic and ad hoc reporting, quality assurance, and data integrity checking, as well as by research policy makers for monitoring the development of science and evaluating or comparing disciplines, fields or research groups. It is argued that traditional relational database management systems, used mainly for day-to-day data storage and transactional processing, are not appropriate for performing such tasks on a regular basis. For this purpose, a fully functional OLAP solution has been implemented on Biomedicina Slovenica, the Slovenian national bibliographic database. We demonstrate the system's usefulness by extracting data for studying a selection of scientometric issues: changes in the number of published papers, citations and pure citations over time, their dependence on the number of co-operating authors and on the number of organisations the authors are affiliated with, and time patterns of citations. Hardware, software and feasibility considerations are discussed, and the phases of the process of developing bibliographic OLAP applications are outlined. The need to understand the fabric of relationships building up on the World Wide Web calls for tools that allow one to extract the underlying knowledge. Some of the most interesting relationships are those brought to light by co-linking analysis (the Web analogue of cocitation analysis). 
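The core OLAP operation the bibliographic application above relies on is the cross tabulation (a slice of the data cube): counting records along two chosen dimensions. A minimal stand-in for a full OLAP engine, with hypothetical records rather than the Biomedicina Slovenica data:

```python
from collections import Counter

def crosstab(records, row_key, col_key):
    """Minimal OLAP-style cross tabulation: count bibliographic
    records for each (row dimension, column dimension) cell."""
    table = Counter()
    for r in records:
        table[(r[row_key], r[col_key])] += 1
    return table

# Hypothetical bibliographic records (year, field, citation count).
records = [
    {"year": 1998, "field": "medicine", "cites": 4},
    {"year": 1998, "field": "biology", "cites": 0},
    {"year": 1999, "field": "medicine", "cites": 7},
    {"year": 1999, "field": "medicine", "cites": 2},
]
cube = crosstab(records, "year", "field")
print(cube[(1999, "medicine")])  # → 2
```

Swapping the row and column keys, or aggregating `cites` instead of counting, gives the other views (papers over time, citations per field) that the abstract describes policy makers extracting.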
We here propose such an analysis based on the co-links that are generated within a closed web environment, using multivariate statistics (Principal Component Analysis, and Multidimensional Scaling) and a connection-based technique (Kohonen's Self-Organizing Maps). An application was made to a generic thematic environment, and the underlying relationships and structures were manifest in the interpretation of the results. Brazil is considered to have an immature national innovation system. One significant situation that contributes to it is that Brazil concentrates its research efforts and inventiveness in academic environments, while the private sector has very little access to this activity. Measures are being taken to correct this situation. Nevertheless, scientists' attitudes towards the new situation will be instrumental for the success of such measures. For this reason, we have studied the behavior of Brazilian scientists from the biotechnological fields concerning Intellectual Property Rights. In this research 1032 researchers were electronically contacted and 150 responded. The 41 questions include indicators about the interviewees' perceptions about their institutions' support for patenting research results, their attitudes towards recent changes in Intellectual Property Rights legislation and about the interaction of researchers with demands from external interests. A technique is presented for the identification of patterns from the links between large Web spaces and is applied to data concerning the interlinking of university Web sites in fifteen European countries. This is based upon a procedure for normalising the data so that it can be analysed using standard multivariate statistical techniques and is less susceptible to individual outliers than standard methods. The approach was successfully able to identify clusters of European countries based upon data for their universities' interlinking patterns. 
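The co-link analysis described above starts from a count of shared inlinks between target sites, normalized into a similarity matrix that multivariate methods such as PCA, MDS, or a self-organizing map then take as input. A cosine-normalized sketch with invented links, not the original closed web environment:

```python
from math import sqrt

def colink_similarity(inlinks):
    """Cosine-normalized co-link counts between target sites.

    inlinks: dict mapping site -> set of pages that link to it.
    Returns a dict of pairwise similarities in [0, 1].
    """
    sites = sorted(inlinks)
    sim = {}
    for a in sites:
        for b in sites:
            shared = len(inlinks[a] & inlinks[b])
            denom = sqrt(len(inlinks[a]) * len(inlinks[b]))
            sim[(a, b)] = shared / denom if denom else 0.0
    return sim

# Hypothetical inlink sets for three sites.
inlinks = {
    "siteA": {"p1", "p2", "p3"},
    "siteB": {"p2", "p3"},
    "siteC": {"p9"},
}
sim = colink_similarity(inlinks)
print(round(sim[("siteA", "siteB")], 3))
```

Sites co-linked from many of the same pages (siteA and siteB here) score near 1 and would be placed close together by MDS; sites with no shared inlinkers score 0.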
For example, the northern countries were differentiated from the southern ones with this method. In this paper, the internal law of delay in the secondary literature publishing process is presented. The process is demonstrated to obey the partial differential equation of the periodical literature publishing process. A definite solution of the publishing delay process is derived, and from this particular solution the expression for an average publication delay indicator is deduced. The problem that some primary literature is missed in information retrieval is then studied, and a relationship is established between the average delay indicator and the miss ratio of primary literature in index periodicals or databases. It is also proposed that the primary literature be used as a supplemental tool in information retrieval to guarantee the recall ratio. Relative indicators are preferably used for the comparative evaluation of thematically different sets of journal papers. The Relative Publication Strategy and Relative Subfield Citedness (RPS/RW) function referring to a selected set of papers was found to be identical to the Mean Expected Citation Rate and Mean Observed Citation Rate (MECR/MOCR) function. Passage of the Digital Millennium Copyright Act (DMCA) was a significant milestone in congressional information policy legislation. However, the results were widely criticized in some circles as providing too much power to certain stakeholder groups. This paper uses computer-based content analysis and a theoretical taxonomy of information policy values to analyze congressional hearing testimony. The results of document coding were then analyzed using a variety of statistical tools to map how different stakeholders framed issues in the debate and to determine whether congressional value statements about the legislation conformed more closely to those of certain stakeholders. 
Results of the analysis indicate that significant differences in the use of information policy terms occurred across stakeholders, and show varying degrees of convergence between congressional and other stakeholders when framing information policy issues. A common problem in the spatialization of information systems is the determination of geometry, i.e., dimensionality and metric. Such geometry is either chosen a priori or inferred a posteriori from secondary data. Recent work emphasizes the use of geometric information latent in a system's navigational record. Resolving this information from its noisy background, however, requires an unambiguous criterion of selection. In this paper we use a previously published statistical method for resolving a Web-based information system's geometry from navigational data. However, because of the method's (theoretical) sensitivity to data selection, a weighted frequency correction based on empirical probability distributions is applied. The effect of this correction on the Web-space geometry is investigated. Results indicate that the inferred geometry is robust; i.e., it does not change significantly under this probabilistic correction. Modern Standard Arabic (MSA) is the official language used in all Arabic countries. In this paper we describe an investigation of the uniformity of MSA across different countries. Many studies have been carried out locally or regionally on Arabic and its dialects. Here we look at a more global scale by studying language variation between countries. The source material used in this investigation was derived from national newspapers available on the Web, which provided samples of common media usage in each country. This corpus has been used to investigate the lexical characteristics of Modern Standard Arabic as found in 10 different Arabic-speaking countries. We describe our collection methods, the types of lexical analysis performed, and the results of our investigations. 
With respect to newspaper articles, MSA seems to be very uniform across all the countries included in the study, but we have detected various types of differences, with implications for computational processing of MSA. The ever growing popularity of the Internet as a source of information, coupled with the accompanying growth in the number of documents made available through the World Wide Web, is leading to an increasing demand for more efficient and accurate information retrieval tools. Numerous techniques have been proposed and tried for improving the effectiveness of searching the World Wide Web for documents relevant to a given topic of interest. The specification of appropriate keywords and phrases by the user is crucial for the successful execution of a query as measured by the relevance of documents retrieved. Lack of users' knowledge on the search topic and their changing information needs often make it difficult for them to find suitable keywords or phrases for a query. This results in searches that fail to cover all likely aspects of the topic of interest. We describe a scheme that attempts to remedy this situation by automatically expanding the user query through the analysis of initially retrieved documents. Experimental results to demonstrate the effectiveness of the query expansion scheme are presented. The problem of multiple interpretations of meaning in the indexing process has been mostly avoided by information scientists. Among the few who have addressed this question are Clare Beghtol and Jens Erik Mai. Their findings and findings of other researchers in the area of information science, social psychology, and psycholinguistics indicate that the source of the problem might lie in the background and culture of each indexer or cataloger. Are the catalogers aware of the problem? A general model of the indexing process was developed from observations and interviews of 12 catalogers in three American academic libraries. 
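The automatic query expansion scheme described above (analyzing initially retrieved documents to find additional query terms) is commonly realized as pseudo-relevance feedback: take the most frequent informative terms from the top-ranked documents and append them to the query. A generic sketch, not the authors' implementation:

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "to", "is", "for"}

def expand_query(query, retrieved_docs, k=3):
    """Pseudo-relevance feedback: expand the query with the k most
    frequent non-stopword terms from the initially retrieved docs."""
    terms = Counter()
    for doc in retrieved_docs:
        for w in doc.lower().split():
            if w not in STOPWORDS and w not in query:
                terms[w] += 1
    return query + [w for w, _ in terms.most_common(k)]

# Hypothetical top-ranked documents for the query ["solar", "energy"].
docs = ["solar panels convert sunlight to electricity",
        "photovoltaic panels and solar cells",
        "solar cells degrade in sunlight"]
print(expand_query(["solar", "energy"], docs, k=2))
```

The expanded query covers aspects of the topic (here, specific technologies and phenomena) that the user's original keywords missed, which is exactly the failure mode the abstract sets out to remedy.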
The model is illustrated with a hypothetical cataloger's process. The study with catalogers revealed that catalogers are aware of the author's, the user's, and their own meaning, but do not try to accommodate them all. On the other hand, they make every effort to build common ground with catalog users by studying documents related to the document being cataloged, and by considering catalog records and subject headings related to the subject identified in the document being cataloged. They try to build common ground with other catalogers by using cataloging tools and by inferring unstated rules of cataloging from examples in the catalogs. This study employed a Perl program, Excel software, and some bibliometric techniques to investigate the growth pattern, journal characteristics, and author productivity of the subject indexing literature from 1977 to 2000, based on a subject search of the descriptor field in the Library and Information Science Abstracts (LISA) database. The literature growth from 1977 to 2000 in subject indexing could be fitted well by the logistic curve. The Bradford plot of the journal literature fits the typical Bradford-Zipf S-shaped curve. Twenty core journals making a significant contribution could be identified from the Bradford-Zipf distribution. Four major research topics in the area of subject indexing were identified: (1) information organization, (2) information processing, (3) information storage and retrieval, and (4) information systems and services. It was also found that a vast majority of authors (76.7%) contributed only one article, a much larger percentage than the 60% in Lotka's original data. The 15 most productive authors and the key concepts of their research were identified. This study, the fourth and last of a series designed to produce new information to improve the retrievability of books in libraries, explores the effectiveness of retrieving a known-item book using words from titles only. 
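The logistic fit of literature growth mentioned in the bibliometric abstract above can be sketched as follows. This assumes the saturation ceiling K is fixed in advance and linearizes the curve for ordinary least squares; the study's actual fitting procedure is not specified, so this is only one plausible method:

```python
import math

def logistic(t, K, a, b):
    """Logistic growth curve: y(t) = K / (1 + a * exp(-b * t))."""
    return K / (1.0 + a * math.exp(-b * t))

def fit_logistic(ts, ys, K):
    """Estimate a and b for a fixed ceiling K by linearizing
    ln(K/y - 1) = ln(a) - b*t and applying ordinary least squares."""
    zs = [math.log(K / y - 1.0) for y in ys]
    n = len(ts)
    mt = sum(ts) / n
    mz = sum(zs) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxz = sum((t - mt) * (z - mz) for t, z in zip(ts, zs))
    slope = sxz / sxx
    intercept = mz - slope * mt
    return math.exp(intercept), -slope  # a, b
```

Given yearly cumulative article counts (e.g., years 1977 to 2000 coded as t = 0..23), the fitted curve describes slowing growth as the literature approaches its ceiling.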
From daily printouts of circulation records at the Walter Royal Davis Library of the University of North Carolina at Chapel Hill, 749 titles were taken and then searched on the 4-million entry catalog at the library of the University of Michigan. The principal finding was that searches produced titles having personal authors 81.4% of the time and anonymous titles 91.5% of the time; these figures are, respectively, 15% and 5% lower than the lowest findings presented in the previous three articles of this series. In the current information age, people must access a wide variety of information. With the popularization of the Internet in all kinds of information fields and the development of communication technology, more and more information has to be processed at high speed. Data compression is one of the key techniques in information processing applications and in the dissemination of images. The objective of data compression is to reduce the data rate for transmission and storage. Vector quantization (VQ) is a very powerful method for data compression. One of the key problems for the basic VQ method, i.e., the full search algorithm, is that it is computationally intensive and difficult to run in real time. Many fast encoding algorithms have been developed for this reason. In this paper, we present a reasonable half-L-2-norm pyramid data structure and a new method of searching and processing codewords to significantly speed up the searching process, especially for high-dimensional vectors and codebooks of large size; reduce the actual memory requirement, which is preferred in hardware implementations, e.g., SOC (system-on-chip) designs; and produce the same encoded image quality as the full search algorithm. Simulation results show that the proposed method outperforms some existing related fast encoding algorithms. This paper deals with the problem of modeling Web information resources using expert knowledge and personalized user information for improved Web searching capabilities. 
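The full search baseline and the rejection idea behind pyramid-style fast VQ encoding described above can be illustrated with a single-level sum bound; this is a simplification sketched for illustration, not the authors' exact half-L-2-norm pyramid structure:

```python
def dist2(x, c):
    """Squared Euclidean distance between two vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def full_search(x, codebook):
    """Baseline VQ encoding: exhaustive nearest-codeword search."""
    return min(range(len(codebook)), key=lambda i: dist2(x, codebook[i]))

def fast_search(x, codebook):
    """Reject codewords using the Cauchy-Schwarz sum bound
    (sum(x) - sum(c))**2 / k <= ||x - c||**2, so a codeword whose
    bound already exceeds the current best distortion is skipped
    without computing its full distance."""
    k = len(x)
    sx = sum(x)
    sums = [sum(c) for c in codebook]  # precomputable offline
    best_i, best_d = 0, dist2(x, codebook[0])
    for i in range(1, len(codebook)):
        if (sx - sums[i]) ** 2 / k >= best_d:
            continue  # cannot beat the current best; skip
        d = dist2(x, codebook[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

Both functions return the same codeword index; the fast version simply avoids many full distance computations, which is the property the abstract's "same encoded image quality" claim rests on.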
We propose a "Web information space" model, which is composed of Web-based information resources (HTML/XML [Hypertext Markup Language/Extensible Markup Language] documents on the Web), expert advice repositories (domain-expert-specified meta-data for information resources), and personalized information about users (captured as user profiles that indicate users' preferences about experts as well as users' knowledge about topics). Expert advice, the heart of the Web information space model, is specified using topics and relationships among topics (called metalinks), along the lines of the recently proposed topic maps. Topics and metalinks constitute metadata that describe the contents of the underlying HTML/XML Web resources. The metadata specification process is semiautomated, and it exploits XML DTDs (Document Type Definition) to allow domain-expert guided mapping of DTD elements to topics and metalinks. The expert advice is stored in an object-relational database management system (DBMS). To demonstrate the practicality and usability of the proposed Web information space model, we created a prototype expert advice repository of more than one million topics/metalinks for DBLP (Database and Logic Programming) Bibliography data set. We also present a query interface that provides sophisticated querying facilities for DBLP Bibliography resources using the expert advice repository. Many authors have posited a social component in citation, the consensus being that the citers and citees often have interpersonal as well as intellectual ties. Evidence for this belief has been rather meager, however, in part because social networks researchers have lacked bibliometric data (e.g., pairwise citation counts from online databases), and citation analysts have lacked sociometric data (e.g., pairwise measures of acquaintanceship). 
In 1997 Nazer extensively measured personal relationships and communication behaviors in what we call "Globenet," an international group of 16 researchers from seven disciplines that was established in 1993 to study human development. Since Globenet's membership is known, it was possible during 2002 to obtain citation records for all members in databases of the Institute for Scientific Information. This permitted examination of how members cited each other (intercited) in journal articles over the past three decades and in a 1999 book to which they all contributed. It was also possible to explore links between the intercitation data and the social and communication data. Using network-analytic techniques, we look at the growth of intercitation over time, the extent to which it follows disciplinary or interdisciplinary lines, whether it covaries with degrees of acquaintanceship, whether it reflects Globenet's organizational structure, whether it is associated with particular in-group communication patterns, and whether it is related to the cocitation of Globenet members. Results show cocitation to be a powerful predictor of intercitation in the journal articles, while being an editor or co-author is an important predictor in the book. Intellectual ties based on shared content did better as predictors than content-neutral social ties like friendship. However, interciters in Globenet communicated more than did noninterciters. In May 1999, National Institutes of Health (NIH) Director Harold Varmus proposed an electronic repository for biomedical research literature called "E-biomed." E-biomed reflected the visions of scholarly electronic publishing advocates: It would be fully searchable, be free to readers, and contain full-text versions of both preprint and postpublication biomedical research articles. 
However, within 4 months, the E-biomed proposal was radically transformed: The preprint section was eliminated, delays were instituted between article publication and posting to the archive, and the name was changed to "PubMed Central." This case study examines the remarkable transformation of the E-biomed proposal to PubMed Central by analyzing comments about the proposal that were posted to an online E-biomed forum created by the NIH, and discussions that took place in other face-to-face forums where E-biomed deliberations took place. We find that the transformation of the E-biomed proposal into PubMed Central was the result of highly visible and highly influential position statements made by scientific societies against the proposal. The literature about scholarly electronic publishing usually emphasizes a binary conflict between (trade) publishers and scholars/scientists. We conclude that: (1) scientific societies and the individual scientists they represent do not always have identical interests in regard to scientific e-publishing; (2) stakeholder politics and personal interests reign supreme in e-publishing debates, even in a supposedly status-free online forum; and (3) multiple communication forums must be considered in examinations of e-publishing deliberations. We chronicle the use of acknowledgments in 20th century chemistry by analyzing and classifying over 2,000 specimens covering a 100-year period. Our results show that acknowledgment has gradually established itself as a constitutive element of academic writing, one that provides a revealing insight into the structural nature of subauthorship collaboration in science. Complementary data on rates of coauthorship are also presented to highlight the growing importance of teamwork and the increasing division of labor in contemporary chemistry. The results of this study are compared with the findings of a parallel study of collaboration in both the social sciences and the humanities. 
MEDLINE(R) is a collection of more than 12 million references and abstracts covering recent life science literature. With its continued growth and cutting-edge terminology, spell-checking with a traditional lexicon-based approach requires significant additional manual follow-up. In this work, an internal corpus-based context quality rating (denoted a), word frequency, and simple misspelling transformations are used to rank words from most likely to be misspellings to least likely. Eleven-point average precisions of 0.891 have been achieved within a class of 42,340 all-alphabetic words having an a score less than 10. Our models predict that 16,274 or 38% of these words are misspellings. Based on test data, this result has a recall of 79% and a precision of 86%. In other words, spell checking can be done by statistics instead of with a dictionary. As an application we examine the time history of low-a words in MEDLINE(R) titles and abstracts. After several decades of heavy research activity on English stemmers, Arabic morphological analysis techniques have become a popular area of research. The Arabic language is one of the Semitic languages; it exhibits a very systematic but complex morphological structure based on root-pattern schemes. As a consequence, a survey of such techniques has become necessary. The aim of this paper is to summarize and organize the information available in the literature in an attempt to motivate researchers to look into these techniques and try to develop more advanced ones. This paper introduces, classifies, and surveys Arabic morphological analysis techniques. Furthermore, conclusions, open areas, and future directions are provided at the end. This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). 
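The spell-checking-by-statistics idea in the MEDLINE abstract above, ranking words by how likely they are to be misspellings, can be sketched using frequency evidence and single-edit transformations alone. The context-based a score is omitted here, so this is a deliberately simplified illustration, not the paper's model:

```python
from collections import Counter

def edit1(word):
    """All single-edit variants of a word (deletes, transposes,
    substitutions over a lowercase alphabet)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    subs = {l + c + r[1:] for l, r in splits if r for c in letters}
    return deletes | transposes | subs

def rank_misspellings(corpus_words):
    """Rank corpus words from most to least likely misspelling: a rare
    word that is one edit away from a much more frequent word scores
    highest (score = frequency ratio to its best neighbor)."""
    freq = Counter(corpus_words)
    candidates = []
    for w, f in freq.items():
        neighbors = edit1(w) & freq.keys()
        best = max((freq[n] for n in neighbors if n != w), default=0)
        if best > f:
            candidates.append((best / f, w))
    return [w for _, w in sorted(candidates, reverse=True)]
```

On a corpus where "retrieval" is common, the rare transposition "retreival" surfaces at the top of the ranking without any dictionary lookup.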
LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs. This paper offers a new, nonlinear model of information-seeking behavior, which contrasts with earlier stage models of information behavior and represents a potential cornerstone for a shift toward a new perspective for understanding user information behavior. The model is based on the findings of a study on interdisciplinary information-seeking behavior. The study followed a naturalistic inquiry approach using interviews of 45 academics. The interview results were inductively analyzed and an alternative framework for understanding information-seeking behavior was developed. This model illustrates three core processes and three levels of contextual interaction, each composed of several individual activities and attributes. These interact dynamically through time in a nonlinear manner. The behavioral patterns are analogous to an artist's palette, in which activities remain available throughout the course of information-seeking. In viewing the processes in this way, neither start nor finish points are fixed, and each process may be repeated or lead to any other until either the query or context determine that information-seeking can end. The interactivity and shifts described by the model show information-seeking to be nonlinear, dynamic, holistic, and flowing. 
The paper offers four main implications of the model as it applies to existing theory and models, requirements for future research, and the development of information literacy curricula. Central to these implications is the creation of a new nonlinear perspective from which user information-seeking can be interpreted. The Internet is increasingly being used as a source of reference information. Internet users need to be able to distinguish accurate information from inaccurate information. Toward this end, information professionals have published checklists for evaluating information. However, such checklists can be effective only if the proposed indicators of accuracy really do indicate accuracy. This study implements a technique for testing such indicators of accuracy and uses it to test indicators of accuracy for answers to ready reference questions. Many of the commonly proposed indicators of accuracy (e.g., that the Web site does not contain advertising) were not found to be correlated with accuracy. However, the link structure of the Internet can be used to identify Web sites that are more likely to contain accurate reference information. A search tactic is a set of search moves that are temporally and semantically related. The current study examined the tactics of medical students searching a factual database in microbiology. The students answered problems and searched the database on three occasions over a 9-month period. Their search moves were analyzed in terms of the changes in search terms used from one cycle to the next, using two different analysis methods. Common patterns were found in the students' search tactics; the most common approach was the specification of a concept, followed by the addition of one or more concepts, gradually narrowing the retrieved set before it was displayed. It was also found that the search tactics changed over time as the students' domain knowledge changed. 
These results have important implications for designers in developing systems that will support users' preferred ways of formulating searches. In addition, the research methods used (the coding scheme and the two data analysis methods, zero-order state transition matrices and maximal repeating patterns [MRP] analysis) are discussed in terms of their validity in future studies of search tactics. Information overload on the Web has created enormous challenges for customers selecting products for online purchases and for online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and relevant recommendations than using only collaborative information. However, comparisons among different methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed. To determine the capability and resources of the Spanish R & D system to produce knowledge useful for the Biotechnology industries, an analysis of indicators derived from published work, scientific papers cited in US patents and inventions patented, has been carried out. The results show that the number of publications compares well with that of other European countries. 
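A minimal co-occurrence version of the association-based recommendation methods described in the recommender-system abstract above can be sketched as follows; the transaction data, product names, and function names are illustrative, not from the study's test-bed:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(transactions):
    """Count how often each pair of products appears in the same
    customer transaction (an edge-weight table of a product graph)."""
    co = defaultdict(lambda: defaultdict(int))
    for basket in transactions:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, product, n=2):
    """Recommend the products most often bought together with
    `product`, breaking count ties alphabetically."""
    ranked = sorted(co[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:n]]
```

In graph terms, each product is a node and each co-purchase count an edge weight; direct retrieval reads off a node's heaviest edges, while higher-degree methods would follow paths of length two or more.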
The visibility of those publications seems evident, as about two thirds of the authors studied have been cited in patents assigned to foreign enterprises, but very few of them have applied for patents. This is analysed in connection with the existing policies. In this paper an attempt has been made to unfold the intellectual base in ocean science and technology. The articles that appeared in the Science Citation Index (SCI) under Oceanography in the year 2000 were analyzed to decipher the scientist-to-scientist, organization-to-organization, and country-to-country network structures. The causal linkages between the knowledge productivity function and the socio-economic imperatives of knowledge production units were studied. In this paper we introduce two measures, self-linked and self-linking, that are the analogues of self-citing and self-cited rates for scientific journals. These rates are calculated for a sample of sites to assess their meaning and utility. Self-linked is the more meaningful measure for the sample sites. As a first step towards a better understanding of self-linking (linking within a site), a sample of pages from an academic site was characterized using the method of content analysis. Even though most of the links serve navigational or other technical purposes, the percentage of content-bearing links among the self-links is significant, and even the portion of research-oriented links is non-negligible. Using data sampled from top-level Web pages across five high-level domains and from sample pages within individual websites, the authors investigate the frequency distribution of outlinks in Web pages. The observed distributions were fitted to different theoretical distributions to determine the best-fitting model for representing outlink frequency across Web pages. 
Theoretical models tested include the modified power law (MPL), Mandelbrot (MDB), generalized Waring (GW), generalized inverse Gaussian-Poisson (GIGP), and generalized negative binomial (GNB) distributions. The GIGP and GNB provided good fits for data sets for top-level pages across the high-level domains tested, with the GIGP performing slightly better. The lumpiness and bimodal nature of two of the observed outlink distributions from Web pages within a given website resulted in poor fits of the theoretical models. The GIGP was able to provide better fits to these data sets after the top components were truncated. The ability to effectively model Web page attributes, such as the distribution of the number of outlinks per page, paves the way for simulation models of Web page structural content, and makes it possible to estimate the number of outlinks that may be encountered within Web pages of a specific domain or within individual websites. The present paper analyses the role of author self-citations, aiming at finding basic regularities of self-citations within the process of documented scientific communication and thus laying the methodological groundwork for a possible critical view of self-citation patterns in empirical studies at any level of aggregation. The study consists of three parts: the first is concerned with the comparative analysis of the ageing of self-citations and of non-self-citations; in the second, the possible interdependence between self-citations and foreign citations is analysed; and in the third, the interrelation of the share of self-citations in all citations with other citation-based indicators is studied. The outcomes of this study are twofold: first, the results characterise author self-citations, at least at the macro level, as an organic part of the citation process obeying rules that can be measured and described with the help of mathematical models. 
Second, these rules can be used in evaluative micro and meso analyses to identify significant deviations from the reference standards. This study, based on the premise that references are a social product that reflects the social environment of a society, is an attempt to explore the co-existence of Korean and non-Korean literature in the references of Korean papers. 321 authors (papers) who published in 43 issues of 24 Korean journals focused on the social sciences were surveyed about their research channels and citation motivations, and the 11,358 references in the papers were analyzed. The findings were as follows: (1) The extent of the co-existence was that non-Korean literature was cited 1.9 times (65.3%) more often than Korean literature; (2) Research channel was the most common non-Korean channel orientation (55.8%); (3) The motivation for citations was significantly dependent on whether the literature cited was Korean or non-Korean. Non-Korean literature was chiefly cited for conceptual (20.7%), perfunctory (16.0%), and persuasive (15.1%) motivations; (4) The citations and citation motivations behind non-Korean literature were significantly influenced by research channel, discipline, focus of research, publishing career, and type of paper. Of these variables, research channel was frequently related to the citation of non-Korean literature. Finally, this study is suggestive on two counts: (1) Citation motivation might constitute a new approach for exploring the production of knowledge by researchers. (2) This study has demonstrated, in particular, an empirical relationship between knowledge produced by Korean social scientists and non-Korean knowledge through the analysis of citation motivation. Patterns of foreign contributions to six Earth Sciences journals published in different countries have been studied as an approach to testing their level of internationalisation. 
Two of the multiple dimensions that determine the internationalisation of scientific journals are considered: the geographical distribution pattern of authors and the co-authorship linkages among them. The potential of these journals to attract manuscripts by foreign authors and to promote international collaboration, through the publishing of co-authored papers that may or may not involve scientists from the journal's own country of publication, is investigated. Some other indicators of the degree of internationalisation of scientific journals, such as language of publication, publishing institution, and national structure of editorial boards, are also considered. Finally, the geographic areas the journal papers deal with can be introduced as a new aspect of internationalisation. Three clearly differentiated categories of journals are identified and characterised: domestic, regional and international journals. The effect of geopolitical, cultural, economic and linguistic bonds among countries on publication and collaboration patterns is discussed. The important role of domestic European journals on Earth Sciences is noted, as they are not only the main information source on the research carried out by local scientists whose study is focused on the geologic features of their country, but also an excellent vehicle of international diffusion for works by foreign scientists from developing countries. On the other hand, international collaborative articles in domestic journals constitute an indicator of the interest of the international community in the scientific studies of the publishing country. The distribution of articles involving artificial neural networks (ANN) in the fields of medicine and biology and appearing in the ISI (Institute for Scientific Information) databases during the period 2000-2001 was analysed. 
The following parameters were considered: the number of articles, the total impact factor, the ISI journal category, the source country population, and the gross domestic product. Among the 803 articles and the 49 countries considered, the 5 most prolific countries (in terms of the number of publications) were the USA, the United Kingdom, Germany, Italy, and Canada; other active countries included Sweden, the Netherlands, Spain, France, Japan, and China. Comparison between the USA and the European Union, and the distribution of ANN publications among the subdisciplines of the life sciences and clinical medicine, are also presented. We propose an improved Data Envelopment Analysis (DEA) model to evaluate the efficiency of research groups in the area of information science in PR China. By taking the research groups as Decision Making Units (DMUs), the budget of the projects and the size of the groups as inputs, and the quantity and quality of publications produced by the groups as outputs of the model, the relative efficiencies of 21 research projects are evaluated. We then focus on the issues of knowledge management in the organizations that undertook these projects and attempt to explore the underlying reasons for high research efficiency. Through integrating the evaluation outcomes into the research process, three indicators of knowledge management are identified for the best-practice groups with the highest research efficiency. The findings verify that the proposed model is valid and practical for assessing research performance on the basis of bibliometric indicators. Several major econometric studies have looked at mergers and acquisitions (M&As) across various industries and concluded that, in general, there is no synergy created or released by M&A activity. This investigation concentrates upon research and development (R&D) performance in the pharmaceutical industry to examine the impact of M&A activity on corporate productivity. 
Findings indicate that, when compared to those companies within the pharmaceutical industry that did not experience merger activity during comparable time periods, as well as to the industry as a whole, pharmaceutical companies that merged were able to achieve more favorable post-merger productivity scores than were attained prior to their merger. We have created a system for music search and retrieval. A user sings a theme from the desired piece of music. The sung theme (query) is converted into a sequence of pitch-intervals and rhythms. This sequence is compared to musical themes (targets) stored in a database. The top pieces are returned to the user in order of similarity to the sung theme. We describe, in detail, two different approaches to measuring similarity between database themes and the sung query. In the first, queries are compared to database themes using standard string-alignment algorithms. Here, similarity between target and query is determined by edit cost. In the second approach, pieces in the database are represented as hidden Markov models (HMMs). In this approach, the query is treated as an observation sequence and a target is judged similar to the query if its HMM has a high likelihood of generating the query. In this article we report our approach to the construction of a target database of themes, the encoding and transcription of user queries, and the results of preliminary experimentation with a set of sung queries. Our experiments show that while neither approach is clearly superior to the other, string matching has a slight advantage. Moreover, neither approach surpasses human performance. A research agenda for the study of digital reference is presented. The agenda stems from a research symposium held at Harvard University, Boston, Massachusetts in August 2002. The agenda defines digital reference as "the use of human intermediation to answer questions in a digital environment." 
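The string-alignment approach in the music retrieval abstract above can be sketched as a standard edit-distance (Levenshtein) computation over pitch-interval sequences, with themes ranked by alignment cost. The interval values, theme names, and uniform edit costs here are illustrative assumptions, not the authors' tuned cost model:

```python
def edit_distance(query, target, indel=1.0, sub=1.0):
    """Dynamic-programming string-alignment cost between two sequences
    of pitch intervals; a lower cost means a closer melodic match."""
    m, n = len(query), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if query[i - 1] == target[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,      # delete from query
                          d[i][j - 1] + indel,      # insert into query
                          d[i - 1][j - 1] + cost)   # match / substitute
    return d[m][n]

def rank_themes(query, themes):
    """Return theme names ordered by alignment cost to the sung query."""
    return sorted(themes, key=lambda name: edit_distance(query, themes[name]))
```

Because the sequences hold pitch intervals rather than absolute pitches, a query sung in a different key from the target still aligns cheaply, which is the usual motivation for interval encoding.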
The agenda also proposes the central research question in digital reference: "How can human expertise be effectively and efficiently incorporated into information systems to answer user questions?" The definition and question are used to outline a research agenda centered on how the exploration of digital reference relates to other fields of inquiry. Academic and special libraries are in the midst of a shift toward hybrid collections. This shift from collection ownership to an information access model supports the distributed nature of learning and work. However, unanticipated consequences of these changes are emerging. One confounding result is a visible pattern of discontinuities in collections, with unique features for electronic products. Patterns of discontinuities encountered included the occurrence of intermittent holes and unintentionally masked information. This has both immediate and long-term implications for library users and services, and there are not yet coherent measures to assess these sorts of outcomes. A framework is required for the systematic evaluation of the effects of new systems such as bundled electronic resources. This research suggests that evaluating both use and non-use of electronic collections will supplement other acquisitions and service measures to support long-range planning and decision-making. This study reports an analysis of referral URL data by the Cornell University IP address from the American Chemical Society servers. The goal of this work is to better understand the tools used and pathways taken when scientists connect to electronic journals. While various methods of referral were identified in this study, most individuals were referred infrequently and followed few and consistent pathways each time they connected. The relationship between the number and types of referrals followed an inverse-square law. 
Whereas the majority of referrals came from established finding tools (library catalog, library e-journal list, and bibliographic databases), a substantial number of referrals originated from generic Web searches. Scientists are also relying on local alternatives or substitutes such as departmental or personal Web pages with lists of linked publications. The use of electronic mail as a method to refer scientists directly to online articles may be greatly underestimated. Implications for the development of redundant library services such as e-journal lists, and for the practice of publishers of allowing linking from other resources, are discussed. In this paper, it is suggested that a number of theoretical and practical perspectives on information literacy can be obtained through the examination of tenets of cognitive psychology. One aspect of cognitive psychology, information processing theory, is applied to the development of a two-stage model of the information retrieval process. This model of information retrieval has utility along two dimensions: firstly, in the conceptualization of the information retrieval process; and secondly, in the development of teaching strategies informed by such a model. The efficacy of this model was tested in a large two-phase experimental study at the University of Canberra, Australia. Statistically significant results support the effectiveness of the concept-based teaching of information retrieval and the utility of the model as an explanation of the cognitive underpinnings of information retrieval. In this report, we investigate how retrieving information can be improved through task-related indexing of documents based on ontologies. Different index types, varying from content-based keywords to structured task-based indexing ontologies, are compared in an experiment that simulates the task of creating instructional material from a database of source material. 
To be able to judge the added value of task- and ontology-related indexes, traditional information retrieval performance measures are extended with new measures reflecting the quality of the material produced with the retrieved information. The results of the experiment show that a structured task-based indexing ontology improves the quality of the product created from retrieved material only to some extent, but that it certainly improves the efficiency and effectiveness of search and retrieval and precision of use. This report analyzes the methodologies used in establishing interoperability among knowledge organization systems (KOS) such as controlled vocabularies and classification schemes that present the organized interpretation of knowledge structures. The development and trends of KOS are discussed with reference to the online era and the Internet era. Selected current projects and activities addressing KOS interoperability issues are reviewed in terms of the languages and structures involved. The methodological analysis encompasses both conventional and new methods that have proven to be widely accepted, including derivation/modeling, translation/adaptation, satellite and leaf node linking, direct mapping, co-occurrence mapping, switching, linking through a temporary union list, and linking through a thesaurus server protocol. Methods used in link storage and management, as well as common issues regarding mapping and methodological options, are also presented. It is concluded that interoperability of KOS is an unavoidable issue and process in today's networked environment. There have been and will be many multi-lingual products and services, with many involving various structured systems. Results from recent efforts are encouraging. Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. 
The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists where the key terms representing novel relationships between topics are ranked high. Textbooks are more available in electronic format now than in the past. Because textbooks are typically large, the end user needs effective tools to rapidly access information encapsulated in textbooks stored in digital libraries. Statistical similarity-based links among hyper-textbooks are a means to provide those tools. In this paper, the design and the implementation of a tool that generates networks of links within and across hyper-textbooks through a completely automatic and unsupervised procedure is described. The design is based on statistical techniques. The overall methodology is presented together with the results of a case study, obtained through a working prototype, which show that connecting hyper-textbooks is an efficient way to provide an effective retrieval capability. The Garfield (impact) Factor (GF) is one of the most frequently used scientometric indicators. In the present article it is shown that the main factors determining the value of the mean GF representing a set of journals are the number of articles published recently (articles referencing) relative to those published in a previous time period (articles to be referenced) and the mean number of references in journal papers referring to the time period selected. 
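The open-discovery procedure in the Swanson-Smalheiser framework described above can be illustrated with a minimal sketch (the documents, term names, and scoring are invented; real profiles would be MeSH-based):

```python
# Minimal sketch of Swanson-style "open" discovery (hypothetical data).
# Start from topic A, collect intermediate B-terms that co-occur with A,
# then rank candidate C-terms that co-occur with some B but never with A.
from collections import defaultdict

def open_discovery(documents, topic_a):
    cooc = defaultdict(set)          # term -> set of co-occurring terms
    for terms in documents:
        for t in terms:
            cooc[t].update(x for x in terms if x != t)
    b_terms = cooc[topic_a]
    scores = defaultdict(int)        # C-term -> number of distinct B links
    for b in b_terms:
        for c in cooc[b]:
            if c != topic_a and c not in b_terms and topic_a not in cooc[c]:
                scores[c] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = [
    {"fish_oil", "blood_viscosity"},
    {"blood_viscosity", "raynaud"},
    {"fish_oil", "platelet_aggregation"},
    {"platelet_aggregation", "raynaud"},
]
print(open_discovery(docs, "fish_oil"))  # "raynaud" is linked via two B-terms
```

Terms representing novel A-C relationships surface at the top of the ranked list because they are reachable through many intermediate B-terms.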
It is further proved that the GF corresponds to the mean chance of citedness of journal papers. A new indicator, Specific Impact Contribution (SIC), is introduced, which characterizes the contribution of a subset of articles or a journal to the total impact of the respective articles or journals. The SIC index is the share of a journal in citations divided by its share in publications within a set of papers or journals appropriately selected. It is shown, however, that the normalized GFs of journals and the normalized SIC indicators are identical measures within any set of journals selected. It may be stated therefore that Garfield Factors of journals (calculated correctly) are appropriate scientometric measures for characterizing the relative international eminence of journals within a set of journals appropriately selected. It is demonstrated further that SIC indicators (and so GF indexes) correspond to the citations-per-paper indicators generally used, within the same set of papers. Scientific literature is often fragmented, which implies that certain scientific questions can only be answered by combining information from various articles. In this paper, a new algorithm is proposed for finding associations between related concepts present in literature. To this end, concepts are mapped to a multidimensional space by a Hebbian type of learning algorithm using co-occurrence data as input. The resulting concept space allows exploration of the neighborhood of a concept and finding potentially novel relationships between concepts. The obtained information retrieval system is useful for finding literature supporting hypotheses and for discovering previously unknown relationships between concepts. Tests on artificial data show the potential of the proposed methodology. In addition, preliminary tests on a set of Medline abstracts yield promising results. 
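The SIC computation described above reduces to simple share arithmetic; the following sketch uses invented figures:

```python
# Specific Impact Contribution (SIC), as described above: a journal's share
# of citations divided by its share of publications within a selected set.
# The citation and publication counts below are invented for illustration.
def sic(citations, publications, journal):
    cit_share = citations[journal] / sum(citations.values())
    pub_share = publications[journal] / sum(publications.values())
    return cit_share / pub_share

cites = {"J1": 300, "J2": 100}
pubs = {"J1": 50, "J2": 50}
print(sic(cites, pubs, "J1"))  # 1.5: J1 contributes more impact than its size
```

A value above 1 means the journal contributes more to the set's total impact than its publication volume alone would predict.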
This study explores the usefulness of bibliometric analyses to detect trends in the research profile of a therapeutic drug, for which Aspirin was selected. A total of 22,144 documents dealing with Aspirin and published in journals covered by MEDLINE during the years 1965-2001 are studied. The research profile of Aspirin over the 37-year period is analyzed through Aspirin subheadings and MeSH indexing terms. Half of the documents had Aspirin as a major indexing term, the main aspects studied being therapeutic uses (28% of the documents), pharmacodynamics (26%), adverse effects (18%), and administration and dosage (10%). A frequency data table crossing indexing terms × years is examined by correspondence analysis to obtain time trends, which are shown graphically in a map. Four time periods with a different distribution of indexing terms are identified through cluster analysis. The indexing term profile of every period is obtained by comparison of the distribution of indexing terms of each cluster with that of the whole period by means of the chi-square test. The research profile of the drug tends to change faster with time. The most relevant finding is the expanding therapeutic profile of Aspirin over the period. The main advantages and limitations of the methodology are pointed out. At the same time that the Internet is becoming more accessible to large numbers of people and information consumers are becoming information producers, traditional methods of organizing, describing, and providing access to "documents" are being overwhelmed by the ever-increasing number of digitized materials. Another parallel occurrence is the disappearance of cultural and indigenous knowledge as environments and peoples cease to exist. Therefore, the knowledge and ability to build and describe collections needs to be spread among a larger distributed group of participants. 
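The cluster-versus-whole-period comparison described above rests on a chi-square statistic; a minimal sketch with invented indexing-term counts:

```python
# Chi-square statistic comparing a cluster's indexing-term counts with the
# distribution expected from the whole period. Counts are invented; the
# real study compared full MeSH term profiles, not two terms.
def chi_square(observed, whole_period):
    n = sum(observed.values())
    total = sum(whole_period.values())
    stat = 0.0
    for term, whole in whole_period.items():
        expected = n * whole / total
        diff = observed.get(term, 0) - expected
        stat += diff * diff / expected
    return stat

cluster = {"therapeutic use": 40, "adverse effects": 10}
period = {"therapeutic use": 500, "adverse effects": 500}
print(round(chi_square(cluster, period), 2))  # 18.0
```

A large statistic indicates the cluster's term profile differs markedly from the whole period's, which is what distinguishes one time period from another.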
Three mechanisms are needed to facilitate this "unlocking" of collections and their management: the distributed description and annotation of documents, distributed collection building, and distributed knowledge creation. The paper analyses the asymmetric volatility of patents related to pollution prevention and abatement (hereafter, anti-pollution) technologies registered in the USA. Ecological and pollution prevention technology patents have increased steadily over time, with the 1990s having been a period of intensive patenting of technologies related to the environment. The time-varying nature of the volatility of anti-pollution technology patents registered in the USA is examined using monthly data from the US Patent and Trademark Office for the period January 1975 to December 1999. Alternative symmetric and asymmetric volatility models, such as GARCH, GJR and EGARCH, are estimated and tested against each other using full-sample and rolling-window estimation. Knowledge diffusion is the adaptation of knowledge in a broad range of scientific and engineering research and development. Tracing knowledge diffusion between science and technology is a challenging issue due to the complexity of identifying emerging patterns in a diverse range of possible processes. In this article, we describe an approach that combines complex network theory, network visualization, and patent citation analysis in order to improve the means for the study of knowledge diffusion. In particular, we analyze patent citations in the field of tissue engineering. We emphasize that this is the beginning of a longer-term endeavor that aims to develop and deploy effective, progressive, and explanatory visualization techniques for us to capture the dynamics of the evolution of patent citation networks. The work has practical implications for resource allocation, strategic planning, and science policy. Citation distributions are extremely skewed. 
This paper addresses the following question: To what extent are national citation indicators influenced by a small minority of highly cited articles? This question has not been studied before at the level of national indicators. Using the scientific production of Norway as a case, we find that the average citation rates in major subfields are highly determined by one or only a few highly cited papers. Furthermore, there are large annual variations in the influence of highly cited papers on the average citation rate of the subfields. We conclude that an analysis of the underlying data for national indicators may be useful in creating awareness of the occurrence of particular articles with great influence on what is normally considered an indicator of "national performance", and that the common interpretation of the indicator on the research policy level needs to be informed by this fact. In an old paper [M.K. Buckland. Are obsolescence and scattering related? Journal of Documentation 28 (3) (1972) 242-246] Buckland poses the question of whether certain types of obsolescence of scientific literature (in terms of age of citations) imply certain types of journal scattering (in terms of cited journals). This problem is reformulated in terms of one- and two-dimensional obsolescence and linked with one- and two-dimensional growth, the latter having been studied by Naranan. Naranan shows that two-dimensional exponential growth (i.e. of the journals and of the articles in journals) implies Lotka's law, a law belonging to two-dimensional informetrics and describing scattering of literature in a concise way. In this way we obtain that exponential aging of journal citations and of article citations implies Lotka's law, and a relation is given between the exponent in Lotka's law and the aging rates of the two obsolescence processes studied. In this article we present a precise definition of the notion "own-group preference" and characterize all functions capable of correctly measuring it. 
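As a reference point for the growth argument summarized above, Lotka's law and Naranan's growth condition can be written out; the exponent relation is the form commonly attributed to Naranan (a sketch, not the abstracted paper's exact derivation):

```latex
% Lotka's law: f(n) is the number of sources (journals) containing n items
f(n) = \frac{C}{n^{\alpha}}, \qquad \alpha > 1
% Naranan's two-dimensional exponential growth: the number of journals
% grows as e^{a t} and the number of articles per journal as e^{b t}.
% This is commonly stated to yield Lotka's law with exponent
\alpha = 1 + \frac{a}{b}
```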
Examples of such functions are provided. The weighted Lorenz curve and the theory developed for it will be our main tools for reaching this goal. We further correct our earlier articles on this subject. In the context of own-language preference, Bookstein and Yitzhaki proposed the logarithm of the odds-ratio as an acceptable measure of own-group preference. We now present a general framework within which the concept of own-group preference, and its opposite, namely own-group aversion, can be precisely pinpointed. This framework is derived from inequality theory and is based on the use of the weighted Lorenz curve. The concept of own-group preference is an interesting notion with applications in different fields such as sociology, political sciences, economics, management science and of course, the information sciences. Some examples are provided. In this paper, we describe the development of a methodology and an instrument to support a major research funding allocation decision by the Flemish government. Over the last decade, and in parallel with the decentralization and the devolution of the Belgian federal policy authority towards the various regions and communities in the country, science and technology policy have become a major component of regional policy making. In the Flemish region, there has been an increasing focus on basing the funding allocation decisions that originate from this policy decentralization on "objective, quantifiable and repeatable" decision parameters. One of the data sources and indicator bases that have received ample attention in this evolution is the use of bibliometric data and indicators. This has now led to the creation of a dedicated research and policy support staff, called "Steunpunt O&O Statistieken," and the first time application of bibliometric data and methods to support a major inter-university funding allocation decision. In this paper, we analyze this evolution. 
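The Bookstein-Yitzhaki measure mentioned above, the logarithm of the odds ratio, can be sketched as follows (the citation counts are invented):

```python
# Log odds-ratio as a measure of own-group preference, in the spirit of
# Bookstein and Yitzhaki. The four cells are counts of citations from the
# own group / other groups to the own group / other groups (invented data).
import math

def log_odds_ratio(own_to_own, own_to_other, other_to_own, other_to_other):
    return math.log((own_to_own * other_to_other) /
                    (own_to_other * other_to_own))

# Positive value -> own-group preference; zero -> no preference.
print(round(log_odds_ratio(80, 20, 30, 70), 3))  # 2.234
```

The measure is symmetric around zero, so own-group aversion shows up as a negative value of the same scale.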
We show how bibliometric data have for the first time been used to allocate 93 million Euro of public research money among six Flemish universities for the fiscal year 2003, based on Web-of-Science SCI data provided to "Steunpunt O&O Statistieken" via a license agreement with Thomson-ISI. We also discuss the limitations of the current approach that was based on inter-university publication and citation counts. We provide insights into future adaptations that might make it more representative of the total research activity at the universities involved (e.g., by including data for the humanities) and of its visibility (e.g., by including impact measures). Finally, based on our current experience and interactions with the universities involved, we speculate on the future of the specific bibliometric approach that has now been adopted. More specifically, we hypothesize that the allocation method now developed and under further improvement will attract more criticism if it turns out that it (1) also starts influencing intra-university research allocation decisions and, as a consequence, (2) introduces adverse publication and citation behaviors at the universities involved. In a recent paper the authors have studied the role of author self-citations within the process of documented scientific communication. Studying the phenomenon of author self-citations at the macro level, two important regularities have been found: the relatively fast ageing of self-citations with respect to foreign citations, and the "square-root law" characterising the conditional expectation of self-citations for a given number of foreign citations. The goal of the present paper is to study the effect of author self-citations on macro indicators. 
The analysis of citation-based indicators for 15 fields in the sciences, social sciences and humanities substantiates that at this level of aggregation there is no need for any revision of national indicators and the underlying journal citation measures in the context of excluding self-citations. This paper investigates two bibliometric problems: the listing of books in a specialist area (ornithology) and the determination of the citation pattern to individual authors, who often re-issue their books in later editions. James Bond, a Philadelphia ornithologist, who specialised in the birds of the West Indies, is used as an example of a naturalist whose long career led to many journal articles and enduring scientific fame through a well-known book. He also attained some unexpected notoriety through the use of his name by a popular novelist. Methods for the evaluation of his book and associated bird checklists in comparison with other similar works are presented on the basis of their citations. Processes and technology of reciprocating internal combustion engines (ICE) constitute a research field whose characteristics regarding information production and diffusion are determined by multidisciplinarity, the existence of pseudo-technical literature and the influence of confidentiality on the presentation of research outputs. The objective of this study is to provide a quantitative and objective basis for the evaluation of research in this field. This has been accomplished by identifying the most productive journals and the most cited sources, using the SCI and citation analysis. From this analysis, core journals have been identified, showing that their importance in this research area does not correlate with their impact factor. Moreover, conference proceedings (particularly those published by the Society of Automotive Engineers) are shown to be the most important information source in this field. 
We shall generalize the concept of our previous paper (SHIRABE & TOMIZAWA, 2002), which proposed an index for international scientific co-authorship. Based on a simple model of domestic and international co-authorships, in that paper we focused on the likelihood of overseas access to co-authorships. Here, in consideration of the bidirectionality of international co-authorship, we shall extend our previous index to two symmetrical indices. The indices can draw a reasonably clear picture of international co-authorship, with regard to differences in patterns of international co-authorship among countries. Link analysis has proved to be very fruitful on the Web. Google's very successful ranking algorithm is based on link analysis. There are only a few studies that have analyzed links qualitatively; most studies are quantitative. Our purpose was to characterize these links in order to gain a better understanding of why links are created. We limited the study to the academic environment, and as a specific case we chose to characterize the interlinkage between the eight Israeli universities. The paper is a bibliometric study of the publication and citation patterns and impact of South African research 1981-2000 in five selected research fields: Animal & plant sciences; Chemistry; Biochemistry; Microbiology & molecular biology, including genetics; and Physics, excluding Space science. Data are collected from the Science Citation Index via the ISI product National Science Indicators. With the exception of Microbiology & molecular biology and Physics, the results demonstrate a decrease of SA publications from 1986-1990. The SA world share declines for all five fields. Only from the period 1994-1998 do the Animal & plant sciences and Microbiology & molecular biology turn the decline into an increase. Absolute citation impact is increasing for all the fields from 1989-1993, except for Chemistry. One reason for the increase is a lower publication output. 
General & internal medicine, a supplementary volume-heavy field observed, declines in citations until that same period, from which it becomes stable, also in impact, but with a marked decrease in the proportion of cited papers. In citation world shares the five fields combined show positive signs also since 1989-1993, after which period the international eco-political embargo of SA was lifted. However, Biochemistry and Chemistry continue to decline during the 1990s. Citation impact relative to the world shows a similar pattern, but stagnation appears towards the end of the 1990s in all the observed fields combined. The trends are quite similar to those of Mexico and New Zealand. It is thus highly uncertain whether a general citation embargo of SA occurred; yet, in some fields like the Animal & plant sciences, Veterinary science, Chemistry, and General & internal medicine there are signs that a mild citation embargo might have occurred. However, the economic embargo, combined with a significant brain drain, may have had an effect on the publication productivity after it was lifted. For all indicators Chemistry is undergoing a marked decline during the last decade. This is in line with the negative trends for General & internal medicine, whereas some other medical specialities, biology, economics and other social sciences, the engineering fields and materials sciences keep stable or increase their production. SA is in line with the Mexican development but below that of New Zealand, seemingly losing ground to the developed countries. This paper first describes the recent development that scientists and engineers of many disciplines, countries, and institutions increasingly engage in nanoscale research at breathtaking speed. 
By co-author analysis of over 600 papers published in "nano journals" in 2002 and 2003, I investigate whether this apparent concurrence is accompanied by new forms and degrees of multi- and interdisciplinarity as well as of institutional and geographic research collaboration. Based on a new visualization method, patterns of research collaboration are analyzed and compared with those of classical disciplinary research. I argue that current nanoscale research reveals no particular patterns and degrees of interdisciplinarity and that its apparent multidisciplinarity consists of different, largely mono-disciplinary fields which are rather unrelated to each other and which hardly share more than the prefix "nano". A 'Sleeping Beauty in Science' is a publication that goes unnoticed ('sleeps') for a long time and then, almost suddenly, attracts a lot of attention ('is awakened by a prince'). We here report what is, to our knowledge, the first extensive measurement of the occurrence of Sleeping Beauties in the science literature. We derived from the measurements an 'awakening' probability function and identified the 'most extreme Sleeping Beauty so far'. This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0, not found in Model 3, is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3 not found in Model 0 is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. 
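The Sleeping Beauty notion described above lends itself to a simple operational test; the thresholds below are illustrative, not the measurement study's calibrated criteria:

```python
# A simple operational rule (our own illustrative thresholds, not the
# paper's exact criteria) for flagging a "Sleeping Beauty": at most
# max_sleep_cites citations per year for the first sleep_years years,
# followed by at least awake_cites citations in some later year.
def is_sleeping_beauty(yearly_cites, sleep_years=10, max_sleep_cites=1, awake_cites=20):
    if len(yearly_cites) <= sleep_years:
        return False
    asleep = all(c <= max_sleep_cites for c in yearly_cites[:sleep_years])
    awakened = max(yearly_cites[sleep_years:]) >= awake_cites
    return asleep and awakened

ordinary = [3, 8, 12, 9, 7, 5, 4, 3, 2, 1, 1, 0]       # normal rise and decay
beauty = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 2, 25, 40]     # long sleep, late spike
print(is_sleeping_beauty(ordinary), is_sleeping_beauty(beauty))
```

An awakening probability function, as derived in the abstracted paper, would then be estimated from how often such late spikes occur across a whole citation index.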
The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval. As retrieval set size in information retrieval (IR) becomes larger, users may need greater interactive opportunities to determine for themselves the potential relevance of the resources offered by a given collection. A parts-of-document approach, coupled with an interactive graphic interface and control panel, permits end users to tailor the information seeking (IS) session. Applying the model described by the author in a previous paper in this journal, this paper explores two issues: whether a group of information seekers in the same research domain will want to use this type of IR interaction, and whether such interaction is more successful than relevancy-ranked lists based on the general vector model. In addition, the paper proposes the use of gradient space as a means of capturing end users' cognitive states (decision-making points) during a parts-of-document-based IR session. It concludes that, for a group of biomedical researchers, a parts-of-document approach is preferred for certain IR situations and that gradient space provides designers of systems with empirical evidence suited for systems analysis. This article proposes a method to design cataloging rules by utilizing conceptual modeling of the cataloging process and also by applying the concept "orientedness." It also proposes a general model for the cataloging process at the conceptual level, which is independent of any situation/system or cataloging code. A design method is made up of the following phases, including the development of a general model. Functional and non-functional requirements are first specified by use of orientedness. Also, cataloger tasks are defined, which are constituents of the cataloging process. 
Second, a core model is built, which consists of (1) basic event patterns under each task, (2) action patterns applicable to each event, and (3) orientedness involved in an event-action pair. Third, the core model is propagated to reflect the characteristics of an individual data element and also a certain class of materials. Finally, the propagated model is defined by choosing pairs of event and action patterns in the model while referring to orientedness indicated in each event-action pair, in order to match a particular situation. As a result, a set of event-action pairs reflecting specific requirements through categories of orientedness is obtained, and consistent and scalable design can, therefore, be attained. The term author co-citation is defined and classified according to four distinct forms: the pure first-author co-citation, the pure author co-citation, the general author co-citation, and the special co-author/co-citation. Each form can be used to obtain one count in an author co-citation study, based on a binary counting rule, which either recognizes the co-citedness of two authors in a given reference list (1) or does not (0). Most studies using author co-citations have relied solely on first-author co-citation counts as evidence of an author's oeuvre or body of work contributed to a research field. In this article, we argue that an author's contribution to a selected field of study should not be limited, but should be based on his/her complete list of publications, regardless of author ranking. We discuss the implications associated with using each co-citation form and show where simple first-author co-citations fit within our classification scheme. Examples are given to substantiate each author co-citation form defined in our classification, including a set of sample Dialog(TM) searches using references extracted from the SciSearch database. 
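The binary counting rule behind the co-citation forms defined above can be sketched as follows (the author names, reference lists, and function name are invented for illustration):

```python
# Binary counting rule for author co-citation: a pair of authors scores at
# most 1 per reference list, namely 1 if both appear in it and 0 otherwise.
# "Pure first-author co-citation" restricts each reference to its first
# author; the general form considers all authors of every reference.
from itertools import combinations
from collections import Counter

def cocitation_counts(reference_lists, first_author_only=False):
    counts = Counter()
    for refs in reference_lists:
        if first_author_only:
            authors = {r[0] for r in refs}
        else:
            authors = {a for r in refs for a in r}
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1          # binary: at most 1 per reference list
    return counts

# Each reference is a tuple of its authors, first author first (invented).
lists = [
    [("White", "McCain"), ("Small",)],
    [("McCain",), ("Small", "White")],
]
print(cocitation_counts(lists)[("Small", "White")])  # 2 under the general form
print(cocitation_counts(lists, first_author_only=True)[("Small", "White")])
```

The gap between the two counts illustrates the article's point: restricting to first authors can understate an author's co-cited oeuvre when later-ranked authorships are ignored.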
Most common effectiveness measures for information retrieval systems are based on the assumptions of binary relevance (either a document is relevant to a given query or it is not) and binary retrieval (either a document is retrieved or it is not). In this article, these assumptions are questioned, and a new measure named ADM (average distance measure) is proposed, discussed from a conceptual point of view, and experimentally validated on Text Retrieval Conference (TREC) data. Both conceptual analysis and experimental evidence demonstrate ADM's adequacy in measuring the effectiveness of information retrieval systems. Some potential problems about precision and recall are also highlighted and discussed. Intellectual and technological talents and skills are the driving force for scientific and industrial development, especially in our times characterized by a knowledge-based economy. Major events in society and related political decisions, however, can have a long-term effect on a country's scientific well-being. Although the Cultural Revolution took place from 1966 to 1976, its aftermath can still be felt. This is shown by this study of the production and productivity of Chinese scientists as a function of their age. Based on the 1995-2000 data from the Chinese Science Citation database (CSCD), this article investigates the year-by-year age distribution of scientific and technological personnel publishing in China. It is shown that the "Talent Fault" originating during the Cultural Revolution still exists, and that a new gap resulting from recent brain drain might be developing. The purpose of this work is to provide necessary information about the current situation and especially the existing problems of the S&T workforce in China. The INitiative for the Evaluation of XML retrieval (INEX) aims at providing an infrastructure to evaluate the effectiveness of content-oriented XML retrieval systems. 
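The ADM described above replaces binary judgments with graded ones: it is one minus the average absolute distance between the system's and the user's relevance estimates; a minimal sketch with invented scores:

```python
# Average Distance Measure (ADM): one minus the mean absolute difference
# between the system's relevance estimates and the user's, over all
# documents. Both sets of scores lie in [0, 1]; the values are invented.
def adm(system_scores, user_scores):
    n = len(system_scores)
    return 1 - sum(abs(s - u) for s, u in zip(system_scores, user_scores)) / n

system = [0.9, 0.4, 0.1]   # graded retrieval scores from the system
user = [1.0, 0.5, 0.0]     # graded relevance judgments from the user
print(round(adm(system, user), 2))  # 0.9
```

ADM reaches 1 only when the system's graded estimates match the user's exactly, so neither binary relevance nor binary retrieval is assumed.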
To this end, in the first round of INEX in 2002, a test collection of real world XML documents along with a set of topics and respective relevance assessments have been created with the collaboration of 36 participating organizations. In this article, we provide an overview of the first round of the INEX initiative. The twentieth century saw the progressive collectivization of science: dramatic growth in teamwork in general and large-scale collaboration in particular. Cognitive partnering in the conduct of research and scholarship has become commonplace, and this trend is reflected in rates of co-authorship and sub-authorship collaboration. The effects of these developments on academic writing are discussed and theorized in terms of distributed cognition. This article investigates a new, effective browsing approach called metabrowsing. It is an alternative to current information retrieval systems, which still face six prominent difficulties. We identify and classify the difficulties and show that the metabrowsing approach alleviates the difficulties associated with query formulation and missing domain knowledge. Metabrowsing is a high-level way of browsing through information: instead of browsing through document contents or document surrogates, the user browses through a graphical representation of the documents and their relations to the domain. The approach requires different cognitive skills from the user than those currently required. Yet, a user evaluation in which the metabrowsing system was compared with an ordinary query-oriented system showed only some small indicatory differences in effectiveness, efficiency, and user satisfaction. We expect that more experience with metabrowsing will result in a significantly better performance difference. Hence, our conclusion is that the development of new cognitive skills requires some time before the technologies are ready to be used. 
Text Categorization is the process of assigning documents to a set of previously fixed categories. A lot of research is going on with the goal of automating this time-consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we try to prove that a prior filtering of the words used by SVM in the classification can improve the overall performance. This hypothesis is systematically tested with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method, and also has the effect of significantly improving the overall performance. This article describes a collaboratively engineered general-purpose knowledge management (KM) ontology that can be used by practitioners, researchers, and educators. The ontology is formally characterized in terms of nearly one hundred definitions and axioms that evolved from a Delphi-like process involving a diverse panel of over 30 KM practitioners and researchers. The ontology identifies and relates knowledge manipulation activities that an entity (e.g., an organization) can perform to operate on knowledge resources. It introduces a taxonomy for these resources, which indicates classes of knowledge that may be stored, embedded, and/or represented in an entity. It recognizes factors that influence the conduct of KM both within and across KM episodes. The Delphi panelists judge the ontology favorably overall: its ability to unify KM concepts, its comprehensiveness, and its utility. Moreover, various implications of the ontology for the KM field are examined as indicators of its utility for practitioners, educators, and researchers. 
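The vocabulary-filtering idea described above can be sketched as follows; the relevance measure here is a deliberately simple stand-in for the three measures tested in the report, and the documents are invented:

```python
# Filtering the vocabulary before classification: score each word by a
# relevance measure and keep only the top-k words for the classifier.
# The measure below (positive-class document frequency minus negative-class
# document frequency) is a simple illustrative stand-in, not the report's.
from collections import Counter

def select_words(docs, labels, k):
    score = Counter()
    for words, label in zip(docs, labels):
        for w in set(words):                  # document frequency, not term frequency
            score[w] += 1 if label == 1 else -1
    return [w for w, _ in score.most_common(k)]

docs = [["cheap", "pills", "buy"], ["meeting", "agenda"],
        ["buy", "now"], ["agenda", "notes"]]
labels = [1, 0, 1, 0]                          # 1 = positive category
print(select_words(docs, labels, 1))           # ['buy']
```

The reduced vocabulary would then feed an SVM (or any classifier); the report's finding is that such filtering chiefly lifts recall and, through it, overall performance.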
The concepts of Shannon information and entropy have been applied to a number of information retrieval tasks, such as formalizing the probabilistic model, designing practical retrieval systems, clustering documents, and modeling texture in image retrieval. In this report, the concept of entropy is used for a different purpose. It is shown that any positive Retrieval Status Value (RSV)-based retrieval system may be conceived as a special probability space in which the amount of the associated Shannon information is reduced; in this view, the retrieval system is referred to as an Uncertainty Decreasing Operation (UDO). The concept of UDO is then proposed as a theoretical background for term and query discrimination power, and it is applied to the computation of term and query discrimination values in the vector space retrieval model. Experimental evidence is given for this computation; the results obtained compare well with those obtained using vector-based calculation of term discrimination values. The UDO-based computation, however, presents advantages over the vector-based calculation: it is faster, easier to assess and handle in practice, and its application is not restricted to the vector space model. Based on the ADI test collection, it is shown that the UDO-based Term Discrimination Value (TDV) weighting scheme yields better retrieval effectiveness than the vector-based TDV weighting scheme. Also, experimental evidence supports the intuition that the choice of an appropriate weighting scheme and similarity measure depends on collection properties, and thus the UDO approach may be used as a theoretical basis for this intuition. Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. 
However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs for GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments. Collection sizes, query rates, and the number of users of Web search engines are increasing. Therefore, there is continued demand for innovation in providing search services that meet user information needs. In this article, we propose new techniques to add terms to documents with the goal of providing more accurate searches. Our techniques are based on query association, where queries are stored with documents to which they are highly statistically similar. We show that adding query associations to documents improves the accuracy of Web topic-finding searches by up to 7%, and provides an excellent complement to existing supplement techniques for site finding. We conclude that using document surrogates derived from query association is a valuable new technique for accurate Web searching. In order to investigate the nature of Merton's contribution to the sociology of science, I examine how his work has been cited by groups of authors who are highly co-cited with Merton. The groups differ substantially both in terms of which of Merton's publications they cite and how they cite them. This implies that subsequent scholars have found Merton's sociology of science work valuable for many different reasons. 
This pattern is probably true for Merton's sociological oeuvre as a whole, and suggests that scholarly preeminence in the social sciences consists of making contributions that many different groups of scholars judge to be useful in justifying the importance of their own research. In a series of seminal studies, Robert K. Merton created a coherent theoretical view of the social system of science that includes the salient features of the formal publication system, thereby providing a theoretical basis for scientometrics and citationology. A fundamental precept of this system is the view of citations as symbolic payment of intellectual debts. When this concept is merged with a complementary theory of the conceptual symbolism of citations, the possibility of a rapprochement between the normative and constructivist theories is achieved, in which the dual function of citations as vehicles of peer recognition and as constructed symbols for specific original achievements in science is realized. This new synthesis is embodied in a citation classification system, the citation cube, with dimensions of normative compliance, symbolic consensus, and disinterestedness (self-citation). This essay examines Robert K. Merton's perspective on how priority relates to the provision of the public good of knowledge. Economists have long been interested in the provision of the class of goods that are referred to as "public." By definition, public goods are not used up when consumed and are goods from which it is difficult to exclude potential users. The provision of public goods presents special challenges to the market that do not exist in the provision of private goods. Scientific research has properties of a public good. Merton recognized the public nature of science. In this he was not alone. 
The genius of Merton is that he not only recognized that science has properties of a public good but stood the public-private distinction on its head, proposing that the reward structure of science, based on priority, functioned to make a public good private. In economic terms, Merton recognized that it is the public nature of knowledge that facilitates establishing an idea as the private property of the scientist. A citation identity is a list of an author's citees ranked by how frequently that author has cited them in publications covered by the Institute for Scientific Information. The same Dialog software that creates identities can simultaneously show the overall citation counts of citees, which indicate their reputations. Using identities for 28 authors in several disciplines of science and scholarship, I show that the reputational counts of their citees always have an approximately log-normal distribution: citations to very famous names are roughly balanced by citations to obscure ones, and most citations go to authors of middling reputation. These results undercut claims by constructivists that the main function of citation is to marshal "big-name" support for arguments at the expense of crediting lesser-known figures. The results are better explained by Robert K. Merton's norm of universalism, which holds that citers are rewarding use of relevant intellectual property, than by the constructivists' particularism, which holds that citers are trying to persuade through manipulative rhetoric. A universalistic citation pattern appears even in Alan Sokal's famous hoax article, where some of his citing was deliberately particularistic. In fact, Sokal's basic adherence to universalism probably helped his hoax succeed, which suggests the strength of the Mertonian norm. In specimen cases, the constructivists themselves are shown to conform to it. 
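The log-normality check behind the citation-identity analysis can be sketched as follows: if citee reputation counts are approximately log-normal, their logarithms should be roughly symmetric, with the mean close to the median. The counts below are hypothetical, not data from the study.

```python
import math
import statistics

# Hypothetical reputation counts (overall citations) for one author's
# citees: a few obscure names, many middling ones, a few famous ones.
citee_counts = [3, 8, 15, 40, 60, 90, 150, 400, 900, 2500]

# Log-transform: a log-normal sample becomes roughly normal in log space.
logs = [math.log(c) for c in citee_counts]
mean_log = statistics.mean(logs)
median_log = statistics.median(logs)
stdev_log = statistics.stdev(logs)

# For an approximately log-normal sample, mean(log) and median(log)
# should nearly coincide relative to the spread.
print(f"mean(log) = {mean_log:.2f}, median(log) = {median_log:.2f}")
print(f"asymmetry = {abs(mean_log - median_log) / stdev_log:.2f}")
```

The small gap between mean and median of the logs is what "approximately log-normal" amounts to here: mass balanced between obscure and famous citees, concentrated in the middle.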
A bibliometric analysis was performed to assess the quantitative trend of Patent Ductus Arteriosus (PDA) treatment research, including intravenous injection of indomethacin and surgery. The documents studied were retrieved from the Science Citation Index (SCI) for the period from 1991 to 2002. Publication patterns concerning authorship, collaboration, countries of origin, citation frequency, document type, language of publication, distribution of journals, page count, and the most frequently cited papers were analyzed. The results indicated that neither treatment was a recent emphasis of PDA research. The publishing countries for both treatments also show that this research was mostly conducted in Europe and North America. Both surgery and drug treatments had few international collaboration papers. English was the dominant language, and collaboration of two to six authors was the most popular level of co-authorship. The steady-state solution of the differential equations of the periodical publication process is deduced, and based on this, an indicator of periodical publication delay, which reflects the degree of information ageing within the editorial board of a periodical, is established. The indicator is proved to be the sum of two items: the pure publication delay, which reflects the editing rapidity of a periodical, and the ratio of the quantity of deposited contributions to the quantity published in one year, which reflects the waiting period of accepted papers deposited with the editorial board. As a demonstration, the delay indicators of seven periodicals are calculated. Finally, the application of this indicator is discussed. The publication and citation patterns of the Mexican community in elementary particle physics (MEPP) were determined by bibliometric analysis of the scientific production and citations registered in the SPIRES-HEP system from 1971 to 2000. 
All papers, both citing and cited, were classified as theoretical, phenomenological, or experimental according to the type of study carried out, and citing papers as local (Mexican) or foreign. The growth dynamics of the citation patterns over the thirty-year period was also studied. Results show that the Mexican scientific community in EPP follows the pre-publication and pre-citation communication patterns typical of a Big Science field. A new method of classifying biomedical research journals by research level (RL) into clinical or basic, or somewhere in between, is described that updates the system developed by CHI Research Inc. nearly 30 years ago. It is based on counting articles that have one of about 100 "clinical" title words, or one of a similar number of "basic" title words, or both. It allows over 3000 journals in the Science Citation Index (or other databases) to be classified rapidly and transparently, allows changes in their research level over time to be tracked, and allows many individual papers in "mixed" journals to be categorised as clinical or basic. The aggregated journal-journal citation matrix of the Journal Citation Report 2001 of the Social Science Citation Index is analyzed as a single domain in terms of both its eigenvectors and the bi-connected components contained in it. The traditional disciplines (e.g., economics, psychology, or political science) can be retrieved using both methods. These main disciplines interact only marginally. The space between them is occupied by a large number of small clusters of journals, indicating specialties that gravitate among the major disciplines. These specialties operate in a mode different from that of the disciplines. For example, their impact factors are low on average and their developments remain volatile. Factor analysis enables us to study how the smaller bi-connected components are related to the larger ones. 
Factor analysis also highlights methodological differences among groups that may be theoretically connected in a single bicomponent. We have developed a formula that assigns relative values to each author in a publication's author list according to the authors' relative positions. The formula satisfies several criteria of theoretical and practical significance. We tested the formula's validity and usefulness with bibliographical references from the INSPEC database, mainly from the physical sciences. Enforced alphabetical sorting, different names of single authors, and other statistical disturbances are accounted for. Our results demonstrate that our formula, or any other that satisfies several objective and quantitative criteria, can and often should be used as an additional criterion in the processes of evaluating relative scientific productivity, detecting experts in a given discipline, etc. This article introduces a new modified method for calculating the impact factor of journals based on the current ISI practice in generating journal impact factor values. The impact factor value for a journal calculated by the proposed method, the so-called Cited Half-Life Impact Factor (CHAL) method, is based on the ratio of the number of current-year citations to articles from the previous X years to the number of articles published in those X years, the X value being equal to the cited half-life of the journal in the current year. Thirty-four journals in the Polymer Science Category from the ISI Subject Heading Categories were selected and examined. Total citations, impact factors, and cited half-lives of the 34 journals during the last five years (1997-2001) were retrieved from the ISI Journal Citation Reports and were used as the data source for the calculations in this work, the impact factor values from the ISI and CHAL methods then being compared. 
The positions of the journals ranked by impact factors obtained from the ISI method were different from those obtained from the CHAL method. It was concluded that the CHAL method was more suitable for calculating the impact factor of the journals than the existing ISI method. Academic level and scientific reputation are the most important merits of a research university. Publication of scientific achievements in the world's leading scientific journals is key to assessing a university's overall performance. Peking University is a leading university among the Chinese research universities, and its number of papers published in Science Citation Index (SCI)-indexed journals has been at the top of the national list. In this paper, based on our long-term experience and practice in scientific management, we use scientometric and informetric methods to analyze the academic performance of the researchers, departments, and schools of Peking University, mainly using the citations of publications. Highly cited papers are especially important to the reputation of our university. We compare these data with those of selected well-known universities worldwide; hence, some important information can be deduced for the policy decisions of the university. The results presented here are not only an academic survey but also a guideline for the future strategic development of Peking University. We explore perceived creativity in scholarship as it relates to scholarly reputation in the field of management. The effects of quantity (total refereed publications, national paper presentations) and quality (proportion of articles in premier journals, editorial activity, research awards) dimensions of scholarly activity are also considered. Our results suggest that the quality dimensions are positively associated with reputation, but that the perceived creativity of a scholar's work further influences reputation, and partially mediates the relationship between some quality measures and reputation. 
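The CHAL calculation described above (current-year citations to articles from the previous X years, divided by the articles published in those X years, with X equal to the journal's cited half-life) can be sketched as follows; all figures are hypothetical.

```python
# Sketch of the Cited Half-Life Impact Factor (CHAL) idea.
# All citation and article counts below are hypothetical.

def chal_impact_factor(citations_by_age, articles_by_age, cited_half_life):
    """CHAL: citations this year to articles from the previous X years,
    divided by the number of articles published in those X years.
    Index i in each list refers to articles published i+1 years ago;
    X is the journal's cited half-life (in years)."""
    x = round(cited_half_life)
    cites = sum(citations_by_age[:x])
    articles = sum(articles_by_age[:x])
    return cites / articles

# Citations received this year, broken down by article age (1..6 years),
# and articles published in each of those years.
citations = [120, 150, 90, 70, 50, 30]
articles = [80, 85, 90, 95, 100, 110]

# The conventional two-year impact factor fixes X = 2; CHAL lets X track
# the journal's cited half-life (here, hypothetically, 5 years).
two_year = (120 + 150) / (80 + 85)
chal = chal_impact_factor(citations, articles, cited_half_life=5)
```

For a journal whose citations decay slowly, widening the window from 2 to X years can change its rank relative to other journals, which is the ranking difference the study reports.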
These results suggest that quality, and creativity in particular, is more important than quantity for the accumulation of reputation. The diffusion of the Internet has radically expanded the readily available sources for information of all types. Information that was once obtained second-hand from friends and acquaintances (the traditional "two-step flow") is now found easily through the Internet. The authors make use of survey data to explore this thesis with regard to information sources about genetic testing and the influence of the Internet on the information-seeking behaviors of the public. A telephone survey of a random sample of 882 adults asked them about their knowledge of, concerns about, and interest in genetic testing. Respondents were most likely to turn first to the Internet for information about cancer genetics, second to public libraries, and third to medical doctors. Overall, doctors were the source most likely to be consulted when second and third choices are considered. Age, income, and self-reported understanding of genetics are shown to be predictors of whether someone goes to medical professionals for advice, rather than to the Internet or public library. The results raise questions about the apparent tendency of the public to regard the Internet as the best source of information on complex topics like genetics, for which it may be ill-suited. The emergence of information portal systems in the past few years has led to a greatly enhanced Web-based environment for users seeking information online. While considerable research has been conducted on user information-seeking behavior in regular IR environments over the past decade, this paper focuses specifically on how users in a medical science and clinical setting carry out their daily information seeking through a customizable information portal system (MyWelch). 
We describe our initial study on analyzing Web usage data from MyWelch to see whether the results conform to the features and patterns established in current information-seeking models, present several observations regarding user information-seeking behavior in a portal environment, outline possible long-term user information-seeking patterns based on usage data, and discuss the direction of future research on user information-seeking behavior in the MyWelch portal environment. Studies of everyday life information seeking have begun to attend to incidental forms of information behavior, and this more inclusive understanding of information seeking within broader social practices invites a constructionist analytical paradigm. Positioning theory is a constructionist framework that has proven useful for studying the ways in which interactional practices contribute to information seeking. Positions can construct individuals or groups of people in ways that have real effects on their information seeking. This article identifies some specific types of discursive positioning and shows how participants in a clinical care setting position themselves and one another in ways that justify different forms of information seeking and giving. Examples are drawn from an ongoing study of information seeking in prenatal midwifery encounters. The data consist of audio recordings of nine prenatal midwifery visits and of 18 follow-up interviews, one with each participating midwife and pregnant woman. The midwifery model of care is based on a relationship in which the midwife provides the pregnant woman with information and support necessary for making informed decisions about her care. Midwife-client interactions are therefore an ideal context for studying information seeking and giving in a clinical encounter. The aim of this research was to develop a conceptual framework that would help to collect and understand the information needs of a target community. 
Even though many information behavior frameworks already exist, they tend to focus on different aspects of the person and their interaction with information. It was proposed that a synthesis of the existing frameworks could lead to one comprehensive framework. Previous research was analyzed and an initial framework defined. It was piloted, adapted, and then applied to data on informal carers, who are people caring for another person, generally a relative, for more than 14 hours per week, and who are not paid for it. The data stemmed from 60 interviews that were transcribed and coded. This paper presents the data on informal carers and their information experience using the final framework. It serves to demonstrate how the framework sensitizes the researcher to certain types of significant data, enables the organization of the data, indicates the relationships between different types of data, and, overall, helps to provide a rich picture of the target community's information needs. In conclusion, the paper discusses the differences and advantages of the framework in relation to previous work, the limitations of the study, and possible further research. As health information seekers pursue their "right to know" when investigating medical options, the question of reliable health information resources becomes paramount. Previous research has not widely addressed the connection between women as health information seekers and quality health information, nor has women's awareness of specific health and medical resources been adequately evaluated. A study with a convenience sample of 119 women assessed the process of seeking health information (women's health information needs, the search strategies they employed for filling the information need, and the use of the health information found), and their awareness of specific health and medical information resources. Our survey instrument was based on Kuhlthau's Information Search Process (ISP) model. 
Results appeared to reflect the uncertainty stage of the ISP model, as there were conflicting responses regarding the ease of locating information, the usefulness of the information found, and whether or not the subjects' health questions were answered. The study also identified a low awareness among our respondents of specific health and medical information resources. There is an opportunity for health information providers to play a role in mediating at this uncertainty stage to connect health information seekers with reliable information. In this study, we attempt to evaluate the performance of the World Wide Web as an information resource in the domain of international travel. The theoretical framework underpinning our approach recognizes the contribution of models of information-seeking behavior and of information systems in explaining World Wide Web usage as an information resource. Specifically, a model integrating the construct of uncertainty in information seeking and the task-technology fit model is presented. To test the integrated model, 217 travelers participated in a questionnaire-based empirical study. Our results confirm that richer (or enhanced) models are required to evaluate the broad context of World Wide Web (the Web) usage as an information resource. Use of the Web for travel tasks, for uncertainty reduction, as an information resource, and for mediation all have a significant impact on users' perception of performance, explaining 46% of the variance. Additionally, our study contributes to the testing and validation of metrics for use of the Web as an information resource in a specific domain. As more people gain at-home access to the Internet, information seeking on the Web has become embedded in everyday life. The objective of this study was to characterize the home as an information use environment and to identify a range of information-seeking and Web-search behaviors at home. 
Twelve Northern California residents were recruited, and the data were collected through semi-structured at-home interviews based on a self-reported Search Activities Diary that subjects kept over a 3-5 day period. The data were analyzed on four levels: home environment, information seeking goals, information retrieval interaction, and search query. Findings indicated that the home, indeed, provided a distinct information use environment beyond physical setting alone in which the subjects used the Web for diverse purposes and interests. Based on the findings, the relationships among home environment, Web context, and interaction situation were identified with respect to user goals and information-seeking behaviors. We elaborate on Pettigrew's (1998, 1999) theory of information grounds while using an outcome evaluation approach enriched by its focus on context to explore the use of need-based services by immigrants in New York City. Immigrants have substantial information and practical needs for help with adjusting to life in a new country. Because of differences in language, culture, and other factors such as access, new immigrants are a difficult population to study. As a result, little research has examined their predilections from an information behavior perspective. We report findings from a qualitative study of how literacy and coping skills programs are used by and benefit the immigrant customers of the Queens Borough Public Library (QBPL). From our interviews and observation of 45 program users, staff, and other stakeholders, we derived a grand context (in Pettigrew's terms) woven from three subcontexts: the immigrants of Queens, New York; the QBPL, its service model, and activities for immigrants; and professional contributions of QBPL staff. Our findings are discussed along two dimensions: (a) building blocks toward information literacy, and (b) personal gains achieved by immigrants for themselves and their families. 
We conclude that successful introduction to the QBPL, as per its mission, programming, and staff, can lead immigrants to a synergistic information ground that can help in meeting broad psychological, social, and practical needs. According to authors like H.E. Stanley and others, the growth dynamics of university research displays a quantitative behaviour similar to the growth dynamics of firms acting under competitive pressure. Features of such behaviour are probability distributions of annual growth rates or the standard deviation of growth rates. We show that a similar statistical behaviour can be observed in the growth dynamics of German university enrolments and in the growth dynamics of physics and mathematics, both for the 19th century. Since competitive pressure was generally weak at that time, interpretations of such statistical similarities as pointing to "firm-like behavior" are questionable. The empirical question addressed in this contribution is: How does the relative frequency at which authors in a research field cite 'authoritative' documents in the reference lists of their papers vary with the number of references such papers contain? 'Authoritative' documents are defined as those that are among the ten percent most frequently cited items in a research field. It is assumed that authors who write papers with relatively short reference lists are more selective in what they cite than authors who compile long reference lists. Thus, by comparing, within a research field, the fraction of references of a particular type in short reference lists to that in longer lists, one can obtain an indication of the importance of that type. Our analysis suggests that in basic science fields such as physics or molecular biology the percentage of 'authoritative' references decreases as bibliographies become shorter. In other words, when basic scientists are selective in their referencing behavior, references to 'authoritative' documents are dropped more readily than other types. 
The implications of this empirical finding for the debate on normative versus constructivist citation theories are discussed. A survey of authors of highly cited papers in 22 fields was undertaken in connection with a new bibliometric resource called Essential Science Indicators (ESI). Authors were asked to give their opinions on why their papers are highly cited. They generally responded by describing specific internal, technical aspects of their work, relating them to external or social factors in their fields of study. These self-perceptions provide clues to the factors that lead to high citation rates, and to the importance of the interaction between internal and external factors. Internal factors are revealed by the technical terminology used to describe the work and how it is situated in the problem domain of the field. External factors are revealed by a different vocabulary describing how the work has been received within the field, or its implications for a wider audience. Each author's response regarding a highly cited work was analyzed on four dimensions: the author's perception of its novelty, utility, significance, and interest. A co-occurrence analysis of the dimensions revealed that interest, the most socially based dimension, was most often paired with one of the other, more internal dimensions, suggesting a synergy between internal and external factors. This study develops and tests an integrated conceptual model of journal evaluation from varying perspectives of citation analysis. The main objective is to obtain a more complete understanding of the external factors affecting journal citation impact; that is, a theoretical construct measured by a number of citation indicators. Structural equation modelling (SEM) with partial least squares (PLS) is used to test the conceptual model with empirical data from journals in clinical neurology. 
Interrelationships among journal citation impact and four external factors (journal characteristics, journal accessibility, journal visibility, and journal internationality) have been successfully explored, and the conceptual model of journal evaluation has been examined. The paper describes the construction and functions of the Citation Database for Japanese Papers (CJP) developed at the National Institute of Informatics, Japan (NII), and the Impact Factors of CJP's source journals. Statistical analyses using multidimensional scaling on citation counts for academic society journals, carried out to measure relationships among the societies, are then described. We also introduce a new citation navigation system, CiNii, which enables users to access various resources provided by NII, such as the NACSIS Electronic Library Service (NACSIS-ELS), to obtain electronic full text of journal articles through citation links. Recent political developments in Japan towards enhancement of the scientific information infrastructure are also introduced, with their implications for research evaluation systems incorporating citation analyses. Both the United States and the European Union have set goals for worldwide leadership of science and technology. While the U.S. leads in most quantitative input indicators, output indicators may be more specific for determining present leadership. They show that the EU has taken the lead in important metrics and is challenging the U.S. in others. Qualitative indicators of fields of research and development, based on expert review studies organized by the authors, confirm that many EU labs are equal or superior to those in the U.S. The tremendous social and political changes that culminated in the Soviet Union's dissolution had a great impact on the Russian science community. Due to Russia's transformation to a market economy, a new model of R&D emerged on the basis of the higher education system (R&D in universities). 
This paper is part of a project whose main goals were to analyse the impact of competitive funding on R&D in provincial universities, the distribution of funding by the Russian Foundation for Basic Research, and the level of cross-sectoral and international collaboration. This paper gives a descriptive overview of R&D conducted at the 380 provincial universities, looking at 9,800 applications, 1,950 research projects, 19,981 individuals, and more than 29,600 publications for the period 1996-2001. Our data demonstrated a positive tendency in demographic statistics in the provinces. A map of intra-national collaboration taking place in 1995-2002 in provincial universities situated in different economic regions was designed. Our data show strong collaboration within the regions, which is an important factor of sustainability. Publication output grew by a factor of two to two and a half in six years. The share of output in mathematics was the highest, at about 45%; physics and chemistry had equal shares of about 20% each. Researchers from the Ural and Povolzh'e regions were more active in knowledge dissemination than their colleagues from the other nine economic-geographic regions. Bibliometric analysis of more than 1,450 international collaborative publications for 1999-2001 demonstrated a strong shift in collaboration partners from former Eastern Bloc and former USSR countries to Western Europe, the USA, and Japan. Among the regions, Povolzh'e, Ural, Volgo-Vyatsky, and Central Chernozem'e demonstrated a stronger tendency to collaborate. This collaboration depends heavily on financial support from foreign countries. This paper traces the history of the China Scientific and Technical Papers and Citations database (CSTPC) since its founding in 1988. The fact that most Chinese scientists publish their research results in Chinese journals requires that China establish SCI counterparts dedicated to domestic S&T journals. 
The article describes the selection criteria for source journals, the approach used to adjust the structure of source journals, the criteria for selecting items to be included in the database, and the indexing method. Then it discusses the impact of both the CSTPC team and the CSTPC database upon government R&D administration agencies and the science community in general. Finally, the article analyzes the main factors that led to the initial success of CSTPC. The authors encourage information workers in other non-English-speaking developing countries to build up similar databases. This paper presents qualitative philosophical, sociological, and historical arguments in favor of collaborative research having greater epistemic authority than research performed by individual scientists alone. Quantitatively, epistemic authority is predicted to correlate with citations: in number, in probability of citation, and in length of citation history. Data from a preliminary longitudinal study of 33 researchers support the predicted effects and, despite the fallacy of asserting the consequent, are taken to confirm the hypothesis that collaborative research does in fact have greater epistemic authority. The increasing cooperation in science, which has led to larger co-authorship networks, requires the application of new methods of analysis of social networks in bibliographic co-authorship networks as well as in networks visible on the Web. In this context, a number of interesting papers on the "Erdos Number", which gives the shortest path (geodesic distance) between an author and the well-known Hungarian mathematician Erdos in a co-authorship network, have been published recently. This paper develops new methods concerning the position of highly productive authors in the network. Thus the distribution of these authors among the clusters in the co-authorship network could be shown to depend on the size of the clusters. 
Highly productive authors have, on average, low geodesic distances, and thus shorter paths to all the other authors of a specialism, compared to less productive authors, whereas the influence that highly productive scientists can exert on the development of the specialism is distributed amongst the others. A theory on stratification in science with respect to the over-random similarity of scientists collaborating with one another, previously examined with other empirical methods, could also be confirmed by the application of geodesic distances. The paper proposes that the newly developed methodology may also be applied to visible networks in future studies on the Web. Further investigation is warranted into whether co-authorship and web networks have similar structures with regard to author productivity and geodesic distances. Several research studies and reports on national and European science and technology indicators have recently presented figures reflecting intensifying scientific collaboration and increasing citation impact in practically all science areas and at all levels of aggregation. The main objective of this paper is twofold: first, to analyse whether the number or weight of actors in scientific communication has increased, whether patterns of documented scientific communication and collaboration have changed in the last two decades, and whether these tendencies have inflationary features. The second question concerns the role of scientific collaboration in this context. In particular, we answer the question of to what extent co-authorship interacts with publication activity, on the one hand, and with citation impact, on the other. 
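The geodesic distance that underlies the "Erdos Number" studies above is simply the shortest path length in an undirected co-authorship graph, computable by breadth-first search. A minimal sketch, using a purely hypothetical toy network (author names and edges are illustrative, not data from any of the studies):

```python
from collections import deque

def geodesic_distances(graph, start):
    """Breadth-first search: shortest path lengths (geodesic distances)
    from `start` to every author reachable in the co-authorship graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

# Toy undirected co-authorship network: an edge means at least one
# joint paper; all names here are hypothetical placeholders.
coauthors = {
    "erdos": {"a", "b"},
    "a": {"erdos", "c"},
    "b": {"erdos"},
    "c": {"a", "d"},
    "d": {"c"},
}

d = geodesic_distances(coauthors, "erdos")
# d["d"] == 3: author d has "Erdos number" 3 in this toy network
```

Averaging such distances per author over a whole specialism gives the productivity comparison described above.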
The answers found to these questions have strong implications for the application of bibliometric indicators in research evaluation; moreover, the construction of indicators applied to trend analyses and to studies based on medium-term or long-term observations has to be reconsidered to guarantee the validity of conclusions drawn from bibliometric results. Many studies have tried to describe patterns of research collaboration through observing co-authorship networks. Those studies mainly analyze static networks, and most of them do not consider the development of networks. In this study, we turn our attention to the development of personal collaboration networks. On the basis of an analysis from two viewpoints, i.e., growth in the number of collaborating partners and change in the relationship strength with partners, we describe and compare the characteristics of four different domains, i.e., electrical engineering, information processing, polymer science, and biochemistry. The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new measures of Recall/Precision for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. As opposed to classical measures of inertia, they have the main advantage of being independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments on the basis of the MultiSOM model, which is an extension of Kohonen's SOM model, as a cluster analysis method. 
Our first experiment on patent data shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with a global cluster analysis method, such as WebSOM. Our second experiment, which is part of the EICSTES EEC project, is an original webometrics experiment that combines content and link classification starting from a large non-homogeneous set of web pages. This experiment highlights the fact that break-even points between our different measures of Recall/Precision can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained when using different break-even points are compared to determine the quality of the resulting maps. This article aims to study the total backlink counts, external backlink counts and the Web Impact Factors (WIFs) of Chinese university websites. By studying whether the backlink counts and WIFs of websites associate with the comprehensive ratings and the research ratings of Chinese universities, the article demonstrates that the external backlink count can be a better evaluation measure for university websites than the WIF. The study also investigated issues about data collection using different search engines. It shows that data collected by AltaVista are more stable than those collected by AllTheWeb. An investigation of links to 89 US academic departments from three different disciplines gave insights into the kinds of international regions and national domains that linked to them. While significant correlations were found between total counts of international inlinks and total publication impact in Psychology and Chemistry, counts of international inlinks to History departments were too small to give a significant result. The correlations suggest that international links may reflect, to a certain extent, patterns of scholarly communication. 
Even though History departments attracted a significantly lower percentage of international inlinks than those of Chemistry and Psychology, the main source of links for all three disciplines was Europe. Analyses of national inlinks, characterized by gTLDs (generic Top Level Domains), showed that the major source of links for all disciplines was .edu sites, followed by .com, .org, and .net. As a whole, international regional differences between disciplines were stronger than gTLD differences, although in both cases the discrepancies were not large. Websites of China's top 100 information technology (IT) companies were examined. Link count to a company's website was found to correlate with the company's revenue, profit, and research and development expenses. This suggests that Web hyperlinks to commercial sites can be a business performance indicator and thus a source of business information. This information is useful for Web business intelligence and Web data mining. As a comparison to IT companies, China's top 100 privately owned companies were also studied. No relationship between link count and the business performance measures was found for these companies, probably due to the heterogeneous nature of this group. Data collection issues for webometrics research were also explored in the study. We show that the composition of two information production processes (IPPs), where the items of the first IPP are the sources of the second, and where the ranks of the sources in the first IPP agree with the ranks of the sources in the second IPP, yields an IPP which is positively reinforced with respect to the first IPP. This means that the rank-frequency distribution of the composition is the composition of the rank-frequency distribution of the first IPP and an increasing function phi, which is explicitly calculable from the two IPPs' distributions. 
From the rank-frequency distribution of the composition, we derive its size-frequency distribution in terms of the size-frequency distribution of the first IPP and of the function phi. The paper also relates the concentration of the reinforced IPP to that of the original one. This theory solves part of the problem of the determination of a third IPP from two given ones (so-called three-dimensional informetrics). In this paper we solved the "linear" case, i.e., where the third IPP is the composition of the other two IPPs. This paper gives an overview of the diachronous (prospective) and synchronous (retrospective) approaches to ageing studies of scientific literature from the perspective of technical reliability, visualising the different aspects that can be analysed by the two approaches. The main objective is to deepen the understanding of the mechanism and the theory underlying the two approaches, and to show that the difference between the diachronous and synchronous models is not "just counting in opposite directions". In this context, a stochastic model is presented showing that one and the same model can be used to describe both diachronous and synchronous perspectives of citation processes. On the basis of this model, it is explained how some diachronous and synchronous citation-based indicators can be re-calculated for changing publication periods and citation windows underlying their construction. The paper is concluded by several applications, such as the definition and calculation of diachronous (prospective) and synchronous (retrospective) journal impact measures and other citation indicators used in research evaluation. The multivariate Waring distribution is developed and investigated. A special case, the bivariate Waring distribution, is considered. It is shown that the distributions have some nice properties as multivariate distributions. Some applications to the distribution of scientific productivity are discussed. 
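The diachronous (prospective) and synchronous (retrospective) perspectives described in the literature-ageing study above can be pictured as two readings of one publication-year by citation-year matrix: along a row (follow one cohort of papers forward in time) or along a column (fix one citing year and look back at the ages of what it cites). A minimal sketch with hypothetical citation counts:

```python
# Toy matrix c[p][y]: citations received in year y by papers
# published in year p (years indexed 0..3; all counts hypothetical).
c = [
    [0, 4, 3, 1],   # papers published in year 0
    [0, 1, 5, 2],   # papers published in year 1
    [0, 0, 2, 6],   # papers published in year 2
    [0, 0, 0, 3],   # papers published in year 3
]

def diachronous(c, p):
    """Prospective view: follow publication-year-p papers forward
    through all citation years (a row of the matrix)."""
    return [c[p][y] for y in range(p, len(c))]

def synchronous(c, y):
    """Retrospective view: fix citing year y and look back at the
    publication years of the cited papers (a column of the matrix)."""
    return [c[p][y] for p in range(y + 1)]

diachronous(c, 0)  # [0, 4, 3, 1]: citations accrued over time
synchronous(c, 3)  # [1, 2, 6, 3]: age profile of year-3 references
```

The two views count the same cells, which is why one stochastic model can describe both; the difference lies in which margin is held fixed.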
Domain analysis was first used 15 years ago as one of the most important techniques for software reuse. Even today, new techniques appear every year, and different authors propose different domain representation structures to represent and store all the different software components and the relationships among them. These relationships among components are the kernel of the domain semantics. In this report, a set of techniques and tools drawn from the mathematical, statistical, and neural fields is presented that, when linked together, enables the semiautomatic building of domain representations and their storage in a thesaurus structure of software components. Thesaurus structures, widely used in information science, are presented as the key domain-modeling concept, due to their higher automation possibilities compared with previous structures. New metrics to evaluate the quality, consistency, and completeness of the domain model obtained through this technique are also presented. Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. 
Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions. To make good decisions, businesses try to gather good intelligence information. Yet managing and processing a large amount of unstructured information and data stands in the way of greater business knowledge. An effective business intelligence tool must be able to access quality information from a variety of sources in a variety of forms, and it must support people as they search for and analyze that information. The EBizPort system was designed to address the information needs of the business/IT community. EBizPort's collection-building process is designed to acquire credible, timely, and relevant information. The user interface provides access to collected and metasearched resources using innovative tools for summarization, categorization, and visualization. The effectiveness, efficiency, usability, and information quality of the EBizPort system were measured. EBizPort significantly outperformed Brint, a business search portal, in search effectiveness, information quality, user satisfaction, and usability. Users particularly liked EBizPort's clean and user-friendly interface. Results from our evaluation study suggest that the visualization function added value to the search and analysis process, that the generalizable collection-building technique can be useful for domain-specific information searching on the Web, and that the search interface was important for Web search and browse support. Clear and precise queries are a necessity when searching very large document collections, especially when query-based retrieval is the only means of exploration. We propose system-mediated information access as a solution for users' well-documented inability to formulate good queries. 
Our approach is based on two main assumptions: first, on the ability of document clustering to reveal the topical, semantic structure of a problem domain represented by a specialized "source collection," and, second, on the capacity of statistical language models to convey content. Taking the role of the human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection, chosen to be representative of the problem domain. Based on the user's selection of relevant "exemplary" documents and clusters from this source collection, the system builds a language model of her information need. This model is subsequently used to derive "mediated queries," which are expected to convey precisely and comprehensively the user's information need, and can be submitted by the user to search any large and heterogeneous "target collection." We present results of experiments that simulated various mediation strategies and compared the effect on mediation effectiveness of a variety of parameters, such as the similarity measure, the weighting scheme, and the clustering method. They provide both upper bounds of the performance that can potentially be reached by real end users and a comparison of the effectiveness of these strategies. The experimental evidence suggests that information retrieval mediated through a clustered specialized collection has the potential to improve effectiveness significantly. We introduce a framework of multiple viewpoint systems for describing and designing systems that use more than one representation or set of relevance judgments on the same collection. A viewpoint is any representational scheme on some collection of data objects together with a mechanism for accessing this content. A multiple viewpoint system allows a searcher to pose queries to one viewpoint and then change to another viewpoint while retaining a sense of context. 
Multiple viewpoint systems are well suited to alleviate vocabulary mismatches and to take advantage of the possibility of combining evidence. We discuss some of the issues that arise in designing and using such systems and illustrate the concepts with several examples. This report presents a case study of the development of an interface for a novel and complex form of document retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. A study involving users from the beginning of the design process is described, and it covers initial examination of user needs and tasks, preliminary design and testing of interface components, building, testing, and refining the interface, and, finally, conducting usability tests of the system. Lessons are learned at every stage of the process, leading to a much more informed view of how such an interface should be built. Studying three major Chinese universities of different types, this article attempts to validate earlier results related to authors' name order in papers co-authored by graduate candidates and their supervisors. Candidates for the doctoral degree as well as the master's degree are considered. Defining the g-ratio as the fraction of co-authored publications where the graduate student's name precedes that of the supervisor, we obtain the following results. 1) Generally, master's level g-ratios are smaller than the corresponding doctoral level g-ratios. 2) The three doctoral g-ratio time series have a common characteristic: they tend to a limiting target value of somewhat more than 80%. The master's time series of the three universities run in parallel with the doctoral time series. 3) The g-ratio of collaborative papers related to the dissertation is higher than the g-ratio of collaborative papers not related to the dissertation. 
This is true on the doctoral level as well as on the master's level. 4) Different disciplines have different g-ratios, representing disciplinary customs in graduate candidate-supervisor collaboration, with the highest g-ratio in the doctoral case occurring in biology (except at Tsinghua University, which does not offer courses in biology). 5) There exist only small differences between the g-ratios of different kinds of universities. 6) In recent years, the same candidate-supervisor collaboration patterns exist in international publications as in domestic ones. The fact that the doctoral g-ratios of all three universities are as high as 80% reflects a universal regularity in the structure of scientific collaboration between doctoral candidates and their supervisors in China. In this article we examine the extent to which particularistic attributes - social origin and gender - can affect selection processes (1) in access to the doctoral degree and (2) in later career attainment after achieving it. The analyses are based on a questionnaire survey (n = 2,244) among doctoral degree holders who achieved the doctoral degree in six selected disciplines (biology, electrical engineering, German studies, mathematics, social sciences, and business studies/economics) at German universities. In terms of our first object of investigation, the analyses show that in four out of six disciplines doctoral degree holders are a selected group compared to university graduates with regard to both social origin and gender. In terms of our second object of investigation - the impact of particularistic attributes on several indicators of further career attainment after achieving the doctoral degree (career inside or outside higher education and science, career position and income) - the results point to a stronger impact of gender compared to social origin. 
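The g-ratio defined in the name-order study above is straightforward to compute from author-order data. A minimal sketch, using hypothetical paper records (the positions below are illustrative, not data from the study):

```python
def g_ratio(papers):
    """Fraction of candidate-supervisor co-authored papers in which
    the graduate candidate's name precedes the supervisor's.
    `papers` is a list of (candidate_position, supervisor_position)
    pairs giving each author's index in the paper's byline."""
    first = sum(1 for cand, sup in papers if cand < sup)
    return first / len(papers)

# Hypothetical byline positions for five co-authored papers;
# the candidate is listed before the supervisor in four of them.
sample = [(0, 1), (0, 2), (1, 0), (0, 1), (0, 3)]
g_ratio(sample)  # 0.8, i.e. a g-ratio of 80%
```

Computed per year, such values give the g-ratio time series whose limiting behaviour the study describes.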
The notion of the knowledge-based economy is premised on the idea that technological knowledge is created, accumulated and disseminated through interactive learning among the principal actors in the national system. This paper analyzes, from a dynamic perspective, the structure of inter-industrial technological knowledge. Both the human-driven disembodied channel and the capital-driven embodied channel are investigated based on network analysis. The set of empirical data covers the Korean manufacturing sector during the 1980s. Overall, network density tends to increase over time, implying that the knowledge network has expanded and intensified. A number of distinctive features are identified between knowledge types and industrial categories. The findings in turn yield important policy implications that should be addressed when developing technology policy. Clearly, the policy framework needs to be industry-specific and country-specific, in accordance with the development stage and industrial structure at the reference time. Scientometric analysis of synchronous references in the nine Physics Nobel lectures by Nicolaas Bloembergen (1981), Arthur L. Schawlow (1981), Kai M. Siegbahn (1981), Kenneth G. Wilson (1982), Subrahmanyan Chandrasekhar (1983), William A. Fowler (1983), Carlo Rubbia (1984), Simon van der Meer (1984), and Klaus von Klitzing (1985) indicated high variations: No. 
of Synchronous References ranged from 24 (Meer) to 283 (Siegbahn); Synchronous Self-References ranged from 5 (Rubbia) to 88 (Siegbahn); synchronous references to others ranged from 10 (Chandrasekhar) to 255 (Wilson); Synchronous Self-Reference Rates ranged from 6.66% (Rubbia) to 65.51% (Chandrasekhar); Single-Authored References ranged from 15 (Klitzing) to 160 (Wilson); Multi-Authored References ranged from 4 (Chandrasekhar) to 194 (Siegbahn); the Collaboration Coefficient in the synchronous references ranged from 0.14 (Chandrasekhar) to 0.75 (Klitzing); and Recency (age of 50% of the latest references) ranged from 2 (Klitzing) to 18 (Chandrasekhar) years. Seventy-five percent of the references were to journal articles. Highly referred journals were Astrophysical Journal, Physical Review B, Physical Review Letters, Arkiv Fuer Fysik, Surface Science, Physics Letters, and IEEE Transactions on Nuclear Science. Severe Acute Respiratory Syndrome (SARS) has become a major health issue since its outbreak in early 2003. No bibliometric analyses of this topic exist in the literature. The objective of this study is to conduct a bibliometric analysis of all SARS-related publications in the Science Citation Index (SCI) in the early stage. A systematic search was performed using the SCI for publications since the SARS outbreak in early 2003. Selected documents included 'severe acute respiratory syndrome' or 'SARS' as part of their title, abstract, or keywords from the beginning of the SARS outbreak in March until July 8, 2003. Analysis parameters included authorship, patterns of international collaboration, journals, language, document type, research institutional address, times cited, and reprint address. 
Citation analysis was mainly based on the impact factor as defined by the Journal Citation Reports (JCR) issued in 2002 and on the actual citation impact (ACI), which has been used to assess impact relative to the whole field and has been defined as the ratio between the individual citation-per-publication value and the total citation-per-publication value. Thirty-two percent of the total share was published as news features, 25% as editorial materials, 22% as articles, and 13% as letters, with the remainder being biographic items, corrections, meeting abstracts, and reprints. The US dominated production with 30% of the total share, followed closely by Hong Kong with 24%. Sixty-three percent of the publications were published by the mainstream countries. The SARS publication pattern in the past few months suggests immediate citation, a low collaboration rate, and English-language and mainstream-country domination of production. We observed no association between research indexes and the number of cases. The population of Iran has nearly doubled in less than 25 years, while the number of university students has increased more than 10 times and 720 Ph.D. degrees have been awarded in basic science in the past 10 years. Despite the great difficulties that Iranian scientists have been facing for more than two decades (as a consequence of a social revolution, 8 years of a destructive war imposed by Iraq, excessive brain drain, discriminatory practices by some international journals in publishing Iranian articles, and unfair sanctions imposed by the industrialized countries), Iran's science is still thriving and the current number of yearly scientific publications exceeds 1,500. When normalized with respect to the number of researchers and the research budget, Iranian scientists seem to outperform most of their counterparts in the advanced industrialized nations. 
The main reason: total engagement in truncated research activities (basic or applied) leading solely to publications, and a lack of infrastructure for developmental research activities leading to new technologies. The average impact factor of the papers in various fields of basic science seems quite satisfactory considering the difficult conditions the Iranian scientists are working under. Should the research budgets and conditions improve and the unfair sanctions currently imposed by world politics be eliminated, a far better contribution to world science can be expected. This study presents a bibliometric analysis of the scientific production in the food science and technology (FST) field for the period 1991-2000 in Iberian-America (IA). Eight selected IA countries contributed 97.6% of the IA production and accounted for 6.6% of the world production. The most frequent document type is the journal article, published in English. Retrieved records display characteristic authorship patterns and preferred subject areas. Spain, Brazil, Mexico, Argentina and Portugal determine the IA pattern of sources of publication. The fifty top-ranked journals, 80% of which were indexed by the SCIE, encompass two-thirds of the IA production. The following problem has never been studied: Given A, the total number of items (e.g. articles), and T, the total number of sources (e.g. journals that contain these articles) (hence A > T), when is there a Lotka function f(j) = D/j^α that represents this situation (where f(j) denotes the density of the sources in the item-density j)? And, if it exists, what are the formulae for D and α? This problem is solved for j ∈ [1, ρ] in both cases: (a) ρ = ∞ and (b) ρ < ∞, where ρ denotes the maximum density of the items. If ρ = ∞, then A and T uniquely determine D and α. If ρ < ∞, then we have, for every α ≤ 2, a solution for D and ρ, hence for f. 
If ρ < ∞ and α > 2, then we show that a solution exists if and only if μ = A/T < (α-1)/(α-2). This sheds some light on the source-item coverage power of Lotka's law. I examine whether or not new scientific specialties present young scientists with better opportunities to make significant discoveries than established specialties by examining a series of significant discoveries in the first 22 years of the field of bacteriology. I found that it was middle-aged scientists, not young scientists, who were responsible for a disproportionate number of significant discoveries. I argue that in order to make significant discoveries scientists need to work their way into the center of the social network of a scientific research community. Only then will they have access to the material and social resources necessary to make such discoveries. Our objective is the generation of schematic visualizations as interfaces for scientific domain analysis. We propose a new technique that uses thematic classification (classes and categories) as entities of cocitation and units of measure, and demonstrate the viability of this methodology through the representation and analysis of a domain of great dimensions. The main features of the maps obtained are discussed, and proposals are made for future improvements and applications. While most research in the area of human-information behavior has focused on a single dimension - either the psychological or the social - this case study demonstrated the importance of a multidimensional approach. The Cognitive Work Analysis framework guided this field study of one event of collaborative information retrieval (CIR) carried out by design engineers at Microsoft, including observations and interviews. Various dimensions explained the motives for this CIR event and the challenges the participants encountered: the cognitive dimension, the specific task and decision, the organization of the teamwork, and the organizational culture. 
Even though it is difficult at times to separate one dimension from another, and all are interdependent, the analysis uncovered several reasons for design engineers to engage in CIR, such as when they are new to the organization or the team, when the information lends itself to various interpretations, or when most of the needed information is not documented. Similar multidimensional studies will enhance our understanding of human-information behavior. Rough set theory is a relatively new intelligent technique used in the discovery of data dependencies; it evaluates the importance of attributes, discovers patterns in data, eliminates redundant objects and attributes, and seeks the minimum subset of attributes. Moreover, it is being used for the extraction of rules from databases. In this paper, we present a rough set approach to attribute reduction and the generation of classification rules from a set of medical datasets. For this purpose, we first introduce a rough set reduction technique to find all reducts of the data that contain the minimal subset of attributes associated with a class label for classification. To assess the validity of the rules based on the approximation quality of the attributes, we introduce a statistical test to evaluate their significance. Experimental results from applying the rough set approach to the set of data samples are given and evaluated. In addition, the rough set classification accuracy is compared to that of the well-known ID3 classifier algorithm. The study showed that the theory of rough sets is a useful tool for inductive learning and a valuable aid for building expert systems. In this paper, we focus on the effect of graded relevance on the results of interactive information retrieval (IR) experiments based on assigned search tasks in a test collection. A group of 26 subjects searched for four Text REtrieval Conference (TREC) topics using automatic and interactive query expansion based on relevance feedback. 
The TREC- and user-suggested pools of relevant documents were reassessed on a four-level relevance scale. The results show that the users could identify nearly all highly relevant documents and about half of the marginal ones. Users also selected a fair number of irrelevant documents for query expansion. The findings suggest that the effectiveness of query expansion is closely related to the searchers' success in retrieving and identifying highly relevant documents for feedback. The implications of the results for interpreting the findings of past experiments with liberal relevance thresholds are also discussed. This article attempts to verify the document presentation order hypothesis through an empirical, two-stage experiment. It aims to identify the relationship between the number of documents judged and the order effect. The results indicate that a significant order effect takes place when 15 and 30 documents are presented. Sets with 45 and 60 documents still reveal the order effect. However, subjects are not influenced by the order of presentation when the set of documents has 5 or 75 members. The present paper analyzes the changes that occurred to a set of Web pages related to "informetrics" over a period of 5 years between June 1998 and June 2003. Four times during this time span, in 1998, 1999, 2002, and 2003, we monitored previously located pages and searched for new ones related to the topic. Thus, we were able to study the growth of the topic, while analyzing the rates of change and disappearance. The results indicate that modification, disappearance, and resurfacing cannot be ignored when studying the structure and development of the Web. Via the Internet, information scientists can obtain cost-free access to large databases in the "hidden" or "deep Web." These databases are often structured far more than the Internet domains themselves. The patent database of the U.S. 
Patent and Trademark Office is used in this study to examine the science base of patents in terms of the literature references in these patents. University-based patents at the global level are compared with results when using the national economy of the Netherlands as a system of reference. Methods for accessing the online databases and for the visualization of the results are specified. The conclusion is that "biotechnology" has historically generated a model for theorizing about university-industry relations that cannot easily be generalized to other sectors and disciplines. N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free text retrieval. It reports the results of using the N-gram approach for a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method offers better performance than the trigram method with respect to conflation precision and conflation recall ratios. In either case, the N-gram approach does not appear to provide an efficient conflation approach due to the peculiarities imposed by the Arabic infix structure, which reduces the rate of correct N-gram matching. More than 100 U.S. governmental agencies offer links through FedStats, a centralized Web site that facilitates access to statistical tables, reports, and agencies. This and similar large collections need appropriate interfaces to guide the general public to easily and successfully find information they seek. This paper summarizes the results of 3 empirical studies of alternate organization concepts of the FedStats Topics Web page. Each study had 15 participants.
The evolution from 645 alphabetically organized links, to 549 categorically organized links, to 215 categorically organized links tied to portal pages produced a steady rise in successful task completion from 15.6 to 24.4 to 42.2%. User satisfaction also increased. We make recommendations based on these data and our observations of users. One might argue that the future of knowledge work is manifested in how open-source communities work. Knowledge work, as argued by Drucker (1968); Davenport, Thomas, and Cantrell (2002); and others, comprises specialists who collaborate via exchange of know-how and skills to develop products and services. This is exactly what an open-source community does. To this end, in this brief communication we conduct an examination of open-source communities and generate insights on how to augment current knowledge management practices in organizations. The goal is to entice scholars to transform closed knowledge management agendas that exist in organizations to ones that are representative of the open-source revolution. The intentionally ambiguous expression "Popular Music Browser" reflects the two main goals of this project, which started in 1998 at Sony Computer Science Laboratories. First, we are interested in human-centered issues related to browsing "Popular Music." Popular here means that the music accessed is distributed widely and known to many listeners. Second, we consider "popular browsing" of music, i.e., making music accessible to nonspecialists (music lovers) and allowing sharing of musical tastes and information within communities, departing from the usual, single-user view of digital libraries. This research project covers all areas of the music-to-listener chain: music description, descriptor extraction from the music signal, data mining techniques, similarity-based access, novel music retrieval methods such as automatic sequence generation, and user interface issues.
This article describes the scientific and technical issues at stake and the results obtained, and is illustrated by prototypes developed within the European IST project Cuidado. As the size and number of digital music archives grow, the problem of storing and accessing multimedia data is no longer confined to the database area. Specific approaches for music information retrieval are necessary to establish a connection between textual and content-based metadata. This article addresses such issues with the intent of surveying our perspective on music information retrieval. In particular, we stress the use of symbolic information as a central element in a complex musical environment. Musical themes, harmonies, and styles are automatically extracted from electronic music scores and employed as access keys to data. The database schema is extended to handle audio recordings. A score/audio matching module provides a temporal relationship between a music performance and the score played. Besides standard free-text search capabilities, three levels of retrieval strategies are employed. Moreover, the introduction of a hierarchy of input modalities ensures that the needs of a wide group of users are met and their expertise matched. Singing, playing, and notating melodic excerpts is combined with more advanced musicological queries, such as querying by a sequence of chords. Finally, we present some experimental results and our future research directions. We have explored methods for music information retrieval for polyphonic music stored in the MIDI format. These methods use a query, expressed as a series of notes that are intended to represent a melody or theme, to identify similar pieces. Our work has shown that a three-phase architecture is appropriate for this task in which the first phase is melody extraction, the second is standardization, and the third is query-to-melody matching. We have investigated and systematically compared algorithms for each of these phases.
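The three-phase architecture just described can be sketched minimally. The note representation, the highest-pitch melody heuristic, and the edit-distance scoring below are illustrative assumptions, not the authors' implementation:

```python
# A minimal sketch of the three phases: melody extraction, standardization,
# and query-to-melody matching. Note data and scoring are invented.

def extract_melody(notes):
    """Phase 1: from (onset, pitch) pairs, keep the highest pitch per onset."""
    by_onset = {}
    for onset, pitch in notes:
        by_onset[onset] = max(by_onset.get(onset, 0), pitch)
    return [p for _, p in sorted(by_onset.items())]

def standardize(pitches):
    """Phase 2: convert absolute pitches to intervals (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def match(query, melody):
    """Phase 3: edit distance between interval sequences (lower = more similar)."""
    m, n = len(query), len(melody)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1,
                          d[i-1][j-1] + (query[i-1] != melody[j-1]))
    return d[m][n]

piece = [(0, 60), (0, 48), (1, 62), (2, 64)]   # polyphonic: melody + bass note
query = standardize([72, 74, 76])               # same tune, transposed up an octave
print(match(query, standardize(extract_melody(piece))))  # → 0
```

Standardizing to intervals makes the match transposition-invariant, which is why the octave-shifted query still scores a distance of zero.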
To ensure that our results are robust, we have applied methodologies that are derived from text information retrieval: We developed test collections and compared different ways of acquiring test queries and relevance judgments. In this article we review this program of work, compare it to other approaches to music information retrieval, and identify outstanding issues. The article describes the project on music information retrieval that has been carried out at the University of Padova, Italy. The research work has been characterized by the synergy of the modular integration of sound techniques for melody processing and for statistical information retrieval. After illustrating the background from which the project has originated, we describe the complete process, from methodology design through evaluation and system implementation. Conclusions, impacts on research in music information retrieval, and future directions are also described. This article describes the research and development of an efficient Music Information Retrieval (MIR) engine that is embedded in a karaoke software package targeted at Asian users' music retrieval needs. The MIR engine has a multi-modal interface that allows queries by singing, humming, tapping, speaking, and writing. In particular, we discuss the design philosophy, technical barriers, and performance evaluation of such an engine, as well as its current and potential commercial applications. Feedback and feature requests from users, which greatly influence our future work, are also addressed. The constantly increasing amount of audio available in digital form necessitates the development of software systems for analyzing and retrieving digital audio. In this work, we describe our efforts in developing such systems. More specifically, we describe the design philosophy behind our approach, the specific problems we try to solve, and how we evaluate the performance of our algorithms.
Automatic music analysis and retrieval of non-speech digital audio is a relatively new field, and the existing techniques are far from perfect. To improve the performance of the developed techniques, two main strategies are used: (1) integration of information from multiple analysis and retrieval algorithms and (2) the use of graphical user interfaces that enable the user to provide feedback to the design, development, and evaluation of the algorithms. All the developed algorithms and user interfaces are integrated under MARSYAS, a software framework for research in computer audition. Automatic generation of play lists for commercial broadcast radio stations has become a major research topic. Audio identification systems have been around for a while, and they show good performance for clean audio files. However, songs transmitted by commercial radio stations are highly distorted to cause greater impact on the casual listener. This impact helps increase the probability that the listener will stay tuned in, but the price we have to pay is a severe modification of the audio itself. This causes the failure of traditional identification systems. Another problem is the fact that songs are never played from the beginning to the end. Actually, they are put on the air several seconds after their real beginning and almost always under the voice of a speaker. The same thing happens at the end. In this article, we present the RAA project, which was conceived to deal with real broadcast audio problems. The idea behind this project is to extract automatically an audio fingerprint (the so-called AudioDNA) that identifies the fragment of audio. This AudioDNA has to be robust enough to appear almost the same under several degrees of distortion. Once this AudioDNA is extracted from the broadcast audio, a matching algorithm is able to find its fragments inside a database.
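The fragment-matching idea can be sketched in the spirit of AudioDNA: treat each fingerprint as a gene-like symbol string and search every database entry for the window most similar to a distorted fragment. The alphabet, songs, and similarity score below are invented for illustration and do not reproduce the real system's representation:

```python
# Minimal sketch of robust fingerprint matching: slide the fragment over each
# stored fingerprint and keep the window with the highest symbol agreement.
# Fingerprints and the similarity measure are hypothetical.

def best_match(fragment, database):
    """Return (song id, similarity in [0, 1], offset) of the closest window."""
    best = (None, -1.0, -1)
    for song, dna in database.items():
        for start in range(len(dna) - len(fragment) + 1):
            window = dna[start:start + len(fragment)]
            score = sum(a == b for a, b in zip(fragment, window)) / len(fragment)
            if score > best[1]:
                best = (song, score, start)
    return best

db = {
    "song_a": "acgtacgtggcaacgt",
    "song_b": "ttgacggtacca",
}
# A mid-song fragment of song_a with one symbol corrupted by distortion.
print(best_match("ccgtggca", db))  # → ('song_a', 0.875, 4)
```

Because the score tolerates mismatched symbols, the corrupted mid-song fragment is still attributed to the right song at the right offset, which mirrors the requirement that fragments be found despite distortion and partial airplay.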
With this approach, the system can find not only a whole song but also small fragments of it, even with high distortion caused by broadcast (and DJ) manipulations. This article describes the digital music library work at the University of Waikato, New Zealand. At the heart of the project is a music information retrieval workbench for evaluating algorithms and performing experiments used in conjunction with four datasets of symbolic notation ranging from contemporary to classical pieces. The outcome of this experimentation is woven together with strands from our larger digital library project to form the Web-based music digital library MELDEX (short for melody index). An overview of the workbench software architecture is given along with a description of how this fits the larger digital library design, followed by several examples of MELDEX in use. Until recently, most research on music information retrieval concentrated on monophonic music. Online Music Retrieval and Searching (OMRAS) is a three-year project funded under the auspices of the JISC (Joint Information Systems Committee)/NSF (National Science Foundation) International Digital Library Initiative which began in 1999 and whose remit was to investigate the issues surrounding polyphonic music information retrieval. Here we outline the work OMRAS has achieved in pattern matching, document retrieval, and audio transcription, as well as some prototype work in how to implement these techniques into library systems. There are many problems in the musical signal domain that have not been solved yet. Among such problems are automatic recognition and editing of musical sound patterns, retrieval of audio material, detection of transient states, and articulation features in sounds. It seems that the key challenge is in building inexpensive browsers to retrieve audio material contained in multimedia bases and Internet sites. 
This overview shows motivation and intermediate and long-term goals of the research projects that have been conducted for several years in the Multimedia Systems Department, Gdansk University of Technology, Poland. Reports the results of a citation study on Watson and Crick's 1953 paper announcing the discovery of the double helix structure of DNA. The paper has been cited more than 2,000 times since 1961, and there is no sign of any obsolescence to this article. An analysis was undertaken of the journals in which the citations appeared, and of mistakes in the bibliographic citations provided by citing articles. Watson and Crick themselves have only cited their own paper twice since 1961. An analysis was also undertaken of the reasons why the paper was cited; 100 citing articles were identified and read. The reasons for citing were then categorised using the Oppenheim and Renn method. Compared to earlier studies, it was found that a greater proportion of citations were for historical reasons, a smaller proportion of citing articles were actively using the Watson and Crick article, and a similar, but low proportion were criticising the Watson and Crick article. With the globalisation of the job market, higher education is undergoing structural changes and the education scenario worldwide is experiencing dramatic and accelerating changes in patterns of creation of new knowledge. Similar activities are being witnessed in India as regards the production of highly qualified S&T personnel in different disciplines. In this paper a comparative analysis of doctorates produced in India during 1974 to 1999 in different fields is carried out with the help of mathematical models. Besides analysing the trends of highly qualified S&T personnel with the help of known mathematical models, a few new substitution models have been proposed and applied to explain the movement of researchers from one discipline to the other.
Findings suggest that arts, commerce, education and medicine depict growing trends, whereas agriculture, science and veterinary science are traversing a declining path. Further, proposed models are found to be flexible in nature and can capture and explain the shifting patterns very well. These models are comparable to other known models dealing with technology substitution. The present study characterizes the dynamic publication activity of global knowledge management (KM) by data collected through a search restricted to articles in ISI Web of Science. A total of 2727 unique authors had contributed 1407 publications since 1975. The overwhelming majority (2349 or 86%) of them wrote one publication. The productive authors, their contribution and authorship position are listed to indicate their productivity and degree of involvement in their research publications. The sum of research output of the first or responsible authors from USA, UK and Germany reaches 57% of the total productivity. The distribution of articles is rather widespread: they published in 462 titles of serials, spanning 110 Journal Citation Reports subject categories. The higher quality journals make publication of findings more visible. Pearson's correlation coefficient shows a statistically significant relationship between the citation frequency of an article and the impact factor of its journal, but not with authorship pattern. The results also indicate that R&D expenditures were actually not proportional to research productivity or citation counts. As the subject highly interacts with other disciplines, the field of KM has not yet developed its own body of literature. KM might have been evolving an interdisciplinary theory that is developing at the boundaries of scientific disciplines. Various distributions of the Nobel laureates in physics in the 20th century and their discoveries are considered.
It is shown that the time-interval between the discovery and its recognition can be approximately described by a lognormal distribution. The ratio of the numbers of laureates awarded for the experimental and the theoretical discoveries was rather different in various decades; this was determined by some "waves" of discoveries and in the initial period probably by some subjective factors. The probability of obtaining the prize is larger for a theorist than for an experimenter. The main part of the awards was given to the scientists working in the main fields of modern physics: small distances and solid state physics. Some fields of physics such as mathematical physics, relativity, statistical physics were ignored completely. A worrying tendency is also indicated: the average age of laureates is increasing towards their retirement age. Brazilian university-based science has grown rapidly in the last 20 years. Most of the PhD-level teaching, research, and technical publications are based in the government-supported universities, although there are also privately supported universities, which educate a large fraction of Brazilian attorneys, business people, and other professionals. We investigate here the relationship between type of university, number of degree programs offered, number of faculty members, and number of published papers. Twelve universities, all government supported, are found to produce a very large fraction of publications and to house the best qualified PhD programs. We find that there is a strong correlation between research carried out with foreign collaborators and the rate at which the resulting publications are cited. This trend is characteristic of many developing and less developed nations. A top journal is defined as a journal which is within the first 10% of journals ranked by impact factor in the SCI list, within a particular scientific subfield, for the year considered.
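The top-journal selection rule just defined — rank journals by impact factor within each subfield and keep the first 10% — can be sketched as follows. The journal names, subfields, and impact factors are made up for illustration:

```python
# A sketch of the "top journal" rule: within each subfield, rank journals by
# impact factor and keep the first 10% (at least one per subfield).

def top_journals(journals, fraction=0.10):
    """journals: list of (name, subfield, impact_factor) tuples for one year."""
    by_field = {}
    for name, field, impact in journals:
        by_field.setdefault(field, []).append((impact, name))
    top = set()
    for field, ranked in by_field.items():
        ranked.sort(reverse=True)                       # highest impact first
        cutoff = max(1, round(len(ranked) * fraction))  # at least one per field
        top.update(name for _, name in ranked[:cutoff])
    return top

sample = [("J. Alpha", "physics", 3.2), ("J. Beta", "physics", 1.1),
          ("J. Gamma", "physics", 0.4), ("Ann. Delta", "chemistry", 2.7),
          ("Ann. Eps", "chemistry", 2.9)]
print(top_journals(sample))  # → {'J. Alpha', 'Ann. Eps'}
```

Applying the rule per subfield, as in the study, avoids penalizing fields whose journals have systematically lower impact factors.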
Journals which were for 11 or more years within the first 10% were considered top journals during the whole period even though they were not within the first 10% in some of the years covered by this study. In the period from 1980 to 2000, the Croatian scientists affiliated with research institutions within the Republic of Croatia published a total of 13,021 papers in journals covered by the Science Citation Index (SCI). Out of these papers, only 2,720 were published in top journals. This amounts to 20.9% of the total, and this is below the world average of 29.5% for the same scientific subfields. Out of the above 2,720 publications, 1,250 (46.0%) were published in international collaboration, and 335 (12.3%) papers were Meeting Abstracts. The Croatian scientists were most productive in the main scientific fields: Physics (875 papers; 32.2%), Medicine (786 papers; 28.9%), and Chemistry (580 papers; 21.3%). All other fields, taken together, comprised 17.6% of the total scientific output. Of the 786 medical papers, 290 were Meeting Abstracts, or 36.9% of the total output in the field of Medicine, and medical Meeting Abstracts represent 86.6% of the total number of abstracts (335). Articles (2,060) represent 75.7% of the total Croatian scientific output in top journals. Firms are increasingly dependent on networks and network visibility for innovation. Bibliometric impact can be regarded as a measure of a firm's visibility in knowledge-producing networks and may explain why companies publish their results. However, this visibility varies across disciplines. This paper examines publications produced by Danish companies in 1996, 1998 and 2000 to show how citation and collaboration patterns relate in different disciplines. The main findings are that for disciplines characterized by international collaboration and many authors per paper, international collaboration results in a greater number of citations.
National collaboration does not, however, seem to make any difference to citation impact in industrial research. In disciplines where multinational collaboration and multi-authorship is uncommon, no clear picture of impact patterns can be obtained. By extension, this research may provide knowledge on how citations of papers in scientific journals can be used as a potential window to scientific networks for firms. To evaluate the contribution to international dermatological literature made by authors from European Union (EU) countries. Using MedLine, a selection was made of articles by EU authors published between 1987 and 2000 in 32 dermatological journals, classified as such by the Institute for Scientific Information. Overall 19,225 documents were published by European authors in the selected dermatological journals from 1987 to 2000. The leading countries in terms of output were the United Kingdom, Germany, Italy and France. The leading countries in number of articles after taking into account the gross domestic product and the population were Denmark, Finland and Sweden. The main journals were the British Journal of Dermatology (14.5% of articles from European authors), Contact Dermatitis (13.7%), Journal of Investigative Dermatology (7.3%), Journal of the American Academy of Dermatology (6.4%), and Acta Dermato-Venereologica (6.1%). The country with the highest output of papers by journal was the United Kingdom (11 journals) followed by Germany (9 journals), Italy (6 journals), France (3 journals), Spain (2 journals) and Sweden (1 journal). In conclusion, the scientific production of European Union research on dermatology is highest in northern countries. Evaluation of research quality is becoming more important in the field of library and information science, as in other fields.
This pilot study is a preliminary attempt to address issues associated with determining the quality of the published research in one area of library and information science, specifically school librarianship. The main aims were (1) to test the extent to which experienced evaluators agreed in their rankings of published research articles on the basis of quality and (2) to investigate the approaches to evaluation used by the experienced evaluators. A qualitative, naturalistic research design was used. On the basis of a comprehensive literature review, four approaches to evaluation were identified; they were generally supported through an analysis of the responses of the experienced evaluators. However, although the majority of the evaluators agreed on the article ranked lowest, basic statistical analyses showed less agreement about the other articles. Although subject knowledge (of the field of school librarianship in this case) may have some influence on the evaluations, cluster analysis suggests that there may be differences in the value perceptions of the evaluators that also carry weight. More research would be needed to gain a better understanding of these value perceptions and their relationship, if any, to the four approaches to evaluation that were identified through the literature. In upholding the Children's Internet Protection Act (CIPA), the U.S. Supreme Court has forced public libraries to face difficult issues about filtering Internet content. The implementation of filters creates a range of practical issues for libraries and also raises myriad research issues related to the effects of CIPA on public library services and on access to Internet-based information in public libraries. Using a multimethod, iterative research strategy, this article explores selected areas related to filtering that may affect the provision of Internet content and services in public libraries. 
This study presents preliminary data about the impact of CIPA on public libraries and offers a perspective of what research is necessary to provide a better understanding of the impacts of CIPA and to determine what research would need to be conducted for potential future legal challenges to the application of CIPA in public libraries. This article describes and discusses the detailed procedures followed by two intergenerational teams comprising the researchers and a group of eight grade-six elementary students (ages 11 to 12 years) and a group of six third-grade elementary students (ages 8 to 9 years), respectively, in designing two prototype Web portals intended for use by elementary school students. These procedures were based on three design theories: Contextual Inquiry, Participatory Design, and Cooperative Inquiry. The article also presents and describes the two resulting Web portal prototypes and discusses the design criteria employed by the teams. Conclusions are elaborated on the basis of this research experience regarding how such a design process should be conducted in the context of an intergenerational team, and what characteristics young users expect to find in Web portals that they will use to support their informational needs in terms of elementary school projects and assignments. The goal of the scientometric analysis presented in this article was to investigate international and regional (i.e., German-language) periodicals in the field of library and information science (LIS). This was done by means of a citation analysis and a reader survey. For the citation analysis, impact factor, citing half-life, number of references per article, and the rate of self-references of a periodical were used as indicators. In addition, the leading LIS periodicals were mapped. 
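The citation indicators named above (impact factor, references per article, self-reference rate) can be illustrated with toy data; the journal names, citation counts, and article counts below are all invented:

```python
# Toy illustrations of two of the citation-analysis indicators: a JCR-style
# two-year impact factor and a journal's self-reference rate.

def impact_factor(cites_to_prev_two_years, items_in_prev_two_years):
    """Citations in year Y to items published in Y-1 and Y-2, divided by
    the number of citable items published in those two years."""
    return cites_to_prev_two_years / items_in_prev_two_years

def self_reference_rate(references):
    """Share of a journal's outgoing references that point back to itself.
    `references` is a list of (citing_journal, cited_journal) pairs."""
    own = sum(citing == cited for citing, cited in references)
    return own / len(references)

print(impact_factor(cites_to_prev_two_years=150, items_in_prev_two_years=120))  # → 1.25
print(self_reference_rate([("JDoc", "JDoc"), ("JDoc", "JASIST"),
                           ("JDoc", "Scientometrics"), ("JDoc", "JDoc")]))      # → 0.5
```

The remaining indicator in the study, citing half-life, would analogously be the median age of the references a journal cites.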
For the 40 international periodicals, data were collected from ISI's Social Sciences Citation Index Journal Citation Reports (JCR); the citations of the 10 German-language journals were counted manually (overall 1,494 source articles with 10,520 citations). Altogether, the empirical base of the citation analysis consisted of nearly 90,000 citations in 6,203 source articles that were published between 1997 and 2000. The expert survey investigated reading frequency, applicability of the journals to the job of the reader, publication frequency, and publication preference both for all respondents and for different groups among them (practitioners vs. scientists, librarians vs. documentalists vs. LIS scholars, public sector vs. information industry vs. other private company employees). The study was conducted in spring 2002. A total of 257 questionnaires were returned by information specialists from Germany, Austria, and Switzerland. Having both citation and readership data, we performed a comparative analysis of these two data sets. This enabled us to identify answers to questions like: Does reading behavior correlate with the journal impact factor? Do readers prefer journals with a short or a long half-life, or with a low or a high number of references? Is there any difference in this matter among librarians, documentalists, and LIS scholars? Objectivity, in the form of the application of external scrutiny according to standards agreed within a research community, is an essential characteristic of information science research whether pursued from positivist, interpretative, or action research perspectives. Subjectivity may represent both a legitimate focus of research (e.g., people's perceptions and attitudes) and a legitimate component of methodology (e.g., enabling researchers to enter, experience, and share the perceived worlds of their subjects).
However, subjectivity that both (a) is not open to external scrutiny and (b) gives rise to contingent dependencies is problematic for research. The issue of problematic types of subjectivity is considered, and the contributions to the debate concerning possible solutions of two key thinkers, the cybernetician Gordon Pask and the methodological philosopher Brenda Dervin, are discussed. The need identified by Dervin for researchers to be able to escape (expose and test) their own assumptions is explored in terms of a dynamic interplay between relatively subjective and objective forces, each requiring the liberating and constraining energies of the other. The extent to which meta-methodological awareness (a prerequisite for any such escape) can be fostered, for example, by the quality of research environments, is explored along with implications for those responsible for managing and funding research. Given that time is money, Web searching can be a very expensive proposition. Even with the best search technology, the usefulness of search results depends on the searcher's ability to use that technology effectively. In an effort to improve this ability, our research investigates the effects of logic training, interface training, and the type of search interface on the search process. In a study with 145 participants, we found that even limited training in basic Boolean logic improved performance with a simple search interface. Surprisingly, for users of an interface that assisted them in forming syntactically correct Boolean queries, performance was negatively affected by logic training and unaffected by interface training. Use of the assisted interface itself, however, resulted in strong improvements in performance over use of the simple interface.
In addition to being useful for search engine providers, these findings are important for all companies that rely heavily on search for critical aspects of their operations, in that they demonstrate simple means by which the search experience can be improved for their employees and customers. To explore issues of user interface design and experience, including culturally preferred design elements, a study was conducted analyzing sites in Germany, Japan, and the United States (30 municipal sites in each country). Design elements considered are use of symbols and graphics, color preferences, site features (links, maps, search functions, and page layout), language, and content. Significant modal differences were found in each of the listed categories. Outcomes from the study are used to discuss future research directions in the areas of experience design and localization. This study analyzes the similarities and differences of performance of information management (IM) and knowledge management (KM) research publications indexed by the SCI-EXPANDED, SSCI and A&HCI databases since 1994 with informetric methods in order to explore a developing tendency in the near future. The bibliographic search supplied 1199 IM and 1063 KM records. Very few IM and KM authors contributed two or more articles. Four countries dominated global IM and KM research productivity, while a few institutions played remarkable roles in scholarly activity. IM articles were distributed across a wide range of journals, 84 per cent of which published just one or two articles; KM publications were concentrated in core and borderline periodicals, fitting Bradford's law of scattering. Pearson's correlation analysis indicates that the higher the journal impact factor, the more times the published article is cited. The author concludes that KM has been leading IM in both publication productivity and academic population, and that the tendency is growing.
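The Pearson correlation used in the study above, relating journal impact factor to citation counts, can be sketched directly from its definition. The paired values below are invented for illustration:

```python
# Pearson's r between journal impact factors and citation counts, computed
# from the definition: covariance divided by the product of standard deviations.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

impact_factors = [0.5, 1.2, 2.0, 3.1, 4.4]  # hypothetical journal impact factors
citations      = [3, 10, 14, 25, 33]         # citations to one article in each
print(round(pearson(impact_factors, citations), 3))  # → 0.996
```

A value close to +1, as in this toy run, is the kind of result the study reports: articles in higher-impact journals tend to be cited more often.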
This paper presents an overview of the general model of scientific growth proposed by D. J. de Solla Price. Firstly, the formulation of the model is examined using the seminal sources. Later, forerunners, offshoots and criticisms of the model are discussed. Finally, an integrative review using retrieved empirical studies exposes the complexity and diversity of models of scientific growth and the absence of consistent patterns. Patents generated from scientific research indicate academic involvement in technology development. Academic patenting activity is recent, even in developed countries. This study compares patenting activity of Brazilian and American universities. Brazilian universities had a 29.5-fold increase in applications and a 4.01-fold increase in grants (1990-2001), about twice the increase presented by American universities in this period. However, a significant fraction of Brazilian academic applications are abandoned due to the lack of specialized staff to help in writing and to shepherd the application through the patenting process in universities. The participation of research institutes in technological innovation is increasing steadily, even without financial incentives. The paper compares the research performance in computer science of four major Western countries, India and China, based on the data abstracted from the INSPEC database during the period 1993-2002. A total of 9,632 computer science papers recorded in the INSPEC database were used for the comparison. The findings indicate that, on the one hand, the number of papers produced in China has considerably increased in the past few years. Particularly, in recent years, China occupies a remarkably high position in terms of counts of papers indexed by the INSPEC database. On the other hand, Chinese scientists preferred to publish in domestic journals and proceedings, and the share of SCI papers among China's total journal papers has remained the lowest.
This indicates that the research activities of Chinese scientists in computer science are still rather "local" and suffer from low international visibility. Various scientometric indicators, such as the Normalized Impact Factor and the ratio of papers in high-quality journals, are further adopted to analyze research performance, and diverse findings are obtained. Nevertheless, on these surrogate indicators China has achieved great progress, characterized by a "low level of beginning and high speed of developing". The policy implication of the findings is that China, as well as other countries less developed in science, can earn relative competitive advantages in new, emerging, or younger disciplines such as computer science by properly using a catch-up strategy. The characteristics of Indian and Chinese patenting activity in the US patent system are examined by delineating two categories of patents: 'nationally assigned' and 'invented but not nationally assigned' patents ('not-nationally assigned' patents in short). Within these two categories, patents are further distinguished and analysed in terms of patent types: utility, design, and plant patents. Indian patents are mainly of the utility type, whereas China's activity falls in both utility and design. In the 'nationally assigned' patents, the different types of institutions involved and their linkages are much higher for China. However, the 'not-nationally assigned' patents of both countries are dominated by industry, and inter-institutional collaborations are sparse. Analysis of the technology sectors addressed (based on utility patents) exhibits no major differences between the two categories of Chinese patents, which address all technology sectors to varying degrees. Unlike China, India's 'nationally assigned' patents are concentrated in chemicals and drugs & medical, whereas its 'not-nationally assigned' patents are similar to China's in the technology sectors they address. 
In design patents, Chinese 'nationally assigned' patents mainly cover ornamental designs of lighting equipment, whereas their 'not-nationally assigned' patents mainly cover designs of equipment for the production, distribution, or transformation of energy. Further, few firms are active in design patents in either category. India's design activity is insignificant in both categories. The paper concludes by examining the results in the policy context. A method to identify core documents within a given subject domain has been developed by the author. The method builds on the concept of polyrepresentation by using different search rationales in several databases and isolating the overlaps between them. This paper delineates the ideas behind the method and describes the study done to measure its effectiveness. In recent papers, the authors have studied basic regularities of author self-citations. The regularities are related to ageing, to the relation between self-citations and foreign citations, and to the interdependence of self-citations with other bibliometric indicators. The effect of multi-authorship on citation impact has been shown in other bibliometric studies, for instance, by PERSSON et al. (2004). The question arises whether those regularities imply any relation between the number of co-authors and the extent of author self-citations. The results of the present paper confirm the common notion of such effects only in part. The authors show that at the macro level multi-authorship does not result in any exaggerated extent of self-citations. The emerging influence of new information and communication technologies (ICT) on collaboration in science and technology has to be considered. In particular, the question of the extent to which collaboration in science and in technology is visible on the Web needs examining. 
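The core-document method summarized a few sentences above (different search rationales run in several databases, isolating the overlaps between them) can be sketched as counting retrieval overlaps. The database names, record IDs, and overlap threshold here are invented for illustration; the actual method involves far richer search rationales.

```python
from collections import Counter

# Hypothetical result sets: record IDs retrieved from three databases
# using different search rationales for the same subject domain.
results = {
    "database_a": {"doc1", "doc2", "doc3", "doc5"},
    "database_b": {"doc2", "doc3", "doc4", "doc6"},
    "database_c": {"doc2", "doc3", "doc5", "doc7"},
}

def core_documents(result_sets, min_overlap=None):
    """Documents retrieved by at least `min_overlap` of the searches
    (default: by every search, i.e., the full intersection)."""
    sets = list(result_sets.values())
    if min_overlap is None:
        min_overlap = len(sets)
    counts = Counter(doc for s in sets for doc in s)
    return {doc for doc, c in counts.items() if c >= min_overlap}

core = core_documents(results)  # documents found under every rationale
```

Lowering `min_overlap` relaxes the criterion, trading precision for recall.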
Thus the purpose of this study is to examine whether broadly similar results would occur if solely Web data were used rather than all available bibliometric co-authorship data. For this purpose a new approach based on Web visibility indicators of collaboration is examined. The ensemble of COLLNET members is used to compare co-authorship patterns in traditional bibliometric databases and the network visible on the Web. One of the general empirical results is that a high percentage (78%) of all bibliographic multi-authored publications becomes visible through search engines on the Web. One of the special studies has shown that the Web visibility of collaboration depends on the type of bibliographic multi-authored paper. Social network analysis (SNA) is applied to comparisons between bibliographic and Web collaboration networks. Structure formation processes in bibliographic and Web networks are studied. The research question posed is to what extent collaboration structures visible on the Web change their shape over time in the same way as bibliographic collaboration networks. A number of special types of changes in bibliographic and Web structures are explained. This paper presents a compound approach for Webometrics based on an extension of the self-organizing multimap MultiSOM model. The goal of this new approach is to combine link and domain clustering in order to increase the reliability and precision of Webometrics studies. The extension proposed for the MultiSOM model is based on a Bayesian network-oriented approach. A first experiment shows that the behaviour of such an extension is coherent with its expected properties for Webometrics. A second experiment is carried out on a representative Web dataset issued from the EISCTES IST project context. In this latter experiment each map represents a particular viewpoint extracted from the Web data description. The obtained maps represent either thematic or link classifications. 
The experiment shows empirically that the communication between these classifications provides Webometrics with new explanatory capabilities. Co-authorship analysis is a well-established tool in bibliometric analysis. It can be used at various levels to trace collaborative links between individuals, organisations, or countries. Increasingly, informetric methods are applied to patent data. It has been shown for another method that bibliometric tools cannot be applied without difficulty. This is due to the different processes by which a patent is filed, examined, and granted and a scientific paper is submitted, refereed, and published. However, in spite of the differences, there are also parallels between scholarly papers and patents. For instance, both papers and patents are the result of an intellectual effort, both disclose relevant information, and both are subject to a process of examination. Given the similarities, we raise the question of the extent to which one can transfer co-authorship analysis to patent data. This article reports findings from a study of patterns of foreign authorship of articles, and of the international composition of journal editorial boards, in five leading journals in the fields of information science and scientometrics. The study covers an American journal and four European journals. Bibliographic data about foreign authors and their national affiliations from five selected years of publication were analyzed for all journals. The foreign input of articles was extremely high in Information Processing & Management and Scientometrics, and relatively low in the other three journals. The number of foreign countries contributing to all journals has increased rapidly since 1996. Canada, England, Belgium, the Netherlands, China, and Spain were the countries with high contributions in JASIST. Authors from the USA have dominated the foreign-authored articles in all European journals. 
A simple linear regression analysis showed that 60% of the variation in the proportion of foreign-authored articles in the set of five journals over the selected years could be explained by the percentage of foreign members on the editorial boards of the journals. Two previous webometrics studies found a relationship between the number of inlinks to a commercial site and the company's business performance measures. Thus inlink counts to commercial sites could be a potential source of business information. However, those studies examined top-ranking information technology companies in the U.S. and China. Whether the above-mentioned relationship holds for all companies regardless of ranking, and in other countries, is unknown. The study reported in this paper investigated this question. The study included all information technology companies in the U.S. and Canada and gathered both business performance data and website data for these companies. It found a significant correlation between business performance measures and inlinks to the company websites. The correlation remained significant even after the size of the company and the age of the website were accounted for. The conclusion is robust to the search engine used for data collection. Data collection issues for webometrics research were also explored. In this article, we define webometrics within the framework of informetric studies and bibliometrics, as belonging to library and information science, and as associated with cybermetrics as a generic subfield. We develop a consistent and detailed link typology and terminology and make explicit the distinction among different Web node levels when using the proposed conceptual framework. As a consequence, we propose a novel diagram notation to fully appreciate and investigate link structures between Web nodes in webometric analyses. We warn against taking the analogy between citation analyses and link analyses too far. 
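The simple linear regression reported at the start of this passage (share of foreign editorial-board members explaining variation in foreign-authored articles) amounts to an ordinary least-squares fit with its R² statistic. A minimal sketch follows; the journal-year data points are invented, so the resulting R² is only illustrative.

```python
def linear_regression(xs, ys):
    """Least-squares slope, intercept, and R^2 for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return a, b, r2

# Hypothetical journal-year observations:
# x = % foreign members on the editorial board, y = % foreign-authored articles
board_pct = [10, 25, 40, 55, 70]
foreign_pct = [20, 30, 55, 50, 75]
a, b, r2 = linear_regression(board_pct, foreign_pct)
```

Here `r2` is the fraction of variation in the foreign-authorship share explained by the board composition, the role the 60% figure plays in the study above.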
In this article, I investigate the reliability, in the social science sense, of collecting informetric data about the World Wide Web by Web crawling. The investigation includes a critical examination of the practice of Web crawling and contrasts the results of content crawling with the results of link crawling. It is shown that Web crawling by search engines is intentionally biased and selective. I also report the results of a large-scale experimental simulation of Web crawling that illustrates the effects of different crawling policies on data collection. It is concluded that the reliability of Web crawling as a data collection technique is improved by fuller reporting of relevant crawling policies. Because of the increasing presence of scientific publications on the Web, combined with the existing difficulties in easily verifying and retrieving these publications, research on techniques and methods for retrieval of scientific Web publications is called for. In this article, we report on the initial steps taken toward the construction of a test collection of scientific Web publications within the subject domain of plant biology. The steps reported are those of data gathering and data analysis aiming at identifying characteristics of scientific Web publications. The data used in this article were generated based on specifically selected domain topics that are searched for in three publicly accessible search engines (Google, AllTheWeb, and AltaVista). A sample of the retrieved hits was analyzed with regard to how various publication attributes correlated with the scientific quality of the content and whether this information could be employed to harvest, filter, and rank Web publications. The attributes analyzed were inlinks, outlinks, bibliographic references, file format, language, search engine overlap, structural position (according to site structure), and the occurrence of various types of metadata. 
As could be expected, the ranked output differs between the three search engines. Apparently, this is caused by differences in ranking algorithms rather than the databases themselves. In fact, because scientific Web content in this subject domain receives few inlinks, both AltaVista and AllTheWeb retrieved a higher degree of accessible scientific content than Google. Because of the search engine cutoffs of accessible URLs, the feasibility of using search engine output for Web content analysis is also discussed. How do authors refer to Web-based information sources in their formal scientific publications? It is not yet well known how scientists and scholars actually include new types of information sources, available through the new media, in their published work. This article reports on a comparative study of the lists of references in 38 scientific journals in five different scientific and social scientific fields. The fields are sociology, library and information science, biochemistry and biotechnology, neuroscience, and the mathematics of computing. As is well known, references, citations, and hyperlinks play different roles in academic publishing and communication. Our study focuses on hyperlinks as attributes of references in formal scholarly publications. The study developed and applied a method to analyze the differential roles of publishing media in the analysis of scientific and scholarly literature references. The present secondary databases that include reference and citation data (the Web of Science) cannot be used for this type of research. By the automated processing and analysis of the full text of scientific and scholarly articles, we were able to extract the references and hyperlinks contained in these references in relation to other features of the scientific and scholarly literature. Our findings show that hyperlinking references are indeed, as expected, abundantly present in the formal literature. 
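The extraction of hyperlinks from reference strings described above can be approximated with a simple URL pattern over reference text. This is a minimal sketch, not the study's actual full-text processing pipeline, and the reference strings are invented.

```python
import re

# Crude URL matcher; real reference parsing is considerably messier.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")

references = [
    "Smith, J. (2003). Ontology. http://example.org/ontology.pdf",
    "Doe, A. (2001). Print-only reference with no link.",
    "Lee, K. (2002). E-journal article. http://ejournal.example.net/v2/lee",
]

hyperlinked = [ref for ref in references if URL_PATTERN.search(ref)]
share = len(hyperlinked) / len(references)  # fraction of references carrying a URL
```

Computed over a journal's full reference lists, `share` is the kind of quantity behind the finding that hyperlinking references are abundantly present in the formal literature.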
They also tend to cite more recent literature than the average reference. The large majority of the references are to Web instances of traditional scientific journals. Other types of Web-based information sources are less well represented in the lists of references, except in the case of pure e-journals. We conclude that this can be explained by taking the role of the publisher into account. Indeed, it seems that the shift from print-based to electronic publishing has created new roles for the publisher. By shaping the way scientific references are hyperlinking to other information sources, the publisher may have a large impact on the availability of scientific and scholarly information. Recent Web-searching and -mining tools are combining text and link analysis to improve ranking and crawling algorithms. The central assumption behind such approaches is that there is a correlation between the graph structure of the Web and the text and meaning of pages. Here I formalize and empirically evaluate two general conjectures drawing connections from link information to lexical and semantic Web content. The link-content conjecture states that a page is similar to the pages that link to it, and the link-cluster conjecture that pages about the same topic are clustered together. These conjectures are often simply assumed to hold, and Web search tools are built on such assumptions. The present quantitative confirmation sheds light on the connection between the success of the latest Web-mining techniques and the small world topology of the Web, with encouraging implications for the design of better crawling algorithms. Although time has been recognized as an important dimension in the co-citation literature, to date it has not been incorporated into the analogous process of link analysis on the Web. In this paper, we discuss several aspects and uses of the time dimension in the context of Web information retrieval. 
We describe the ideal case where search engines track and store temporal data for each of the pages in their repository, assigning timestamps to the hyperlinks embedded within the pages. We introduce several applications which benefit from the availability of such timestamps. To demonstrate our claims, we use a somewhat simplistic approach, which dates links by approximating the age of the page's content. We show that by using this crude measure alone it is possible to detect and expose significant events and trends. We predict that by using more robust methods for tracking modifications in the content of pages, search engines will be able to provide results that are more timely and better reflect current real-life trends than those they provide today. The Web is a huge source of information, and one of the main problems facing users is finding documents which correspond to their requirements. Apart from the problem of thematic relevance, the documents retrieved by search engines do not always meet the users' expectations. The document may be too general, or conversely too specialized, or of a different type from what the user is looking for, and so forth. We think that adding metadata to pages can considerably improve the process of searching for information on the Web. This article presents a possible typology for Web sites and pages, as well as a method for propagating metadata values, based on the study of the Web graph and more specifically the method of cocitation in this graph. We present a novel session identification method based on statistical language modeling. Unlike standard timeout methods, which use fixed time thresholds for session identification, we use an information theoretic approach that yields more robust results for identifying session boundaries. We evaluate our new approach by learning interesting association rules from the segmented session files. 
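The standard timeout method, the fixed-threshold baseline that the language-modeling approach above is compared against, can be sketched as follows. The 30-minute threshold and the click timestamps are illustrative assumptions, though 30 minutes is a common choice in the Web-usage-mining literature.

```python
def timeout_sessions(timestamps, threshold_secs=30 * 60):
    """Split a sorted list of request timestamps (in seconds) into sessions:
    a gap longer than the threshold starts a new session."""
    sessions, current = [], []
    for t in timestamps:
        if current and t - current[-1] > threshold_secs:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

# One user's requests, in seconds since the first request.
clicks = [0, 120, 300, 5000, 5100, 20000]
sessions = timeout_sessions(clicks)
```

The information-theoretic approach in the study replaces this fixed threshold with boundaries chosen by a statistical language model, which is why its results are more robust to parameter settings.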
We then compare the performance of our approach to three standard session identification methods (the standard timeout method, the reference length method, and the maximal forward reference method) and find that our statistical language modeling approach generally yields superior results. However, as with every method, the performance of our technique varies with changing parameter settings. Therefore, we also analyze the influence of the two key factors in our language-modeling-based approach: the choice of smoothing technique and the language model order. We find that all standard smoothing techniques, save one, perform well, and that performance is robust to language model order. In the study of growth dynamics of artificial and natural systems, the scaling properties of fluctuations can exhibit information on the underlying processes responsible for the observed macroscopic behavior, according to H.E. Stanley and colleagues (Lee, Amaral, Canning, Meyer, Stanley, 1998; Plerou, Amaral, Gopikrishnan, Meyer, Stanley, 1999; Stanley et al., 1996). With such an approach, they examined the growth dynamics of firms, of national economies, and of university research funding and paper output. We investigate the scaling properties of journal output and impact according to the Journal Citation Reports (JCR; ISI, Philadelphia, PA) and find distributions of paper output and of citations close to lognormality. Growth rate distributions are close to Laplace "tents," though with a better fit to Subbotin distributions. The width of the fluctuations decays with size according to a power law. The form of the growth rate distributions seems not to depend on journal size, and conditional probability densities of the growth rates can thus be scaled onto one graph. To some extent even quantitatively, all our results are in agreement with the observations of Stanley and others. Furthermore, a Matthew effect of journal citations is confirmed. 
If journals "behave" like business firms, a better understanding of Bradford's Law as a result of competition among publishing houses, journals, and topics suggests itself. In this article a social constructionist approach to information technology (IT) literacy is introduced. This approach contributes to the literature on IT literacy by introducing the concept of IT self as a description of the momentary, context-dependent, and multilayered nature of interpretations of IT competencies. In the research literature, IT literacy is often defined as sets of basic skills to be learned and competencies to be demonstrated. In line with this approach, research on IT competencies conventionally develops models for explaining user acceptance and for measuring computer-related attitudes and skills. The assumption is that computer-related attitudes and self-efficacy affect IT adoption and success in computer use. Computer self-efficacy measures are, however, often based on self-assessments that measure interpretations of skills rather than performance in practice. An analysis of empirical interview data in which academic researchers discuss their relationships with computers and IT competence shows how a self-assessment such as "computer anxiety" presented in one discussion context can in another discussion context be consigned to the past in favor of a different and more positive version. Here it is argued that descriptions of IT competencies and computer-related attitudes are dialogic social constructs, closely tied to more general implicit understandings of the nature of technical artifacts and technical knowledge. These implicit theories and assumptions are rarely scrutinized in discussions of IT literacy, yet they have profound implications for the aims and methods of teaching computer skills. Among managers, those responsible for non-profit organizations in general and arts organizations in particular have been an understudied group. 
These managers have much in common with their for-profit counterparts, but their environment also differs in significant ways. The goal of this exploratory research effort was to identify how senior administrators in fine arts museums and symphony orchestras go about identifying and acquiring the information they want to complete a range of management tasks. Deciding when and where to look for information, obtaining the "right" information at the time it is needed, evaluating its credibility and utility, and determining when "enough" information has been collected are challenges facing this group of information users every day. A multiple-case study design involving a replication strategy was selected to structure the research process. Data were collected from 12 arts administrators using a pretested interview protocol that included the Critical Incident Technique. Patterns in the data were identified, and the data were further reviewed for disconfirming evidence. The study resulted in a list of the types and sources of information that arts administrators use, as well as a list of the factors or "stopping criteria" that influence them to end their information-seeking process. A model describing the way in which arts administrators go about acquiring the information they want was also developed. The main findings of the study are that (a) arts administrators do not consider information seeking to be a discrete management task, (b) they rely heavily on direct personal experience to fill their information-seeking needs, and (c) they are "satisficers" when it comes to seeking information. Based on Simon's alternative to rational choice theory, satisficers are people who are willing to pursue a "good enough" option rather than the best possible option (maximizers) (Simon, 1956). 
Since arts administrators have not been studied in the context of LIS research before, understanding more about where they go for information, what factors influence the level of effort they are willing to invest in seeking information, and how they decide when they have "enough" information provides insights into the information-seeking behavior of a new user group. Furthermore, although this research effort is focused on specific users in a specific field, the results from this study may be compared to what we already know about other user groups to confirm and expand existing models of information-seeking behavior. The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed online digital library which has become the dominant means by which astronomers search, access, and read their technical literature. Digital libraries permit the easy accumulation of a new type of bibliometric measure: the number of electronic accesses ("reads") of individual articles. By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create second-order bibliometric operators, a customizable class of collaborative filters that permits substantially improved accuracy in literature queries. Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP), we have developed an accurate model for worldwide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP. We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. 
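The worldwide research model just described (the number of scientists proportional to GDP, and basic research proportional to the number of scientists times per capita GDP) implies that a country's research output scales as GDP squared over population. A minimal sketch, with invented proportionality constants and country figures:

```python
def predicted_research(gdp, population, k_scientists=1e-9, k_output=1.0):
    """Model from the text: scientists ∝ GDP, and research output
    ∝ scientists × per-capita GDP, so output ∝ GDP² / population.
    The constants k_scientists and k_output are arbitrary placeholders."""
    scientists = k_scientists * gdp
    per_capita_gdp = gdp / population
    return k_output * scientists * per_capita_gdp

# Two hypothetical countries with equal GDP but different populations:
rich_small = predicted_research(gdp=1e12, population=1e7)
poor_large = predicted_research(gdp=1e12, population=1e9)
# Equal GDP, but the model predicts more research from the smaller,
# richer-per-capita country.
```

The quadratic dependence on GDP is the model's key feature: wealth matters twice, once through the size of the scientific workforce and once through how well each scientist is resourced.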
We find that in 2002 it amounted to the equivalent of 736 full-time researchers, or $250 million, or the astronomical research done in France. In this paper we discuss the construction of information systems ontologies. We summarize and discuss Barry Smith's review (2003a) of the field in the paper, "Ontology." In that essay Smith concludes with a plea for ontologies that reflect the categories of current scientific theories because they represent our best knowledge of the world. In this context, we develop an argument for a hermeneutic approach to ontologies-one compatible with the orientation introduced into information science by Winograd and Flores (1986) and later developed by many others. To do this, we argue that the literature in the philosophy and history of science supports a hermeneutic interpretation of the nature and growth of science. This, given Smith's argument, shows the relevance of hermeneutics to the creation of information system ontologies. The problems associated with understanding and creating information systems ontologies are addressed fruitfully only if one begins by acknowledging that databases are mechanisms for communication involving judgments and interpretations by intelligent and knowledgeable users. The main contributions of this paper are our conclusions that (a) information system ontologies should take into consideration a perspective of the philosophy and history of science, and (b) hermeneutics as construed by Gadamer (1975, 1979) constitutes a place from which we can understand the tasks of information ontologists and database users. In this paper we address the various formulations of impact of articles, usually groups of articles as gauged by citations that these articles receive over a certain period of time. 
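As one concrete formulation of the impact of a group of articles, the classical synchronous impact factor divides citations received in a given year by the items published in the preceding years. A minimal sketch with invented counts; the two-year window is the conventional ISI choice, not something fixed by the text.

```python
def synchronous_impact_factor(citations_by_pub_year, items_by_pub_year,
                              year, window=2):
    """Citations received in `year` by items published in the preceding
    `window` years, divided by the number of citable items in those years."""
    pub_years = range(year - window, year)
    cites = sum(citations_by_pub_year[y] for y in pub_years)
    items = sum(items_by_pub_year[y] for y in pub_years)
    return cites / items

# Hypothetical journal: citations received in 2004 to items from 2002-2003.
citations = {2002: 150, 2003: 90}
items = {2002: 60, 2003: 60}
if_2004 = synchronous_impact_factor(citations, items, 2004)  # 240 / 120 = 2.0
```

A diachronous variant would instead follow a fixed set of publication years forward in time, accumulating the citations they receive in later years; varying the two periods is exactly the generalization the abstract discusses.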
The journal impact factor, as published by ISI (Philadelphia, PA), is the best-known example of a formulation of the impact of journals (considered as sets of articles), but many others have been defined in the literature. Impact factors have varying publication and citation periods, and the chosen lengths of these periods enable, e.g., a distinction between synchronous and diachronous impact factors. It is shown how an impact factor for the general case can be defined. Two alternatives for a general impact factor are proposed, depending on whether the different publication years are seen as a whole, hence treating each of them equally, or whether one operates with citation periods of identical length but allows each publication period a different starting point. The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (I²R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based on dynamic systems. In the present paper, a different interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; thus, the PageRank values may be interpreted as neutral equilibrium points of the Web. Search engines are very popular tools for collecting information from distributed resources. They provide not only search facilities, but also directories for users to browse content divided into groups. 
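The stochastic (random-surfer) interpretation of PageRank described above can be computed by standard power iteration, which converges to the steady-state distribution of the surfer's Markov chain. This is a generic sketch of that view, not the paper's dynamic-systems formulation, and the toy graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration: PageRank as the steady-state distribution of a
    random surfer who follows links with probability `damping` and
    otherwise jumps to a uniformly random page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outlinks in links.items():
            if outlinks:
                share = damping * rank[p] / len(outlinks)
                for q in outlinks:
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Tiny hypothetical Web graph: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)  # "c" outranks "b": it collects links from both a and b
```

The algebraic view mentioned in the text is equivalent: the converged `ranks` vector is the dominant eigenvector (eigenvalue 1) of the damped link matrix.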
In this paper, we adopt an individual differences approach to explore users' attitudes toward various interface features provided by existing Web directories. Among a variety of individual differences, cognitive style is a particularly important characteristic that influences the effectiveness of information seeking. Empirical results indicate that users' cognitive styles influence their reactions to the organization of subject categories, the presentation of results, and screen layout. We develop a set of design guidelines on the basis of these results, and propose a flexible interface that adopts these guidelines to accommodate the preferences of different cognitive style groups. In this paper we investigate the scattering of journals and the literature obsolescence reflected in more than 137,000 document delivery requests submitted to a national document delivery service. We first summarize the major findings of the study with regard to the performance of the service. We then identify the "core" journals from which article requests were satisfied and address the following research questions: (a) Does the distribution of (core) journals conform to Bradford's Law of Scattering? (b) Is there a relationship between usage of journals and impact factors, with journals with high impact factors being used more often than the rest? (c) Is there a relationship between usage of journals and total citation counts, with journals with high total citation counts being used more often than the rest? (d) What is the median age of use (half-life) of requested articles in general? (e) Do requested articles that appear in core journals become obsolete more slowly? (f) Is there a relationship between obsolescence and journal impact factors, with journals with high impact factors becoming obsolete more slowly? (g) Is there a relationship between obsolescence and total citation counts, with journals with high total citation counts becoming obsolete more slowly? 
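The median age of use (half-life) asked about in question (d) can be computed directly from request records as the median difference between request year and publication year. The request data below are invented for illustration.

```python
from statistics import median

# Hypothetical document delivery requests: (request year, publication year)
requests = [
    (2003, 2001), (2003, 1995), (2003, 1999), (2004, 1990),
    (2004, 2003), (2004, 1996), (2004, 1980), (2003, 1997),
]

# Age of each requested article at the time of the request.
ages = [req_year - pub_year for req_year, pub_year in requests]
half_life = median(ages)  # median age of use, in years
```

Restricting `requests` to articles satisfied from core journals and recomputing the median is how question (e), whether core-journal articles become obsolete more slowly, would be tested.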
We found that the distribution of highly and moderately used journal titles conforms to Bradford's Law. The median age of use was 8 years for all requested articles. Ninety percent of the articles requested were 21 years old or younger. Articles that appeared in the 168 core journal titles seem to become obsolete slightly more slowly than those in all titles. We observed no statistically significant correlations between the frequency of journal use and ISI journal impact factors, or between the frequency of journal use and ISI (Institute for Scientific Information, Philadelphia, PA) cited half-lives, for the most heavily used 168 core journal titles. There was a weak correlation between usage of journals and ISI-reported total citation counts. No statistically significant relationship was found between median age of use and journal impact factors, or between median age of use and total citation counts. There was a weak negative correlation between ISI journal impact factors and the cited half-lives of the 168 core journals, and a weak correlation between ISI cited half-lives and use half-lives of the core journals. No correlation was found between the cited half-lives of the 168 core journals and their corresponding total citation counts as reported by ISI. Findings of the current study are discussed along with those of other studies. A survey of American Society for Information Science and Technology (ASIST) members was administered via the Web in May 2003. The survey gathered demographic data about members and their preferences and expectations in regard to conferences and other ASIST products and services. With a return rate of about 32%, the findings were compared with those of an earlier survey conducted in 1979, providing a glimpse of how the Society has changed and what needs to be done to ensure healthy future development. The gender split has remained the same, but members are about 5 years older on average than they were in 1979. 
A significant shift has occurred in members' institutional affiliations, from the largest group being in the industrial sector to the largest group being in educational institutions. Members on average reported slightly higher incomes (after adjusting for inflation) in 2003 than in 1979. Since 1979, a larger percentage of members have earned a doctoral degree. The most common field of study is library and information science. About half of the respondents reported that ASIST is their primary professional society. Their primary reason for maintaining ASIST membership is "learning about new developments/issues in the field." The most common responses to the question about what factors would make ASIST conferences more appealing related to lowering costs. Other responses related to attitudes about the ASIST Bulletin and the value of other proposed products and services are summarized and reported. Detailed analyses of relationships among different variables made possible a deeper understanding of members' needs and expectations, which provides directions for design of programs and services. When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how this may be studied with other assumptions. The "Cluster Hypothesis" asks an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters. Digital libraries such as the NASA Astrophysics Data System (Kurtz et al., 2005) permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses ("reads") of individual articles. 
We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences among the citation rate, the readership rate, and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites and develop two new bibliometric measures which have substantially less age bias than citations: SumProd, a weighted sum of total citations and the readership rate, intended to show the total productivity of an individual; and Read10, the readership rate for articles published in the last 10 years, intended to show an individual's current productivity. We also discuss the effect of normalization (dividing by the number of authors on a paper) on these statistics. We apply SumProd and Read10 using new, nonparametric techniques to compare the quality of different astronomical research organizations. We develop a context-based generic cross-lingual retrieval model that can deal with different language pairs. Our model considers contexts in the query translation process. 
Contexts in the query as well as in the documents, based on co-occurrence statistics from different granularities of passages, are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as well as retrieving English documents from Chinese queries. Extensive experiments have been conducted on a large-scale parallel corpus, enabling studies of retrieval performance in two different cross-lingual settings, for full-length documents as well as automated summaries. We present a model for estimating the probability that a pair of author names (sharing last name and first initial), appearing on two different Medline articles, refer to the same individual. The model uses a simple yet powerful similarity profile between a pair of articles, based on title, journal name, coauthor names, medical subject headings (MeSH), language, affiliation, and name attributes (prevalence in the literature, middle initial, and suffix). The similarity profile distribution is computed from reference sets consisting of pairs of articles containing almost exclusively author matches versus nonmatches, generated in an unbiased manner. Although the match set is generated automatically and might contain a small proportion of nonmatches, the model is quite robust against contamination with nonmatches. We have created a free, public service ("Author-ity": http://arrowsmith.psych.uic.edu) that takes as input an author's name given on a specific article, and gives as output a list of all articles with that (last name, first initial) ranked by decreasing similarity, with match probability indicated. Web navigation enables easy access to vast amounts of information and services. However, it also poses a major risk to users' privacy. 
Various eavesdroppers constantly attempt to violate users' privacy by tracking their navigation activities and inferring their interests and needs (profiles). Users who wish to keep their intentions secret forego useful services to avoid exposure. The computer security community has concentrated on improving users' privacy by concealing their identity on the Web. However, users may want or need to identify themselves over the Net to receive certain services, while still keeping their interests, needs, and intentions private. PRAW, the PRivAcy model for the Web suggested in this paper, is aimed at hiding users' navigation tracks to prevent eavesdroppers from inferring their profiles, while still allowing the users to be identified. PRAW is based on the continuous generation of fake transactions in various fields of interest, which confuse eavesdroppers' automated programs by feeding them false data. A privacy measure is defined that reflects the difference between a user's actual profile and the profile that eavesdroppers might infer. A prototype system was developed to examine PRAW's feasibility and to conduct experiments testing its effectiveness. Encouraging results and their analysis are presented, as well as possible attacks and known limitations. Introducing context into a user query is an effective way to improve search effectiveness. In this article we propose a method that employs taxonomy-based search services, such as Web directories, to facilitate searches in any Web search interface that supports Boolean queries. The proposed method enables one to convey the current search context on the taxonomy of a taxonomy-based search service to the searches conducted with the Web search interfaces. The basic idea is to learn the search context in the form of a Boolean condition that is commonly accepted by many Web search interfaces, and to use this condition to modify the user query before forwarding it to the Web search interfaces. 
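The basic idea of conjoining a learned Boolean context with the user's query can be sketched as below. This is a minimal illustration, not the article's learning algorithm: the required and excluded terms stand in for a context that would actually be learned from the taxonomy category, and the Boolean syntax shown is the generic `AND`/`NOT` form most Web search interfaces accept.

```python
# Minimal sketch of context-carrying query modification. The learned search
# context is assumed to be a Boolean condition of required/excluded terms;
# the terms below are illustrative, not output of the article's learner.

def modify_query(user_query, required_terms, excluded_terms):
    """Conjoin the user query with a Boolean condition encoding the
    current taxonomy category, before forwarding it to a search interface."""
    context = " AND ".join(f'"{t}"' for t in required_terms)
    negations = " ".join(f'NOT "{t}"' for t in excluded_terms)
    parts = [f"({user_query})"]
    if context:
        parts.append(f"({context})")
    if negations:
        parts.append(negations)
    return " AND ".join(parts)

q = modify_query("jaguar speed",
                 required_terms=["animal", "wildlife"],
                 excluded_terms=["car"])
print(q)  # → (jaguar speed) AND ("animal" AND "wildlife") AND NOT "car"
```

The modified query narrows results toward the user's current category (here, a hypothetical wildlife branch of a directory) without requiring any change to the target search interface.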
To guarantee that the modified query can always be processed by the Web search interfaces and to make the method adaptive to different user requirements on search result effectiveness, we have developed new fast classification learning algorithms. In this article, I analyze the role of Donald J. Urquhart in the creation of modern library and information science. Urquhart was one of the chief architects of information science in Britain and founder of the National Lending Library for Science and Technology (NLL), which evolved into the present-day British Library Document Supply Centre (BLDSC). In particular, I focus on the part played by Urquhart in the development of that branch of information science termed bibliometrics, the application of mathematical and statistical techniques to information phenomena, pursuing both historical and practical aims. The article is intended not only to trace the history of the probability distributions applicable to library use and other facets of human knowledge but also to demonstrate how these distributions can be used in the evaluation and management of scientific journal collections. For these purposes, the paper is divided into three parts of equal importance. The first part is statistical and establishes the theoretical framework, within which Urquhart's work is considered. It traces the historical development of the applicable probability distributions, discussing their origins on the European continent and how Continental principles became incorporated in the biometric statistics that arose in Britain as a result of the Darwinian revolution. This part analyzes the binomial and Poisson processes, laying out the reasons why the Poisson process is more suitable for modeling information phenomena. In doing so, it describes key distributions arising from these processes as well as the various tests for these distributions, citing the literature that shows how to conduct these tests. 
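One standard test from the literature on fitting such distributions, which the discussion above alludes to, is the variance-to-mean check: for a Poisson process the variance of use counts equals the mean, so an index of dispersion near 1 is consistent with Poisson-distributed use, while values well above 1 indicate the overdispersion often seen in library-use data. The circulation counts below are hypothetical.

```python
# Sketch of the variance-to-mean (index of dispersion) test for
# Poisson-distributed use counts; the counts below are hypothetical.
from statistics import mean, pvariance

def index_of_dispersion(counts):
    """Variance-to-mean ratio: ~1 suggests a Poisson process, while >1
    indicates overdispersion (e.g., a negative binomial model fits better)."""
    m = mean(counts)
    return pvariance(counts) / m if m else float("nan")

# hypothetical yearly use counts for 12 journal titles
uses_per_journal = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 4, 1]
d = index_of_dispersion(uses_per_journal)
print(round(d, 2))  # → 1.04, close to 1, i.e., consistent with Poisson
```

This is only a screening statistic; the formal goodness-of-fit tests cited in the article would follow such a check.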
Throughout the discussion, the relationship of these distributions to library use and the laws of information science is emphasized. The second part of the article analyzes the pioneering role of Urquhart as a conduit for the entry of these probability distributions into librarianship, converting it into library and information science. He was the first librarian to apply probability to library use, utilizing it not only to establish and manage the scientific journal collections of the NLL but also to evolve his Law of Supralibrary Use. Urquhart's work is portrayed within the context of a general trend to adopt probabilistic methods for analytical purposes, and a major premise of this article is that his law and the probabilistic breakthrough on which it was based were most likely to occur in Britain, which was one of the few countries not only to develop but also to maintain the necessary scientific preconditions. The third and concluding section discusses how Urquhart's Law forces a probabilistic reconceptualization of the functioning of the scientific journal system, as well as the law's practical implications for journal sales, collection evaluation and management, resource sharing, and the transition from the paper to the electronic format. Highly portable information collection and transmission technologies such as radio frequency identification (RFID) tags and smart cards are becoming ubiquitous in government and business, employed in functions including homeland security, information security, physical premises security, and even the control of goods in commerce. And, directly or indirectly, in many of these applications, it is individuals and their activities that are tracked. Yet, a significant unknown is (a) whether the public understands these technologies and the manner in which personally identifiable information may be collected, maintained, used, and disseminated; and (b) whether the public consents to these information practices. 
To answer these and related questions, we surveyed a select group of citizens on the uses of this technology for business as well as homeland security purposes. We found a significant lack of understanding, a significant level of distrust even in the context of homeland security applications, and a very significant consensus for governmental regulation. We conclude that a primary objective for any organization deploying these technologies is the promulgation of a comprehensive Technology Privacy Policy, and we provide detailed specifications for such an effort. The importance of Intelligence and Security Informatics (ISI) has significantly increased with the rapid and large-scale migration of local/national security information from physical media to electronic platforms, including the Internet and information systems. Motivated by the significance of ISI in law enforcement (particularly in the digital government context) and the limited investigations of officers' technology-acceptance decision-making, we developed and empirically tested a factor model for explaining law-enforcement officers' technology acceptance. Specifically, our empirical examination targeted the COPLINK technology and involved more than 280 police officers. Overall, our model shows a good fit to the data collected and exhibits satisfactory power for explaining law-enforcement officers' technology acceptance decisions. Our findings have several implications for research and technology management practices in law enforcement, which are also discussed. In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. 
The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm and further reduce the training set development effort. The active learning algorithm is seeded with a single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique to the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems. For the sake of national security, very large volumes of data and information are generated and gathered daily. Much of this data and information is written in different languages, stored in different locations, and may be seemingly unconnected. Crosslingual semantic interoperability is thus a major challenge in generating an overview of this disparate data and information so that it can be analyzed, shared, searched, and summarized. The recent terrorist attacks and the tragic events of September 11, 2001 have prompted increased attention to national security and criminal analysis. Many Asian countries and cities, such as Japan, Taiwan, and Singapore, have been advised that they may become the next targets of terrorist attacks. Semantic interoperability has been a focus in digital library research. Traditional information retrieval (IR) approaches normally require a document to share some common keywords with the query. 
Generating the associations for the related terms between the two term spaces of users and documents is an important issue. The problem can be viewed as the creation of a thesaurus. Apart from this, terrorists and criminals may communicate through letters, e-mails, and faxes in languages other than English. The translation ambiguity significantly exacerbates the retrieval problem. The problem is expanded to crosslingual semantic interoperability. In this paper, we focus on the English/Chinese crosslingual semantic interoperability problem. However, the developed techniques are not limited to English and Chinese languages but can be applied to many other languages. English and Chinese are popular languages in the Asian region. Much information about national security or crime is communicated in these languages. An efficient automatically generated thesaurus between these languages is important to crosslingual information retrieval between English and Chinese languages. To facilitate crosslingual information retrieval, a corpus-based approach uses the term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model to cross the language boundary. In this paper, the text-based approach to align English/Chinese Hong Kong Police press release documents from the Web is first presented. We also introduce an algorithmic approach to generate a robust knowledge base based on statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consisted of a thesaurus-like, semantic network knowledge base, which can aid in semantics-based crosslingual information management and retrieval. The September 11 attack and the following investigations show that there is a serious information sharing problem among the relevant federal government agencies, and the problem can cause substantial deficiencies in terrorism attack detection. 
In this paper we provide a systematic analysis of the causes of this problem and conclude that existing secure information sharing technologies and protocols cannot provide enough incentives for government agencies to share information with each other without worrying that their own interests may be jeopardized. Although trust-based information access is well studied in the literature, the existing trust models, which are based on certified attributes, cannot support effective information sharing among government agencies, which requires an interest-based trust model. To solve this information sharing problem, we propose an innovative interest-based trust model and a novel information sharing protocol, in which a family of information sharing policies is integrated, and information exchange and trust negotiation are interleaved with and interdependent upon each other. In addition, an implementation of this protocol is presented using the emerging technology of XML Web Services. The implementation is fully compatible with the Federal Enterprise Architecture reference models and can be directly integrated into existing E-Government systems. Since spring of 2002 we have been working on a methodology, decision model, and cognitive support system to aid in the effective allocation of anti-terrorism (AT) resources at Marine Corps installations. The work has so far been focused on the military domain, but the model and the software tools developed to implement it are generalizable to a range of commercial and public-sector settings, including industrial parks, corporate campuses, and civic facilities. The approach suggests that anti-terrorism decision makers determine mitigation project allocations using measures of facility priority and mitigation project utility as inputs to the allocation algorithm. 
The three-part hybrid resource allocation model presented here uses multi-criteria decision-making techniques to assess facility (e.g., building, hangar) priorities, a utility function to calculate anti-terrorism project mitigation values (e.g., protective glazing, wall coatings, and stand-off barriers), and optimization techniques to determine resource allocations across multiple, competing AT mitigation projects. The model has been realized in a cognitive support system developed as a set of loosely coupled Web services. The approach, model, and cognitive support system have been evaluated using the cognitive walkthrough method with prospective system users in the field. In this paper we describe the domain, the problem space, the decision model, the cognitive support system, and summary results of early model and system evaluations. A majority of critical decisions require collaborative efforts among analysts to build situation awareness. Teams of decision makers frequently have to react to incoming facts and developing events in a timely fashion such that the consequences of the decisions made largely have a positive impact on a developing situation. This problem is further exacerbated by the multitude of agencies involved in the decision-making process. Thus, the decision-making processes faced by the intelligence agencies are characterized by group deliberations that are highly ill structured and yield limited analytical tractability. In this context, a collaborative approach to providing cognitive support to decision makers using a connectionist modeling approach is proposed. The connectionist modeling of such decision scenarios offers several unique and significant advantages in developing systems to support collaborative discussions. Several inference rules for augmenting the argument network and capturing implicit notions in arguments are proposed. 
We further explore the effects of incorporating notions of information-source reliability within arguments. In this article, we investigate the criteria used by online searchers when assessing the relevance of Web pages for information-seeking tasks. Twenty-four participants were given three tasks each, and they indicated the features of Web pages that they used when deciding about the usefulness of the pages in relation to the tasks. These tasks were presented within the context of a simulated work-task situation. We investigated the relative utility of features identified by participants (Web page content, structure, and quality) and how the importance of these features is affected by the type of information-seeking task performed and the stage of the search. The results of this study provide a set of criteria used by searchers to decide about the utility of Web pages for different types of tasks. Such criteria can have implications for the design of systems that use or recommend Web pages. Today, most document categorization in organizations is done manually. At work, we save hundreds of files and e-mail messages in folders every day. While automatic document categorization has been widely studied, much challenging research still remains to support user-subjective categorization. This study evaluates and compares the application of self-organizing maps (SOMs) and learning vector quantization (LVQ) to automatic document classification, using a set of documents from an organization, in a specific domain, manually classified by a domain expert. After running the SOM and LVQ, we asked the user to reclassify documents that were misclassified by the system. Results show that despite the subjective nature of human categorization, automatic document categorization methods correlate well with subjective, personal categorization, and the LVQ method outperforms the SOM. 
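The core of the LVQ method compared above can be sketched with the basic LVQ1 update rule: the prototype nearest to a training sample is pulled toward it if their classes agree and pushed away otherwise. This toy version assumes documents have already been embedded as feature vectors; the two-dimensional synthetic data and class labels are illustrative, not the study's document set.

```python
# Toy sketch of LVQ1 (basic learning vector quantization), assuming
# documents are already embedded as feature vectors. Data is synthetic.
import random

def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=50):
    """Move the nearest prototype toward a sample of the same class,
    and away from a sample of a different class."""
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # index of nearest prototype by squared Euclidean distance
            i = min(range(len(prototypes)),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(x, prototypes[j])))
            sign = 1.0 if proto_labels[i] == y else -1.0
            prototypes[i] = [p + sign * lr * (a - p)
                             for a, p in zip(x, prototypes[i])]
    return prototypes

def lvq_classify(x, prototypes, proto_labels):
    i = min(range(len(prototypes)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])))
    return proto_labels[i]

random.seed(0)
# two synthetic "document" clusters in a 2-D feature space
class_a = [[random.gauss(0, 0.3), random.gauss(0, 0.3)] for _ in range(20)]
class_b = [[random.gauss(2, 0.3), random.gauss(2, 0.3)] for _ in range(20)]
samples, labels = class_a + class_b, ["a"] * 20 + ["b"] * 20
protos = lvq1_train(samples, labels, [[0.5, 0.5], [1.5, 1.5]], ["a", "b"])
print(lvq_classify([0.1, -0.1], protos, ["a", "b"]))  # → a
```

Unlike the SOM, which learns an unsupervised topological map, LVQ uses the class labels directly during training, which is one plausible reason for its stronger match to the expert's categorization.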
The reclassification process revealed an interesting pattern: about 40% of the documents were classified according to their original categorization, about 35% according to the system's categorization (the users changed the original categorization), and the remainder received a different (new) categorization. Based on these results we conclude that automatic support for subjective categorization is feasible; however, an exact match is probably impossible due to the users' changing categorization behavior. In this article, we describe a new means of presenting and analyzing information regarding recovering alcoholics. We show that reliability life-test techniques, specifically Weibull plots, can be used not only to gain an understanding of the recovery rate of alcoholics who are undergoing counseling during their recovery process, but also to present a visual indication of the speed and nature of their recovery. We show that by considering the movement of the recovering addicts over time, as a direct analogy with the degradation of samples on device life-test, it is possible to predict the time and path that will be taken for a recovering addict to reach a given recovery stage. We also show that, in a direct analogy with overstress life-tests, it is possible to predict the "time to recovery of addict" based upon the time taken to reach a "part-way" recovery step (or stage). The results also show that those presenting with different addiction types follow varying recovery models, as do those from different locations. This information could be particularly useful in providing appropriate support to those recovering from multiple addictions as well as those living in different locations. The results of this work have great potential for use in this and other therapeutic interventions, and in medical applications such as the observed stepwise recovery from major operations and RTAs (road traffic accidents). 
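The Weibull-plot technique borrowed from reliability life-testing can be sketched as follows: under a Weibull model, ln(-ln(1 - F(t))) is linear in ln(t), with slope equal to the shape parameter, so fitting a line to the transformed points both checks the model visually and estimates its parameters. The times-to-stage below are hypothetical, not the study's clinical data, and Bernard's median-rank approximation is one common (assumed) choice for estimating F(t).

```python
# Sketch of a Weibull-plot fit on hypothetical "time to reach a recovery
# stage" data; under a Weibull model the transformed points lie on a line
# whose slope is the shape parameter.
import math

def weibull_plot_points(times):
    """Median-rank estimates of F(t), transformed to Weibull-plot axes."""
    times = sorted(times)
    n = len(times)
    pts = []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        pts.append((math.log(t), math.log(-math.log(1.0 - f))))
    return pts

def fit_slope(pts):
    """Least-squares slope; on a Weibull plot this estimates the shape."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

# hypothetical weeks for 8 clients to reach a given recovery stage
times = [5, 8, 11, 14, 18, 22, 27, 35]
beta = fit_slope(weibull_plot_points(times))
print(round(beta, 2))
```

A shape parameter above 1 (as in this toy data) would indicate an increasing "hazard" of reaching the stage as time passes, which is the kind of qualitative reading the article's plots support.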
Hence the results of this work have the potential to affect the health of the nation by enabling appropriate patient support and, through the ensuing cost-effective support, enabling the maximum possible number of patients to be supported by the limited resources of the centralized health-care umbrella. Digital objects or entities present us with particular problems of an acute nature. The most acute of these are the issues surrounding what constitutes identity within the digital world and between digital entities. These are problems that are important in many contexts but, when dealing with digital texts, documents, and certification, an understanding of them becomes vital legally, philosophically, and historically. Legally, the central issues are those of authorship, authenticity, and ownership; philosophically, we must be concerned with the sorts of logical relations that hold between objects and with determining the ontological nature of the object; and historically, our concern centers around our interest in chronology and the recording of progress, adaptation, change, and provenance. Our purpose is to emphasize why questions of digital identity matter and how we might address and respond to some of them. We will begin by examining the lines along which we draw a distinction between the digital and the physical context and how, by importing notions of transitivity and symmetry from the domain of mathematical logic, we might attempt to provide at least interim resolutions of these questions. This small-scale empirical study focuses on students' anticipated and assessed contribution of references retrieved during the preparation of research proposals. It explores how the expected contribution of types of information before searches differs from the assessed contribution of relevant references found, by type of information. 
Twenty-two psychology undergraduates searched the PsycINFO database for references at the initial and end stages of a seminar for preparing proposals. Data about their subject knowledge, search goals, and utility assessments were collected using several methods. They were asked to predict and assess the utility of the information types provided by relevant references for the proposals. At the beginning of the process, they found fewer general types of information and more specific types of information than they expected. However, the students tended to accept references according to their expectations. By the end of the process, the expected importance of general information types declined and the importance of specific information types increased. At the end of the task, students became more proficient at recognizing the utility and topicality of references. They also became more critical in accepting found information to match their expectations. Use of citations and Web links embedded in online teaching materials was studied for an undergraduate course. The undergraduate students enrolled in Geographic Information Science for Geography and Regional Development used Web links more often than citations, but clearly did not see them as key to enhancing learning. Current conventions for citing and linking tend to make citations and links invisible. There is some evidence that citations and Web links categorized and highlighted in terms of their importance and the function they serve may help student learning in interdisciplinary domains. Most information systems share a common assumption: information seeking is discrete. Such an assumption neither reflects real-life information seeking processes nor conforms to the perspective of phenomenology that "life is a journey constituted by continuous acquisition of knowledge." Thus, this study develops and validates a theoretical model that explains successive search experiences for essentially the same information problem. 
The proposed model is called Multiple Information Seeking Episodes (MISE), and it consists of four dimensions: problematic situation, information problem, information seeking process, and episodes. Eight modes of multiple information seeking episodes are identified and specified with properties of the four dimensions of MISE. The results partially validate MISE, finding that the original MISE model is highly accurate but not sufficient for characterizing successive searches; all factors in the MISE model are empirically confirmed, but new factors are identified as well. The revised MISE model is shifted from the user-centered to the interaction-centered perspective, taking into account factors of searcher, system, search activity, search context, information attainment, and information use activities. Successive information searches are fairly common. To enhance understanding of this behavior, this study attempted to improve both the descriptive and explanatory power of the Multiple Information Seeking Episodes (MISE) model, a conceptual model characterizing factors affecting successive searches. It empirically observed how the key factors in the information seeking process in the MISE model evolve over multiple search sessions and explained how those factors are affected by other factors associated with searchers, search activity, search context, systems, information attainment, and information-use activities. The validated and enriched MISE model can be extended to serve as the basis for future studies of other complex search processes, such as multitasking and collaborative searches, and can also help identify problems that users face and thus derive requirements for system support. Six case studies of international cooperation at the subfield level are presented and compared. The cases examine international collaboration by detailing co-authorship links among researchers by field, evidenced at the level of the nation. 
Cases are offered based on possible drivers for collaboration: sharing ideas, cooperating around equipment, cooperating around resources, and exchanging data. Scientometric and network analyses of linkages are presented and discussed for each of the six cases: astrophysics, geophysics, mathematical logic, polymers, soil science, and virology. Visualizations of the cosine matrices within each field are compared for 1990 and 2000. The research shows that international collaboration grew in all the fields at rates higher than the international average. The possibility that rapid increases in international collaboration in science can be attributed in part to certain drivers related to access to resources or equipment sharing could not be upheld by the data. Other possible explanations for the rapid growth of collaboration are offered, including the possibility that the weak ties evidenced by geographically remote collaboration can promote new knowledge creation. This study, based on two empirical investigations undertaken in Croatia on samples of 320 eminent and 840 young researchers, compares the professional values/norms of these groups (the normative level of research ethics), as well as their perceptions of the frequency of ethically questionable and unacceptable behaviour of researchers in Croatian research institutions (the behavioural level of ethos). Science ethics includes a core of cognitive and social standards about which there is relatively high consensus in both groups of researchers. Their cognitive standards correspond to epistemological realism, with an accent on objective, reliable, measurable and precise new knowledge. Their basic social values include the broadest social responsibility, responsibility towards colleagues and students, and professionalism in relations with funders and/or clients. Thus, research ethos is a combination of traditional cognitive norms and new socially-engaged values. 
However, research ethics is not a static or homogeneous set of professional values and norms. Young scientists rate cognitive norms relating to basic research lower, but rank some cognitive standards more closely linked with applied empirical research higher. Considering the social dimensions of research ethics, young researchers rate the traditional academic values of collegiality, commonality and autonomy as less important than do eminent scientists, but they hold professionalism and the establishment of research networks to be more important. As expected, cognitive and social values and norms are not strictly followed at the level of professional practice. In their everyday professional life, eminent and young researchers encounter questionable research practices that could harm research work and results, and impair collegial relations in science, more often than they encounter breaches of social norms that harm or even threaten participants in and users of scientific professional work. Differences between the eminent and the young in perceiving the incidence of certain kinds of questionable behaviour may be attributed to their different professional positions and experience. Felsenstein's (1985) method of phylogenetic independent contrasts is probably the most commonly used technique in evolutionary biology to study the adaptation of organisms to their environment while taking phylogeny into account. Here, we performed a scientometric evaluation of all 1462 articles that cited Felsenstein (1985) between 1985 and 2002, in order to analyze the impact of his comparative method on the evolutionary research program and what has been done since its publication. We found that Felsenstein's (1985) article can be classified as a "hot paper" or a breakthrough contribution, since it was the most cited article published in The American Naturalist in 1985. It can also be considered a "citation classic", since it is the third most cited paper in The American Naturalist from 1945 to 2002. 
In general, papers that cited Felsenstein (1985) were published in high-impact journals, and most of them are theoretical articles, indicating that biologists are aware of statistical and conceptual problems in dealing with comparative methods. Programs in global environmental change research call for sweeping international cooperation and the creation of global networks. This paper analyzes to what extent research institutions in the field of global environmental change have responded to this call. Several bibliometric indicators of internationalization are discussed. A German and a U.S. sample are compared. The results indicate that a very discernible trend of recent internationalization can be observed. This is in line with a general internationalization trend across all fields, but at a much higher level. Given the political emphasis on capacity building in developing countries in this research field, however, there is only weak evidence of a more encompassing globalization process that also includes marginal world regions. Finally, the internationalization trend does not coincide with de-nationalization. The possible existence of specialisation patterns by research field across the Italian regions is investigated. Accordingly, bibliometric data on papers published in international scientific journals have been processed and tailored for regional comparative analysis. The results show that the trends in regional scientific specialisation are related to the research activities performed by each scientific system, but the regional industrial profile is also very often reflected in the corresponding scientific profile. The empirical evidence also shows that each Italian region works as a well-identifiable scientific system providing its own specific contribution to the national performance. The introduction of bibliometric (and other) rankings is an answer to legitimation pressures on the higher education and research system. 
After years of hesitation by scientists, science administrators and even politicians in many of the industrialized countries, the implementation of bibliometrics-based (and other types of) rankings for institutions of higher education and research is now being introduced on a full scale. What used to be an irritation to the parties concerned has suddenly become a fad. In contrast to this rather sudden enthusiasm, there is very little reflection on the impacts of this practice on the system itself. So far, empirical data on the impact of bibliometric rankings seem to be available only for two cases: Australia and the British research assessment exercise (RAE). Thus, the actual steering effects of bibliometric rankings, that is, the reactions of the system, are largely unknown. Rankings are in urgent demand by politics. The intended effect is to create competition among institutions of higher learning and research and thereby to increase their efficiency. The rankings are supposed to identify excellence in these institutions and among researchers. Unintended effects may be 'oversteering', either by forcing less competitive institutions to be closed down or by creating oligopolies whose once-achieved position of supremacy can no longer be challenged by competitors. On the individual level, the emergence of a kind of 'chart' of highly cited stars in science can already be observed (ISI HighlyCited.com). With the spread of rankings, the business administration paradigm and culture is diffused through the academic system. The commercialization of ranking is most pronounced in the dependence of the entire practice on commercial providers of the pertinent data. As products like ISI's Essential Science Indicators become available, their use in the context of evaluation tasks is increasing rapidly. The future of the higher education and research system rests on two pillars: traditional peer review and ranking. 
The goal must be to have a system of informed peer review which combines the two. However, the politicized use of numbers (citations, impact factors, funding, etc.) appears unavoidable. Ranking of research institutions by bibliometric methods is an improper tool for research performance evaluation, even at the level of large institutions. The problem, however, is not the ranking as such. The indicators used for ranking are often not advanced enough, and this situation is part of the broader problem of the application of insufficiently developed bibliometric indicators by persons who do not have clear competence and experience in the field of quantitative studies of science. After a brief overview of the basic elements of bibliometric analysis, I discuss the major technical and methodological problems in the application of publication and citation data in the context of evaluation. I then contend that the core of the problem does not necessarily lie on the side of the data producer. Quite often, persons responsible for research performance evaluation, for instance scientists themselves in their role as heads of institutions and departments, science administrators at the government level and other policy makers, show an attitude that encourages 'quick and dirty' bibliometric analyses even where better quality is available. Finally, the necessary conditions for a successful application of advanced bibliometric indicators as a support tool for peer review are discussed. A system of input, output, and efficiency indicators is sketched out, with each indicator related to basic research, applied research, and experimental development. This scheme is mainly inspired by empirical innovation economics (represented in Germany, e.g., by H. Grupp) and by "advanced bibliometrics" and scientometrics (profiled by van Raan and others). 
After considering the strengths and weaknesses of some of the indicators, possible additional "entry points" for institutions of information delivery are examined, thus contributing to an enrichment of existing indicators, and to a "Nationalökonomik des Geistes" as called for on behalf of librarians in the 1920s by A. von Harnack. This paper discusses the Thomson ISI Research Services Group's approaches to analyzing the world research environment, particularly in terms of comparing research performance among nations and institutions. The discussion concentrates on the recent research environment (1998-2002), beginning with comparisons among selected nations overall, in terms of publications, an indicator of research output and productivity, and citations, an indicator of research impact and influence. The second part addresses the German research landscape and concludes with an analysis of the contributions of specific German institutions to Germany's research performance. This paper outlines how private institutions and particularly foundations contribute to the furtherance of higher education and research, and it depicts what role bibliometric analysis can or cannot play in foundations' private research funding and in the process of strategic realignment under financial constraints. The year 2002 brought a successive funding change-over from the hitherto institutional funding to programme-oriented funding (POF) in the Helmholtz Association of German Research Centres (HGF). The 15 German research centres now have to generate their means successively from programmes of the research fields of the HGF (see: www.helmholtz.de) by competing with each other. This nucleus of the reform of the Association is being implemented based upon the opinion of international experts. In this context, the evaluation of publications of individual research centres and research groups will play an ever increasing part. 
This lecture reports on the reformed, partially formalized system and on first experiences with it at the time of the first evaluations. Co-authorship patterns derived from 1997-2001 data in the CSTPC Database (Chinese Science and Technology Papers and Citations Database) are analyzed to show the status of science and technology collaboration in China. Four different collaborative types are discussed, namely papers co-authored by authors in the same institution (SI), in different institutions located in the same region (SR), in different regions (DR) of China, and in different countries or regions of the world (DC). The regional and subject distributions of co-authored papers as well as the general status of collaboration in science and technology in China are studied. It is concluded that, for all four collaborative types, collaboration in science and technology has increased in China. Different regions have different collaborative patterns corresponding to their economic, technological and scientific development levels. Differences in collaborative patterns in terms of subjects are explained by different characteristics of the subjects themselves. A chronically weak area in research papers, reports, and reviews is the complete identification of the background documents that formed the building blocks for these papers. A method for systematically determining these seminal references is presented. Citation-Assisted Background (CAB) is based on the assumption that seminal documents tend to be highly cited. CAB is presently being applied to three application studies, and the results so far are far superior to those the first author has obtained for background development in any other study. An example of the application of CAB to the field of Nonlinear Dynamics is outlined. While CAB is a highly systematic approach for identifying seminal references, it is not a substitute for the judgement of the researchers, but serves as a supplement to it. 
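At its core, the CAB selection step reduces to filtering candidate references by citation count. The sketch below is only a minimal illustration of that idea, not the authors' implementation; the data, threshold, and function names are hypothetical.

```python
# Minimal sketch of the Citation-Assisted Background (CAB) idea:
# seminal background documents are assumed to be the most highly cited.
# All names, data, and the threshold are hypothetical illustrations.

def select_seminal(references, min_citations):
    """Return references whose citation count meets the threshold,
    ordered from most to least cited."""
    seminal = [r for r in references if r["citations"] >= min_citations]
    return sorted(seminal, key=lambda r: r["citations"], reverse=True)

candidates = [
    {"title": "Paper A", "citations": 1250},
    {"title": "Paper B", "citations": 47},
    {"title": "Paper C", "citations": 310},
]

for ref in select_seminal(candidates, min_citations=100):
    print(ref["title"], ref["citations"])
```

As the abstract stresses, such a filter can only supplement, never replace, the researchers' own judgement about which documents are truly seminal.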
The assessment of scientific journals is of particular interest to South Africa's higher education institutions, as their research is partly funded according to the number of publications by their members of staff. This article has two objectives. The first is to identify the effects of the government's withdrawal of financial support on these journals' impact factors. The second is to provide an assessment of the visibility of the South African journals indexed in the 2002 Journal Citation Reports (JCR). The findings indicate that the termination of government interference in the affairs of the journals had, on average, a beneficial effect on the impact factors of the journals. South Africa is found to have a good representation in the JCR, similar to or better than that of the scientifically small countries in Europe, and represents approximately 90% of the African continent's journals in the JCR. The visible scientific disciplines are identified, and the journals are assessed according to their impact factors, the impact factors of journals citing them, and their self-citing and self-cited rates. Recognizing the critical role played by science and technology in the development of fuel cells, this article aims to characterize the evolution of the S&T knowledge bases of fuel cells over the nineties, using data on patents and scientific publications. The field of fuel cells is particularly heterogeneous. It covers diverse sub-fields that are marked by idiosyncratic characteristics (e.g. actors, demand, and input) and different historical developments. Although this heterogeneity of the field of fuel cells is reflected in the dynamics of S&T knowledge generation within and across its sub-fields too, this article shows that it does not entail the absence of cognitive interrelations between their S&T knowledge bases. For that purpose, the article uses a "simultaneous mapping" approach to their S&T knowledge bases by means of textual analysis. 
This study evaluates the scientific output of Iran over the past two decades. The information was extracted by searching ISI in December 2003. Science production in Iran is reviewed (1967-2003) and compared with 15 countries for the year 2000. During these years Iran's relative share of the scientific output of the world increased from 0.0003% in 1970 to 0.29% in 2003. Comparing the ratio of science output to GNP, Iran stands in thirteenth place among 16 countries in the year 2000. The present article argues that Iran has experienced increasing growth in publishing articles since the Iraq-Iran war, which marks a period of stability and development. Using the method of bibliometrics, a 1999-2002 biochemistry and molecular biology database was constructed for China from the Science Citation Index Expanded (SCI-Expanded). Based on this database, the author quantitatively analyzed current research activity in biochemistry and molecular biology in China. Results show that almost half the publications were published in Chinese journals. The percentage of articles published by Chinese authors among the total articles from the world is increasing. The number of articles published in high-influence journals is continuously increasing. The research outputs are mainly located in Beijing, Shanghai and Hong Kong. The institutes of the Chinese Academy of Sciences and the national universities are the important locations for these studies. The collaboration rate of Chinese output is low compared to that of other countries. The USA and Japan are the main international collaborating countries. This paper attempts to highlight the scientific productivity, productivity age, collaboration trends, and domains of contribution of eight Nobel laureates, past and present, belonging to different domains of research in science. It also attempts to document the various factors that affect the productivity of scientists. 
No Nobel laureate can be compared with another, as they are an altogether different class of scientific elite and each piece of research is unique in itself. The measurement of scientific and technical activities has involved a lengthy process of developing the appropriate metrics and assigning standards and benchmarks for their usage. Although some studies have addressed issues of the management of science and technology and their relation to scientometrics and informetrics, there is nevertheless a need to consider the linkages between the conceptual background of scientific generation and progress and the measurement of its process and outcomes. This paper first reviews the three main approaches to the generation and progress of human knowledge in general and scientific activity in particular. These approaches are reviewed in terms of the demands they would make on the measurement of scientific process and outputs. The paper then examines the currently used categories of metrics and arrives at several conclusions. The paper provides an analysis of these conclusions and their implications for the generation and utilization of metrics of science and its outcomes. The review of the conceptual or philosophical foundations for the measurement of science offers an in-depth examination, resulting in the correlation of these foundations with the metrics we now use to measure science and its outcomes. The paper suggests research directions for a much needed link between theories of science and knowledge and the application of the metrics used to measure them. Finally, the paper offers several hypotheses and proposes potential empirical studies. Correlation between diabetes-related publication output and diabetes prevalence was sought and found in a sample of world countries and in the states of the US. Various correlation patterns ("demand driven research", "research driven prevention", no correlation) were distinguished and interpreted. 
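A correlation analysis of this kind reduces, at its core, to computing a correlation coefficient between two per-country (or per-state) series. The sketch below illustrates that computation with hypothetical figures; the abstract does not specify which coefficient the authors used, so Pearson's r is assumed here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: diabetes prevalence (%) and diabetes-related
# publication counts for five fictitious countries.
prevalence = [4.1, 6.3, 8.0, 5.2, 9.5]
papers = [120, 210, 340, 150, 400]

# A value near +1 would be consistent with the "demand driven research"
# pattern; near 0, with the "no correlation" pattern.
r = pearson(prevalence, papers)
```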
The increasing amount of publicly available literature and experimental data in biomedicine makes it hard for biomedical researchers to stay up-to-date. Genescene is a toolkit that will help alleviate this problem by providing an overview of published literature content. We combined a linguistic parser with Concept Space, a co-occurrence-based semantic net. Both techniques extract complementary biomedical relations between noun phrases from MEDLINE abstracts. The parser extracts precise and semantically rich relations from individual abstracts. Concept Space extracts relations that hold true for the collection of abstracts. The Gene Ontology, the Human Genome Nomenclature, and the Unified Medical Language System are also integrated in Genescene. Currently, they are used to facilitate the integration of the two relation types, and to select the more interesting and high-quality relations for presentation. A user study focusing on p53 literature is discussed. All MEDLINE abstracts discussing p53 were processed in Genescene. Two researchers evaluated the terms and relations from several abstracts of interest to them. The results show that the terms were precise (precision 93%) and relevant, as were the parser relations (precision 95%). The Concept Space relations were more precise when selected with ontological knowledge (precision 78%) than without (60%). The purpose of this research is to capture, understand, and model the process used by bioinformatics analysts when facing a specific scientific problem. Integrating information behavior with task analysis, we interviewed 20 bioinformatics experts about the process they follow to conduct a typical bioinformatics analysis, a functional analysis of a gene, and then used a task analysis approach to model that process. We found that each expert followed a unique process in using bioinformatics resources, but with significant similarities to their peers. 
We synthesized these unique processes into a standard research protocol, from which we developed a procedural model that describes the process of conducting a functional analysis of a gene. The model protocol consists of a series of 16 individual steps, each of which specifies details for the type of analysis, how and why it is conducted, the tools used, the data input and output, and the interpretation of the results. The linking of information behavior and task analysis research is a novel approach, as it provides a rich high-level view of information behavior while providing a detailed analysis at the task level. In this article we concentrate on the latter. In this article we present an in silico method that automatically assigns putative functions to DNA sequences. The annotations are at an increasingly conceptual level, up to identifying general biomedical fields to which the sequences could contribute. This bioinformatics data-mining system makes substantial use of several resources: a locally stored MEDLINE(R) database; a manually built classification system; the MeSH(R) taxonomy; relational technology; and bioinformatics methods. Knowledge is generated from various data sources by using well-defined semantics and by exploiting direct links between them. A two-dimensional "Concept Map(TM)" displays the knowledge graph, which allows causal connections to be followed. The use of this method has been valuable and has saved considerable time in our in-house projects, and it can be generally exploited for any sequence-annotation or knowledge-condensation task. We present a novel application of knowledge discovery technology to a developing and challenging application area, bioinformatics. This methodology allows the identification of relationships between low-magnitude similarity (LMS) sequence patterns and other well-contrasted protein characteristics, such as those described by database annotations. 
We start with the identification of these signals inside protein sequences by exhaustive database searching and automatic pattern recognition strategies. In a second step we address the discovery of association rules that allow us to tag sequences holding LMS signals with consequent functional keywords. We have designed our own algorithm for discovering association rules, meeting the special requirements of bioinformatics problems, where the patterns we search for lie in sparse datasets and are uncommon and thus difficult to locate. Computational efficiency has been verified with both synthetic and real biological data, showing that the algorithm is well suited to this application area compared to state-of-the-art algorithms. The usefulness of the method is confirmed by its ability to produce previously unknown and useful knowledge in the area of biological sequence analysis. In addition, we introduce a new and promising application of the rule extraction algorithm to gene expression databases. A variety of biological data is transferred and exchanged in overwhelming volumes on the World Wide Web. How to rapidly capture, utilize, and integrate the information on the Internet to discover valuable biological knowledge is one of the most critical issues in bioinformatics. Many information integration systems have been proposed for integrating biological data. These systems usually rely on an intermediate software layer called wrappers to access connected information sources. Wrapper construction for Web data sources is often hand-coded to accommodate the differences between Web sites. However, programming a Web wrapper requires substantial programming skill, and is time-consuming and hard to maintain. In this article we provide a solution for rapidly building software agents that can serve as Web wrappers for biological information integration. 
We define an XML-based language called Web Navigation Description Language (WNDL) to model a Web-browsing session. A WNDL script describes how to locate, extract, and combine the data. By executing different WNDL scripts, we can automate virtually all types of Web-browsing sessions. We also describe IEPAD (Information Extraction based on Pattern Discovery), a data extractor based on pattern discovery techniques. IEPAD allows our software agents to automatically discover the extraction rules needed to extract the contents of a structurally formatted Web page. With a programming-by-example authoring tool, a user can generate a complete Web wrapper agent by browsing the target Web sites. We built a variety of biological applications to demonstrate the feasibility of our approach. Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecules, such as carbohydrate and nucleic acid structures. Bioinformatics is a rapidly growing field, and educational programs for bioinformatics are increasing at a similar pace to answer the demand for qualified professionals. Here we survey currently available bioinformatics programs. 
We have compiled summaries of these programs, including university, state, degree type, department, entrance requirements, degree requirements, links to course Web pages, research interests, and funding. Complete details are presented in the Web version, and an abbreviated listing of the primary attributes of all programs is included in this article. The National Center for Biotechnology Information (NCBI) provides access to more than 30 publicly available molecular biology resources, offering an effective discovery space through high levels of data integration among large-scale data repositories. The foundation for many services is GenBank(R), a public repository of DNA sequences from more than 133,000 different organisms. GenBank is accessible through the Entrez retrieval system, which integrates data from the major DNA and protein sequence databases, along with resources for taxonomy, genome maps, sequence variation, gene expression, gene function and phenotypes, protein structure and domain information, and the biomedical literature via PubMed(R). Computational tools allow scientists to analyze vast quantities of diverse data. The BLAST(R) sequence similarity programs are instrumental in identifying genes and genetic features. Other tools support mapping disease loci to the genome, identifying new genes, comparing genomes, and relating sequence data to model protein structures. A basic research program in computational molecular biology enhances the database and software tool development initiatives. Future plans include further data integration, enhanced genome annotation and protein classification, additional data types, and links to a wider range of resources. National Library of Medicine (NLM) extramural programs in bioinformatics are described in the context of National Institutes of Health (NIH) funding mechanisms and illustrated through a sampling of recently funded grants. 
The NIH application, evaluation, and funding process is described as used by the NLM. Peer-to-Peer (P2P) networks provide a new distributed computing paradigm on the Internet for file sharing. The decentralized nature of P2P networks fosters both cooperative and non-cooperative behaviors in sharing resources. Searching is a major component of P2P file sharing. Several studies have reported on the nature of queries submitted to World Wide Web (WWW) search engines, but studies on queries in P2P networks have not yet been reported. In this report, we present our study of the Gnutella network, a decentralized and unstructured P2P network. We found that the majority of Gnutella users are located in the United States. Most queries are repeated. This may be because the hosts of the target files connect to or disconnect from the network at any time, so clients resubmit their queries. Queries are also forwarded from peer to peer. Findings are compared with the data from two other studies of Web queries. Queries in the Gnutella network are longer than those reported in the studies of WWW search engines. Queries with the highest frequency are mostly related to the names of movies, songs, artists, singers, and directors. Terms with the highest frequency are related to file formats, entertainment, and sexuality. This study is important for the future design of applications, architecture, and services of P2P networks. With the rise of interactive search engines, direct access to information is available to the end user. Evaluation of these systems is clearly important for their comparison and development. From the perspective of the user engaged with the system in some information-seeking task, the range of relevant evaluation factors could be considerable. This article presents an investigation into the suitability of the repertory grid technique for eliciting a mental model of search engines in the user. 
Such a model comprises constructs held important by the user, which can be used to evaluate the system. The design of the repertory grid application is described and the data analyzed in two stages. The important issues in determining the suitability of the method are identified, and the constructs are analyzed to determine their association, discriminatory ability, and clustering around a central overall rating. From this, it is concluded that the repertory grid technique is appropriate for user-centered determination of evaluative constructs. Further analysis of the characteristics of the resulting construct set is undertaken towards establishing an underlying user model of the search engines, and further research is identified for its use in system evaluation. The essential characteristic of knowledge workers is that they use information to subsequently produce information. Hence, information seeking is a central aspect of a knowledge worker's work life. In a corporate research laboratory environment, this is even more pronounced because the results produced are often in the form of more information, such as publications, tech reports, patent applications, or the embodiment of these into prototypes. The practices and expectations regarding information seeking and collaboration are fundamental to productive research in a corporate setting. To this end, a survey research project sampled researchers from selected labs of Hewlett-Packard and Compaq Computer shortly after their merger. This survey examined researchers' usage of information sources, their preferred means of information seeking, and the types of information assets they produced. Findings indicated that participants relied heavily on the Internet and other Web-based resources, more so than on their colleagues inside the company. Participants chose which information resources to use based on the time it took them to track down the information as well as the authoritativeness of the sources. 
Most information assets were generated collaboratively by teams rather than by individuals. Findings suggested that behavior was affected by the unstable environment resulting from the merger and the process of integrating the two research organizations. The rapid growth of the non-English-speaking Internet population has created a need for better searching and browsing capabilities in languages other than English. However, existing search engines may not serve the needs of many non-English-speaking Internet users. In this paper, we propose a generic and integrated approach to searching and browsing the Internet in a multilingual world. Based on this approach, we have developed the Chinese Business Intelligence Portal (CBizPort), a meta-search engine that searches for business information on mainland China, Taiwan, and Hong Kong. Additional functions provided by CBizPort include encoding conversion (between Simplified Chinese and Traditional Chinese), summarization, and categorization. Experimental results of our user evaluation study show that the searching and browsing performance of CBizPort was comparable to that of regional Chinese search engines, and that CBizPort could significantly augment these search engines. Subjects' verbal comments indicate that CBizPort performed best in terms of analysis functions, cross-regional searching, and user-friendliness, whereas regional search engines were more efficient and more popular. Subjects especially liked CBizPort's summarizer and categorizer, which helped in understanding search results. These encouraging results suggest a promising future for our approach to Internet searching and browsing in a multilingual world. 
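Encoding conversion between Simplified and Traditional Chinese is, at its simplest, a character-level mapping. The sketch below is only a hypothetical illustration of that idea, not CBizPort's implementation; production converters must also handle multi-character terms and one-to-many character mappings, and the tiny table here covers just a few example characters.

```python
# Hypothetical character-level sketch of Simplified -> Traditional conversion.
# Real converters (as in CBizPort) also handle multi-character words and
# ambiguous one-to-many mappings; this table is a minimal illustration.
S2T = {
    "体": "體",
    "国": "國",
    "学": "學",
    "汉": "漢",
}

def simplified_to_traditional(text):
    """Map each character through the table, leaving unknown characters as-is."""
    return "".join(S2T.get(ch, ch) for ch in text)

print(simplified_to_traditional("汉学"))  # prints "漢學"
```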
Hypothesizing that workplace significantly affects information-seeking patterns, this study compared accessibility and use of information sources among 233 Israeli computer scientists and software engineers, employed in industry and academy, using a mail questionnaire, which yielded a usable reply rate of 33%. The two groups were found to differ significantly in age, education, seniority, and type of research they performed (basic vs. applied). Printed textbooks, professional journals, and oral discussions with colleagues or experts in the organization were common to both groups, topping almost all lists of accessibility and use. For most information sources, however, the two groups differed significantly and consistently. Printed professional journals as well as printed and electronic conference or meeting papers were consistently more accessible and more often used by the academy group, while the industry group reported greater access to and more frequent use of electronic textbooks and trade or promotional literature. In regard to handbooks and standards, in-house technical reports (printed), government technical reports (Internet), librarians and technical specialists (Internet), and oral discussions with supervisors, no significant differences in accessibility were found, but their use by the industry group was much higher. In both groups, accessibility was only partly related to use, and more so among the academy than the industry group. An incomplete bibliography (or, more generally, an incomplete Information Production Process (IPP)) can be considered as a sample from a complete one. Sampling can be done in the sources or in the items. The simplest sampling technique is the systematic one where every k(th) source or k(th) item is taken (alternatively: deleted) (k is an element of N). In this paper we give a definition of systematic sampling in items and sources in the framework of an IPP in which we have continuous variables. 
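The systematic-sampling definition above (take, or alternatively delete, every k-th source or item) has a direct discrete analogue that can be sketched in a few lines; the paper itself develops the continuous-variable version within an IPP.

```python
def systematic_sample(seq, k, delete=False):
    """Systematic sampling: keep every k-th element (1-based indexing).
    With delete=True, the k-th elements are removed instead, mirroring the
    'taken (alternatively: deleted)' variant in the definition above."""
    if delete:
        return [x for i, x in enumerate(seq, start=1) if i % k != 0]
    return [x for i, x in enumerate(seq, start=1) if i % k == 0]

# Example: every 3rd item from a run of 10 items
items = list(range(1, 11))
kept = systematic_sample(items, 3)          # the sampled items
remainder = systematic_sample(items, 3, delete=True)  # the deleted-variant result
```

In the IPP setting the same operation can be applied either to the sources or to the items they produce; the theorem discussed next concerns when these two samplings coincide.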
We prove the theorem that in such IPPs we have a Lotkaian size-frequency function (i.e. a decreasing power function) if and only if systematic sampling in sources is the same as systematic sampling in items. In this proof we use the well-known characterization of power functions as scale-free functions. In order to model the variable T (the age of citations received by scientific works) with data elaborated by the Institute for Scientific Information, we have used some of the instruments already developed in survival models for this type of retrospective analysis in the presence of censored data. This analysis is used because, usually, the citations of ages greater than or equal to 10 years appear added together. For a set of journals related to the field of Applied Economics, we have explored which models fit better among those commonly used. Two different approaches to assess the goodness-of-fit for each selected model have been suggested: an analysis through graphical methods and a formal analysis to estimate the parameters of each model by the method of maximum likelihood estimation with data censored to the right. Klaus Fuchs, during his years in England as an immigrant, wrote 20 scientific papers. One of these papers, published in 1938, became a fundamental text in solid state physics and for the development of microelectronics in succeeding decades. It was cited more than 1200 times in the period from 1945 until 2003. It appears to be a typical case of delayed recognition in science. Pioneering papers simultaneously written by Hahn & Straßmann and by Meitner & Frisch on the discovery of nuclear fission are considered for comparison. The medical specialty of "tropical medicine" only dates back a little more than 100 years and, in the meantime, has gone through several quite distinctive eras. The aim of our study was to investigate trends that occurred in the leading literature on tropical medicine over the past 50 years. 
We analysed 2,802 original articles published in 1952, 1962, 1972, 1982, 1992 and 2002 in five of the high impact factor journals, namely (i) Acta Tropica, (ii) American Journal of Tropical Medicine and Hygiene, (iii) Annals of Tropical Medicine and Parasitology, (iv) Leprosy Review, and (v) Transactions of the Royal Society of Tropical Medicine and Hygiene. Authors' country affiliations were categorized according to the human development index 2003 (HDI), with stratification into low, medium and high HDI. We observed the following trends: First, there was a strong increase in the number of articles published from 250 in 1952 to 726 in 2002. Second, over the same time span, the median number of authors per article increased from 1 (four journals) or 2 (American Journal of Tropical Medicine and Hygiene) to 2.5 (Leprosy Review) up to 6 (Acta Tropica and American Journal of Tropical Medicine and Hygiene). Third, research collaborations between countries of different HDI ranks increased concomitantly (in 2002, 19.4-43.7% of all manuscripts comprised authors from different HDI countries), indicating that tropical medicine has become a global endeavour. However, in four of the five journals investigated, the overall percentage of researchers affiliated with low HDI countries decreased over the past 50 years and only a slight positive trend can be observed over the last decade. In conclusion, current roadblocks should be identified and programmes designed and implemented to enhance equity of publishing in tropical medicine. This in turn might be an important step forward to substantially reduce the current burden of tropical diseases, so that social and economic development in the tropics and subtropics can be advanced and poverty alleviated. We present empirical data on frequency and pattern of misprints in citations to twelve high-profile papers. We find that the distribution of misprints, ranked by frequency of their repetition, follows Zipf's law. 
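The rank-frequency regularity reported for repeated misprints can be illustrated with synthetic data (the stream below is invented, not the authors' corpus): under an ideal Zipf law, frequency times rank is roughly constant.

```python
from collections import Counter

def rank_frequency(observations):
    """Count each distinct misprint and return the counts sorted in
    descending order, i.e., frequency by rank (rank 1 = most repeated)."""
    counts = Counter(observations)
    return sorted(counts.values(), reverse=True)

# Synthetic misprint stream in which a few variants dominate,
# constructed so that freq(rank) * rank is exactly constant:
stream = ["p.123"] * 12 + ["p.132"] * 6 + ["p.213"] * 4 + ["p.321"] * 3
freqs = rank_frequency(stream)
products = [f * (r + 1) for r, f in enumerate(freqs)]  # freq * rank
```

A flat `products` list is the signature of an ideal Zipf curve; real citation data would only approximate this.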
We propose a stochastic model of the citation process, which explains these findings, and leads to the conclusion that about 70-90% of scientific citations are copied from the lists of references used in other papers. This paper studied the intellectual structure of urban studies through a co-citation analysis of its thirty-eight representative journals from 1992 to 2002. Relevant journal co-citation data were retrieved from Social SciSearch, and were subjected to cluster analysis, multidimensional scaling, and factor analysis. A cluster-enhanced two-dimensional map was created, showing a noticeable subject variation along the horizontal axis depicting four clusters of journals differentiated into mainstream urban studies, regional science and urban economics, transportation, and real estate finance. The cluster of the mainstream urban studies journals revealed a higher degree of interdisciplinarity than other clusters. The four-factor solution, though not a perfect match for the cluster solution, demonstrated the interrelationships among the overlapping journals loaded high on different factors. The results also showed a strong negative correlation between the coordinates of the horizontal axis and the mean journal correlation coefficients reflecting the subject variation, and a less revealing positive correlation between the coordinates of the vertical axis and the mean journal correlation coefficients. Core/periphery scientific communication is important for information transfer in the terrorism literature. The mutual awareness between disciplinary journal contributors in the mainstream and those in the margins of the field enhances their social interaction. The usual case is that the mainstream of a discipline is visible through such indexes as the Science Citation Index (SCI) and the Journal Citation Reports (JCR), the latter of which assigns an impact factor to the most cited journals. 
In the terrorism subject area, however, the reverse situation exists; only the peripheral journals in this field are indexed in JCR. From a scientific communication perspective, then, the core journals of terrorism writings are relatively invisible. This study attempts to identify the core and the periphery of journals dealing with terrorism, and suggests a way to bring them closer together. The assumption is that the quality and quantity of work in this field will increase as the distance between these two poles decreases. Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity with increases in session and query length, (2) that with 70% of session durations at 5 minutes or less, interaction is becoming more frequent but is also happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines. Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. 
The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR. The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have, or will be, tested on these different subsets. Latent Semantic Indexing (LSI), when applied to semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. 
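The "total reciprocal document rank" quoted for the PPR system above belongs to the reciprocal-rank family of evaluation metrics. As a hedged sketch, here is mean reciprocal rank, a closely related standard measure (not necessarily the exact metric used in that article):

```python
def mean_reciprocal_rank(ranks):
    """ranks: for each question, the 1-based rank of the first correct answer,
    or None when no correct answer was returned. Each question contributes
    1/rank (0 for misses); the score is the mean over all questions."""
    scores = [0.0 if r is None else 1.0 / r for r in ranks]
    return sum(scores) / len(scores)

# Four questions: answered at ranks 1, 2, not at all, and 4
score = mean_reciprocal_rank([1, 2, None, 4])
```

A score near 0.2, like the one reported on the TREC-8 corpus, roughly means the first correct answer appears around rank 5 on average.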
A new dual probability model based on similarity concepts is introduced to provide a deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows a Zipf distribution, indicating that LSI dimensions represent latent concepts. Document frequency of words follows a Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis. The nature of the contents of academic Web sites is of direct relevance to the new field of scientific Web intelligence, and for search engine and topic-specific crawler designers. We analyze word frequencies in national academic Webs using the Web sites of three English-speaking nations: Australia, New Zealand, and the United Kingdom. Strong regularities were found in page size and word frequency distributions, but with significant anomalies. At least 26% of pages contain no words. High frequency words include university names and acronyms, Internet terminology, and computing product names: not always words in common usage away from the Web. A minority of low frequency words are spelling mistakes, with other common types including nonwords, proper names, foreign language terms or computer science variable names. Based upon these findings, recommendations for data cleansing and filtering are made, particularly for clustering applications. Nimble competitors competing in a dynamic global marketspace increasingly characterize the current environment faced by many organizations. 
Providing the organization's knowledge workers with the tools and technology to mine information and generate insights has become a key issue facing organizations. In this study we investigate the potential combined impact of the use of organizational decision models and competitive intelligence tool proficiency on knowledge creation and strategic use of information competence. Regression analysis results show significant main and interaction effects of the organizational decision models and competitive intelligence tool proficiency on four identified factors of knowledge creation and strategic use of information competence: pattern discovery, strategy appraisal, insight generation, and solution formulation. Organizational implications and future research directions are discussed. Implicit knowledge and "tacit knowledge" in Knowledge Management (KM) are important, often synonymous, terms. In KM they often refer to private or personal knowledge that needs to be made public. The original reference of "tacit knowledge" is to the work of the late scientist and philosopher, Michael Polanyi (Polanyi, 1969), but there is substantial evidence that the KM discourse has poorly understood Polanyi's term. Two theoretical problems in Knowledge Management's notion of "implicit knowledge," which undermine empirical work in this area, are examined. The first problem involves understanding the term "knowledge" according to a folk-psychology of mental representation as a model of expression. The second is epistemological and social: understanding Polanyi's term, tacit knowing, as a psychological concept instead of as an epistemological problem, in general, and one of social epistemology and of the epistemology of the sciences, in particular. Further, exploring Polanyi's notion of tacit knowing in more detail yields important insights into the role of knowledge in science, including empirical work in information science. 
This article has two parts: first, there is a discussion of the folk-psychology model of representation and the need to replace this with a more expressionist model. In the second part, Polanyi's concept of tacit knowledge in relation to the role of analogical thought in expertise is examined. The works of philosophers, particularly Harré and Wittgenstein, are brought to bear on these problems. Conceptual methods play several roles in information science that cannot satisfactorily be performed empirically at all or alone. Among these roles, such methods may examine historical issues, they may critically engage foundational assumptions, and they may deploy new concepts. In this article the last two roles are examined. ADEPT is a 5-year project whose goals are to develop, deploy, and evaluate inquiry learning capabilities for the Alexandria Digital Library, an extant digital library of primary sources in geography. We interviewed nine geography faculty members who teach undergraduate courses about their information seeking for research and teaching and their use of information resources in teaching. These data were supplemented by interviews with four faculty members from another ADEPT study about the nature of knowledge in geography. Among our key findings are that geography faculty are more likely to encounter useful teaching resources while seeking research resources than vice versa, although the influence goes in both directions. Their greatest information needs are for research data, maps, and images. They desire better searching by concept or theme, in addition to searching by location and place name. They make extensive use of their own research resources in their teaching. Among the implications for functionality and architecture of geographic digital libraries for educational use are that personal digital libraries are essential, because individual faculty members have personalized approaches to selecting, collecting, and organizing teaching resources. 
Digital library services for research and teaching should include the ability to import content from common office software and to store content in standard formats that can be exported to other applications. Digital library services can facilitate sharing among faculty but cannot overcome barriers such as intellectual property rights, access to proprietary research data, or the desire of individuals to maintain control over their own resources. Faculty use of primary and secondary resources needs to be better understood if we are to design successful digital libraries for research and teaching. Homeopathy has been applied to clinical use since it was first presented 200 years ago. A bibliometric analysis of this topic does not yet exist in the literature. The objective of this study is to conduct a bibliometric analysis of all homeopathy-related publications in the Science Citation Index (SCI). A systematic search was performed using the SCI for publications during the period of 1991 to 2003. Selected documents included 'Homoeopathy, Homoeopathic, Homeopathy, or Homeopathic' as a part of the title, abstract, or keywords. Analyzed parameters included authorship, patterns of international collaboration, journal, language, document type, research address, number of times cited, and reprint author's address. Citation analysis was mainly based on the impact factor as defined by the Journal Citation Reports (JCR) and on citations per publication (CPP), which is used to assess the impact relative to the entire field and is defined as the ratio between the average number of citations per publication in a certain period. Of total articles, 49% had a single author. The UK, the US, and Germany produced 71% of the total output, while European countries as a whole also contributed 65% of the total share of independent publications. 
English remained the dominant language, though it comprised only 76% of publications, while German contributed 18% and the remainder was distributed among 8 other European languages. More document types and languages, and fewer pages, have appeared in homeopathy research. 3.5% of papers were cited more than 10 times in the three years after publication, and 60% were never cited. Small-group collaboration was a popular mode of co-authorship. The top 3 ranking countries of publication were the UK, the US, and Germany. The US dominated citation, followed by the UK, and then Germany. In addition, a simulation model was applied to describe the relationship between the cumulative number of citations and the paper life. According to the discrete model of the periodical publication process, recurrence formulae for the parameters of the process are obtained and the initial conditions for controlling process parameters from one steady state to another are deduced. Using the variable separation approach, which is generally used to solve partial differential equations, the recurrence computing formula of the publication probability function is deduced. First the publication delay increasing process caused by an increase in the flux of accepted contributions is simulated, and then the publication delay decreasing processes under four different control means are simulated too. Finally it is demonstrated that the periodical publishing process is a strong inertia system and it is found that reducing the quantity of deposited contributions can shorten the publication delay. We examine the diffusion of information and communication technologies (ICTs) in the knowledge production sectors of three developing areas. Using interviews with 918 scientists in one South Asian and two African locations, we address three fundamental questions: (1) To what degree has the research community in the developing world adopted the Internet? (2) How can the disparities in Internet adoption best be characterized? 
(3) To what extent is Internet use associated with research productivity? Our findings indicate that while the vast majority of scientists describe themselves as current email users, far fewer have ready access to the technology, use it in diverse ways, or have extensive experience. These results are consistent with the notion that Internet adoption should not be characterized as a single act on the part of users. The rapid development of the Internet and the cumulative skills required for its effective use are equally important, particularly with respect to its impact on productivity. These findings lead us to qualify crude generalizations about the diffusion of the Internet in developing areas. The purpose of this paper is to investigate whether geographical concentration can act as a supplement to the Journal Impact Factor (JIF). The results indicate that the use of a geographical concentration measure opens up new possibilities for analyses of the development of geographic dispersion over time. In contrast to measures used in earlier studies, the precise strength of the geographical concentration index as a measure of dispersion is that it represents dispersion as a single value that can be followed over time. The results show a wider geographic distribution of European economics journals in the 1980s compared to the American economics journals, whereas there seems to be no difference in geographic dispersion in the 1990s. The paper assesses the empirical foundation of two largely held assumptions in science policy making, namely scale and agglomeration effects. According to the former effect, scientific production may be subject to increasing returns to scale, defined at the level of administrative units, such as institutes or departments. A rationale for concentrating resources on larger units clearly follows from this argument. 
According to the latter, scientific production may be positively affected by external economies at the geographical level, so that concentrating institutes in the same area may improve scientific spillover, linkages and collaborations. Taken together, these arguments have implicitly or explicitly legitimated policies aimed at consolidating institutes in public sector research and at creating large physical facilities in a small number of cities. The paper is based on the analysis of two large databases, built by the authors from data on the activity of the Italian National Research Council in all scientific fields and of the French INSERM in biomedical research. Evidence from the two institutions is that the two effects do not receive empirical support. The implications for policy making and for the theory of scientific production are discussed. This paper addresses the issue of relevancy when tackling the problem of the evaluation of research published in Social Science journals. This evaluation initially relies on a critical selection of the databases scientists use. To implement relevant disciplinary evaluations, the method also needs to be scientific, ethical, replicable, comprehensive, flexible, transparent, accessible, incentive, productive, updatable and "internationalizable". This qualitative approach takes into account the current global environment of research. Our method, which introduces these criteria, consists in selecting the bases (either bases from the Institute for Scientific Information or not) scientists favour, in crossing them to elaborate new lists of journals, in testing them, and in launching a life-size survey among scientists. This method stands as a prerequisite for further applications. Beyond this rather constructivist approach, such evaluations of research can benefit all the actors participating in the process of the dissemination of knowledge. 
The need for international cooperation in establishing relevant evaluation criteria and indexes when implementing such evaluations is put forward. The appendix presents a case study on French sociology. This paper compares the inventive output of two science systems in small European countries. More specifically, we examine patented inventions of Finnish and Flemish university researchers. The comparison includes inventive output as such and its concentration on organizations, inventors, and corporate owners as well as foreign assignations and the degree to which individual inventors have retained the ownership of the patents. While there are commonalities between the Finnish and Flemish systems in terms of patent concentration on key institutions and corporate assignees, there are also pronounced differences with respect to the ownership structure of academic patents, which was expected in light of the different intellectual property regulations. Our observations seem to suggest that the total inventive output of a research system is not a function of the prevailing intellectual property system but rather corresponds to overall national inventiveness, thereby pointing to more general (national, cultural) drivers of academic inventive activity. From a methodological viewpoint, this research illustrates that tracing university-owned patents alone would leave considerable technological contributions of academics unidentified, even in countries where universities own the rights to their researchers' patents. Another finding with potential methodological implications is that patents are highly concentrated on institutions. If such a distribution law applies to large countries as well, analysts could cover most of the national academic patent output by an intelligent selection of universities. 
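The closing observation, that a concentration law might let analysts cover most national academic patent output with a small selection of universities, can be made concrete with a short sketch (the counts below are hypothetical, not from the study):

```python
def top_k_for_coverage(counts, target=0.8):
    """Smallest number of institutions whose patent counts, taken from the
    largest downward, cover `target` (a fraction) of the total output."""
    total = sum(counts)
    covered = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        covered += c
        if covered >= target * total:
            return k
    return len(counts)

# Hypothetical skewed patent counts for ten institutions:
counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
k = top_k_for_coverage(counts)  # institutions needed for 80% coverage
```

With a skewed (Lotka-like) distribution such as this one, a handful of institutions suffices; under a uniform distribution, coverage would require proportionally many more.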
In the present study full-text analysis and traditional bibliometric methods are combined to improve the efficiency of the individual methods in the mapping of science. The methodology is applied to map research papers from a special issue of Scientometrics. The outcomes substantiate that such a hybrid methodology can be applied to both research evaluation and information retrieval. The subject classification given by the guest-editors of the special issue is used for validation purposes. Because of the limited number of papers underlying the study, the paper is considered a pilot study that will be extended in a later study on the basis of a larger corpus. The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a continuous variable j, replacing the discrete variable n. Now j represents item densities instead of numbers of items. The continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function is the one obtained empirically from data; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is, hence, important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know when applying theoretical results (derived from the continuous model) to practical data. Power laws as defined in 1926 by A. Lotka are increasing in importance because they have been found valid in varied social networks including the Internet. In this article some unique properties of power laws are proven. 
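The scale-free characterization of power functions that runs through these Lotkaian abstracts can be checked numerically: for f(x) = C·x^(-a), the ratio f(cx)/f(x) = c^(-a) does not depend on x, which is exactly the self-similarity property. A minimal sketch:

```python
def power_law(C, a):
    """Return f(x) = C * x**(-a), a decreasing power function for a > 0."""
    return lambda x: C * x ** (-a)

f = power_law(C=3.0, a=2.0)

# Scale-free (self-similar) property: f(c*x) / f(x) = c**(-a) for every x,
# so rescaling the argument only rescales the function by a constant factor.
ratios = [f(2.0 * x) / f(x) for x in (1.0, 5.0, 10.0, 100.0)]
```

An exponential law, by contrast, gives f(cx)/f(x) that varies with x, which is why it lacks this property.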
They are shown to characterize functions with the scale-free property (also called self-similarity property) as well as functions with the product property. Power laws have other desirable properties that are not shared by exponential laws, as we indicate in this paper. Specifically, Naranan (1970) proves the validity of Lotka's law based on the exponential growth of articles in journals and of the number of journals. His argument is reproduced here and a discrete-time argument is also given, yielding the same law as that of Lotka. This argument makes it possible to interpret the information production process as a self-similar fractal and show the relation between Lotka's exponent and the (self-similar) fractal dimension of the system. Lotkaian informetric systems are self-similar fractals, a fact revealed by Mandelbrot (1977) in relation to nature, but is also true for random texts, which exemplify a very special type of informetric system. The journal impact factors, published by the Institute for Scientific Information (ISI; Philadelphia, PA), are widely known and are used to evaluate overall journal quality and the quality of the papers published therein. However, when making comparisons between subject fields, the work of individual scientists and their research institutions as reflected in their articles' ISI impact factors can become meaningless. This inequality will remain as long as ISI impact factors are employed as an instrument to assess the quality of international research. Here we propose a new mathematical index entitled Impact Factor Point Average (IFPA) for assessment of the quality of individual research work in different subject fields. The index is established based on a normalization of differences in impact factors, rankings, and number of journal titles in different subject fields. 
The proposed index is simple and enables the ISI impact factors to be used with equality, especially when evaluating the quality of research work in different subject fields. Domain novice users in the beginning stages of researching a topic find themselves searching for information via information retrieval (IR) systems before they have identified their information need. Pre-Internet access technologies adapted by current IR systems poorly serve these domain novice users, whose behavior might be characterized as rudderless and without a compass. In this article we describe a conceptual design for an information retrieval system that incorporates standard information need identification classification and subject cataloging schemes, called the INIIReye System, and a study that tests the efficacy of the innovative part of the INIIReye System, called the Associative Index. The Associative Index helps the user put together his or her associative thoughts: Vannevar Bush's idea of associative indexing for his Memex machine, which he never actually described. For the first time, data from the study reported here quantitatively supports the theoretical notion that the information seeker's information need is identified through transformation of his/her knowledge structure (i.e., the seeker's cognitive map or perspective on the task for which information is being sought). In this article the results of research that examined the permanence of 1,068 Web-located citations in 123 academic conference articles published between 1995 and 2003 are reported. The study is one of the few but increasing number of investigations that examines the growing practice of authors citing URLs (Uniform Resource Locators) in their publications to support and argue their scholarly research. It was found that some 46% of all citations to Web-located sources could not be accessed, with the HTTP 404 ("Page not found") message (61.5%) being the greatest cause of missing citations. 
Collectively, the missing citations accounted for 22.0% of all citations, which represents a significant reduction in the theoretical knowledge base underpinning many scholarly articles. It is argued that the disappearance of Web-located citations has led to diminishing opportunities for future researchers to examine the underlying foundations of discourse and argument in scholarly articles. From its earliest days, much investigative work in informetrics has been concerned with inequality aspects. Beginning with the well-known Gini coefficient as a measure of the concentration/inequality of productivity within a single data set, in this study we look at the problem of measuring the relative inequality of productivity between two data sets. A measure originally proposed by Dagum (1987), analogous to the Gini coefficient, is discussed and developed with both theoretical and empirical illustrations. From this we derive a standardized measure, the relative concentration coefficient, based on the notion of "relative economic affluence" also introduced by Dagum (1987). Finally, a new standardized measure, the co-concentration coefficient (in some ways analogous to the correlation coefficient), is defined. The merits and drawbacks of these two measures are discussed and illustrated. Their value will be most readily appreciated in comparative empirical studies. Recent work in automatic question answering has called for question taxonomies as a critical component of the process of machine understanding of questions. There is a long tradition of classifying questions in library reference services, and digital reference services have a strong need for automation to support scalability. Digital reference and question answering systems have the potential to arrive at a highly fruitful symbiosis. 
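The single-data-set starting point mentioned above, the Gini coefficient, can be computed directly from a productivity distribution. The sorted-values formulation below is a standard sketch; Dagum's between-data-set measures build on the same mean-difference idea but are not reproduced here:

```python
def gini(values):
    """Gini coefficient of a productivity distribution:
    0 for perfect equality, approaching 1 for maximal concentration.
    Uses the sorted-values formulation
    G = sum_i (2i - n - 1) * x_(i) / (n * sum_i x_(i))."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

Equal productivities give 0, while one author holding all the output of n authors gives (n - 1) / n.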
To move towards this goal, an extensive review was conducted of bodies of literature from several fields that deal with questions, to identify the question taxonomies they contain. In the course of this review, five question taxonomies were identified, at four levels of linguistic analysis. The success of content-based image retrieval (CBIR) relies critically on the ability to find effective image features to represent the database images. The shape of an object is a fundamental image feature and is among the most important image features used in CBIR. In this article we propose a robust and effective shape feature known as the compound image descriptor (CID), which combines the Fourier transform (FT) magnitude and phase coefficients with global features. The underlying FT coefficients have been shown analytically to be invariant to rotation, translation, and scaling. We also present details of the underlying innovative shape feature extraction method. The global features, besides being incorporated with the FT coefficients to form the CID, are also used to filter out highly dissimilar images during the image retrieval process. Thus, they serve the dual purpose of improving the accuracy, and hence the robustness, of the shape descriptor, and of speeding up the retrieval process, leading to a reduced query response time. Experimental results show that the proposed shape descriptor is, in general, robust to changes caused by image shape rotation, translation, and/or scaling. It also outperforms other recently published proposals, such as the generic Fourier descriptor (Zhang & Lu, 2002). The research reported here was an exploratory study that sought to discover the effects of human individual differences on Web search strategy. These differences consisted of (a) study approaches, (b) cognitive and demographic features, and (c) perceptions of and preferred approaches to Web-based information seeking. 
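One property behind Fourier-based shape descriptors such as the CID discussed above can be demonstrated in a few lines: for a closed-contour signature (e.g. centroid distances sampled around the boundary), shifting the starting point of the sampling, which is how a rotation of the object manifests in such a signature, changes only the phase of the DFT coefficients, never their magnitudes. This is a generic sketch of that invariance, not the published CID extraction method:

```python
import cmath

def dft_magnitudes(signature):
    """Magnitudes |F_k| of the discrete Fourier transform of a
    closed-contour signature (a list of real samples)."""
    n = len(signature)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(signature)))
            for k in range(n)]

# A circular shift of the signature (rotated starting point) leaves
# every |F_k| unchanged:
sig = [1.0, 2.0, 3.0, 2.5, 1.5]
shifted = sig[2:] + sig[:2]
```

Using magnitudes alone discards the phase entirely; the CID's combination of magnitude and phase coefficients is a refinement beyond this basic invariance.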
Sixty-eight master's students used AltaVista to search for information on three assigned search topics graded in terms of complexity. Five hundred seven search queries were factor analyzed to identify relationships between the individual difference variables and Boolean and best-match search strategies. A number of consistent patterns of relationship were found. As task complexity increased, a number of strategic shifts were also observed on the part of searchers possessing particular combinations of characteristics. A second article (published in this issue of JASIST; Ford, Miller, & Moss, 2005) presents a combined analysis of the data, including a series of regression analyses. This is the second of two articles published in this issue of JASIST reporting the results of a study investigating relationships between Web search strategies and a range of human individual differences. In this article we provide a combined analysis of the factor analyses previously presented separately in relation to each of three groups of human individual differences (study approaches, cognitive and demographic features, and perceptions of and approaches to Internet-based information seeking). We also introduce two series of regression analyses conducted on data spanning all three individual difference groups. The results are discussed in terms of the extent to which they satisfy the original aim of this exploratory research, namely to identify any relationships between search strategy and individual difference variables for which there is a prima facie case for more focused systematic study. It is argued that a number of such relationships do exist. The results of the project are summarized and suggestions are made for further research. Most organizations have reported dismal returns on their investments in knowledge portals, intranet Web sites aimed at enabling the storage and exchange of explicit knowledge artifacts. 
In our research, we were surprised to find that knowledge workers have for the most part abandoned the use of knowledge portals. Moreover, in cases where they do turn to knowledge portals, they use them as a last resort. In this brief communication, we call on both research and practice to help transform current knowledge portals into ones that are more sensitive to the issues faced by practitioners. To this end, we elaborate on the need to pay attention to the maintenance of knowledge management portals. The use of Pearson's correlation coefficient in Author Cocitation Analysis was compared with Salton's cosine measure in a number of recent contributions. Unlike the Pearson correlation, the cosine is insensitive to the number of zeros. However, one has the option of applying a logarithmic transformation in correlation analysis. Information calculus is based on the logarithmic transformation and provides non-parametric statistics. Using this methodology, one can cluster a document set in a precise way and express the differences in terms of bits of information. The algorithm is explained and applied to the data set that was made the subject of this discussion. This study examines the bibliometrics of the controversial scientific literature of Polywater research, focusing on publication types (books, journal publications, conference proceedings, and technical reports). Publication (P) frequency is used to measure publication "shape" or pattern and output, citations per publication (CPP) for impact, and author self-citations (SC) and uncited publications (UP) for their effect on P and CPP. Findings show an epidemic publication pattern, journal publications with the highest P, books with the highest CPP, and insignificant SC and UP. Comparisons to several non-controversial scientific literatures suggest that these findings may be common to other controversial scientific literatures. 
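The zero-sensitivity contrast noted above for cocitation analysis is easy to verify numerically: appending matched zero pairs to two cocitation profiles leaves the cosine unchanged (the dot product and both norms are unaffected) but shifts both means, and therefore the Pearson correlation. A minimal sketch with illustrative profiles:

```python
import math

def cosine(a, b):
    """Salton's cosine similarity between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pearson(a, b):
    """Pearson correlation coefficient between two count vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

# Two cocitation profiles, then the same profiles padded with zero pairs:
a, b = [2, 1, 0, 3], [1, 2, 0, 2]
a_pad, b_pad = a + [0] * 4, b + [0] * 4
```

Here cosine(a, b) equals cosine(a_pad, b_pad) exactly, while the Pearson value moves, which is precisely why the choice between the two measures matters in sparse cocitation matrices.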
The bibliometric laws of Zipf, Bradford, and Lotka, in their various mathematical expressions, frequently present difficulties in the fitting of empirical values. These failures of fit occur in word frequencies, in the productivity of authors and journals, and in econometric and demographic data. This indicates that the underlying fractal model should be revised since, as shown, inverse power equations (of the Zipf-Mandelbrot type) are not adequate on their own: they need to include exponential terms. These modifications not only affect Bibliometrics and Scientometrics but, given the generality of the fractal model, also apply to Economics, Demography, and even the Natural Sciences in general. A unified scientometric model has been developed on the basis of seven principles: the actor-network principle, the translation principle, the spatial principle, the quantitativity principle, the composition principle, the centre-periphery or nucleation principle, and the unified principle of cumulative advantages. The paradigm of the fractal model has been expanded by introducing the concepts of the fractality index and transfractality. In this work, as a first demonstration of the power of the proposed model, all the known bibliometric laws and all their mathematical expressions are deduced: the structural distributions (Zipf, Bradford, and Lotka) as well as Price's Law of the exponential growth of science and Brookes' and Avramescu's laws of ageing. Using the CoPalRed© information system to process 63,543 bibliographical references from scientific articles, the field of surfactants has been analysed in the light of the Unified Scientometric Model. It was found that the distributions of actors (countries, centres and research laboratories, journals, researchers, and document keywords) fit Zipf's Unified Law better than the Zipf-Mandelbrot Law. 
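Fitting any of these inverse power laws to empirical data is commonly done by linear regression in log-log space. The sketch below recovers C and alpha for a Lotka-type law y = C / x^alpha; systematic curvature in the log-log residuals is exactly the kind of empirical misfit (calling for extra exponential terms) discussed above:

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log C - alpha * log x,
    i.e. y = C / x**alpha. Returns (C, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((p - mx) * (q - my) for p, q in zip(lx, ly))
             / sum((p - mx) ** 2 for p in lx))
    return math.exp(my - slope * mx), -slope

# Exact Lotka-type data (C = 600, alpha = 2) is recovered by the fit:
xs = [1, 2, 3, 4, 5]
ys = [600 / x ** 2 for x in xs]
```

On real productivity data the fit is only approximate, and comparing residual patterns across candidate laws is what distinguishes the competing expressions.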
The model showed an especially good fit for relational indicators such as density and centrality. Using the Unified Bradford Law, the three fitted zones were the core, the straight fraction, and the Groos droop. The fractality index was used to verify that Science can present fractal as well as transfractal structures. In conclusion, the Unified Scientometric Model is, for its flexibility and its integrating capacity, an appropriate model for representing Science, joining non-relational with relational Scientometrics under the same paradigm. We study new and existing data sets which show that growth rates of sources are usually different from growth rates of items. Examples: references in publications grow at a rate that is different from (usually higher than) the growth rate of the publications themselves; article growth rates are different from journal growth rates; and so on. In this paper we interpret this phenomenon of "disproportionate growth" in terms of Naranan's growth model and in terms of the self-similar fractal dimension of such an information system, which follows from Naranan's growth model. The main part of the paper is devoted to explaining disproportionate growth. We show that the "simple" 2-dimensional informetrics models of source-item relations are not able to explain this, but that linear 3-dimensional informetrics (i.e., adding a new source set) can model disproportionate growth. Formulae for such different growth rates are presented using Lotkaian informetrics, and new and existing data sets are presented and interpreted in terms of the linear 3-dimensional model used. In science, peer review is the best-established method of assessing manuscripts for publication and applications for research fellowships and grants. However, the fairness of peer review, its reliability, and whether it achieves its aim of selecting the best science and scientists have often been questioned. 
The paper presents the first comprehensive study of committee peer review for the selection of doctoral (Ph.D.) and post-doctoral research fellowship recipients. We analysed the selection procedure followed by the Boehringer Ingelheim Fonds (B.I.F.), a foundation for the promotion of basic research in biomedicine, with regard to the reliability, fairness, and predictive validity of the procedure - the three quality criteria for professional evaluations. We analysed a total of 2,697 applications, 1,954 for doctoral and 743 for post-doctoral fellowships. In 76% of the cases, the fellowship award decision was characterized by agreement between reviewers. Similar figures for reliability have been reported for the grant selection procedures of other major funding agencies. With regard to fairness, we analysed whether potential sources of bias, i.e., gender, nationality, major field of study, and institutional affiliation, could have influenced decisions made by the B.I.F. Board of Trustees. For post-doctoral fellowship applications, no statistically significant influence of any of these variables could be observed. For doctoral fellowship applications, we found evidence of institutional, major-field-of-study, and gender bias, but not of a nationality bias. The most important aspect of our study was to investigate the predictive validity of the procedure, i.e., whether the foundation achieves its aim of selecting the best junior scientists as fellowship recipients. Our bibliometric analysis showed that this is indeed the case and that the selection procedure is thus highly valid: research articles by B.I.F. fellows are cited considerably more often than the "average" paper (average citation rate) published in the journal sets corresponding to the fields "Multidisciplinary", "Molecular Biology & Genetics", and "Biology & Biochemistry" in Essential Science Indicators (ESI) from the Institute for Scientific Information (ISI, Philadelphia, Pennsylvania, USA). 
Most of the fellows publish within these fields. This paper investigates Korean scientific output, focusing on international collaboration patterns, through an analysis of journal publications. For the study, 44,534 publications by researchers affiliated with Korean institutions and indexed in the SCI during the six years 1995-2000 were considered. The study period was divided into two sub-periods, 1995-1997 and 1998-2000, to compare international collaboration between them. The results show a clear decrease in Korea's international collaboration level between the two periods, even though the number of researchers as well as total R&D expenditure decreased considerably after Korea's economic change. The decrease of international collaboration in Korean science was inversely associated with determinants such as scientific size and national scientific infrastructure. This decreasing trend was largely caused by discipline-to-discipline variations in the coverage of the SCI database. Among the top ten collaborating countries, only the Chinese and Canadian shares of collaborative publications with Korea increased between the two periods under consideration. Much has been written about titles in scientific journal articles, but little research has been carried out. We aimed to assess in two studies how factors like the length of a title and its structure might vary in different scientific fields, and whether or not these features have changed over time. Statistical analyses were made of 216,500 UK papers from science journals and of 133,200 international oncology papers. Factors examined included title length, the use of colons in titles, and the number of authors. All of these factors increased over time for both sets of papers, although there were some disciplinary differences in the findings. 
In both studies, titles with colons occurred more frequently with single than with multiple authors, except when the numbers of co-authors were large. Certain features of titles can be related to different disciplines, different journals, the numbers of authors, and their nationalities. In this study, journal impact factors play a central role. In addition to this important bibliometric indicator, which revolves around the average impact of a journal in a two-year timeframe, related aspects of journal impact measurement are studied. Aspects like output volume, the percentage of publications not cited, and the citation frequency distribution within a set timeframe are examined and put in perspective against the 'classical' journal impact factor. It is shown that these aspects of journal impact measurement play a significant role and are strongly inter-related. The separation of journals on the basis of differences in output volume seems especially relevant, as can be concluded from the differing results in the analysis of journal impact factors, the degree of uncitedness, and the share of a journal's contents cited above or below its impact factor value. As citation practices strongly depend on fields, field normalisation is recognised as necessary for fair comparison of figures in bibliometrics and evaluation studies. However, fields may be defined at various levels, from small research areas to broad academic disciplines, and thus normalisation values are expected to vary. The aim of this project was to test the stability of citation ratings of articles as the level of observation - hence the basis of normalisation - changes. A conventional classification of science based on ISI subject categories and their aggregates at various scales was used, namely at five levels: all science, large academic discipline, sub-discipline, speciality, and journal. 
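The 'classical' two-year impact factor referred to above has a simple arithmetic form: citations received in year Y by the items a journal published in Y-1 and Y-2, divided by the number of citable items published in those two years. A sketch, with an assumed dictionary layout for the counts:

```python
def impact_factor(citations, items, year):
    """Classical synchronous two-year journal impact factor.
    citations[(y, p)]: citations received in year y to items published
    in year p; items[p]: citable items published in year p."""
    received = (citations.get((year, year - 1), 0)
                + citations.get((year, year - 2), 0))
    published = items[year - 1] + items[year - 2]
    return received / published
```

The output-volume effects discussed above enter through the denominator: a small journal's impact factor can swing widely on a handful of citations.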
Among various normalisation methods, we selected a simple ranking method (quantiles), based on the citation score of the article in each particular aggregate (journal, speciality, etc.) to which it belonged at each level. The study was conducted on articles in the full SCI range, for publication year 1998, with a four-year citation window. Stability is measured in three ways: overall comparison of article rankings; the individual trajectory of articles; and survival of the top-cited class across levels. Overall rank correlations on the observed empirical structure are benchmarked against two fictitious sets that keep the same embedded structure of articles but reassign citation scores either in a totally ordered or in a totally random distribution. These sets act respectively as a 'worst case' and 'best case' for the stability of citation ratings. The results show that: (a) the average citation rankings of articles substantially change with the level of observation; (b) observation at the journal level is very particular, and its results differ greatly, in all test circumstances, from all the other levels of observation; (c) the lack of cross-scale stability is confirmed when looking at the distribution of individual trajectories of articles across the levels; and (d) when considering the top-cited fractions, a standard measure of excellence, it is found that the contents of the 'top-cited' set are completely dependent on the level of observation. The instability of impact measures should not be interpreted as a lack of robustness but rather as the co-existence of various perspectives, each having its own form of legitimacy. A follow-up study will focus on the micro levels of observation and will be based on a structure built around bibliometric groupings rather than conventional groupings based on ISI subject categories. The definitions of Experimental Development and Applied Research currently suggested by the OECD bring about inconsistent R&D data. 
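The quantile ranking described above, and the instability it reveals, can be sketched directly: the same article receives a different percentile rank depending on whether its citation score is compared within its journal or within its whole discipline. The figures below are illustrative, not drawn from the study:

```python
def quantile_rank(cites, peer_cites):
    """Share of peer articles with strictly fewer citations than `cites`."""
    return sum(1 for c in peer_cites if c < cites) / len(peer_cites)

# Illustrative aggregates at two levels of observation:
journal_peers = [0, 1, 2, 3, 25]                      # same journal
discipline_peers = [0, 0, 1, 1, 2, 4, 8, 12, 30, 40]  # same discipline
article_cites = 8
```

Here the article outranks 0.8 of its journal peers but only 0.6 of its discipline peers, so whether it counts as "top-cited" depends entirely on the level of observation.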
Here, coherent definitions, based on the criterion of generality, are proposed. Patents have become increasingly important, especially over the past two decades. As patent office procedures have adapted to remain abreast of changing economic and scientific circumstances, it has also become increasingly important to define and analyse innovation more precisely. This paper introduces a simple new measure of innovation, the patent success ratio (PSR), or the ratio of successful patent applications to total patent applications. It has been argued in the extensive literature on innovation and technology policy that patents can serve as an accurate proxy for innovative activity or innovation. This paper suggests that the PSR is a more accurate measure of how innovative activity has changed over time. A sensitivity analysis is conducted to assess the usefulness of the new PSR measure of innovation using annual US data for the period 1915-2001. In a recent article, Sombatsompop et al. (2004) proposed a new way of calculating a synchronous journal impact factor. Their proposal seems quite interesting and is discussed in this note. Their index will be referred to as the Median Impact Factor (MIF). I explain every step in detail so that readers with little mathematical background can understand and apply the procedure. Illustrations of the procedure are presented. Some attention is given to the estimation of the median cited age in case it is larger than ten years. I think the idea introduced by Sombatsompop, Markpin and Premkamolnetr has great theoretical value, as they are - to the best of my knowledge - the first to base an impact factor not on years as a basic ingredient but on an element of the actual form of the citation curve. The MIF is further generalized to the notion of a percentile impact factor. The purpose of this study was to develop a method for characterizing the page and linking patterns related to dramatic events on the Web. 
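The PSR defined above is elementary to compute from an annual series of filings and grants; the tuple layout is an assumption for illustration, not the paper's data format:

```python
def psr_series(applications):
    """applications: iterable of (year, successful, total) tuples.
    Returns {year: patent success ratio} for years with at least one filing."""
    return {year: successful / total
            for year, successful, total in applications if total}
```

Keeping PSR as a per-year series, rather than a single aggregate, is what lets it track how innovative activity changes over time.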
As a specific case, we characterized Web pages linking to the set of pages on anthrax indexed by the Yahoo directory (generally acknowledged as a high-quality directory). The sample of Web pages was collected shortly after anthrax became a matter of widespread concern (November 2001). The findings show that at that time the "typical" source page was either a news item or a page with a list of links. Most of the examined links were not navigational but pointed to the target page in order to provide additional content. Many Web sites added hyperlinks to pages providing presumably authoritative and high-quality information on anthrax rather than supplying the information themselves. The results show that Web authors link extensively to presumably "high quality" pages. The methods presented here can be used to characterize the pages and linking patterns of Web pages linking to a set of predefined pages, and the findings of this specific study can serve as a basis for comparison. We operationalize scientific output in a region by means of the number of articles (as in the SciSearch database) per year, and technology output by means of the number of patent applications (as in the database of the European Patent Office) per priority year. All informetric analyses were done using the DIALOG online system. The main research questions are the following: Which scientific and technological fields or topics are most influential within a region, and which institutions or companies mainly publish articles or hold patents? Do the distributions of regional science and technology fields and of publishing institutions follow the well-known informetric function? Are there - as expected - only a few fields and a few institutions which dominate the region? Is there a connection between the economic power of a region and the regional publication and patent output? 
Examples studied in detail are seven German regions: Aachen, Dusseldorf, Hamburg, Koln (Cologne), Leipzig - Halle - Dessau, Munchen (Munich), and Stuttgart. Three different indicators were used: science and technology attraction of a region (number of scientific articles and patents), science and technology intensity (articles and patents per 1,000 inhabitants), and science and technology density (articles and patents per 1 billion EURO gross value added). The top region in terms of both attraction and intensity is Munich; in terms of density it is Aachen. Following a brief historical account of the initial difficulties of introducing modern sciences, especially the Western art of independent scientific inquiry, into Iran, and using data obtained from the ISI (http://access.isiproducts.com/trials), an attempt is made to analyze the apparent present successes of Iranian scientists on the international science market. Using the corresponding ISI data on the publications (1990-2003) of 24 selected young chemistry Ph.D. graduates and present faculty members at various domestic academic institutions, a quantitative and qualitative assessment (www.geocities.com/iipopescu) of their achievements has been attempted, and the results are related to the strengths and weaknesses of the country's present science policy. In this paper we report the first results of our study on the network characteristics of a reference-based, bibliographically coupled (BC) publication network structure. We find that this network of clustered publications shows different statistical properties depending on the age of the references used for building the network. A remarkable finding is that only the network based on all references within publications is characterized by a degree distribution with a power-law dependence. This structure, which is typical for scale-free networks, disappears when selecting references of a specific age for the clustering process. 
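Bibliographic coupling as used above links two publications when their reference lists overlap. A minimal sketch of building such a network and reading off node degrees, whose distribution is then examined for power-law behaviour; the data layout is illustrative:

```python
from itertools import combinations

def bc_network(papers):
    """papers: {paper_id: set of cited reference ids}. Returns
    {(a, b): coupling strength}, where strength is the number of shared
    references; papers sharing none are not coupled."""
    edges = {}
    for a, b in combinations(sorted(papers), 2):
        shared = len(papers[a] & papers[b])
        if shared:
            edges[(a, b)] = shared
    return edges

def degree_distribution(papers):
    """Number of coupled neighbours per paper."""
    deg = {p: 0 for p in papers}
    for a, b in bc_network(papers):
        deg[a] += 1
        deg[b] += 1
    return deg

# Toy example: p1 and p2 share reference r2; p3 shares nothing.
papers = {'p1': {'r1', 'r2'}, 'p2': {'r2', 'r3'}, 'p3': {'r4'}}
```

Restricting the reference sets to citations of a given age before coupling is the "tuning" operation the study performs on this same construction.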
Changing the publication network as a function of reference age allows "tuning" through the "episodic memory" of the nodes of the network. We find that the older the references, the more the network tends to change its structure towards a more exponential degree distribution. The present paper addresses the objective of developing forward indicators of research performance using bibliometric information on the UK science base. Most research indicators rely primarily on historical time series relating to inputs to, activity within, and outputs from the research system. Policy makers wish to be able to monitor changing research profiles in a more timely fashion, the better to determine where new investment is having the greatest effect. Initial (e.g. 12 months from publication) citation counts might be useful as a forward indicator of the long-term (e.g. 10 years from publication) quality of research publications, but - although there is literature on citation-time functions - no study to evaluate this specifically has been carried out by Thomson ISI or any other analysts. Here, I describe the outcomes of a preliminary study to explore these citation relationships, drawing on the UK National Citation Report held by Evidence Ltd under licence from Thomson ISI for OST policy use. Annual citation counts typically peak at around the third year after publication. I show that there is a statistically highly significant correlation between initial (years 1-2) and later (years 3-10) citations in six research categories across the life and physical sciences. The relationship holds over a wide range of initial citation counts. Papers that attract more than a definable but field-dependent threshold of citations in the initial period after publication are usually among the top 1% (the most highly cited papers) for their field and year. Some papers may take off slowly but can later join the high-impact group. 
It is important to recognise that the statistical relationship is applicable to groups of publications. The citation profiles of individual articles may be quite different. Nonetheless, it seems reasonable to conclude that leading indicators of research excellence could be developed. This initial study should now be extended across a wider range of fields to test the initial outcomes; earlier papers suggest the model holds in economics. Additional statistical tests should be applied to explore and model the relationship between initial, later, and total citation counts and thus to create a general tool for policy application. The study explores the chronological growth of Indian biotechnology. The applicability of Lotka's law has been examined for the authorship pattern. The productivity of authors is analyzed, and a list of 35 authors with more than 10 publications is given. Bradford's law of scattering is used to identify the core journals which cover most of the research and development output of Indian biotechnology. The study also shows the active authors, institutions, and statewise distributions of Indian biotechnology research output. The contribution of Brazil to the database of the Institute for Scientific Information, ISI, has increased remarkably in recent years. Among Brazilian research institutions, the publications of the University of Sao Paulo (USP) have accounted for around 30% of the country's total publications within the ISI database. A similar share was found for USP's publications published in the 1980-1999 period and classified in the Life Sciences. This was observed in publications both from the highest-impact-factor journals and from those with the largest number of articles. We have found that the present share of USP's publications in some of the fields of the Life Sciences was much less than 30%, suggesting a gradual decentralization of scientific activity in Brazil. 
The data point out that this set of USP's publications was concentrated in traditional and basic fields of biological research, where the focus is mainly oriented by international trends. The data suggest that USP's researchers have devoted comparatively little attention to some of the fields where research is oriented toward national issues. Institutions and their aggregates are not the right units of analysis for developing a science policy with cognitive goals in view. Institutions, however, can be compared in terms of their performance with reference to their previous stages. King's (2004) 'The scientific impact of nations' has provided the data for this comparison. Evaluation of the data from this perspective along the time axis leads to completely different and hitherto overlooked conclusions: a new dynamic can be revealed which points to a group of emerging nations. These nations do not increase their contributions marginally; rather, their national science systems grow endogenously. In addition to publications, their citation rates keep pace with the exponential growth patterns, albeit with a delay. The center of gravity of the world system of science may be changing accordingly. There have been many attempts to evaluate Web spaces on the basis of the information that they provide, their form or functionality, or even the importance given to each of them by the Web itself. The indicators that have been developed for this purpose fall into two groups: those based on the study of a Web space's formal characteristics, and those related to its link structure. In this study we examine most of the webometric indicators that have been proposed in the literature, together with others of our own design, by applying them to a set of thematically related Web spaces and analyzing the relationships between the different indicators. Surveys of the members of the American Astronomical Society identify how astronomers use journals and what features and formats they prefer. 
While every field of work is distinct, the patterns of use by astronomers may provide a glimpse of what to expect of journal patterns and use by other scientists. Astronomers, like other scientists, continue to invest a large amount of their time in reading articles and place a high level of importance on journal articles. They use a wide variety of formats and means to get access to materials that are essential to their work in teaching, service, and research. They select the access means that are convenient, whether print, electronic, or both. The availability of a mature electronic journals system from their primary professional society has surely influenced their early adoption of e-journals. An innovator's personality, along with the perceived attributes of an innovation, predicts the rate of diffusion. The current study focuses on the personality factors that determine the likelihood of adoption of a technological innovation. To that end, the study distinguishes between global innovativeness and context-specific innovativeness. An information processing model was tested in which technological innovativeness was purported to be indirectly influenced by an individual's global innovativeness, through its impact on communication and media use behaviors. The structural model was tested on two separate technology clusters, and partial support was found for linking sophistication in information search and prior technology ownership to technological innovativeness. In this study a perceived gap between the ideal and the reality of a community network (CN) is examined. Most proponents of CNs state that building a better physical community is their major service goal. However, there has been a concern that citizens might use the service simply as a means to connect to the Internet rather than as a means to connect to their communities. 
Using a survey research method (n = 213), users' perceptions of community aspects of CN service and the influence of such perceptions on their use were investigated. User demographics and alternative service accessibility were also examined as predictors of use. The present study found that the respondents were using the service mainly for general Internet features. More than two thirds of the respondents were not aware of the community content aspect of the service. Approximately 20% of respondents were identified as those whose perceptions of the community aspects actually affected their use of the service. They were both aware of community contents and using an additional Internet service provider. Findings suggest that the providers did not fully communicate the community aspects of the service with the users, while the user perception of community aspects is a key to further promotion of the service. The VIBE (Visual Information Browsing Environment) prototype system, which was developed at Molde College in Norway in conjunction with researchers at the University of Pittsburgh, allows users to evaluate documents from a retrieved set that is graphically represented as geometric icons within one screen display. While the formal modeling behind VIBE and other information visualization retrieval systems is well known, user interaction with the system is not. This investigation tested the designer assumption that VIBE is a tool for a smart (expert) user and asked: What are the effects of the different levels of user expertise upon VIBE usability? Three user groups including novices, online searching experts, and VIBE system experts totaling 31 participants were tested over two sessions with VIBE. Participants selected appropriate features to complete tasks, but did not always solve the tasks correctly. Task timings improved over repeated use with VIBE and the nontypical visually oriented tasks were resolved more successfully than others. 
Statistically significant differences were not found among all parameters examined between novices and online experts. The VIBE system experts provided the predicted baseline for this study, and the VIBE designer assumption was shown to be correct. The study's results point toward further exploration of cognitive preattentive processing, which may help to better understand the novice/expert paradigm when testing a visualized interface design for information retrieval. Information filtering is a technique to identify, in large collections, information that is relevant according to some criteria (e.g., a user's personal interests, or a research project objective). As such, it is a key technology for providing efficient user services in any large-scale information infrastructure, e.g., digital libraries. To provide large-scale information filtering services, both computational and knowledge management issues need to be addressed. A centralized (single-agent) approach to information filtering suffers from serious drawbacks in terms of speed, accuracy, and economic considerations, and becomes unrealistic even for medium-scale applications. In this article, we discuss two distributed (multi-agent) information filtering approaches, distributed with respect to knowledge or functionality, to overcome the limitations of single-agent centralized information filtering. Large-scale experimental studies involving the well-known TREC data set are also presented to illustrate the advantages of distributed filtering as well as to compare the different distributed approaches. End-user computing has become a well-established aspect of enterprise database systems today. End-user computing performance depends on the user-database interface, in which the data model and query language are major components. 
We examined three prominent data models - the relational model, the Extended-Entity-Relationship (EER) model, and the Object-Oriented (OO) model - and their query languages in a rigorous and systematic experiment to evaluate their effects on novice end-user computing performance in the context of database design and data manipulation. In addition, relationships among the performances for different tasks (modeling, query writing, query comprehension) were postulated with the use of a cognitive model for the query process, and were tested in the experiment. Structural Equation Modeling (SEM) techniques were used to examine the multiple causal relationships simultaneously. The findings indicate that the EER and OO models overwhelmingly outperformed the relational model in terms of accuracy for both database design and data manipulation. The associations between tasks suggest that data modeling techniques would enhance query writing correctness, and query writing ability would contribute to query comprehension. This study provides a thorough understanding of the interrelationships among these data modeling and task factors. Our findings have significant implications for novice end-user training and development. Most of the science indicators used in the literature for characterizing the research activity and performance of nations are based on journal articles and citations. As an alternative we have examined the use of journal gatekeepers as real and useful science indicators. As an example, gatekeepers of some analytical chemistry core journals were used. To reveal the changes in the contributions of different nations to the field studied, a comparison has been made between the 1970-1974 and the 2002 data on gatekeepers of the same journals. Query formation and expansion is an integral part of nearly every effort to search for information. 
In the work reported here we investigate the effects of domain knowledge and feedback on search term selection and reformulation. We explore differences between experts and novices as they generate search terms over 10 successive trials and under two feedback conditions. Search attempts were coded on quantitative dimensions such as the number of unique terms and average time per trial, and as a whole in an attempt to characterize the user's conceptual map for the topic under differing conditions of participant-defined domain expertise. Nine distinct strategies were identified. Differences emerged as a function of both expertise and feedback. In addition, strategic behavior varied depending on prior search conditions. The results are considered from both a theoretical and a design perspective, and have direct implications for digital library usability, metadata generation, and query expansion systems. In this article, recording evidence for data values, in addition to the values themselves, in bibliographic records and descriptive metadata is proposed, with the aim of improving the expressiveness and reliability of those records and metadata. Recorded evidence indicates why and how data values are recorded for elements. Recording the history of changes in data values is also proposed, with the aim of reinforcing recorded evidence. First, the evidence that can be recorded is categorized into classes: identifiers of rules or tasks, descriptions of the actions performed, and the input and output data involved. Dates of recording values and evidence are an additional class. Then, the relative usefulness of the evidence classes, and of the levels (i.e., the record, data element, or data value level) at which an individual evidence class is applied, is examined. Second, examples that can be viewed as recorded evidence in existing bibliographic records and current cataloging rules are shown. Third, some examples of bibliographic records and descriptive metadata with notes of evidence are demonstrated. 
Fourth, ways of using recorded evidence are addressed. The value of low frequency words for subject-based academic Web site clustering is assessed. A new technique is introduced to compare the relative clustering power of different vocabularies. The technique is designed for word frequency tests in large document clustering exercises. Results for the Australian and New Zealand academic Web spaces indicate that low frequency words are useful for clustering academic Web sites along subject lines; removing low frequency words results in sites becoming, on average, less dissimilar to sites from other subjects. An analysis of 2058 papers published by Chinese authors and 2678 papers published by Indian authors in the field of computer science during 1971-2000 indicates that India's output is significantly higher than the Chinese output. However, China is catching up fast. Chinese researchers prefer to publish their research results in domestic journals, while Indian researchers prefer to publish their research results in journals published in the advanced countries of the West. Also, the share of papers in journals covered by SCI was higher for India than for China. However, no significant difference has been observed in the impact of the research output of the two countries as seen by different impact indicators. Team research is more common in India as compared to China. This paper presents the results of a Data Envelopment Analysis of Operations Research/Management Science journals on two questions: the duration of the refereeing/publication process, and the relation between the length of the articles published and their impact. The second question uses data publicly available through the ISI Journal Citation Reports database and through the journals' contents, while for the first question data had to be gathered from the journal editors through an e-mail survey. 
The analysis gives cues about the amount by which each journal should aim to reduce its lead times, setting efficiency targets both on the average time from submission to first editorial decision and on the time from final editorial decision to publication. Similarly, for each journal, efficiency targets for the average article length are obtained. Our promotion of refereeing efficiency and paper-length efficiency assumes that no loss of quality in the peer review process or in the knowledge transmission process needs to happen. The structural similarity between hyperlinks and citations has encouraged information scientists to apply bibliometric techniques to the Web. University links have been previously validated as a new data source through significant statistical correlations between link and research measures, together with identification of motivations for hyperlink creation at the university level. Many investigations have been conducted for university interlinking, but few for departments. University Web sites are large compared with departmental Web sites, and significant statistical results are more easily obtained. Nevertheless, universities are multidisciplinary by nature and disciplines may employ the Web differently, thus patterns identified at the university level may hide subject differences. This paper validates departmental interlinking, using Physics, Chemistry and Biology departments from Australia, Canada and the UK. Although many link patterns have been identified at the university level, departmental interlinking has been relatively ignored. Universities are multidisciplinary by nature and various disciplines may employ the Web differently, thus patterns identified at the university level may hide subject differences. Departments are typically subject-oriented, and departmental interlinking may therefore illustrate interesting disciplinary linking patterns, perhaps relating to informal scholarly communication. 
The aim of this paper is to identify whether and how link patterns differ along country and disciplinary lines between similar disciplines and similar countries. Physics, Chemistry and Biology departments in Australia, Canada and the UK have been chosen. In order to get a holistic picture of departments' Web use profiles and link patterns, five different perspectives are identified and compared for each set of departments. Differences in link patterns are identified along both national and disciplinary lines, and are found to reflect offline phenomena. Along national lines, a likely explanation for the difference is that countries with better research performances make more general use of the Web; and, with respect to international peer interlinking, countries that share more scholarly communication tend to interlink more with each other. Along disciplinary lines, it seems that departments from disciplines which are more willing to distribute their research outputs tend to make more general use of the Web, and also interlink more with their national and international peers. Which signals are important in gaining attention in science? For a group of 1,371 scientific articles published in 17 demography journals in the years 1990-1992, we track their influence and discern which signals are important in receiving citations. Three types of signals are examined: the author's reputation (as producer of the idea), the journal (as the broker of the idea), and the state of uncitedness (as an indication of the assessment by the scientific community of an idea). The empirical analysis points out that, first, the reputation of journals plays an overriding role in gaining attention in science. Second, in contrast to common wisdom, the state of uncitedness does not affect the future probability of being cited. 
And third, the reputation of a journal may help to get late recognition (so-called sleeping beauties) as well as generate 'flash-in-the-pans': articles that are immediately noted but apparently not very influential in the long run. Based on the convolution formula of the disturbed aging distribution (Egghe & Rousseau, 2000) and the transfer function model of the publishing delay process, we establish the transfer function model of the disturbed citing process. Using the model, we make simulative investigations of disturbed citation distributions and impact factors according to different average publication delays. These simulative results show that the larger the increase in the average publication delay in a scientific field, the further the citation distribution curves shift backwards and the more the impact factors of journals in the field fall. Based on some theoretical hypotheses, it is shown that, theoretically, an approximate inverse linear relation exists between the field (or discipline) average publication delay and the journal impact factor. We analyze a set of three databases at different levels of aggregation: (a) a database of approximately 10^6 publications from 247 countries published from 1980-2001, (b) a database of 508 academic institutions from the European Union (EU) and 408 institutes from the United States for the 11-year period of 1991-2001, and (c) a database of 2,330 Flemish authors published in the period from 1980-2000. At all levels of aggregation we find that the mean annual growth rates of publications are independent of the number of publications of the various units involved. We also find that the standard deviation of the distribution of annual growth rates decays with the number of publications as a power law with an exponent of approximately 0.3. These findings are consistent with those of recent studies of systems such as the size of research and development funding budgets of countries, the research publication volumes of U.S. 
universities, and the size of business firms. In this article, ethical decision-making methods for creating, revising, and maintaining knowledge representation and organization systems are described, particularly in relation to the global use of these systems. The analysis uses a three-level model and the literature on ethically based decision-making in the social and technical sciences. In addition, methods for making these kinds of decisions in an ethical manner are presented. This multidisciplinary approach is generalizable to other information areas and is useful for encouraging the development of ethics policies for knowledge representation and organization systems and for other kinds of systems or institutions. We report on a study that was undertaken to better understand search and navigation behavior by exploiting the close association between the process underlying users' query submission and the navigational trails emanating from query clickthroughs. To our knowledge, there has been little research towards bridging the gap between these two important processes pertaining to users' online information searching activity. Based on log data obtained from a search and navigation documentation system called AutoDoc, we propose a model of user search sessions and provide analysis of users' link or clickthrough selection behavior, reformulation activities, and search strategy patterns. We also conducted a simple user study to gauge users' perceptions of their information seeking activity when interacting with the system. The results obtained show that analyzing both the query submissions and the navigation starting from query clickthroughs reveals much more interesting patterns than analyzing these two processes independently. On average, AutoDoc users submitted only one query per search session and entered approximately two query terms. 
Specifically, our results show how AutoDoc users are more inclined to submit new queries or resubmit modified queries than to navigate by link-following. We also show that users' behavior within this search system can be approximated by a Zipf's law distribution. In this article, concentration (i.e., inequality) aspects of the functions of Zipf and of Lotka are studied. Since both functions are power laws (i.e., they are mathematically the same), it suffices to develop one concentration theory for power laws and apply it twice for the different interpretations of the laws of Zipf and Lotka. After a brief review of the functional relationships between Zipf's law and Lotka's law, we prove that Price's law of concentration is equivalent to Zipf's law. A major part of this article is devoted to the development of continuous concentration theory, based on Lorenz curves. The Lorenz curve for power functions is calculated and, based on this, some important concentration measures are derived, such as those of Gini and Theil and the variation coefficient. Using Lorenz curves, it is shown that the concentration of a power law increases with its exponent, and this result is interpreted in terms of the functions of Zipf and Lotka. The presence of trivial words in text databases can affect record or concept (words/phrases) clustering adversely. Additionally, the determination of whether a word/phrase is trivial is context-dependent. Our objective in the present article is to demonstrate a context-dependent trivial word filter to improve clustering quality. Factor analysis was used as a context-dependent trivial word filter for subsequent term clustering. Medline records for Raynaud's Phenomenon were used as the database, and words were extracted from the record abstracts. A factor matrix of these words was generated, and the words that had low factor loadings across all factors were identified and eliminated. 
The remaining words, which had high factor loading values for at least one factor and therefore were influential in determining the theme of that factor, were input to the clustering algorithm. Both quantitative and qualitative analyses were used to show that factor matrix filtering leads to higher quality clusters and subsequent taxonomies. In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality. The rapid development of Web sites providing extensive coverage of a topic, coupled with the development of powerful search engines (designed to help users find such Web sites), suggests that users can easily find comprehensive information about a topic. In domains such as consumer healthcare, finding comprehensive information about a topic is critical as it can improve a patient's judgment in making healthcare decisions, and can encourage higher compliance with treatment. 
However, recent studies show that despite using powerful search engines, many healthcare information seekers have difficulty finding comprehensive information even for narrow healthcare topics because the relevant information is scattered across many Web sites. To date, no studies have analyzed how facts related to a search topic are distributed across relevant Web pages and Web sites. In this study, the distribution of facts related to five common healthcare topics across high-quality sites is analyzed, and the reasons underlying those distributions are explored. The analysis revealed the existence of few pages that had many facts, many pages that had few facts, and no single page or site that provided all the facts. While such a distribution conforms to other information-related phenomena, a deeper analysis revealed that the distributions were caused by a trade-off between depth and breadth, leading to the existence of general, specialized, and sparse pages. Furthermore, the results helped to make explicit the knowledge needed by searchers to find comprehensive healthcare information, and suggested the motivation to explore distribution-conscious approaches for the development of future search systems, search interfaces, Web page designs, and training. Papers in journals are indexed in bibliographic databases in varying degrees of overlap. The question has been raised as to whether papers that appear in multiple databases (highly overlapping) are in any way more significant (such as being more highly cited) than papers that are indexed in few databases. This paper uses a dataset from fuzzy set theory to compare low overlap papers with high overlap ones, and finds that more highly overlapping papers are in fact more highly cited. 
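The overlap comparison in the preceding abstract can be sketched with invented records: papers are grouped by how many databases index them, and the citation counts of the two groups are compared. The records, the cut-off, and the use of medians below are all illustrative assumptions; the study's actual fuzzy-set-theory dataset and statistic are not reproduced here.

```python
from statistics import median

# Hypothetical (paper id, number of indexing databases, citation count)
# records; the real study's dataset from fuzzy set theory is not shown.
papers = [
    ("p1", 1, 2), ("p2", 1, 0), ("p3", 2, 5), ("p4", 2, 3),
    ("p5", 4, 12), ("p6", 5, 20), ("p7", 5, 9), ("p8", 6, 31),
]

OVERLAP_CUTOFF = 4  # assumed threshold separating low- from high-overlap papers

low_overlap = [cites for _, n_db, cites in papers if n_db < OVERLAP_CUTOFF]
high_overlap = [cites for _, n_db, cites in papers if n_db >= OVERLAP_CUTOFF]

# In this toy sample, as in the study's finding, the high-overlap
# group is the more highly cited one.
print(median(low_overlap), median(high_overlap))
```

Medians are just one robust way to compare the groups; the abstract does not specify which statistic the article itself used.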
This study presents a bibliometric analysis of scientific output in the area of Differential Item Functioning (DIF), the aim being to offer an overview of research activity in this field and characterise its most important aspects and their evolution over the last quarter of the 20th century, thus providing data regarding the basis on which this activity is being developed at the beginning of the 21st century. The analysis makes use of the Web of Science database, the search being restricted to articles published between 1975 and 2000 which contain the terms 'differential item functioning', 'DIF' or 'item bias'. The various analyses focus on the presentation of publication frequencies and percentages, as well as on the application of Bradford's law of scattering and Lotka's law. The purpose of this study is to analyze and compare journal citation data, from Journal Citation Reports on the Web 2000, of general and internal medicine and surgery. The source items and five kinds of citation data, i.e., citation counts, impact factor, immediacy index, citing half-life and cited half-life, are examined, and the correlation between each of the fifteen pairs of citation data is determined based on Pearson correlation tests. Fisher's Z-transform was employed to test the significance of the difference between the Pearson correlation coefficients for each pair of citation data in these two subject areas. The results of this work reveal the following: the journals that publish frequently are cited more frequently, have high impact factors and immediacy indices, and usually have short citing half-lives (i.e., they usually cite current literature). The impact factor and immediacy index have significant correlations with citation counts. A significant correlation also exists between the impact factor and the immediacy index. However, there is no correlation between cited half-life and the other citation data, except citing half-life. 
For journals of general and internal medicine and surgical medicine, there are no significant differences in the Pearson correlation coefficients for the following pairs of citation data: source items and citation counts, source items and impact factor, source items and citing half-life, citation counts and citing half-life, impact factor and citing half-life, immediacy index and citing half-life, and cited half-life and citing half-life. This paper uses United States patent classification analysis to study the development of core technologies and key industries in Taiwan over the last 25 years, from 1978 to 2002. After counting the number of Taiwan-held United States granted utility patents, the authors divide the years into three phases: from 1978 to 1994, with fewer than 500 patents each year; from 1995 to 1999, with 500-2,500 patents each year; and from 2000 to 2002, with more than 2,500 patents each year. The results show that for both Taiwan's core technologies and key industries, there was great diversity in the first phase, while a mainstream formed and matured in the second and third phases. However, industrial development in the third phase was more concentrated and focused than in the previous ones. Overall, Taiwan has clearly moved from a manufacturing-based economy to an innovation-based one, with its focus on high-tech industries during the previous 25 years. We present results on the relationship between the bonding number (the number of links among the authors of an article) and a measure of group cohesiveness on a Likert-type scale in three research areas, Biotechnology, Mathematics and Physics, at the National University of Mexico (UNAM). 
We found a difference between disciplines with regard to group size, and although there is little difference between disciplines in cohesiveness, the results suggest that there is a direct relationship between the level of cohesiveness and the bonding number in Physics and Biotechnology, but not in Mathematics, where the groups are much smaller. Composite science and technology (S&T) indices are essential to an overall understanding and evaluation of national S&T status, and to the formulation of S&T policy. However, only a few studies on constructing these indices have been conducted so far, since a number of complications and uncertainties are involved in the work. Therefore, this study proposes and applies a new approach that employs fuzzy set theory to construct composite S&T indices. The approach appears to successfully integrate various S&T indicators into three indices: R&D input, R&D output, and economic output. We also compare Korea's S&T indices with those of five developed countries (France, Germany, Japan, the United Kingdom, and the United States) to obtain some implications of the results for Korea's S&T. For all rankings of countries' research output based on number of publications or citations compared with population, GDP, R&D and public R&D expenses, and other national characteristics, the counting method is decisive. Total counting (full credit to a country when at least one of the authors is from this country) and fractional counting (a country receives a fraction of full credit for a publication equal to the fraction of authors from this country) of publications give widely different results. Counting methods must be stated, rankings based on different counting methods cannot be compared, and fractional counting is to be preferred. Citation distributions for 1992, 1994, 1996, 1997, 1999, and 2001, which were published in the 2004 report of the National Science Foundation, USA, are analyzed. 
It is shown that the ratio of the total numbers of citations in any two broad fields of science remains close to constant over the analyzed years. Based on this observation, normalization of total numbers of citations with respect to the number of citations in mathematics is suggested as a tool for comparing scientific impact expressed by the number of citations in different fields of science. This article introduces and validates a method for identifying technologically similar organizations, industries, or regions by applying term-similarity techniques from information science to international patent classifications. Several applications of the method are explored, including identifying hidden competitive threats, finding potential acquisition targets, locating university expertise within a technology, identifying competitor strategy shifts, and more. One advantage of the method is that it is size invariant, meaning, for example, that it is possible for a huge corporation to identify smaller firms in its space before they become significant competitors. Another advantage is that technologically similar organizations can be identified on a large scale without any particular knowledge of the technology or business of either source organizations or target organizations. When a client interacts with an expert, e.g., a doctor, it falls upon the expert to ask questions that steer the process towards fulfilling the client's needs. This is most efficient given that the expert has more knowledge and a broader view of possible illnesses and treatments. On the other hand, when faced with an information retrieval (IR) task, most IR systems leave to the client the task of coming up with queries. We propose an information retrieval framework that assumes the responsibility of leading the users to the information, thus increasing efficiency and satisfaction. 
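The size invariance claimed for the patent-classification similarity method a little earlier follows naturally if similarity is computed as the cosine between organizations' patent-class frequency vectors, a standard term-similarity measure in information science. The sketch below uses invented IPC class codes and counts, and cosine is an assumed choice; the abstract does not state the article's exact measure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse class -> count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented patent-class portfolios: a huge corporation and a small firm
# active in the same technological space.
big_firm = {"H01L": 900, "G06F": 300, "H04N": 100}
small_firm = {"H01L": 8, "G06F": 3, "H04N": 1}

base = cosine(big_firm, small_firm)
scaled = cosine(big_firm, {k: 10 * v for k, v in small_firm.items()})

# Size invariance: multiplying one portfolio by a constant leaves the
# similarity essentially unchanged, so small firms surface next to giants.
assert abs(base - scaled) < 1e-9
```

Because only the direction of the class-frequency vector matters, a firm with ten patents can score as close to a huge corporation as another giant would, which is exactly the property the abstract highlights.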
Comparative study of knowledge management (KM) promises to lead to more effective knowledge use in all cultural environments. This pilot study compares KM priorities, needs, tools, and administrative structure components in large Chinese and American universities. General KM theory and literature related to KM in higher education are analyzed to develop the four components of the study. Comparative differences in KM practice at large Chinese and American universities are analyzed for each component. A correlation matrix reveals statistically significant co-variation among all but one of the study components. Four conclusions related to comparative KM and suggestions for future research are presented. Like most activities in the world, scientific evolution has its own rhythm. How can this evolutionary rhythm be described and made visible? Do different fields have different rhythms, and how can they be measured? In order to answer these questions, a relative indicator, called the R-sequence, was designed. This indicator is time dependent, derived from publication and citation data, but independent of the absolute number of publications, as well as the absolute number of citations, and can therefore be used in a comparison of different scientific fields, nations, institutes, or journals. Two calculation methods for the R-sequence, the triangle method and the parallelogram method, are introduced. As a case study, JASIS(T)'s R-sequence has been obtained. Today information-intensive work tasks in professional settings involve highly dynamic information utilization in which information seeking and searching tasks are taking a more central role. This article considers the concept of task in the context of information studies in order to provide definitional clarity for task-based information seeking and retrieval studies. We identify (1) the central task levels as well as (2) the kinds of dimensions connected to the levels from the perspective of information studies. 
The analysis is aimed to serve as a conceptual starting point for empirical studies in the research area. The focus is on some central aspects of tasks that are recognized within information studies as well as related research areas (e.g., organizational studies). We define two levels of information-related subtasks: information seeking tasks and information search tasks. Information retrieval tasks are explicitly considered as a specific type of information search task. We describe differences and connections between these task levels. Finally, the implications of the proposed conceptual framework for information studies are discussed. A questionnaire was distributed to local union officials in Illinois in order to determine the officials' use of various types of libraries, their satisfaction with their experience in using the libraries, the problems they encountered in library use, and their opinion of various ways in which libraries might be made more useful to them. They were also asked whether they had had training in how to find information. Respondents to the survey used more than one type of library, and their union role had an impact on which type they were likely to use. They used different types of libraries to find different types of information. In general they were satisfied with their library experience, but they found library collections inadequate for their needs. Respondents who had had training in how to find information appeared to use libraries more but differed little in the frequency or types of problems encountered from those who had no training. When asked their opinion on various suggestions for improving library service to local union officials, they preferred measures that gave greater emphasis to increasing labor materials in library collections. 
The findings of this study, combined with those of our earlier study (Chaplan & Hertenstein, 2002), suggest that an information seeking model developed by Wilkinson (2001) may be useful in explaining union officials' information seeking behavior. The number and type of Web citations to journal articles in four areas of science are examined: biology, genetics, medicine, and multidisciplinary sciences. For a sample of 5,972 articles published in 114 journals, the median Web citation counts per journal article range from 6.2 in medicine to 10.4 in genetics. About 30% of Web citations in each area indicate intellectual impact (citations from articles or class readings, in contrast to citations from bibliographic services or the author's or journal's home page). Journals receiving more Web citations also have higher percentages of citations indicating intellectual impact. There is significant correlation between the number of citations reported in the databases from the Institute for Scientific Information (ISI, now Thomson Scientific) and the number of citations retrieved using the Google search engine (Web citations). The correlation is much weaker for journals published outside the United Kingdom or United States and for multidisciplinary journals. Web citation numbers are higher than ISI citation counts, suggesting that Web searches might be conducted for an earlier or a more fine-grained assessment of an article's impact. The Web-evident impact of non-UK/USA publications might provide a balance to the geographic or cultural biases observed in ISI's data, although the stability of Web citation counts is debatable. Statistical relationships between downloads from ScienceDirect of documents in Elsevier's electronic journal Tetrahedron Letters and citations to these documents recorded in journals processed by the Institute for Scientific Information/Thomson Scientific for the Science Citation Index (SCI) are examined. 
A synchronous approach revealed that downloads and citations show different patterns of obsolescence of the used materials. The former can be adequately described by a model consisting of the sum of two negative exponential functions, representing an ephemeral and a residual factor, whereas the decline phase of the latter conforms to a simple exponential function with a decay constant statistically similar to that of the downloads residual factor. A diachronous approach showed that, as a cohort of documents grows older, its download distribution becomes more and more skewed, and more statistically similar to its citation distribution. A method is proposed to estimate the effect of citations upon downloads using obsolescence patterns. It was found that during the first 3 months after an article is cited, its number of downloads increased 25% compared to what one would expect this number to be if the article had not been cited. Moreover, more downloads of citing documents led to more downloads of the cited article through the citation. An analysis of 1,190 papers in the journal during a time interval of 2 years after publication date revealed that there is about one citation for every 100 downloads. A Spearman rank correlation coefficient of 0.22 was found between the number of times an article was downloaded and its citation rate recorded in the SCI. When initial downloads (defined as downloads made during the first 3 months after publication) were discarded, the correlation rose to 0.35. However, both outcomes measure the joint effect of downloads upon citation and that of citation upon downloads. Correlating initial downloads to later citation counts, the correlation coefficient drops to 0.11. Findings suggest that initial downloads and citations relate to distinct phases in the process of collecting and processing relevant scientific information that eventually leads to the publication of a journal article. 
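A minimal sketch of the obsolescence models summarized in the abstract above: downloads modeled as the sum of an ephemeral and a residual negative exponential, and the citation decline as a single exponential whose decay constant matches the residual download factor. All parameter values here are invented for illustration, not fitted to the Tetrahedron Letters data.

```python
import math

# Download obsolescence: sum of two negative exponential functions,
# an ephemeral factor (fast decay) and a residual factor (slow decay).
# Citation obsolescence: a single exponential whose decay constant is
# taken equal to that of the residual download factor, as in the
# synchronous analysis described above. Parameters are illustrative.
def downloads(t, a_eph, k_eph, a_res, k_res):
    """Expected downloads at document age t."""
    return a_eph * math.exp(-k_eph * t) + a_res * math.exp(-k_res * t)

def citations(t, c0, k_cit):
    """Expected citations at document age t (simple exponential decline)."""
    return c0 * math.exp(-k_cit * t)

# At age 0 the illustrative download curve starts at 100 + 20 = 120,
# while the citation curve starts at 1.2 (about 1 citation per 100
# downloads, echoing the ratio reported above).
d0 = downloads(0.0, a_eph=100.0, k_eph=2.0, a_res=20.0, k_res=0.1)
c0 = citations(0.0, c0=1.2, k_cit=0.1)
```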
Our purpose in this study is to inductively reorganize software interface menu items based on a user's process model. The proposed menu interface in this study used direct user input, such as goals and strategies for solving information needs, to reorganize and re-label menus. To assess its effectiveness, efficiency, and user satisfaction with actual users, we implemented and compared this new menu version to the original interface, which was based upon a traditional categorical menu organization. The significance of this study is that it incorporates user process modeling into the design of the user interface, providing insights into the impact of such modeling on the usability of an information system. Results from the usability testing indicate that the proposed menu and the traditional menu are similarly effective for users in terms of task completion time and accuracy. User preferences and debriefing comments from usability testing also indicate that users preferred the user-process-based arrangement of menu items as displayed. However, the types of tasks (different problem types) show significant differences in task completion time and in accuracy, sometimes favoring the new version. In other words, usable and effective menu organization depends more on the types of tasks and the domain of knowledge than on menu arrangement alone, although menu organization is a factor in the process. Web searchers typically fail to view search results beyond the first page and rarely fully examine the results presented to them. In this article we describe an approach that encourages a deeper examination of the contents of the document set retrieved in response to a searcher's query. The approach shifts the focus of perusal and interaction away from potentially uninformative document surrogates (such as titles, sentence fragments, and URLs) to actual document content, and uses this content to drive the information seeking process. 
Current search interfaces assume searchers examine results document-by-document. In contrast, our approach extracts, ranks, and presents the contents of the top-ranked document set. We use query-relevant top-ranking sentences extracted from the top documents at retrieval time as fine-grained representations of top-ranked document content and, when combined in a ranked list, as an overview of these documents. The interaction of the searcher provides implicit evidence that is used to reorder the sentences where appropriate. We evaluate our approach in three separate user studies, each applying these sentences in a different way. The findings of these studies show that top-ranking sentences can facilitate effective information access. Although its use in informetrics dates back at least to 1987, data analysed in a recent paper by Shan et al. (2004) has rekindled interest in the generalized Waring distribution (GWD). The purpose of this note is to show that for many purposes, the distribution is best motivated via a familiar informetric scenario of a population of "sources" producing "items" over time, leading to a stochastic process from which the univariate, bivariate, and multivariate forms of the GWD are natural consequences. Earlier work and possible future applications are highlighted. Many of the results are due to Irwin and Xekalaki, while much of the material on the Waring process has been previously available in an unpublished research report by the author (Burrell, 1991). Research quality is the cornerstone of modern science; it underpins the understanding of reputational differences among scientific and academic institutions. Traditionally, scientific activity is measured by a set of indicators and well-established bibliometric techniques based on the number of academic papers published in top-ranked journals or on the number of citations of these papers. 
These indicators are usually critical in measuring differences in research performance, both at individual and at scientific institutional levels. In this paper, we introduce an alternative and complementary set of indicators based on the results of competition for research funding, which aims to enlarge the framework in which research performance has traditionally been measured. Theoretical support for this paper is found in the role that the search for funding plays in the researchers' credibility cycle, as well as in peer review, the basic instrument for the allocation of public R&D funds. Our method analyses the outcomes of the researchers' struggle for funding, using data from research proposal applications and awards as the unit of observation, and aggregating them by research institution to rank institutions on relative scales of research competitiveness. This paper examines the relationship between the antitrust environment and innovation in the US economy, where innovation is measured by patent activity. The hypothesis to be tested is whether antitrust enforcement activity, as measured by the number of civil filings of the US Department of Justice, has had a significant impact on the level of innovation in the US economy, after adjusting for other factors that have an impact on innovation, such as research and development expenditures and real economic growth. Impacts of civil antitrust case filings, criminal antitrust case filings, and total US Department of Justice antitrust case filings on patent activity in the USA are estimated for the period 1953-2000. The empirical results show that civil case filings have a statistically significant impact on innovation. The dramatic consequences of Nazi power for science are described extensively in various articles and books. Recent progress in information systems allows a more quantitative reflection. 
Literature databases ranging back to the beginning of the 20th century, the ISI citation indexes ranging back to 1945, and sophisticated search systems are suitable tools for this purpose. In this study the overall break in scientific productivity and that of selected physics journals are examined. An overview of the citation impact of some 50 leading physicists is given. The productivity before and after departure is analyzed and, in some cases, connected to biographical data. An attempt is made to shed light on part of Granada University's female academics' past in what was a critical period in Spain's history (1975-1982), referring of course to the political transition from dictatorship to democracy. The period studied is 1975-1990, in which an analysis is made of a section of the teaching staff, using part of the female staff as the sample due to their being the most socially affected during this period. Firstly, a study is carried out on the teaching staff, both male and female, to verify the staff situation at the university using the gender indicator. Secondly, the female teachers' scholarly output is studied; because the areas of study are very varied, it has been considered appropriate to apply the study to monographs, scholarly articles, and doctoral theses. Moreover, because the study intends to be as exhaustive as possible, various databases and catalogues have been consulted which collect the documental typology to be used in the analysis. This paper presents a new map representing the structure of all of science, based on journal articles, including both the natural and social sciences. Similar to cartographic maps of our world, the map of science provides a bird's eye view of today's scientific landscape. It can be used to visually identify major areas of science, their size, similarity, and interconnectedness. In order to be useful, the map needs to be accurate on a local and on a global scale. 
While our recent work has focused on the former aspect, this paper summarizes results on how to achieve structural accuracy. Eight alternative measures of journal similarity were applied to a data set of 7,121 journals covering over 1 million documents in the combined Science Citation and Social Science Citation Indexes. For each journal similarity measure we generated two-dimensional spatial layouts using the force-directed graph layout tool VxOrd. Next, mutual information values were calculated for each graph at different clustering levels to give a measure of structural accuracy for each map. The best co-citation and inter-citation maps according to local and structural accuracy were selected and are presented and characterized. These two maps are compared to establish robustness. The inter-citation map is then used to examine linkages between disciplines. Biochemistry appears as the most interdisciplinary discipline in science. The Matthew effect for countries (MEC) was discovered on Holy Eve 1994. Since then more than 30 papers of mine, many of them together with Andrea Scharnhorst and Eberhard Bruckner, have appeared in journals or been read at conferences of international and national scientific societies (1-6). It is not the task of this paper to present a bibliometric analysis of those papers' impact, nor to give any detailed historical description of the surprising findings following the discovery. I would rather try to unfold, from the heightened standpoint of our days, a new summary of the Matthew phenomenon, because I am convinced it will not lose fascination and importance in the years to come. This article explores the use of the term public domain in the American context and finds that the symbol is subject to multiple meanings. Using historical and content analysis, the analysis explores the various uses of the term and provides a preliminary taxonomy for subsequent analysis and theory building. 
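The mutual-information calculation used in the science-mapping study above (to score how well a layout preserves cluster structure) can be sketched as a comparison of two cluster assignments over the same items. The label lists below are toy examples, not the 7,121-journal data set.

```python
import math
from collections import Counter

# Mutual information (in nats) between two clusterings of the same
# item set. Identical clusterings yield the maximal value; unrelated
# clusterings yield a value near zero. This is a generic sketch of the
# measure, not the study's exact computation pipeline.
def mutual_information(labels_a, labels_b):
    """MI between two cluster-label sequences of equal length."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), rewritten with counts.
        mi += (joint / n) * math.log(joint * n / (count_a[a] * count_b[b]))
    return mi

identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # maximal
independent = mutual_information([0, 1, 0, 1], [0, 0, 1, 1])  # ~0
```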
In conclusion, it suggests that more coherent information policies regarding national and international information access, creativity, governance, and private property rights will require a better understanding and delineation of the use of public domain in legislative and common practice. In the cycle of scholarly communication, scholars play the role of both consumer and contributor of intellectual works within the stores of recorded knowledge. In the digital environment scholars are seeking and using information in new ways and generating new types of scholarly products, many of which are specialized resources for access to research information. These practices have important implications for the collection and organization of digital access resources. Drawing on a series of qualitative studies investigating the information work of scientists and humanities scholars, specific information seeking activities influenced by the Internet and two general modes of information access evident in research practice are identified in this article. These conceptual modes of access are examined in relation to the digital access resources currently being developed by researchers in the humanities and neuroscience. Scholars' modes of access and their "working" and "implicit" assemblages of information represent what researchers actually do when gathering and working with research materials and therefore provide a useful framework for the collection and organization of access resources in research libraries. Knowledge is a critical resource that can help organizations to sustain strategic advantage in competitive environments. Organizations in Asia and elsewhere are turning to knowledge management (KM) initiatives and technologies to leverage their knowledge resources. As a key component of KM initiatives, electronic knowledge repositories (EKRs) are deployed by organizations to store codified knowledge for future reuse. 
Although EKRs have been used for some time, there is a lack of understanding of what motivates employees' usage of an EKR. This study formulates and empirically tests a theoretical model relating potential antecedents to EKR usage for knowledge seeking. The model was operationalized and the survey instrument subjected to a conceptual validation process. The survey was administered to 160 knowledge professionals in public-sector organizations in Singapore who had accessed EKRs in the course of their work. Results reveal that perceived output quality directly affects EKR usage for knowledge seeking. Further, resource availability affects EKR usage for knowledge seeking particularly when task tacitness is low, and incentives affect EKR usage particularly when task interdependence is high. Implications of these results for further research and improving EKR implementation are discussed. The design and implementation of an ontology-based Web retrieval (ONTOWEB) system is described; ONTOWEB allows the semantic search of the Web resources of international organizations such as the World Bank and the Organisation for Economic Co-operation and Development (OECD). A firm's knowledge management project is introduced first, followed by a description of the ONTOWEB system's design, implementation, and evaluation. The ONTOWEB system has two components: databases and an ontology-based search engine. The ontology-based search engine is a tool used to query the information that has been loaded into the database. Lastly, to evaluate the system, an experiment was conducted to compare the performance of the proposed system with that of Internet search engines in terms of relevance and search time. This study shows that ontologies can be used not only to improve precision, but also to reduce search time. 
Because of the expense of annotating resources, the domains that contain the most valuable knowledge, such as the medical and business sectors, are prime areas for future ontology applications. The academic literature suggests that the extent of exporting by multinational corporation subsidiaries (MCSs) depends on their strategic role in the multinational corporation (MNC), their age and size, and whether their products are targeted at niche or commodity markets. In particular, it is claimed that a subsidiary is more likely to export if it adopts a vertically integrated structure, if it holds a regional or global sales mandate, or if it has been established in the host market for a longer time. Our aim in this article is to model the complex export pattern behavior of multinational corporation subsidiaries in Malaysia using a Takagi-Sugeno fuzzy inference system. The proposed fuzzy inference system (FIS) is optimized by using neural network learning and evolutionary computation. Empirical results clearly show that the proposed approach can model the export behavior reasonably well compared to a direct neural network approach. Designing an interdisciplinary graduate program in knowledge management requires a good understanding of knowledge processes and the ability to differentiate between information management and knowledge management. Given the complexity of knowledge and the nature of its existence, there is a need for graduate programs to go beyond information management and include in the curriculum disciplines that deal with social, cultural, and economic issues such as communication, cognitive science, and business. An understanding of the interdisciplinary nature of knowledge management is necessary for a more balanced and practical approach to the development of a knowledge management curriculum. 
In this article, the design and development of an interdisciplinary graduate program in knowledge management at Nanyang Technological University in Singapore is reported. The initiation of the program was influenced by the strong demand from the public sector in Singapore for knowledge management professionals. It was developed in close association with the information studies program at Nanyang Technological University. In the first year, the program attracted 230 applicants, of which 45 were selected: 22 students came from the public sector and 23 from the private sector. Knowledge management is a discipline that has rapidly gained attention from both practitioners and academics over the last decade. However, the number of simulation games designed for knowledge management education has been limited. This is largely due to the emerging nature of knowledge management, whose domain the established gaming and simulation community has yet to enter. For this reason, the value and relevance of knowledge management simulation games is highlighted in this article by detailing the design and implementation of a simulation game entitled The Chief Knowledge Officer (CKO). The study was intended to meet two objectives: (a) to provide a template for designing knowledge management simulation games, and (b) to determine the effectiveness of CKO through a pretest-posttest research design. An empirical study involving 32 final-year Business Studies students reading an elective module entitled Knowledge Management Systems at an institute of higher education in Singapore was conducted. The findings confirmed that CKO was a viable and effective instructional tool for imparting knowledge to the participants. In addition, the scores obtained from CKO had a moderating effect on the participants' attitude towards the subject matter. 
To study the knowledge creation process, we introduce a conceptual framework that captures the major goals and features of research organizations. The knowledge efficiency of research groups is then empirically studied. The budget of the projects and the size of the research groups are the inputs of the projects. To make the assessment more reasonable, two-dimensional indicators, including a domestic impact factor and an international impact factor, are jointly used to evaluate the research outputs of Chinese research groups through a Data Envelopment Analysis approach with preferences. Through comparisons of groups with the highest and lowest efficiency, we discover the critical factors influencing the productivity and efficiency of these research groups based on the proposed framework. Finally, we provide some management suggestions for research groups to improve their knowledge creation efficiency. Influenced by the concept of a "Knowledge Economy," knowledge management (KM) has been receiving a lot of attention in the field of business administration recently. In the field of Library and Information Science, corporate librarians working in information centers are most affected by KM, either in their working environment or in their daily operational role. Led by the Special Library Association (SLA), a series of studies about the working environment and the changing role of corporate librarians over the last 10 years has been conducted in the United States. Due to differences in politics, economics, and culture between Taiwan and Western countries, the organizational structure and corporate culture of business are not the same. Therefore, local studies on similar topics are needed. Our purpose in this study is to explore the influence of knowledge management on the working environment and the changing roles of corporate librarians in Taiwan. This paper elaborates on the Triple Helix model for measuring the emergence of a knowledge base of socio-economic systems. 
The knowledge infrastructure is measured using multiple indicators: webometric, scientometric, and technometric. The paper employs this triangulation strategy to examine the current state of the innovation systems of South Korea and the Netherlands. These indicators are thereafter used for the evaluation of the systemness in configurations of university-industry-government relations. South Korea is becoming somewhat stronger than the Netherlands in terms of scientific and technological outputs and in terms of knowledge-based dynamics; however, South Korea's portfolio is more traditional than that of the Netherlands. For example, research and patenting in the biomedical sector are underdeveloped. In terms of the Internet economy, the Netherlands seems oriented towards global trends more than South Korea; this may be due to the high share of services in the Dutch economy. The purpose of this study is to quantify and compare the publication and citation output of the biggest faculties of economics and social sciences in Germany. Various publication and citation measures based upon Social Science Citation Index (SSCI) data are used to explore the comparative strengths and weaknesses of ten academic fields at the named faculties. To reflect the varying size of the fields and faculties, output measures as well as productivity measures are explicitly considered. From a bibliometric perspective, empirical results demonstrate that various measures are necessary to adequately identify the comparative strengths and weaknesses of entire faculties and of selected disciplines within faculties. The literature on Terrorism and National Security (NS), and Homeland Security (HS) presents two sides of a coin: one side demonstrates the problematic nature of terrorism and asks for solutions; the other side tries to find responses and solutions to the problem. It was expected that the NS literature would emanate from the same source material as the HS publications. 
Analysis of the literature of terrorism, homeland security, and national security in the Science Citation Index (SCI) has shown that the material on terrorism and NS stems from the same scientific sources, that is, the Social Sciences. In contrast, the HS scientific literature originates in the exact sciences, engineering, and life and environmental sources. The three kinds of literature have grown remarkably in recent years; however, a cross-section search strategy between terrorism and HS studies yields small retrieval sets. This means that few articles both present the problem and propose possible solutions. Currently, HS is on one side of the scholarly arena, and NS and terrorism literature on the other; they advance mostly in lines parallel to each other, but as the researcher moves from observing the core scientific literature toward the more general material, this state of affairs changes. Another analysis, of the multimedia database WorldCatalog (which indexes mostly books, but also videos and computer materials, both scientific and popular), demonstrates a different trend: the same publications deal with both terrorism and HS counter-terrorism and suggest solutions. This work has analyzed and evaluated the dissemination of research done at Spanish universities through the World Wide Web (WWW) in order to obtain a map of the visibility of the information available on this research and to propose measures for improving the quality of this diffusion, all within the social and institutional context of the European Area for Higher Education. The methodology applied in the study has used both qualitative and quantitative research methods to obtain quality indicators on the dissemination of university research. The object of study consists of a sample of 19 Spanish universities, chosen according to their representativeness by Autonomous Community and their administrative and scientific weight. 
The process of defining indicators, both qualitative and quantitative, as well as the collection and analysis of data, is explained. The results give us a detailed panorama of the state of the art of the visibility of information on research in the web pages of the selected universities. This has allowed us to make certain proposals for improvement that can contribute to the excellence of its dissemination. The main characteristics, human resources, organizational development, and R&D output and outcome of the Venezuelan scientific and technological community are studied in depth for three specific years (1954, 1983, and 1999), aiming to reveal its strengths and weaknesses and to establish its dynamics. During the first half of the twentieth century, Venezuela had no major organized or institutionalized scientific activity. From 1954 through 1983, the State built a considerable number of institutions, mostly for research and development activities. Initially, researchers came from the classical professions but were later replaced by graduates in scientific and technological disciplines. Biomedical and basic sciences are the areas of knowledge favored by researchers while, in terms of intellectual creation, the social sciences and humanities seem to be the least productive, despite being one of the fields of knowledge embraced by most professionals. Although from 1983 on there has been no major input to the national S&T system, the research community showed a few years of growth in absolute terms in the number of publications; however, national productivity decreased during the last decade of the century. It is believed that this reflects an aging, asphyxiated, and self-consuming community using its reserves at a maximum rate. 
The S&T system constructed exhibits a dominance of the public sector, which financially privileged the hydrocarbon-related technological/service industry at the expense of academic research in universities, while maintaining agribusiness-related service and developmental research at the same level of expenditure throughout the last twenty years of the twentieth century. While the generation, practically from zero, of a modern R&D community in Venezuela, together with higher education, could well be one of the most significant accomplishments of democracy in Venezuela, this remarkable social achievement has been put in peril by neglect and changes in public policies. The downturn of the national S&T system is bound to worsen due to the virtual collapse, on February 4, 2002, of the R&D centre of the nationalized oil industry. Parallel mappings of the intellectual and cognitive structure of Software Engineering (SE) were conducted using Author Cocitation Analysis (ACA), PFNet Analysis, and card sorting, a Knowledge Elicitation (KE) method. Cocitation counts for 60 prominent SE authors over the period 1990-1997 were gathered from SCISEARCH. Forty-six software engineers provided similar data by sorting authors' names into labeled piles. At the eight-cluster level, ACA and KE identified similar author clusters representing key areas of SE research and application, though the KE labels suggested some differences between the way that the authors' works were used and how they were perceived by respondents. In both maps, the clusters were arranged along a horizontal axis moving from "micro" to "macro" level R&D activities (correlation of X axis coordinates = 0.73). The vertical axis of the two maps differed (correlation of Y axis coordinates = -0.08). 
The Y axis of the ACA map pointed to a continuum of high to low formal content in published work, whereas the Y axis of the KE map was anchored at the bottom by "generalist" authors and at the top by authors identified with a single, highly specific and consistent specialty. The PFNet of the raw ACA counts identified Boehm, Basili, and Booch as central figures in subregions of the network, with Boehm being connected directly, or through a single intervening author, with just over 50% of the author set. The combination of ACA and KE provides a richer picture of the knowledge domain and offers useful cross-validation. An analysis of 330 questionnaires received from project investigators funded by AICTE indicates that project investigators preferred to present their research results at conferences rather than in national and international journals. The impact of funding has been greater on human resource capability development than on research and technological output. Analysis of the data using data envelopment analysis indicates that projects funded under electronics and communication engineering, mechanical engineering, electrical engineering and management displayed some consistency and uniformity with regard to impact on various output parameters. This comparative study covers the period 1988-2003 of the Institute for Scientific Information databases (ISI-DBs), CD-ROM edition - Science Citation Index (SCI), Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI) - as international databases, and of CubaCiencias as an internal database. The number of articles published in Cuban journals and in the ISI-DBs, the author associativeness trend, the most important institutions and other indicators are collected. However, it is observed that CubaCiencias and the ISI-DBs are not perfectly suitable for a study of the productivity of Cuban authors. It is necessary to properly standardize the author fields. 
For bibliometric studies, Cuba needs a database covering not only the papers published in Cuban journals, but all the papers published by Cuban authors. The aim is to investigate cities based on the author-affiliation data from the Web of Science, Biosis Previews, CAB Abstracts, Chemical Abstracts, Compendex/Inspec, Francis, Medline, Pascal, and Sociological Abstracts databases. Specifics of particular cities and publishing patterns and trends with reference to particular disciplines are studied. Characteristics of city-data collection with regard to retrieval accuracy are investigated. The databases are compared regarding document coverage and input consistency. The city, as an emerging supranational unit, is proposed as a scientometric object and indicator in its own right, complementing the traditional notion of a country or nation-state. This paper identifies and presents some characteristics of the psychology journals included in each of the Journal Citation Reports (JCR) categories in 2002. The study shows that most of the journals belong to the categories of Multidisciplinary Psychology (102) and Clinical Psychology (83). Their ranking is seen to vary depending on the category, and the same journal may occupy different positions in different JCR categories. Journals included in the categories of Biological Psychology, Experimental Psychology and Multidisciplinary Psychology had the highest impact factor (IF). In this paper an attempt has been made to study the engineering research scenario in the ocean sector across countries globally. To understand the research dynamics, the articles that appeared in the Science Citation Index (SCI) database under the Ocean Engineering category in the year 2000 were analyzed to visualize the structure of the field. The USA and UK are the major producers, contributing 62% of the total output. The cooperation linkages between engineers, organizations, countries and journals were mapped. 
The causal linkages between the productivity function and the socio-economic imperatives of the production units were studied. 62% of the output in this sector comes from the USA and UK, which also top the collaboration centrality list. The National Oceanic and Atmospheric Administration (NOAA), USA; the National Aeronautics and Space Administration (NASA), USA; and the National Institute of Oceanography (NIO), India are the most productive institutions. GDP explains only 36% of the variance in productivity (R² = 0.36). M. Longuet-Higgins and C.C. Mei are the most cited authors in the field. Co-citation maps of cited authors and cited journals throw light on the semantic structure of the field. Studies in wave mechanics and the modeling of waves are the most important areas of research in ocean technology. The purpose of this paper is to evaluate the basic research performance of key projects in the field of information science & technology funded by the National Natural Science Foundation of China (NSFC) from both international and national perspectives during the period 1994-2001, based upon the Science Citation Index (SCI) and China Scientific and Technical Papers and Citations (CSTPC) databases. We compare the international and domestic outputs of the key projects by applying various scientometric indicators and techniques. The findings indicate that, as a whole, the research performances of the key projects have, to different degrees, increased in both international and domestic papers during the period of study. Semiconductor is the internationally most productive sub-discipline and Automatization is the domestically most productive sub-discipline, measured on average per project. The Combination Impact Factor (CIF), which integrates the CSTPC-IF and the SCI-IF into the evaluation process, is further proposed for the combined evaluation of domestic and international outputs of the key projects. 
In terms of the ratio of the CIF relative to the funds in each sub-discipline, the results also show that Semiconductor is the most productive sub-discipline and Computer the least productive one. Using correlation analysis, a significant and positive relationship between the SCI-IF and the CIF has been found for the evaluated projects. This paper builds on previous research concerned with the classification and specialty mapping of research fields. Two methods are put to the test in order to decide whether they yield significantly different mappings of the research front of a science field. The first method was based on document co-citation analysis, where papers citing co-citation clusters were assumed to reflect the research front. The second method was bibliographic coupling, where likewise the citing papers were assumed to reflect the research front. The application of these methods resulted in two different types of aggregations of papers: (1) groups of papers citing clusters of co-cited works and (2) clusters of bibliographically coupled papers. The comparison of the two methods' mapping results was pursued by matching word profiles of groups of papers citing a particular co-citation cluster with word profiles of clusters of bibliographically coupled papers. The findings suggested that the research front was portrayed in two considerably different ways by the methods applied. It was concluded that the results of this study would support a further comparative study of these methods on a more detailed and qualitative ground. The original data set encompassed 73,379 articles from the fifty most cited environmental science journals listed in the Journal Citation Reports, Science Edition, downloaded from the Science Citation Index on CD-ROM. Some collections cover many topics, while others are narrowly focused on a limited number of topics. We introduce the concept of the "scope" of a collection of documents and we compare two ways of measuring it. 
These measures are based on the distances between documents. The first uses the overlap of words between pairs of documents. The second uses a novel method that calculates the semantic relatedness of pairs of words from the documents. Those values are combined to obtain an overall distance between the documents. The main validation of the measures compared Web pages categorized by Yahoo. Sets of pages sampled from broad categories were determined to have a higher scope than sets derived from subcategories. The measure was significant and confirmed the expected difference in scope. Finally, we discuss other measures related to scope. A model is presented of the manifestation of the birth and development of a scientific specialty in a collection of journal papers. The proposed model, Cumulative Advantage by Paper with Exemplars (CAPE), is an adaptation of Price's cumulative advantage model (D. Price, 1976). Two modifications are made: (a) references are cited in groups by paper, and (b) the model accounts for the generation of highly cited exemplar references immediately after the birth of the specialty. This simple growth process mimics many characteristic features of real collections of papers, including the structure of the paper-to-reference matrix, the reference-per-paper distribution, the paper-per-reference distribution, the bibliographic coupling distribution, the cocitation distribution, the bibliographic coupling clustering coefficient distribution, and the temporal distribution of exemplar references. The model yields a great deal of insight into the process that produces the connectedness and clustering of a collection of articles and references. Two examples are presented and successfully modeled: a collection of 131 articles on MEMS RF (microelectromechanical systems radio frequency) switches, and a collection of 901 articles on the subject of complex networks. 
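The word-overlap distance underlying the first of the "scope" measures described earlier lends itself to a short sketch. The Jaccard-style formulation and the use of the mean pairwise distance below are illustrative assumptions, not the exact combination used in the study:

```python
def word_overlap_distance(doc_a: str, doc_b: str) -> float:
    """Distance between two documents: 1 minus the Jaccard overlap
    of their word sets (an illustrative formulation)."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / len(a | b)

def scope(docs: list[str]) -> float:
    """Scope of a collection: mean pairwise distance between its
    documents. Broad collections should score higher than narrow ones."""
    pairs = [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))]
    if not pairs:
        return 0.0
    return sum(word_overlap_distance(docs[i], docs[j]) for i, j in pairs) / len(pairs)
```

On this formulation, a set of pages drawn from a broad category yields a higher mean pairwise distance than a set drawn from a narrow subcategory, mirroring the Yahoo validation reported above.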
One of the most challenging topics in the automatic document rating process is the development of a rating scheme for the image quality of documents. As part of the Department of Energy (DOE) document declassification program, we have developed a generalized rating system to predict the optical character recognition (OCR) accuracy level that is achieved when processing a document. The need for such a system emerged from the declassification of degraded, typewriter-era documents, which is currently a time-consuming manual process. This article presents the statistical analysis of the most influential document quality features affecting OCR accuracy, develops consistent predictive models for four currently used OCR engines, and studies the applicability of different OCR products to the DOE document declassification process. This study is expected to lead to an efficient and completely automated document declassification system. This article seeks to move the literature on chat-based reference services beyond the current spate of case studies and discussions of emerging standards and best practices, toward the creation of theoretical frameworks to unite these standards and practices. The article explores the various steps in the process of providing synchronous, chat-based reference, as well as the issues involved in providing such a service at each step. The purpose of this exploration is twofold: First, this article presents some open research questions at each step in the process of providing chat-based reference service. Second, the entire process of providing chat-based reference is viewed as a whole, and a model of the provision of chat-based reference service is developed at a high level of abstraction. It is hoped that this model may serve as a conceptual framework for future discussions of, and the development of applications for, chat-based reference. 
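The OCR-accuracy rating system described earlier amounts to fitting a predictive model from measured image-quality features to observed OCR accuracy. The feature ("speckle-noise level"), the linear form, and the training values below are hypothetical illustrations, not the models developed in the study:

```python
def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = a*x + b for a single quality feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical calibration data: speckle-noise level vs. measured OCR accuracy
noise = [0.1, 0.2, 0.4, 0.6, 0.8]
accuracy = [0.98, 0.95, 0.85, 0.75, 0.60]
slope, intercept = fit_linear(noise, accuracy)

def predict_accuracy(noise_level: float) -> float:
    """Predict OCR accuracy for a document with the given noise level."""
    return slope * noise_level + intercept
```

A production system would use several features and one fitted model per OCR engine, as the abstract indicates, but the idea of calibrating against documents with known accuracy is the same.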
Eleven middle school children constructed hierarchical maps for two science categories selected from two Web directories, Yahooligans! and KidsClick! For each category, children constructed a pair of maps: one without links and one with links. Forty-four maps were analyzed to identify similarities and differences. The structures of the maps were compared to the structures employed by the directories. Children were able to construct hierarchical maps and articulate the relationships among the concepts. At the global level (whole map), children's maps were not alike and did not match the structures of the Web directories. At the local levels (superordinate and subordinate), however, children shared similarities in their conceptual configurations, especially for the concrete concepts. For these concepts, substantial overlap was found between the children's structures and those employed in the directories. For the abstract concepts the configurations were diverse and did not match those in the directories. The findings of this study have implications for the design of systems that are more supportive of children's conceptual structures. This study evaluates the data sources and research methods used in earlier studies to rank the research productivity of Library and Information Science (LIS) faculty and schools. In doing so, the study identifies both tools and methods that generate more accurate publication count rankings as well as databases that should be taken into consideration when conducting comprehensive searches of the literature for research and curricular needs. With a list of 2,625 items published between 1982 and 2002 by 68 faculty members of 18 American Library Association (ALA)-accredited LIS schools, hundreds of databases were searched. Results show that only 10 databases provide significant coverage of the LIS indexed literature. 
Results also show that restricting the data sources to one, two, or even three databases leads to inaccurate rankings and erroneous conclusions. Because no database provides comprehensive coverage of the LIS literature, researchers must rely on a wide range of disciplinary and multidisciplinary databases for ranking and other research purposes. The study answers questions such as the following: Is the Association for Library and Information Science Education's (ALISE's) directory of members a reliable tool for identifying a complete list of faculty members at LIS schools? How many, and which, databases are needed in a multifile search to arrive at accurate publication count rankings? What coverage will be achieved using a certain number of databases? Which research areas are well covered by which databases? What alternative methods and tools are available to supplement gaps among databases? Did the coverage performance of the databases change over time? What counting method should be used when determining what, and how many, items each LIS faculty member and school has published? The authors recommend advanced analysis of research productivity to provide a more detailed assessment of the research productivity of authors and programs. Although it is assumed that information about patient safety and adverse events will be used for improvement and organizational learning, we know little about how this actually happens in patient care settings. This study examines how organizational and professional practices and beliefs related to patient safety influence (1) how health care providers and managers make sense of patient safety risks and adverse events, and (2) the flow and use of information for making improvements. The research is based on an ethnographic case study of a medical unit in a large tertiary care hospital in Canada. 
The study found that front-line staff are task driven, coping with heavy workloads that limit their attention to, and recognition of, potential information needs and knowledge gaps. However, a surrogate in an information-related role, an "information/change agent," may intervene successfully with staff and engage in preventive maintenance and repair of routines. The article discusses four key functions of the information/change agent (i.e., boundary spanner, information seeker, knowledge translator, and change champion) in the context of situated practice and learning. All four functions are important for facilitating changes to practice, routines, and the work environment to improve patient safety. This article reports the analysis of two samples of search logs from a commercial image provider over a 1-month period. The study analyzes image searches and queries, user query modification strategies, and user browsing and downloading of results. Unique term searches are less frequent than earlier research has shown; descriptive and thematic queries are more common. Boolean searching, although heavily employed, appears to be ineffective and leads to query modifications. Although there was a large amount of query modification (61.7% of queries across the two samples), the tactics overall do not appear to be carefully thought out and seem to be largely experimental. Given users' willingness to modify queries but their inability to do so effectively, more support for query modification may be beneficial. The research in this paper is based on the paper of D.W. Aksnes & G. Sivertsen: The effect of highly cited papers on national citation indicators, Scientometrics 59 (2) (2004), 213-224, where it is stated that "the few highly cited papers account for the highest share of the citations in the smallest fields". This property, evident at first sight, is examined in the theoretical models that exist in the literature. 
We first define exactly what we mean by the "size of a field" (i.e., when one field is "smaller" or "larger" than another). We show that there are two possible, non-equivalent definitions. Next we define exactly the property under study. This again leads to two possible, non-equivalent formulations. Hence, in total, there are four different formulations to consider. We show, by giving counterexamples, that none of these four formulations is true in general. We also express conditions (in Lotkaian and Zipfian informetrics) under which the property of Aksnes and Sivertsen is true. All these results are valid not only for papers-citations relationships but for any informetric source-item relationship. In this connection we present formulae describing the share of items of highly productive sources as a function of the parameters of the system (e.g. the size of the system). The research performance of Thai researchers in various subject categories was evaluated using a new mathematical index termed the "Impact Factor Point Average" (IFPA), by considering the number of published papers in journals listed in the Science Citation Index (SCI) database held by the Institute for Scientific Information (ISI) for the years 1998-2002, and the results were compared with the direct publication number (PN) and publication credit (PC) methods. The results suggested that the PN and PC indicators cannot be used for comparisons between fields or countries because of their strong field-dependence. The IFPA index, based on a normalization of differences in impact factors, rankings, and number of journal titles in different subject categories, was found to be simple and could be applied equally for the accurate assessment of the quality of research work in different subject categories. The results of research performance were found to be dependent on the method used for the evaluations. 
All evaluation methods indicated that Clinical Medicine was ranked first in terms of the research performance of Thai scholars listed in the SCI database, but exhibited the lowest improvement in performance. Chemistry was shown to be the most improved subject category. The Italian science sector is currently undergoing a strategic reform driven by budget cuts, and there is a need to measure and evaluate the research performance of public research institutes. This research presents a new measure for assessing the scientific research performance of public research institutes. The new model is successfully applied to 108 public research institutes belonging to the Italian National Research Council, using data from the year 2003, and identifies the laboratories with high and low performance. The results are substantially stronger and quicker to obtain than those calculated using conventional indicators. This model supports policy-makers, who must decide on the level and direction of public funding for research and technology transfer. As a first element of a macro-level country-by-country cross-reference and cross-citation analysis, the domestic/international character of the reference and citation behavior of 36 countries is studied and compared with international co-authorship patterns. Indicators of reference and citation domesticity, as well as of the reference-citation domesticity balance, are constructed and presented. The science policy relevance of these indicators is discussed and examples deserving science policy attention are pinpointed. As science has become more complex and sophisticated, greater attention is paid to scientific collaboration in recent bibliometric studies. A total of 6538 publications in Molecular Biology from China during 1999-2003, as indicated by data collected from the database of the Science Citation Index Expanded - Web Edition, have been analyzed. A large proportion of the publications have been authored by more than 3 scientists. 
The composition of publications grouped by collaboration pattern is: 1.58% non-collaborative papers, 42.43% local papers, 34.37% domestic papers and 21.62% international papers on average during the studied period. The countries with which China has collaborative links, and their frequencies, are all itemized to indicate the intensity of international collaboration in the field of Molecular Biology. Finally, the differences between the impact of wholly indigenous papers and internationally collaborative papers have been compared. The results indicate that foreign collaboration contributes substantially to the improvement of mainstream connectivity and international visibility. This paper describes an attempt to explore how far a categorisation of citations could be used as part of an assessment of the outcomes of health research. A large-scale project to assess the outcomes of basic, or early clinical, research is being planned, but before proceeding with such a project it was thought important to test and refine the developing methods in a preliminary study. Here we describe the development, and initial application, of one element of the planned methods: an approach to categorising citations with the aim of tracing the impact made by a body of research through several generations of papers. The results from this study contribute to methodological development for the large-scale project by indicating that: only for a small minority of citing papers is the cited paper of considerable importance; the number of times a paper is cited cannot be used to indicate the importance of that paper to the articles that cite it; and self-citations could play an important role in facilitating the eventual outcomes achieved from a body of research. A paper that is little cited ('sleeps') for a long period of time and then becomes much cited ('is awakened') has been termed by van Raan (2004) a 'Sleeping Beauty', or a paper that was 'ahead of its time'. 
The inference is that the importance of the paper was not initially recognised, and only later was it (re)discovered. On the other hand, much theoretical work in informetrics views the citation process as being purely random, modelled by an appropriate stochastic process. From this point of view, the 'awakening' could simply be a matter of chance, without necessarily saying anything about the worth of the paper. The question therefore arises as to whether such awakenings can be explained or expected purely by the random nature of the model, or whether they are so unlikely that an alternative explanation should be sought. In this note we express the notion of a Sleeping Beauty in terms of a well-known stochastic model and seek to answer this question, at least in general terms. Hirsch (2005) has proposed the h-index as a single-number criterion to evaluate the scientific output of a researcher (Ball, 2005): A scientist has index h if h of his/her N_p papers have at least h citations each, and the other (N_p - h) papers have fewer than h citations each. In a study on committee peer review (Bornmann & Daniel, 2005) we found that, on average, the h-index of successful applicants for post-doctoral research fellowships was consistently higher than that of non-successful applicants. A large number of studies have investigated the transaction logs of general-purpose search engines such as Excite and AltaVista, but few studies have reported on the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research on analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, are the same for general-purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. 
Possible reasons for the differences include the focused domain of Web site search engines and users' different information needs. The findings are useful for Web site developers seeking to improve the performance of the services they provide on the Web, and for researchers conducting further research in this area. The analysis can also be applied in e-government research by investigating how information should be delivered to users through government Web sites. The Web provides a massive knowledge source, as do intranets and other electronic document collections. However, much of that knowledge is encoded implicitly and cannot be applied directly without being processed into some more appropriate structure. Searching, browsing, and question answering, for example, could all benefit from domain-specific knowledge contained in the documents, and in applications such as simple search we do not actually need very "deep" knowledge structures such as ontologies; we can get a long way with a model of the domain that consists of term hierarchies. We combine domain knowledge automatically acquired by exploiting the documents' markup structure with knowledge extracted on the fly to assist a user with ad hoc search requests. Such a search system can suggest query modification options derived from the actual data and thus guide a user through the space of documents. This article gives a detailed account of a task-based evaluation that compares a search system that uses the outlined domain knowledge with a standard search system. We found that users do use the query modification suggestions proposed by the system. The main conclusion we can draw from this evaluation, however, is that users prefer a system that can suggest query modifications over a standard search engine that simply presents a ranked list of documents. Most interestingly, we observe this user preference despite the fact that the baseline system even performs slightly better under certain criteria. 
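The h-index as defined in the Hirsch (2005) passage earlier admits a direct computation from a researcher's citation counts; a minimal sketch:

```python
def h_index(citations: list[int]) -> int:
    """h-index: the largest h such that h of the papers have
    at least h citations each (Hirsch, 2005)."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h
```

For example, citation counts [10, 8, 5, 4, 3] give h = 4: four papers have at least 4 citations each, while the fifth has fewer than 5.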
A novel metric for quantitatively measuring the content accessibility of the Web for persons with disabilities is proposed. The metric is based on those Web Content Accessibility Guidelines (WCAG) checkpoints, an internationally accepted standard, that can be tested automatically using computer programs. Problems with current accessibility evaluation and the need for a good Web accessibility metric are discussed. The proposed metric is intended to overcome the deficiencies of the measurements currently used in Web accessibility studies, and it meets the requirements of a measurement for scientific research. Examples of large-scale Web accessibility evaluations using the metric are given. The examples cover a comparison of the Web accessibility of top medical journal Web sites and a longitudinal study of a Web site over time. The validity of the metric was tested using a large number of Web sites with different levels of compliance (rating categories) with the WCAG standard. The metric, which uses a predetermined simple weighting scheme, compares well to the more complex C5.0 machine learning algorithm in separating Web sites into different rating categories. This study explored why Web authors post the DVD decryption software known as DeCSS, and specifically whether authors post DeCSS to protest changes in copyright law. Data are drawn from a content analysis of Web sites posting the software. Most DeCSS posters did not include any content explaining why they posted DeCSS; however, no authors presented DeCSS as a piracy tool. Of the sites containing explanatory content, many argued that DeCSS is a legitimate tool for playing DVDs on free/open source computers. Other sites asserted that current copyright law is unjust, and that DVD-related corporations are engaging in undesirable behaviors. 
Based on the data, and theorizing from the rhetoric and collective action literatures, we assert that much DeCSS posting is protest, but it may not be copyright protest; numerous posters protest related issues such as freedom of speech. More research is needed to determine the significance of DeCSS posting to broader copyright policy debates, including its relation to off-line protest and the development of shared identities and cognitive frames. Also, the complexities of circumvention issues raise concerns about whether policy debate will be limited to elites. Finally, the data point to the need to understand both international and local laws, norms, and events when studying copyright protest activity. This study is based on earlier research by the author that employed social judgment analysis (SJA; J. Stefl-Mabry, 2001, 2003) to identify the information judgment preferences held by professional groups. This study explores the extent to which individuals, professional groups, and subgroups are self-aware of their judgment profiles. Three specialized groups of professionals - law enforcement, medicine, and education - were chosen to determine whether preference profiles cluster around professions or around demographic and other background variables. As the proliferation of data continues to increase, the need to understand users' media preference and selection decisions is of tremendous value to every industry, governmental agency, and institution of learning. In 1966, H. Menzel first raised concern about the reliability of users' ability to self-assess, and scientists continue to explore the issue of competency in human judgment. To understand the reliability of users' self-assessment regarding media preferences, this study examines the extent to which individuals and groups are self-aware of the empirical judgment profiles they employ in evaluating information source scenarios. 
This investigation explores the congruence of three groups of professionals' self-reported media preferences with their empirical judgment values, as defined by social judgment analysis. In this article, findings from a study on the diffusion and adoption of Encoded Archival Description (EAD) within the U.S. archival community are reported. Using E. M. Rogers' (1995) theory of the diffusion of innovations as a theoretical framework, the authors surveyed 399 archives and manuscript repositories that sent participants to EAD workshops from 1993 to 2002. Their findings indicated that EAD diffusion and adoption are complex phenomena. While the diffusion pattern mirrored that of MAchine-Readable Cataloging (MARC), overall adoption was slow. Only 42% of the survey respondents utilized EAD in their descriptive programs. Critical factors inhibiting adoption include the small staff size of many repositories, the lack of standardization in archival descriptive practices, a multiplicity of existing archival access tools, insufficient institutional infrastructure, and difficulty in maintaining expertise. The authors propose a heuristic method for Chinese automatic text segmentation based on a statistical approach. This method is developed from statistical information about the association among adjacent characters in Chinese text. The mutual information of bi-grams and a significance estimation of tri-grams are utilized. A heuristic method with six rules is then proposed to determine the segmentation points in a Chinese sentence. No dictionary is required by this method. Chinese text segmentation is important in Chinese text indexing and thus greatly affects the performance of Chinese information retrieval. Due to the lack of delimiters between words in Chinese text, Chinese text segmentation is more difficult than English text segmentation. 
In addition, segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many research studies dealing with the problem of word segmentation have focused on the resolution of segmentation ambiguities. The problem of unknown word identification has not drawn much attention. The experimental results show that the proposed heuristic method is promising for segmenting unknown words as well as known words. The authors further investigated the distribution of the errors of commission and the errors of omission caused by the proposed heuristic method and benchmarked it against a previously proposed technique, boundary detection. It was found that the heuristic method outperformed the boundary detection method. A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for very large collections, so we require an automated technique. Traditional readability measures, such as the widely used Flesch-Kincaid measure, are simple to apply but perform poorly on Web pages and other non-traditional documents. This work focuses on building a broadly applicable statistical model of text for different reading levels that works for a wide range of documents. To do this, we recast the well-studied problem of readability in terms of text categorization and use straightforward techniques from statistical language modeling. We show that with a modified form of text categorization, it is possible to build generally applicable classifiers with relatively little training data. 
We apply this method to the problem of classifying Web pages according to their reading difficulty level and show that by using a mixture model to interpolate evidence of a word's frequency across grades, it is possible to build a classifier that achieves an average root mean squared error of between one and two grade levels for 9 of 12 grades. Such classifiers have very efficient implementations and can be applied in many different scenarios. The models can be varied to focus on smaller or larger grade ranges or easily retrained for a variety of tasks or populations. Methods developed for mapping the journal structure contained in aggregated journal-journal citations in the Science Citation Index (SCI; Thomson ISI, 2002) are applied to the Chinese Science Citation Database of the Chinese Academy of Sciences. This database covered 991 journals in 2001, of which only 37 originally had English titles and only 31 were covered by the SCI. Using factor-analytical and graph-analytical techniques, the authors show that the journal relations are dually structured. The main structure is the intellectual organization of the journals in journal groups (as in the international SCI), but the university-based journals provide an institutional layer that orients this structure towards practical ends (e.g., agriculture). This mechanism of integration is further distinguished from the role of general science journals. The Chinese Science Citation Database thus exhibits the characteristics of "Mode 2" or transdisciplinary science in the production of scientific knowledge more than its Western counterpart does. The contexts of application lead to correlation among the components. We report quantitative and qualitative results of an empirical evaluation to determine whether automated assistance improves searching performance and when searchers desire system intervention in the search process. 
Forty participants interacted with two fully functional information retrieval systems in a counterbalanced, within-participant study. The systems were identical in all respects except that one offered automated assistance and the other did not. The study used a client-side automated assistance application, an approximately 500,000-document Text REtrieval Conference content collection, and six topics. Results indicate that automated assistance can improve searching performance. However, the improvement is less dramatic than one might expect, with an approximately 20% performance increase, as measured by the number of user-selected relevant documents. Concerning patterns of interaction, we identified 1,879 occurrences of searcher-system interactions and classified them into 9 major categories and 27 subcategories or states. Results indicate that there are predictable patterns of times when searchers desire and implement searching assistance. The most common three-state pattern is Execute Query, then View Results With Scrolling, then View Assistance. Searchers appear receptive to automated assistance; there is a 71% implementation rate. There does not seem to be a correlation between the use of assistance and previous searching performance. We discuss the implications for the design of information retrieval systems and future research directions. In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables. Two words are considered related if they have the same morphological or lexical basis. The heuristic reports an approximate match if common letters agree in order and noncommon letters represent valid affixes. The heuristic was tested by using four different alignment strategies: forward, backward, combined forward-backward, and combined backward-forward. 
Using the error rate and missing rate as performance indicators, the approach was successful in providing more than 80% correct matches. Within the conditions of the experiments performed, the results indicated that the combined forward-backward strategy seemed to exhibit the best performance. Most of the errors were caused by multiple-letter occurrences and by the presence of weak letters in cases in which the shared core consisted of one or two letters. Representation and retrieval obstacles within a digital library designed for use by middle school children are presented. Representation of objects is key to retrieval. Tools used to create representations for children's resources, such as controlled vocabularies, need to be more age appropriate. Development of age-appropriate controlled vocabularies requires us to learn more about the ways children interact with systems and form search strategies to represent their information needs. Children's search terms and questions are a rich resource for learning more about their information seeking process, their question state, and their formulation of searches. A method for gathering and using children's own search terms, and the benefits of their utilization in developing more age-appropriate controlled vocabularies, are discussed. Understanding users' attitudes and perceptions and their influence on behavior is crucial for predicting the use of community information and communication technology. In this study, the authors attempt to uncover this process by elaborating on I. Ajzen's (1985, 1991) theory of planned behavior (TPB), a widely applied social behavior model. The developed structural equation model (SEM) was tested using a sample of 417 users of a community network. 
The final selected model, which was called the community network use model, included seven predictors of use: three behavioral beliefs (i.e., Learning, Social Interactions, and Community Connection), Normative Beliefs (Individuals), Attitude, Subjective Norm, and Intention. In particular, Intention moderated the relationships between Use and the other six variables. Further, beliefs in both Community Connection and Social Interactions were directly related to community network use, as was Attitude. Belief in Community Connection was indirectly and positively related to community network use via Intention; Belief in Community Connection was directly and negatively related to Use. These latter two findings suggest that Belief in Community Connection serves as both a facilitator and an inhibitor of community network use, depending on whether belief is followed by intention. Implications are discussed. The authors present a study of the real-life information needs of 59 McGill University undergraduates researching essay topics for either a history or psychology course, interviewed just after they had selected their essay topic. The interview's purpose was to transform the undergraduate's query from general topic terms, based on vague conceptions of their essay topic, to an information need-based query. To chart the transformation, the authors investigate N. J. Belkin, R. N. Oddy, and H. M. Brooks' Anomalous States of Knowledge (ASK) hypothesis (1982a, 1982b), which links the user's ASK to a relevant document set via a common code based on structural facets. In the present study an interoperable structural code based on eight essay styles is created; notions of structural facets compatible with a high-impact essay structure are then presented. 
The important findings of the study are: (a) the undergraduates' topic statements and the terms derived from them do not constitute an effective information need statement, because for most of the subjects in the study the topic terms conformed to a low-impact essay style; and (b) essay style is an effective interoperable structural code for charting the evolution of the undergraduate's knowledge state from ASK to partial resolution of the ASK in an information need statement. Recent studies suggest that the wide variability in type, detail, and reliability of online information motivates expert searchers to develop procedural search knowledge. In contrast to prior research that has focused on finding relevant sources, procedural search knowledge focuses on how to order multiple relevant sources with the goal of retrieving comprehensive information. Because such procedural search knowledge is neither spontaneously inferred from the results of search engines nor from the categories provided by domain-specific portals, the lack of such knowledge leads most novice searchers to retrieve incomplete information. In domains like healthcare, such incomplete information can lead to dangerous consequences. To address this problem, a new kind of domain portal called a Strategy Hub was developed and tested. Strategy Hubs provide critical search procedures and associated high-quality links to enable users to find comprehensive and accurate information. We begin by describing how we collaborated with physicians to systematically identify generalizable search procedures to find comprehensive information about a disease, and how these search procedures were made available through the Strategy Hub. A controlled experiment suggests that this approach can improve the ability of novice searchers to find comprehensive and accurate information, when compared to general-purpose search engines and domain-specific portals. 
We conclude with insights on how to refine and automate the Strategy Hub design, with the ultimate goal of helping users find more comprehensive information when searching in unfamiliar domains. For millennia humans have sought, organized, and used information as they learned and evolved patterns of information behavior to resolve problems and survive. However, despite the current focus on living in an "information age," we have a limited evolutionary understanding of human information behavior. In this article the authors examine the three current interdisciplinary approaches to conceptualizing how humans have sought information: (a) the everyday life information seeking-sense-making approach, (b) the information foraging approach, and (c) the problem-solution perspective on information seeking. In addition, because of the lack of clarity regarding the role of information use in information behavior, a fourth approach is provided based on a theory of information use. The proposed use theory starts from the evolutionary psychology notion that humans are able to adapt to their environment and survive because of their modular cognitive architecture. Finally, the authors begin the process of conceptualizing these diverse approaches, and their various aspects or elements, within an integrated model with consideration of information use. An initial integrated model of these different approaches with information use is proposed. Knowledge management as a field continues to receive resounding interest from scholars. While we have made progress in many areas of knowledge management, we have yet to understand what factors contribute to employee usage of knowledge artifacts. A field study of 175 employees in a software engineering organization was conducted to understand the factors that govern consumption of explicit knowledge. 
We assert that the decision to consume knowledge can be framed as a problem of risk evaluation. Specifically, there are two sources of risk a consumer must evaluate prior to knowledge consumption: risk from the knowledge producer and risk from the knowledge product. We find support for the factors of perceived complexity, perceived relative advantage, and perceived risk as they relate to intentions to consume knowledge. In recent years there has been an explosion of biological data stored in large central databases, tools to handle the data, and educational programs to train scientists in using bioinformatics resources. Still, the diffusion of bioinformatics within the biological community has yet to be extensively studied. In this study, the diffusion of two bioinformatics-related practices (using genomic databases and analyzing DNA and protein sequences) was investigated by analyzing MEDLINE records of 12 journals representing various fields of biology. The diffusion of these practices between 1970 and 2003 follows an S-shaped curve typical of many innovations, beginning with slow growth, followed by a period of rapid linear growth, and finally reaching saturation. Similar diffusion patterns were found for both the use of genomic databases and biological sequence analysis, indicating the strong relationship between these practices. This study documents the surge in the use of genomic databases and the analysis of biological sequences and proposes that these practices are now fully diffused within the biological community. Extrapolating from these results, it suggests that a diffusion of innovations approach may be useful for researchers as well as for providers of bioinformatics applications and support services. The author examines patterns of productivity in Internet mailing lists, also known as discussion lists or discussion groups. 
Datasets have been collected from the electronic archives of two Internet mailing lists, the LINGUIST and the History of the English Language. Theoretical models widely used in informetric research have been applied to fit the distribution of posted messages over the population of authors. The generalized inverse Gaussian-Poisson and Poisson-lognormal distributions show excellent results in both datasets, while the Lotka and Yule-Simon distributions demonstrate poor-to-mediocre fits. In the mailing list where moderation and quality control are enforced to a higher degree, i.e., the LINGUIST, the Lotka and Yule-Simon distributions perform better. The findings can be plausibly explained by the lesser applicability of the success-breeds-success model to information production in electronic communication media, such as Internet mailing lists, where selectivity of publications is marginal or nonexistent. This hypothesis is preliminary and needs to be validated against a larger variety of datasets. Characteristics of the quality control, competitiveness, and reward structure in Internet mailing lists as compared to professional scholarly journals are discussed. Link analysis in various forms is now an established technique in many different subjects, reflecting the perceived importance of links and of the Web. A critical but very difficult issue is how to interpret the results of social science link analyses. It is argued that the dynamic nature of the Web, its lack of quality control, and the online proliferation of copying and imitation mean that methodologies operating within a highly positivist, quantitative framework are ineffective. Conversely, the sheer variety of the Web makes the application of qualitative methodologies and pure reason very problematic for large-scale studies. 
Methodology triangulation is consequently advocated, in combination with a warning that the Web is incapable of giving definitive answers to large-scale link analysis research questions concerning social factors underlying link creation. Finally, it is claimed that although theoretical frameworks are appropriate for guiding research, a Theory of Link Analysis is not possible. The concept of relevance has received a great deal of theoretical attention. Separately, the relationship between focused search and browsing has also received extensive theoretical attention. This article aims to integrate these two literatures with a model and an empirical study that relate relevance in focused searching to relevance in browsing. Some factors affect both kinds of relevance in the same direction; others affect them in different ways. In our empirical study, we find that the latter factors dominate, so that there is actually a negative correlation between the probability of a document's relevance to a browsing user and its probability of relevance to a focused searcher. Information technology designers and users are generally treated as interacting yet distinct groups. Although approaches such as participatory design attempt to bring these groups together, such efforts are viewed as temporary and restricted to a specific knowledge domain where users can share key information and insights with designers. The author explores case studies that point to a different situation, role hybridization. Role hybridization focuses on the ability of individuals to shift from one knowledge domain to another, thus allowing for simultaneous membership within two otherwise distinct social worlds. While some studies focus on the ability of designers to act as users, this study focuses on the opposite situation, users who become designers. 
Interview and participant observation data are used to explore hybrid user-designers in two case studies: frog dissection simulations used in K-12 biology education and human anatomy simulations used in medical education. Hybrid users as designers are one part of a larger design-use interface, illustrating the mutually constructive relationship between the activities of information technology design and use. Users as designers also challenge the traditional power relationship between designers and users, leading to a novel and exciting form of user-centered design. An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning, which is correlated with the UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity transport has two meanings: "Biological Transport," assigned the ST Cell Function, and "Patient transport," assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing transport and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. 
We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873, compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve the performance of JDI and test its use in applications. This article explores the associations that message features and Web structural features have with perceptions of Web site credibility. In a within-subjects experiment, 84 participants actively located health-related Web sites on the basis of two tasks that differed in task specificity and complexity. Web sites that were deemed most credible were content analyzed for message features and structural features that have been found to be associated with perceptions of source credibility. Regression analyses indicated that message features predicted perceived Web site credibility for both searches when controlling for Internet experience and issue involvement. Advertisements and structural features had no significant effects on perceived Web site credibility. Institution-affiliated domain names (.gov, .org, .edu) predicted Web site credibility, but only in the general search, which was more difficult. Implications of the results are discussed in terms of online credibility research and Web site design. The author applies the cognitive work analysis (CWA) approach to investigate human-work interaction in a corporate setting. This study reports the analysis of data collected from a Web survey, diaries, and telephone interviews. 
The results characterize the actors and the work domain, and identify three dimensions for each of the four interactive activities involved in human-work interaction, along with their relationships. An enhanced model and its implications for the development of a corporate digital library are discussed. This article addresses the invisible college concept with the intent of developing a consensus regarding its definition. Emphasis is placed on the term as it was defined and used in Derek de Solla Price's work (1963, 1986), and it is reviewed on the basis of its thematic progress in past research over the years. Special attention is given to Lievrouw's (1990) article concerning the structure versus social process problem to show that both conditions are essential to the invisible college and may be reconciled. A new definition of the invisible college is also introduced, including a proposed research model. With this model, researchers are encouraged to study the invisible college by focusing on three critical components: the subject specialty, the scientists as social actors, and the information use environment (IUE). In this study scientists were asked about their own publication history and their citation counts. The study shows that the citation counts of the publications correspond reasonably well with the authors' own assessments of scientific contribution. Generally, citations proved to have the highest accuracy in identifying either major or minor contributions. Nevertheless, according to these judgments, citations are not a reliable indicator of scientific contribution at the level of the individual article. In the construction of relative citation indicators, the average citation rate of the subfield appears to be slightly more appropriate as a reference standard than the journal citation rate. The study confirms that review articles are cited more frequently than other publication types. 
Compared to the significance authors attach to these articles, they appear to be considerably "overcited." However, there were only marginal differences in the citation rates between empirical, methods, and theoretical contributions. With the increasing globalization of the world economy, there has been growing interest in the potential contributions of good governance to accelerating the rate of economic and social development in developing countries and to enhancing their smooth integration into the emerging global economy. Simultaneously, developed economies are experiencing an increasing proportion in the contribution of the knowledge, information, and telecommunication sectors to their overall gross domestic product. This is placing increased focus on the role and contribution of national information infrastructure to economic productivity and, by extension, to the economic and social development of nations. In this article, the authors explore the link between information and communication technologies (ICTs), governance, and socioeconomic development in developing countries. They empirically conclude that the contributions of ICTs to socioeconomic development are influenced by sociopolitical governance, leading to national development through more prudent and egalitarian application and use of the ICT portfolio. The contingent role of governance on ICTs and national development paves the path towards recognizing the importance of sociopolitical moderators. The authors propose a bibliometric model for discarding journal volumes at academic libraries, i.e., removing them to offsite storage as part of the library's serials collection. The method is based on the volume as the unit of measurement and on user satisfaction with given titles. 
The discarding age, calculated for each volume from the year of publication to the year of the decision to discard, depends on citation half-life, relative productivity, knowledge area, and residual utility (potential consultations). The model makes it possible to predict the approximate size of a collection when a stationary state is reached in which the inflow of journal volumes is equal to the outflow from discarding. The model is also able to determine the rate of growth of the holdings. This information can be used to optimize future use of available space and of economic and maintenance resources, thus promoting efficient management of the collection. Traditional text-based document classifiers tend to perform poorly on the Web. Text in Web documents is usually noisy and often does not contain enough information to determine the documents' topics. However, the Web provides a different source that can be useful for document classification: its hyperlink structure. In this work, the authors evaluate how the link structure of the Web can be used to determine a measure of similarity appropriate for document classification. They experiment with five different similarity measures and determine their adequacy for predicting the topic of a Web page. Tests performed on a Web directory show that link information alone allows documents to be classified with an average precision of 86%. Further, when combined with a traditional text-based classifier, precision increases to values of up to 90%, representing gains that range from 63 to 132% over the use of text-based classification alone. Because the measures proposed in this article are straightforward to compute, they provide a practical and effective solution for Web classification and related information retrieval tasks. Further, the authors provide an important set of guidelines on how link structure can be used effectively to classify Web documents. 
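The link-structure idea in the classification study above can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity used here is a simple co-citation (shared in-link) overlap, and the nearest-labeled-neighbor rule, page names, and data are all invented for illustration.

```python
# Illustrative sketch: predict a Web page's topic from the topics of its most
# link-similar labeled neighbors. Similarity here is co-citation overlap
# (Jaccard index over the sets of pages that link to each page).

def cocitation_similarity(inlinks_a, inlinks_b):
    """Jaccard overlap of the sets of pages linking to a and to b."""
    union = inlinks_a | inlinks_b
    return len(inlinks_a & inlinks_b) / len(union) if union else 0.0

def classify_by_links(page, inlinks, labels):
    """Predict a topic for `page` from the labels of the most similar page.
    `inlinks` maps page -> set of pages linking to it; `labels` maps
    labeled pages -> topic."""
    best_topic, best_score = None, 0.0
    for other, topic in labels.items():
        score = cocitation_similarity(inlinks[page], inlinks[other])
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic

inlinks = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},   # shares most in-links with "a"
    "c": {"q"},
}
labels = {"b": "sports", "c": "finance"}
print(classify_by_links("a", inlinks, labels))  # -> sports
```

In practice such a link-based score would be combined with a text-based classifier, which is the combination the study reports as yielding the largest precision gains.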
In this article the authors look at the prescriptions advocated by Web search textbooks in the light of a selection of empirical data on real Web information search processes. They use the strategy of disjointed incrementalism, a theoretical foundation from decision making that focuses on how people face complex problems, and claim that such problem solving can be compared to the tasks searchers perform when interacting with the Web. The findings suggest that textbooks on Web searching should take into account that searchers tend to take only a limited number of sources into consideration, that searchers adjust their goals and objectives during searching, and that searchers reconsider the usefulness of sources at different stages of their work tasks as well as their search tasks. This study proposes and explores a natural language processing (NLP)-based strategy to address out-of-dictionary and vocabulary mismatch problems in query translation-based English-Chinese Cross-Language Information Retrieval (EC-CLIR). The strategy, named the LKB approach, is to construct a lexical knowledge base (LKB) and to use it for query translation. In this article, the author describes the LKB construction process, which customizes available translation resources based on the document collection of the EC-CLIR system. The evaluation shows that the LKB approach is very promising. It consistently increased the percentage of correct translations and decreased the percentage of missing translations, in addition to effectively detecting the vocabulary gap between the document collection and the translation resource of the system. The comparative analysis of the top EC-CLIR results using the LKB and two other translation resources demonstrates that the LKB approach produced significant improvement in EC-CLIR performance compared to performance using the original translation resource without customization. 
It has also achieved the same level of performance as a sophisticated machine translation system. The study concludes that the LKB approach has the potential to be an empirical model for developing real-world CLIR systems. Linguistic knowledge and NLP techniques, if appropriately used, can improve the effectiveness of English-Chinese cross-language information retrieval. Ironically, although much work has been done on elucidating algorithms that enable scientists to efficiently retrieve relevant information from the glut of data derived from the Human Genome Project and other similar projects, little has been done to optimize the levels of data economy across databases. One technique for quantifying the degree of data economization is that constructed by Boisot. Boisot's Information Space (I-Space) takes into account the degree to which data are written (codification), the degree to which the data can be understood (abstraction), and the degree to which the data are effectively communicated to an audience (diffusion). A data system is said to be more data economical if it is relatively high in these dimensions. Application of the approach to entries in two popular, publicly available biological data repositories, the Protein DataBank (PDB) and GenBank, leads to the recommendation that PDB increase its level of abstraction by establishing a larger set of detailed keywords, its diffusion by constructing hyperlinks to other databases, and its codification by constructing additional subsections. With these recommendations in place, PDB would achieve the greater data economies currently enjoyed by GenBank. A discussion of the limitations of the approach is presented. Measuring the relatedness between bibliometric units (journals, documents, authors, or words) is a central task in bibliometric analysis. 
Relatedness measures are used for many different tasks, among them the generation of maps, or visual pictures, showing the relationships among all items in the data. Despite the importance of these tasks, little has been written on how to quantitatively evaluate the accuracy of relatedness measures or of the resulting maps. The authors propose a new framework for assessing the performance of relatedness measures and visualization algorithms that contains four factors: accuracy, coverage, scalability, and robustness. This method was applied to 10 measures of journal-journal relatedness to determine the best measure. The 10 relatedness measures were then used as inputs to a visualization algorithm to create an additional 10 measures of journal-journal relatedness based on the distances between pairs of journals in two-dimensional space. This second step determines robustness (i.e., which measure remains best after dimension reduction). Results show that, for low coverage (under 50%), the Pearson correlation is the most accurate raw relatedness measure. However, the best overall measure, both at high coverage and after dimension reduction, is the cosine index or a modified cosine index. Results also showed that the visualization algorithm increased local accuracy for most measures. Possible reasons for this counterintuitive finding are discussed. Recent studies have found organizational learning capacity to be a key factor influencing organizational assimilation and exploitation of knowledge-intensive innovations. Despite its increasing importance, the impact of organizational learning capacity on technology assimilation is not well understood. 
Distilling from extant works on organizational learning and technology assimilation, this study identifies four components of organizational learning capacity, namely, systems orientation, organizational climate for learning orientation, knowledge acquisition and utilization orientation, and information sharing and dissemination orientation. The authors subject these components to structural equation modeling analyses to better understand their structure and dimensionality. The analyses strongly support the proposed four major dimensions underlying organizational learning capacity. Organizational learning capacity, as a higher-order factor, has a significant impact on attitude towards organizational adoption of knowledge-intensive innovations. Implications for practice and research are discussed. The structural, functional, and production views on learning objects influence metadata structure and vocabulary. The authors drew on these views and conducted a literature review and in-depth analysis of 14 learning objects and over 500 components in these learning objects to model the knowledge framework for a learning object ontology. The learning object ontology reported in this article consists of 8 top-level classes, 28 classes at the second level, and 34 at the third level. Except for the class Learning object, all other classes have three properties: preferred term, related term, and synonym. To validate the ontology, we conducted a query log analysis that focused on discovering what terms users have used at both conceptual and word levels. The findings show that the main classes in the ontology are either conceptually or linguistically similar to the top terms in the query log data. The authors built an "Exercise Editor" as an informal experiment to test whether the ontology could be adopted in authoring tools. The main contributions of this project are the framework for the learning object domain and the methodology used to develop and validate the ontology. 
Large scale bibliometric analysis is often hindered by the presence of homonyms, or namesakes, of the researchers of interest in literature databases. This makes it difficult to build up a true picture of a researcher's publication record, as publications by another researcher with the same name will be included in search results. Using additional information such as title and author addresses, an expert in the field can generally tell whether a paper is by a researcher or a namesake; however, manual checking is not practical in large scale studies. Previously, various methods have been used to address this problem, chiefly based on filtering by subject, funding acknowledgement or author address. Co-author inclusion is a novel algorithmic method, based on co-authorship, for dealing with homonyms in large bibliometric surveys. We compared co-author inclusion and subject- and funding-based filtering against the manual assignment of papers by a subject expert (which we assumed to be correct). Subject- and funding-based filtering identifies only 75% as many papers as assigned by manual scoring. By using co-author inclusion once we increase this to 95%; two further rounds produce 99% as many papers as manual filtering. Although the number of identified papers that were not assigned to the PIs manually also increases, the absolute number is low: rising from 0.2% of papers with subject and funding filtering to 3% of papers for three rounds of co-author inclusion. This paper examines the extent of the 'home advantage' effect in the USPTO and the EPO patent data and in the OECD triadic patent families. By comparing a set of internationalisation indicators for a sample of European, US and Japanese MNEs, it finds that, contrary to what is often assumed, this effect is present not only in the USPTO but also in the EPO. OECD triadic patent data, instead, are not biased towards any particular home country. 
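The co-author inclusion procedure described above can be sketched as an iterative expansion: papers confidently assigned to the researcher contribute co-author names, and those names then admit further candidate papers in later rounds. The following Python sketch uses invented paper records and assumes the ambiguous target name itself is excluded from matching (otherwise every namesake paper would match trivially); it is an illustration of the idea, not the authors' implementation.

```python
def coauthor_inclusion(target, seed_papers, candidate_papers, rounds=3):
    """Iteratively admit candidate papers that share a co-author with
    papers already attributed to the researcher (hypothetical sketch)."""
    accepted = list(seed_papers)
    for _ in range(rounds):
        # Co-authors seen so far, excluding the ambiguous target name.
        known = {a for p in accepted for a in p["authors"] if a != target}
        added = [p for p in candidate_papers
                 if p not in accepted and known & set(p["authors"])]
        if not added:
            break
        accepted.extend(added)
    return accepted

seed = [{"authors": ("Smith J", "Lee K")}]
pool = [
    {"authors": ("Smith J", "Lee K", "Wu P")},   # shares Lee K -> round 1
    {"authors": ("Smith J", "Wu P")},            # shares Wu P -> round 2
    {"authors": ("Smith J", "Jones T")},         # namesake, never linked
]
result = coauthor_inclusion("Smith J", seed, pool)
print(len(result))  # 3: the seed plus the two transitively linked papers
```

Each round can only grow the accepted set, which mirrors the reported recall figures rising from one round (95%) to three rounds (99%).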
It also finds that, because MNEs do not systematically file their patents with the EPO, the USPTO and the JPO, the OECD triadic patent family dataset excludes many patents, especially those invented in the US and accounted for in the USPTO, though it is mainly only low-value patents that are excluded. Thus OECD triadic patents can be considered a satisfactory alternative to the USPTO and the EPO for measuring R&D internationalisation. Scientometrics is an application of quantitative methods to the history of science. It is also one of the techniques for documenting and collecting the works of eminent scientists and researchers. In this paper, we present a concise sketch of Prof. Peter John Wyllie, stressing his scientific achievements. His research has had a great impact in the fields dealing with terrestrial magmatic phenomena and geology. This paper introduces a citation-based methodology to characterize and measure the magnitude and intensity of knowledge flows and knowledge spillovers from the public research sector to basic and strategic research in the private sector. We present results derived from an interrelated series of statistical analyses based on Private-to-Public Citations (PrPuCs) within reference lists of the research articles produced by industrial researchers during the years 1996-2003. The first part of the results provides an overview of PrPuC statistics worldwide for OECD countries. Overall, 70% to 80% of the references within corporate research papers relate to papers produced by public research organizations. When controlling for the size of their public sector research bases, Switzerland and the United States appear to be the major suppliers of 'citable' scientific knowledge for industrial research - the value of their Corporate Citation Intensity (CCI) exceeds their statistically expected value by more than 25%. A country's CCI performance turns out to be closely related to the citation impact of the entire domestic science base. 
The second section deals with an exploratory case study devoted to Electrical Engineering and Telecommunications, one of the corporate sector's major research areas. The findings include a list of the major citing and cited sources at the level of countries and organizations, as well as an analysis of PrPuCs as a "missing link" connecting intra-science citations and citations received from corporate science-based patents. In earlier studies by the authors, basic regularities of author self-citations have been analysed. These regularities relate to ageing, to the relation between self-citations and foreign citations, to the interdependence of self-citations with other bibliometric indicators, and to the influence of co-authorship on self-citation behaviour. Although both national and subject-specific peculiarities influence the share of self-citations at the macro level, the authors came to the conclusion that - at this level of aggregation - there is practically no need to exclude self-citations. The aim of the present study is to answer the question of how far the influence of author self-citations on bibliometric meso-indicators deviates from that at the macro level, and to what extent national reference standards can be used in bibliometric meso analyses. In order to study the situation at the institutional level, a selection of twelve European universities representing different countries and different research profiles has been made. The results show a quite complex situation at the meso level; we therefore suggest using both indicators, including and excluding self-citations. This paper addresses research performance monitoring of the social sciences and the humanities using citation analysis. Main differences in publication and citation behavior between the (basic) sciences and the social sciences and humanities are outlined. Limitations of the (S)SCI and A&HCI for monitoring research performance are considered. 
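A common operational definition of an author self-citation, used in studies like the one above, is a citation in which the citing and cited papers share at least one author. Computing the self-citation share under that definition is straightforward; the author lists below are invented for illustration.

```python
def self_citation_share(citations):
    """Fraction of citations in which the citing and cited papers
    share at least one author (author self-citation)."""
    if not citations:
        return 0.0
    selfs = sum(1 for citing, cited in citations
                if set(citing) & set(cited))
    return selfs / len(citations)

# Each pair: (authors of the citing paper, authors of the cited paper).
cites = [
    (("Author A", "Author B"), ("Author A",)),    # self-citation
    (("Author A", "Author B"), ("Author C",)),    # foreign citation
    (("Author D",), ("Author D", "Author E")),    # self-citation
    (("Author C",), ("Author F",)),               # foreign citation
]
print(self_citation_share(cites))  # 0.5
```

Indicator variants "including" and "excluding" self-citations, as suggested in the study, then differ simply in whether the numerator of a citation-impact measure counts the pairs flagged by this test.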
For research performance monitoring in many social sciences and humanities, the methods used in science need to be extended. A broader range of both publications (including non-ISI journals and monographs) and citation indicators (including non-ISI reference citation values) is needed. Three options for bibliometric monitoring are discussed. This study presents a general view of the scientific and technological production in the ICT sector in Spain during the period 1990-2002 and its relative weight in international production, as well as the identification of the main institutional actors and the performance patterns of the researchers in this scientific community through bibliometric techniques, with the aim of exploring the character of its outputs, both in terms of publications and patents. Indicators at the macro-meso level are presented by geographic region, thematic area at different aggregation levels, institutional sector and research centre. Bibliometric indicators may help focus attention on the position and contribution of Spanish ICT science and technological capabilities. This article analyses changes in the development of social sciences and humanities journals in Ukraine and presents a comparative analysis of these journals against the journals worldwide included in the relevant databases of the US Institute for Scientific Information (Philadelphia). The paper discusses an application of bibliometric techniques in the social sciences. While the interest of policy makers is growing, the topic is getting more and more attention from bibliometricians. However, much effort is put into developing tools to measure scientific output and impact outside the world of the Social Sciences Citation Index, while the use of the SSCI for bibliometric applications remains shrouded in obscurity and myth. 
This study attempts to address some of the objections raised against the application of the SSCI for evaluation purposes. The study will cover topics like the existing publication and citation culture within the social sciences, the effect of variable citation windows, and the (geographical) origin of citation flows. This paper analyzes Internet diffusion among various organizations, based on daily observation of second-level domain name registrations under the ".it" ccTLD. In particular, we analyzed domain names registered by organizations in the non-profit sector. The penetration rate, calculated according to the number of organizations, was computed for various widely separated geographic levels (regions). A concentration analysis was performed in order to determine whether the geographical distribution of Internet use in Italy is less concentrated with respect to both the number of existing institutions and income distribution, suggesting a diffusive effect. Regression analysis was performed using demographic, social, economic and infrastructure indicators. Results show that a "social digital divide" exists, both in terms of geographical distribution (i.e., in macro-areas - Northern, Central, and Southern Italy - and at the regional level) and in terms of the legal status of the organizations, and that this digital divide will probably decrease in the future. We present a new kind of statistical analysis of science and technical information (STI) in the Web context. We propose a battery of indicators about Web users, the bibliographic records they use, and e-commerce transactions. In addition, we introduce two Web usage factors and give an overview of co-usage analysis. For these tasks, we present a computer-based system, called Miri@d, which produces descriptive statistical information about Web users' searching behaviour and about what is effectively used from a free-access digital bibliographical database. 
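One standard tool for the kind of concentration analysis described above is the Gini coefficient computed over regional counts: 0 means registrations are spread evenly across regions, values near 1 mean they are concentrated in a few. The formula below is the usual rank-weighted one; the regional figures are invented for illustration, not taken from the study.

```python
def gini(values):
    """Gini concentration coefficient of a non-negative distribution."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Rank-weighted sum formulation of the Gini coefficient.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical domain registrations per region.
equal = [100, 100, 100, 100]
skewed = [400, 50, 30, 20]
print(round(gini(equal), 3))   # 0.0 -> perfectly even diffusion
print(round(gini(skewed), 3))  # high -> concentrated, a digital divide
```

Comparing such a coefficient for registrations against the same coefficient for institutions or income is one way to judge whether Internet use is more or less concentrated than the underlying resources.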
This study demonstrates that the choice of search strategy for article identification has an impact on evaluation and policy analysis of research areas. We have assessed the scientific production in two areas at one research institution during a ten-year period. We explore the recall and precision of three article identification strategies: journal classifications, keywords and authors. Our results show that the different search strategies have varying recall (0.38-1.00) and precision (0.50-1.00). In conclusion, uncritical analysis based on rudimentary article identification strategies may lead to misinterpretation of the development of research areas, and thus provide incorrect data for decision-making. A comparative analysis of the scientific performance of male and female scientists in the area of Materials Science at the Spanish Council for Scientific Research (CSIC) is presented. Publications of 333 scientists during 1996-2000 are downloaded from the international database Science Citation Index and the national database ICYT. Scientific performance is studied through different indicators of productivity (number of SCI and ICYT publications), international visibility (average impact factor of publications, percentage of documents in "top journals") and publication practices (percentage of international publications, signing order of authors in the documents, and different collaboration measures). Inter-gender differences in the research performance of scientists are studied, and the influence of professional category and age is analysed. Although women are less productive than men, no significant differences in productivity are found within each professional category. However, a different life-cycle of productivity is found for men and women, and the most important inter-gender differences in productivity occur at the ages of 40-59. We estimate the determinants of university patents by route in Spain. 
National patents are an indicator of R&D efforts when we focus on the region, but not of how regions organize their university or joint research structure. International patents are a stronger indicator of R&D efforts, so they express confidence in the potential of the patent. Neither set is an indicator of proximity to the region's competencies in technologies other than for production-intensive sectors, so they will not always foster regional technology transfer. Since the driving forces of national and international patents differ, the use of both is recommended. This paper investigates the research collaboration of Korean physicists (1977-2000) using the co-authorship method. After discussing the co-authorship method, this paper suggests the necessity of taking the contexts of research collaboration into consideration, especially for scientifically 'peripheral' countries. The analysis shows an interesting finding with respect to international collaboration: the proportion of internationally collaborated papers did not increase substantially during the last two decades, which contradicts the results of other studies. An analysis of the number of participating countries shows that multilateral collaboration has increased considerably in the last 20 years, though the proportion of international research collaboration remains stable. The results of this study give indirect support to the transformation of research collaboration from a rather 'asymmetrical' to a 'symmetrical' one. The main objective of this paper is to observe to what extent research priorities set in R&D policy strategy documents are supported by publication and citation data derived from ISI databases. As supporting background information, the results of a questionnaire sent to the Committee of Senior Officials of the Co-operation in the field of Scientific and Technical Research are used. The relationship between physicians' research activity and professional performance was examined. 
We have taken into account their motivation, job satisfaction and the way that research groups are organized. Previous studies have shown that physicians' research activity has a positive effect on clinical practice. They also show that motivation and job satisfaction are positively related with research activity. A group of 278 physicians from hospitals in Madrid were surveyed. A structural model was estimated using SEM techniques. The results indicate that both job satisfaction (0.620) and motivation (0.236) are important factors. These results demonstrate the strategic value of both human resources policies and research management practices within hospitals. Visualization with the BibTechMon algorithm provides the out-degree as well as the in-degree. The analysis shows that both the frequency and the co-occurrences of objects (nodes in the network) support the idea behind Kleinberg's algorithm. The analysis of the algorithm clearly shows that strongly linked nodes lead the iteration to convergence and receive the highest weights. BibTechMon therefore visualizes the results well. In this paper we analyze the (historical) co-evolution of technological development and economic progress (by relating public and private R&D investment, patenting, and corporate profitability). We relate to the work of Schmookler (1966), Griliches (1990), Pakes & Griliches (1980) and Pakes (1986), who all studied the techno-economic interplay by considering patents as an indicator of technological performance. We use United States industry and government data over the period 1953-1998 (45 years). Co-evolution analysis over this period reveals a strong interdependency among the variables. Patent evolution is strongly related to the development of private R&D and corporate profitability; the levels of public and private R&D expenditure in combination with the level of technological output (i.e. patents) have strong predictive and explanatory power towards corporate profitability (R-2 value of 94.9%). 
Causality tests reveal a joint determination between R&D investment and corporate profitability (L=2; p < 0.01). Today's theories and models on innovation stress the importance of scientific capabilities and science-technology proximity, especially in new emerging fields of economic activity. In this contribution we examine the relationship between national scientific capabilities, the science intensity of technology and technological performance within six emergent industrial fields. Our findings reveal that national technological performance is positively associated with scientific capabilities. Countries performing better on a technological level are characterized both by larger numbers of publications and by numbers of involved institutions that exceed average expected values. The latter observation holds for both companies and knowledge generating institutes actively involved in scientific activities. As such, our findings seem to suggest beneficial effects of scientific capabilities shouldered by a multitude of organizations. In addition, higher levels of patent activity coincide with higher levels of science intensity, pointing to the relevance of science 'proximity' when developing technology in newer, emerging fields. Limitations and directions for further research are discussed. Comparing the science growth of different countries is of both theoretical and pragmatic interest. Using methods for the analysis of complex growth processes introduced by H. E. Stanley and others, we exhibit quantitative features of Chinese science growth from 1986 to 1999 and compare them with corresponding features of western countries. Patterns of growth dynamics of Chinese universities' publication output do not differ significantly from those found in the case of western countries. The same holds for Chinese journals when compared to international journals. 
In nearly all cases the size distribution of output over universities or journals is near to a lognormal one, the growth rate distribution is Laplace-like, and the standard deviations of the corresponding conditional distributions with regard to size decay according to a power law. This means that regarding some structural-dynamical properties China's recent science system cannot be distinguished from a western one - despite different prehistory and different political and economic environment. This paper reports the first results of the extension of citation analysis to 'non-source' items, which is one strand of an extensive study of quantitative performance indicators used in the assessment of research. It would be presumptuous to draw firm conclusions from this first foray into the realm of non-source citations, however our analysis is based on an extensive experimental database of over 30,000 publications, so the results can be viewed as strong pointers to possible generalised outcomes. We show that it is possible to mine ISI databases for references to a comprehensive oeuvre of items from whole institutions. Many types of publications are visible in the ISI data - books, book chapters, journals not indexed by ISI, and some conference publications. When applied to the assessment of university departments, they can have a significant effect on rankings, though this does not follow in all cases. The investment of time, effort, and money in a significantly extended analysis will not be equally beneficial in all fields. However, a considerable amount of testing is required to confirm our initial results. The so-called biotechnology revolution has changed the institutional and knowledge environment of the pharmaceutical industry. The industry incumbents have faced the challenge of adjusting to the new conditions for innovation in drug discovery and development. 
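The Stanley-type growth analysis described above rests on logarithmic annual growth rates and their Laplace-like distribution. A minimal Python sketch, using an invented publication-count series and the standard maximum-likelihood fit for a Laplace distribution (location = median, scale = mean absolute deviation from the median); this illustrates the quantities involved, not the authors' full procedure.

```python
import math

def growth_rates(series):
    """Logarithmic annual growth rates r_t = log(S_{t+1} / S_t)."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]

def laplace_fit(rates):
    """MLE for a Laplace distribution: location is the median,
    scale is the mean absolute deviation from the median."""
    xs = sorted(rates)
    n = len(xs)
    med = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    scale = sum(abs(x - med) for x in xs) / n
    return med, scale

# Hypothetical annual publication counts of one university.
output = [120, 132, 150, 141, 160, 185, 190]
rates = growth_rates(output)
mu, b = laplace_fit(rates)
```

Repeating the fit on size-binned subsets of universities or journals, and regressing the fitted scale against size on log-log axes, would expose the power-law decay of the conditional standard deviations mentioned above.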
Drawing on the theoretical framework of the organizational capabilities of the firm, this contribution aims at capturing the changes in the knowledge environment and exploring the adjustment of 4 German corporations (2 companies rooted in the coal tar dyestuff industry and 2 traditional pharmaceutical companies) to the advent of modern biotechnology. Despite the firm-specific capabilities in organic chemical synthesis, the representatives of the coal tar dyestuff industry seem to have been better able to adjust to the external discontinuity in their knowledge environment. The existence of research and development activities, the science-based research tradition together with interactions to access the extramural knowledge base of the firms seem to have been crucial in the perception and adoption of the new technological possibilities of biotechnology after the 1970s, rather than prior competence in biotechnology or the employees with the skills to develop the capabilities to exploit it. We present a new bibliometric approach to identify research groups in a particular research field. With a combination of bibliometric mapping techniques and network analysis we identify and classify clusters of authors to represent research groups. In this paper we illustrate the application and potential of this approach and present two types of outcomes: actual research groups and potential research groups. The former enables us to define research groups beyond the organizational structure. The latter may be used to identify potential partners for collaboration. Our approach is a starting point to deal with the complex issue of research groups in a changing structure of scientific research. Bibliometric maps of science are a well-established research subject. But their adoption as a science policy support tool is lacking. We think this is because the user does not immediately comprehend a map and (as a result) is not enticed into using it. 
To help this comprehension, we propose the use of 'qualitative maps': an umbrella term for diverse tools such as concept maps and mental maps. We developed a tool that interfaces between a qualitative map and a bibliometric map, which lets the user create a correspondence between the distinct vocabularies of the maps. We also conducted two user studies: the first explored the combined use of bibliometric and qualitative maps, and the second the preferred format of the map and the word usage in the description of its elements. This paper explores scale, scope and trade-off effects in scientific research and education. External conditions may dramatically affect the measurement of performance. We apply Daraio & Simar's (2005) nonparametric methodology to take these factors into account robustly and to decompose the indicators of productivity accordingly. From a preliminary investigation of the Italian system of universities, we find that economies of scale and scope are not significant factors in explaining research and education productivity. We do not find any evidence of a trade-off between research and teaching. As for the trade-off between academic publications and industry-oriented research, it seems that, initially, collaboration with industry may improve productivity, but beyond a certain level the compliance with industry expectations may be too demanding and deteriorate the publication profile. Robust nonparametric methods in efficiency analysis are shown to be useful tools for measuring and explaining the performance of a public research system of universities. Journal Citation Identity, Journal Citation Image, and Internationalisation are methods for journal evaluation, used here for an analysis of the Journal of Documentation (JDOC), which is compared to JASIS(T) and the Journal of Information Science (JIS). The set of analyses helps to portray a journal and gives a multifaceted picture. 
For instance, the Journal Citation Image based on the New Journal Diffusion Factor shows that JDOC reaches farther out into the scientific community than JASIS(T) and JIS. Comparing the New Journal Diffusion Factor with the Journal Impact Factor illustrates how new information has been added by the new indicator. Furthermore, JDOC is characterised by a higher rate of journal diversity in its references and has a lower number of scientific publications. JDOC authors and citers are affiliated with Western European institutions at an increasing rate. This paper addresses the issue of how science-technology interaction can be measured in the knowledge-driven economy. More specifically, it compares the patent citation indicator to another patent-based measure using data on a small European economy. Patent citation patterns are compared to researcher patents. Comparing the two indicators suggests different patterns of science-technology linkage. An analysis of revealed technology contributions of academic inventors and a survey-based analysis of technological collaboration and knowledge transfer point to a possible explanation. Furthermore, the research presents evidence suggesting that technology sectors are related to different modes of collaboration in inventive processes amongst academics. The article provides an in-depth analysis of previous literature that led to the understanding of the four interactive components of "e" learning and how we can utilize these components to maximize the positive and minimize the negative results of "e" learning. The four interactive dimensions of "e" learning are the three originally described in Moore's editorial (1989): (1) interaction with the content, (2) interaction with the instructor, (3) interaction with the students, and an additional new fourth dimension, interaction with the system, which takes into account the new computer technology introduced since his article. 
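One simplified reading of a journal diffusion factor, in the spirit of the indicator discussed above though not necessarily its exact definition, is the number of distinct citing journals per 100 citations received: a journal whose citations come from many different journals "diffuses" farther into the scientific community. A toy Python sketch with invented citation data:

```python
def diffusion_factor(citing_journals):
    """Distinct citing journals per 100 citations received --
    a simplified reading of a journal diffusion factor."""
    if not citing_journals:
        return 0.0
    return 100.0 * len(set(citing_journals)) / len(citing_journals)

# Each entry names the journal a citation came from (invented data).
wide = ["J1", "J2", "J3", "J4", "J5", "J6"]      # citations spread widely
narrow = ["J1", "J1", "J2", "J2", "J2", "J3"]    # citations concentrated
print(diffusion_factor(wide))    # 100.0
print(diffusion_factor(narrow))  # 50.0
```

An impact factor counts citations regardless of where they come from, so two journals with equal impact can differ sharply on this measure, which is the new information the diffusion indicator adds.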
In our viewpoint we highlight the impact that this fourth technological interactive dimension has on the results of "e" learning. The question then is not "to 'e' or not to 'e'," since "e" learning is already an essential factor of our contemporary learning environment. The question is how to "e", based on an understanding of the four interactive components of "e" learning, and the understanding that these four types of interactions are different from the ones we are accustomed to in the traditional learning environment. The outputs of several information filtering (IF) systems can be combined to improve filtering performance. In this article the authors propose and explore a framework based on the so-called information structure (IS) model, which is frequently used in Information Economics, for combining the output of multiple IF systems according to each user's preferences (profile). The combination seeks to maximize the expected payoff to that user. The authors show analytically that the proposed framework increases users' expected payoff from the combined filtering output for any user preferences. An experiment using the TREC-6 test collection confirms the theoretical findings. This is an empirical, experimental investigation of the value of information, as perceived through the willingness to purchase information (WTP) and the willingness to sell it (accept payment, WTA). We examined the effects of source nature (expertise versus content) and source status (copy versus exclusive original) of information on the WTA-WTP ratio. In an animated computer simulation of a business game, players could maximize their profits by making choices regarding inventory and prices. Participants were offered the chance to bid for buying or selling information regarding the weather that may affect demand. We find, as hypothesized, that the subjective value of information does indeed follow the predictions of endowment effect theory. 
The ratio of willingness to accept to willingness to purchase (WTA-WTP) recorded for the 294 subjects resembles the ratio common for private goods, rather than the intuitively expected unity. The WTA-WTP ratios diverged from unity more often and in a more pronounced manner for information traded in the "original" form rather than as a copy of the original, although even for copies the WTA-WTP ratio is still about two. The results yield a value of about three for the WTA-WTP ratio for original information, whether the source is content or expertise. Copy information received a subjective value that was significantly lower than that of original information. The implications for both online trading and online sharing of information are discussed. This comparative case study of the diffusion and nondiffusion over time of eight theories in the social sciences uses citation analysis, citation context analysis, content analysis, surveys of editorial review boards, and personal interviews with theorists to develop a model of the theory functions that facilitate theory diffusion throughout specific intellectual communities. Unlike previous work on the diffusion of theories as innovations, this theory functions model differs in several important respects from the findings of previous studies that employed Everett Rogers's classic typology of "innovation characteristics that promote diffusion." The model is also presented as a contribution to a more integrated theory of citation. The age distribution of a country's scientists is an important element in the study of its research capacity. In this article we investigate the age distribution of Japanese scientists in order to find out whether major events such as World War II had an appreciable effect on its features. Data have been obtained from population censuses taken in Japan from 1970 to 1995. A comparison with the situation in China and the United States has been made. 
We find that the group of scientific researchers outside academia is dominated by the young: those younger than age 35. The personnel group in higher education, on the other hand, is dominated by the baby boomers: those who were born after World War II. Contrary to the Chinese situation we could not find any influence of major nondemographic events. The only influence we found was the increase in enrollment of university students after World War II caused by the reform of the Japanese university system. Female participation in the scientific and university systems in Japan, though still low, is increasing. Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explore how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine quality in the "real world." This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. 
The intellectual base of a research front is its citation and co-citation footprint in the scientific literature: an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified. With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. 
In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific features) are extracted and inductive learning algorithms are used to build feature-based classification models to identify authorship of online messages. To examine this framework, we conducted experiments on English and Chinese online-newsgroup messages. We compared the discriminating power of the four types of features and of three classification techniques: decision trees, back-propagation neural networks, and support vector machines. The experimental results showed that the proposed approach was able to identify authors of online messages with satisfactory accuracy of 70 to 95%. All four types of message features contributed to discriminating authors of online messages. Support vector machines outperformed the other two classification techniques in our experiments. The high performance we achieved for both the English and Chinese datasets showed the potential of this approach in a multiple-language context. Homepages usually describe important semantic information about conceptual or physical entities; hence, they are the main targets for searching and browsing. To facilitate semantic-based information retrieval (IR) at a Web site, homepages can be identified and classified under some predefined concepts and these concepts are then used in query or browsing criteria, e.g., finding professor homepages containing "information retrieval." In some Web sites, relationships may also exist among homepages. These relationship instances (also known as homepage relationships) enrich our knowledge about these Web sites and allow more expressive semantic-based IR. In this article, we investigate the features to be used in mining homepage relationships. We systematically develop different classes of inter-homepage features, namely, navigation, relative-location, and common-item features. 
We also propose deriving for each homepage a set of support pages to obtain richer and more complete content about the entity described by the homepage. The homepage together with its support pages is known as a Web unit. By extracting inter-homepage features from Web units, our experiments on the WebKB dataset show that better homepage-relationship mining accuracy can be achieved. In this article we present an empirical approach to the study of the statistical properties of bibliometric indicators on a very relevant but not simply "available" aggregation level: the research group. We focus on the distribution functions of a coherent set of indicators that are used frequently in the analysis of research performance. In this sense, the coherent set of indicators acts as a measuring instrument. Better insight into the statistical properties of a measuring instrument is necessary to enable assessment of the instrument itself. The most basic distribution in bibliometric analysis is the distribution of citations over publications, and this distribution is very skewed. Nevertheless, we clearly observe the working of the central limit theorem and find that at the level of research groups the distribution functions of the main indicators, particularly the journal-normalized and the field-normalized indicators, approach normal distributions. The results of our study underline the importance of the idea of "group oeuvre," that is, the role of sets of related publications as a unit of analysis. In text categorization tasks, classification over a class hierarchy often yields better results than classification without the hierarchy. Because a large collection of documents can be divided into several subgroups within a hierarchy, a hierarchical classification method can be applied appropriately. However, we have no systematic method to build a hierarchical classification system that performs well with large collections of practical data. 
In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels. The evolution of scientific fields analyzed by co-word analysis and presented in strategic diagrams is simulated based on the law of cumulative advantages - the probability of a new tie between two keywords depends positively on the frequencies with which both keywords have already taken part. The results we get from simulations are compared with the results of real scientific field evolution. We consider the high correspondence between the two to be evidence of the working of the law of cumulative advantages in the development of scientific fields, and we believe that our research opens new possibilities for predictions of the development of scientific fields. A literature review uncovered six distinctive indicators of failed information epidemics in the scientific journal literature: (1) presence of seminal paper(s), (2) rapid growth/decline in author frequency, (3) multi-disciplinary research, (4) epidemic growth/decline in journal publication frequency, (5) predominance of rapid communication journal publications, and (6) increased multi-authorship. These indicators were applied to journal publication data from two known failed information epidemics, Polywater and Cold Nuclear Fusion. Indicators 1-4 were distinctive of the failed epidemics, Indicator 6 was not, and Indicator 5 might be. Further bibliometric study of these five indicators in the context of other epidemic literatures is needed. Many abstracts submitted to medical meetings never come to full publication in peer-reviewed journals. 
From the 2,992 abstracts presented at the 1994-1998 Spanish Congresses of Radiology, 464 (15%) were published as full articles in journals covered by Medline and IME (Índice Médico Español), the Spanish medical database. The publication rate of oral presentations was higher than that of posters (18% versus 13%). Collaboration between radiologists and clinicians and between radiologists from different institutions increased full publication (21% and 27%, respectively) compared to abstracts from just one institution (14%). Therefore, oral presentation and multi-disciplinary and multi-institutional collaboration on an abstract predicted full publication. This paper analyses several issues that arise when measuring technological specialisation with patent data. Three starting choices are required regarding the data source, the statistical measure and the sectoral aggregation level. We show that the measure is highly sensitive to the data source and to the level of sectoral aggregation. The statistical analysis further suggests that the most stable and reliable measures of technological specialisation are obtained with patent applications at the EPO, with Gini or C20 as the statistical measure and the 4-digit aggregation level of the IPC classification system. International R&D activities have grown significantly over the last two decades. Both the number of actors involved and the importance of the technological activity carried out abroad have increased considerably. We aim to quantify the international generation of knowledge for the case of Belgium, using indicators based on EPO and USPTO patent data (1978-2001). We distinguish among Belgian applicants, affiliates of foreign firms located in Belgium, and Belgian-based firms with affiliates abroad. This approach allows us to improve existing indicators of internationalisation of technology based on patent data. The results are consistent with what can be expected for a small open economy such as Belgium. 
A large share of patents with Belgian inventors is assigned to Belgian affiliates of foreign firms. Hence our more complete indicator of foreign ownership shows substantially higher foreign control of Belgian inventors. Relatively more of the knowledge generated by Belgian inventors flows out of the country towards foreign owners of technology than knowledge generated abroad is owned by Belgian patent applicants. However, the share of foreign inventors in Belgian-assigned patents is increasing considerably over time, especially in the subcategory of Belgian firms with foreign affiliates. The Science Citation Index (SCI), with its coverage of journals, has long served as a criterion for the performance assessment of researchers worldwide. If the journals of a specialty are under-proportionally indexed, the development of its research could be distorted in the long term. A MEDLINE-based bibliometric analysis of research output by family medicine departments in Taiwan from 1990 to 2003 might help to provide some evidence of the influence of the SCI on developing disciplines. Quantitative and qualitative scientific evaluations of the research performance of Thai researchers were carried out with regard to their international publications and citations in four different subject categories, namely Clinical Medicine, Chemistry, Material Sciences, and Engineering. This work used citations to publications of Thai researchers in the Science Citation Index (SCI) database during 1998-2002 as a data source. The calculation and comparison of article impact factors (AIF), position impact factors (PIF) and journal impact factors (JIF) were attempted for quantitative evaluation. The positions and significance levels (cited contents) of the citations were considered for qualitative assessment. For quantitative evaluation, the highest article quantity and number of times cited were given by Thai researchers in Clinical Medicine, the lowest being for Material Sciences. 
Clinical Medicine had the highest AIF value, while Engineering exhibited the lowest. Each article by Thai researchers was found to be cited more than once within a citing article, especially articles in Clinical Medicine. For qualitative assessment, most articles by Thai scholars were cited in the Introduction and Results & Discussion sections of the citing articles. Only non-Thai researchers in Clinical Medicine preferred to draw on the Discussion sections of Thai articles in the discussion of their own work, whereas articles in Chemistry, Material Sciences and Engineering were cited as general references. Less than 1.5% of the research works of Thai scholars were cited as "the pioneer" for the research communities of the subject categories of interest. In a recent paper [H. F. MOED, E. GARFIELD: In basic science the percentage of "authoritative" references decreases as bibliographies become shorter. Scientometrics 60 (3) (2004) 295-303] the authors show, experimentally, the validity of the statement in the title of their paper. In this paper we give a general informetric proof of it, under certain natural conditions. The proof is given both in the discrete and the continuous setting. An easy corollary of this result is that the fraction of non-authoritative references increases as bibliographies become shorter. This finding is supported by a set of data from the journal Information Processing and Management (2002 + 2003) with respect to the fraction of conference proceedings articles in reference lists. An evaluation of Turkey's science and technology (S & T) policy in the last two decades has been made by using various indicators of S & T and technological innovation. National trends in inputs for research and development (R & D) activities, publication output and patent data have been studied for the implications of the S & T policy from 1983 to 2003. 
Some of the findings on the outcomes of policy measures in terms of inputs to R & D and publication output are as follows: (1) Total R & D expenditure, as percent of gross domestic product (GDP), increased from 0.32% in 1990 to 0.67% in 2002, (2) the fraction of R & D in the total expenditure for technological innovation increased from 6.6% in 1995-1997 to 29.2% in 1998-2000, and (3) the number of papers in the journals covered in the Science Citation Index (SCI) of the Institute for Scientific Information (ISI) increased from 464 in 1983 to 12,160 in 2003 - a more than 26-fold increase in the last two decades. Supplying library users with literature by a seamless linking of media is the goal of (scientific) libraries. By the digitization of primary and secondary data and the convergence of products and providers, libraries have already come very close to achieving this ideal. A digital library is the realization of this goal. However, many librarians are in danger of running out of imagination. What will come after the digital library? Will information professionals still be needed? What services can libraries offer? Bibliometric analysis is an example of new business areas in libraries. This paper will discuss what shape this service could take in practice, who needs it and what target groups exist in the scientific environment. Concrete examples of bibliometric analysis from the Central Library of Research Centre Jülich will round off the overview. This study documents a decade of mainstream research output by European economics institutions. In contrast to previous European economics departmental rankings, we investigate the changing pattern of the ranking over two subperiods and the total decade. The validity of our bibliometric approach is demonstrated by a comparison with gradings of UK economics departments in the 2001 Research Assessment Exercise (RAE). We also provide some explanation of the ranking based on regional factors and institutional features. 
Strong evidence for the 'institutional oligopoly' hypothesis of editors and authors is found. However, in a dynamic context this departmental concentration of authorship and editorial board membership does not represent a 'closed shop'. We find several departments entering the centre stage of the economics mainstream for the first time towards the end of the 1990s. Most Web search tools integrate sponsored results with results from their internal editorial database in providing results to users. The goal of this research is to get a better idea of how much of the screen real estate displays "real" editorial results as compared to sponsored results. The overall average results are that 40% of all results presented on the first screen are "real" results, and when the entire first Web page is considered, 67% of the results are nonsponsored results. For general search tools such as Google, 56% of the first screen and 82% of the first Web page contain nonsponsored results. Other results include that query structure makes a significant difference in the percentage of nonsponsored results returned by a search. Similarly, the topic of the query can also have a significant effect on the percentage of sponsored results displayed by most Web search tools. The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. 
The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students. The importance of trust in building and maintaining consumer relationships in the online environment is widely accepted in the Information Systems literature. A key challenge for researchers is to identify antecedent variables that engender consumer trust in Internet shopping. This paper adopts a multidisciplinary approach and develops an integrative model of consumer trust in Internet shopping through synthesizing the three diverse trust literatures. The social psychological perspective guides us to include perceived trustworthiness of Internet merchants as the key determinant of consumer trust in Internet shopping. The sociological viewpoint suggests the inclusion of legal framework and third-party recognition in the research model. The views of personality theorists postulate a direct effect of propensity to trust on consumer trust in Internet shopping. The results of this study provide strong support for the research model and research hypotheses, and the high explanatory power illustrates the complementarity of the three streams of research on trust. This paper contributes to the conceptual and empirical understanding of consumer trust in Internet shopping. Implications of this study are noteworthy for both researchers and practitioners. 
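An integrative model of the kind just described, in which consumer trust is explained jointly by several antecedents (perceived trustworthiness, legal framework, third-party recognition, and propensity to trust), is typically tested by regressing a trust measure on the antecedent scores and inspecting the model's explanatory power. The sketch below is a minimal, hypothetical illustration with simulated survey responses and assumed effect sizes; it is not the instrument or analysis used in the study.

```python
import numpy as np

# Hypothetical illustration: regress consumer trust on four antecedents --
# perceived trustworthiness, legal framework, third-party recognition,
# and propensity to trust. All data are simulated.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(1, 7, size=(n, 4))           # simulated 7-point-scale ratings
true_weights = np.array([0.6, 0.2, 0.15, 0.25])  # assumed effect sizes
trust = X @ true_weights + rng.normal(0, 0.5, n)

# Ordinary least squares with an intercept term
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, trust, rcond=None)

# R^2 as the model's "explanatory power"
pred = A @ coef
r2 = 1 - np.sum((trust - pred) ** 2) / np.sum((trust - trust.mean()) ** 2)
print(round(r2, 2))
```

With noise small relative to the spread of the antecedent scores, the fitted coefficients recover the assumed weights and R^2 is high, which is the sense in which "high explanatory power" supports such a model.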
In recent years, many evaluations of Web sites have been conducted, and relevant research has also been carried out in academic circles. Correspondence analysis is introduced in this paper to evaluate university library Web sites through building a correspondence analysis model. Based on an analysis and summary of the evaluation results, this paper gives suggestions as to how to construct university library Web sites, in a bid to strengthen their construction. The Web has become a large repository of documents (or pages) written in many different languages. In this context, traditional information retrieval (IR) techniques cannot be used whenever the user query and the documents being retrieved are in different languages. To address this problem, new cross-language information retrieval (CLIR) techniques have been proposed. In this work, we describe a method for cross-language retrieval of medical information. This method combines query terms and related medical concepts obtained automatically through a categorization procedure. The medical concepts are used to create a linguistic abstraction that allows retrieval of information in a language-independent way, minimizing linguistic problems such as polysemy. To evaluate our method, we carried out experiments using the OHSUMED test collection, whose documents are written in English, with queries expressed in Portuguese, Spanish, and French. The results indicate that our cross-language retrieval method is as effective as a standard vector space model algorithm operating on queries and documents in the same language. Further, our results are better than previous results in the literature. 
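The linguistic-abstraction idea behind such a cross-language method - mapping terms in any language to shared, language-independent concept identifiers and matching query and document in concept space - can be sketched as follows. The concept dictionary and all its entries below are invented for illustration; they stand in for the automatic categorization procedure and are not the actual resource used with OHSUMED.

```python
import math
from collections import Counter

# Toy concept dictionary (hypothetical): terms in English, Portuguese, and
# Spanish map to shared concept identifiers, sidestepping translation.
CONCEPTS = {
    "heart": "C_CARDIAC", "coracao": "C_CARDIAC", "corazon": "C_CARDIAC",
    "attack": "C_INFARCT", "infarto": "C_INFARCT",
    "treatment": "C_THERAPY", "tratamento": "C_THERAPY", "tratamiento": "C_THERAPY",
}

def concept_vector(text):
    """Map tokens to concepts and count them; unknown tokens are dropped."""
    return Counter(CONCEPTS[t] for t in text.lower().split() if t in CONCEPTS)

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

doc = concept_vector("Treatment options after a heart attack")   # English document
query_pt = concept_vector("tratamento infarto")                  # Portuguese query
query_es = concept_vector("tratamiento corazon")                 # Spanish query
print(round(cosine(query_pt, doc), 2))  # → 0.82
```

Because matching happens entirely in concept space, the Portuguese and Spanish queries both retrieve the English document, which is the language-independence property the abstraction provides.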
The explosive growth of the Web has given the Web personalization domain a key role in customizing Web information to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. This work presents an agent-based framework designed to help a user in achieving personalized navigation, by recommending related documents according to the user's responses in similar-pages searching mode. Our agent-based approach is grounded in the integration of different techniques and methodologies into a single platform featuring user profiling, fuzzy multisets, proximity-oriented fuzzy clustering, and knowledge-based discovery technologies. Each of these methodologies serves to solve one facet of the general problem (discovering documents relevant to the user by searching the Web) and is treated by specialized agents that ultimately achieve the final functionality through cooperation and task distribution. Peer-to-peer (P2P) applications are rapidly gaining acceptance among users of Internet-based services, especially because of their capability of exchanging resources while preserving the anonymity of both requesters and providers. However, concerns have been raised about the possibility that malicious users can exploit the network to spread tampered-with resources (e.g., malicious programs and viruses). A considerable amount of research has thus focused on the development of trust and reputation models in P2P networks. In this article, we propose to use fuzzy techniques in the design of reputation systems based on collecting and aggregating peers' opinions. Fuzzy techniques are used in the evaluation and synthesis of all the opinions expressed by peers. 
The behavior of the proposed system is described by comparison with probabilistic approaches. An evaluation methodology based on fuzzy computing with words aimed at measuring the information quality of Web sites containing documents is presented. This methodology is qualitative and user oriented because it generates linguistic recommendations on the information quality of the content-based Web sites based on users' perceptions. It is composed of two main components, an evaluation scheme to analyze the information quality of Web sites and a measurement method to generate the linguistic recommendations. The evaluation scheme is based on both technical criteria related to the Web site structure and criteria related to the content of information on the Web sites. It is user driven because the chosen criteria are easily understandable by the users, in such a way that Web visitors can assess them by means of linguistic evaluation judgments. The measurement method is user centered because it generates linguistic recommendations of the Web sites based on the visitors' linguistic evaluation judgments. To combine the linguistic evaluation judgments we introduce two new majority guided linguistic aggregation operators, the Majority guided Linguistic Induced Ordered Weighted Averaging (MLIOWA) and weighted MLIOWA operators, which generate the linguistic recommendations according to the majority of the evaluation judgments provided by different visitors. The use of this methodology could improve tasks such as information filtering and evaluation on the World Wide Web. We point out that question-answering systems differ from other information-seeking applications, such as search engines, by having a deduction capability, an ability to answer questions by a synthesis of information residing in different parts of its knowledge base. 
This capability requires appropriate representation of various types of human knowledge, rules for locally manipulating this knowledge, and a framework for providing a global plan for appropriately mobilizing the information in the knowledge to address the question posed. In this article we suggest tools to provide these capabilities. We describe how the fuzzy set-based theory of approximate reasoning can aid in the process of representing knowledge. We discuss how protoforms can be used to aid in deduction and local manipulation of knowledge. The idea of a knowledge tree is introduced to provide a global framework for mobilizing the knowledge base in response to a query. We look at some types of common-sense and default knowledge. This requires us to address the complexity of the nonmonotonicity that these types of knowledge often display. We also briefly discuss the role that Dempster-Shafer structures can play in representing knowledge. This article presents a semantic-based Web retrieval system that is capable of retrieving the Web pages that are conceptually related to the implicit concepts of the query. The concept of "concept" is managed from a fuzzy point of view by means of semantic areas. In this context, the proposed system improves most search engines that are based on matching words. The key of the system is to use a new version of the Fuzzy Interrelations and Synonymy-Based Concept Representation Model (FIS-CRM) to extract and represent the concepts contained in both the Web pages and the user query. This model, which was integrated into other tools such as the Fuzzy Interrelations and Synonymy based Searcher (FISS) metasearcher and the fz-mail system, considers the fuzzy synonymy and the fuzzy generality interrelations as a means of representing word interrelations (stored in a fuzzy synonymy dictionary and ontologies). 
The new version of the model, which is based on the study of the co-occurrences of synonyms, integrates a soft method for disambiguating word senses. This method also considers the context of the word to be disambiguated and the thematic ontologies and sets of synonyms stored in the dictionary. In this article we present a system for the exploration of video sequences. The system, GAMBAL for the Exploration of Video Sequences (GAMBAL-EVS), segments video sequences, extracting an image for each shot, and then clusters such images and presents them in a visualization system. The system allows the user to find similarities between images and to proceed through the video sequences to find the relevant ones. The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS. The aggregated citation relations among journals included in the Science Citation Index provide us with a huge matrix, which can be analyzed in various ways. 
By using principal component analysis or factor analysis, the factor scores can be employed as indicators of the position of the cited journals in the citing dimensions of the database. Unrotated factor scores are exact, and the extraction of principal components can be made stepwise because the principal components are independent. Rotation may be needed for the designation, but in the rotated solution a model is assumed. This assumption can be legitimated on pragmatic or theoretical grounds. Because the resulting outcomes remain sensitive to the assumptions in the model, an unambiguous classification is no longer possible in this case. However, the factor-analytic solutions allow us to test classifications against the structures contained in the database; in this article the process will be demonstrated for the delineation of a set of biochemistry journals. In this article, we introduce a new information system evaluation method and report on its application to a collaborative information seeking system, AntWorld. The key innovation of the new method is to use precisely the same group of users who work with the system as judges, a system we call Cross-Evaluation. In the new method, we also propose to assess the system at the level of task completion. The obvious potential limitation of this method is that individuals may be inclined to think more highly of the materials that they themselves have found and are almost certain to think more highly of their own work product than they do of the products built by others. The keys to neutralizing this problem are careful design and a corresponding analytical model based on analysis of variance. We model the several measures of task completion with a linear model of five effects, describing the users who interact with the system, the system used to finish the task, the task itself, the behavior of individuals as judges, and the self-judgment bias. Our analytical method successfully isolates the effect of each variable. 
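How a full cross-evaluation design lets the self-judgment bias be isolated can be sketched with simulated data: when every user judges every user's product, the judge and worker main effects average out by symmetry, leaving the bias visible as the lift in self-scores. Everything below (user names, effect sizes, noise level) is hypothetical; the actual study fitted a five-effect linear model with analysis of variance rather than this simple contrast.

```python
import itertools
import random

# Simulated cross-evaluation: each user judges every user's work product.
random.seed(1)
users = ["u1", "u2", "u3", "u4"]
quality = {u: random.uniform(3, 5) for u in users}        # product quality per user
leniency = {u: random.uniform(-0.3, 0.3) for u in users}  # judge leniency
TRUE_BIAS = 0.8                                           # assumed self-judgment bias

scores = {}
for judge, worker in itertools.product(users, users):
    s = quality[worker] + leniency[judge] + (TRUE_BIAS if judge == worker else 0)
    scores[(judge, worker)] = s + random.gauss(0, 0.1)    # measurement noise

# Isolate the bias: mean self-score minus mean cross-score. In the full
# design, quality and leniency effects cancel between the two averages.
self_mean = sum(scores[(u, u)] for u in users) / len(users)
cross = [scores[(j, w)] for j, w in itertools.product(users, users) if j != w]
est_bias = self_mean - sum(cross) / len(cross)
print(round(est_bias, 2))
```

The estimated bias lands close to the assumed value, illustrating why a balanced design plus a corresponding analytical model can neutralize the self-judgment problem.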
This approach provides a successful model to make concrete the "three-realities" paradigm, which calls for "real tasks," "real users," and "real systems." As illustrated by the World Wide Web, the volume of information in languages other than English has grown significantly in recent years. This highlights the importance of multilingual corpora. Much effort has been devoted to the compilation of multilingual corpora for the purpose of cross-lingual information retrieval and machine translation. Existing parallel corpora mostly involve European languages, such as English-French and English-Spanish. There is still a lack of parallel corpora between European languages and Asian languages. In the authors' previous work, an alignment method that works automatically on the World Wide Web was developed to identify one-to-one Chinese and English title pairs and construct an English-Chinese parallel corpus, and 100% precision and 87% recall were obtained. Careful analysis of these results has helped the authors to understand how the alignment method can be improved. A conceptual analysis was conducted, which includes the analysis of conceptual equivalence and conceptual information alternation in the aligned and nonaligned English-Chinese title pairs that are obtained by the alignment method. The result of the analysis not only reflects the characteristics of parallel corpora, but also gives insight into the strengths and weaknesses of the alignment method. In particular, conceptual alternation, such as omission and addition, is found to have a significant impact on the performance of the alignment method. Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with the formulation, transformation, translation, and transliteration of multilingual named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel-named-entity corpora. 
The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected. Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system. As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. 
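The search-result-mining translation approach described above can be sketched as follows: collect snippets returned for the unknown source term and rank the target-language strings that co-occur with it. This frequency-only sketch uses hypothetical hard-coded snippets; the actual approach queries a live search engine and also exploits co-occurrence and context statistics.

```python
import re
from collections import Counter

def mine_translations(term, snippets):
    """Rank contiguous non-ASCII runs (candidate Chinese translations)
    that co-occur with `term` in bilingual search-result snippets."""
    counts = Counter()
    for s in snippets:
        if term.lower() in s.lower():
            counts.update(re.findall(r"[^\x00-\x7f]+", s))
    return counts.most_common()

# Hypothetical snippets returned for the unknown query term "blog".
snippets = [
    "blog 部落格 is a popular term",
    "the word blog (部落格) appears often",
    "blog 網誌 usage",
]
ranked = mine_translations("blog", snippets)
```

The top-ranked candidate becomes the translation equivalent added to the domain-specific bilingual lexicon.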
In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise. Based on the salient features of the documents, automatic text summarization systems extract the key sentences from source documents. This process supports users in evaluating the relevance of the extracted documents returned by information retrieval systems. With such a tool, efficient filtering can be achieved. Indirectly, these systems help to resolve the problem of information overloading. Many automatic text summarization systems have been implemented for use with different languages. It has been established that the grammatical and lexical differences between languages have a significant effect on text processing. However, the impact of the language differences on automatic text summarization systems has not yet been investigated. The authors provide an impact analysis of language difference on automatic text summarization. The analysis includes the effect on the extraction processes, the scoring mechanisms, the performance, and the matching of the extracted sentences, using a parallel corpus in English and Chinese as the test object.
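A minimal extractive summarizer of the kind these systems build on can be sketched as a frequency-based sentence-scoring loop: each sentence is scored by how frequent its words are in the document, and the top-scoring sentences are kept. This is a generic sketch, not the specific scoring mechanism compared in the study.

```python
import re
from collections import Counter

def extract_summary(text, k=1):
    """Score each sentence by the average document frequency of its
    words and keep the top-k sentences (frequency-based extraction)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    def score(s):
        toks = re.findall(r"[a-z]+", s.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)
    return sorted(sentences, key=score, reverse=True)[:k]

text = "Cats purr. Cats sleep. Dogs bark loudly sometimes."
summary = extract_summary(text, k=1)
```

Language differences enter exactly here: tokenization (`[a-z]+` is English-only), sentence segmentation, and the scoring statistics all behave differently for Chinese, which is what the impact analysis examines.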
The analysis results provide a greater understanding of language differences and promote the future development of more advanced text summarization techniques. In this article the authors present Eurovision, a text-based system for cross-language (CL) image retrieval. The system is evaluated by multilingual users for two search tasks with the system configured in English and five other languages. To the authors' knowledge, this is the first published set of user experiments for CL image retrieval. They show that (a) it is possible to create a usable multilingual search engine using little knowledge of any language other than English, (b) categorizing images assists the user's search, and (c) there are differences in the way users search between the proposed search tasks. Based on the two search tasks and user feedback, they describe important aspects of any CL image retrieval system. A novel and complex form of information access is cross-language information retrieval: searching for texts written in foreign languages based on native language queries. Although the underlying technology for achieving such a search is relatively well understood, the appropriate interface design is not. The authors present three user evaluations undertaken during the iterative design of Clarity, a cross-language retrieval system for low-density languages, and show how the user-interaction design evolved depending on the results of usability tests. The first test was instrumental in identifying weaknesses in both functionalities and interface; the second was run to determine whether query translation should be shown or not; the final was a global assessment and focused on user satisfaction criteria. Lessons were learned at every stage of the process, leading to a much more informed view of what a cross-language retrieval system should offer to users. Chinese science has developed rapidly over the last fifteen years. It is said that it is now in a quantitative expansion phase.
A series of programmes extending over a period of twenty years has resulted in more than 160 Key Labs and nearly 400 Open Labs at present. The organization and evaluation of this system of labs is one of the strategic measures for scientific resource reorganization in China. The role played by these labs is analysed in this article using data from the Chinese Science Citation Database (CSCD) and the Science Citation Index (SCI). Nowadays almost one quarter of all internationally oriented Chinese publications originate from these labs. The same is true for citations received by Chinese scientists in the SCI. Comparisons between SCI-based and CSCD-based performance results show that the relative academic impact of Key Labs and Open Labs is more international than domestic. Key Labs have a higher total production and receive more citations than Open Labs. Yet their impact, measured as citations per publication, is very similar. We conclude that when it comes to impact on the international scene, these labs have not yet led to a big step forward for Chinese science as a whole. The fact that in the year 2004 a new evaluation procedure was put in place means that the Chinese scientific authorities have recognized this fact and are dealing with it. We use a method that captures the intrinsic metrics of variables in a cross-tabulation to analyze data on the association between referee recommendations and editorial decisions at two scholarly journals. The method enables researchers to (1) determine the number of latent dimensions needed to account for this association, and (2) estimate scale values for both the referee-recommendation and the editorial-decision categories. We show that one latent dimension is sufficient to account for the association at each journal, and that both referee-recommendation categories and editorial-decision categories have scale values on the dimension that are consistent with their ostensible meanings.
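The cross-tabulation machinery such scaling methods start from can be sketched with Pearson standardized residuals: the (observed − expected)/√expected cells of the referee-by-decision table, whose decomposition yields the latent dimensions and scale values. The counts below are hypothetical, not the journals' data.

```python
import math

def standardized_residuals(table):
    """Pearson residuals (obs - exp) / sqrt(exp) for a cross-tabulation;
    scaling methods recover latent dimensions from this residual matrix."""
    n = sum(sum(row) for row in table)
    row_t = [sum(row) for row in table]
    col_t = [sum(col) for col in zip(*table)]
    return [[(table[i][j] - row_t[i] * col_t[j] / n)
             / math.sqrt(row_t[i] * col_t[j] / n)
             for j in range(len(col_t))] for i in range(len(row_t))]

# Hypothetical referee-recommendation (rows) x editorial-decision (cols) counts.
table = [[30, 5],   # "accept" recommendations: accepted, rejected
         [10, 25]]  # "reject" recommendations: accepted, rejected
res = standardized_residuals(table)
```

A large positive residual in the accept/accept cell and negative residuals off-diagonal are exactly the one-dimensional association pattern the study reports.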
The difference between individual social capital and organizational (or corporate) social capital has been an important topic of research in sociology during the past decade. The existence of this difference between two forms of social capital evokes an old question in a new manner: what matters most in explaining individual actors' performance? Is it personal social capital, or collective resources provided by the organization to which the individuals belong and in which they work? In this paper we provide a preliminary answer to this question based on a multi-level network study of the top 'elites' in French cancer research during 1996-1998. By multi-level we mean that we reconstituted the inter-organizational networks of exchange between most French laboratories carrying out cancer research in 1999; simultaneously, we reconstituted key social networks of the top individual elites in cancer research in France during that same year. Given our 'linked design' (i.e., knowing to which laboratory each researcher belongs), we were able to disentangle the effects of structural properties of the laboratory from the effects of characteristics of the individual researcher (including structural ones) on the latter's performance. Performance was measured by a score based on the impact factor of the journals in which each researcher published. Our results show that organizational social capital matters more, and more consistently, than individual relational capital in explaining variations in performance by French top cancer researchers. Internationality as a concept is being applied ambiguously, particularly in the world of academic journal publication. Although different criteria are used by scientometrists in order to measure internationality and to supplement its minimal literal meaning, the present study suggests that no single criterion alone is sufficient.
This paper surveys, critically assesses, and extends the existing measures of internationality in the context of academic publishing and identifies those criteria that are most clearly resolved and amenable to quantitative analysis. When applied, however, to a case study of four thematically connected journals from the field of Health and Clinical Psychology using descriptive statistics and the Gini Coefficient, the measurement of internationality using these criteria was found to be ambiguous. We conclude that internationality is best viewed as a mathematically fuzzy entity and that a single Internationality Index measure, constructed from a combination of suitably weighted criteria, is the only way to unambiguously quantify the degree of internationality. In the present paper, the evolution of publication activity and citation impact in Brazil is studied for the period 1991-2003. Besides the analysis of trends in publication and citation patterns and of national publication profiles, an attempt is made to find statistical evidence of the relation between international co-authorship and both research profile and citation impact in the Latin American region. Despite similarities and strong co-publication links with the other countries in the region, Brazil nonetheless has a specific research profile, and forms the largest potential in the region. In the present study a bibliometric meso-level analysis of Brazilian scientific research is conducted. Both the sectoral and the publication profile of Brazilian universities and research institutions are studied. Publication dynamics and changing profiles lead to the conclusion that the powerful growth of science in Brazil is accompanied by striking structural changes. By contrast, citation-based indicators reflect less spectacular developments. The standardization of distribution fitting procedures is recommended in informetrics as well. We examined the possibility of such standardization when fitting the Zipf-Mandelbrot (ZM) distribution.
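Fitting a Zipf-Mandelbrot rank-frequency law, f(r) ∝ (r + b)^(−a), can be sketched with a simple grid search minimizing squared error in log space. This is a rough stand-in for the maximum likelihood fitting the study advocates, using synthetic rather than empirical frequencies.

```python
import math

def fit_zipf_mandelbrot(freqs):
    """Grid-search least-squares fit of log f(r) = log C - a*log(r + b)
    to rank-frequency data (rank r = 1, 2, ...). Returns (a, b)."""
    best = None
    for a in [x / 10 for x in range(5, 31)]:       # exponent 0.5 .. 3.0
        for b in [x / 10 for x in range(0, 51)]:   # shift 0.0 .. 5.0
            # closed-form optimal log C given a and b
            logc = sum(math.log(f) + a * math.log(r + 1 + b)
                       for r, f in enumerate(freqs)) / len(freqs)
            err = sum((math.log(f) - (logc - a * math.log(r + 1 + b))) ** 2
                      for r, f in enumerate(freqs))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

# Synthetic rank-frequency data generated with a = 1.0, b = 2.0.
freqs = [100 / (r + 2.0) for r in range(1, 30)]
a, b = fit_zipf_mandelbrot(freqs)
```

A standardized procedure would replace the least-squares criterion with maximum likelihood estimates and check the fit with a chi-square goodness-of-fit test, as the study recommends.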
After proposing possible steps of standardization, we stress the unique role of maximum likelihood estimates in chi-square goodness-of-fit tests. We touch upon the possible correlation between the parameters of the ZM distribution. A numerical example demonstrates the method and the results. This study analysed how leadership and organizational support (LOS) influence creative knowledge environments for research groups in biotechnology. A questionnaire distributed to 90 (97% responding) university and business company researchers showed that leadership was rated higher than organizational support. First, leaders were more important to creativity than organizational support. Secondly, LOS differed to a limited extent between members and leaders, universities and business companies, and excellent and less excellent groups. Thirdly, working freedom was rated higher in universities than in business companies. Fourthly, group members perceived they were more encouraged to think freely than were their group leaders. Finally, innovation goals were more pronounced in excellent than in less excellent groups. Clustering algorithms are used prominently in co-citation analysis by analysts aiming to reveal research streams within a field. However, clustering of widely cited articles is not robust to small variations in citation patterns. We propose an alternative algorithm, dense network sub-grouping, which identifies dense groups of co-cited references. We demonstrate the algorithm using a data set from the field of family business research and compare it to two alternative methods, multidimensional scaling and clustering. We also introduce a free software tool, Sitkis, which implements the algorithm and other common bibliometric methods. The software identifies journal-, country- and university-specific citation patterns and co-citation groups, enabling the identification of "invisible colleges."
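The raw material for the co-citation grouping described above is a pair-count matrix: how often each pair of references appears together in citing papers' reference lists. A minimal sketch of that counting step (not the dense network sub-grouping algorithm itself) follows, with hypothetical reference identifiers.

```python
from collections import Counter
from itertools import combinations

def co_citation_counts(reference_lists):
    """Count how often each pair of references is cited together.
    Dense groups in this pair graph are the co-citation sub-groups
    that dense network sub-grouping then extracts."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing papers, each with its list of cited references.
papers = [["R1", "R2", "R3"], ["R1", "R2"], ["R2", "R3"], ["R1", "R4"]]
cc = co_citation_counts(papers)
```

Thresholding these counts and keeping only densely interconnected reference sets is what makes the grouping robust to small variations in the citation patterns of individual highly cited articles.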
The need for effective identity matching systems has led to extensive research in the area of name search. For the most part, such work has been limited to English and other Latin-based languages. Consequently, algorithms such as Soundex and n-gram matching are of limited utility for languages such as Arabic, which has vastly different morphologic features that rely heavily on phonetic information. The dearth of work in this field is partly caused by the lack of standardized test data. Consequently, we have built a collection of 7,939 Arabic names, along with 50 training queries and 111 test queries. We use this collection to evaluate a variety of algorithms, including a derivative of Soundex tailored to Arabic (ASOUNDEX), measuring effectiveness by using standard information retrieval measures. Our results show an improvement of 70% over existing approaches. Document keyphrases provide a concise summary of a document's content, offering semantic metadata summarizing a document. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: The more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. 
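The stated KIP logic, that a candidate phrase is more likely a keyphrase the more keywords it contains and the more significant those keywords are, can be sketched as a weighted sum over a keyword glossary. The weights below are hypothetical stand-ins for the weights KIP learns from human-identified phrases.

```python
def keyphrase_score(candidate, keyword_weights):
    """Score a candidate phrase as the sum of learned weights of the
    known keywords it contains (a sketch of the stated KIP logic)."""
    return sum(keyword_weights.get(w, 0) for w in candidate.lower().split())

# Hypothetical keyword weights standing in for KIP's glossary database.
weights = {"information": 2.0, "retrieval": 3.0, "model": 1.0}
candidates = ["information retrieval model", "weather model", "blue sky"]
ranked = sorted(candidates, key=lambda c: keyphrase_score(c, weights),
                reverse=True)
```

KIP's learning function then feeds newly accepted keyphrases back into the glossary, which is how the personalized, domain-specific database grows over time.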
KIP's learning function can enrich the glossary database by automatically adding newly identified keyphrases to the database. KIP's personalization feature lets the user build a glossary database specifically suited to the area of his/her interest. The evaluation results show that KIP's performance is better than that of the systems we compared it to and that the learning function is effective. We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th-century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society. A new document representation model is presented in this paper. This model is based on the idea of representing a document by two or more pictures of the document taken from different perspectives. It is shown that by applying the stereo representation model, enhanced textual retrieval performance is achieved because the new model improves the capability of capturing individual features of the document. Experiments have been conducted on two standard corpora, TIME and ADI, using the standard term vector method and the latent semantic indexing (LSI) method based upon both the stereo representation model and the traditional representation model.
Statistical t-tests on the experimental results have convincingly illustrated that these methods achieve significant improvements in retrieval performance with the stereo representation model over those with the traditional representation model. The authors describe a set of best practices that were developed to assist in the design of search user interfaces. Search user interfaces represent a challenging design domain because they are often used by novices who have no desire to learn the mechanics of search engine architecture or algorithms. This can lead to frustration and task failure when not addressed by the user interface. The best practices are organized into five domains: the corpus, search algorithms, user and task context, the search interface, and mobility. In each section the authors present an introduction to the design challenges related to the domain and a set of best practices for creating a user interface that facilitates effective use by a broad population of users and tasks. The author describes some of the challenges, decisions, and processes that affected the design and development of the search user interface for Version 2 of the Digital Library for Earth System Education (DLESE; www.dlese.org), released July 29, 2003. The DLESE is a community-led effort funded by the National Science Foundation and is part of the National Science Digital Library (NSDL). A new search paradigm, in which the primary user activity is the guided exploration of a complex information space rather than the retrieval of items based on precise specifications, is proposed. The author claims that this paradigm is the norm in most practical applications, and that solutions based on traditional search methods are not effective in this context. He then presents a solution based on dynamic taxonomies, a knowledge management model that effectively guides users to reach their goal while giving them total freedom in exploring the information base.
Applications, benefits, and current research are discussed. User interfaces of Web search engines reflect attributes of the underlying tools used to create them, rather than what we know about how people look for information. In this article, the author examines several characteristics of user search behavior: the variety of information-seeking goals, the cultural and situational context of search, and the iterative nature of the search task. An analysis of these characteristics suggests ways that interfaces can be redesigned to make searching more effective for users. Progress in search interfaces requires vigorous inquiry into how search features can be embedded into application environments such as those for decision-making, personal information collecting, and designing. Progress can be made by focusing on mid-level descriptions of how search components can draw upon and update workspace content and structure. The immediate goal is to advance our understanding of how to shape and exploit context in search. The long-term goal is to develop an interdisciplinary design resource that enables stakeholders in the computing, social, and information sciences to more richly impact each other's work. The authors describe user interface tools based on search histories to support legal information seekers. The design of the tools was informed by the results of a user study (Komlodi, 2002a) that examined the use of human memory, external memory aids, and search histories in legal information seeking and derived interface design recommendations for information storage and retrieval systems. The data collected were analyzed to identify potential task areas where search histories can support information seeking and use. The results show that many information-seeking tasks can take advantage of automatically and manually recorded history information.
These findings encouraged the design of user interface tools that build on search history information: direct search history displays, history-enabled scratchpad facilities, and organized results collection tools. Usability evaluations and observations of users shopping at Amazon.com (http://www.amazon.com) revealed some interesting user behaviors. The mixed behavior patterns were leveraged to create an interface for an e-commerce product. The author describes some design practices for providing a scoped search interface for an e-commerce site. Recent research highlights the potential relevance of emotions in interface design. People can no longer be modeled as purely goal-driven, task-solving agents: They also have affective motivations for their choices and behavior, implying an extended mandate for search design. Absent from current Web design practice, however, is a pattern for emotive criticism and design reflecting these new directions. Further, discussion of emotions and Web design is not limited to visual design or aesthetic appeal: Emotions users have as they interact with information also have design implications. The author outlines a framework for understanding users' emotional states as they seek information on the Web. It is inspired largely by Carol Kuhlthau's (1991, 1993, 1999) work in library services, particularly her information searching process (ISP), which is adapted to Web design practice. A staged approach resembling traditional models of information-seeking behavior is presented here as the basis for creating appropriate search and navigation systems. This user-centered framework is flexible and solution-oriented, enjoys longevity, and considers affective factors. Its aim is a more comprehensive, conceptual analysis of the user's entire information search experience.
The author presents a design case study of a search user interface for Web catalogs in the context of online shopping for consumer products such as clothing, furniture, and sporting goods. The case study provides a review of the user data for the user interface (UI) and the resulting redesign recommendations. Based on the case study and its user data, a set of common user requirements for searching in the context of online shopping is provided. An evidence-based practice approach to search interface design is proposed, with the goal of designing interfaces that adequately support search strategy formulation and reformulation. Relevant findings from studies of information professionals' searching behaviors, end users' searching of bibliographic databases, and search behaviors on the Web are highlighted. Three brief examples are presented to illustrate the ways in which findings from such studies can be used to make decisions about the design of search interfaces. If academic research can be effectively connected with design practice, we can discover which design practices truly are "best practices" and incorporate them into future search interfaces. The Internet is a medium for education, entertainment, communication, and personal expression. User behavior has developed three main modalities for using this medium effectively (searching, browsing, and monitoring), which are supported to different degrees by conventional tools. Understanding the nature of the interaction allows us to design and implement a system, called Mitsukeru, to support browsing behaviors while retaining free-form movements between other interaction styles. The system uses agent-based modeling and look-ahead to provide informative yet nonintrusive guidance to the user, and is described in detail. The search tools familiar from the personal computer are propagating to mobile devices. Are users willing to type keywords with the limited keypad of an ordinary mobile phone?
How does mobile search differ from stationary search? The author found that users are surprisingly willing to use search even with the traditional phone keypad, and foresees a search revolution as mobile devices enable location-based search. Almost all Web searches are carried out while the user is sitting at a conventional desktop computer connected to the Internet. Although online, handheld, mobile search offers new possibilities, the fast-paced, focused style of interaction may not be appropriate for all user search needs. The authors explore an alternative, relaxed style for Web searching that asynchronously combines an offline handheld computer and an online desktop personal computer. They discuss the role and utility of such an approach, present a tool to meet these user needs, and discuss its relation to other systems. In contrast to traditional information retrieval systems, which return ranked lists of documents that users must manually browse through, a question answering system attempts to directly answer natural language questions posed by the user. Although such systems possess language-processing capabilities, they still rely on traditional document retrieval techniques to generate an initial candidate set of documents. In this article, the authors argue that document retrieval for question answering represents a task different from retrieving documents in response to more general retrospective information needs. Thus, to guide future system development, specialized question answering test collections must be constructed. They show that the current evaluation resources have major shortcomings; to remedy the situation, they have manually created a small, reusable question answering test collection for research purposes. In this article they describe their methodology for building this test collection and discuss issues they encountered regarding the notion of "answer correctness."
Producers in creative genres are frequently motivated by goals that put them in opposition to popular culture and marketplace pressures. Questions about whether those goals reflect values that belong specifically to print culture, or whether those values will continue to motivate producers in creative genres after the introduction of online technology, have not been answered empirically. Previous studies of genre change have focused on the ability of human actors to use information technology to alter those genres as social structures. However, these studies have focused on generic artifacts rather than on the creative values that motivated the creation of those artifacts. Editors of small literary magazines (generally referred to as little magazines) make ideal subjects for this study. Creative values play an important role in their decisions, and they frequently publish poetry, fiction, and other work that stands in opposition to popular culture and literature. This study proposed and evaluated a conceptual framework for anticipating whether editors of little magazines will use online technologies to reinforce or alter the values characteristic of their genre. The study found that the values posited in the conceptual framework fit the goals expressed by little magazine editors. Not all editors held those values equally, however. These findings suggest that producers in creative genres can use online technology in ways that actually reflect an intensification of those values. The concept of intensifying use of technology (IUT) was posited to explain the differences. This study aimed to develop a model for predicting the impact of information access using Web searches on human decision making. Models were constructed using a database of search behaviors and decisions of 75 clinicians, who answered questions about eight scenarios within 80 minutes in a controlled setting at a university computer laboratory.
Bayesian models were developed with and without bias factors to account for anchoring, primacy, recency, exposure, and reinforcement decision biases. Prior probabilities were estimated from the population prior, from a personal prior calculated from presearch answers and confidence ratings provided by the participants, from an overall measure of willingness to switch belief before and after searching, and from a willingness to switch belief calculated in each individual scenario. The optimal Bayes model predicted user answers in 73.3% (95% CI: 68.71 to 77.35%) of cases, and incorporated participants' willingness to switch belief before and after searching for each scenario, as well as the decision biases they encounter during the search journey. In most cases, it is possible to predict the impact of a sequence of documents retrieved by a Web search engine on a decision task without reference to the content or structure of the documents, but relying solely on a simple Bayesian model of belief revision. The authors report on an experimental study on the differences between spoken and written queries. A set of written and spontaneous spoken queries are generated by users from written topics. These two sets of queries are compared in qualitative terms and in terms of their retrieval effectiveness. Written and spoken queries are compared in terms of length, duration, and part of speech. In addition, assuming perfect transcription of the spoken queries, written and spoken queries are compared in terms of their aptitude to describe relevant documents. The retrieval effectiveness of spoken and written queries is compared using three different information retrieval models. The results show that using speech to formulate one's information need provides a way to express it more naturally and encourages the formulation of longer queries. Despite that, longer spoken queries do not seem to significantly improve retrieval effectiveness compared with written queries. 
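The "simple Bayesian model of belief revision" in the decision-impact study above can be sketched as sequential odds updates: each retrieved document contributes a likelihood ratio for the answer being correct. The ratios below are hypothetical; the study's full models additionally weight order-related biases such as anchoring, primacy, and recency.

```python
def revise_belief(prior, likelihood_ratios):
    """Sequentially update the probability that an answer is correct as
    each retrieved document is read. Each document contributes a
    likelihood ratio P(doc | correct) / P(doc | incorrect)."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# A clinician starts at 50% confidence; two supporting documents, one opposing.
posterior = revise_belief(0.5, [2.0, 2.0, 0.5])
```

Predicting the final answer then reduces to checking whether the posterior crosses the clinician's willingness-to-switch threshold, without reference to document content or structure.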
The authors propose a method for automatically generating Japanese-English bilingual thesauri based on bilingual corpora. The term bilingual thesaurus refers to a set of bilingual equivalent words and their synonyms. Most of the methods proposed so far for extracting bilingual equivalent word clusters from bilingual corpora depend heavily on word frequency and are not effective for dealing with low-frequency clusters. These low-frequency bilingual clusters are worth extracting because they contain many newly coined terms that are in demand but are not listed in existing bilingual thesauri. Assuming that single language-pair-independent methods such as frequency-based ones have reached their limitations and that a language-pair-dependent method used in combination with other methods shows promise, the authors propose the following approach: (a) Extract translation pairs based on transliteration patterns; (b) remove the pairs from among the candidate words; (c) extract translation pairs based on word frequency from the remaining candidate words; and (d) generate bilingual clusters based on the extracted pairs using a graph-theoretic method. The proposed method has been found to be significantly more effective than other methods. The application of thesauri in networked environments is seriously hampered by the challenges of introducing new concepts and terminology into the formal controlled vocabulary, which is critical for enhancing its retrieval capability. The author describes an automated process of adding new terms to thesauri as entry vocabulary by analyzing the association between words/phrases extracted from bibliographic titles and subject descriptors in the metadata record (subject descriptors are terms assigned from controlled vocabularies of thesauri to describe the subjects of the objects [e.g., books, articles] represented by the metadata records). 
The investigated approach uses a corpus of metadata for scientific and technical (S&T) publications in which the titles contain substantive words for key topics. The three steps of the method are (a) extracting words and phrases from the title field of the metadata; (b) applying a method to identify and select the specific and meaningful keywords based on the associated controlled vocabulary terms from the thesaurus used to catalog the objects; and (c) inserting selected keywords into the thesaurus as new terms (most of them are in hierarchical relationships with the existing concepts), thereby updating the thesaurus with new terminology that is being used in the literature. The effectiveness of the method was demonstrated by an experiment with the Chinese Classification Thesaurus (CCT) and bibliographic data in China Machine-Readable Cataloging Record (MARC) format (CNMARC) provided by Peking University Library. This approach is equally effective in large-scale collections and in other languages. As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retrieve search results consisting of sets of duplicate documents, whether identical duplicates or close variants. The goal of this work is to facilitate (a) investigations into the phenomenon of near duplicates and (b) algorithmic approaches to minimizing its deleterious effect on search results. Harnessing the expertise of both client-users and professional searchers, we establish principled methods to generate a test collection for identifying and handling nonidentical duplicate documents. We subsequently examine a flexible method of characterizing and comparing documents to permit the identification of near duplicates. This method has produced promising results following an extensive evaluation using a production-based test collection created by domain experts. 
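One standard way to characterize documents for near-duplicate detection, offered here only as an illustrative sketch and not necessarily the method evaluated in the study above, is word shingling compared with Jaccard similarity:

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

d1 = "the quick brown fox jumps over the lazy dog"
d2 = "the quick brown fox leaps over the lazy dog"
sim = jaccard(shingles(d1), shingles(d2))
print(round(sim, 2))  # prints 0.4
```

Pairs scoring above a tuned threshold (e.g. 0.3) would be flagged as near duplicates; the threshold itself is exactly what a production-based test collection like the one described can help calibrate.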
Advances in search technology have meant that search systems can now offer assistance to users beyond simply retrieving a set of documents. For example, search systems are now capable of inferring user interests by observing their interaction, offering suggestions about what terms could be used in a query, or reorganizing search results to make exploration of retrieved material more effective. When providing new search functionality, system designers must decide how the new functionality should be offered to users. One major choice is between (a) offering automatic features that require little human input, but give little human control; or (b) interactive features, which allow human control over how the feature is used, but often give little guidance over how the feature should be best used. This article presents a study in which we empirically investigate the issue of control by presenting an experiment in which participants were asked to interact with three experimental systems that varied the degree of control they had in creating queries, indicating which results were relevant, and making search decisions. We use our findings to discuss why and how the control users want over search decisions can vary depending on the nature of the decisions and the impact of those decisions on the user's search. The scientific network of surfactants and related subjects has been analyzed with the CoPalRed© knowledge system. The actors studied have been countries, research centers and laboratories, researchers, and journals. The thematic map of the major research areas has been established. Most of the research areas, and those that have the greatest representation in terms of number of documents, are related to physics and chemistry. However, biochemistry and cell biology, medicine (pediatrics and pulmonary physiology), and,
to a lesser extent, veterinary medicine and food science and technology are also noteworthy in the field of surfactants, which presents a markedly multidisciplinary profile. How does an information user perceive a document as relevant? The literature on relevance has identified numerous factors affecting such a judgment. Taking a cognitive approach, this study focuses on the criteria users employ in making relevance judgments beyond topicality. On the basis of Grice's theory of communication, we propose a five-factor model of relevance: topicality, novelty, reliability, understandability, and scope. Data are collected from a semicontrolled survey and analyzed by following a psychometric procedure. Topicality and novelty are found to be the two essential relevance criteria. Understandability and reliability are also found to be significant, but scope is not. The theoretical and practical implications of this study are discussed. Using content analysis, this study explored the role of the literature in the diffusion of new information; the influence of the literature on the innovation-decision process; and how the concept of tie strength can contribute to a greater understanding of the role of the literature in information transmission. Diffusion of innovations and strength of weak ties theories provided the framework that informed this research, and an illustrated medical case study, changing practices related to hormone therapy for menopausal women, provided context for the study.
Findings suggest that published literature impacts the innovation-decision process and thus plays an integral role in the diffusion of medical innovation to physicians and consumers; that the view of literature as a bridging "weak tie" in a multifactor communication network allows for a more comprehensive understanding of the role of published literature in information diffusion; and that medical and lay articles are not neutral channels; they function to provide information, reinforce knowledge, and produce and shape meaning. The Liberman-Wolf bonding number cannot be considered an acceptable measure for the internal bonding of a research group or community. This is shown by a construction where adding the same number of articles with the same number of co-authors to two existing groups (with a given number of articles with one or two collaborators) reverses the original order in these groups' bonding numbers. Based on the impact factors of the journals recorded by JCR from 1998 to 2003, this paper establishes a fluctuation model for discipline development. Using the Fluctuation Strength Coefficient, we then analyze and evaluate the developing trends of the disciplines in recent years. We assessed the contribution of Brazilian limnologists (freshwater ecologists) in international journals in the period 1970-2004. Brazilian contribution was low and regular in the 1970s, but increased steeply after 1980 with no signs of stabilization until the present. Articles authored by Brazilians tend to be less cited than articles authored by non-Brazilians, although this difference is reduced in co-authored articles with international researchers. Brazilian articles are not distributed homogeneously among the sub-areas of Limnology, but present some biases that can be explained by intellectual legacy.
Brazil has invested since the 1970s in establishing postgraduate courses in Brazil and in recent years has turned its focus to better qualification of these courses. We believe these are the main reasons for the conspicuous development of Brazilian Limnology. The inter-citation journal group is defined as a group of journals with inter-citation relations. In this paper, according to the 2003 JCR, an inter-citation relation matrix of 10 medical journals is established. Based on the transfer function model of the disturbed citing process, the calculation formula of the journal impact factor disturbed by publication delays of a certain journal in the group is deduced, and the changing process of every journal's impact factor caused by the increase of each journal's average publication delay is simulated. In the inter-citation journal group, when a journal's publication delay increases, the impact factors of all journals decrease, and the rankings of journals according to the impact factor may change. The closer the citation relation between two journals, the stronger their interaction and the larger the decrease in their impact factors caused by the increase of their publication delays. In this paper we analyze the objectivity of the peer review process of research performance by research groups in the scientific and technological Valencian system, over the period 1998-2002. For that purpose, we use qualitative and quantitative indicators to assess which of them are the most important in determining whether a research group is an excellent one, based on peer review evaluation methodology. The results show that excellence appears to be driven only by publications in SCI/SSCI and the number of sexenios, and suggest that the peer review process is not as objective as we expected. Co-words have been considered as carriers of meaning across different domains in studies of science, technology, and society.
Words and co-words, however, obtain meaning in sentences, and sentences obtain meaning in their contexts of use. At the science/society interface, words can be expected to have different meanings: the codes of communication that provide meaning to words differ on the varying sides of the interface. Furthermore, meanings and interfaces may change over time. Given this structuring of meaning across interfaces and over time, we distinguish between metaphors and diaphors as reflexive mechanisms that facilitate the translation between contexts. Our empirical focus is on three recent scientific controversies: Monarch butterflies, Frankenfoods, and stem-cell therapies. This study explores new avenues that relate the study of co-word analysis in context with the sociological quest for the analysis and processing of meaning. The objective of the present study is twofold: (1) to show the aims and means of quantitative interpretation of bibliographic features in bibliometrics and their re-interpretation in research policy, and (2) to summarise the state of the art in self-citation research. The authors describe three approaches to the role of author self-citations and possible conflicts arising from the different perspectives. From the bibliometric viewpoint we can conclude that there is no reason for condemning self-citations in general or for removing them from macro or meso statistics; supplementary indicators based on self-citations are, nonetheless, useful to understand communication patterns. In order to easily see the citation patterns of a journal or subject area it is very useful to use a graphical diagram to visualize all the connections between journals. Using data derived from the Journal Citation Reports, this study investigates the visualization of citation patterns for three Canadian journals in three different subject areas: library and information science, psychology and mathematics.
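A graphical diagram of journal inter-citation connections, as described in the last abstract above, can be generated directly from a table of citing counts. The journals and counts below are hypothetical stand-ins (not JCR data), and Graphviz DOT output stands in for whatever drawing tool is actually used:

```python
# Hypothetical citing counts between journals (source cites target).
cites = {
    ("J.Info.Sci", "J.Doc"): 120,
    ("J.Doc", "J.Info.Sci"): 95,
    ("Psych.Rev", "Psych.Bull"): 310,
    ("Can.J.Math", "J.Algebra"): 40,
    ("J.Info.Sci", "Psych.Bull"): 5,
}

def to_dot(cites, min_count=20):
    """Emit a Graphviz DOT description, keeping only the stronger links."""
    lines = ["digraph citations {"]
    for (src, dst), n in sorted(cites.items()):
        if n >= min_count:
            lines.append(f'  "{src}" -> "{dst}" [label={n}];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(cites))
```

Thresholding on `min_count` is one simple way to keep such diagrams readable; feeding the output to `dot -Tpng` renders the picture.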
Motivations for the creation of hyperlinks to business sites were analyzed through a content analysis approach. Links to 280 North American IT companies (71 Canadian companies and 209 U.S. companies) were searched through Yahoo!. Then a random sample of 808 links was taken from the links retrieved. The content as well as the context of each link was manually examined to determine why the link was created. The country location and the type of the site where the link came from were also identified. The study found that most links were created for business purposes confirming findings from early quantitative studies that links contain useful business information. Links to competitors were extremely rare but competitors were often co-linked, suggesting that co-link analysis is the direction to pursue for information on competitive intelligence. Many informetric data types lend themselves to ready adaptation to relational DBMS environments for storage and processing. SQL, the standard language used for constructing and querying relational databases, provides useful tools for processing informetric data. The author demonstrates the applications and some limitations of SQL for efficient organization and tabulation of raw informetric data. Journal articles constitute the core documents for the diffusion of knowledge in the natural sciences. It has been argued that the same is not true for the social sciences and humanities where knowledge is more often disseminated in monographs that are not indexed in the journal-based databases used for bibliometric analysis. Previous studies have made only partial assessments of the role played by both serials and other types of literature. The importance of journal literature in the various scientific fields has therefore not been systematically characterized. 
The authors address this issue by providing a systematic measurement of the role played by journal literature in the building of knowledge in both the natural sciences and engineering and the social sciences and humanities. Using citation data from the CD-ROM versions of the Science Citation Index (SCI), Social Science Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) databases from 1981 to 2000 (Thomson ISI, Philadelphia, PA), the authors quantify the share of citations to both serials and other types of literature. Variations in time and between fields are also analyzed. The results show that journal literature is increasingly important in the natural and social sciences, but that its role in the humanities is stagnant and has even tended to diminish slightly in the 1990s. Journal literature accounts for less than 50% of the citations in several disciplines of the social sciences and humanities; hence, special care should be used when using bibliometric indicators that rely only on journal literature. In this article we propose a distance-based classifier for categorizing Arabic text. Each category is represented as a vector of words in an m-dimensional space, and documents are classified on the basis of their closeness to feature vectors of categories. The classifier, in its learning phase, scans the set of training documents to extract features of categories that capture inherent category-specific properties; in its testing phase the classifier uses previously determined category-specific features to categorize unclassified documents. Stemming was used to reduce the dimensionality of feature vectors of documents. The accuracy of the classifier was tested by carrying out several categorization tasks on an in-house collected Arabic corpus. The results show that the proposed classifier is very accurate and robust. Scholarly communication in arts and humanities differs from that in the sciences. 
Arts and humanities scholars rely primarily on monographs as a medium of publication whereas scientists consider articles that appear in scholarly journals as the single most important publication outlet. The number of journal citation studies in arts and humanities is therefore limited. In this article, we investigate the bibliometric characteristics of 507 arts and humanities journal articles written by authors affiliated with Turkish institutions and indexed in the Arts & Humanities Citation Index (A&HCI) between 1975 and 2003. Journal articles constituted more than 60% of all publications. One third of all contributions were published during the last 4 years (1999-2003) and appeared in 16 different journals. An overwhelming majority of contributions (91%) were written in English, and 83% of them had single authorship. Researchers based at Turkish universities produced 90% of all publications. Two thirds of references in publications were to monographs. The median age of all references was 12 years. Eighty percent of publications authored by Turkish arts and humanities scholars were not cited at all, while the remaining 20% (or 99 publications) were cited 304 times (an average of three citations per publication). The self-citation ratio was 31%. Two thirds of the cited publications were cited for the first time within 2 years of their publication. The author reports findings from experiments with the International Federation of Library Associations and Institutions' (IFLA) Functional Requirements for Bibliographic Records (FRBR) as applied to the domain of science fiction, Edwin A. Abbott's Flatland: A Romance of Many Dimensions, in the Online Computer Library Center's (OCLC) WorldCat.
The goal of the study is to gauge the characteristics of bibliographic entities under study, to examine the types of relationships these entities exhibit, and to collocate bibliographic entities according to the FRBR Group 1 hierarchy of entities identified as works, expressions, manifestations, and items. The study's findings show that by assembling bibliographic records into interrelated clusters and displaying these according to the FRBR entity-relationship model, a new navigational capability in networked digital libraries can be developed. Fundamental forms of information, as well as the term information itself, are defined and developed for the purposes of information science/studies. Concepts of natural and represented information (taking an unconventional sense of representation), encoded and embodied information, as well as experienced, enacted, expressed, embedded, recorded, and trace information are elaborated. The utility of these terms for the discipline is illustrated with examples from the study of information-seeking behavior and of information genres. Distinctions between the information and curatorial sciences with respect to their social (and informational) objects of study are briefly outlined. Synchronous chat reference services have emerged as viable alternatives to the traditional face-to-face (FtF) library reference encounter. Research in virtual reference service (VRS) and client-librarian behavior is just beginning, with a primary focus on task issues of accuracy and efficiency. This study is among the first to apply communication theory to an exploration of relational (socioemotional) aspects of VRS. It reports results from a pilot study that analyzed 44 transcripts nominated for the LSSI Samuel Swett Green Award (Library Systems and Services, Germantown, MD) for Exemplary Virtual Reference, followed by an analysis of 245 randomly selected anonymous transcripts from the Maryland AskUsNow! statewide chat reference service.
Transcripts underwent in-depth qualitative content analysis. Results revealed that interpersonal skills important to FtF reference success are present (although modified) in VRS. These include techniques for rapport building, compensation for lack of nonverbal cues, strategies for relationship development, evidence of deference and respect, face-saving tactics, greeting and closing rituals. Results also identified interpersonal communication dynamics present in the chat reference environment, differences in client versus librarian patterns, and compensation strategies for lack of nonverbal communication. The use of citation counts to assess the impact of research articles is well established. However, the citation impact of an article can only be measured several years after it has been published. As research articles are increasingly accessed through the Web, the number of times an article is downloaded can be instantly recorded and counted. One would expect the number of times an article is read to be related both to the number of times it is cited and to how old the article is. The authors analyze how short-term Web usage impact predicts medium-term citation impact. The physics e-print archive - arXiv.org - is used to test this. The authors assess the number of coauthors in articles published by authors affiliated with domestic (Croatian) and foreign (non-Croatian) institutions in the Croatian Medical Journal (CMJ) and investigate the increase in the number of coauthors after inclusion of the journal in the Current Contents (CC) bibliographic database (Thomson ISI, Philadelphia, PA) in 1999. They analyzed 761 articles published in the CMJ between 1992 and 2003, and determined the average number of authors per article, authors' country of origin, and the gross domestic product (GDP) per capita of the authors' country. 
The average number of authors in articles written by authors affiliated with domestic institutions was significantly larger in almost all journal sections. The increase in the number of domestic coauthors was more pronounced after inclusion of the journal in the CC database. The number of domestic coauthors published in the Clinical section increased from 4.2 ± 2.1 to 5.1 ± 2.3. There was also an increase in coauthors published in the Public Health section, from 3.1 ± 1.9 to 4.1 ± 1.9. The results of the study imply that authors' adherence to the International Committee of Medical Journal Editors (ICMJE) authorship criteria depends on the size of the scientific community and that adherence is poor among domestic authors publishing in a small, national medical journal outside of mainstream science. An increased number of coauthors in articles published by authors affiliated with domestic institutions does not necessarily imply authorship misconduct, but it suggests involvement of an appreciable number of authors who made few or no substantial contributions to the research. This discounts two main purposes of scientific authorship: to confer credit and denote responsibility for performed research. Interdisciplinary collaboration has become of particular interest as science and social science research increasingly crosses traditional boundaries, raising issues about what kinds of information and knowledge exchange occur, and thus what to support. Research on interdisciplinarity, learning, and knowledge management suggests the benefits of collaboration are achieved when individuals pool knowledge toward a common goal. Yet, it is not sufficient to say that knowledge exchange must take place; instead, we need to ask what kinds of exchanges form the basis of collaboration in these groups.
To explore this, members of three distributed, interdisciplinary teams (one science and two social science teams) were asked what they learned from the five to eight others with whom they worked most closely, and what they thought those others learned from them. Results show the exchange of factual knowledge to be only one of a number of learning exchanges that support the team. Important exchanges also include learning the process of doing something, learning about methods, engaging jointly in research, learning about technology, generating new ideas, socialization into the profession, accessing a network of contacts, and administrative work. Distributions of these relations show that there is more sharing of similar than different kinds of knowledge, suggesting that knowledge may flow across disciplinary boundaries along lines of practice. Business opportunities created by the Internet economy and new business methods have triggered the development of the electronic or e-marketplace, and vice versa. To generate competitive new products/services, Internet firms need to have access to detailed technological innovations to compete. Despite the wealth of literature on e-marketplaces, research on patent analyses in e-marketplaces is scarce. The patent is a crucial indicator of the technological competitiveness of a company or a nation. This study provides a preliminary step in depicting a holistic picture of technological innovations associated with e-marketplace patents. This study analyzes patents issued during the period 1990-2002 from major databases; hence, it provides the first empirical study to examine e-marketplaces and related innovations holistically. A comprehensive set of statistical patent analyses and a discussion of e-marketplaces' technology by means of a patent map analysis are presented.
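At its simplest, a patent map of the kind mentioned in the abstract above is a cross-tabulation of patent counts by period and technology class. The records and class names below are invented purely for illustration:

```python
from collections import Counter

# Hypothetical e-marketplace patent records: (grant year, technology class).
patents = [
    (1996, "auction"), (1998, "auction"), (1999, "payment"),
    (2000, "payment"), (2000, "auction"), (2001, "recommendation"),
    (2002, "payment"), (2002, "payment"),
]

def patent_map(patents, period=5):
    """Counts per (period start year, technology class) cell."""
    cells = Counter(((year // period) * period, cls) for year, cls in patents)
    return dict(cells)

print(patent_map(patents))
```

Real patent-map analyses add many more dimensions (assignee, country, citation links), but the cell-count structure is the same.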
Adapting the consumer behavior selectivity model to the Web environment, this paper's key contribution is the introduction of a self-concept orientation model of Web information seeking. This model, which addresses gender, effort, and information content factors, questions the commonly assumed equivalence of sex and gender by specifying the measurement of gender-related self-concept traits known as self- and other-orientation. Regression analyses identified associations between self-orientation, other-orientation, and self-reported search frequencies for content with identical subject domain (e.g., medical information, government information) and differing relevance (i.e., important to the individual personally versus important to someone close to him or her). Self- and other-orientation interacted such that when individuals were highly self-oriented, their frequency of search for both self- and other-relevant information depended on their level of other-orientation. Specifically, high-self/high-other individuals, with a comprehensive processing strategy, searched most often, whereas high-self/low-other respondents, with an effort minimization strategy, reported the lowest search frequencies. This interaction pattern was even more pronounced for other-relevant information seeking. We found no sex differences in search frequency for either self-relevant or other-relevant information. The conceptual issues of information use are discussed by reviewing the major ideas of sense-making methodology developed by Brenda Dervin. Sense-making methodology approaches the phenomena of information use by drawing on the metaphor of gap-bridging. The nature of this metaphor is explored by utilizing the ideas of metaphor analysis suggested by Lakoff and Johnson. First, the source domain of the metaphor is characterized by utilizing the graphical illustrations of sense-making metaphors. 
Second, the target domain of the metaphor is analyzed by scrutinizing Dervin's key writings on information seeking and use. The metaphor of gap-bridging does not suggest a substantive conception of information use; the metaphor gives methodological and heuristic guidance to posit contextual questions as to how people interpret information to make sense of it. Specifically, these questions focus on the ways in which cognitive, affective, and other elements useful for the sense-making process are constructed and shaped to bridge the gap. Ultimately, the key question of information use studies is how people design information in context. There are at least three possible ways that documents are distributed by relevance: informetric (power law), inverse logistic, and dichotomous. The nature of the type of distribution has implications for the construction of relevance ranking algorithms for search engines, for automated (blind) relevance feedback, for user behavior when using Web search engines, for combining the outputs of search engines for metasearch, for topic detection and tracking, and for the methodology of evaluation of information retrieval systems. Scientific activity has been increasing in Puerto Rico in recent years, a development mirrored not only by the number of papers published, but also by the international links established for scientific co-operation. The purpose of the present study is to identify and discuss the patterns of such cooperation, along with the trends in scientific research conducted in that context at Puerto Rican institutions. The methodology includes an analysis of the main areas of research addressed, defined as the area of specialization of the journals publishing papers indexed in the Science Citation Index (CD-ROM version) from 1980 to 1999. A total of 7,271 studies, appearing in 1,240 scientific journals, were selected to study the co-operation established between Puerto Rican institutions and organizations in other countries.
The findings showed a high rate of international co-operation: 46.07% of the papers published were co-authored by researchers from other countries. The country accounting for the highest percentage of joint research was the USA, followed by Germany, the United Kingdom, Canada and Italy. The close relationship between the Puerto Rican and US scientific systems is not unusual, inasmuch as the economic and sociopolitical bonds between them play an essential role in Puerto Rican scientific activity. The results also revealed substantial differences between the 1980s and the 1990s in terms of the nature of the links established, as well as growing internationalization of scientific research conducted on the island over the twenty-year period studied. Many authors have written about how exotic ants invaded the Atlantic islands of Madeira and negatively impacted or even completely exterminated its native ants, despite the lack of first-hand observations concerning such impact. I examine how quotation error (misrepresentation of previous work) and citation copying (citing unexamined publications referred to by others) led to the origin and spread of the erroneous story of ant extinctions in Madeira. Quotation error and citation copying may be more common than most scientists realize, particularly when authors cite references that are written in languages they do not understand. Gender inequalities are prevalent in science despite many initiatives to try to eradicate them. Given the deep-rooted and complex nature of these inequalities, there is a continuing need for research into their causes and manifestations. This study analyses one aspect of web communication, hyperlinks, to explore whether they are a potential source of insights into gender differences in this important scientific communication medium. A study of links to life sciences research groups in nine European countries found little evidence of gender differences, except in Germany.
As a consequence, it is argued that hyperlinks are not a promising source of quantitative information about gender differences in communication strategies or online visibility, at least for senior researchers or research groups. A bibliometric analysis of the literature covering a one-year period (2003) was performed to evaluate the number of scientific publications on sleep and their distribution among the European Union countries. A total of 912 articles appearing in Life Sciences and Clinical Medicine journals indexed in the Institute for Scientific Information databases were downloaded. These articles were authored by EU researchers; Germany, the United Kingdom, France, and Italy rank at the top of the EU countries. The output distribution of the most productive EU countries is also presented and discussed. Despite the limitations of the methods used, the present results give an interesting snapshot of EU publishing behavior in sleep research. The scientific community organises its relationships into network patterns, where the nodes are individuals (scientists) and the links are acquaintance and common work, usually presented at workshops and conferences and/or published in books and scientific journals. A review of references on Population Studies by Italian scientists is delivered every two years by the Demography Section of the Italian Statistical Society; the review is exhaustive for academic demographers. In this paper, the properties of the demographers' network in 1998-1999 are evaluated, with the aim of identifying factors which may influence collaborative relations among actors. The probability of cooperation between couples (dyads) of demographers is modelled, conditionally on observed characteristics of the dyad (sex, academic position, university affiliation). Main results suggest that "closeness", defined in a wider sense and not simply as geographical proximity, plays a major role in determining actors' relationships.
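The dyadic cooperation model described in the demographers study above can be sketched as a logistic regression on dyad characteristics. The coefficients below are hypothetical, chosen only to illustrate how "closeness" attributes raise the estimated cooperation probability:

```python
import math

def cooperation_probability(same_university, same_rank, same_sex,
                            b0=-2.0, b1=1.5, b2=0.5, b3=0.2):
    """Logistic model of P(dyad collaborates); coefficients are hypothetical.

    Each predictor is a 0/1 indicator of a shared dyad characteristic.
    """
    z = b0 + b1 * same_university + b2 * same_rank + b3 * same_sex
    return 1 / (1 + math.exp(-z))

# A "close" dyad (same affiliation and academic position) vs. a distant one
p_close = cooperation_probability(1, 1, 0)
p_far = cooperation_probability(0, 0, 0)
print(round(p_close, 2), round(p_far, 2))  # prints 0.5 0.12
```

In the actual study the coefficients would be estimated from the observed network; here they only demonstrate the functional form.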
Publications have been regarded as the most significant output indicating the research performance of universities. This paper uses the ISI Essential Science Indicators (ESI) database to investigate the academic performance of research-oriented universities in Taiwan, adopting the bibliometric method from both quantitative and qualitative perspectives. The data cover a time span of 11 years, from 1993 to 2003. The performance indicators applied in this study include the number of papers, the number of citations, the average citations per paper, the number of highly cited papers, the number of hot papers, and the number of top papers. The research performance and the strengths of those universities are revealed in this study, and it is found that National Taiwan University leads among these universities, though each university still shows strengths in various specific fields. Patent citations are extensively used as a measure of patent quality. However, counting citations does not account for the fact that citations come from patents of different qualities, and that citations are of variable qualities. We develop a citation index which takes into account the cumulative quality of the citing patents. We apply this index to the 2,139,314 utility patents granted in the U.S. between 1975 and 1999. We study the properties of this index by year and by technological category, and analyse the links between patents. Chemistry is accepted as the central science since it encompasses the great divide between Physics and Biology, with linkages to many other disciplines. But the recent emergence of other interdisciplinary sciences like biomedicine, molecular biology, and biotechnology is overshadowing chemical research. Still, one subfield of chemistry, Synthetic Organic Chemistry (SOC), has retained its importance, as it is part of new drug discovery and the basis of the bulk of the chemical industry.
Scientometric evaluation of the world's research output in Synthetic Organic Chemistry has been quantified for two periods, 1989-1993 and 1998-2003. The global trends in publication output are mapped and a cross-country comparison of the relative activity in the subspecialty is examined. The Activity Index trend reveals that though quantitatively the USA, Japan and European nations produce more publications, their Activity Index recorded a declining trend, leading to the conclusion that these nations are shifting their interest towards other emerging specialties. Asian countries, having recorded a linear increase in the Activity Index, show that synthetic organic chemistry is still their priority. Bioinformatics is a multidisciplinary and comparatively new area of science that has made a significant impact within a short period. A systematic analysis of the rise in bioinformatics literature is, however, not available. This study analyses the growth of the scientific literature in this area as available from NCBI PubMed using standard bibliometric techniques. Bradford's law of scattering was used to identify core journals, and Lotka's law was employed to analyze authors' productivity patterns. The study also explored publication type, language, and country of publication. Twenty core journals were identified and the primary mode of dissemination of information was through journal articles. Authors with a single publication were more predominant (73.58%), contrary to the prediction of Lotka's law. The study provides useful information to scientists wishing to undertake work in this area. In this paper we present characteristics of the statistical correlation between the Hirsch (h-) index and several standard bibliometric indicators, as well as with the results of peer review judgment. We use the results of a large evaluation study of 147 university chemistry research groups in the Netherlands covering the work of about 700 senior researchers during the period 1991-2000. 
Thus, we deal with research groups rather than individual scientists, as we consider the research group as the most important work floor unit in research, particularly in the natural sciences. Furthermore, we restrict the citation period to a three-year window instead of 'life time counts' in order to focus on the impact of recent work and thus on current research performance. Results show that the h-index and our bibliometric 'crown indicator' both relate in a quite comparable way with peer judgments. But for smaller groups in fields with 'less heavy citation traffic' the crown indicator appears to be a more appropriate measure of research performance. The discussion about how to treat author self-citations, driven by policy application and quality measurement, has intensified in recent years. The definition introduced by Snyder and Bonzi has, in the absence of any reasonable alternative, been used in bibliometric practice for science policy purposes. This method, however, does not take into account the weight of self-citing authors among the coauthors of both the cited and citing papers. The objective of the present paper is to quantify the weight of self-citations with respect to co-authorship. The analysis is conducted at two levels: at the macro level, namely, for fifteen subject fields and the forty most active countries, and at the meso level, for a set of selected research institutions. The view of documents and/or queries as random variables is gaining importance in the theory of information retrieval. We argue that traditional probabilistic models consider documents and queries as random variables, but that newer models such as language modeling and our unified model take this one step further. The additional step is called error in predictors. Such models consider that we do not observe the document and query random variables that are modeled to predict relevance probabilistically. 
Rather, there are additional random variables, which are the observed documents and queries. We discuss some important implications of this idea for parameter estimation, relevance prediction, and even test-collection construction. By clarifying the positions of various probabilistic models on this question, and presenting in one place many of its implications, this article aims to deepen our common understanding of the theories behind traditional probabilistic models, and to strengthen the theoretical basis for further development of more recent approaches such as language modeling. The authors report on a series of experiments to automate the assessment of document qualities such as depth and objectivity. The primary purpose is to develop a quality-sensitive functionality, orthogonal to relevance, to select documents for an interactive question-answering system. The study consisted of two stages. In the classifier construction stage, nine document qualities deemed important by information professionals were identified and classifiers were developed to predict their values. In the confirmative evaluation stage, the performance of the developed methods was checked using a different document collection. The quality prediction methods worked well in the second stage. The results strongly suggest that the best way to predict document qualities automatically is to construct classifiers on a person-by-person basis. In the first part of this article the author defines the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived, together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). 
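A minimal sketch of the n-overlap vector and of the classical Jaccard index (the n = 2 special case that the generalized measure extends) may make the construction concrete; the toy "library" data below are invented for illustration.

```python
from collections import Counter

def n_overlap_vector(sets):
    """Fraction of distinct objects occurring in exactly 1, 2, ..., n of the sets."""
    n = len(sets)
    counts = Counter()
    for s in sets:
        for obj in s:
            counts[obj] += 1
    total = len(counts)  # distinct objects in the union of all sets
    return [sum(1 for c in counts.values() if c == k) / total for k in range(1, n + 1)]

def jaccard(a, b):
    """Classical Jaccard index: |A intersect B| / |A union B| (the n = 2 case)."""
    return len(a & b) / len(a | b)

# Two toy "libraries" holding books b1..b4
libs = [{"b1", "b2", "b3"}, {"b2", "b3", "b4"}]
vec = n_overlap_vector(libs)  # fractions of books held by exactly 1, exactly 2 libraries
sim = jaccard(*libs)
```

Here b2 and b3 are held by both libraries, b1 and b4 by one each, so half the objects fall in each overlap class; the Lorenz-based similarity theory in the article operates on vectors of exactly this form.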
Next, the distributional form of the n-overlap vector is determined, assuming certain distributions of the object and set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied, as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams). The study examined Web co-links to Canadian university Web sites. Multidimensional scaling (MDS) was used to analyze and visualize co-link data as was done in co-citation analysis. Co-link data were collected in ways that would reflect three different views: the global view, the French Canada view, and the English Canada view. Mapping results of the three data sets accurately reflected the ways Canadians see the universities and clearly showed the linguistic and cultural differences within Canadian society. This shows that Web co-linking is not a random phenomenon and that co-link data contain useful information for Web data mining. It is proposed that the method developed in the study can be applied to other contexts such as analyzing relationships of different organizations or countries. This kind of research is promising because of the dynamics and the diversity of the Web. Reading and writing book reviews for learned journals plays an important part in academic life, but little is known about how academics carry out these tasks. The aim of this research was to explore these activities with academics from the arts and humanities, the social sciences, and the natural sciences. 
An electronic questionnaire was used to ascertain (a) how often the respondents read and wrote book reviews, (b) how useful they found them, and (c) what features they thought important in book reviews. Fifty-two academics in the arts, 53 in the social sciences, and 51 in the sciences replied. There were few disciplinary differences. Most respondents reported reading between one and five book reviews a month and writing between one and two a year. There was high overall agreement between what the respondents thought were important features of book reviews, but there were also wide individual differences between them. This agreement across the disciplines supports the notion that book reviews can be seen as an academic genre with measurable features. This has implications for how they are written, and how authors might be taught to write them better. A potential checklist for authors is suggested. The authors report the findings of a study that analyzes and compares the query logs of PsycINFO for psychology and the two history databases of ABC-Clio, Historical Abstracts and America: History and Life, to establish the sociological nature of information need, searching, and seeking in history versus psychology. Two problems are addressed: (a) what level of query log analysis (by individual query terms, by co-occurrence of word pairs, or by multiword terms (MWTs)) best serves as data for categorizing the queries to these two subject-bound databases; and (b) how the differences in the nature of the queries to history versus psychology databases can aid in our understanding of user search behavior and the information needs of their respective users. The authors conclude that MWTs provide the most effective snapshot of user searching behavior for query categorization. 
The MWTs to ABC-Clio indicate specific instances of historical events, people, and regions, whereas the MWTs to PsycINFO indicate concepts roughly equivalent to descriptors used by PsycINFO's own classification scheme. The average length of queries is 3.16 terms for PsycINFO and 3.42 for ABC-Clio, which breaks from findings for other reference and scholarly search engine studies, bringing query length closer in line with findings for general Web search engines like Excite. One of the main factors in all aviation accidents is human error. Therefore, the National Aeronautics and Space Administration (NASA) Aviation Safety Program (AvSP) has identified several human factors safety technologies to address this problem. Some technologies directly address human error either by attempting to reduce the occurrence of errors or by mitigating the negative consequences of errors. However, new technologies and system changes may also introduce new error opportunities or even induce different types of errors. Consequently, a thorough understanding of the relationship between error classes and technology "fixes" is crucial for the evaluation of intervention strategies outlined in the AvSP, so that resources can be effectively directed to maximize the benefit to flight safety. This article summarizes efforts to map intervention technologies onto error categories and describes the creation of a conceptual framework, the identification of applicable taxonomies for each dimension of the framework, and the construction of a usable prototype database. The framework consists of a three-dimensional matrix with axes for the human operator, the task, and the environment. Human errors and technologies cohabit cells in the matrix linking them. The database allows for taxonomic development in all three areas pertaining to human performance by keeping the taxonomies dynamic. 
It has been 61 years since the 1945 Memex article, and so much has changed since then that we might well wonder whether the article is still worth looking at. It certainly inspired some of the leading figures in information technology, but now it seems to be cited either for things it did not really say, or because everything it proposed has been pretty much accomplished, albeit with alternate technology. If we take another look at the Memex description, though, there are a few key ideas that can still be goals in terms of an easy-to-use personal collection that is a supplement to one's own memory. Perhaps in today's terms, the device would be a combination of the iPod design and a tablet computer. As such, it could function as a handy information pod, with certain Memex features, serving as an extended personal memory. The design of a publisher's electronic interface can have a measurable effect on electronic journal usage statistics. A study of journal usage from six COUNTER-compliant publishers at 32 research institutions in the United States, the United Kingdom, and Sweden indicates that the ratio of PDF to HTML views is not consistent across publisher interfaces, even after controlling for differences in publisher content. The number of full-text downloads may be artificially inflated when publishers require users to view HTML versions before accessing PDF versions or when linking mechanisms, such as CrossRef, direct users to the full text rather than the abstract of each article. These results suggest that usage reports from COUNTER-compliant publishers are not directly comparable in their current form. One solution may be to modify publisher numbers with "adjustment factors" deemed to be representative of the benefit or disadvantage due to each interface. Standardization of some interface and linking protocols may obviate these differences and allow for more accurate cross-publisher comparisons. 
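The "adjustment factor" idea can be sketched as follows. The usage numbers, the baseline ratio, and the adjustment formula are all hypothetical, since the abstract does not specify how such a factor would be computed; this only illustrates how an interface-dependent PDF:HTML ratio could be used to discount inflated full-text counts.

```python
# Hypothetical monthly usage counts per publisher (numbers invented)
usage = {
    "PublisherA": {"html": 8000, "pdf": 4000},  # interface forces HTML view before PDF
    "PublisherB": {"html": 3000, "pdf": 6000},  # interface links directly to PDF
}

def pdf_html_ratio(u):
    """Ratio of PDF views to HTML views for one publisher."""
    return u["pdf"] / u["html"]

def adjusted_downloads(u, baseline_ratio):
    """Scale raw full-text counts by a crude adjustment factor: publishers whose
    interface inflates HTML views are discounted toward a baseline PDF:HTML ratio.
    Illustrative only; this is not a COUNTER-defined calculation."""
    raw = u["html"] + u["pdf"]
    factor = min(1.0, pdf_html_ratio(u) / baseline_ratio)
    return raw * factor
```

Under this toy scheme PublisherA's 12,000 raw views are discounted because its low PDF:HTML ratio suggests interface-driven inflation, while PublisherB's counts pass through unchanged.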
Recording evidence for data values, in addition to the values themselves, in bibliographic records and descriptive metadata has been proposed in a previous study. Recorded evidence indicates why and how data values are recorded for elements. As a continuation of that study, this article first proposes a scenario in which a cataloger and a system interact with each other in recording evidence in bibliographic records for books, with the aim of minimizing the cost and effort of recording evidence. Second, it reports on prototype system development in accordance with the scenario. The system (1) searches for a string, corresponding to the data value entered by a cataloger or extracted from the Machine Readable Cataloging (MARC) record, within the scanned and optical character recognition (OCR)-converted title page and verso of the title page of an item being cataloged; (2) identifies the place where the string appears within the source of information; (3) identifies the procedure being used to form the value entered or recorded; and finally (4) displays the place and procedure identified for the data value as its candidate evidence. Third, this study reports on an experiment conducted to examine the system's performance. The results of the experiment show the usefulness of the system and the validity of the proposed scenario. Research patterns could enhance understanding of the Information Systems (IS) field. Citation analysis is the methodology commonly used to determine such research patterns. In this study, the citation methodology is applied to one of the top-ranked Information Systems conferences, the International Conference on Information Systems (ICIS). Information is extracted from papers in the proceedings of ICIS 2000 to 2002. A total of 145 base articles and 4,226 citations are used. Research patterns are obtained using total citations, citations per journal or conference, and overlapping citations. We then provide the citation ranking of journals and conferences. 
We also examine the difference between the citation ranking in this study and the ranking of IS journals and IS conferences in other studies. Based on the comparison, we confirm that IS research is a multidisciplinary research area. We also identify the most cited papers and authors in the IS research area, and the organizations most active in producing papers in the top-rated IS conference. We discuss the findings and implications of the study. The authors apply a new bibliometric measure, the h-index (Hirsch, 2005), to the literature of information science. Faculty rankings based on raw citation counts are compared with those based on h-counts. There is a strong positive correlation between the two sets of rankings. It is shown how the h-index can be used to express the broad impact of a scholar's research output over time in more nuanced fashion than straight citation counts. The goal of this paper is to examine the impact of linguistic coverage of databases used by bibliometricians on the capacity to effectively benchmark the work of researchers in social sciences and humanities. We examine the strong link between bibliometrics and the Thomson Scientific's database and review the differences in the production and diffusion of knowledge in the social sciences and humanities (SSH) and the natural sciences and engineering (NSE). This leads to a re-examination of the debate on the coverage of these databases, more specifically in the SSH. The methods section explains how we have compared the coverage of Thomson Scientific databases in the NSE and SSH to the Ulrich extensive database of journals. Our results show that there is a 20 to 25% overrepresentation of English-language journals in Thomson Scientific's databases compared to the list of journals presented in Ulrich. This paper concludes that because of this bias, Thomson Scientific databases cannot be used in isolation to benchmark the output of countries in the SSH. 
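For reference, the h-index applied to the information science literature above has a simple operational definition: a scholar has index h if h of their papers have received at least h citations each. A minimal sketch:

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    citations: iterable of per-paper citation counts (any order).
    """
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank  # the paper at this rank still has >= rank citations
        else:
            break
    return h
```

For example, a faculty member with per-paper counts [10, 8, 5, 4, 3] has h = 4: four papers with at least four citations each, but not five with at least five. This is why the h-index expresses broad, sustained impact more robustly than a raw citation total, which a single highly cited paper can dominate.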
The present study investigated the relationship between the use of different internet applications and research productivity, controlling for other influences on the latter. The control variables included dummies for the country, discipline, gender and type of organization of the respondent, as well as variables for age, recognition, the degree of society-related and career-related motivation for research, and the size of the collaboration network. Simple variance analyses and more complex negative binomial hurdle models point to a positive relationship between internet use (for personal communication, information retrieval and information dissemination) and research productivity. However, the results should be interpreted with caution, as it was not possible to test the role of the internet against other pre-internet tools which fulfil the same functions. For instance, it may not be the use of e-mail per se, but the degree of communication with colleagues, that makes a productive scientist. A high level of citation to an author's work is, in general, a testimony to the fact that the author's work has been noted and used by his peers. High citation is seen to be correlated with other forms of recognition and rewards, and is a key indicator of research performance, among other bibliometric indicators. The Institute for Scientific Information (ISI) defines a 'highly cited researcher' (HCR) as one of the 250 most cited authors of journal papers in any discipline. Citation data for 20 years (1981-1999) are used to calculate the share of HCRs for countries in 21 subject areas. We find that the US dominates in all subject areas (US share roughly 40-90%). Based on the number of highly cited researchers in a country, an index of citation excellence is proposed. 
We find that the rank order of countries based on this index is in conformity with our general understanding of research excellence, whereas the more frequently used indicator, citations per paper, gave an unacceptable rank order due to an inherent bias toward very small countries. Additionally, a high value of the index of citation excellence was found to be associated with a higher concentration of highly cited researchers in affiliating organizations. Mapping of science and technology can be done at different levels of aggregation, using a variety of methods. In this paper, we propose a method in which title words are used as indicators of the content of a research topic, and cited references are used as the context in which words get their meaning. Research topics are represented by sets of papers that are similar in terms of these word-reference combinations. In this way we use words without neglecting differences and changes in their meanings. The method has several advantages, such as high coverage of publications. As an illustration we apply the method to produce knowledge maps of information science. Combining webometric and social network analytic approaches, this study developed a methodology to sample and identify Web links, pages, and sites that function as small-world connectors affecting short link distances along link paths between different topical domains in an academic Web space. The data set comprised 7669 subsites harvested from 109 UK universities. A novel corona-shaped Web graph model revealed reachability structures among the investigated subsites. Shortest link path nets functioned as investigable small-world link structures ('mini small worlds') generated by deliberate juxtaposition of topically dissimilar subsites. Indicative findings suggest that personal Web page authors and computer science subsites may be important small-world connectors across sites and topics in an academic Web space. 
Such connectors may counteract balkanization of the Web into insularities of disconnected and unreachable subpopulations. This paper reports the results of a large-scale data analysis that aims to identify the production, diffusion, and consumption of scholarly knowledge among top research institutions in the United States. A 20-year publication data set was analyzed to identify the 500 most cited research institutions and spatio-temporal changes in their inter-citation patterns. A novel approach to analyzing the dual role of institutions as producers and consumers of scholarly knowledge and to studying the diffusion of knowledge among them is introduced. A geographic visualization metaphor is used to visually depict the production and consumption of knowledge. The highest producers and their consumers as well as the highest consumers and their producers are identified and mapped. Surprisingly, the introduction of the Internet does not seem to affect the distance over which scholarly knowledge diffuses as manifested by citation links. The citation linkages between institutions fall off with the distance between them, and there is a strong linear relationship between the log of the citation counts and the log of the distance. The paper concludes with a discussion of these results and future work. We investigated committee peer review for awarding long-term fellowships to post-doctoral researchers as practiced by the Boehringer Ingelheim Fonds (B.I.F.), a foundation for the promotion of basic research in biomedicine. Assessing the validity of selection decisions requires a generally accepted criterion for research impact. A widely used approach is to use citation counts as a proxy for the impact of scientific research. Therefore, a citation analysis was conducted for articles published prior to the applicants' approval or rejection for a B.I.F. fellowship. 
Based on our model estimation (negative binomial regression model), journal articles that had been published by applicants approved for a fellowship award (n = 64) prior to applying for the B.I.F. fellowship can be expected to have 37% (straight counts of citations) and 49% (complete counts of citations) more citations than articles that had been published by rejected applicants (n = 333). Furthermore, comparison with international scientific reference values revealed (a) that articles published by successful and non-successful applicants are cited considerably more often than the "average" publication, and (b) that excellent research performance is to be expected more from successful than from non-successful applicants. The findings confirm that the foundation is not only achieving its goal of selecting the best junior scientists for fellowship awards, but is also successfully attracting highly talented young scientists to apply for B.I.F. fellowships. There is a well-established literature on the use of concentration measures in informetrics. However, these works have usually been devoted to measures of concentration within a productivity distribution. In a pair of recent papers the author introduced two new measures, both based on the Gini ratio, for measuring the similarity of concentration of productivity between two different informetric distributions. The first of these was derived from Dagum's notion of relative economic affluence; the second, in some ways analogous to the correlation coefficient, is completely new. The purpose of this study is to develop a purely empirical approach to comparative studies of concentration between informetric data sets, using both within and between measures, thereby greatly extending the original study, which considered just two data sets to illustrate the methods of calculating the measures. Scientific meetings have become increasingly important channels for scholarly communication. 
In several fields of the applied and engineering sciences they are, according to the statements of scientists active in those fields, even more important than publishing in periodicals. One objective of this study is to analyse the weight of proceedings literature in all fields of the sciences, social sciences and humanities, as well as the use of the ISI Proceedings database as an additional data source for bibliometric studies. The second objective is to explore the use of a further important feature of this database, namely, information about conference location, for the analysis of bibliometrically relevant aspects of information flow, such as the relative attractivity, the extent of mobility, and the unidirectional or mutual affinity of countries. This article describes recent improvements in mapping the world-wide scientific literature. Existing research is extended in three ways. First, a method for generating maps directly from the data on the relationships between hundreds of thousands of documents is presented. Second, quantitative techniques for evaluating these large maps of science are introduced. Third, these techniques are applied to data in order to evaluate eight different maps. The analyses suggest that accuracy can be increased by using a modified cosine measure of relatedness. Disciplinary bias can be significantly reduced and accuracy further increased by using much lower threshold levels. In short, much larger samples of papers can and should be used to generate more accurate maps of science. We define the URL citations of a Web page to be the mentions of its URL in the text of other Web pages, whether hyperlinked or not. The proportions of formal and informal scholarly motivations for creating URL citations to Library and Information Science open access journal articles were identified. 
Five characteristics for each source of URL citations equivalent to formal citations were manually extracted and the relationship between Web and conventional citation counts at the e-journal level was examined. Results of Google searches showed that 282 research articles published in the year 2000 in 15 peer-reviewed LIS open access journals were invoked by 3,045 URL citations. Of these URL citations, 43% were created for formal scholarly reasons equivalent to traditional citations and 18% for informal scholarly reasons. Of the sources of URL citations, 82% were in English, 88% were full text papers and 58% were non-HTML documents. Of the URL citations, 60% were text URLs only and 40% were hyperlinked. About 50% of URL citations were created within one year after the publication of the cited e-article. A slight correlation was found between average numbers of URL citations and average numbers of ISI citations for the journals in 2000. Separating out the citing HTML and non-HTML documents showed that formal scholarly communication trends on the Web were mainly influenced by text URL citations from non-HTML documents. A basic dichotomy is generally made between publication practices in the natural sciences and engineering (NSE) on the one hand and social sciences and humanities (SSH) on the other. However, while researchers in the NSE share some common practices with researchers in SSH, the spectrum of practices is broader in the latter. Drawing on data from the CD-ROM versions of the Science Citation Index, Social Sciences Citation Index and the Arts & Humanities Citation Index from 1980 to 2002, this paper compares collaboration patterns in the SSH to those in the NSE. We show that, contrary to a widely held belief, researchers in the social sciences and the humanities do not form a homogeneous category. In fact, collaborative activities of researchers in the social sciences are more comparable to those of researchers in the NSE than in the humanities. 
Also, we see that language and geographical proximity influence the choice of collaborators in the SSH, but also in the NSE. This empirical analysis, which sheds new light on the collaborative activities of researchers in the NSE compared to those in the SSH, may have policy implications, as granting councils in these fields have a tendency to imitate programs developed for the NSE without always taking into account the specificity of the humanities. The rhythm of science may be compared to the rhythm of music. The R-indicator studied in this article is a complex indicator, trying to reflect part of this rhythm. The R-indicator interweaves publication and citation data over a long period. In this way R-sequences can be used to describe the evolutionary rhythm of science considered in a novel way. As an example the R-sequence of the journal Science from 1945 on is calculated. Policy-makers in many countries emphasize the importance of the non-publication output of university research. Increasingly, policies are pursued that attempt to encourage entrepreneurial activity in universities and public research institutes. Apart from generating spin-out companies, technology licensing, and collaborative research, attention is focused on the patenting activities of researchers. Some analysts suggest that there is a trade-off between scholarly publication and patenting activity. This paper explores this relationship, drawing on a data set of nanoscience publications and nanotechnology patents in three European countries. In particular, this study examines whether researchers who both publish and patent are more productive and more highly cited than their peers who concentrate on scholarly publication in communicating their research results. Furthermore, this study investigates the collaborative activity of inventor-authors and their position in their respective networks of scientific communication. 
The findings suggest that overall there seems to be no adverse relationship between publication and patenting activity, at least not in this area of science and technology. Patenting scientists appear to outperform their solely publishing, non-inventing peers in terms of publication counts and citation frequency. However, while they are considerably over-represented in the top performance class, the data indicates that inventor-authors may not occupy top positions within that group. An analysis of co-authorship links indicates that patenting authors can also play a prominent role within networks of scientific communication. The network maps also point to groups where inventor-authors occur frequently and others where this is not the case, which possibly reflects cognitive differences between sub-fields. Finally, the data indicates that inventor-authors account only for a marginal share of publishing scholars while they play a substantial role amongst inventors. Comparing properties of citing and cited source items opens a wide variety of analytical possibilities. In a study of citations among papers in the journal Scientometrics a number of analytical themes are identified. The analysis shows: the way in which a citation graph can be decomposed into different subparts; country specific citation patterns; the effects of self-citations and domestic citations; the mapping of cited author relationships using direct citation and co-citation links; and time slicing effects on impact ranking of countries and papers. The present study presents a semi-automatic method for parsing and filtering of noun phrases from citation contexts of concept symbols. The purpose of the method is to extract contextual, agreed upon, and pertinent noun phrases, to be used in visualization studies for naming clusters (concept groups) or concept symbols. 
The method is applied in a case study, which forms part of a larger dissertation work concerning the applicability of bibliometric methods for thesaurus construction. The case study is carried out within periodontology, a specialty area of dentistry. The result of the case study indicates that the method is able to identify highly important noun phrases, and that these phrases accurately describe their parent clusters. Hence, the method is able to reduce the labour intensive work of manual citation context analysis, though further refinements are still needed. We explore the possibility of using co-citation clusters over three time periods to track the emergence and growth of research areas, and predict their near term change. Data sets are from three overlapping six-year periods: 1996-2001, 1997-2002 and 1998-2003. The methodologies of co-citation clustering, mapping, and string formation are reviewed, and a measure of cluster currency is defined as the average age of highly cited papers relative to the year span of the data set. An association is found between the currency variable in a prior period and the percentage change in cluster size and citation frequency in the following period. The conflating factor of "single-issue clusters" is discussed and dealt with using a new metric called in-group citation. Based on the findings from earlier studies which showed that links to business Websites contain useful business information, we examined the feasibility of using Web co-link data to compare business competitive positions. We hypothesized that the number of co-links to a pair of business Websites is a measure of the similarity between the two companies. Since similar or related businesses are competing businesses, the co-link data can be used to map business competitive positions. We selected 32 telecommunications companies for the study and collected co-link data to these companies from Yahoo!. 
Multidimensional scaling (MDS) analysis on the co-link data correctly mapped these companies into telecommunications industry sectors. This proved our hypothesis and further confirmed the theory that links to business Websites can be objects for Web data mining. We collected data in a way that would reflect two markets, the global market and the Chinese market. Results from the two data sets revealed the competitive positions of the companies in the two markets. We propose that regular data collection and analysis based on this method can be used to monitor the business competitive environment and trigger early warnings on the change of the competitive landscape. Both quantitative and qualitative evaluation of publications of research teams or institutes requires several scientometric indicators. In this paper a new composite indicator is introduced for the assessment of publications of research institutes working in different fields of science. The composite indicator consists of three part-indicators (Journal Paper Productivity, Relative Publication Strategy and Relative Paper Citedness). The different methods of calculating the composite index have only a slight effect on the value, whereas application of diverse weights for the individual part-indicators results in significant changes. Many studies have analyzed "direct" partnerships in co-authorship networks. On the other hand, the global network structure, including "indirect" links between researchers, has not yet been sufficiently studied. This study analyzes researchers' activities from the viewpoints considering their roles in the global structures of co-authorship networks, and compares the co-authorship networks between the theoretical and application areas in computer science. The modified HITS algorithm is used to calculate the two types of importance of researchers in co-authorship networks, i.e., the importance as the leader and that as the follower. 
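The co-authorship study above applies a modified HITS algorithm to score researchers as "leaders" and "followers." The paper's modification is not specified here, so the following is only a minimal sketch of standard HITS power iteration on a hypothetical directed co-authorship matrix, where authority scores stand in for leader importance and hub scores for follower importance:

```python
import numpy as np

def hits(adj, iters=100):
    """Standard HITS power iteration (a sketch; not the paper's
    modified variant). adj[i, j] = 1 means author i co-authored
    under the lead of author j (hypothetical encoding)."""
    n = adj.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        auths = adj.T @ hubs      # leaders accumulate from followers
        auths /= auths.sum()      # L1 normalization each step
        hubs = adj @ auths        # followers accumulate from leaders
        hubs /= hubs.sum()
    return hubs, auths

# Hypothetical 4-author network.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
hubs, auths = hits(A)
```

Both score vectors are normalized to sum to one, so they can be compared across networks of different sizes.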
The use of electronic data is steadily gaining ground in the study of the social organization of scientific and research communities, decreasing the researcher's reliance on commercial databases of bibliographic entries, patent grants and other manually constructed records of scientific works. In our work we provide a methodological innovation based on semantic technology for dealing with heterogeneity in electronic data sources. We demonstrate the use of our electronic system for data collection and aggregation through a study of the Semantic Web research community. Using methods of network analysis, we confirm the effect of Structural Holes and provide novel explanations of scientific performance based on cognitive diversity in social networks. The paper examines the use of references by applicants and examiners in US patent documents filed by R&D scientists from CSIR in India. It observes that scientists in CSIR use higher inputs of scientific information than technical information in patenting. The examiners do make their own prior art search and add significantly to the patent and non-patent literature, which is distinctly different from the references given by the R&D scientists from CSIR. It identifies (a) the major disciplines and the sub-disciplines that contribute most of the scientific knowledge, and (b) the countries from where most references to patent literature are made. The applicants cite relatively less recent patent literature and more medium-term patent literature in comparison to citations by examiners. The paper observes that there is scope for improvement in making relevant prior art searches, particularly for patent literature, by R&D scientists and in planning and organizing the information support for conducting patentable R&D in CSIR. The aims of this paper are to summarize Canadian government programs pertaining to research and development (R&D) and R&D support programs, and to propose a method for analyzing their socio-economic impact. 
The programs under investigation include: Canada Research Chairs; Canada Millennium Scholarship Foundation; Canada Foundation for Innovation; Technology Partnerships Canada (TPC); Industrial Research Assistance Program (IRAP); Natural Sciences and Engineering Research Council (NSERC); Social Sciences and Humanities Research Council (SSHRC); Canadian Institutes of Health Research (CIHR); Canadian Institute of Advanced Research (CIAR); Pre-Competitive Advanced Research Networks (PRECARN); and Networks of Centres of Excellence. Building on the findings of recent ethnographic studies of scientific practice, I develop and test theory about the impact of taken-for-granted-ness on citation practice in scientific communities. Using data gathered from special issues of scientific journals, I find support for the hypothesized differences in the practices of natural and social science communities. Post hoc analysis uncovers evidence of a third pattern of citation practice associated in part with engineering and technology research, and evidence that organization studies and strategic management communities tend to employ extreme versions of social science citation practices. I discuss the implications of the study for our understanding of communities of practice, for our beliefs about differences between the branches of science, and about science as a productive enterprise. We review the knowledge base for biotechnology in South Africa in the light of government interventions aimed at establishing a biotechnology industry. We use bibliometric methods to analyse data from the ISI database on the performance of microbiology, genetics and molecular biology research over a 20-year period from 1980 to 2000. Genetics and molecular biology publications have seen a steady decline while microbiology has steadily increased its share of world publications. 
Although the quantity of the publication base is small, the relative impact factor suggests that the quality of publications in these disciplines is comparable to world output. We conclude that the lack of adequate output in these disciplines poses a threat to government policies and investment aimed at increasing biotechnology commercialisation. In this study, the top 500 world universities are classified into 21 types according to their disciplinary characteristics using a clustering method. The indicators used to represent the disciplinary characteristics of an institution are the proportion of publications in six broader disciplinary areas: Arts/Humanities & Social Sciences, Natural Sciences & Mathematics, Engineering/Technology & Computer Sciences, Life Sciences, Clinical Medicine, and Interdisciplinary & Multidisciplinary Sciences. Institutions have been classified into types having a focus in a disciplinary group, a priority in a disciplinary group, an orientation in a disciplinary group, or balanced. The distribution of different types of institutions with respect to countries and ranks is analyzed. An analysis of 16,891 publications by Indian scientists during 1993-2002, indexed by Science Citation Index Expanded (Web of Science), indicates that publication output in the agricultural sciences has been in decline since 1998. 'Dairy and animal sciences', followed by 'veterinary sciences', constitute the largest component of the Indian agricultural research output. Agricultural universities and institutes under the aegis of the Indian Council of Agricultural Research (ICAR) are the major producers of research output. Most of the papers have been published in domestic journals and in low normalized impact factor journals with a low rate of citation per paper. Most of the highly productive institutions are either agricultural universities or the institutes under the aegis of ICAR. 
Most of the prolific authors are from the highly productive institutions. However, only a few highly cited authors are from highly productive institutions. The paper focuses on the top 500 foreign investment corporations (FICs) in China, by conducting data mining and systematic searching on the patent database of the State Intellectual Property Office of the People's Republic of China (SIPO). The structure of patent applications, the industrial distribution of patent applications, monopolistic tendencies, technological innovation of Chinese companies and directions of foreign investment are studied. The number h of papers with at least h citations has been proposed to evaluate an individual's scientific research production. This index is robust in several ways but strongly dependent on the research field. We propose a complementary index h_I = h^2 / N_a^(T), with N_a^(T) being the total number of authors in the considered h papers. A researcher with index h_I has h_I papers with at least h_I citations if he/she had published alone. We have obtained the rank plots of h and h_I for four Brazilian scientific communities. In contrast with the h-index, the h_I index rank plots collapse into a single curve, allowing comparison among different research areas. This is the second part of a two-part article that presents a theoretical and an empirical model of the everyday life information needs of urban teenagers. Part 2 focuses on the derivation of the empirical model and on its relationship to the theoretical model presented in Part 1. Part 2 also provides examples from the project data to support each of the components of the empirical model, which ties 28 information needs topics to the seven independent variables in the theoretical model. 
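The complementary index described in the h-index abstract above, h_I = h^2 / N_a^(T), can be sketched as follows. One assumption here: when papers tie at the h-core boundary, the core is taken to be the h most-cited papers, since the tie-breaking rule is not specified in the abstract.

```python
def h_index(citations):
    """h: the number of papers with at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

def h_i(papers):
    """h_I = h^2 / N_a^(T), where N_a^(T) is the total number of
    authors over the h papers of the core.
    papers: list of (citations, num_authors) tuples."""
    papers = sorted(papers, key=lambda p: p[0], reverse=True)
    h = h_index([c for c, _ in papers])
    if h == 0:
        return 0.0
    n_a = sum(a for _, a in papers[:h])  # authors in the h-core
    return h * h / n_a

# Hypothetical researcher: (citations, authors) per paper.
papers = [(10, 2), (8, 3), (5, 1), (4, 4), (1, 2)]
# h = 4; N_a^(T) = 2 + 3 + 1 + 4 = 10; h_I = 16 / 10 = 1.6
```

Dividing h^2 by the author count is what makes the rank plots of different fields collapse onto one curve, since it discounts for typical field-dependent co-authorship size.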
Comparison of the empirical model to the results of past youth information behavior research shows that the participants in this study tended to have the same types of information needs as previous researchers have found with more advantaged, nonminority groups of teens. This finding is significant because it suggests that teenagers have similar information needs across socioeconomic, ethnic, cultural, and geographic boundaries. Due to the exploratory nature of this study, however, additional research is necessary to confirm this possibility. Arrowsmith, a computer-assisted process for literature-based discovery, takes as input two disjoint sets of records (A, C) from the Medline database. It produces a list of title words and phrases, B, that are common to A and C, and displays the title context in which each B-term occurs within A and within C. Subject experts can then try to find A-B and B-C title-pairs that together may suggest novel and plausible indirect A-C relationships (via B-terms) that are of particular interest in the absence of any known direct A-C relationship. The list of B-terms typically is so large that it is difficult to find the relatively few that contribute to scientifically interesting connections. The purpose of the present article is to propose and test several techniques for improving the quality of the B-list. These techniques exploit the Medical Subject Headings (MeSH) that are assigned to each input record. A MeSH-based concept of literature cohesiveness is defined and plays a key role. The proposed techniques are tested on a published example of indirect connections between migraine and magnesium deficiency. The tests demonstrate how the earlier results can be replicated with a more efficient and more systematic computer-aided process. This article investigates whether information seeking patterns can be related to discipline differences, study approaches, and personality traits. 
A quantitative study of 305 master's thesis students' information behavior found that their information seeking tended to be either exploratory or precise. Statistical analyses showed that inner traits seemed more influential than discipline characteristics on information behavior. Exploration or specificity was manifested in terms of both the level and scope of information students wished to retrieve and the way they searched for it. Empirical evidence suggests that the ownership of related products that form a technology cluster is significantly better than the attributes of an innovation at predicting adoption. The treatment of technology clusters, however, has been ad hoc and study specific: Researchers often make a priori assumptions about the relationships between technologies and measure ownership using lists of functionally related technology, without any systematic reasoning. Hence, the authors set out to examine empirically the composition of technology clusters and the differences, if any, in clusters of technologies formed by adopters and nonadopters. Using the Galileo system of multidimensional scaling and the associational diffusion framework, the dissimilarities between 30 technology concepts were scored by adopters and nonadopters. Results indicate clear differences in conceptualization of clusters: Adopters tend to relate technologies based on their functional similarity; here, innovations are perceived to be complementary, and hence, adoption of one technology spurs the adoption of related technologies. On the other hand, nonadopters tend to relate technologies using a stricter ascendancy of association where the adoption of an innovation makes subsequent innovations redundant. The results question the measurement approaches and present an alternative methodology. 
We propose an approach to visualizing the scientific world and its evolution by constructing minimum spanning trees (MSTs) and a two-dimensional map of scientific journals using the database of the Science Citation Index (SCI) during 1994-2001. The structures of constructed MSTs are consistent with the sorting of SCI categories. The map of science is constructed based on our MST results. Such a map shows the relation among various knowledge clusters and their citation properties. The temporal evolution of the scientific world can also be delineated in the map. In particular, this map clearly shows a linear structure of the scientific world, which contains three major domains including physical sciences, life sciences, and medical sciences. The interaction of various knowledge fields can be clearly seen from this scientific world map. This approach can be applied to various levels of knowledge domains. Logarithmic transformation of the data has been recommended by the literature in the case of highly skewed distributions such as those commonly found in information science. The purpose of the transformation is to make the data conform to the lognormal law of error for inferential purposes. How does this transformation affect the analysis? We factor analyze and visualize the citation environment of the Journal of the American Chemical Society (JACS) before and after a logarithmic transformation. The transformation strongly reduces the variance necessary for classificatory purposes and therefore is counterproductive to the purposes of the descriptive statistics. We recommend against the logarithmic transformation when sets cannot be defined unambiguously. The intellectual organization of the sciences is reflected in the curvilinear parts of the citation distributions while negative power laws fit excellently to the tails of the distributions. 
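The MST construction described in the journal-mapping abstract above can be sketched with Prim's algorithm. The inter-journal citation matrix and the conversion of citation strength into a distance below are hypothetical illustrations; the abstract does not specify the actual SCI-based distance measure.

```python
import numpy as np

def prim_mst(dist):
    """Prim's algorithm: returns the n-1 edges (i, j) of a minimum
    spanning tree over a full symmetric distance matrix."""
    n = dist.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    if best is None or dist[i, j] < dist[best[0], best[1]]:
                        best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

# Hypothetical symmetric citation counts between 5 journals.
cites = np.array([[0, 30, 2, 1, 0],
                  [30, 0, 5, 2, 1],
                  [2, 5, 0, 20, 3],
                  [1, 2, 20, 0, 8],
                  [0, 1, 3, 8, 0]], dtype=float)

# Stronger citation flow = shorter edge (an assumed transform).
dist = 1.0 / (1.0 + cites)
np.fill_diagonal(dist, 0.0)

edges = prim_mst(dist)
```

The tree keeps only the strongest citation links, which is what lets the resulting map expose a backbone of knowledge clusters rather than the full dense citation graph.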
Author Cocitation Analysis (ACA) and Web Colink Analysis (WCA) are examined as sister techniques in the related fields of bibliometrics and webometrics. Comparisons are made between the two techniques based on their data retrieval, mapping, and interpretation procedures, using mathematics as the subject in focus. An ACA is carried out and interpreted for a group of participants (authors) involved in an Isaac Newton Institute (2000) workshop-Singularity Theory and Its Applications to Wave Propagation Theory and Dynamical Systems-and compared/contrasted with a WCA for a list of international mathematics research institute home pages on the Web. Although the practice of ACA may be used to inform a WCA, the two techniques do not share many elements in common. The most important departure between ACA and WCA exists at the interpretive stage when ACA maps become meaningful in light of citation theory, and WCA maps require interpretation based on hyperlink theory. Much of the research concerning link theory and motivations for linking is still new; therefore further studies based on colinking are needed, mainly map-based studies, to understand what makes a Web colink structure meaningful. Current document-retrieval tools succeed in locating large numbers of documents relevant to a given query. While search results may be relevant according to the topic of the documents, it is more difficult to identify which of the relevant documents are most suitable for a particular user. Automatic genre analysis (i.e., the ability to distinguish documents according to style) would be a useful tool for identifying documents that are most suitable for a particular user. We investigate the use of machine learning for automatic genre classification. We introduce the idea of domain transfer-genre classifiers should be reusable across multiple topics-which does not arise in standard text classification. 
We investigate different features for building genre classifiers and their ability to transfer across multiple-topic domains. We also show how different feature-sets can be used in conjunction with each other to improve performance and reduce the number of documents that need to be labeled. We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element such as a word or a syntactic construct is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent, but unstable, features are especially useful as discriminators of an author's writing style. Understanding and modeling human experience and emotional response when listening to music are important for better understanding of the stylistic choices in musical composition. In this work, we explore the relation of audio signal structure to human perceptual and emotional reactions. Memory, repetition, and anticipatory structure have been suggested as some of the major factors in music that might influence and possibly shape these responses. The audio analysis was conducted on two recordings of an extended contemporary musical composition by one of the authors. Signal properties were analyzed using statistical analyses of signal similarities over time and information theoretic measures of signal redundancy. They were then compared to Familiarity Rating and Emotional Force profiles, as recorded continually by listeners hearing the two versions of the piece in a live-concert setting. The analysis shows strong evidence that signal properties and human reactions are related, suggesting applications of these techniques to music understanding and music information-retrieval systems. Architectural plans are design diagrams that describe building layout where space is planned according to design requirements. 
Style in architecture is generally characterized as common features appearing in a particular class of building design. This research seeks to address how to recognize architectural design style from a 2D plan diagram. We explore this question in a computational encoder-analyzer (E-A) model for 2D plans, where a characterization of 2D style is based on qualitative spatial representation and information theoretic measures. In a preliminary study of a prominent architect's plans, we demonstrate the effectiveness of our approach. We conclude by discussing practical applications of automated plan recognition and classification in design support tools. Styles in creative works cannot adequately be represented by categories based on formal features. Instead, styles could be studied in terms of modal relationships between the features to provide a basis for definitions of structure in generative models. Modal relationships are more flexible and robust under the dynamic conditions of the artist's creative process. This article illustrates through the examples of Seljuk and Celtic patterns how these modal relationships emerge, why they are essential to detailed descriptions of style, and how they might be identified. Universality through standardization is at the heart of scientific and medical practices. In this study we dealt with the meaning, significance, and implications of standardization through "operationalization" in psychiatric diagnostic criteria by focusing on the effects of the DSM (Diagnostic and Statistical Manual of Mental Disorders). What does "operational" mean? The discussion of "operationalization" in psychiatric diagnosis poses quite a challenge. Given the importance of semantics and the word networks of everyday life in forming descriptions of symptoms and reaching clinical judgments, cultural differences in these semantics inevitably have strong impacts on psychiatric diagnosis. The link between sensitivity and semantics in words enhances this effect. 
In spite of the difficulties in approaching operationalization in psychiatric diagnosis, several attempts have been made to standardize diagnostic criteria. Prominent examples include the DSM of the American Psychiatric Association and the ICD (International Classification of Diseases) of the WHO. In this paper we analyzed the effects of standardized diagnostic criteria by performing a content analysis of papers published in the Archives of General Psychiatry from 1978 to 1990. Our results clearly show changes in the research questions, research designs, methodologies, target diseases, and selections of independent and dependent variables. Public attitude toward biotechnology- and health-related scenes in movies influences the development of the biomedical science itself and thereafter of our health- and technology-conscious society. We have developed a new quantitative indicator to evaluate positive and negative feelings toward such scenes. Thirty movies, including nine biotechnology-related, twenty health-related, and one related to both, were rated as 0 (0%) highly negative, 10 (33%) negative, 17 (57%) neutral, 3 (10%) positive, and 0 (0%) highly positive. Biotechnology-related movies were negative, while health-related movies were neutral. This indicator is useful for rating the perception of biotechnology and health in movies. In this paper we examine the role of what we call core scientists in innovation in Japanese electronics companies. Core scientists are those who have the top total scores as measured by the number of their publications and citations received. We find that even though they may not apply for a large number of patents themselves, the scientific knowledge of the core scientists may have a positive effect in stimulating patent applications by their collaborators. The Japanese government has been attempting to reform the national research system for the past 20 years. 
This paper describes the structural changes of the system and its performance based on bibliometric analyses and discusses the effects of S&T policy. The investigation indicates that although Japan gradually increased its production of highly cited publications, its share of low-cited publications remains much higher. Detailed analyses reveal that the top eight universities account for half of the highly cited publications in the university sector, while hundreds of other universities have massively increased their low-cited publications since 1990. The development of financial and human resources for research in the 1990s enabled new actors to become involved in scientific research, but the resources were concentrated in a small number of universities, reinforcing the collaboration between these universities and others. The paper aims to clarify the extent to which the results of scientific-oriented research conducted by corporations are reflected in their application-oriented research. Focusing on large Japanese manufacturers of electrical machinery, the paper analyses firm-level data on presentations of scientific papers that represent the results of scientific-oriented research activities, citations of scientific papers in patents, and inventions. The electrical machinery industry, a prototypical science-based industry, has been placing a growing emphasis on scientific-oriented research during the 1990s, as is evident from trends in R&D expenses, scientific papers, and inventions. Regression analysis results suggest a complementary relationship between citations of basic scientific knowledge as presented in scientific papers on the one hand and acts of invention on the other hand, in the sense that a rise in citations corresponds to a rise in inventions. Moreover, the results suggest that invention efficiency (number of patent claims per unit of R&D expenditure) has been increasing during the 1990s. 
Furthermore, the results suggest that, given the exogenous influences on the patent system in Japan, it is necessary to include the number of patent claims when attempting to measure corporate technology development activity through the volume of patent applications. However, there was no finding of a clear relationship between the number of scientific papers and inventions. Implications of these results for corporate R&D strategy are examined. The authors have constructed an original database of the full text of the Japanese Patent Gazette published since 1994. The database includes not only the front page but also the body text of more than 880,000 granted Japanese patents. By reading the full texts of all 1,500 patent samples, we found that some inventors cite many academic papers in addition to earlier patents in the body texts of their Japanese patents. Using manually extracted academic paper citations and patent citations as "right" answers, we fine-tuned a search algorithm that automatically retrieves cited scientific papers and patents from the entire texts of all the Japanese patents in the database. An academic paper citation in a patent text indicates that the inventor used scientific knowledge in the cited paper when he/she invented the idea codified in the citing patent. The degree of science linkage, as measured by the number of research papers cited in patent documents, is particularly strong in biotechnology. Among other types of technology, those related to photographic-sensitized material, cryptography, optical computing, and speech recognition also show strong science linkage. This suggests that the degree of dependence on scientific knowledge differs from technology to technology and therefore, different ways of university-industry collaboration are necessary for different technology fields. In this article we present an indicator - Probabilistic Partnership Index (PPI) - for use in measuring scientific linkages. 
This indicator is based on the Monte-Carlo simulation which provides a standard model to each network established in collaboration between two countries. Any relationship that occurs within a (whole) network can be projected to a standard model respectively and thus PPI is useful in examining individual networks within complex exchanges. We investigate inter-sectoral cooperation between France and Japan for the period of 1981-2004, by classifying every research unit appearing in the data set by its sector. We examine international collaborative patterns, domestic collaborative patterns and multilateral relationships established within the French-Japanese cooperation. We also compare PPI with the classic collaborative linkage indexes - Jaccard Index, Salton-Ochiai Index and Probabilistic Affinity Index - in order to describe the specificity of the new indicator. Our hope is that PPI will prove to be a useful and complementary tool for the analysis of international collaboration. The recent developments towards more systemic conceptualizations of innovation dynamics and related policies highlight the need for indicators that mirror the dynamics involved. In this contribution, we assess the role that 'non-patent references', found in patent documents, can play in this respect. After examining the occurrence of these references in the USPTO and EPO patent systems, their precise nature is delineated by means of a content analysis of two samples of nonpatent references (n=10,000). Our findings reveal that citations in patents allow developing nontrivial and robust indicators. The majority of all non-patent references are journal references, which provide ample possibilities for large-scale analyses focusing on the extent to which technological developments are situated within the vicinity of scientific knowledge. Application areas, limitations and directions for future research are discussed. 
Aim: to identify the influence of the 1991-1995 war on Croatian biomedical publications with reference to the Croatian universities and medical centers in Zagreb, Split, Rijeka and Osijek and their regions. Methods: The PubMed Internet service was used to search the MEDLINE database in the pre-war (1988-1990), war (1991-1995) and post-war (1996-2000) periods. Annual numbers of publications in the MEDLINE and Core Clinical Journals (Abridged Index Medicus; AIM-journals) were calculated for each center in the above-mentioned periods. Our analysis included socio-economic indicators such as gross domestic product (GDP) and total employment, and human resources such as the number of full-time researchers, teachers and researchers in biomedical sciences, university graduates, and master's and doctoral theses. Descriptive statistics and the t-test were used. Results: In the 1988-2000 period the proportion of Croatian publications in the MEDLINE database was 0.076%. The proportion of AIM-publications in the MEDLINE was 11.5%, while the proportion of Croatian AIM-publications in Croatian publications in the MEDLINE was only 0.02%. Compared to the pre-war period, Croatia increased the number of publications in the MEDLINE in the war period (p < 0.05) and post-war period (p < 0.01). In the war period GDP and other socio-economic indicators decreased, in contrast to an increase in biomedical publications. All centers increased the number of MEDLINE publications significantly in the war and post-war periods (p < 0.01), while the growth of AIM-publications in Zagreb and Split was not significant. The proportion of biomedical publications in Zagreb decreased in the war and post-war periods while it almost doubled in the other centers. Croatia increased its biomedical publication rates (per 100,000 inhabitants per year) from 3.8 (the pre-war period) to 6.6 (the war period) and 9.0 (the post-war period). 
In those periods biomedical publication rates also increased in all centers and their respective regions, in spite of the war. The small number of teachers and researchers in biomedical sciences in Split and Osijek produced more publications per person in the war period than the larger numbers of their colleagues in the other two centers. Conclusion: Croatia and its centers Zagreb, Split, Rijeka and Osijek increased their biomedical publication rates despite the enormous destruction and human losses caused by the war. Despite a significant increase in the quantity of Croatian publications in the MEDLINE database, the number of AIM publications increased only slightly. We present a new approach to studying the structure of the impact factor of academic journals. The new method is based on calculating the fraction of citations contributing to the impact factor of a given journal that come from citing documents in which at least one of the authors is a member of the cited journal's editorial board. We studied the structure of three annual impact factors of 54 journals included in the groups "Education and Educational Research" and "Psychology, Educational" of the Social Sciences Citation Index. The percentage of citations from papers authored by editorial board members ranged from 0% to 61%. In 12 journals, for at least one of the years analysed, 50% or more of the citations that contributed to the impact factor were from documents published in the journal itself. Given that editorial board members are considered to be among the most prestigious scientists, we suggest that citations from papers authored by editorial board members should be given particular consideration. Modeling firms' R&D expenditures often becomes complicated due to the zero values reported by a significant number of firms. The maximum likelihood (ML) estimation of the Tobit model, which is usually adopted in this case, is however not robust to heteroscedastic and/or non-normal error structures. 
This paper therefore applies symmetrically trimmed least squares, a semi-parametric estimator of the Tobit model, to model firms' R&D expenditures with zero values. The result of a specification test indicates that the semi-parametric estimation significantly outperforms the parametric ML estimation. Are prior self-citations an effective input in increasing a subsequent article's citation count? Examination of 418 articles in eight economics journals found that, after controlling for article length, journal and author quality, lead article position, and coauthorship, an author's prior stock of self-citations is not statistically related to a subsequent article's total citation count or to the quality of the journals in which those citations appear. Self-citations that appear in prestigious high-impact economics journals have a statistically positive, but numerically small, effect on a subsequent article's total citation count and on the quality of the citing journal. The productive effect of a prior self-citation is inversely related to its age. Prior self-citations of the second author listed in a collaborative article have no significant effect on a subsequent article's total citation count or the quality of the economics journals in which those citations appear. Following increasing investment in basic research in China, the outputs of basic research have been greatly enhanced. In this paper, the relative efficiency of investments in basic research is analyzed using statistical regressions and the Data Envelopment Analysis (DEA) method. Preliminary results show that injected investment seems to be the main driving force behind the increased basic research outputs in China. It is found that there were significant improvements in overall efficiency from 1991 to 1996, although this trend has noticeably slowed since 1996. Possible causes of this slow-down are discussed. 
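The editorial-board measure described earlier (the fraction of impact-factor citations coming from citing documents with at least one board member among the authors) reduces to a simple count. A minimal sketch, with hypothetical data structures of our own choosing (one author list per citation contributing to the impact factor):

```python
def board_citation_share(citing_papers, board_members):
    """Fraction of impact-factor citations that come from citing
    documents with at least one editorial-board member as author.

    citing_papers: list of author lists, one per contributing citation.
    board_members: set of author names on the cited journal's board.
    """
    if not citing_papers:
        return 0.0
    from_board = sum(
        1 for authors in citing_papers
        if any(author in board_members for author in authors)
    )
    return from_board / len(citing_papers)

# Hypothetical example: two of three citing documents involve a board member.
citations = [["Alba", "Baker"], ["Costa"], ["Dinu", "Ellis"]]
print(board_citation_share(citations, {"Alba", "Ellis"}))  # 2/3
```

This simplified sketch counts each citing document once; the study's exact accounting of repeated citations is not specified in the abstract.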
This study is a follow-up to a published descriptive outline of the publications of Iberian-American (IA) countries in the field of food science and technology. The numbers of articles and citations attained by IA producers (Argentina, Brazil, Mexico, Portugal and Spain) were examined across 48 journals indexed in the Science Citation Index (SCI) database. The growth rate in publication between 1992 and 2003 differed across journals; those with high impact factors were most preferred by IA authors. Different patterns of collaboration and frequencies of citation were obtained. Spain and Argentina show the greatest counts of publications and citations but present the lowest percentages of collaboration with outside authors. In contrast, three out of ten papers from Portugal, Mexico and Brazil are signed by at least one foreign author. The association of publication productivity with demographic and socio-economic indicators revealed that Spain and Portugal have the highest ratios of publications or citations to human resources, followed by Argentina. Argentina showed the highest ratios of publications or citations to expenditure on science and technology activities. Using both author-level and journal-level data, Hirsch's h-index is shown to possess substantial heuristic value in that it yields accurate results whilst requiring minimal informational acquisition effort. As expected, the h-index of productive consumer scholars correlated strongly with their total citation counts. Furthermore, the h-indices obtained via ISI/Thomson and Google Scholar were highly correlated, albeit the latter yielded higher values. Finally, using a database of business-relevant journals, a significant correlation was found between the journals' h-indices and their citation impact scores. The h-index (or Hirsch index) was defined by Hirsch in 2005 as the number h such that, for a general group of papers, h papers received at least h citations while the other papers received no more than h citations. 
This definition is extended here to the general framework of Information Production Processes (IPPs), using a source-item terminology. It is further shown that in each practical situation an IPP always has a unique h-index. In Lotkaian systems h = T^(1/alpha), where T is the total number of sources and alpha is the Lotka exponent. The relation between h and the total number of items is highlighted. The g-index is introduced as an improvement on Hirsch's h-index to measure the global citation performance of a set of articles. If this set is ranked in decreasing order of the number of citations received, the g-index is the (unique) largest number such that the top g articles together received at least g^2 citations. We prove the unique existence of g for any set of articles, and we have that g >= h. The general Lotkaian theory of the g-index is presented, and we show that g = ((alpha-1)/(alpha-2))^((alpha-1)/alpha) * T^(1/alpha), where alpha > 2 is the Lotkaian exponent and T denotes the total number of sources. We then present the g-index of the (still active) Price medallists for their complete careers up to 1972 and compare it with the h-index. It is shown that the g-index inherits all the good properties of the h-index and, in addition, better takes into account the citation scores of the top articles. This yields a better distinction between, and ordering of, the scientists from the point of view of visibility. The calculation of Hirsch's h-index ignores the time course of a career; a single h-index therefore cannot reflect the different time spans over which scientists accumulate their papers and citations. In this study the h-index sequence and the h-index matrix are constructed, which supply the details absent from a single h-index, reveal the different growth patterns and mechanisms of the h-index, and make scientists of different scientific ages comparable. 
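The h- and g-index definitions above translate directly into short routines. A minimal sketch (function names are ours; the bounded form of g used here never exceeds the number of papers, whereas some formulations allow it to):

```python
def h_index(citations):
    """h: the largest h such that the top h papers each
    received at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """g: the largest g such that the top g papers together
    received at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

papers = [10, 8, 5, 4, 3]
print(h_index(papers))  # 4
print(g_index(papers))  # 5; g >= h, as the text states
```

The example shows how g rewards highly cited top papers: the two ten- and eight-citation papers lift g above h even though the fifth paper has only three citations.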
An interesting twist on the Hirsch index is given, in terms of an index for topics and compounds. By comparing both the h-b index and m for a number of compounds and topics, it is possible to differentiate a new so-called hot topic from older topics. This quick method is shown to help newcomers identify how much interest and work has already been devoted to their chosen area of research. We suggest that an h-type index - equal to h if you have published h papers, each of which has at least h citations - would be a useful supplement to journal impact factors. In this short paper I propose a combination of qualitative and quantitative criteria to classify the quality, talent and creative thinking of scientists in the "hard", medical and biological sciences. The rationale for the proposed classification is to focus on the impact and overall achievements of each individual scientist and on how he is perceived by his own community. This new method is probably more complete than any other form of traditional judgment of a scientist's achievements and reputation, and may be useful for funding agencies, editors of scientific journals, science academies, universities, and research laboratories. Empirical evidence is given on how membership in a consolidated, well-established research team provides researchers with a competitive advantage over their colleagues in non-consolidated teams. Data were obtained from a survey of researchers ascribed to the 'Biology and Biomedicine' area of the Spanish Council for Scientific Research, as well as from their curricula vitae. One quarter of the scientists work as members of teams in the process of consolidation. Our findings illustrate the importance, for the development and consolidation of research teams, of the availability of a minimum number of researchers with a permanent position and of a minimum number of support staff and non-staff personnel (mainly post-doctoral fellows). 
Consolidation of research teams has a clear influence on the more academically oriented quantitative indicators of the scientific activity of individuals. Researchers belonging to consolidated teams perform quantitatively better than their colleagues in terms of the number of articles published in journals covered by the Journal Citation Reports, but not in terms of the impact of these publications. Consolidation favours publication, but not patenting, and it also has a positive effect on the academic prestige of scientists and on their capacity to train new researchers. It does not significantly foster participation in funded R&D projects, nor does it influence the establishment of international collaborations. Impact is influenced to a remarkable degree by seniority and professional background, and is significantly greater for young scientists who have spent time abroad at prestigious research laboratories. The goal is to deepen our knowledge of both sides of the abstracting topic: abstracting variables and abstract attributes. Six abstracting variables (representing abstract, represented source, abstracting means, documentary goal, cognitive domain and user needs) and eight abstract attributes (representativeness, comprehensiveness, usefulness, accuracy, consistency, coherence, density and perceived quality) are proposed and weighted. While abstracting means emerges as the main abstracting variable, the representativeness and accuracy attributes stand out, and usefulness, comprehensiveness, consistency, coherence and density are regarded as the basic ones. The feedback loop of this quality model runs through the perceived quality attribute, which depends exclusively on users. Science has traditionally been mapped on the basis of authorship and citation data. Due to publication and citation delays, such data represent the structure of science as it existed in the past. 
We propose to map science by proxy of journal relationships derived from usage data, to determine research trends as they presently occur. This mapping is performed by applying a principal components analysis, superimposed with a k-means cluster analysis, to networks of journal relationships derived from a large set of article usage data collected for the Los Alamos National Laboratory research community. Results indicate that meaningful maps of the interests of a local scientific community can be derived from usage data. Subject groupings in the mappings correspond to Thomson ISI subject categories. A comparison with maps resulting from the analysis of 2003 Thomson ISI Journal Citation Report data reveals interesting differences between the features of local usage and global citation data. Bibliographic data on the biomedical literature of Nigeria, drawn from the 6820 articles listed in Medline covering the period 1967-2002, were analysed to study the pattern of productivity of various author categories using Lotka's law. The 2184 authors who wrote the papers were divided into four different files, namely all authors, first authors, non-collaborative authors and co-authors. We hypothesized that the productivity pattern of each category of authors differed from Lotka's inverse power law. The results showed that only the co-author category differed from the inverse power version of the law, while the other categories did not, although they yielded various exponents. Research manuscripts face various time lags from initial submission to final publication in a scientific periodical. Three publishing models compete for the market: professional publishing houses publish in print and/or online under a "reader-pays" model or follow the open access model of "author-pays", while a number of periodicals are bound to learned societies. The present study aims to compare the three business models of publishing with regard to publication speed. 
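Lotka's inverse power law, invoked in the Nigerian authorship study above, states that the number of authors with n papers falls off as n^(-alpha); the exponent can be estimated from a log-log regression on the productivity counts. A minimal sketch under our own naming, using ordinary least squares:

```python
import math

def lotka_exponent(author_paper_counts):
    """Estimate the Lotka exponent alpha by least squares on log-log data.

    author_paper_counts maps n (papers per author) to the number of
    authors with exactly n papers.  Under Lotka's inverse power law,
    counts follow y(n) = C / n**alpha, so log y is linear in log n
    with slope -alpha.
    """
    xs = [math.log(n) for n in author_paper_counts]
    ys = [math.log(y) for y in author_paper_counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # alpha is the negated log-log slope

# Synthetic counts obeying y = 1000 / n**2 exactly (alpha = 2):
print(lotka_exponent({1: 1000.0, 2: 250.0, 4: 62.5}))  # 2.0
```

Real data require a goodness-of-fit test (the study's category-by-category comparison) on top of the point estimate; this sketch shows only the exponent estimation.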
28 topically similar biomedical journals were compared. Open access journals have a publication lag comparable to that of journals published by traditional publishers. Manuscripts submitted to and accepted by either of these two types of periodicals are available to the reader much faster than manuscripts published in journals with strong ties to specialized learned societies. Publication and citation profiles of Full and Associate Professors at the School of Chemistry of the Universidad de la Republica in Uruguay were investigated. The groups do not exhibit markedly different average ages. However, the average time since they started publishing, as well as other characteristics of their publication records, such as productivity or citations, set them apart. In terms of both the number of papers per author per year of activity and the number of citations per year of activity, the group of Full Professors has statistically significantly larger averages than the Associate Professors. The impact of self-citations, multi-authorship and internationalization of the publications was analyzed within the two groups and shown to have no excessive or predictable influence on those parameters, except in the case of few (<= 2) or many (> 8) authors. It is suggested in this paper that these two indicators, number of papers per author per production year and number of citations per production year, combined in a plot allowing a bidimensional ranking of the individuals in the groups, may be used profitably as one of the components in the development of a policy for the promotion of Associate Professors. 
The analysis also showed that the quotient of citations received to papers published, even when derived from the actual citation data of the scientists without involving the impact factors of the journals in which they publish, is not a good parameter to use for that purpose, essentially because it reduces the information content of the indicator with respect to those described before. It is a common saying that a person can die at any age. Is it really true? Are all ages equally prone to death? Does there exist some predictable pattern that may anticipate the incidence of death? These are the questions addressed in this article. The literature is replete with cohort-dependent age distributions and pyramids that focus on, and are adjusted primarily for, living persons. The current article uses a cohort-free group of people and focuses exclusively on age at death to search for patterns in these ages. A statistical investigation is made of the life spans of human beings over the previous two centuries. The life span, or age, distribution is revealed to be quadri-modal in nature, refuting the prevailing myth that all ages are equally susceptible to death. We propose a semi-automatic method based on finite-state techniques for the unification of corporate source data, with potential applications for bibliometric purposes. Bibliographic and citation databases have a well-known problem of inconsistency in the data at the micro and meso levels, affecting the quality of bibliometric searches and the evaluation of research performance. The unification method applies parametrized finite-state graphs (P-FSG) and involves three stages: (1) breaking corporate source data into independent units of analysis; (2) creating binary matrices; and (3) drawing finite-state graphs. This procedure was tested on university departmental addresses downloaded from the ISI Web of Science. 
Evaluation was in terms of an adaptation of the measures of precision and recall. The results demonstrate the usefulness of this approach, though it requires some human processing. This paper attempts to highlight quantitatively the growth and development of the world literature on thorium in terms of publication output as recorded in the Science Citation Index (1982-2004). During 1982-2004 a total of 3987 papers were published by scientists in the field of thorium. The average number of publications per year was 173. The highest number of papers (249) was published in 2001. A spurt in literature output was observed during 1991-2004. There were 94 countries involved in research in this field. The USA is the top producing country with 1000 authorships (21.11%), followed by India with 498 authorships (10.51%). The authorship and collaboration trend was towards multi-authored papers. Intensive collaboration was found during 1990-2004. One paper, 'Nuclear Instruments and Methods in Physics Research - A 406 (3) (1998) 411-426', had 64 collaborators. There were 586 international collaborative papers. Bilateral collaboration accounted for 80.55 percent of the total collaborative papers. Bhabha Atomic Research Centre (India) topped the list with 153 authorships, followed by Los Alamos National Laboratory (USA) with 105 authorships. The journals most preferred by the scientists were: Journal of Radioanalytical Nuclear Chemistry with 181 papers, Radiochimica Acta with 139 papers, Journal of Radioanalytical Nuclear Chemistry - Articles with 127 papers, Geochimica Cosmochimica Acta with 96 papers, Health Physics with 91 papers, Applied Radiation and Isotopes with 88 papers, Journal of Alloys and Compounds with 65 papers, Earth and Planetary Science Letters with 59 papers, and Chemical Geology, Indian Journal of Chemistry - A, and Radiation Protection Dosimetry with 55 papers each. English was the predominant language used by the scientists for communication. 
The high-frequency keywords were: Thorium (500), Uranium (284), Separation (94), Thorium Isotopes (90), Thorium (IV) (86), Seawater (73), Solvent Extraction (70), and Rare Earth Elements (68). We investigated the distribution of citations included in documents labeled by the ISI as "editorial material" and how they contribute to the impact factor of the journals in which the citing items were published. We studied all documents classified by the ISI as "editorial material" in the Science Citation Index between 1999 and 2004 (277,231 records corresponding to editorial material published in 6141 journals). The results show that most journals published only a few such documents that included 1 or 2 citations contributing to the impact factor, although a few journals published many such documents. The data suggest that manipulating the impact factor by publishing large amounts of editorial material with many citations to the journal itself is not a widely used strategy. This paper focuses on the Danish Act No. 347 of 1999, which granted IPRs on inventions at public research institutions to the institutions themselves. After summarizing the situation in Denmark prior to the new law, I describe the Act's main features and then turn my attention to the solutions adopted by Danish academia to face the opportunities and challenges posed by the new situation. Finally, using a unique dataset including all patents filed by Danish universities from 1982 to 2003, I describe university patenting activity. The macro-level country-by-country co-authorship, cross-reference and cross-citation analysis started in our previous paper continues by revealing the cross-national preference structure of the 36 selected countries. Preference indicators of co-authorship, cross-reference and cross-citation are defined, presented and discussed. 
The study revealed that geopolitical location, cultural relations and language are determining factors in shaping preferences, whether in co-authorship, cross-reference or cross-citation. Areas like Central Europe, Scandinavia, Latin America (supplemented by Spain and Portugal), the Far East and the Australia-New Zealand-South Africa triad form typical "clusters" with mutually strong preferences towards each other. The USA appears to play a distinguished role, enjoying universal preference, which - in the cross-reference and cross-citation cases - is asymmetric for the greater part of the countries under study. We present a method for describing taxonomy evolution. We focus on the structure of epistemic communities (ECs), or groups of agents sharing common knowledge concerns. Introducing a formal framework based on Galois lattices, we categorize ECs in an automated and hierarchically structured way and propose criteria for selecting the most relevant epistemic communities - for instance, ECs gathering a certain proportion of agents and thus prototypical of major fields. This process produces a manageable, insightful taxonomy of the community. The longitudinal study of these static pictures then makes a historical description possible. In particular, we capture stylized facts such as field progress, decline, specialization, interaction (merging or splitting), and paradigm emergence. The detection of such patterns in epistemic networks could fruitfully be applied to other contexts. In 1985 China began the reform of its Science & Technology (S&T) sector inherited from the planned economy. To disclose the impact of the drawn-out reform on the efficiency of the whole sector, we measure the scientific productivity of China's S&T institutes. The analysis is based on R&D input and output data at the country aggregate and provincial levels. We utilize a Polynomial Distributed Lag model to uncover the structure of the lag between R&D input and output. 
The findings reveal that the growth rate of scientific productivity of China's S&T institutes has been negative since the 1990s. The goal of this research is to understand what characteristics, if any, lead users engaged in interactive information seeking to prefer certain sets of query terms. Underlying this work is the assumption that query terms that information seekers prefer induce a kind of cognitive efficiency: They require less mental effort to process and therefore reduce the energy required in the interactive information-seeking process. Conceptually, this work applies insights from linguistics and cognitive science to the study of query-term quality. We report on an experiment in which we compare user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically. Twenty-four participants used a merged list of all terms to answer a carefully created set of questions. By design, the interface constrained users to access the text exclusively via the displayed list of query terms. We found that participants displayed a preference for the human-constructed set of terms eight times greater than the preference for either set of automatically identified terms. We speculate about reasons for this strong preference and discuss the implications for information access. The primary contributions of this research are (a) explication of the concept of user preference as a measure of query-term quality and (b) identification of a replicable procedure for measuring preference for sets of query terms created by different methods, whether human or automatic. All other factors being equal, query terms that users prefer clearly are the best choice for real-world information-access systems. 
The World Wide Web has become one of our more important information sources, and commercial search engines are the major tools for locating information; however, it is not enough for a Web page to be indexed by the search engines-it also must rank high on relevant queries. One of the parameters involved in ranking is the number and quality of links pointing to the page, based on the assumption that links convey appreciation for a page. This article presents the results of a content analysis of the links to two top pages retrieved by Google for the query "jew" as of July 2004: the "jew" entry on the free online encyclopedia Wikipedia, and the home page of "Jew Watch," a highly anti-Semitic site. The top results for the query "jew" gained public attention in April 2004, when it was noticed that the "Jew Watch" homepage ranked number 1. From this point on, both sides engaged in "Googlebombing" (i.e., increasing the number of links pointing to these pages). The results of the study show that most of the links to these pages come from blogs and discussion links, and the number of links pointing to these pages in appreciation of their content is extremely small. These findings have implications for ranking algorithms based on link counts, and emphasize the huge difference between Web links and citations in the scientific community. Web pages from a Web site can often be associated with concepts in an ontology, and pairs of Web pages also can be associated with relationships between concepts. With such associations, the Web site can be searched, browsed, or even reorganized based on the concept and relationship labels of its Web pages. In this article, we study the link chain extraction problem that is critical to the extraction of Web pages that are related. A link chain is an ordered list of anchor elements linking two Web pages related by some semantic relationship. 
We propose a link chain extraction method that derives extraction rules for identifying the anchor elements forming the link chains. We applied the proposed method to two well-structured Web sites and found that its performance in terms of precision and recall is good, even with a small number of training examples. Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system. Co-occurrence matrices, such as cocitation, coword, and colink matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involves understanding the nature of the various types of matrices. This article discusses the difference between a symmetrical cocitation matrix and an asymmetrical citation matrix as well as the appropriate statistical techniques that can be applied to each of these matrices, respectively. Similarity measures (such as the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical cocitation matrix but can be applied to the asymmetrical citation matrix to derive the proximity matrix. 
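The recommendation above - apply the cosine (or Pearson) to the asymmetrical citation matrix, not to the symmetrical cocitation matrix, to derive a proximity matrix - can be sketched as follows. The matrix layout (rows as citing profiles) and function names are our own assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def proximity_matrix(citation_matrix):
    """Pairwise cosine similarities between the rows (citing profiles)
    of an asymmetrical citation matrix."""
    return [[cosine(u, v) for v in citation_matrix]
            for u in citation_matrix]

# Toy asymmetrical citation matrix: rows cite, columns are cited.
m = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
prox = proximity_matrix(m)
print(prox[0][1])  # 1.0 - identical citing profiles
print(prox[0][2])  # 0.0 - disjoint citing profiles
```

The resulting symmetric proximity matrix is then a legitimate input for multivariate techniques such as the multidimensional scaling and network visualization mentioned in the study.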
The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, in which the nature of the available data, and thus the data collection methods, differ from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory. The Open Video Digital Library (OVDL) provides digital video files to the education and research community and is distinguished by an innovative user interface that offers multiple kinds of visual surrogates to people searching for video content. The OVDL is used by several thousand people around the world each month, and part of this success is due to its user interface. This article examines the interplay between research and practice in the development of this particular digital library with an eye toward lessons for all digital libraries. We argue that theoretical and research goals blur into practical goals, and practical goals raise new research questions as research and development progress - this process is akin to walking along a Möbius strip, in which a locally two-sided surface is actually part of a globally one-sided world. We consider the gulf between the theories that guide current digital library research and current practice in operational digital libraries, provide a developmental history of the OVDL and the research frameworks that drove its development, illustrate how user studies informed its implementation and revision, and conclude with reflections and recommendations on the interplay between research and practice. Broad issue scanning is the task of identifying important public debates arising in a given broad issue; Really Simple Syndication (RSS) feeds are a natural information source for investigating broad issues. 
RSS, as originally conceived, is a method for publishing timely and concise information on the Internet, for example, about the main stories in a news site or the latest postings in a blog. RSS feeds are potentially a nonintrusive source of high-quality data about public opinion: Monitoring a large number of feeds may allow quantitative methods to extract information relevant to a given need. In this article we describe an RSS feed-based coword frequency method to identify bursts of discussion relevant to a given broad issue. A case study of public science concerns is used to demonstrate the method and assess the suitability of raw RSS feeds for broad issue scanning (i.e., without data cleansing). However, an attempt to identify genuine science-concern debates in the corpus by investigating the top 1,000 "burst" words found only two genuine debates. The low success rate was mainly caused by a few pathological feeds that dominated the results and obscured any significant debates. The results point to the need to develop effective data cleansing procedures for RSS feeds, particularly if there is not a large quantity of discussion about the broad issue, and a range of potential techniques is suggested. Finally, the analysis confirmed that the time series information generated by real-time monitoring of RSS feeds could usefully illustrate the evolution of new debates relevant to a broad issue. The authors present a model of information searching in thesaurus-enhanced search systems, intended as a reference model for system developers. The model focuses on user-system interaction and charts the specific stages of searching an indexed collection with a thesaurus. It was developed based on the literature, findings from empirical studies, and analysis of existing systems. The model describes in detail the entities, processes, and decisions involved in interacting with a search system augmented with a thesaurus. A basic search scenario illustrates this process through the model. 
Graphical and textual depictions of the model are complemented by a concise matrix representation for evaluation purposes. Potential problems at different stages of the search process are discussed, together with possibilities for system developers. The aim is to set out a framework of processes, decisions, and risks involved in thesaurus-based search, within which system developers can consider potential avenues for support. Information seeking behavior is an important form of human behavior. Past literature in information science and organizational studies has employed the cost-benefit framework to analyze seekers' information-source choice decision. Conflicting findings have been discovered with regard to the importance of source quality and source accessibility in seekers' choices. With a focus on interpersonal task information seeking, this study proposes a seeker-source-information need framework to understand the source choice decision. In this framework, task importance, as an attribute of information need, is introduced to moderate seekers' cost-benefit calculation. Our empirical study finds that in the context of interpersonal task information seeking, first, the least effort principle might not be adequate in explaining personal source choices; rather, a quality-driven perspective is more adequate, and cost factors are of much less importance. Second, the seeker-source relationship is not significant to source choices. Third, the nature of information need, especially task importance, can modify seekers' source choice decisions. Several previous studies have measured differences in the information search success of novices and experts. However, the definitions of novices and experts have varied greatly between the studies, and so have the measures used for search success. Instead of dividing the searchers into different groups based on their expertise, we chose to model search success with task completion speed, TCS. 
Towards this goal, 22 participants performed three fact-finding tasks and two broader tasks in an observational user study. In our model, there were two variables related to the Web experience of the participants. Other variables included, for example, the speed of query iteration, the length of the queries, the proportion of precise queries, and the speed of evaluating result documents. Our results showed that the variables related to Web experience had the expected effects on TCS. An increase in the years of Web use was related to improvement in TCS in the broader tasks, whereas less frequent Web use was related to a decrease in TCS in the fact-finding tasks. Other variables having significant effects on TCS in either of the task types were the speed of composing queries, the average number of query terms per query, the proportion of precise queries, and the participants' own evaluation of their search skills. In addition to the statistical models, we present several qualitative findings of the participants' search strategies. These results give valuable insight into successful strategies in Web search beyond the previous knowledge of expert-novice differences. The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content-based image retrieval (CBIR) that uses (a) a newly proposed similarity-preserving space transformation method to transform the original low-level image space into a high-level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of our retrieval system. This combination is unique and the resulting system provides synergistic advantages of using both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space.
This strategy makes this approach to retrieval generic as it can be applied to object types, other than images, and feature spaces more general than metric spaces. The CBIR approach uses the inexpensive "estimated" distance in the transformed space, as opposed to the computationally inefficient "real" distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color-based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (improvement of 1-2 orders of magnitude compared to retrieval approaches that either use pruning techniques like indexing, clustering, etc., or space transformation but not both) with sufficiently high retrieval accuracy. Underutilization of Web-based subscription databases and the importance of promoting them have been recognized in previous research. To determine the factors affecting user acceptance of Web-based subscription databases, this study tests an integrated model of the antecedents and consequences of user beliefs about intended use by extending the technology acceptance model. The research employs a cross-sectional field study using a Web survey method targeting undergraduate students who have experience with Web-based subscription databases. Overall, the research model performs well in explaining user acceptance of Web-based subscription databases. The effects of the cognitive instrumental determinants of usefulness perceptions are examined. Terminology clarity and accessibility were found to be important determinants for ease of use of the databases. 
The results indicate that user training has no impact on either perceptions of usefulness or ease of use, and that there is a need to reexamine the effectiveness of user training in the context of Web-based subscription databases. The results suggest that user acceptance of the databases depends largely on the utility they offer. The findings also suggest that although a subjective norm does not directly affect intended use, it exerts a positive influence on user beliefs about the utility of the databases. Context is one of the most important concepts in information seeking and retrieval research. However, the challenges of studying context are great; thus, it is more common for researchers to use context as a post hoc explanatory factor, rather than as a concept that drives inquiry. The purposes of this study were to develop a method for collecting data about information seeking context in natural online environments, and identify which aspects of context should be considered when studying online information seeking. The study is reported in two parts. In this, the first part, the background and method are presented. Results and implications of this research are presented in Part 2 (Kelly, in press). Part 1 discusses previous literature on information seeking context and behavior and situates the current work within this literature. This part further describes the naturalistic, longitudinal research design that was used to examine and measure the online information seeking contexts of users during a 14-week period. In this design, information seeking context was characterized by a user's self-identified tasks and topics, and several attributes of these, such as the length of time the user expected to work on a task and the user's familiarity with a topic. At weekly intervals, users evaluated the usefulness of the documents that they viewed, and classified these documents according to their tasks and topics. 
At the end of the study, users provided feedback about the study method. Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. One activity that consumes much of a scientist's time is developing models that balance contradictory and redundant evidence. Driven by our desire to understand the information behaviors of this important user group, and the behaviors of scientific discovery in general, we conducted an observational study of academic research scientists as they resolved different experimental results reported in the biomedical literature. This article is the first of two that reports our findings. In this article, we introduce the Collaborative Information Synthesis (CIS) model that reflects the salient information behaviors that we observed. The CIS model emerges from a rich collection of qualitative data including interviews, electronic recordings of meetings, meeting minutes, e-mail communications, and extraction worksheets. Our findings suggest that scientists provide two information constructs: a hypothesis projection and context information. They also engage in four critical tasks: retrieval, extraction, verification, and analysis. The findings also suggest that science is not an individual but rather a collaborative activity and that scientists use the results of one analysis to inform new analyses. In Part 2, we compare and contrast existing information and cognitive models that have inadvertently reported synthesis, and then provide five recommendations that will enable designers to build information systems that support the important synthesis activity. In May 2000, the Board of Directors of the American Society for Information Science (ASIS) changed the name of the association by adding the words "and Technology" (&T). Today this change may be considered minor, but for many involved at the time, it was a change that had purpose and meaning.
A study was initiated to trace the society's transition toward a professional association more inclusive of practitioners and applied research. The study was conducted in two stages, using quantitative and qualitative methods. Stage 1 compared the research published in the conference proceedings both prior to and following the society's name change. Stage 2 built upon the assumption that a professional association is a reflection of its membership. The Stage 1 results offer both an aggregate and a comparative view of the papers accepted for and presented at the annual conference. The results suggest that as an association of researchers, the society may be moving toward a renewed focus on practical and applied information. The Stage 2 results suggest that some ASIS&T members welcome the increased focus on technology, applied research, and the topics of interest to practitioners. Others view the changes as negative, perhaps even undermining the value of ASIS. The practical implications suggest that an annual conference can be an association's most important branding opportunity. It should reflect the interests of its membership as well as attract new members. A clear vision, supported by the membership, will guide the activities of the society and, in turn, will effectively serve its members' professional needs. In this article, I draw on interview data gathered in the High Energy Physics (HEP) community to address recent problems stemming from collaborative research activity that stretches the boundaries of the traditional scientific authorship model. While authorship historically has been attributed to individuals and small groups, thereby making it relatively easy to tell who made major contributions to the work, recent collaborations have involved hundreds or thousands of individuals.
Printing all of these names in the author list on articles can mean difficulties in discerning the nature or extent of individual contributions, which has significant implications for hiring and promotion procedures. This also can make collaborative research less attractive to scientists at the outset of a project. I discuss the issues that physicists are considering as they grapple with what it means to be "an author," in addition to suggesting that future work in this area draw on the emerging economics literature on "mechanism design" in considering how credit can be attributed in ways that both ensure proper attribution and induce scientists to put forth their best effort. Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment. This article explicates the common ground between two currently independent fields of inquiry, namely information arts and information science, and suggests a framework that could unite them as a single field of study. The article defines and clarifies the meaning of information art and presents an axiological framework that could be used to judge the value of works of information art. The axiological framework is applied to examples of works of information art to demonstrate its use. 
The article argues that both information arts and information science could be studied under a common framework; namely, the domain-analytic or sociocognitive approach. It also is argued that the unification of the two fields could help enhance the meaning and scope of both information science and information arts and therefore be beneficial to both fields. After modeling expert user needs with regard to intellectual property information, we analyze and compare the main providers in this specific information area (Thomson DIALOG, Esp@cenet by the European Patent Office, Questel-Orbit, and STN International) in terms of system content and system functionality. The key question is whether the main providers are able to satisfy these expert user needs. For patent information, some special retrieval features such as chemical structure search (including Markush search), patent family references and citations search, biosequence search, and basic informetric functionality such as ranking, mapping, and visualization of information flows are realized. Considering the results of information science research, the practice of patent information shows unexhausted improvement opportunities (e.g., the application of bibliographic patent coupling and co-patent-citation for mapping patents, patent assignees, and technology specialties). For trademark search, users need multiple truncated search (realized) as well as phonetic search and image retrieval (not realized yet). A measure of formal journal utility designed to offset some of the more noteworthy limitations of the impact factor (IF) - i.e., short follow-up, citations to items in the numerator that are not included in the denominator, self-citations, and the greater citation rate of review articles - was constructed and applied to 15 crime-psychology journals.
This measure, referred to as Citations Per Article (CPA), was correlated with a measure of informal journal utility defined as the frequency with which 58 first authors in the field consulted these 15 crime-psychology journals. Results indicated that the CPA, but not the IF, correlated significantly with informal utility. Two journals (Law and Human Behavior and Criminal Justice and Behavior) displayed consistently high impact across measures of formal and informal utility, while several other journals (Journal of Interpersonal Violence; Psychology, Public Policy, and Law; Sexual Abuse: A Journal of Research and Treatment; and Behavioral Sciences and the Law) showed signs of moderate impact when formal and informal measures were combined. The resource-based view suggests that organizations achieve and maintain competitive advantage through effective deployment of firm-specific resources and capabilities. Because of volatile market conditions, researchers now focus on the development of dynamic capabilities that allow firms to react and create change in these dynamic environments. Despite the growing acceptance of the dynamic capabilities perspective in information systems research, the process of how organizations develop capabilities to influence the overall process of strategy formation and implementation in a dynamic and volatile environment (e.g., the information communication technology industry) is still underexplored. To address the knowledge gap, this article draws on an in-depth case study of the capability development experience of a call center in strategic transformation from an in-house customer service department to an outsourced customer service provider. We use Montealegre's (2002) process model of capability development as our analytical framework and extend it beyond the organizational perspective to include a project-level (business unit) perspective.
By adopting a dual-level analysis, researchers and practitioners may obtain a more detailed and complete view of an organization's capability development, hence allaying criticism of the resource-based view as a vague and tautological concept. Four cases, illustrated by four examples, of duplicate or highly related publications can be distinguished and are analyzed here using citation data obtained from the Science Citation Index (SCI): (1) publication by different authors in the same journal; (2) the same author(s) publishing in different journals; (3) publication by different authors in different journals; (4) the same author(s) publishing highly related papers simultaneously in the same journal, often as part of a series of papers. Example 1, illustrating case 1, is an occurrence of highly related publications in mechanistic organic chemistry. Example 2, from analytical organic chemistry, contains elements of cases 2 and 3. Example 3, dealing solely with case 3, discusses two time-delayed publications from analytical biochemistry, which were highlighted by Garfield several times in the past to show how the SCI could be utilized to avoid duplicate publication. Example 4, derived from synthetic organic chemistry (total syntheses of taxol), contains elements of cases 1, 3, and 4 and, to a lesser extent, case 2. The citation records of the highly related or duplicate publications can deviate considerably from the journal impact factors; this was observed in three of the four examples relating to cases 2, 3, and 4. The examples suggest that citation of a paper may depend significantly on the journal in which it is published. As an indicator of this dependence, the journals in which the papers used in the present examples appeared were examined. Other factors such as key words in the paper title may also play a role.
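The deviation between a paper's citation record and its journal's impact factor, noted in the abstract above, can be expressed as a simple ratio of the paper's citation rate to the journal's average. The following sketch uses invented figures for illustration only, not data from the four examples:

```python
# Toy sketch: compare a paper's observed citation rate with the impact
# factor of the journal that published it. All figures are invented;
# none are taken from the SCI examples discussed in the text.

def citation_deviation(citations: int, years_since_publication: int,
                       journal_if: float) -> float:
    """Ratio of the paper's mean citations per year to the journal IF.

    A value well above 1 means the paper outperforms its journal's
    average; well below 1 means it underperforms.
    """
    if years_since_publication <= 0 or journal_if <= 0:
        raise ValueError("years and impact factor must be positive")
    return (citations / years_since_publication) / journal_if

# Hypothetical duplicate publications of the same work in two journals:
in_high_if_journal = citation_deviation(12, 4, journal_if=6.0)
in_low_if_journal = citation_deviation(12, 4, journal_if=1.5)
print(in_high_if_journal)  # 0.5: below the high-IF journal's average
print(in_low_if_journal)   # 2.0: above the low-IF journal's average
```

In this toy case the same citation record looks weak in the high-impact journal and strong in the low-impact one, which is the kind of journal-dependent deviation the abstract describes.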
The article presents the early findings of an exploratory deep log analysis of journal usage on OhioLINK, conducted as part of the MaxData project, funded by the U.S. Institute of Museum and Library Services. OhioLINK, the original "Big Deal," provides a single digital platform of nearly 6,000 full-text journals for more than 600,000 people; for the purposes of the analysis, the raw logs were obtained from OhioLINK for the period June 2004 to December 2004. During this period approximately 1,215,000 items were viewed on campus in October 2004 and 1,894,000 items were viewed off campus between June and December 2004. This article provides an analysis of the age of material that users consulted. From a methodological point of view OhioLINK offered an attractive platform to conduct age of publication usage studies because it is one of the oldest e-journal libraries and thus offered a relatively long archive and stable platform to conduct the studies. The project sought to determine whether the subject, the search approach adopted, and the type of journal item viewed (contents page, abstract, full-text article, etc.) were factors in regard to the age of articles used. This paper reviews the methods and findings of studies surveying inventors on nationally representative samples of patents or patent applications. These studies show that the most common inventor is a middle-aged man with a postgraduate qualification, with women representing only 0.4% to 3.5% of inventors. They demonstrate that 43% to 68% of granted patents become innovations (52% on average). Despite such findings, this body of work has only been cited 61 times in scientific journals. Thus, surveys of inventors provide good insights into the process of commercialising patents and yet are an underutilised method, especially within the literature on innovation.
Four hundred and twenty-eight articles published in 12 crime-psychology journals during the 2003 calendar year were reviewed for subsequent citations in the Social Science Citation Index (SSCI). Fifteen potential predictors were reduced to nine after subjecting the 15 variables to a principal components analysis with varimax rotation. The nine predictors included author characteristics - gender, occupational affiliation (academic-nonacademic), national affiliation (U.S.-other), and citations per 2001-2002 first-author publications; article characteristics - collaboration (single author-multiple author), article length, reviews, and subject matter (corrections/criminology-legal/forensic); and journal characteristics - journal impact. Negative binomial regression of the citations earned by these 428 journal articles in a 23 to 34 month follow-up (M = 28 months) revealed significant effects for citations per 2001-2002 first-author publications, national affiliation, and review articles. These results suggest that author impact may be a more powerful predictor of citations received by a journal article than the periodical in which the article appears. Based on the transform function model of the observed citing process, the analytical expression of the age distribution of citations is deduced, and it is theoretically proved that the peak value of the citation distribution curve falls and shifts backward as the average publication delay increases, and that the peak age is directly proportional to the pure delay and is prolonged by increasing the delay or decreasing the aging rate. The influence of the average publication delay on three ISI indicators - impact factor, immediacy index, and cited half-life - is studied; within one subject discipline, the bigger the delay, the lower the three indicators of journals.
Using sensitivity theory, sensitivity formulae of the three indicators with respect to publication delay parameters are deduced, and it is found that the responses of these indicators to changes in publication delays differ according to the time constant of the aging process: the faster the aging rate of a discipline's literature, the worse the influence of publication delays on the indicators of journals in that discipline. SciELO (Scientific Electronic Library Online, www.scielo.bireme.br) is a program aimed at offering a core of Brazilian scientific journals in open access mode on the Internet. This initiative has been followed by other Latin American, Caribbean, and Iberian countries. Along with the development of the open access electronic library, a complementary scientometric/bibliometric database has been set up which permits the retrieval of citation data for more than 40,000 articles. The robustness that this database has now achieved allows one to make important studies that were not possible before using only the international Institute for Scientific Information (ISI) database. China has made great progress in economy and science in the last two decades. Its scientific development in gastroenterology has been seldom reported. Using two authoritative bibliographic databases, Science Citation Index Expanded (SCI-E) and Medline, we analyze China's research output in gastroenterology journals from 1990 to 2004. After detailed analysis, we found that China has greatly advanced in gastroenterology research, but the growth of Chinese articles in gastroenterology journals can largely be attributed to the selection of China-based journals into international bibliographic databases. This paper suggests an international benchmarking method for disembodied knowledge flow structure. Using patent citation as a proxy measure of disembodied knowledge flow, a national knowledge network is developed.
A structural equivalence measure is applied to compare the knowledge networks of Korea and Taiwan with that of the USA. Static and dynamic comparison makes it possible to benchmark disembodied knowledge flow structure efficiently and to identify convergent and divergent industries between developing countries and the USA. It is also a mesostudy that could be conducive to building a comprehensive analytical framework of the national innovation system. Analysing co-authored publications has become the standard way to measure research collaborations. At the same time bibliometric researchers have advised that co-authorship-based indicators should be handled with care as a source of evidence on actual scientific collaboration. The aim of this study is to assess how well university-industry collaborations can be identified and described using co-authorship data. This is done through a comparison of co-authorship data with industrial funding to a medical university. In total 436 companies were identified through the two methods. Our results show that one third of the companies that have provided funding to the university had not co-authored any publications with the university. Further, the funding indicator identified only 16% of the companies that had co-authored publications. Thus, both co-authorship and funding indicators provide incomplete results. We also observe a case of conflicting trends between funding and co-authorship indicators. We conclude that uncritical use of the two indicators may lead to misinterpretation of the development of collaborations and thus provide incorrect data for decision-making. Peer reviews are highly valued in academic life, but are notoriously unreliable. A major problem is the substantial measurement error due to the idiosyncratic responses when large numbers of different assessors each evaluate only a single or a few submissions (e.g., journal articles, grants, etc.). To address this problem,
the main funding body of academic research in Australia conducted a trial "reader system" in which each of a small number of senior academics read all proposals within their subdiscipline. The traditional peer review process for 1996 (2,989 proposals, 6,233 assessors) resulted in unacceptably low reliabilities comparable with those found in other research (0.475 for research project, 0.572 for researcher). For proposals from psychology and education in 1997, the new reader system resulted in substantially higher reliabilities: 0.643 and 0.881, respectively. In comparison to the traditional peer review approach, the new reader system is substantially more reliable, timely, and cost efficient - and applicable to many peer review situations. In our analysis we have recalled the general results of recent studies on innovation, according to which innovation within the manufacturing industry is a complex phenomenon which does not lend itself to description or explanation utilising simplistic analytical models. We have then taken into account clues garnered from various descriptions of the innovative behaviour of companies, utilising several indicators of how innovative they are. Our results confirm the belief that notable differences exist between the two sub-sectors into which the chemical industry is divided: pharmaceutical and basic chemicals. Regarding the policy implications of our research, the close correlation between patents and basic research expenditure suggests that the Italian Fund for Basic Research might play a useful role in promoting innovation in the chemical industry. According to a widely used introductory chemistry text by T. E. Brown et al.(1), chemistry is 'The Central Science'. But scientometric co-citation analyses indicate that biochemistry seems presently to be more interconnected to other sciences. On the other hand, mathematics is considered by many to permeate all sciences and hence might compete as the choice for centrality.
A critical commentary and argument leads to a proposal for an alternative partially ordered hierarchical "framework" map of sciences. This argument is supplemented by a scientometric approach based on university course requirements for different curricula, so as to support our partially ordered map. This alternative "framework" mapping then is seen to indicate a special position for chemistry, as where significant branching begins. Tetrachloro-dibenzo-dioxins were declared human carcinogenic substances in 1997. Objective: to analyse the scientific production about tetrachloro-dibenzo-dioxins between 1976 and 2005. Solla Price and Bradford models were applied. Different aspects of papers were analysed. Impact factor of journals was studied. 3484 articles were found. The number of articles published each year is fitted to the Solla Price model. The dispersion of the scientific literature has been shown. Specialisation of some journals of the nucleus and the 1st Bradford zone has been shown. This paper discusses the relationship between Journal Impact Factors and the scientific community's judgment of the quality of journals in regional science, a discipline closely related to economics and geography. The paper compares the results of a survey inquiring into the quality of journals in the discipline with the impact factors of these journals for a total of five years. The comparison shows that no significant positive correlation between the impact factors and the peer judgments can be found. In many cases the correlation turns out to be negative - in some cases even significantly. The status of an actor in a social context is commonly defined in terms of two factors: the total number of endorsements the actor receives from other actors and the prestige of the endorsing actors. These two factors indicate the distinction between popularity and expert appreciation of the actor, respectively. We refer to the former as popularity and to the latter as prestige.
These notions of popularity and prestige also apply to the domain of scholarly assessment. The ISI Impact Factor (ISI IF) is defined as the mean number of citations a journal receives over a 2-year period. By merely counting the number of citations and disregarding the prestige of the citing journals, the ISI IF is a metric of popularity, not of prestige. We demonstrate how a weighted version of the popular PageRank algorithm can be used to obtain a metric that reflects prestige. We contrast the rankings of journals according to their ISI IF and their weighted PageRank, and we provide an analysis that reveals both significant overlaps and differences. Furthermore, we introduce the Y-factor, which is a simple combination of both the ISI IF and the weighted PageRank, and find that the resulting journal rankings correspond well to a general understanding of journal status. This paper estimates the long-term impact of journals aggregated in 24 different fields, using a simple logistic diffusion model, and relates the results to the current impact factor. Results show that while the current and the long-term impact factors have a high correlation coefficient, some fields are systematically slower-moving than others, as they often differ in the proportion of the overall impact through time that occurs in the short term. This paper describes an approach for improving the data quality of corporate sources when databases are used for bibliometric purposes. Research management relies on bibliographic databases and citation index systems as analytical tools, yet the raw resources for bibliometric studies are plagued by a lack of consistency in field formatting for institution data. The present contribution puts forth a Natural Language Processing (NLP)-oriented method for the identification of the structures guiding corporate data and their mapping into a standardized format.
The proposed unification process is based on the definition of address patterns and the ensuing application of Enhanced Finite-State Transducers (E-FST). Our procedure was tested on address formats downloaded from INSPEC, MEDLINE and CAB Abstracts. The results demonstrate the helpfulness of the method, provided that close control of errors is exercised over the formats to be unified. The computational efficacy of the model is noteworthy, due to the fact that it is firmly guided by the definition of data in the application domain. Patent citation counts represent an aspect of patent quality and knowledge flow. In particular, the citation data of US patents contain some of the most valuable information among patents. This paper identifies the factors affecting patent citation counts using US patents belonging to the Korea Institute of Science and Technology (KIST). For the patent citation count model, zero-inflated models are employed to handle the excess zero data. As explanatory factors, research team characteristics, invention-specific characteristics, and geographical-domain-related characteristics are suggested. The results show that the size of the invention and the degree of dependence upon the Japanese technological domain significantly affect the patent citation counts of KIST. An evaluation of the performance of the Spanish CSIC in Biotechnology, as compared with those of the French CNRS and the Italian CNR, has been carried out to determine the balance between the generation of scientific knowledge and the transfer of technology. This study shows a high scientific productivity mostly in journals with moderate impact factor, a low generation of patents and an insufficient transfer of knowledge to Spanish companies. Other indicators confirm the existence of competitive human resources in biotechnological research that produce scientific knowledge of interest for the development of patents and that cooperate successfully at the European level. 
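The zero-inflated count models used for the KIST patent citations above can be illustrated with a zero-inflated Poisson, which mixes a point mass at zero with an ordinary Poisson to accommodate the excess zeros typical of citation data. A minimal sketch with hypothetical citation counts and illustrative (not fitted) parameter values:

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it is drawn from a Poisson(lam)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

# Hypothetical patent-citation counts with many zeros.
counts = [0, 0, 0, 0, 0, 1, 0, 2, 0, 7]
lam, pi = 2.0, 0.6   # illustrative parameter values, not fitted

loglik = sum(math.log(zip_pmf(k, lam, pi)) for k in counts)
print(round(loglik, 3))
```

The inflation probability pi gives zero counts far more mass than a plain Poisson would, which is why such models fit citation data with many never-cited patents better.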
The purpose of this study is to analyze the hypothetical changes in the 2002 impact factor (IF) of the biomedical journals included in the Science Citation Index-Journal Citation Reports (SCI-JCR) by also taking into account cites coming from 83 non-indexed Spanish journals on different medical specialties. A further goal of the study is to identify the subject categories of the SCI-JCR with the largest increase in their IF, and to estimate the 2002 hypothetical impact factor (2002 HIF) of these 83 non-indexed Spanish journals. It is demonstrated that the inclusion of cites from a selection of non-SCI-JCR-indexed Spanish medical journals in the SCI-JCR-indexed journals produces a slight increase in their 2002 IF, especially in journals edited in the USA and the UK. More than half of the non-indexed Spanish journals have a higher 2002 HIF than that of the SCI-JCR-indexed journal with the lowest IF in the same subject category. Chinese herbal medicine has recently become a hot research field internationally; an increasing number of pharmaceutical researchers and scientists have dedicated themselves to such research work. Based on papers in the American Journal of Chinese Medicine from 2002 to 2004, 60% of papers published in the journal were sponsored by different institutions in the authors' countries. This fact indicates that researchers receive sponsorship for their work, and sponsors should pay more attention to ensuring that researchers use financial support more efficiently. This study applied the Analytic Hierarchy Process (AHP) to evaluating the performance of sponsored Chinese herbal medicine research; this method can help sponsors weight evaluation elements without having to change the system for every category of research. To explain the process and application of AHP, a Taiwanese case study is presented. The analytical results presented in this study provide a reference for institutes supporting research on Chinese herbal medicine. 
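The AHP step applied to the sponsored-research evaluation above can be sketched: priority weights for the evaluation elements are obtained from the principal eigenvector of a pairwise comparison matrix, and a consistency index checks the judgments. A minimal sketch with a hypothetical comparison matrix on the Saaty scale:

```python
import numpy as np

# Hypothetical pairwise comparison matrix for three evaluation criteria
# (Saaty scale: A[i, j] = how strongly criterion i is preferred over j).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# AHP priority weights: principal eigenvector of A, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# Consistency index: how far the principal eigenvalue exceeds the matrix size.
lambda_max = np.real(eigvals).max()
n = A.shape[0]
consistency_index = (lambda_max - n) / (n - 1)
print(weights, round(consistency_index, 4))
```

A near-zero consistency index indicates the pairwise judgments are close to transitive; in practice a ratio against a random-matrix baseline below 0.1 is the usual acceptance threshold.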
This paper analyses some of the methodologies and R&D and innovation indicators used to measure Regional Innovative Capacity in Spain for the period 1996-2000. The results suggest that the approaches examined are not sufficiently rigorous; they vary depending on the methodology and indicators employed. Therefore, we would suggest that the right balance between quantitative and qualitative approaches could produce a better evaluation of innovation system performance, which would be more useful to policy makers and other stakeholders. In this paper we compare scientific research in the semiconductor-related field in China with that of some other major nations in Asia, based on bibliometric information from the SCI-Expanded database for the period 1995-2004. We show that China has been developing fast in semiconductor research and has become the second most productive country in Asia as reflected by its publication profile. The evidence indicates a significantly increasing trend in research efforts and readership among Asian countries. Like scientists in Japan and South Korea, Chinese scientists were more inclined to work in larger groups, typically of 4 or more authors. The assessment of research quality is further conducted using citation-based measures. As benchmarks, two western countries, namely the USA and Germany, have been included in the citation analysis. It is revealed that the impact of research outputs in the Asian countries, except for Japan, has been badly incommensurate with their research efforts compared with the USA and Germany. Like those of most other Asian countries, the research results of Chinese scientists in semiconductors have a low international visibility despite their strong research efforts and increasingly large domestic readership. The application of the Leimkuhler curve vividly illustrates the inequality of citation counts among the compared countries. 
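The citation inequality summarized by Leimkuhler-type cumulative curves and by Gini indices can be sketched directly: the Gini index measures the area between the cumulative-share curve and the line of perfect equality. A minimal sketch with hypothetical per-paper citation counts for two countries:

```python
import numpy as np

def gini(x):
    """Gini index of inequality for non-negative counts (0 = perfect
    equality, values near 1 = extreme concentration)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    # Standard formula derived from the Lorenz (cumulative-share) curve.
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

# Hypothetical citation counts per paper for two countries.
country_a = [0, 0, 1, 2, 3, 5, 8, 40]   # highly skewed distribution
country_b = [2, 3, 3, 4, 4, 5, 5, 6]    # fairly even distribution

print(gini(country_a), gini(country_b))
```

A country whose citations are concentrated in a handful of papers shows a sharply bowed cumulative curve and a high Gini index, as in `country_a` above.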
Furthermore, the Gini indices of each country and each pair of countries are calculated, which again illustrate the inequality of informetric productivities. Motivated by concerns about the organizational and institutional conditions that foster research creativity in science, we focus on how creative research can be defined, operationalized, and empirically identified. A functional typology of research creativity is proposed encompassing theoretical, methodological and empirical developments in science. We then apply this typology through a process of creative research event identification in the fields of nanotechnology and human genetics in Europe and the United States, combining nominations made by several hundred experts with data on prize winners. Characteristics of creative research in the two respective fields are analyzed, and there is a discussion of broader insights offered by our approach. Context. The use of citation frequency and impact factor as measures of research quality and journal prestige is being criticized. Citation frequency is augmented by self-citation, and for most journals the majority of citations originate from a minority of papers. We hypothesized that citation frequency is also associated with the geographical origin of the research publication. Objective. We determined whether citations originate more frequently from institutes that are located in the same country as the authors of the cited publication than would be expected by chance. Design. We screened citations referring to 1200 cardiovascular publications in the 7 years following their publication. For the 1200 citation-recipient publications we documented the country where the research originated (9 countries/regions) and the total number of received citations. For a selection of 8864 citation-donor papers we registered the country/region where the citing paper originated. Results. Self-citation was common in cardiovascular journals (n = 1534, 17.8%). 
After exclusion of self-citation, however, the number of citations that originated from the same country as the author of the citation recipient was found to be on average 31.6% higher than would be expected by chance (p < 0.01 for all countries/regions). In absolute numbers, nation-oriented citation bias was most pronounced in the USA, the country with the largest research output (p < 0.001). Conclusion. Citation frequency was significantly augmented by nation-oriented citation bias. This nation-oriented citation behaviour seems to mainly influence the cumulative citation number for papers originating from the countries with a larger research output. This article calculates probabilities for the occurrence of different types of papers, such as genius papers, basic papers, ordinary papers or insignificant papers. The basis of these calculations are the formulae for the cumulative n-th citation distribution, being the cumulative distribution of the times at which articles receive their n-th (n = 1, 2, 3, ...) citation. These formulae (proved in previous papers) are extended to allow for different aging rates of the papers. These new results are then used to define different importance classes of papers according to the different values of n, as a function of time t. Examples are given for a classification into four parts: genius papers, basic papers, ordinary papers and (almost) insignificant papers. The fact that, in these examples, the size of each class is inversely related to the importance of the journals in this class is proved in a general mathematical context in which we have an arbitrary number of classes and where the threshold values of n in each class are defined according to the natural law of Weber-Fechner. The aim of this study is to reveal the research growth, the distribution of research productivity and the impact of genetic engineering research in Japan, Korea and Taiwan by taking a patent bibliometrics approach. 
This study uses quantitative methods adopted from bibliometrics to analyze the patents granted to Japan, Korea and Taiwan by the United States Patent and Trademark Office (USPTO) from 1991 to 2002. In addition to patent and citation counts, Bradford's Law is applied to identify core assignees in genetic engineering. A patent coupling approach is taken to further analyze the patents granted to the core assignees in order to disclose the correlations among them. 13,055 genetic engineering patents were granted during the period 1991 to 2002. Japan, Korea and Taiwan own 841 patents, and Japan owns most of them. 270 assignees shared the 841 patents, and 16 core assignees are identified by Bradford's Law. 18,490 patents were cited by the 13,055 patents, and 1,146 of the 18,490 cited patents were granted to Japan, Korea and Taiwan. The results show that Japan performs best in productivity and research impact among the three countries. The core assignees are also Japan-based institutions, and four technical clusters are identified by patent coupling. It is suggested that h-indices themselves may form the basis of a series of h-indices at successively higher levels of aggregation. The concept of successive h-indices may usefully contribute to developing a coherent frame for multi-level assessments. We distinguish between an internal differentiation of science and technology that focuses on instrumentalities and an external differentiation in terms of the relations of the knowledge production process to other social domains, notably governance and industry. The external contexts bring into play indicators and statistical techniques other than publications, patents, and citations. Using regression analysis, for example, one can examine the importance of knowledge and knowledge spill-over for economic development. The relations can be expected to vary among nations and regions. The field-specificity of changes is emphasized as a major driver of the research agenda. 
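The successive h-indices suggested above can be sketched directly: an individual's h-index is computed from per-paper citation counts, and an institution's h-index is then computed from its researchers' h-indices, and so on up the aggregation hierarchy. A minimal sketch with hypothetical data:

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    vals = sorted(values, reverse=True)
    h = 0
    for i, v in enumerate(vals, start=1):
        if v >= i:
            h = i
        else:
            break
    return h

# Hypothetical per-paper citation counts for three researchers.
researchers = {
    "r1": [10, 8, 5, 4, 3, 0],
    "r2": [25, 2, 1],
    "r3": [6, 6, 6, 6, 1],
}

individual_h = {name: h_index(c) for name, c in researchers.items()}
# Successive h-index at the next level: the h-index of the researchers' h-indices.
institutional_h = h_index(list(individual_h.values()))
print(individual_h, institutional_h)
```

The same function applies unchanged at each level, which is what makes the scheme coherent across multi-level assessments.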
In a knowledge-based economy, institutional arrangements can be considered as support structures for cognitive developments. By tracing the flows of patent citations to prior patents and scientific journal articles, we investigate the sources of knowledge for innovation output in Singapore, a small, highly open economy that has traditionally been significantly dependent on foreign multinational corporations (MNCs). We found that the local production of new knowledge by indigenous Singaporean firms depends disproportionately on technological knowledge produced by MNCs with an operational presence in Singapore and scientific knowledge generated by foreign universities. Locally produced new knowledge by indigenous firms and local universities/public research institutes constitutes an as yet insignificant, albeit growing, source for innovation in Singapore. The main objective of this contribution is to test whether university patents share common determinants with university publications at the regional level. We build university production functions with 1,519 patents and 180,239 publications for the 17 Spanish autonomous regions (NUTS-2) over a time span of 14 years (1988-2001). We use econometric models to estimate their determinants. Our results suggest that there is little scope for regional policy to compensate the production of patents vs. publications through different university or joint research institutional settings. On the contrary, while patents are more reactive to expenditure on R&D, publications are more responsive to the number of researchers, so the sustained promotion of both will make their joint production compatible for regions. However, standing out in the generation of both outputs requires costly investment in various inputs. Eight Eastern European countries joined the European Union in 2004. 
In this paper, bibliometric methods are used to analyse whether the integration of these countries into the EU was accompanied by corresponding changes in their sectoral research profiles. In addition, the authors discuss changes in the national profiles of three accession countries and three EU15 member states during the last two decades. The results confirm that a process of European homogenisation and convergence is taking place, but also show that this process is slow and that member countries have maintained their individual peculiarities and preferences during this evolution. The study investigated industrial interactions in science and 'applied science' departments of seven universities in India. Motivating factors and constraints perceived by university departments and the role of the government in initiating and sustaining interactions were examined. Different types of interactions with industry were exhibited in the seven selected universities. Some specific initiatives, such as the creation of special centers to facilitate interaction with industry, were observed in the majority of the selected universities. Personal contact was indicated as the major motivator in the initiation of linkages. The government has taken some important initiatives to strengthen the university-industry link. The study points to the need to develop further linkages so that they can lead to successful and mutually beneficial outcomes for both university and industry. This paper addresses four questions: What is the extent of the collaboration between natural sciences and engineering researchers in Canadian universities and government agencies and industry? What are the determinants of this collaboration? Which factors explain the barriers to collaboration between university, industry and government? Are there similarities and differences between the factors that explain collaboration and the barriers to collaboration? 
Based on a survey of 1554 researchers funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), the results of the multivariate regressions indicate that various factors explain the decision of whether or not to collaborate with industry and the government. The results also differed according to the fields studied. Overall, the results show that the variables that relate to the researcher's strategic positioning, to the set-up of strategic networks, and to the costs related to the production of the transferred knowledge and to transactions explain in large part the researcher's collaboration. The results of the linear regression pointed to various factors that affect researchers' collaboration: research budget, university localization, radicalness of research, degree of risk-taking culture and the researcher's publications. Finally, the last part of the paper presents the results and what they imply for future research and theory building. This paper reports results from a survey of 208 Italian faculty members, inventors of university-owned patents, on their motivation to get involved in university patenting activities, the obstacles that they faced, and their suggestions to foster the commercialization of academic knowledge through patents. Findings show that respondents get involved in patenting activities to enhance their prestige and reputation and to look for new stimuli for their research; personal earnings do not represent a main incentive. University-level patent regulations reduce the obstacles perceived by inventors, insofar as they signal universities' commitment to legitimizing patenting activities. Implications for innovation policies are discussed. This paper addresses scientists' behaviour regarding the patenting of knowledge produced in universities and other public sector research organisations (PSROs). Recent years have witnessed a rapid growth in patenting and licensing activities by PSROs. 
We argue that the whole process depends to a certain extent on scientists' willingness to disclose their inventions. Given this assumption, we conduct research into individual behaviour in order to understand scientists' views concerning the patenting of their research results. Data from a questionnaire survey of Portuguese researchers from nine PSROs in the life sciences and biotechnology are presented and analysed, complemented with in-depth interviews. The results reveal that overall the scientists surveyed show a low propensity to become involved in patenting and licensing activities, despite the fact that the majority had no "ethical" objections to the disclosure of their inventions or to their commercial exploitation. Perceptions about the impacts of these activities on certain fundamental aspects of knowledge production and dissemination are, however, divergent. This may account for the low participation levels. Furthermore, most scientists perceived the personal benefits deriving from this type of activity to be low. Similarly, the majority also believed that there are many difficulties associated with the patenting process and that they receive limited support from their organisations, which lack the proper competences and structures to assist with patenting and licensing. In this pilot study we examine the performance of text-based profiling in recovering a set of validated inventor-author links. In a first step we match patents and publications solely on the basis of their similarity in content. Next, we compare inventor and author names on the highest-ranked matches for the occurrence of name matches. Finally, we compare these candidate matches with the names listed in a validated set of inventor-author names. Our text-based profiling methodology performs significantly better than a random matching of patents and publications, suggesting that text-based profiling is a valuable complementary tool to the name searches used in previous studies. 
As the commercialization of academic research has risen as a target area in many countries, the need for better empirical data collection to evaluate policy changes on this front has increasingly been recognized. This need is exemplified in the Norwegian case, where legislative changes went into effect in 2003 expressly to encourage greater commercialization through the patenting of research results. This policy ambition faces the problem that no record of the patenting activity of academic researchers is available before 2003, when the country's "professor's privilege" was phased out. This article addresses the fundamental difficulty of how to empirically test the effect of such policy aims. It develops a methodology which can be used to reliably baseline changes in the extent and focus of academic patents. The purpose is to describe the empirical approach and results, while also providing insight into the changes in Norwegian policy on this front and their context. Third-stream activities have become increasingly important in the UK. However, valuing them in a meaningful way still poses a challenge to science and technology analysts and policy makers alike. This paper reviews the general literature on "patent value" and assesses the extent to which these established measures, including patent citation, patent family, renewal and litigation data, can be applied to the university context. Our study examines indicators of patent value for short- and mid-term evaluation purposes, rather than indicators that suffer from long time lags. We also explore the extent to which differences in IP management practices at universities may have an impact on the validity and robustness of possible indicators. Our observations from four UK universities indicate that there are considerable differences between universities in how they approach the IP management process, which in turn has implications for valuing patents and how they track activity in this area. 
In their current form, the data collected by universities are not sufficiently robust to serve as the basis for evaluation or resource allocation. In this paper we investigate, at a country level, the relationship between the science intensity of patents and technological productivity, taking into account differences in terms of scientific productivity. The number of non-patent references in patents is considered as an approximation of the science intensity of technology, whereas a country's technological and scientific performance is measured in terms of productivity (i.e., number of patents and publications per capita). We use USPTO patent data pertaining to biotechnology for 20 countries covering the time period 1992-1999. Our findings reveal mutual positive relationships between scientific and technological productivity for the respective countries involved. At the same time, technological productivity is associated positively with the science intensity of patents. These results are confirmed when introducing time effects. These observations corroborate the construct validity of science intensity as a distinctive indicator and suggest its usefulness for assessing science and technology dynamics. Innovation in medicine is a complex process that unfolds unevenly in time and space. It is characterised by radical uncertainty and emerges from innovation systems that can hardly be comprehended within geographical, technological or institutional boundaries. These systems are instead highly distributed across countries, competences and organisations. This paper explores the nature, rate and direction of the growth and transformation of medical knowledge in two specific areas of research, interventional cardiology and glaucoma. 
We analyse two large datasets of bibliometric information extracted from ISI and adopt an empirical network approach to try to uncover the fine structure of the relevant micro-innovation systems and the mechanisms through which these evolve along trajectories of change shaped by the search for solutions to interdependent problems. This study explores boundary-crossing networks in fuel-cell science and technology. We use the case of Norwegian fuel cell and related hydrogen research to explore techno-science networks. Standard bibliometric and patent indicators are presented. Then we explore different types of network maps: maps based on co-authorship, co-patenting and co-activity data. Different network configurations occur for each type of map. Actors reach different levels of prominence in the different maps, but most of them are active in both science and technology. This illustrates that to appreciate fully the range of science-technology interplay, all three analyses need to be taken into account. Similarities between the methods used in webometrics and those of scientometrics or informetrics are evident from the literature. Are there also similarities between scientometric and Web indicators of collaboration for possible use in technology policy making? Usually, the bibliometric method used to study collaboration is the investigation of co-authorships. In this paper, Web hyperlinks and Web visibility indicators are examined to establish their usefulness as indicators of collaboration and to explore whether similarities exist between Web-based structures and bibliographic structures. Three empirical studies of collaboration between institutions and individual scientists show that hyperlink structures on the Web do not reflect the collaboration structures found in bibliographic data. However, Web visibility indicators of collaboration are different from hyperlinks and can be successfully used as Web indicators of collaboration. 
This article reports the findings of a scientometric analysis of nanoscale research in South Africa during the period 2000-2005. The ISI databases were identified as the most appropriate information platform for the objectives of the investigation and were interrogated to identify South African authors publishing in the field. The article identifies trends over time, major institutional contributors, journals in which South African authors publish their research, international collaborators, and performance in comparison to four comparator countries (India, Brazil, South Korea and Australia). The major findings of the investigation are as follows: nanoscale research in South Africa has to date been driven by individual researchers' interests and is in its early stages of development; the country's nanoscale research output is below what one would expect in light of its overall publication output; and the country's nano-research is distributed across a number of universities with subcritical concentrations of researchers. Nanoscience and technology (NST) is a young scientific and technological field that has generated great worldwide interest in the past two decades. Previous bibliometric analyses have unmistakably demonstrated the remarkable growth of the global NST literature. While almost all published research articles in NST are in English, an increasingly larger share of NST publications is published in the Chinese language. Perplexingly, Chinese is the only language, apart from English, that displays an ascendant trend in the NST literature. In this brief note, we explore and evaluate three arguments that could explain this phenomenon: coverage bias, language preference, and community formation. Text mining was used to extract technical intelligence from the open-source global nanotechnology and nanoscience research literature. 
An extensive nanotechnology/nanoscience-focused query was applied to the Science Citation Index/Social Science Citation Index (SCI/SSCI) databases. The nanotechnology/nanoscience research literature infrastructure (prolific authors, key journals/institutions/countries, most cited authors/journals/documents) was obtained using bibliometrics. A novel addition was the use of institution and country auto-correlation maps to show co-publishing networks among institutions and among countries, and the use of institution-phrase and country-phrase cross-correlation maps to show institution networks and country networks based on the use of common terminology (a proxy for common interests). The use of factor matrices further quantified the strength of the linkages among institutions and among countries, and validated the co-publishing networks shown graphically on the maps. This article explores the emergence of knowledge from scientific discoveries and its effects on the structure of scientific communication. Network analysis is applied to understand this emergence institutionally as changes in the journals; semantically as changes in the codification of meaning in terms of words; and cognitively as the new knowledge becomes the emergent foundation of further developments. The discovery of fullerenes in 1985 is analyzed as the scientific discovery that triggered a process which led to research in nanotubes. Nanotechnology has been presented in the policy discourse as an intrinsically interdisciplinary field, requiring collaborations among researchers with different backgrounds and specific funding schemes supporting knowledge-integration activities. Early bibliometric studies supported this interdisciplinary vision (MEYER & PERSSON, 1998), but recent results suggest that nanotechnology is as yet a mixed bag with various mono-disciplinary subfields (SCHUMMER, 2004). 
We have reexamined the issue at the research project level, carrying out five case studies in molecular motors, a specialty of bionanotechnology. Relying both on data from interviews and on bibliometric indicators, we have developed a multidimensional analysis (SANZ-MENENDEZ et al., 2001) in order to explore the extent and types of cross-disciplinary practices in each project. We have found that there is a consistently high degree of cross-disciplinarity in the cognitive practices of research (i.e., use of references and instrumentalities) but a more erratic and narrower degree in the social dimensions (i.e., affiliation and researchers' backgrounds). This suggests that cross-disciplinarity is an eminently epistemic characteristic and that bibliometric indicators based on citations and references capture the generation of cross-disciplinary knowledge more accurately than approaches tracking co-authors' disciplinary affiliations. In the light of these findings we raise the question whether policies focusing on formal collaborations between laboratories are the most appropriate to facilitate cross-disciplinary knowledge acquisition and generation. Activities in nanoscale research have seen skyrocketing growth beginning in the nineties. This can be documented by the birth of no fewer than 16 science journals dedicated entirely to this field of science. The topics of these journals reflect the truly interdisciplinary character of nanoscale research. In this paper the gatekeepers who decide what appears in those journals and when, i.e., the editorial board members of those journals, and their national identities are analyzed, and some conclusions are drawn on the decisional power of the countries in which these gatekeepers are located. 
It turns out that although the United States is still the leading power in the nanoscale research field, the EU is catching up strongly, and owing to intensive efforts in this direction by Far East countries such as China and Japan, but also by India, Asia is nearing and in some cases even overtaking the big powers. Based on bibliometric methods, this paper describes the global institutionalization of nanotechnology research from the mid-1980s to 2006. Owing to its extremely strong dynamics, the institutionalization of nanotechnology is likely to surpass that of major disciplines in only a few years. A breakdown of the relative institutionalization strengths by the main geographical regions, countries, research sectors, disciplines, and institutional types provides a very diverse picture over the time period because of different national science policies. The results allow a critical assessment of the different science policies based on the relative institutionalization strengths, as well as the conclusion that the institutionalization process has run beyond the control of the individual governments that once induced the development. The Journal Citation Reports of the Science Citation Index 2004 were used to delineate a core set of nanotechnology journals and a nanotechnology-relevant set. In comparison with 2003, the core set has grown and the relevant set has decreased. This suggests a higher degree of codification in the field of nanotechnology: the field has become more focused in terms of citation practices. Using the citing patterns among journals at the aggregate level, a core group of ten nanotechnology journals in the vector space can be delineated on the criterion of betweenness centrality. National contributions to this core group of journals are evaluated for the years 2003, 2004, and 2005. Additionally, the specific class of nanotechnology patents in the database of the U.S. 
Patent and Trademark Office (USPTO) is analyzed to determine whether non-patent literature references can be used as a source for the delineation of the knowledge base in terms of scientific journals. The references are primarily to general science journals and letters, and are therefore not specific enough for the purpose of delineating a journal set. Nanotechnology patenting has grown rapidly in recent years as an increasing number of countries join the global nanotechnology race. Using a refined methodology to identify and classify nanotechnology patents, this paper analyses the changing pattern of internationalization of nanotechnology patenting activities from 1976 to 2004. We show that the dominance of the G5 countries has declined in recent years, not only in terms of quantity, but also in terms of quality as measured by citation indicators. In addition, using a new approach to classifying the intended areas of commercial application, we show that nanotechnology patenting initially emphasized instrumentation but has exhibited greater diversification into other application areas in recent years. Significant differences in application-area specialization are also found among major nanotechnology nations. Moreover, universities are found to play a significant and increasing role in patenting, particularly in the US, UK and Canada. Nanotechnology is expected to have a major impact on the world economy because its applications will be used in virtually all sectors. Scientists, researchers, managers, investors and policy makers worldwide acknowledge this huge potential and have started the nano-race. The purpose of this paper is to analyse the state of the art of nanotechnology from an economic perspective, by presenting data on markets, funding, companies, patents and publications. It also raises the question of how much of the nano-hype is founded on economic data and how much is based on wishful thinking. 
It focuses on a comparison between world regions, concentrating on Europe and the European Union in relation to their main competitors, the United States and Japan, and the emerging 'nano-powers' China and Russia. There is general consensus that the field of nanotechnology will be very important in the future. An open question, however, is which technological approaches or paradigms will be important in the field. The paper assumes that the carbon nanotube will be a key element of an emerging technological paradigm in nanotechnology. This study employs a bibliometric method, bibliographic coupling, to identify important nanotube-related 'leitbilder', a concept meaning 'guiding images' that provide a basis for different professions and disciplines to work in the same direction. Until recently, bibliographic coupling has rarely been applied for purposes of research evaluation, not to mention technology foresight. Our case study suggests that bibliographic coupling is particularly suitable for anticipating technological breakthroughs. Bibliographic coupling analysis of recent nanotube-related patents focused our attention on recent patents owned by Nantero Inc. Nantero's main focus is the development of NRAM, a high-density nonvolatile random access memory. The NRAM leitbild seems to be an important emerging leitbild. It connects technical opportunities and promising applications relating to memories in devices such as cell phones, MP3 players, and digital cameras, as well as applications in the networking arena. This contribution formulates a number of propositions about the emergence of novel nanoscience and nanotechnology (N&N). Seeking to complement recent work that aims to define a research agenda and draws on general insights from the innovation literature, this paper aims to synthesize knowledge from innovation-related studies of the N&N field. 
More specifically, it is suggested that N&N is often misconstrued as either a field of technology or an area of (broadly) converging technologies, while the evidence to date suggests rather that N&N be considered a set of inter-related and overlapping but not necessarily merging technologies. The role of instrumentation in connecting the various N&N fields is underlined. Finally, the question is raised whether change in N&N tends to be incremental rather than discontinuous, being the result of technological path-dependencies and lock-ins in industry-typical search regimes that are only slowly giving way to more boundary-crossing activities. While some believe that publication and citation scores are key predictors of breakthroughs in science, others claim that people who work at the intersection of scientific communities are more likely to be adept at selecting and synthesizing alternatives into novel ideas. This paper contributes to this controversy by presenting a longitudinal comparison of highly creative scientists with equally productive researchers. The sample of creative scientists is identified by combining information on science awards and nominations by international peers covering research accomplishments in the mid-1990s. Results suggest that it is not only the sheer quantity of publications that causes scientists to produce creative pieces of work. Rather, their ability to communicate effectively with otherwise disconnected peers and to address a broader work spectrum also enhances their chances of being widely cited and of developing novel ideas. The past 10 years have seen an explosion of interest in the area of science and technology labelled "nanotechnology." Although at an early stage, nanotechnology is providing a space for the creation of new alliances and the forging of new ties in many actor arenas, initiated on the basis of promises and high expectations of the fruits that could be harvested from development of and investment in nanotechnology. 
Those trying to characterise the dynamics of emerging ties and networks within this field face a number of complexities characteristic of the nanotechnology umbrella term, which covers many technologies, various mixes of disciplines and actors, and ongoing debates about definitions of fields and terminology. In this paper we explore an approach for capturing the dynamics of emergence of a particular area of nanotechnology by investigating visions of possible futures in relation to molecular mechanical systems (molecular machines). The focus of this text is to outline an approach used to map and analyse visions in an emerging field by taking as the unit of analysis the linkages made in statements in texts, and the agglomeration of linkages around certain nodes. Taking the linkage, rather than the node, allows one to probe deeper into the dynamics of emergence at early stages, when the definitions and meanings of certain words/nodes are in flux and patterns of their use change dramatically over short periods of time. As part of a larger project on single and macromolecular machines we explore the dynamics of visions in the field of molecular machines, with the eventual aim of elucidating the shaping strength of visions within nanotechnology. This article presents a citation-based mapping exercise in the nanosciences field and a first sketch of citation transactions (a measure of cognitive dependences). Nanosciences are considered to be one of the "convergent" components shaping the future of science and technology. Recurrent questions about the structure of the field concern its diversity and multi- or inter-disciplinarity. Observations made from various points of view confirm a strong differentiation of the field, which is scattered across multiple galaxies with a moderate level of exchanges. The multi-disciplinarity of themes and super-themes detected by mapping also appears moderate, most of the super-themes being based on physics and chemistry in various proportions. 
Structural analysis of the lists of references in articles suggests that the moderate multi-disciplinarity observed at the aggregate level partly stems from actual inter-disciplinarity at the article level. Using a new fractal/transfractal geometry of the Unified Scientometric Model, it is possible to demonstrate that science exhibits an oscillating or pulsing dynamic. It alternates between two types of phases. Some phases are fractal, with crystalline networks, where the Matthew effect clearly manifests itself with regard to the most notable actors and those that provide the best contributions. The other phases are transfractal, with deformed, amorphous networks, in which actors considered mediocre show a greater capacity to restructure the network than the more renowned actors. The result after any transfractal deformation is a new crystalline fractal network. Behind this vision lie Kuhn's paradigms. As examples, the scientific fields of surfactants and autism have been analysed. The paper analyses the citations to 1733 publications published during 1970-1999 by the Chemistry Division at Bhabha Atomic Research Centre, using the Science Citation Index 1982-2003 as the source data. The extent of citations received is determined in terms of the number of citations per paper, year-wise breakup of citations, domain-wise citations, self-citations and citations by others, diachronous self-citation rate, citing authors, citing institutions, highly cited papers, the categories of citing documents, citing journals, and the distribution of citations among them. During 1982-2003 the Chemistry Division publications received a total of 11041 citations. The average number of citations per year was 501.86. The average number of citations per publication was 6.37. The highest number of citations received in a single year was 877, in 2001. The citation rate peaked during 1990-2003, when a maximum of 9145 (82.82%) citations were received. 
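The aggregate averages and the peak-period share reported above follow from simple arithmetic; a minimal sketch reproducing them from the counts given in the text:

```python
total_citations = 11041   # citations received 1982-2003
publications = 1733       # papers published 1970-1999
window_years = 22         # 1982-2003 inclusive

per_year = total_citations / window_years            # about 501.86
per_paper = total_citations / publications           # about 6.37
peak_share = 9145 / total_citations * 100            # about 82.8% received 1990-2003
```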
Total self-citations were 3716 (33.66%) and citations by others were 7325 (66.34%). The mean diachronous self-citation rate was 36.16. The citation time lag was zero for 144 (15.52%) papers and one year for 350 (37.72%) papers. Single-authored publications (168) received 456 (4.13%) citations and 1565 multi-authored publications received 10585 (95.87%) citations. The core citing authors were: J. P. Mittal (695), followed by V. K. Jain (524), H. Mohan (471), T. Mukherjee (307), R. M. Iyer (253), H. Pal (251), J. V. Yakhmi (211), A. V. Sapre (174), D. K. Palit (161), N. M. Gupta (128), and S. K. Kulshrestha (116). The citation life cycles of four highly cited papers were discussed. The core journals citing Chemistry Division publications were: J. Phys. Chem.-A (436 citations), Chem. Phys. Lett. (372), J. Phys. Chem. (355), J. Chem. Phys. (353), J. Organomet. Chem. (285), J. Phys. Chem.-B (279), J. Photochem. Photobiol.-A (263), Langmuir (245), J. Am. Chem. Soc. (226), Physica-C (225), Radiat. Phys. Chem. (217), Inorg. Chem. (215) and Indian J. Chem.-A (207). This paper examines patterns of Chinese authorship, focusing particularly on international co-authorship, in a sample of 37,526 articles from Elsevier journals published in 2004. Trends relating to potential influences such as subject, journal impact factor and article type are explored. A slightly higher proportion of articles with at least one Chinese author was observed compared to previous studies. Articles that are the product of Chinese international collaboration account for almost 20% of the Chinese sample as a whole, a similar proportion to levels of international collaboration within the sample overall. Chinese international co-authorship is most common in the Earth & Environmental Sciences. Where China is involved in international collaboration, it is often a proactive participant: 49% of articles that are the result of Chinese international collaboration have a Chinese corresponding author. 
With some minor variations across subject categories, the countries favoured in international co-authorship reflect world shares in publishing and factors such as geographical proximity and political links. Bio-pharmaceutical R&D is increasingly an international affair. Research articles published in peer-reviewed international scientific and technical journals represent quantifiable research outputs of bio-pharmaceutical firms. Large-scale systemic measurements of worldwide trends and sectoral patterns within bio-pharmaceutical science can be derived from these articles, where co-authored research papers are assumed to reflect research cooperation and associated knowledge flows and exchanges. We focus our attention on the largest science-based multinational enterprises (MNEs), those that produce relatively large quantities of research articles. The study deals with the worldwide output of research articles co-produced by corporate researchers during the years 1996-2001. We employ these publications to examine structural factors characterizing research cooperation networks within industry at the level of major geographical regions (North America, Europe, Pacific-Asia), with a breakdown by within-MNE and between-MNE network linkages. The descriptive statistics on publication output and the results of network analyses of co-publication linkages not only indicate regional differences, with a central role for US companies in biopharmaceutical research, but also a variety of firm-specific research cooperation networks, which enabled us to develop a tentative typology of MNEs in terms of their intra- and inter-organizational patterns of research cooperation linkages. The aim of this article is to develop new patent indicators for evaluating the technological innovation competitiveness of companies. 
A novel indicator representing a company's patent performance, the Essential Patent Index (EPI), was developed by incorporating information on who cited these patents and when they were cited, based on the assumption that both contribute to meaningful quality assessment. By combining EPI and Chi's well-known Technological Strength (TS) indicator, a second novel indicator, Essential Technological Strength (ETS), was developed to represent the innovation competitiveness of an individual company. In this study, the patent performance of three high-tech industries in Taiwan was analyzed using ETS as well as the traditional TS for comparison. Results from this analysis demonstrated that ETS provided better insights by clearly verifying the latent influence of citations, reinforcing the impact of essential patents, and amplifying the differences in innovation competitiveness between companies. The objective of this study consists, firstly, of quantifying differences in Spanish universities' output (in terms of publications and citations), and secondly, of analysing its determinants. The results obtained show that there are factors which have a positive influence on these indicators, such as having a third-cycle programme with public financing obtained in competitive selection procedures, having a large number of full-time researchers, or involvement in collaborations with international institutions. However, other factors which appear to have the opposite effect were also noted. These include a higher number of students per lecturer or a lower proportion of lecturers with recognised six-year research periods. The database host STN International allows for extensive citation analysis in the SCISEARCH database (Science Citation Index Expanded) and in the CAplus database (Chemical Abstracts). Along with its powerful browsing, searching and analyzing facilities, STN International also features scripts. 
In this paper we examine the usefulness of the script language for automating citation analysis in SCISEARCH and CAplus. International collaboration is becoming an increasingly significant issue in science. During the last few years, a large number of bibliometric studies of co-authorship have been reported. Mostly, these studies have concentrated on country-to-country collaboration, revealing general patterns of interaction. In this study we analyze international collaborative patterns in Indian publications by tracking multi-author publications as given in the Science Citation Index (SCI) database. Correspondence analysis is used for the analysis and interpretation of the results. According to the correspondence analysis of the data set, Physics, Chemistry, and Clinical Medicine are the first, second and third largest subjects having international collaboration. The USA, Italy, Germany, France, and England are the top five countries with which India collaborates. The data set shows an association between Physics and Italy, Switzerland, Algeria, Finland, South Korea, Russia, and the Netherlands, contrasting with an association between Biology & Biochemistry, Immunology, Ecology & Environment, Geosciences, and Multidisciplinary subjects and England, Japan, and Canada. It also shows an association between Agriculture and the Philippines, Canada, and Denmark, in contrast to an association between Chemistry and Malaysia, Germany, and France. An association between Clinical Medicine and Astrophysics and England, Sweden, the USA, and New Zealand is shown, in contrast to an association between Agriculture and Canada, the Philippines, and Denmark. An association between Engineering, Mathematics, Computer Science, and Neuroscience and Singapore, Canada, and the USA is shown, in contrast to an association between Chemistry and Astrophysics and Malaysia and Spain. This association of collaborating countries and disciplines largely tallies with the publication productivity of these countries in the different disciplines. 
Given the increasing significance of citation counting in evaluations of scientists and science institutes, as well as in science historiography, we analyze empirically what is cited, with what frequency, and what types of citations are used in scientific texts. The content analyses cover the number of references, self-references, the publication language of the references cited, the publication types of the references cited, and the type of citation within the texts. The validity of citation counting is analyzed empirically with reference to random samples of English and German journal articles as well as German textbooks, encyclopedias, and test manuals from psychology. Results show that 25% of all citations are perfunctory, more than 50% of references are journal articles, up to 40% are books and book chapters, and 10% are self-references. Differences between publications from the various psychological sub-disciplines, publication languages, and types of publication are weak. Thus, the validity of evaluative citation counting is limited, because at least one quarter of citations are perfunctory, exhibiting a very low information utility level, and because existing citation databases cover journal articles only. An analysis of 2,765 articles published in four math journals from 1997 to 2005 indicates that articles deposited in the arXiv received 35% more citations on average than non-deposited articles (an advantage of about 1.1 citations per article), and that this difference was most pronounced for highly-cited articles. Open Access, Early View, and Quality Differential were examined as three non-exclusive postulates for explaining the citation advantage. There was little support for a universal Open Access explanation, and no empirical support for Early View. There was some inferential support for a Quality Differential brought about by more highly-citable articles being deposited in the arXiv. 
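The two figures reported above (a 35% relative advantage and an absolute gap of about 1.1 citations per article) jointly imply a baseline citation rate; a back-of-envelope sketch, assuming both figures refer to the same mean:

```python
gap = 1.1     # absolute citation advantage per article (from the text)
ratio = 0.35  # 35% relative advantage (from the text)

# If deposited_mean = (1 + ratio) * non_deposited_mean and
# deposited_mean - non_deposited_mean = gap, then:
non_deposited_mean = gap / ratio           # about 3.1 citations per article
deposited_mean = non_deposited_mean + gap  # about 4.2 citations per article
```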
In spite of their citation advantage, arXiv-deposited articles received 23% fewer downloads from the publisher's website (about 10 fewer downloads per article) in all but the most recent two years after publication. The data suggest that the arXiv and the publisher's website may be fulfilling distinct functional needs of the reader. S. Ramaseshan contributed to the better understanding of the various subjects in which he specialized during his years at the Indian Institute of Science, the University of Madras and the Raman Research Institute. In this paper we highlight his scientific contributions in various journals and some classic papers. In his entire career as a scientist he collaborated with 47 eminent scientists and students and published a total of 178 papers during the years 1944-2000. His fields of interest were varied and are thus classified into four main areas: Crystallographic Studies, Magneto-optics & Optics, Solid State Physics, and Miscellaneous Topics. We examine the determinants of five-year citations to papers published in the American Economic Review and the Economic Journal. Citations are positively related to page length and position in the journal. Both of these variables are consistent with the hypothesis that citations reflect paper quality, as is the number of subsequent self-citations. However, the publication of a major paper, as judged by subsequent citations, significantly increases the citations of other papers in the same issue, and this indicates the importance of chance in determining citations. In this paper we examine whether, and to what extent, material transfer agreements influence research agenda setting in biotechnology. Research agendas are mapped through patents, articles, letters, reviews, and notes. 
Three groups are sampled: (1) documents published by government and industry which used research materials received through those agreements, (2) documents published by government and industry which used in-house materials, and (3) documents published by academia. Methodologically, a co-word analysis is performed to detect whether there is a difference in the underlying scientific structure between the first two groups of documents. Secondly, interviews with practitioners from industry and government are intended to capture their opinion regarding the impact of the signed agreements on their own research agenda choices. The existence of synchronic and diachronic common terms between co-word clusters stemming from the first two groups of publications suggests cognitive linkage. Moreover, interviewees generally do not consider themselves constrained in research agenda setting when signing agreements for receiving research materials. Finally, after applying a co-word analysis to detect whether the first group of documents overlaps with the third group, we cannot conclude that agreements signed by industry and government affect research agenda setting in academia. The applicability of Hirsch's h index (Hirsch, 2005) for evaluating scientific research in Spain has been investigated. A series of derivative indexes is suggested that takes into account: i) the overall low scientific production in Spain before the '80s; ii) differences among areas due to size (overall number of citations for publications in a given area); and iii) the number of authors. Their applicability has been tested for two different areas in the Biological Sciences. The proposed set of indexes accurately summarizes both the success and the evolution of scientists' careers in Spain, and it may be useful in the evaluation of other not yet well-established national scientific research systems. 
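Hirsch's h index mentioned above has a simple operational definition: the largest h such that h of a scientist's papers have each received at least h citations. A minimal sketch (the citation counts below are illustrative, not from the study):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Illustrative record: five papers with these citation counts
print(h_index([25, 8, 5, 4, 3]))  # prints 4
```

The derivative indexes proposed in the study would further normalize such counts by field size and number of authors.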
This article offers information on the characteristics and number of materials research articles indexed in the Science Citation Index (SCI) database in 2004. 22,843 full-text articles from 169 journals in the materials field (which included ceramics, metallurgy, and polymer journals) were retrieved from the SCI database and exported to EndNote software. The retrieved articles were carefully analyzed by eight scientists and experts in those subfields and categorized using SPSS into eight clearly defined categories: (1) new materials, (2) materials characterizations, (3) materials improvement, (4) new process and/or process improvement, (5) mathematical and theoretical models and/or computer simulations, (6) novel and comprehensive explanations, (7) testing conditions, and (8) comparative studies. The results were then considered in terms of the percentage of articles in each materials subfield, the country of the corresponding author, and the number of authors. The overall results suggested that most materials articles published in 2004 focused on new process and process improvement (27%), while materials characterizations (23%) and testing conditions (12%) took the 2nd and 3rd places, especially for the ceramics and polymer articles. The highest numbers of articles in the ceramics and polymer subfields focused on new processes and/or process improvement, while those in the metallurgy subfield focused on materials characterization. In the SCI database, the largest number of materials articles was authored by Asian scientists, although the majority of the materials journals were run by editors from Europe and North America. There was no coherent relationship between the authors' and editors' affiliations. China, Japan and the United States of America (USA) were shown to be the top three countries with the highest publication numbers in the materials field. 
Japan had the highest publication numbers in the ceramics subfield, while China had the most publications in the polymer and metallurgy subfields. However, when considering journal impact factors, the leading positions of the countries changed. The results from this work could assist materials scientists in selecting international journals suited to the content of their forthcoming work. Finally, it was noted that most materials research articles were written by 3-4 authors. This paper examines general characteristics of African science from a quantitative 'scientometric' perspective. More specifically, it examines the research outputs of Africa-based authors published in the scientific literature during the years 1980-2004, either within the international journals representing 'mainstream' science, or within national and regional journals reflecting 'indigenous science'. As for the international journals, the findings derived from Thomson Scientific's Citation Indexes show that while Africa's share of worldwide science has steadily declined, the share of international co-publications has increased very significantly, while low levels of international citation impact persist. A case study of South African journals reveals the existence of several journals that are not processed for these international databases but nonetheless show a distinctive citation impact on international research communities. This article aims to provide scientometric evidence to confirm or refute the statement, published in a Special Report in the New Scientist, that the "rise in literalist religious thinking in the 1990s devastated science in the Islamic world by promoting the idea that all knowledge could be found in the Koran", and to map the literature related to fundamentalism over time and space during the last ten years. 
We find that despite the rise of fundamentalism, science was thriving in eight Islamic countries (Iran, Jordan, Indonesia, Egypt, Turkey, Malaysia, Morocco, and Pakistan) during the period, and hence the statement is refuted. The mapping of the "fundamentalist" literature indicates that there is a constant number of articles per year (60 to 70), covering disciplines ranging from religion and sociology to political science and international relations. The center of research is revealed to be the Anglo-Saxon world, with the USA as its epicenter. Finally, we identify that the debate of fundamentalism versus science is at an embryonic stage. Based on the transfer function model of the observed citation distribution and the expression of the cumulative citation probability distribution, the parameters of 12 citation distributions are identified from statistical data on the age distributions of references in 10 journals in the JCR, using a parameter optimization fitting method. At the same time, based on the steady-state solution of the differential equations of the publication delay process and data on the publication delays of 10 journals, the publication delay parameters of every journal are identified using the fitting method. The identified parameters of every journal's citation distribution are compared with the journal's publication delay parameters and some valuable conclusions are deduced. This article summarizes findings from studies that employed electronic mail (e-mail) for conducting in-depth interviewing. It discusses the benefits of, and the challenges associated with, using e-mail interviewing in qualitative research. The article concludes that while a mixed-mode interviewing strategy should be considered when possible, e-mail interviewing can in many cases be a viable alternative to face-to-face and telephone interviewing. A list of recommendations for carrying out effective e-mail interviews is presented. 
To test the feasibility of cybermetric indicators for describing and ranking university activities as shown on their Web sites, a large set of 9,330 institutions worldwide was compiled and analyzed. Using search engines' advanced features, size (number of pages), visibility (number of external inlinks), and number of rich files (pdf, ps, doc, and ppt formats) were obtained for each of the institutional domains of the universities. We found a statistically significant correlation between a Web ranking built on a combination of webometric data and other university rankings based on bibliometric and other indicators. Results show that cybermetric measures could be useful for reflecting the contribution of technologically oriented institutions, increasing the visibility of developing countries, and improving rankings based on Science Citation Index (SCI) data with known biases. In this article, Web issue analysis is introduced as a new technique to investigate an issue as reflected on the Web. The issue chosen, integrated water resource management (IWRM), is a United Nations-initiated paradigm for managing water resources in an international context, particularly in developing nations. As with many international governmental initiatives, there is a considerable body of online information about it: 41,381 hypertext markup language (HTML) pages and 28,735 PDF documents mentioning the issue were downloaded. A page uniform resource locator (URL) and link analysis revealed the international and sectoral spread of IWRM. A noun and noun-phrase occurrence analysis was used to identify the issues most commonly discussed, revealing some unexpected topics such as private sector and economic growth. Although the complexity of the methods required to produce meaningful statistics from the data is disadvantageous to easy interpretation, it was still possible to produce data amenable to a reasonably intuitive interpretation. 
Hence, Web issue analysis is claimed to be a useful new technique for information science. The KMODDL (Kinematic Models for Design Digital Library) is a digital library based on a historical collection of kinematic models made of steel and bronze. The digital library contains four types of learning modules: textual materials, QuickTime virtual reality movies, Java simulations, and stereolithographic files of the physical models. The authors report an evaluation study on the uses of the KMODDL in two undergraduate classes. This research reveals that users in the different classes encountered different usability problems and reported quantitatively different subjective experiences. Further, the results indicate that, depending on the subject area, the two user groups preferred different types of learning modules, resulting in different uses of the available materials and different learning outcomes. These findings are discussed in terms of their implications for future digital library design. Word usage is of interest to linguists for its own sake, as well as to social scientists and others who seek to track the spread of ideas, for example in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science. 
User acceptance models such as the technology acceptance model, the theory of reasoned action, and the theory of planned behavior have been widely used to study a specific information system, a group of systems, or even computers in general. This study examines the usage of competitive information systems. It applies the theory of planned behavior (TPB) in a comparative frame of reference model (relative model) in which relative attitude, relative subjective norm, relative intention, and relative usage are examined. The study is set in the context of two instant messaging technologies. Based on a survey of 300 instant messaging users, the effects of attitude and subjective norm on intention differed between the two models (i.e., when TPB is tested once for each application). This confirms that the behavioral model can show different effects for competitive products. In addition, the relative model gave correct answers for the competitive setting; however, these may differ from the answers found with a single-application model. The authors show the importance of studying the relative model for competitive products. The present analysis looks at how scientists use the Internet for informal scientific communication. It investigates the relationship between several explanatory variables and Internet use in a cross-section of scientists from seven European countries and five academic disciplines (astronomy, chemistry, computer science, economics, and psychology). The analysis confirmed some of the results of previous U.S.-based analyses. In particular, it corroborated a positive relationship between research productivity and Internet use. The relationship was found to be nonlinear, with very productive (nonproductive) scientists using the Internet less (more) than would be expected according to their productivity. Also, being involved in collaborative R&D and having large networks of collaborators is associated with increased Internet use. 
In contrast to older studies, the analysis did not find any equalizing effect whereby higher Internet use rates help to overcome the problems of potentially disadvantaged researchers. Obviously, everybody who wants to stay at the forefront of research and keep up-to-date with developments in their research fields has to use the Internet. This article focuses on the relevance judgments made by health information users who use the Web. Health information users were conceptualized as motivated information users concerned about how an environmental issue affects their health. Users identified their own environmental health interests and conducted a Web search of a particular environmental health Web site. Users were asked to identify (by highlighting with a mouse) the criteria they use to assess relevance in both Web search engine surrogates and full-text Web documents. Content analysis of document criteria highlighted by users identified the criteria these users relied on most often. Key criteria identified included (in order of frequency of appearance) research, topic, scope, data, influence, affiliation, Web characteristics, and authority/person. A power-law distribution of criteria was observed (a few criteria represented most of the highlighted regions, with a long tail of occasionally used criteria). Implications of this work are that information retrieval (IR) systems should be tailored to users' tendencies to rely on certain document criteria, and that relevance research should combine methods to gather richer, contextualized data. Metadata for IR systems, such as that used in search engine surrogates, could be improved by taking into account actual usage of relevance criteria. Such metadata should be user-centered (based on data from users, as in this study) and context-appropriate (fit to users' situations and tasks). The authors examine the need for and adoption of teleophthalmology in sub-Saharan Africa. 
Ethiopia, like most sub-Saharan African countries, faces a shortage of specialists and health care services. These services are often concentrated in the urban areas, leaving most of the rural population (about 70% of the country) without adequate and timely health care delivery. In Ethiopia, the ratio of ophthalmologists to the population is 1:1,200,000, resulting in inadequate delivery of ophthalmology-related health care services. Using both primary and secondary data collection approaches, the authors report the need for telemedicine as well as the adoption and application of teleophthalmology in Ethiopia. Further, they present Ethiopia's teleophthalmology network, integrated teleconsultation, and teleeducation services. The authors conclude by presenting this research as a starting point for further investigation of teleophthalmology and other telemedicine services for Ethiopia and, by extension, other developing countries. They thereby bring a much-underresearched region (sub-Saharan Africa) and a much-underresearched technology (telemedicine) to the forefront of information systems (IS) research. It is the authors' hope that colleagues in the field will be motivated to investigate this "forgotten" region of the world, which is yet to reap the full potential of information and communications technologies (ICTs). This is the first part of a two-part article that offers a theoretical and an empirical model of the everyday life information needs of urban teenagers. The qualitative methodology used to gather data for the development of the models included written surveys, audio journals, written activity logs, photographs, and semistructured group interviews. Twenty-seven inner-city teens aged 14 through 17 participated in the study. Data analysis took the form of iterative pattern coding using QSR NVivo 2 software (QSR International, 2002). 
The resulting theoretical model includes seven areas of urban teen development: the social self, the emotional self, the reflective self, the physical self, the creative self, the cognitive self, and the sexual self. The researchers conclude that the essence of teen everyday life information seeking (ELIS) is the gathering and processing of information to facilitate the teen-to-adulthood maturation process. ELIS is self-exploration and world exploration that helps teens understand themselves and the social and physical worlds in which they live. This study shows the necessity of tying youth information-seeking research to developmental theory in order to examine the reasons why adolescents engage in various information behaviors. The timeline used in ISI's Journal Citation Reports (JCR; Thomson ISI, formerly the Institute for Scientific Information, Philadelphia, PA) for half-life calculations is not a timeline for (average) cited age: the two timelines are shifted by half a year relative to each other. In a recent article, Egghe (2005) discussed what he terms Lorenz concentration theory, covering the Lorenz curve and concentration measures such as the coefficient of variation and the Theil and Gini coefficients. In this note, we point out that neither the curve construction nor the concentration measures conform to the standard statistical/econometric definitions. We here give the standard formulations and apply them to the (truncated) Pareto distributions that are the subject of Egghe's (2005) article. We also interpret Egghe's usage. Context is one of the most important concepts in information seeking and retrieval research. However, the challenges of studying context are great; thus, it is more common for researchers to use context as a post hoc explanatory factor, rather than as a concept that drives inquiry. 
The purpose of this study was to develop a method for collecting data about information seeking context in natural online environments, and to identify which aspects of context should be considered when studying online information seeking. The study is reported in two parts. In this, the second part, results and implications of this research are presented. Part 1 (Kelly, 2006) discussed previous literature on information seeking context and behavior, situated the current study within this literature, and described the naturalistic, longitudinal research design that was used to examine and measure the online information seeking context of seven users during a 14-week period. Results provide support for the value of the method in studying online information seeking context, the relative importance of various measures of context, how these measures change over time, and, finally, the relationship between these measures. In particular, results demonstrate significant differences in distributions of usefulness ratings according to task and topic. The application of clustering to Web search engine technology is a novel approach that offers structure to the information deluge often faced by Web searchers. Clustering methods have been well studied in research labs; however, real user searching with clustering systems in operational Web environments is not well understood. This article reports on results from a transaction log analysis of Vivisimo.com, a Web meta-search engine that dynamically clusters users' search results. A transaction log analysis was conducted on two weeks' worth of data collected from March 28 to April 4 and April 25 to May 2, 2004, representing 100% of site traffic during these periods and 2,029,734 queries overall. The results show that the highest percentage of queries contained two terms. The highest percentage of search sessions contained one query and was less than 1 minute in duration. 
Almost half of user interactions with clusters consisted of displaying a cluster's result set, and a small percentage of interactions showed cluster tree expansion. Findings show that 11.1% of search sessions were multitasking searches, and that a broad variety of search topics occur in multitasking search sessions. Other searching interactions and statistics on repeat users of the search engine are reported. These results provide insights into search characteristics with a cluster-based Web search engine and extend research into Web searching trends. As the quantity of information continues to exceed our human processing capacity, information systems must support users as they face the daunting task of synthesizing information. One activity that consumes much of a scientist's time is developing models that balance contradictory and redundant evidence. Driven by our desire to understand the information behaviors of this important user group, and the behaviors of scientific discovery in general, we conducted an observational study of academic research scientists as they resolved different experimental results reported in the biomedical literature. This article is Part 2 of two articles that report our findings. In Part 1 (Blake & Pratt, 2006), we introduced the Collaborative Information Synthesis (CIS) model, which captures the salient information behaviors that we observed. In this article, we review existing cognitive and information seeking models that have inadvertently reported synthesis behavior and provide five recommendations for systems designers to build information systems that support synthesis activities. The purpose of this study was to determine the effects that work roles (patient management/service provider, administrator/manager, researcher, educator, and student) and their associated tasks have on the choice of information sources used to meet private practice dentists' information needs. 
Additionally, the study investigated how the Internet has affected the information seeking of dentists. Using Leckie, Pettigrew, and Sylvain's (1996) model of the information seeking of professionals as the conceptual framework, vignette-based, in-depth interviews were conducted with 12 dentists in the metropolitan areas of Seattle, Tacoma, and Everett, Washington. Follow-up interviews were used to investigate dentists' use of the Internet. Findings revealed that the type of work role-related task significantly shapes dentists' choices of information sources; the Internet emerged as a significant information source because it provides up-to-date information in a convenient and timely manner; the Internet is a complement to traditional information sources, not a replacement for them; and the Leckie and associates (1996) general model of professional information seeking is supported by this study. Formal and informal modes of collaboration in life sciences research were explored paratextually. The bylines and acknowledgments of more than 1,000 research articles in the journal Cell were analyzed to reveal the strength of collegiate ties and the importance of material and ideational trading between both individuals and labs. Intense coauthorship and subauthorship collaboration were shown to be defining features of contemporary research in the life sciences. In this article we distinguish between top-performance and lower-performance groups in the analysis of statistical properties of bibliometric characteristics of two large sets of research groups. We find intriguing differences between top-performance and lower-performance groups, and between the two sets of research groups. These latter differences may indicate the influence of research management strategies. We report the following two main observations: First, lower-performance groups have a larger size-dependent cumulative advantage for receiving citations than top-performance groups. 
Second, regardless of performance, larger groups have fewer not-cited publications. Particularly for the lower-performance groups, the fraction of not-cited publications decreases considerably with size. We introduce a simple model in which processes at the microlevel lead to the observed phenomena at the macrolevel. Next, we fit our findings into the novel concept of hierarchically layered networks. In this concept, which provides the "infrastructure" for the model, a network of research groups constitutes a layer of one hierarchical step higher than the basic network of publications connected by citations. The cumulative size advantage of citations received by a group resembles preferential attachment in the basic network in which highly connected nodes (publications) increase their connectivity faster than less connected nodes. But in our study it is size that causes an advantage. In general, the larger a group (node in the research group network), the more incoming links this group acquires in a nonlinear, cumulative way. Nevertheless, top-performance groups are about an order of magnitude more efficient in creating linkages (i.e., receiving citations) than lower-performance groups. This implies that together with the size-dependent mechanism, preferential attachment, a quite common characteristic of complex networks, also works. Finally, in the framework of this study on performance-related differences of bibliometric properties of research groups, we also find that top-performance groups are, on average, more successful in the entire range of journal impact. Due to e-mail's ubiquitous nature, millions of users are intimate with the technology; however, most users are only familiar with managing their own e-mail, which is an inherently different task from exploring an e-mail archive. Historians and social scientists believe that e-mail archives are important artifacts for understanding the individuals and communities they represent. 
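The research-group abstract above describes a size-dependent cumulative advantage in which larger groups attract citations in a nonlinear way. A toy sketch of that mechanism, with hypothetical group sizes and an assumed exponent alpha > 1; this is an illustration of the general idea, not the authors' fitted model:

```python
def expected_citations(sizes, total_citations, alpha=1.3):
    """Allocate citations with a nonlinear size advantage:
    a group's expected share is proportional to size**alpha.
    With alpha > 1, larger groups gain disproportionately."""
    weights = [s ** alpha for s in sizes]
    total_w = sum(weights)
    return [total_citations * w / total_w for w in weights]

sizes = [5, 10, 20]          # hypothetical group sizes (staff members)
cites = expected_citations(sizes, total_citations=1000)
# citations per staff member rise with group size when alpha > 1
per_capita = [c / s for c, s in zip(cites, sizes)]
print([round(c, 1) for c in cites], [round(p, 1) for p in per_capita])
```

The per-capita figures increase with group size, which is the signature of the cumulative size advantage the abstract reports; with alpha = 1 the allocation would be proportional and per-capita rates would be flat.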
To understand the conversations evidenced in an archive, context is needed. In this article, we present a new way to gain this necessary context: analyzing the temporal rhythms of social relationships. We provide methods for constructing meaningful rhythms from the e-mail headers by identifying relationships and interpreting their attributes. With these visualization techniques, e-mail archive explorers can uncover insights that may have been otherwise hidden in the archive. We apply our methods to an individual's 15-year e-mail archive, which consists of about 45,000 messages and over 4,000 relationships. In this article, we report results of an investigation into the effect of sponsored links on ecommerce information seeking on the Web. In this research, 56 participants each engaged in six ecommerce Web searching tasks. We extracted these tasks from the transaction log of a Web search engine, so they represent actual ecommerce searching information needs. Using 60 organic and 30 sponsored Web links, the quality of the Web search engine results was controlled by switching nonsponsored and sponsored links on half of the tasks for each participant. This allowed for investigating the bias toward sponsored links while controlling for quality of content. The study also investigated the relationship between searching self-efficacy, searching experience, types of ecommerce information needs, and the order of links on the viewing of sponsored links. Data included 2,453 interactions with links from result pages and 961 utterances evaluating these links. The results of the study indicate that there is a strong preference for nonsponsored links, with searchers viewing these results first more than 82% of the time. Searching self-efficacy and experience does not increase the likelihood of viewing sponsored links, and the order of the result listing does not appear to affect searcher evaluation of sponsored links. 
The implications for sponsored links as a long-term business model are discussed. The aim of this research is twofold. On the one hand, high accuracy retrieval has been a concern of the information retrieval community for some time. We aim to investigate this issue via data fusion. On the other hand, the correlation among component results has been proven harmful to data fusion, but it has not been taken into account in data fusion algorithms. In the hope of achieving better performance, we propose a group of algorithms to eliminate the effect of uneven correlation among component results by assigning different weights to all component results or their combinations. Then the linear combination method or a variation is used for fusion. Extensive experimentation is carried out to evaluate the performances of these algorithms with six groups of component results, which are the top 10 systems submitted to Text REtrieval Conference (TREC) 6, 7, 8, 9, 2001, and 2002. The experimental results show that all eight data fusion methods involved outperform the best component system on average. Therefore, we demonstrate that the data fusion technique in general is effective with accurate retrieval results. The experimental results also demonstrate that all six methods presented in this article are effective for eliminating the effect of uneven correlation among component results. All of them outperform CombSum and five of them outperform CombMNZ on average. International academic rankings that compare world universities have proliferated recently. In accordance with recent conceptual and methodological advances in academic ranking approaches, five selection criteria are defined and four international university rankings are selected. A comparative analysis of the four rankings is presented, taking into account both the frequency of indicators and their weights. 
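The data fusion abstract above uses CombSum and CombMNZ as its baselines. Minimal sketches of these two standard fusion rules, assuming scores are already normalized to a common range; the document scores below are hypothetical:

```python
def comb_sum(results):
    """CombSum: sum each document's (normalized) scores across systems."""
    fused = {}
    for run in results:                      # each run: dict doc -> score
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return fused

def comb_mnz(results):
    """CombMNZ: CombSum multiplied by the number of systems
    that retrieved the document."""
    sums, hits = {}, {}
    for run in results:
        for doc, score in run.items():
            sums[doc] = sums.get(doc, 0.0) + score
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: sums[doc] * hits[doc] for doc in sums}

runs = [{"d1": 0.9, "d2": 0.4}, {"d1": 0.8, "d3": 0.7}]
print(comb_sum(runs))   # d1 benefits from appearing in both runs
print(comb_mnz(runs))
```

The article's contribution is a weighted linear combination that corrects for correlated component runs; these baselines weight every run equally, which is precisely what uneven correlation undermines.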
Results show that, although some indicators differ considerably across the selected rankings, and many indicators are even unique to one ranking, indicators referring to research and scientific productivity of university academic staff play a prominent role across all approaches. The implications of the data obtained for the main consumers of rankings are discussed. This paper maps the domain of earth and environmental sciences (EES) and investigates the relationship between cognitive problem structures and internationalisation patterns, drawing on the concepts of systemic versus cumulative global environmental change (GEC) and mutual task dependence in scientific fields. We find that scientific output concentration and internationalisation are significantly higher in the systemic GEC fields of Meteorology & Atmospheric Sciences and Oceanography than in the cumulative GEC fields Ecology and Water Resources. The relationship is explained by stronger mutual task dependence in systemic GEC fields. In contrast, the portion of co-authorships with developing, emerging and transition countries among all international publications is larger for Water Resources than for the three other fields, consistent with the most pressing needs for STI capacity development in these countries. The two Journal Citation Reports of the Science Citation Index 2004 and the Social Science Citation Index 2004 were combined in order to analyze and map journals and specialties at the edges and in the overlap between the two databases. For journals which belong to the overlap (e.g., Scientometrics), the merger mainly enriches our insight into the structure which can be obtained from the two databases separately; but in the case of scientific journals which are more marginal in either database, the combination can provide a new perspective on the position and function of these journals (e.g., Environment and Planning B - Planning and Design). 
The combined database additionally enables us to map citation environments in terms of the various specialties comprehensively. Using the vector-space model, visualizations are provided for specialties that are parts of the overlap (information science, science & technology studies). On the basis of the resulting visualizations, "betweenness" - a measure from social network analysis - is suggested as an indicator for measuring the interdisciplinarity of journals. There exists a quantitative relationship, which can be expressed as G = kF(lg P)N, where G is per capita GDP, F gross expenditure on R&D as % of GDP, P patent applications, N Internet users per 10,000 inhabitants, and k a constant ranging from 0.4 to 1.2 in most countries. The mechanism of the relationship is explained in the paper. Aims: Undergraduate education in physical education is widely available in Turkey. Postgraduate training is provided mostly by institutes of health sciences, educational science and social sciences. The aim of this study was to evaluate the characteristics of PhD theses in sports sciences. Methods: The database of the Turkish Council of Higher Education was searched for the years 1988-2002 for PhD theses with different combinations of keywords like "Sport(s)", "All Dissertations" and "Physical Education". Theses were classified according to the institute, year, university, the title of the mentors and the field of sports sciences. Inter- and intra-rater validity of the ratings was high (Kendall Tau_b=0.84 and 1.00, p < 0.01). Results: Most theses were prepared in Institutes of Health Sciences (n=196, 86.3%), followed by Institutes of Social Sciences (n=25, 11.0%). Theses originated mostly from Marmara (n=90, 39.6%), Gazi (n=59, 25.9%) and Dokuz Eylul Universities (n=25, 11.0%). 
Ninety-two theses (46.9%) were prepared in Training and Movement Sciences, 40 (20.4%) in Sports Management, 29 (14.7%) in Psycho-Social Fields of Sports Sciences, 23 (11.7%) in Sports Health Sciences and 13 (6.6%) in Sports Pedagogy. Conclusion: Most theses were prepared in Institutes of Health Sciences, but the subjects covered the field of training and movement sciences. The unique and multi-disciplinary nature of sports sciences seems to warrant the foundation of an Institute of Sports. The capacity to attract citations from other disciplines - or knowledge export - has always been taken into account in evaluating the quality of scientific papers or journals. Some of the JCR's (ISI's Journal Citation Reports) Subject Categories have a greater exporting character than others because they are less isolated. This influences the rank/JIF (ISI's Journal Impact Factor) distribution of the category. While all the categories fit a negative power law fairly well, those with a greater External JIF give distributions with a more sharply defined peak and a longer tail - something like an iceberg. One also observes a major relationship between the rates of export and import of knowledge. The primary aim of this paper is to assess the contribution to the international literature of Spanish scientific production in the research stream of innovation and technology management. For this purpose, 72 articles published in the last decade in the most prestigious international journals in this research stream have been evaluated. From this analysis we conclude that there has been a positive evolution from 1995 to the present, as much from a qualitative as from a quantitative point of view. Likewise, we have found that research in this stream is concentrated fundamentally in a reduced group of universities. Nevertheless, these do not focus exclusively on one or a few research subjects, but on a wide range thereof. 
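One of the abstracts above states the quantitative relationship G = kF(lg P)N, with G per capita GDP, F gross expenditure on R&D as a percentage of GDP, P patent applications, N Internet users per 10,000 inhabitants, and lg the base-10 logarithm. A numeric sketch of evaluating it; the country values below are hypothetical, chosen only to show the arithmetic:

```python
import math

def per_capita_gdp(k, F, P, N):
    """Evaluate the relationship G = k * F * lg(P) * N,
    where lg is the base-10 logarithm (as stated in the abstract).
    All argument values used below are hypothetical illustrations."""
    return k * F * math.log10(P) * N

# hypothetical country: k = 0.8, GERD at 2% of GDP, 10,000 patent
# applications, 3,000 Internet users per 10,000 inhabitants
print(per_capita_gdp(k=0.8, F=2.0, P=10_000, N=3_000))
```

With lg(10,000) = 4, the result is 0.8 x 2.0 x 4 x 3,000; the units of G depend on how the constant k is calibrated, which the abstract leaves to the paper itself.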
Facing such serious problems in cultivating IT engineers as a mismatch between the supply of and demand for IT workers, a shortage of globally competitive IT professionals, and insufficient education and training of university graduates, the Korean government has decided to adopt a new paradigm in national IT engineering education, based on supply chain management (SCM) in manufacturing. SCM emphasizes improving the competitiveness of the supply chain as a whole via a long-term commitment to supply chain relationships and a cooperative, integrated approach to business processes. These characteristics of SCM are believed to provide insight into more effective IT education and industry-university relationships. On the basis of the SCM literature, a model for industry-oriented IT higher education is designed, and then applied in the field of computer-software engineering in Korea. This article reports findings from a study of the international contribution to the system of library and information science communication in Poland in the years 2003-2005. The sample consists of articles published both in selected journals and in collective works. Two important dimensions determining the internationalization of local scholarly communication are considered: direct contribution (foreign authors' articles and papers and their translations published in Poland) and indirect contribution (citedness of foreign authors' documents in articles and papers published in Poland). Bibliographic data about the geographical distribution and affiliation of foreign authors are gathered and analyzed. Furthermore, the findings of a citation analysis are presented to determine the percentage share of citations received by foreign documents, to establish the structure of such citations regarding language and form, and to identify which thematic areas are most replete with such citations and which foreign journals are most cited in Poland. 
Innovation research builds on the analysis of micro-level data describing the innovative behaviour of individual firms. One increasingly popular type of data is Literature-based Innovation Output (LBIO) data, compiled by screening specialist trade journals for new-product announcements. Notwithstanding their substantial advantages, the eligibility of LBIO data for innovation research remains controversial. In this paper the merits of LBIO data are examined by means of comparative analysis. A newly built LBIO database is systematically compared with the widely used Community Innovation Survey. The comparison shows that both databases identify similar innovators in terms of firm size, distribution across industries and degree of innovativeness: LBIO data can be considered a fully fledged alternative to traditional innovation data, highly eligible for innovation research. The emergence of patent citations as a tool for patent evaluation has attracted equally vocal champions and critics. In addition to patent citations, this article considers other factors, including court decisions, claim language, extension cases, patent family and portfolio, which should be deliberated during patent evaluation. It introduces the subject matter by discussing the specialties and peculiarities of these proposed factors. Furthermore, comparisons between patent citations and these factors are presented by examining several well-known patents. The results of the comparisons reveal that an adverse conclusion might be drawn if a patent is evaluated only on the basis of citations. The conclusion supports Meyer's finding that "the general nature of a common framework for both scientific and patent citations would severely limit its usefulness." Therefore, the factors discussed in the article would be a great asset in patent evaluation. However, the article only illustrates their impact using a couple of well-known patents. 
Future research is needed to investigate these factors in more detail. As the web is continuously changing, perhaps growing exponentially since its inception, a major potential problem for webometrics is that web statistics may be obsolete by the time they are published in the academic literature. It is important, therefore, to know as much as possible about how the web is changing over time. This paper studies the UK, Australian and New Zealand academic webs from 2000 to 2005, finding that the number of static pages and links in each of the three academic webs appears to have stabilised as far back as 2001. This stabilisation may be partly due to increases in dynamic pages, which are normally excluded from webometric analyses. Nevertheless, the results are encouraging evidence that webometrics for academic spaces may have a longer-term validity than would previously have been assumed. To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once, usually as a block size, but sometimes as a direct megabyte limit. In this work we consider the RE-PAIR mechanism of Larsson and Moffat (2000), which processes large messages as disjoint blocks to limit memory consumption. We show that the blocks emitted by RE-PAIR can be postprocessed to yield further savings, and describe techniques that allow files of 500 MB or more to be compressed in a holistic manner using less than that much main memory. The block merging process we describe has the additional advantage of allowing new text to be appended to the end of the compressed file. Web citations have become common in scholarly publications as the amount of online literature increases. Yet such links are not persistent, and many decay over time, causing accessibility problems for readers. The present study investigates the link decay phenomenon in three leading information science journals. 
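The compression abstract above builds on RE-PAIR, which repeatedly replaces the most frequent adjacent symbol pair with a fresh nonterminal, recording one grammar rule per replacement. A deliberately naive sketch of that core loop (quadratic time, unlike the linear-time original, and not the block-merging technique the abstract actually contributes):

```python
from collections import Counter

def re_pair(seq):
    """Minimal RE-PAIR sketch: while some adjacent pair occurs at
    least twice, replace its (non-overlapping, left-to-right)
    occurrences with a new symbol and record the rule."""
    seq = list(seq)
    rules = {}
    next_sym = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        new = "R%d" % next_sym
        next_sym += 1
        rules[new] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

compressed, grammar = re_pair("abababab")
print(compressed, grammar)
```

On "abababab" this yields the rules R0 -> ab and R1 -> R0 R0 and the two-symbol sequence R1 R1; real implementations use priority queues and pair records to achieve linear time, and the paper's contribution is merging the grammars of adjacent blocks.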
Articles spanning a period of 7 years (1997-2003) were downloaded, and their links were extracted. From these, a measure of link decay, the half-life, was computed to be approximately 5 years, which compares favorably against other disciplines (1.4-4.8 years). The study also investigated the types of link accessibility errors encountered and examined characteristics of links that may be associated with decay. It was found that approximately 31% of all citations were not accessible at the time of testing, and the majority of errors were due to missing content (HTTP Error Code 404). Citations from the edu domain were also found to have the highest failure rate, 36%, when compared with other popular top-level domains. Results indicate that link decay is a problem that cannot be ignored, and implications for journal authors and readers are discussed. Aggregated journal-journal citation networks based on the Journal Citation Reports 2004 of the Science Citation Index (5,968 journals) and the Social Science Citation Index (1,712 journals) are made accessible from the perspective of any of these journals. A vector-space model is used for normalization, and the results are brought online at http://www.leydesdorff.net/jcr04 as input files for the visualization program Pajek. The user is thus able to analyze the citation environment in terms of links and graphs. Furthermore, the local impact of a journal is defined as its share of the total citations in the specific journal's citation environment; the vertical size of the nodes is varied proportionally to this citation impact. The horizontal size of each node can be used to provide the same information after correction for within-journal (self-)citations. In the "citing" environment, the equivalent of this measure can be considered a citation activity index, which maps how the relevant journal environment is perceived by the collective of authors of a given journal. 
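The link decay abstract above reports a half-life of roughly 5 years alongside about 31% of links being inaccessible. Under an assumed exponential decay model (a common simplification; the article does not spell out its estimator), the half-life implied by a survival fraction at a given mean link age can be computed directly; the mean age of 2.7 years used below is a hypothetical value chosen for illustration:

```python
import math

def link_half_life(alive_fraction, age_years):
    """Estimate link half-life assuming exponential decay:
    alive_fraction = exp(-rate * age), hence
    half_life = ln(2) / rate = age * ln(2) / -ln(alive_fraction)."""
    decay_rate = -math.log(alive_fraction) / age_years
    return math.log(2) / decay_rate

# ~69% of links alive at a hypothetical mean link age of 2.7 years
# implies a half-life of about 5 years, in line with the abstract
print(round(link_half_life(alive_fraction=0.69, age_years=2.7), 1))
```

The same formula shows why half-lives are hard to compare across studies: the result is sensitive to the assumed age distribution of the sampled links, not just the dead-link percentage.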
As a policy application, the mechanism of interdisciplinary developments among the sciences is elaborated for the case of nanotechnology journals. This study examined the relationship between print journal use, online journal use, and online journal discovery tools with local journal citations. Local use measures were collected from 1997 to 2004, and negative binomial regression models were designed to test the effect that local use, online availability, and access enhancements have on citation behaviors of academic research authors. Models are proposed and tested to determine whether multiple locally recorded usage measures can predict citations and if locally controlled access enhancements influence citation. The regression results indicated that print journal use was a significant predictor of local journal citations prior to the adoption of online journals. Publisher-provided and locally recorded online journal use measures were also significant predictors of local citations. Online availability of a journal was found to significantly increase local citations, and, for some disciplines, a new access tool like an OpenURL resolver significantly impacts citations and publisher-provided journal usage measures. The amount of knowledge accumulated in published scientific papers has increased due to the continuing progress being made in scientific research. Since numerous papers have only reported fragments of scientific facts, there are possibilities for discovering new knowledge by connecting these facts. We therefore developed a system called BioTermNet to draft a conceptual network with hybrid methods of information extraction and information retrieval. Two concepts are regarded as related in this system if (a) their relationship is clearly described in MEDLINE abstracts or (b) they have distinctively co-occurred in abstracts. 
PRIME data, including protein interactions and functions extracted by NLP techniques, are used in the former, and the Singhal-measure for information retrieval is used in the latter. Relationships that are not clearly or directly described in an abstract can be extracted by connecting multiple concepts. To evaluate how well this system performs, Swanson's association between Raynaud's disease and fish oil and that between migraine and magnesium were tested with abstracts that had been published before the discovery of these associations. The result was that when start and end concepts were given, plausible and understandable intermediate concepts connecting them could be detected. When only the start concept was given, not only the focused concept (magnesium and fish oil) but also other probable concepts could be detected as related concept candidates. Finally, this system was applied to find diseases related to the BRCA1 gene. Some other new potentially related diseases were detected along with diseases whose relations to BRCA1 were already known. The BioTermNet is available at http://btn.ontology.ims.u-tokyo.ac.jp. In this article, we assess the usability of an Internet-based system for e-learning in a cross-cultural environment. The context of the evaluation and testing was a training program launched with the intention of introducing and promoting a new way of learning about and understanding the emerging technologies in regions with a low educational level and a high unemployment rate. The aim of the study was to assess the usability of the e-learning system with different methods and approaches to get a good assessment of its learnability and applicability in various circumstances. The aim of this article is to examine the role of information technologies (IT) in supporting practice and professional identity formation, both major axes for communities of practice.
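The Swanson-style discovery tested above (connecting Raynaud's disease to fish oil through intermediate concepts) can be sketched with a plain co-occurrence index. This is hypothetical toy code, far simpler than BioTermNet's combination of NLP extraction and retrieval measures:

```python
from collections import defaultdict

def cooccurrence_index(abstracts):
    """Map each term to the set of abstract ids it appears in."""
    index = defaultdict(set)
    for doc_id, terms in enumerate(abstracts):
        for t in terms:
            index[t].add(doc_id)
    return index

def intermediate_concepts(index, start, end, min_support=1):
    """Find terms B that co-occur with the start term in some abstracts
    and with the end term in others, even when the start and end terms
    never appear together (Swanson's A-B-C pattern)."""
    scores = {}
    for term, docs in index.items():
        if term in (start, end):
            continue
        with_start = len(docs & index[start])
        with_end = len(docs & index[end])
        if with_start >= min_support and with_end >= min_support:
            scores[term] = with_start * with_end   # crude ranking score
    return sorted(scores, key=scores.get, reverse=True)
```

With only the start concept given, the same index can instead rank every co-occurring term, which is the "related concept candidates" mode described above.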
The article uses an ethnographic case study to understand how public defenders learn to improve their court performance. The concept of "communities of practice" helps to illuminate how the attorneys in a public defender's office share knowledge in order to practice effectively in court. This article presents findings that a community of practice serves as effective scaffolding to support professional development; this is especially true for the practice component. Further, this case study indicates that information technologies, such as listservs, are not very effective social integrators for professionals who work at different sites. In particular, today's IT forums are most effective when used for sharing technical information about work, and least effective for sharing important cultural meanings about how professionals should approach their work and develop professional identities. This research advances our understanding of the complexity of organizing communities of practice to support professional groups of colleagues and of organizing IT-enabled support for various activities of the community. Most text classification techniques assume that manually labeled documents (corpora) can be easily obtained while learning text classifiers. However, labeled training documents are sometimes unavailable or inadequate even if they are available. The goal of this article is to present a self-learned approach to extract high-quality training documents from the Web when the required manually labeled documents are unavailable or of poor quality. To learn a text classifier automatically, we need only a set of user-defined categories and some highly related keywords. Extensive experiments are conducted to evaluate the performance of the proposed approach using the test set from the Reuters-21578 news data set. The experiments show that very promising results can be achieved only by using automatically extracted documents from the Web. 
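The self-learned approach above (user-defined categories plus a few seed keywords, no hand-labeled corpus) can be sketched in two steps: pseudo-label retrieved documents by keyword match, then train an ordinary classifier on them. A hypothetical miniature using a multinomial Naive Bayes, not the authors' system:

```python
import math
from collections import Counter, defaultdict

def label_by_keywords(docs, category_keywords):
    """Pseudo-label raw documents: a doc goes to the category whose
    seed keywords it matches most often; unmatched docs are dropped."""
    labeled = []
    for doc in docs:
        words = doc.lower().split()
        scores = {c: sum(words.count(k) for k in kws)
                  for c, kws in category_keywords.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            labeled.append((doc, best))
    return labeled

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()

    def fit(self, labeled):
        for doc, cls in labeled:
            self.class_counts[cls] += 1
            self.word_counts[cls].update(doc.lower().split())

    def predict(self, doc):
        words = doc.lower().split()
        vocab = {w for c in self.word_counts.values() for w in c}
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for cls in self.class_counts:
            lp = math.log(self.class_counts[cls] / total)
            denom = sum(self.word_counts[cls].values()) + len(vocab)
            for w in words:
                lp += math.log((self.word_counts[cls][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = cls, lp
        return best
```

In the article's setting the unlabeled documents would come from Web searches on the seed keywords; here any iterable of raw strings stands in for them.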
This article presents results from 21 semi-structured interviews with museum information professionals (MIPs) who were asked about their experiences working with information resources, tools, and technologies. Interviews were analyzed to determine (a) the challenges MIPs face as they adapt to changing technical capabilities and strive to meet the changing needs and expectations of museum users and (b) the coping mechanisms MIPs employ on the job that enable them to deal effectively with those challenges. This article explores the results of this analysis, exploring how MIPs cope with the changing nature of information work in museums by relying on thirteen different strategies including (a) assessing new technologies in relation to the museum's core mission, (b) helping museum professionals embrace new ideas about information access and provision, and (c) promoting internal practices that encourage the sharing of information and the integration of information science into museum work. This article also discusses the implications of these challenges and strategies for current and future MIPs, and assesses their impact on changing perceptions, roles, and research for information professionals in museums as they work to meet the information needs of all museum users. This study investigates the potential impact of open access pricing on institutional journal expenditures in four subject fields at nine American colleges and universities. Three pricing models are evaluated: the conventional model (the current subscription model), the PLoS open access model (based on the fees currently charged by the Public Library of Science), and the equal-revenue open access model (which maintains current levels of total aggregate spending within each subject field). 
Because institutional disparities in publishing productivity are far greater than institutional disparities in library holdings, the shift from a subscription-based model to either open access model would bring dramatic cost savings for most colleges and universities. At the same time, a small number of institutions, the top research universities, would pay a far higher proportion of the total aggregate cost. Many authors propose that open source software (OSS) is a good strategy to bring information and communication technologies to developing countries. Nevertheless, the use of OSS needs to be more than just adopting Linux as the standard for operating systems. Adoption of OSS is not only a choice of software, but also a means of acquiring knowledge. Developing countries have to use OSS as a way to gain knowledge about the technology itself and as a way of creating technology products that fit their specific needs. In this article, the authors introduce a model of OSS based on its essential characteristics to understand how developing countries may use OSS to achieve their development goals. The authors argue that there are two defining properties of any open source software. The first property is the potential for shared conceptualization and the second is the potential for modularity. By assessing how each OSS project satisfies these two conditions, a taxonomy is built for open source projects. This taxonomy will help the development of more sensible policies to promote the use of open source in developing countries. To automatically convert legacy data of taxonomic descriptions into extensible markup language (XML) format, the authors designed a machine-learning-based approach.
In this project, three corpora of taxonomic descriptions were selected to prove the hypothesis that domain knowledge and conventions automatically induced from some semistructured corpora (i.e., base corpora) are useful to improve the markup performance of other less-structured, quite different corpora (i.e., evaluation corpora). The "structuredness" of the three corpora was carefully measured. Based on the structuredness measures, two of the corpora were used as the base corpora and one as the evaluation corpus. Three series of experiments were carried out with the MARTT (markuper of taxonomic treatments) system the authors developed to evaluate the effectiveness of different methods of using the n-gram semantic class association rules, the element relative position probabilities, and a combination of the two types of knowledge mined from the automatically marked-up base corpora. The experimental results showed that the induced knowledge from the base corpora was more reliable than that learned from the training examples alone, and that the n-gram semantic class association rules were effective in improving the markup performance, especially on the elements with sparse training examples. The authors also identify a number of challenges for any automatic markup system using taxonomic descriptions. As the Internet grows in importance, concerns about online privacy have arisen. The authors describe the development and validation of three short Internet-administered scales measuring privacy-related attitudes (Privacy Concern) and behaviors (General Caution and Technical Protection). In Study 1, 515 people completed an 82-item questionnaire from which the three scales were derived. In Study 2, scale validity was examined by comparing scores of individuals drawn from groups considered likely to differ in privacy-protective behaviors. In Study 3, correlations between the scores on the current scales and two established measures of privacy concern were examined.
The authors conclude that these scales are reliable and valid instruments suitable for administration via the Internet, and present them for use in online privacy research. The authors review a log of billions of Web queries that constituted the total query traffic for a 6-month period of a general-purpose commercial Web search service. Previously, query logs were studied from a single, cumulative view. In contrast, this study builds on the authors' previous work, which showed changes in popularity and uniqueness of topically categorized queries across the hours in a day. To further their analysis, they examine query traffic on a daily, weekly, and monthly basis by matching it against lists of queries that have been topically precategorized by human editors. These lists represent 13% of the query traffic. They show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. Additionally, they show that certain categories of queries trend differently over varying periods. The authors' key contribution is twofold: They outline a method for studying both the static and topical properties of a very large query log over varying periods, and they identify and examine topical trends that may provide valuable insight for improving both retrieval effectiveness and efficiency. Research in information science now regards users' relevance judgment as subjective perception. However, user-centered studies in the extant literature mainly focus on relevance judgment in problem-solving contexts in which the situational relevance of a document is the main concern for users. This study investigates users' relevance judgment in non-problem-solving contexts, i.e., when users search information for epistemic value or entertainment. It is posited that informative relevance and affective relevance should be the main concerns for users. Based on H. P. Grice's (1975, 1989) communication theory and Y. Xu and Z.
Chen's (2006) framework, this study tests the significance of topicality, novelty, reliability, understandability, and scope to informative relevance and affective relevance in non-problem-solving contexts. This empirical study finds novelty, reliability, and topicality to be key aspects of informative relevance. Large national social surveys are expensive to conduct and to process into usable data files. The purpose of this article is to assess the impact of these national data sets on research using bibliometric measures. Peer-reviewed articles from research using numeric data files and documentation from the Canadian National Population Health Survey (NPHS) were searched in ISI's Web of Science and in Scopus for articles citing the original research. This article shows that articles using NPHS data files and products have been used by a diverse and global network of scholars, practitioners, methodologists, and policy makers. Shifts in electronic publishing and the emergence of new tools for citation analysis are changing the discovery process for published and unpublished work based on inputs to the research process. Evidence of use of large surveys throughout the knowledge transfer process can be critical in assessing grant and operating funding levels for research units, and in influencing design, methodology, and access channels in planning major surveys. The project has gathered citations from the peer-reviewed article stage of knowledge transfer, providing valuable evidence on the use of the data files and methodologies of the survey and of limitations of the survey. Further work can be done to expand the scope of material cited and analyze the data to understand how the longitudinal aspect of the survey contributes to the value of the research output. Building a case for continued funding of national, longitudinal surveys is a challenge. 
As far as I am aware, however, little use has been made of citation tracking to assess the long-term value of such surveys. Conducting citation analysis on research inputs (data file use and survey products) provides a tangible assessment of the value accrued from large-scale (and expensive) national surveys. In recent years, a considerable body of Webometric research has used hyperlinks to generate indicators for the impact of Web documents and the organizations that created them. The relationship between this Web impact and other, offline impact indicators has been explored for entire universities, departments, countries, and scientific journals, but not yet for individual scientists, an important omission. The present research closes this gap by investigating factors that may influence the Web impact (i.e., inlink counts) of scientists' personal homepages. Data concerning 456 scientists from five scientific disciplines in six European countries were analyzed, showing that both homepage content and personal and institutional characteristics of the homepage owners had significant relationships with inlink counts. A multivariate statistical analysis confirmed that full-text articles are the most linked-to content in homepages. At the individual homepage level, hyperlinks are related to several offline characteristics. Notable differences regarding total inlinks to scientists' homepages exist between the scientific disciplines and the countries in the sample. There also are both gender and age effects: fewer external inlinks (i.e., links from other Web domains) to the homepages of female and of older scientists. There is only a weak relationship between a scientist's recognition and homepage inlinks and, surprisingly, no relationship between research productivity and inlink counts. Contrary to expectations, the size of collaboration networks is negatively related to hyperlink counts.
Some of the relationships between hyperlinks to homepages and the properties of their owners can be explained by the content that the homepage owners put on their homepage and their level of Internet use; however, the findings about productivity and collaborations do not seem to have a simple, intuitive explanation. Overall, the results emphasize the complexity of the phenomenon of Web linking, when analyzed at the level of individual pages. Information security is a growing concern among the general population. For instance, it has been estimated by the U.S. Department of Justice (2004) that one in three people will become victims of identity theft at some point in their lifetime. The bulk of the research into information security has gone into the investigation of technological aspects of security, and there are gaps in the literature relative to contravention of security measures. Drawing from deterrence theory and using the theory of planned behavior as a general framework, this empirical field study investigated the effects of punishment and ethics training on behaviors related to contravention of information security measures among information professionals to fill an important gap in the literature. We found that both punishment and ethics training can be effective in mitigating the threat of software and information security, but that these depend on certain underlying motivational factors of individuals. The results of this study suggest a need to develop and refine the theoretical models, and we offer suggestions for getting at the root of behavioral issues surrounding information security. The journal structure in the China Scientific and Technical Papers and Citations Database (CSTPCD) is analyzed from three perspectives: the database level, the specialty level, and the institutional level (i.e., university journals vs. journals issued by the Chinese Academy of Sciences). 
The results are compared with those for (Chinese) journals included in the Science Citation Index (SCI). The frequency of journal-journal citation relations in the CSTPCD is an order of magnitude lower than in the SCI. Chinese journals, especially high-quality journals, prefer to cite international journals rather than domestic ones; however, Chinese journals do not get an equivalent reception from their international counterparts. The international visibility of Chinese journals is low, but varies among fields of science. Journals of the Chinese Academy of Sciences have a better reception in the international scientific community than university journals. Digital libraries have become one of the most important Web services for information seeking. One of their main drawbacks is their global approach: In general, there is just one interface for all users. One of the key elements in improving user satisfaction in digital libraries is personalization. When considering personalizing factors, cognitive styles have been proved to be one of the relevant parameters that affect information seeking. This justifies the introduction of cognitive style as one of the parameters of a Web personalized service. Nevertheless, this approach has one major drawback: Each user has to run a time-consuming test that determines his or her cognitive style. In this article, we present a study of how different classification systems can be used to automatically identify the cognitive style of a user using the set of interactions with a digital library. These classification systems can be used to automatically personalize, from a cognitive-style point of view, the interaction of the digital library and each of its users. We investigated how citations from documents labeled by the Institute for Scientific Information (ISI) as "editorial material" contribute to the impact factor of academic journals in which they were published.
Our analysis is based on records corresponding to the documents classified by the ISI as editorial material published in journals covered by the Social Sciences Citation Index between 1999 and 2003 (50,273 records corresponding to editorial material published in 2,374 journals). The results appear to rule out widespread manipulation of the impact factor by academic journals publishing large amounts of editorial material with many citations to the journal itself as a strategy to increase the impact factor. Information systems (IS) projects involving multiple organizations are very common today. Knowledge integration in such projects is a complex task of integrating diverse knowledge bases across organizations that may possess distinct strategic goals and even conflicting interests. Prior research has indicated that social capital, a resource based on social relationships, positively influences knowledge integration and interorganizational relationships, but the exact nature of the interaction has been unclear. Based on an in-depth case study, this article examines a four-organization (three clients and one IT service provider) collaborative IS project wherein the clients were business partners for 7 years when they embarked on the project. The study explicitly identifies the roles through which social capital can be leveraged for knowledge integration in a collaborative IS project. Findings suggest that social capital can be leveraged as a motivator, an integrator, and a facilitator during the various stages of a collaborative IS project. The analytic advantages of central concepts from linguistics and information theory, and the analogies demonstrated between them, for understanding patterns of retrieval from full-text indexes to documents are developed. 
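The potential lever examined in the editorial-material study above follows directly from the impact factor's definition: editorial material is typically excluded from the denominator of citable items, so journal self-citations placed in editorials add to the numerator at no cost. A worked sketch with hypothetical numbers:

```python
def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Two-year impact factor: citations received this year to items
    published in the previous two years, divided by the number of
    citable items (articles and reviews) from those two years."""
    return citations_in_year / citable_items_prev_two_years

# Hypothetical journal: 100 citable items, 150 ordinary citations,
# plus 30 self-citations carried by its own editorial material.
baseline = impact_factor(150, 100)        # editorial citations excluded
inflated = impact_factor(150 + 30, 100)   # editorials raise the numerator
                                          # but not the denominator
```

The study's finding is that, in practice, few journals publish enough heavily self-citing editorial material for this effect to matter.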
The interaction between the syntagm and the paradigm in computational operations on written language in indexing, searching, and retrieval is used to account for transformations of the signified or meaning between documents and their representation and between queries and documents retrieved. Characteristics of the message, and messages for selection for written language, are brought to explain the relative frequency of occurrence of words and multiple word sequences in documents. The examples given in the companion article are revisited and a fuller example introduced. The signified of the sequence stood for, the term classically used in the definitions of the sign, as something standing for something else, can itself change rapidly according to its syntagm. A greater than ordinary discourse understanding of patterns in retrieval is obtained. A study of Thomson-Scientific ISI ranked Library and Information Science (LIS) journals (n=52) is reported. The study examined the stances of publishers as expressed in the Copyright Transfer Agreements (CTAs) of the journals toward self-archiving, the practice of depositing digital copies of one's works in an Open Archives Initiative (OAI)-compliant open access repository. Sixty-two percent (32) do not make their CTAs available on the open Web; 38% (20) do. Of the 38% that do make CTAs available, two are open access journals. Of the 62% that do not have a publicly available CTA, 40% are silent about self-archiving. Even among the 20 journal CTAs publicly available there is a high level of ambiguity. Closer examination augmented by publisher policy documents on copyright, self-archiving, and instructions to authors reveals that only five, 10% of the ISI-ranked LIS journals in the study, actually prohibit self-archiving by publisher rule. Copyright is a moving target, but publishers appear to be acknowledging that copyright and open access can co-exist in scholarly journal publishing. 
The ambivalence of LIS journal publishers provides unique opportunities to members of the community. Authors can self-archive in open access archives. A society-led, global scholarly communication consortium can engage in the strategic building of the LIS information commons. Aggregating OAI-compliant archives and developing disciplinary-specific library services for an LIS commons has the potential to increase the field's research impact and visibility. It may also ameliorate its own scholarly communication and publishing systems and serve as a model for others. The recently developed h-index has been applied to the literature produced by senior British-based academics in librarianship and information science. The majority of those evaluated currently hold senior positions in UK information science and librarianship departments; however, a small number of staff in other departments and retired "founding fathers" were analyzed as well. The analysis was carried out using the Web of Science (Thomson Scientific, Philadelphia, PA) for the years from 1992 to October 2005, and included both second-authored papers and self-citations. The top-ranking British information scientist, Peter Willett, has an h-index of 31. However, it was found that Eugene Garfield, the founder of modern citation studies, has an even higher h-index of 36. These results support other studies suggesting that the h-index is a useful tool in the armory of bibliometrics. An analogy is established between the syntagm and paradigm from Saussurean linguistics and the message and messages for selection from the information theory initiated by Claude Shannon. The analogy is pursued both as an end in itself and for its analytic value in understanding patterns of retrieval from full-text systems. The multivalency of individual words when isolated from their syntagm is contrasted with the relative stability of meaning of multiword sequences, when searching ordinary written discourse.
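The h-index figures quoted above follow from a simple definition: the largest h such that an author has h papers each cited at least h times. A minimal computation:

```python
def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank     # the rank-th paper still has >= rank citations
        else:
            break
    return h
```

The bibliometric judgment calls noted in the study (whether to include second-authored papers and self-citations) happen before this step, in assembling the list of citation counts.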
The syntagm is understood as the linear sequence of oral and written language. Saussure's understanding of the word, as a unit that compels recognition by the mind, is endorsed, although not regarded as final. The lesser multivalency of multiword sequences is understood as the greater determination of signification by the extended syntagm. The paradigm is primarily understood as the network of associations a word acquires when considered apart from the syntagm. The restriction of information theory to expression or signals, and its focus on the combinatorial aspects of the message, is sustained. The message in the model of communication in information theory can include sequences of written language. Shannon's understanding of the written word, as a cohesive group of letters, with strong internal statistical influences, is added to the Saussurean conception. Sequences of more than one word are regarded as weakly correlated concatenations of cohesive units. Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results and three relevance criterion levels-liberal, regular, and stringent-were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. 
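Generalized recall and precision, as used in the CLIR evaluation above, replace binary relevance with graded weights. A minimal sketch, assuming relevance weights in [0, 1] with 1 for highly relevant documents:

```python
def generalized_precision(retrieved_weights):
    """Mean relevance weight of the retrieved documents
    (an irrelevant retrieved document contributes weight 0)."""
    return sum(retrieved_weights) / len(retrieved_weights)

def generalized_recall(retrieved_weights, all_relevant_weights):
    """Relevance weight retrieved as a share of the total relevance
    weight available in the collection for this query."""
    return sum(retrieved_weights) / sum(all_relevant_weights)
```

Both measures reduce to ordinary recall and precision when every weight is 0 or 1, which is why they suit the liberal/regular/stringent criterion levels described above.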
The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents. The field of information science is constantly changing. Therefore, information scientists are required to regularly review, and if necessary redefine, its fundamental building blocks. This article is one of four articles that documents the results of the Critical Delphi study conducted in 2003-2005. The study, "Knowledge Map of Information Science," was aimed at exploring the foundations of information science. The international panel was composed of 57 leading scholars from 16 countries who represent nearly all the major subfields and important aspects of the field. In this study, the author documents 50 definitions of information science, maps the major theoretical issues relevant to the formulation of a systematic conception, formulates six different conceptions of the field, and discusses their implications. The World Wide Web presents significant opportunities for business intelligence analysis as it can provide information about a company's external environment and its stakeholders. Traditional business intelligence analysis on the Web has focused on simple keyword searching. Recently, it has been suggested that the incoming links, or backlinks, of a company's Web site (i.e., other Web pages that have a hyperlink pointing to the company of interest) can provide important insights about the company's "online communities."
Although analysis of these communities can provide useful signals for a company and information about its stakeholder groups, the manual analysis process can be very time-consuming for business analysts and consultants. In this article, we present a tool called Redips that automatically integrates backlink meta-searching and text-mining techniques to facilitate users in performing such business intelligence analysis on the Web. The architectural design and implementation of the tool are presented in the article. To evaluate the effectiveness, efficiency, and user satisfaction of Redips, an experiment was conducted to compare the tool with two popular business intelligence analysis methods-using backlink search engines and manual browsing. The experiment results showed that Redips was statistically more effective than both benchmark methods (in terms of Recall and F-measure) but required more time in search tasks. In terms of user satisfaction, Redips scored statistically higher than backlink search engines in all five measures used, and also statistically higher than manual browsing in three measures. Google News and other newsbots have automated the process of news selection, providing Internet users with a virtually limitless array of news and public information dynamically culled from thousands of news organizations all over the world. In order to help users cope with the resultant overload of information, news leads are typically accompanied by three cues: (a) the name of the primary source from which the headline and lead were borrowed, (b) the time elapsed since the story broke, and (c) the number of related articles written about this story by other news organizations tracked by the newsbot. 
This article investigates the psychological significance of these cues by positing that the information scent transmitted by each cue triggers a distinct heuristic (mental shortcut) that tends to influence online users' perceptions of a given news item, with implications for their assessment of the item's relevance to their information needs and interests. A large 2 x 3 x 6 within-subjects online experiment (N = 523) systematically varied two levels of the source credibility cue, three levels of the upload recency cue, and six levels of the number-of-related-articles cue in an effort to investigate their effects upon perceived message credibility, newsworthiness, and likelihood of clicking on the news lead. Results showed evidence for a source primacy effect, and some indication of a cue-cumulation effect when source credibility is low. Findings are discussed in the context of machine and bandwagon heuristics. A feature of modern democracies is public mistrust of scientists and the politicization of science policy, e.g., concerning stem cell research and genetically modified food. While the extent of this mistrust is debatable, its political influence is tangible. Hence, science policy researchers and science policy makers need early warning of issues that resonate with a wide public so that they can make timely and informed decisions. In this article, a semi-automatic method for identifying significant public science-related concerns from a corpus of Internet-based RSS (Really Simple Syndication) feeds is described and shown to be an improvement on a previous similar system because of the introduction of feed-based aggregation. In addition, both the RSS corpus and the concept of public science-related fears are deconstructed, revealing hidden complexity. 
This article also provides evidence that genetically modified organisms and stem cell research were the two major policy-relevant science concern issues, although mobile phone radiation and software security also generated significant interest. In contrast to many recent large-scale catastrophic events, such as the Turkish earthquake in 1999, the 9/11 attack in New York in 2001, the Bali bombing in 2002, and the Asian tsunami in 2004, the initial rescue effort towards Hurricane Katrina in the U.S. in 2005 was sluggish. Even as Congress has promised to convene a formal inquiry into the response to Katrina, this article offers another perspective by analyzing the delayed response through the lens of knowledge management (KM). A KM framework situated in the context of disaster management is developed to study three distinct but overlapping KM processes, namely, knowledge creation, knowledge transfer, and knowledge reuse. Drawing from a total of more than 400 documents-including local, national, and foreign news articles, newswires, congressional reports, and television interview transcripts, as well as Internet resources such as Wikipedia and blogs - 14 major delay causes in Katrina are presented. The extent to which the delay causes were a result of lapses in KM processes within and across the government agencies is discussed. In this article, the author examines an enterprise resource planning (ERP) adoption process in a particular case setting to explore the knowledge management challenges encountered, specifically challenges related to the sharing and integration of knowledge, and the ways that social capital is used to overcome these challenges. More specifically, the author relates the different sources and effects of social capital to the different implementation phases, with their differing knowledge management challenges. 
By doing so, he highlights the relative importance of the bridging and bonding aspects of social capital that vary during different phases because of the different types of knowledge that become more or less important over the lifecycle of the project - embrained, embodied, encultured, embedded, and encoded. With domestic violence directly impacting over 5 million victims in the United States annually, the growing e-health and e-government networks are developing digitally based resources for both victims and those who aid them. The well-established community information and referral role of public libraries dovetails with this digital referral network model; however, no study of the actual service provided by public libraries is available. This examination of e-mail reference responses to requests for safe-house contact information revealed major gaps in cyber-safety awareness and uneven implementation of professional standards for virtual reference service. Implications for information system design, professional standards, education, and future research are discussed. A user's understanding of the libraries they work in, and hence of what they can do in those libraries, is encapsulated in their "mental models" of those libraries. In this article, we present a focused case study of users' mental models of traditional and digital libraries based on observations and interviews with eight participants. It was found that a poor understanding of access restrictions led to risk-averse behavior, whereas a poor understanding of search algorithms and relevance ranking resulted in trial-and-error behavior. This highlights the importance of rich feedback in helping users to construct useful mental models. Although the use of concrete analogies for digital libraries was not widespread, participants used their knowledge of Internet search engines to infer how searching might work in digital libraries. 
Indeed, most participants did not clearly distinguish between different kinds of digital resource, viewing the electronic library catalogue, abstracting services, digital libraries, and Internet search engines as variants on a theme. This article describes a unique educational project that was implemented in the undergraduate study of computer science in 2002. Nesna University College has been using the example of sexual abuse of children in case study teaching in social informatics, in order to create an environment for intrinsically motivated learning. The project also gave the students a unique opportunity to get involved both emotionally and practically in the field of social informatics. The project is run in cooperation with Save the Children Norway and the Norwegian National Crime Squad. Nesna University College has the only computer science program in the world that has sexual abuse of children as the main topic on the curriculum. The computer science students provide both Save the Children Norway and the National Criminal Investigation Service with reports on various topics such as secure chat, camera phones and possible abuse, Freenet as a tool for sexual abuse, etc. This exceptional cooperation between higher education and public and private organizations in this field makes the project not only unique; it may also be a major factor in both the willingness of students to learn social informatics and their development of skills in the various topics of social informatics. When there is a group of articles and the present time is fixed, we can determine the unique number h, being the number of articles that received h or more citations while the other articles received a number of citations which is not larger than h. In this article, the time dependence of the h-index is determined. This is important to describe the expected career evolution of a scientist's work or of a journal's production in a fixed year. 
We use the earlier established cumulative nth citation distribution. We show that h = ((1-a^t)^(alpha-1) T)^(1/alpha), where a is the aging rate, alpha is the exponent of Lotka's law of the system, and T is the total number of articles in the group. For t = +infinity we refind the steady state (static) formula h = T^(1/alpha), which we proved in a previous article. Functional properties of the above formula are proven. Among several results we show (for a, alpha, T fixed) that h is a concavely increasing function of time, asymptotically bounded by T^(1/alpha). The authors describe Lempel-Ziv to Compress Structure (LZCS), a novel Lempel-Ziv approach suitable for compressing structured documents. LZCS takes advantage of repeated substructures that may appear in the documents, by replacing them with a backward reference to their previous occurrence. The result of the LZCS transformation is still a valid structured document, which is human-readable and can be transmitted by ASCII channels. Moreover, LZCS-transformed documents are easy to search, display, access at random, and navigate. In a second stage, the transformed documents can be further compressed using any semistatic technique, so that it is still possible to do all those operations efficiently; or with any adaptive technique to boost compression. LZCS is especially efficient in the compression of collections of highly structured data, such as extensible markup language (XML) forms, invoices, e-commerce, and Web-service exchange documents. The comparison with other structure-aware and standard compressors shows that LZCS is a competitive choice for these types of documents, whereas the others are not well-suited to support navigation or random access. When joined to an adaptive compressor, LZCS obtains by far the best compression ratios. The field of Information Science is constantly changing. Therefore, information scientists are required to regularly review-and if necessary-redefine its fundamental building blocks. 
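The h-index used in the dynamic model above ("the number of articles that received h or more citations") can be computed directly from a list of citation counts; a minimal sketch (function name illustrative):

```python
def h_index(citations):
    # h = the largest h such that h articles have at least h citations each
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four articles have >= 4 citations each
```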
This article is one of a group of four articles, which resulted from a Critical Delphi study conducted in 2003-2005. The study, "Knowledge Map of Information Science," was aimed at exploring the foundations of information science. The international panel was composed of 57 leading scholars from 16 countries, who represent (almost) all the major subfields and important aspects of the field. This particular article documents 130 definitions of data, information, and knowledge formulated by 45 scholars, and maps the major conceptual approaches for defining these three key concepts. The widely used Web search engines index and recommend individual Web pages in response to a few keyword queries to assist users in locating relevant documents. However, the Web search engines give different users the same answer set, although the users may have different preferences. A personalized Web search would carry out the search for each user according to his or her preferences. To conduct such a personalized Web search, the authors provide a novel approach to model the user profile with a self-organizing map (SOM). Their results indicate that SOM is capable of helping the user to find the related category for each query used in the Web search to make a personalized Web search effective. This study examines the influence of time constraints on Internet and Web search goals and search behavior. Specifically, it looks at the searching behavior of public library Internet users who, previously limited to 30 minutes per Internet session, are given an unlimited amount of time for use. Interviews and observations were conducted with 34 participants searching on their own queries. Despite an increase in the time allowed for searching, most people spent less than 30 minutes on the Internet, carrying out tasks like paying bills, shopping, browsing, and making reservations. Those who took more than 30 minutes were looking for jobs or browsing. E-mail use was universal. 
In this context, influences like time-dependent and time-independent tasks, use of search hubs to perform more efficient searches, and search diversity were recorded. Though there are a number of large and small studies of Internet and Web use, few of them focus on temporal influences. This study extends knowledge in this area of inquiry. This article explores the status of research in hydrogeology using data mining techniques. First, we explain what citation analysis is and review some of the previous work on it. The main idea in this article is to address some common issues about citation numbers and the use of these data. To validate the use of citation numbers, we compare the citation patterns for Water Resources Research papers in the 1980s with those in the 1990s. The citation growth for highly cited authors from the 1980s is used to examine whether it is possible to predict the citation patterns for highly cited authors in the 1990s. If the citation data prove to be steady and stable, these numbers can then be used to explore the evolution of science in hydrogeology. The famous quotation, "If you are not the lead dog, the scenery never changes," attributed to Lee Iacocca, points to the importance of an entrepreneurial spirit in all forms of endeavor. In the case of hydrogeological research, impact analysis makes it clear how important it is to be a pioneer. Statistical correlation coefficients are used to retrieve papers among a collection of 2,847 papers before and after 1991 sharing the same topics with 273 papers in 1991 in Water Resources Research. The numbers of papers before and after 1991 are then plotted against various levels of citations for papers in 1991 to compare the distributions of paper population before and after that year. The similarity metrics based on word counts can ensure that the "before" papers are like ancestors and "after" papers are descendants in the same type of research. 
This exercise gives us an idea of how the paper population is distributed before and after 1991 (1991 is chosen based on balanced numbers of papers before and after that year). In addition, the impact of papers is measured in terms of citation presented as "percentile," a relative measure based on rankings in one year, in order to minimize the effect of time. This article is part of a group of four articles that resulted from a Critical Delphi study conducted in 2003-2005. The study, "Knowledge Map of Information Science," was aimed at exploring the foundations of information science. The international panel was composed of 57 leading scholars from 16 countries who represent nearly all the major subfields and important aspects of the field. This article presents a systematic and comprehensive knowledge map of the field, and is grounded in the panel discussions. The map has 10 basic categories: (1) Foundations, (2) Resources, (3) Knowledge Workers, (4) Contents, (5) Applications, (6) Operations and Processes, (7) Technologies, (8) Environments, (9) Organizations, and (10) Users. The model establishes the groundwork for formulating theories of information science, as well as developing and evaluating information science academic programs and bibliographic resources. In Sperber and Wilson's relevance theory (RT), the ratio Cognitive Effects/Processing Effort defines the relevance of a communication. The tf*idf formula from information retrieval is used to operationalize this ratio for any item co-occurring with a user-supplied seed term in bibliometric distributions. The tf weight of the item predicts its effect on the user in the context of the seed term, and its idf weight predicts the user's processing effort in relating the item to the seed term. The idf measure, also known as statistical specificity, is shown to have unsuspected applications in quantifying interrelated concepts such as topical and nontopical relevance, levels of user expertise, and levels of authority. 
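The tf*idf weighting used above to operationalize the Cognitive Effects/Processing Effort ratio is the standard information retrieval formula; a minimal sketch over a toy corpus (the documents and function name are illustrative):

```python
import math

def tf_idf(term, doc, corpus):
    # tf: raw frequency of the term in this document (predicts effect)
    tf = doc.count(term)
    # df: number of documents in the corpus containing the term
    df = sum(1 for d in corpus if term in d)
    # idf (statistical specificity): rarer terms carry higher weight
    return tf * math.log(len(corpus) / df) if df else 0.0

docs = [["whale", "ship", "sea"],
        ["ship", "port"],
        ["whale", "whale", "sea"]]
print(tf_idf("whale", docs[2], docs))  # 2 * ln(3/2) ~= 0.81
```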
A new kind of visualization, the pennant diagram, illustrates these claims. The bibliometric distributions visualized are the works cocited with a seed work (Moby Dick), the authors cocited with a seed author (White HD, for maximum interpretability), and the books and articles cocited with a seed article (S.P. Harter's "Psychological Relevance and Information Science," which introduced RT to information scientists in 1992). Pennant diagrams use bibliometric data and information retrieval techniques on the system side to mimic a relevance-theoretic model of cognition on the user side. Relevance theory may thus influence the design of new visual information retrieval interfaces. Generally, when information retrieval and bibliometrics are interpreted in light of RT, the implications are rich: A single sociocognitive theory may serve to integrate research on literature-based systems with research on their users, areas now largely separate. In this article, statistical principal components analysis (PCA) is proposed as a method for performance comparisons of different retrieval strategies. It is shown that the PCA method can reveal implicit performance relations among retrieval systems across information needs (i.e., queries, topics). For illustration, the TREC 12 robust track data have been reevaluated by the PCA method and have been shown to easily reveal performance relations that are hard to see with traditional techniques. Therefore, PCA promises a uniform evaluation framework that can be used for large-scale evaluation of retrieval experiments. In addition to the mean average precision (MAP) measure, relative analytic distance (RAD) is proposed as a new performance summary measure based on the same notion introduced by PCA. The article discusses Rob Kling's notion of the critical and how this term is embodied in Kling's social informatics and in the works of other authors, which we identify as belonging to critical informatics. 
Issues of method and the notion of the empirical are discussed, as is the importance of such analyses for social life and professional education. It has been claimed that topic metadata can be used to improve the accuracy of text searches. Here, we test this claim by examining the contribution of metadata to effective searching within Web sites published by a university with a strong commitment to and substantial investment in metadata. The authors use four sets of queries, a total of 463, extracted from the university's official query logs and from the university's site map. The results are clear: The available metadata is of little value in ranking answers to those queries. A follow-up experiment with the Web sites published in a particular government jurisdiction confirms that this conclusion is not specific to the particular university. Examination of the metadata present at the university reveals that, in addition to implementation deficiencies, there are inherent problems in trying to use subject and description metadata to enhance the searchability of Web sites. Our experiments show that link anchor text, which can be regarded as metadata created by others, is much more effective in identifying best answers to queries than other textual evidence. Furthermore, query-independent evidence such as link counts and uniform resource locator (URL) length, unlike subject and description metadata, can substantially improve baseline performance. This exploratory study investigates one type of video surrogate, storyboards, in terms of their ability to summarize and communicate the themes of arts-related videos. An HTML interface containing the storyboards, videos, and instructions was developed and run in a standard browser. Three phases-consisting of storyboard evaluation, full-length video evaluation, and their comparison-were completed by each user for three different videos. 
The data were analyzed for issues relating to keywords, summaries, and recognition of visual style for both the storyboards and the full-length videos. The linear sequence and narrative structure of storyboards are questioned, and a three-tiered model is proposed. The first layer consists of keyframes representing the "entity" and "action" of the video's central theme, the second layer consists of "entity" and "action" keyframes with regard to background or supporting information, and the third layer is composed of keyframes representing attributes, locations, and time periods. This structure facilitates the identification of appropriate keyframes for storyboards, eliminating redundant or peripheral images, and improves the storyboard's ability to communicate the essential message of videos. The tiered model is motivated and supported by the user study as well as current research on video surrogates and classical indexing theory. The field of Information Science is constantly changing. Therefore, information scientists are required to regularly review-and if necessary-redefine its fundamental building blocks. This article is one of a group of four articles, which resulted from a Critical Delphi study conducted in 2003-2005 (Zins, 2007a, 2007b, 2007c). The study, "Knowledge Map of Information Science," was aimed at exploring the foundations of information science. The international panel was composed of 57 leading scholars from 16 countries who represent nearly all the major subfields and important aspects of the field. This particular article documents 28 classification schemes of Information Science that were compiled by leading scholars in the academic community. This unique collection of 28 classification schemes portrays and documents the profile of contemporary Information Science at the beginning of the 21st century. The authors describe a statistical approach based on hidden Markov models (HMMs), for generating stemmers automatically. 
The proposed approach requires little effort to add new languages to the system, even when only minimal linguistic knowledge is available. This is a key advantage, especially for digital libraries, which are often developed for a specific institution or government and must manage large numbers of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules. Query-by-humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, the authors compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases. Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly, in linguistic terms: vocabulary size is a concavely increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. 
A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T is approximately A^theta, where theta is a constant, theta < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, theta is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples. With the rapid diffusion of the Internet, researchers, policy makers, and users have raised concerns about online privacy, although few studies have integrated aspects of usage with psychological and attitudinal aspects of privacy. This study develops a model involving gender, generalized self-efficacy, psychological need for privacy, Internet use experience, Internet use fluency, and beliefs in privacy rights as potential influences on online privacy concerns. Survey responses from 413 college students were analyzed by bivariate correlations, hierarchical regression, and structural equation modeling. Regression results showed that beliefs in privacy rights and a psychological need for privacy were the main influences on online privacy concerns. 
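The Heaps/Herdan relation T approximately equal to A^theta discussed above can be illustrated by random sampling from a Lotka/Zipf-type source distribution; a small simulation sketch (all parameters and the two-checkpoint slope estimate are illustrative choices, not taken from the article):

```python
import math
import random

random.seed(0)  # reproducible illustration

V = 50_000  # vocabulary size of the source distribution (illustrative)
weights = [1.0 / (r ** 1.1) for r in range(1, V + 1)]  # Zipf-like law

types = set()
checkpoints = {}
for tokens, word in enumerate(random.choices(range(V), weights=weights,
                                             k=100_000), start=1):
    types.add(word)
    if tokens in (10_000, 100_000):
        checkpoints[tokens] = len(types)  # T (sources) at A (items) tokens

# theta estimated as the log-log slope of T against A between checkpoints
theta = (math.log(checkpoints[100_000]) - math.log(checkpoints[10_000])) / math.log(10)
print(round(theta, 2))  # theta < 1: T grows as a concave power law of A
```

Since each new token adds at most one new type and repeats accumulate, the estimated exponent comes out below 1, matching the concavely increasing behavior described above.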
The proposed structural model was not well supported by the data, but a revised model, linking self-efficacy with psychological need for privacy and indicating indirect influences of Internet experience and fluency on online privacy concerns through beliefs in privacy rights, was supported by the data. As networked digital systems are rapidly created and deployed, social, cultural, and community-focused issues are often neglected. Indeed, much research has focused on the "effects" these systems have, rather than viewing systems as tools to be designed given an understanding of sociocultural context. Acknowledging the cultural practices and belief systems of a set of users may allow systems to be more effectively created and deployed into particular contexts. Emerging research in community information systems and archives has highlighted possible interactions between system design and ethnographic research. These bridges include understanding how communities can begin (1) to create content for their own information systems, (2) to design the database architectures, and (3) to integrate systems within community infrastructures. In this article, I allude to several cultural criticisms that accompany the global proliferation of information technologies. These criticisms can be answered by research that focuses on developing systems based on ethnographic insights. Specifically, I present the research example of Tribal Peace, a cultural information system designed for and by community members of the 19 Native American reservations of San Diego County (California, United States). This case has demonstrated the potential for a community to create an information system that satisfies its own priorities. This precedent points to the need for further research that investigates this convergence. We studied the views of scientists who experience resistance to their new ideas by surveying a sample of 815 scientists who are authors of highly cited articles. 
The 132 responses received (a 16.2% response rate) indicated that only 47 scientists (35.6%) had no problems with referees, editors, or other scientists. The most common causes of difficulty were rejection of the manuscript and scepticism, ignorance, and incomprehension. The most common arguments given by referees against papers were that the findings were an insufficient advance to warrant publication, lacked practical impact, were based on a wrong hypothesis, or were based on a wrong concept. The strategies authors used to overcome resistance included obtaining help from someone to publish problematic papers, making changes in the text, and simple persistence. Despite difficulties, however, some respondents acknowledged the positive effect of peer review. Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005, and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean = 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are shorter. 
These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers. The Internet is increasingly being recognized for its potential for health communication and education. The perceived relative advantage of the Internet over other media is its cost-effectiveness and interactivity, which in turn contribute to its persuasive capabilities. Ironically, despite its potential, we are no nearer to understanding how interactivity affects processing of health information and its contribution in terms of health outcomes. An experiment was conducted to examine the effects of Web interactivity on comprehension of and attitudes towards two health Web sites, and whether individual differences might moderate such effects. Two sites on skin cancer were designed with different levels of interactivity and randomly assigned to 441 undergraduate students (aged 18-26) at a large southeastern university. The findings suggest that interactivity can significantly affect comprehension as well as attitudes towards health Web sites. The article also discusses insights into the role of interactivity in online health communications, and presents implications for the effective design of online health content. Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty studies, are shown to possess the properties necessary for adequate empirical ranking of various data arrays, based on the properties of the highest performers (i.e., the core). TOP-curves and essential TOP-curves, also introduced in this article, simultaneously represent the incidence, intensity, and inequality among the top. It is shown that TOP-dominance partial order, introduced in this article, is stronger than Lorenz dominance order. 
In this way, this article contributes to the study of cores, a central issue in applied informetrics. In philosophy, Ontology is the basic description of things in the world. In information science, an ontology refers to an engineering artifact, constituted by a specific vocabulary used to describe a certain reality. Ontologies have been proposed for validating both conceptual models and conceptual schemas; however, these roles are quite dissimilar. In this article, we show that ontologies can be better understood if we classify the different uses of the term as it appears in the literature. First, we explain Ontology (upper case O) as used in Philosophy. Then, we propose a differentiation between ontologies of information systems and ontologies for information systems. All three concepts have an important role in information science. We clarify the different meanings and uses of Ontology and ontologies through a comparison of research by Wand and Weber and by Guarino in ontology-driven information systems. The contributions of this article are twofold: (a) It provides a better understanding of what ontologies are, and (b) it explains the double role of ontologies in information science research. Research on proximal and virtual team-based work is beginning to accumulate. However, little if any research has examined the effects of the degree of virtualization on performance, even though purely proximal or purely virtual teamwork is becoming rare in most professional organizations. This field study examined the effects of virtualization on social influences and social identity factors, and these effects on performance. We found nonlinear relationships between virtualization and cohesion, and virtualization and conflict. Task-relationship orientation and social-technical skills were also found to interact with virtualization on performance. Consequently, recommendations are made regarding hybridization of teams. 
Most text analysis and retrieval work to date has focused on the topic of a text; that is, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This article develops a new type of lexical feature for use in stylistic text classification, based on taxonomies of various semantic functions of certain choice words or phrases. We demonstrate the usefulness of such features for the stylistic text classification tasks of determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/negative evaluation), and the rhetorical character of scientific journal articles. We further show how the use of functional features aids in gaining insight about stylistic differences among different kinds of texts. Recording evidence for data element values, in addition to the values themselves, in bibliographic records and descriptive metadata is likely to be useful for improving the expressivity and reliability of such records and metadata. Recorded evidence indicates why and how data values are recorded for elements. This article is Part II of a study to explore a way of assisting catalogers in recording evidence in bibliographic records, with the aim of minimizing the costs and effort of doing so. This article begins with a scenario for utilizing recorded evidence to which a cataloger refers for information and understanding of the ways that have been adopted to record data value(s) in a given element. In line with that scenario, the proper content of evidence to be recorded is first discussed. Second, the functionality of the system developed in Part I is extended and refined to make the system more useful and effective in recording such evidence. Third, the system's performance is experimentally examined, the results of which show its usefulness. 
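Returning to the stylistic text classification study above: its taxonomies of semantic functions are not reproduced here, but the general feature construction can be sketched. The two-class taxonomy below is a made-up stand-in, and the example text is invented.

```python
# Hedged sketch of taxonomy-based lexical features for stylistic text
# classification. The taxonomy is a hypothetical stand-in for the
# semantic-function taxonomies described in the article.

from collections import Counter

TAXONOMY = {                      # hypothetical semantic classes
    "hedge":   {"perhaps", "possibly", "apparently"},
    "booster": {"clearly", "certainly", "obviously"},
}

def functional_features(text):
    """Relative frequency of each semantic class among the text's tokens."""
    tokens = text.lower().split()
    counts = Counter()
    for tok in tokens:
        for cls, words in TAXONOMY.items():
            if tok in words:
                counts[cls] += 1
    n = max(len(tokens), 1)
    return {cls: counts[cls] / n for cls in TAXONOMY}

feats = functional_features("Clearly this is possibly true perhaps")
print(feats)   # hedge: 2 of 6 tokens, booster: 1 of 6
```

Feature vectors of this kind would then feed a standard classifier; the point is that the features capture how a text is written rather than what it is about.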
And fourth, another system is developed for catalogers to retrieve and display recorded evidence together with bibliographic records in a flexible way. Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis. These techniques include co-occurrence analysis, a shortest-path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link analysis tools. Moreover, subjects believed that association paths found based on the heuristic approach were more accurate than those found based solely on co-occurrence analysis, and that the automated link analysis system would be of great help in crime investigations. This article studies the effect that two major disasters, the Three Mile Island and Chernobyl nuclear disasters, had on the publishing world. We expect consumer publishing to concentrate on major events as they unfold. The technical and scholarly publishing world, however, is believed to progress and develop in conjunction with the growth of science, as established in bibliometric laws. Articles about these disasters were tracked in four bibliographic databases representing scholarly, technical-scholarly, technical, and consumer literature. 
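The co-occurrence and shortest-path techniques in the link-analysis study above can be sketched as follows. The edge weights and the distance transform (1/weight) are illustrative assumptions, not CrimeLink Explorer's exact heuristics, and the incident reports are invented.

```python
# Hedged sketch of link analysis via co-occurrence weights and a
# shortest-path search over the resulting association graph.

import heapq
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(reports):
    """Edge weight = number of reports in which two entities co-occur."""
    w = defaultdict(int)
    for entities in reports:
        for a, b in combinations(sorted(set(entities)), 2):
            w[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), c in w.items():
        graph[a][b] = graph[b][a] = c
    return graph

def strongest_path(graph, src, dst):
    """Dijkstra on distance 1/weight: prefers frequent associations."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:          # assumes dst is reachable from src
        path.append(node)
        node = prev[node]
    return [src] + path[::-1]

reports = [["A", "B"], ["A", "B"], ["B", "C"], ["A", "C"]]
g = cooccurrence_graph(reports)
print(strongest_path(g, "A", "C"))
```

Turning co-occurrence counts into distances lets standard shortest-path search rank association paths by how strongly their links are supported by the data.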
Several analyses of the data revealed that each body of literature responds in its own way to disasters, and that anniversaries of events affect publishing in all of the literatures other than government-sponsored research. More focused databases have a more highly correlated response to disasters than broad-based databases. Comparison to two previously published studies of fast-growing literatures reveals that while some measures are consistent, disasters experience participation from a larger number of researchers, with publications spread across a broader base of journal titles. Detecting query reformulations within a session by a Web searcher is an important area of research for designing more helpful searching systems and targeting content to particular users. Methods explored by other researchers include both qualitative approaches (i.e., the use of human judges to manually analyze query patterns, usually on small samples) and nondeterministic algorithms, typically using large amounts of training data to predict query modification during sessions. In this article, we explore three alternative methods for detection of session boundaries. All three methods are computationally straightforward and therefore easily implemented for detection of session changes. We examine 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005. We compare session analysis using (a) Internet Protocol address and cookie; (b) Internet Protocol address, cookie, and a temporal limit on intrasession interactions; and (c) Internet Protocol address, cookie, and query reformulation patterns. Overall, our analysis shows that defining sessions by query reformulation along with Internet Protocol address and cookie provides the best measure, resulting in an 82% increase in the count of sessions. Regardless of the method used, the mean session length was fewer than three queries, and the mean session duration was less than 30 min. 
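Method (b) above, segmenting by (IP address, cookie) plus a temporal cutoff, can be sketched in a few lines. The 30-minute threshold and the log records below are illustrative choices, not necessarily the article's exact parameters.

```python
# Hedged sketch of session segmentation by (IP, cookie) user key plus a
# temporal cutoff between consecutive interactions.

from collections import defaultdict

CUTOFF_SECONDS = 30 * 60   # illustrative intrasession time limit

def sessionize(interactions):
    """interactions: (ip, cookie, timestamp_seconds, query) tuples."""
    by_user = defaultdict(list)
    for ip, cookie, ts, query in interactions:
        by_user[(ip, cookie)].append((ts, query))
    sessions = []
    for records in by_user.values():
        records.sort()                       # chronological order
        current = [records[0]]
        for prev, rec in zip(records, records[1:]):
            if rec[0] - prev[0] > CUTOFF_SECONDS:
                sessions.append(current)     # gap too long: close session
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions

log = [
    ("1.2.3.4", "c1", 0,    "marble museum"),
    ("1.2.3.4", "c1", 600,  "marble museum hours"),
    ("1.2.3.4", "c1", 5000, "weather"),   # > 30 min gap: new session
]
print(len(sessionize(log)))   # 2
```

Method (c) would additionally split a session when the query pattern indicates a new information need, which is why it yields more (and shorter) sessions than the key-plus-time approach alone.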
Searchers most often modified their query by changing query terms (nearly 23% of all query modifications) rather than adding or deleting terms. Implications are that for measuring searching traffic, unique sessions may be a better indicator than the common metric of unique visitors. This research also sheds light on the more complex aspects of Web searching involving query modifications and may lead to advances in searching tools. In this article, we investigated the factors determining the capability of academic articles to be cited in the future using a topological analysis of citation networks. The basic idea is that articles that will go on to receive many citations occupied topologically "similar" positions in the past. To validate this hypothesis, we investigated the correlation between future times cited and three measures of centrality: clustering centrality, closeness centrality, and betweenness centrality. We also analyzed the effect of aging as well as of self-correlation of times cited. Case studies were performed on two recent representative innovations: Gallium Nitride and Complex Networks. The results suggest that times cited is the main factor in explaining near-future times cited, and betweenness centrality is correlated with distant-future times cited. The effect of topological position on the capability to be cited is influenced by the migrating phenomenon in which the activated center of research shifts from an existing domain to a new emerging domain. The process of finding information to address problems that arise in everyday life situations is complex. Individuals are influenced by many factors when the information need occurs, including their social, psychological, political, economic, physical, and work environments. Research focusing on the social factors affecting information seeking has stressed the importance of interpersonal communication and the quality of social networks in facilitating access to information. 
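One of the centrality measures used in the citation-network study above, closeness centrality, can be sketched with a plain BFS. The toy graph is invented, and the article's exact graph construction (direction, weighting, time-slicing) is not reproduced.

```python
# Hedged sketch: closeness centrality on a small undirected citation
# graph, computed by breadth-first search from the focal node.

from collections import deque

def closeness(graph, node):
    """Inverse of the average shortest-path distance from `node`."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for n, d in dist.items() if n != node]
    return (len(others) / sum(others)) if others else 0.0

g = {  # toy graph: paper B bridges A and C
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}
print(closeness(g, "B"))   # 2/(1+1) = 1.0
print(closeness(g, "A"))   # 2/(1+2) = 0.666...
```

Betweenness and clustering centrality follow the same graph-traversal style but count shortest paths through a node and triangles around it, respectively.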
The study reported in this article investigates the role of social networks in affecting access to information and, more particularly, how social capital, the resources made available to individuals through their social networks, influences their success in finding the information they need. Questionnaires were administered in a face-to-face format to a random sample of 320 residents of the city of Ulaanbaatar, Mongolia. The theoretical framework for the study is Lin's theory of social capital, whose main proposition is that the ability of people to achieve desired outcomes is positively associated with social capital. The findings indicate that social capital did have a significant effect on information behavior, particularly on the choice of source, which in turn had a direct influence on successful search outcomes. Children are increasingly using the Web. Cognitive theory tells us that directory structures are especially suited for information retrieval by children; however, empirical results show that they prefer keyword searching. One of the reasons for these findings could be that the directory structures and terminology are created by grown-ups. Using a card-sorting method and an enveloping system, we simulated the structure of a directory. Our goal was to try to understand what browsable, hierarchical subject categories children create when suggested terms are supplied and they are free to add or delete terms. Twelve groups of four children each (fourth and fifth graders) participated in our exploratory study. The initial terminology presented to the children was based on names of categories used in popular directories, in the sections on Arts, Television, Music, Cinema, and Celebrities. The children were allowed to introduce additional cards and change the terms appearing on the 61 cards. 
Findings show that the different groups reached reasonable consensus; the majority of the category names used by existing directories were acceptable to them, and only a small minority of the terms caused confusion. Our recommendation is to include children in the design process of directories, not only in designing the interface but also in designing the content structure. This study examines the relation between selection power and selection labor for information retrieval (IR). It is the first part of the development of a labor theoretic approach to IR. Existing models for evaluation of IR systems are reviewed, and the distinction of operational from experimental systems is partly dissolved. The often covert, but powerful, influence of technology on practice and theory is rendered explicit. Selection power is understood as the human ability to make informed choices between objects or representations of objects and is adopted as the primary value for IR. Selection power is conceived as a property of human consciousness, which can be assisted or frustrated by system design. The concept of selection power is further elucidated, and its value supported, by an example of the discrimination enabled by index descriptions, the discovery of analogous concepts in partly independent scholarly and wider public discourses, and its embodiment in the design and use of systems. Selection power is regarded as produced by selection labor, with the nature of that labor changing with different historical conditions and concurrent information technologies. Selection labor can itself be decomposed into description and search labor. Selection labor and its decomposition into description and search labor will be treated in a subsequent article, in a further development of a labor theoretic approach to information retrieval. In this article, the authors discuss reretrieving personal information objects and relate the task to recovering from lapse(s) in memory. 
They propose that memory lapses impede users from successfully refinding the information they need. Their hypothesis is that by learning more about memory lapses in noncomputing contexts and about how people cope with and recover from these lapses, we can better inform the design of personal information management (PIM) tools and improve the user's ability to reaccess and reuse objects. They describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, they present a series of principles that they hypothesize will improve the design of PIM tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to the authors' findings. The evaluation suggests that users' performance when refinding objects can be improved by building personal information management tools to support characteristics of human memory. Based on articles published in 1990-2004 in 21 library and information science (LIS) journals, a set of cocitation analyses was performed to study changes in research fronts over the last 15 years, to determine where LIS stands now, and to discuss where it is heading. To study research fronts, here defined as current and influential cocited articles, a citations-among-documents methodology was applied; and to study changes, the analyses were time-sliced into three 5-year periods. The results show a stable structure of two distinct research fields: informetrics and information seeking and retrieval (ISR). However, experimental retrieval research and user-oriented research have merged into one ISR field; and IR and informetrics also show signs of coming closer together, sharing research interests and methodologies, making informetrics research more visible in mainstream LIS research. 
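The cocitation counting that underlies analyses like the LIS research-front study above is standard and can be sketched directly: two articles are cocited whenever a later article cites both. The reference lists below are invented, and the study's time-slicing and thresholds are not reproduced.

```python
# Hedged sketch of document cocitation counting from per-article
# reference lists.

from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """reference_lists: iterable of cited-work lists, one per citing article."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

citing = [
    ["P1", "P2", "P3"],   # this article cocites all three pairs
    ["P1", "P2"],
    ["P2", "P3"],
]
print(cocitation_counts(citing)[("P1", "P2")])   # 2
```

Clustering the resulting pair counts (optionally per time slice) is what reveals research fronts and their drift over time.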
Furthermore, the focus on the Internet, both in ISR research and in informetrics, where webometrics has quickly become a dominant research area, is an important change. The future is discussed in terms of LIS dependency on technology, how integration of research areas as well as technical systems can be expected to continue to characterize LIS research, and how webometrics will continue to develop and find applications. Human information-seeking behavior is a topic of increasing interest in many disciplines. However, the dynamics of this behavior remain elusive. The extant research has taken cognitive and behavioral perspectives to study information-seeking behavior, and observed its dynamics in multiple sessions. However, the underlying mechanisms that govern the dynamics of information-seeking behavior are not well understood. With a focus on interactive information retrieval behavior, this study proposes an integrated framework based on activity theory. This framework is not only comprehensive and integrated, but also offers an explanation of the mechanisms governing the interaction between users' cognitive states and their manifested behavior when using an information retrieval system. A set of four propositions is advanced to describe the mechanisms. The implications are discussed. There is a well-known gap between systems-oriented information retrieval (IR) and user-oriented IR, which cognitive IR seeks to bridge. It is therefore interesting to analyze approaches at the level of frameworks, models, and study designs. This article is an exercise in such an analysis, focusing on two significant approaches to IR: the lab IR approach and P. Ingwersen's (1996) cognitive IR approach. The article focuses on their research frameworks, models, hypotheses, laws and theories, study designs, and possible contributions. 
The two approaches are quite different, which becomes apparent in the use of independent, controlled, and dependent variables in the study designs of each approach. Thus, each approach is capable of contributing very differently to understanding and developing information access. The article also discusses integrating the approaches at the study-design level. Human information-seeking behavior is complicated. Activity theory is a powerful theoretical instrument to untangle the "complications." Based on activity theory, a comprehensive framework is proposed in Part I (Y. Xu, 2007) of this report to describe interactive information retrieval (IIR) behavior. A set of propositions is also proposed to describe the mechanisms governing users' cognitive activity and the interaction between users' cognitive states and manifested retrieval behavior. An empirical study is carried out to verify the propositions. The authors' experimental simulation of 81 participants in one search session indicates the propositions are largely supported. Their findings indicate IIR behavior is planned. Users adopt a divide-and-conquer strategy in information retrieval. The planning of information retrieval activity is also partially manifested in query revision tactics. Users learn from previously read documents. A user's interaction with a system ultimately changes the user's information need and the resulting relevance judgment, but the dynamics of topicality perception and novelty perception occur at different paces. Previous studies have examined various aspects of user behavior on the Web, including general information-seeking patterns, search engine use, and revisitation habits. Little research has been conducted to study how users navigate and interact with their Web browser across different information-seeking tasks. 
We have conducted a field study of 21 participants, in which we logged detailed Web usage and asked participants to provide task categorizations of their Web usage based on the following categories: Fact Finding, Information Gathering, Browsing, and Transactions. We used implicit measures logged during each task session to provide usage measures such as dwell time, number of pages viewed, and the use of specific browser navigation mechanisms. We also report on differences in how participants interacted with their Web browser across the range of information-seeking tasks. Within each type of task, we found several distinguishing characteristics. In particular, Information Gathering tasks were the most complex; participants spent more time completing this task, viewed more pages, and used the Web browser functions most heavily during this task. The results of this analysis have been used to provide implications for future support of information seeking on the Web as well as direction for future research in this area. Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the "proximity" of nodes in a network. Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures. The purpose of this work is to identify potential evaluation criteria for interactive, analytical question-answering (QA) systems by analyzing evaluative comments made by users of such a system. Qualitative data collected from intelligence analysts during interviews and focus groups were analyzed to identify common themes related to performance, use, and usability. 
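The link-prediction problem formalized above scores node pairs by proximity; the simplest such measure, common-neighbor count, can be sketched directly. The coauthorship graph below is invented, and the article evaluates a richer set of proximity measures than this one.

```python
# Hedged sketch of a simple proximity measure for link prediction in a
# coauthorship network: the number of shared neighbors.

def common_neighbors(graph, u, v):
    """Shared-neighbor count; higher suggests a likelier future link."""
    return len(set(graph[u]) & set(graph[v]))

coauthors = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "carol", "dave"],
    "carol": ["alice", "bob"],
    "dave":  ["bob"],
}
# alice and dave are not yet linked but share one coauthor (bob),
# so a future alice-dave collaboration scores 1.
print(common_neighbors(coauthors, "alice", "dave"))   # 1
```

Subtler measures (e.g., weighting rare shared neighbors more heavily) follow the same pattern of scoring unlinked pairs from network topology alone.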
These data were collected as part of an intensive, three-day evaluation workshop of the High-Quality Interactive Question Answering (HITIQA) system. Inductive coding and memoing were used to identify and categorize these data. Results suggest potential evaluation criteria for interactive, analytical QA systems, which can be used to guide the development and design of future systems and evaluations. This work contributes to studies of QA systems, information seeking and use behaviors, and interactive searching. The number of non-English resources has been increasing rapidly on the Web. Although many studies have been conducted on the query logs of search engines that are primarily English-based (e.g., Excite and AltaVista), only a few of them have studied information-seeking behavior on the Web in non-English languages. In this article, we report an analysis of the search-query logs of a search engine that focused on Chinese. Three months of search-query logs of Timway, a search engine based in Hong Kong, were collected and analyzed. Metrics on sessions, queries, search topics, and character usage are reported. N-gram analysis has also been applied to perform character-based analysis. Our analysis suggests that some characteristics identified in the search log, such as search topics and the mean number of queries per session, are similar to those in English search engines; however, other characteristics, such as the use of operators in query formulation, are significantly different. The analysis also shows that only a very small number of unique Chinese characters are used in search queries. We believe the findings from this study have provided some insights into further research in non-English Web searching. 
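The character-based n-gram analysis applied to the Timway log can be sketched as follows. The sample queries are invented; the study's preprocessing of real log data is not reproduced.

```python
# Hedged sketch of character-based n-gram counting over search queries,
# of the kind used for the Chinese query-log analysis.

from collections import Counter

def char_ngrams(queries, n):
    """Count overlapping character n-grams across all queries."""
    counts = Counter()
    for q in queries:
        q = q.replace(" ", "")               # drop spaces, keep characters
        counts.update(q[i:i + n] for i in range(len(q) - n + 1))
    return counts

queries = ["香港天氣", "天氣預報", "香港地圖"]  # invented sample queries
bigrams = char_ngrams(queries, 2)
print(bigrams["天氣"])                # the bigram appears in two queries
print(len(char_ngrams(queries, 1)))   # number of unique characters used
```

Unigram counts of this kind are what support the finding that only a small number of unique Chinese characters appear in search queries.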
We use a new data gathering method, "Web/URL citation," and Google Scholar to compare traditional and Web-based citation patterns across multiple disciplines (biology, chemistry, physics, computing, sociology, economics, psychology, and education) based upon a sample of 1,650 articles from 108 open access (OA) journals published in 2001. A Web/URL citation of an online journal article is a Web mention of its title, URL, or both. For each discipline except psychology, we found significant correlations between Thomson Scientific (formerly Thomson ISI, here: ISI) citations and both Google Scholar and Google Web/URL citations. Google Scholar citations correlated more highly with ISI citations than did Google Web/URL citations, indicating that the Web/URL method measures a broader type of citation phenomenon. Google Scholar citations were more numerous than ISI citations in computer science and the four social science disciplines, suggesting that Google Scholar is more comprehensive for the social sciences and perhaps also when conference articles are valued and published online. We also found large disciplinary differences in the percentage overlap between ISI and Google Scholar citation sources. Finally, although we found many significant trends, there were also numerous exceptions, suggesting that replacing traditional citation sources with the Web or Google Scholar for research impact calculations would be problematic. This is the first part of a two-part article that reviews 25 years of published research findings on end-user searching in online information retrieval (IR) systems. In Part 1 (Markey, 2007), the author seeks to answer the following questions: What characterizes the queries that end users submit to online IR systems? What search features do people use? What features would enable them to improve on the retrievals they have in hand? What features are hardly ever used? What do end users do in response to the system's retrievals? 
Are end users satisfied with their online searches? Summarizing searches of online IR systems by the search features people use every day makes information retrieval appear to be a very simplistic one-stop event. In Part 2, the author examines current models of the information retrieval process, demonstrating that information retrieval is much more complex and involves changes in cognition, feelings, and/or events during the information seeking process. She poses a host of new research questions that will further our understanding of end-user searching of online IR systems. We describe a procedure for quantitative evaluation of interactive question-answering systems and illustrate it with application to the High-Quality Interactive Question-Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on work produced by users. Therefore this method can be used to compare the products of interactive systems that use different underlying technologies. Scientists may seek to report a single definable body of research in more than one publication, that is, in repeated reports of the same work or in fractional reports, in order to disseminate their research as widely as possible in the scientific community. 
Up to now, however, it has not been examined whether this strategy of "multiple publication" in fact leads to greater reception of the research. In the present study, we investigate the influence of the number of articles reporting the results of a single study on reception in the scientific community (total citation counts of the articles on a single study). Our data set consists of 96 applicants for a research fellowship from the Boehringer Ingelheim Fonds (BIF), an international foundation for the promotion of basic research in biomedicine. The applicants reported to us all articles that they had published within the framework of their doctoral research projects. On this single project, the applicants had published from 1 to 16 articles (M = 4; Mdn = 3). The results of a regression model with an interaction term show that the practice of multiple publication of research study results does in fact lead to greater reception of the research (higher total citation counts) in the scientific community. However, reception is dependent upon the length of the article: the longer the article, the more total citation counts increase with the number of articles. Thus, it pays for scientists to practice multiple publication of study results in the form of sizable reports. The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterized by a discrepancy between its formal framework and implementable form. The underlying concepts of the vector space model are mathematical terms: linear space, vector, and inner product. However, in the vector space model, the mathematical meaning of these concepts is not preserved. They are used as mere computational constructs or metaphors. Thus, the vector space model actually does not follow formally from the mathematical concepts on which it has been claimed to rest. This problem has been recognized for more than two decades, but no proper solution has emerged so far. 
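The regression model with an interaction term reported in the multiple-publication study can plausibly be written as follows. This is a hedged reconstruction; the covariate names and any additional controls in the article's actual model are assumptions.

```latex
% Hedged reconstruction of a citation regression with an interaction
% between number of articles and article length.
\begin{equation*}
  \text{citations}_i \;=\; \beta_0
    + \beta_1\,\text{articles}_i
    + \beta_2\,\text{length}_i
    + \beta_3\,(\text{articles}_i \times \text{length}_i)
    + \varepsilon_i
\end{equation*}
```

Under this form, the reported finding corresponds to a positive interaction coefficient: the marginal citation gain per additional article grows with article length.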
The present article proposes a solution to this problem. First, the concept of retrieval is defined based on mathematical measure theory. Then, retrieval is particularized using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. It is shown that the classical and the generalized vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time, it becomes clear that the inner product is not a necessary ingredient of the vector space model, and hence of Information Retrieval (IR). The Principle of Object Invariance is introduced to handle this situation. Moreover, this view makes it possible to consistently formulate new retrieval methods: in linear space with general basis, entropy-based, and probability-based. It is also shown that Information Retrieval may be viewed as integral calculus, and thus it gains a very compact and elegant mathematical notation. Also, Information Retrieval may thus be conceived as an application of mathematical measure theory. This is the second part of a two-part article that examines 25 years of published research findings on end-user searching of online information retrieval (IR) systems. In Part 1 (Markey, 2007), it was learned that people enter a few short search statements into online IR systems. Their searches do not resemble the systematic approach of expert searchers who use the full range of IR-system functionality. Part 2 picks up the discussion of research findings about end-user searching in the context of current information retrieval models. These models demonstrate that information retrieval is a complex event, involving changes in cognition, feelings, and/or events during the information seeking process. 
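The fuzzy-set view of retrieval described above, with the retrieval function as the cardinality of the intersection of two fuzzy sets, can be sketched concretely. Min-intersection and sigma-count cardinality are the standard fuzzy-set choices and our assumption here; the membership weights below are invented.

```python
# Hedged sketch of a fuzzy-set retrieval function: the score of a
# document against a query is the sigma-count cardinality of the
# min-intersection of their term-membership functions.

def fuzzy_score(doc, query):
    """score(d, q) = sum over terms of min(membership in d, in q)."""
    return sum(min(doc.get(t, 0.0), w) for t, w in query.items())

doc   = {"marble": 0.8, "museum": 0.6, "visit": 0.1}   # fuzzy doc set D
query = {"marble": 1.0, "museum": 0.5}                 # fuzzy query set Q
print(fuzzy_score(doc, query))   # min(0.8, 1.0) + min(0.6, 0.5) = 1.3
```

Note that no inner product appears: the score is a set-theoretic quantity, consistent with the article's claim that the inner product is not a necessary ingredient of retrieval.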
The author challenges IR researchers to design new studies of end-user searching, collecting data not only on system-feature use but also on multiple search sessions, and controlling for variables such as domain knowledge expertise and expert system knowledge. Because future IR systems designers are likely to improve the functionality of online IR systems in response to answers to the new research questions posed here, the author concludes with advice to these designers about retaining the simplicity of online IR system interfaces. The well-documented limitations of journal impact factor rankings and perceptual ratings, the evolving scholarly communication system, the open-access movement, and increasing globalization are some reasons that prompted an examination of journal value rather than just impact. Using a single specialized journal about education for the information professions, established in 1960, the author discusses the fall from citation grace of the Journal of Education for Library and Information Science (JELIS) in terms of impact factor and declining subscriptions. Journal evaluation studies in Library and Information Science based on subjective ratings are used to show the high rank of JELIS during the same period (1984-2004) and explain why impact factors and perceptual ratings, either singly or jointly, are inadequate measures for understanding the value of specialized, scholarly journals such as JELIS. This case study was also a search for bibliometric measures of journal value. Three measures, namely journal attraction power, author associativity, and journal consumption power, were selected; two of them were redefined as journal measures of affinity (the proportion of foreign authors) and associativity (the amount of collaboration), and calculated as objective indicators of journal value. 
The affinity and associativity for JELIS calculated for 1984, 1994, and 2004, and consumption calculated for 1985 and 1994, show a holding pattern; however, they also reveal interesting dimensions for future study. Journal value is multidimensional, and citations do not capture all of its facets; costs, benefits, and measures of informative and scientific value must be distinguished and developed in a fuller model of journal value. This two-part article establishes a model of the mediating factors that influence student information behavior concerning electronic or digital information sources that support their learning. The first part reviews the literature that underpinned the development of the research methodology for the Joint Information Systems Committee (JISC) User Behavior Monitoring and Evaluation Framework, as well as the literature that has subsequently helped to develop the model over the 5 years the Framework operated in the United Kingdom, in five cycles of research that were adjusted to meet the emerging needs of the JISC at the time. The literature review attempts to synthesize the two main perspectives in the research studies: (a) small-scale studies of student information behavior; and (b) studies that focus on the quantitative usage of particular electronic information services in universities, often including implications for training and support. As the review indicates, there are gaps in the evidence concerning the browsing and selection strategies of undergraduate students and the interaction of some of the mediating influences on information behavior. The Framework developed a multimethod, qualitative and quantitative methodology for the continued monitoring of user behavior. This article discusses the methods used and the project-management challenges involved, and concludes that at the outset, intended impacts need to be specified carefully, and that funding needs to be committed at that point for a longitudinal study. 
A research project on information behavior, intended to inform current policymaking on infrastructure provision, is inherently difficult because behavior changes lag behind provision. In this article, the authors analyze the keywords given by authors of scientific articles and the descriptors assigned to the articles to ascertain the presence of the keywords in the descriptors. Six hundred forty database records from INSPEC (Information Service for Physics, Engineering, and Computing), CAB (Current Agriculture Bibliography) abstracts, ISTA (Information Science and Technology Abstracts), and LISA (Library and Information Science Abstracts) were consulted. After detailed comparisons, it was found that keywords provided by authors have an important presence in the database descriptors studied; nearly 25% of all the keywords appeared in exactly the same form as descriptors, with another 21%, though normalized, still detected in the descriptors. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. Furthermore, three distinct indexing policies appear: one represented by INSPEC and LISA, where indexers seem to have freedom to assign the descriptors they deem necessary; another represented by CAB, where no record has fewer than four descriptors and, in general, a large number of descriptors is employed; and, in contrast, one represented by ISTA, where a certain institutional tendency towards economy in indexing exists, because 84% of records contain only four descriptors. This second part of a two-part article establishes a model of the mediating factors that influence student information behavior concerning the electronic or digital information sources used to support learning. This part discusses the findings of the Joint Information Systems Committee User Behavior Monitoring and Evaluation Framework (1999-2004) and the development of a model that includes both the individual (micro) and organizational (macro) factors affecting student information behavior. 
The macro factors are information resource design, information and learning technology infrastructure, availability and constraints to access, policies and funding, and organizational leadership and culture. The micro factors are information literacy, academics' information behavior, search strategies, discipline and curriculum, support and training, and pedagogy. We conclude that the mediating factors interact in unexpected ways and that further research is needed to clarify how those interactions, particularly between the macro and micro factors, operate. Although libraries have employed policies to protect the data about use of their services, these policies are rarely specific or standardized. Since 1996, the U.S. health care system has been grappling with the Health Insurance Portability and Accountability Act (HIPAA; Health Insurance Portability and Accountability Act, 1996), which is designed to provide those handling personal health information with standardized, definitive instructions as to the protection of data. In this work, the authors briefly discuss the present situation of privacy policies about library use data, outline the HIPAA guidelines to understand parallels between the two, and finally propose methods to create a de-identified library data warehouse based on HIPAA for the protection of user privacy. A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors efficiently used and the collection size. 
Results show that our algorithm performs close to the expected O(n^2/p) time on p processors rather than the worst-case O(n^3/p) time. Furthermore, the O(n^2/p) memory complexity per node allows larger collections to be clustered as the number of nodes increases. While partitioning algorithms such as k-means are trivially parallelizable, our results confirm those of other studies which showed that hierarchical algorithms produce significantly tighter clusters in the document clustering task. Finally, we show how our parallel hierarchical agglomerative clustering algorithm can be used as the clustering subroutine for a parallel version of the buckshot algorithm to cluster the complete TREC collection at near theoretical runtime expectations. In previous research we examined the science base of US biotechnology utilizing several unique patent and scientific paper databases (MCMILLAN et al., 2000). Our findings highlighted the importance of public science in this industry. In this current research effort, we extend that analysis to include the subsequent citations those biotechnology patents received. Our conclusions are that the reliance on public science is stable when adjusted for forward citations, but the impact of different funding sources does change when citation weights are added. The science policy implications of these findings and future research opportunities are discussed. This paper investigates factors behind co-authorships between scientists in Iran and elsewhere. It also compares the Iranian pattern of collaboration with other countries. A questionnaire was sent out to Iranian scientists in the fields of physics, chemistry, and biology who had published an internationally co-authored journal article during 2003. The results show that not all co-authored articles were the result of a collaborative project. Also, the main collaborative motives behind the co-authorships were identified and described. 
Among these, we could mention sharing laboratory devices, accessing knowledge, and increasing the efficiency of the study at hand. It is clear that emigrated Iranian scientists play an important role as collaborators, and probably also as links to the international scientific community as a whole. Cultural factors mix with scientific and work-related ones. Although the proportion of international co-authorships is lower than in most other countries, the collaborative pattern seems rather similar. I discuss the difficulties that I encountered in reproducing the results of the Shanghai ranking of world universities. In the Shanghai ranking, the dependence between the score for the SCI indicator and the weighted number of considered articles obeys a power law, instead of the proportional dependence that is suggested by the official methodology of the ranking. Discrepancies from proportionality are also found in some of the scores for the N&S and Size indicators. This shows that the results of the Shanghai ranking cannot be reproduced given the raw data and the public methodology of the ranking. Innovative activities are fundamental to the competitiveness strategies of firms in a globalized market. Their assessment, using indicators such as those utilized in the Community Innovation Survey (CIS), shows significant sectoral dispersion. Traditional industries are in a weak position because the innovation they are involved in is mainly aesthetic, which is not really addressed in innovation surveys. In this work, we review the various criticisms levelled at existing indicators and propose some new indicators that would capture the types of innovations that are conducted by the traditional industries. This work is based on a study of the features of traditional industries and the concept of aesthetic novelty. The proposed indicators are tested in the Spanish footwear industry. 
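The group-average agglomerative clustering underlying the parallel algorithm described a few abstracts earlier can be sketched in pure Python. This is a minimal serial O(n^3) version using cosine similarity, not the distributed-memory implementation from the study; the vectors passed in are assumed to be term-frequency document vectors.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def group_average_hac(vectors, n_clusters):
    """Naive serial group-average agglomerative clustering.

    Repeatedly merges the pair of clusters whose average pairwise
    document similarity is highest, until n_clusters remain.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best, best_sim = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j]]
                sim = sum(sims) / len(sims)
                if sim > best_sim:
                    best, best_sim = (i, j), sim
        i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters
```

With four toy term vectors in which the first two and the last two are nearly parallel, `group_average_hac(vectors, 2)` merges them into the two expected clusters. The parallel version in the abstract distributes the similarity computations across processors while keeping this same merge criterion.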
The aim of this paper is to explore to what extent social integration influences scientists' research activity and performance. Data were obtained from a survey of researchers ascribed to the Biology and Biomedicine area of the Spanish Council for Scientific Research, as well as from their curricula vitae. The results provide empirical evidence that researchers who were highly integrated within their teams performed better than their less integrated colleagues in aspects of research activity such as collaboration with the private sector, patenting, participation in domestic funded research and development projects, and supervision of doctoral dissertations. Nevertheless, highly integrated researchers did not seem to be more prestigious than less integrated colleagues, nor did the former's publications have a higher impact. This study examined why Websites were co-linked, using Canadian university Websites as the test set. Pages that co-linked to these university Websites were located using Yahoo!. A random sample of 859 co-linking pages (the pages that initiated the co-links) was retrieved, and the contents of each page, as well as the context of the link, were manually examined to record the following variables: language, country, type of Website, and the reasons for co-linking. The study found that in over 94% of cases, the two co-linked universities were related academically; many of these cases (38%) showed a relationship specifically in teaching or research. This confirms results from previous quantitative studies that Web co-links can be a measure of the similarity or relatedness of the sites being co-linked, and that Web co-link analysis can thus be used to study relationships among linked Websites. Developing countries share a disbelief about the benefits of the endogenous production of science as a tool for economic growth. 
Hence, public policies to strengthen science and technology and promote the culture of innovation are, in general, weak and sometimes incoherent. Patenting has become not only an icon to protect discoveries which can yield profits and enable socio-economic growth but also a potent informetric tool to assess innovation and, certainly since the seminal work of Narin, to understand the multidimensional interactions between science, technology and innovation. In this article we examine the impact of Chilean research articles on world technology, as viewed through the links between articles produced in Chile and US patents. Our results show that from 1987 to 2003, 509 US patents had 562 citations to 273 articles produced by at least one author working in a Chilean institution. US, not Chilean, companies are the holders of the patents citing Chilean-produced articles. The research articles covered many disciplines, but a clear concentration occurred in the biomedical field. Additionally, chemistry was also well cited. Our results confirm that in Chile a non-patenting culture involving researchers and institutions still prevails. Hence, public policies need to be designed and implemented to foster scientific production and innovation in order to advance in the current knowledge-economy-driven society which sustains competitiveness in the globalized world. Multinational papers are defined here as ones written by authors who reside in different countries during the course of the research. For each of 16 fields of science, I scanned the first 200 papers in 2005 in four major journals publishing original research papers. Those journals produced 40% of all the citations among the journals with Impact Factors greater than 1.0. The frequencies of multinational papers ranged from 13% in surgery to 55% in astronomy. Although one can list a dozen factors which might contribute toward multinational papers, I lack the data to test most of them. 
There are only minor correlations with team sizes and Impact Factors, inadequate to explain the range. There is a larger, but not convincing, dependence upon the fraction of single-author papers, and its cause, if real, is unclear. However, the most prominent factor seems to be the nature of the objects studied: if they are usually local (e.g., in one hospital or in one laboratory), the papers tend to be domestic, but if most of the objects are available simultaneously to scientists in many countries (e.g., the sky in astronomy, the oceans and the Earth's atmosphere in the geosciences, widespread diseases in the area of infectious diseases, or plants and animals widely distributed in biology), the papers are often international. Auxiliary results for 2005 are an average of 5.5 +/- 0.3 authors per paper and 6.6 +/- 1.0% one-author papers. We offer two metrics that together help gauge how interdisciplinary a body of research is. Both draw upon Web of Knowledge Subject Categories (SCs) as key units of analysis. We have assembled two substantial Web of Knowledge samples from which to determine how closely individual SCs relate to each other. "Integration" measures the extent to which a research article cites diverse SCs. "Specialization" considers the spread of SCs in which the body of research (e.g., the work of a given author in a specified time period) is published. Pilot results for a sample of researchers show a surprising degree of interdisciplinarity. Since the pioneering studies of CARPENTER & NARIN (1983) and NARIN & NOMA (1985), non-patent references (NPRs) in patent documents have been widely used as an indicator of science-technology links. MEYER (2000) reviewed previous work in the patent citation literature and found that citation links between patents and papers are, if not explicitly, at least implicitly viewed as an indication of the contribution of science to technology. 
Using a sample of 850 patents of New Zealand companies granted by the USPTO between 1976 and 2004, we find evidence of systematic noise in NPR data. We suggest that future research should pay close attention to heterogeneity among countries, and that one should demonstrate more caution in applying and interpreting results based on the NPR methodology. Systems offices that deal with library information technologies have played an important role in academic libraries; however, not much is known about how systems offices are positioned within academic libraries. This study examined the present status and the influence of systems offices by exploring the power differences among five principal functional units based on strategic contingencies theory. A mail questionnaire was sent to the principal functional unit heads of each of 95 university libraries belonging to the Association of Research Libraries in the United States. A total of 484 questionnaires were sent; 235 questionnaires were returned. The three major findings of this study were that (a) systems offices had more perceived power than all but public services; (b) systems offices had higher levels on contingency variables than did most of the other units; and finally, (c) criticality was a factor affecting perceived power between systems offices and most of the other units. The study findings imply that strategic contingencies theory may be partially applicable to library settings. Library staff or units may strategically increase their power by aligning their services with goals critical to their library and cooperating with other staff or other units. In this study, we investigate the similarities and differences between rankings of search results by users and search engines. Sixty-seven students took part in a 3-week-long experiment, during which they were asked to identify and rank the top 10 documents from the set of URLs that were retrieved by three major search engines (Google, MSN Search, and Yahoo!) 
for 12 selected queries. The URLs and accompanying snippets were displayed in random order, without disclosing which search engine(s) retrieved any specific URL for the query. We computed the similarity of the rankings of the users and search engines using four nonparametric correlation measures in [0,1] that complement each other. The findings show that the similarities between the users' choices and the rankings of the search engines are low. We examined the effects of the presentation order of the results, and of the thinking styles of the participants. Presentation order influences the rankings, but overall the results indicate that there is no "average user," and even if the users have the same basic knowledge of a topic, they evaluate information in their own context, which is influenced by cognitive, affective, and physical factors. This is the first large-scale experiment in which users were asked to rank the results of identical queries. The analysis of the experimental results demonstrates the potential for personalized search. Metadata and an appropriate metadata model are nontrivial components of information architecture conceptualization and implementation, particularly when disparate and dispersed systems are integrated. Metadata availability can enhance retrieval processes, improve information organization and navigation, and support management of digital objects. To support these activities efficiently, metadata need to be modeled appropriately for the tasks. The authors' work focuses on how to understand and model metadata requirements to support the work of end users of an integrative statistical knowledge network (SKN). They report on a series of user studies. These studies provide an understanding of metadata elements necessary for a variety of user-oriented tasks, related business rules associated with the use of these elements, and their relationship to other perspectives on metadata model development. 
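The search-engine study above compares user and engine rankings with four nonparametric correlation measures in [0,1]. The abstract does not name the measures, so the sketch below shows one standard choice, Kendall's tau rescaled from [-1,1] to [0,1], as an illustration rather than the study's exact metric.

```python
def kendall_tau_01(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items,
    rescaled to [0, 1]: 1 = identical order, 0 = exactly reversed."""
    items = list(rank_a)
    assert set(items) == set(rank_b), "rankings must cover the same items"
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            # Is this pair ordered the same way in both rankings?
            if pos_b[items[i]] < pos_b[items[j]]:
                concordant += 1
            else:
                discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    return (tau + 1) / 2
```

A low score between a user's top-10 list and an engine's result order, averaged over queries and participants, is the kind of evidence behind the study's "no average user" conclusion.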
This work demonstrates the importance of the user perspective in this type of design activity and provides a set of strategies by which the results of user studies can be systematically utilized to support that design. We present evidence that in some research fields, research published in journals and reported on the Web may collectively represent different evolutionary stages of the field, with journals lagging a few years behind the Web on average, and that a "two-tier" scholarly communication system may therefore be evolving. We conclude that in such fields, (a) for detecting current research fronts, author co-citation analyses (ACA) using articles published on the Web as a data source can outperform traditional ACAs using articles published in journals as data, and (b) as a result, it is important to use multiple data sources in citation analysis studies of scholarly communication for a complete picture of communication patterns. Our evidence stems from comparing the respective intellectual structures of the XML research field, a subfield of computer science, as revealed by three sets of ACA covering two time periods: (a) from the field's beginnings in 1996 to 2001, and (b) from 2001 to 2006. For the first time period, we analyze research articles both from journals as indexed by the Science Citation Index (SCI) and from the Web as indexed by CiteSeer. We follow up with an ACA of SCI data for the second time period. We find that most of the trends in the evolution of this field from the first to the second time period that emerge when comparing ACA results from the SCI between the two time periods were already apparent in the ACA results from CiteSeer during the first time period. In addition to science citation indicators of journals, like impact and immediacy, social network analysis provides a set of centrality measures like degree, betweenness, and closeness centrality. 
These measures are first analyzed for the entire set of 7,379 journals included in the Journal Citation Reports of the Science Citation Index and the Social Sciences Citation Index 2004 (Thomson ISI, Philadelphia, PA), and then also in relation to local citation environments that can be considered as proxies of specialties and disciplines. Betweenness centrality is shown to be an indicator of the interdisciplinarity of journals, but only in local citation environments and after normalization; otherwise, the influence of degree centrality (size) overshadows the betweenness-centrality measure. The indicator is applied to a variety of citation environments, including policy-relevant ones like biotechnology and nanotechnology. The values of the indicator remain sensitive to the delineation of the set because of the indicator's local character. Maps showing the interdisciplinarity of journals in terms of betweenness centrality can be drawn using information about journal citation environments, which is available online. This article provides an overview of the tools specified by the MPEG-7 standard for describing the structure of multimedia content. In particular, it focuses on tools that represent segments resulting from a spatial and/or temporal partitioning of multimedia content. The segments are described in terms of their decomposition and the general relations among them, as well as attributes or features of the segments. Decomposition efficiently represents segment hierarchies and can be used to create tables of contents or indexes. More general graph representations are handled by the various standard spatial and temporal relations. A segment can be described by a large number of features, ranging from those targeting the life cycle of the content (e.g., creation and usage) to those addressing signal characteristics such as audio, color, shape, or motion properties. 
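Betweenness centrality, the interdisciplinarity indicator used in the journal-mapping abstract above, counts how often a node lies on shortest paths between other pairs of nodes. A minimal pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs follows; the journal names one would feed in are illustrative, not the study's data.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to a list of neighbours. Returns raw
    (unnormalized) betweenness; undirected pairs are halved at the end.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, recording shortest-path counts and predecessors.
        pred = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    for v in bc:
        bc[v] /= 2.0  # each undirected pair was counted from both ends
    return bc
```

In a star-shaped citation environment, the hub journal gets all the betweenness and the peripheral journals none, which is why the abstract stresses normalizing away sheer size (degree) before reading betweenness as interdisciplinarity.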
Metadata describing multimedia can address a wide variety of purposes, from the purely physical characteristics of an item, to the circumstances surrounding its production, to attributes that cannot necessarily be determined by examining the item itself directly. These latter attributes, often dealing with the "meaning" or interpretation of an item's content, are frequently deemed too difficult to determine and subject to individual and cultural variability. At the same time, however, research has shown that these abstract, interpretive attributes, which carry meaning, are frequently the ones for which people search. To describe an item fully, therefore, means to describe it at both the "syntactic" and the "semantic" levels. This article discusses the development of the semantic description schemes within the MPEG-7 standard from both a historical and an intellectual perspective, as well as the difficulties inherent in creating a descriptive schema that can fully capture the complexity of "narrative worlds." In this article, we introduce a user preference model contained in the User Interaction Tools clause of the MPEG-7 Multimedia Description Schemes, which is described by a UserPreferences description scheme (DS) and a UsageHistory description scheme (DS). We then propose a user preference learning algorithm that uses a Bayesian network which takes weighted usage-history data on multimedia consumption as input. Our user preference learning algorithm adopts a dynamic learning method for learning real-time changes in a user's preferences from content consumption history data by weighting these choices in time. Finally, we describe a user preference-based television program recommendation system built on the user preference learning algorithm and show experimental results for a large set of realistic usage-history data of watched television programs. 
The experimental results suggest that our automatic user preference learning method is well suited for a personalized electronic program guide (EPG) application. This article discusses the implementation of MPEG-7 within the Moving Image Collections (MIC) portal. MIC is a union catalog of the world's moving images, as well as a portal to information on the care, management, and use of moving images. The MIC Union Catalog utilizes a core registry schema that is designed to map readily to any metadata schema used to describe moving images. The MIC development team was particularly interested in supporting MPEG-7 for future nontextual digital video indexing applications. An MPEG-7 application profile and a Microsoft Access cataloging utility were developed in order to test MPEG-7 within the MIC Union Catalog; 400 science digital videos in the ResearchChannel collection were cataloged in MPEG-7. The MPEG-7 records were mapped to MIC and ingested. Draft MPEG-7 to MIC and MIC to MPEG-7 maps were developed and are available at the MIC Web site. MPEG-7 records are available for viewing for any record in the MIC database via a collections explore search within the Archivists' portal. The MPEG-7 cataloging utility may be downloaded from the MIC project Web site (Moving Image Collections. MIC Cataloging Utility. http://gondolin.rutgers.edu/MIC/text/how/cataloging-utility.htm). This article also discusses issues with MPEG-7 as a descriptive metadata schema, as well as mapping and implementation issues identified in the project. This article provides an overview of our experiments in using MPEG-7 in a television news retrieval application. Our study is based on a survey of professional users in the Télévision Suisse Romande (TSR) television news production environment. We present here two main issues. First, we describe the way the generic and voluminous MPEG-7 Schema can be exploited in the context of a specific application domain. 
Second, we discuss the problem of how to search MPEG-7 descriptions, which are detailed and complex by nature, via a high-level user-oriented retrieval model. Personal video recorders have the capability to change the media delivery industry fundamentally, and in this context, many believe the real international age of personal digital recorders (PDRs) will arrive with the use of "open" systems. The world reached an important milestone with the publication of the TV-Anytime Phase 1 specifications for unidirectional broadcast and metadata services over bidirectional networks. TV-Anytime is a worldwide prestandardization body; this article gives an overview of the main features of TV-Anytime's metadata specification and its relationship to MPEG-7 and provides insight into ways two organizations concerned with standards work together. Phase 2 has since been completed and TV-Anytime has been adopted by various international standards organizations dealing with telecommunications and is now in the implementation phase. Universal Multimedia Access (UMA) deals with seamless access to once-only-created content via any kind of terminal and any kind of network connectivity, which implies that the content should be adapted in order to fit a variety of terminal and network characteristics, as well as user preferences. The MPEG-7 standard offers some support for UMA within its section on Multimedia Description Schemes (MDS). Within the standard, several groups of tools serve this purpose. For instance, the Navigation and Access Tools provide some Description Schemes that allow the description of adapted content variations and summaries and allow for preprocessed content versions. Some support is also found in the Content Metadata Tools (Media and Usage Tools), for real-time ease in creation of online content versions and in limited support for session description, which is completed in MPEG-21. 
The MPEG-7 standard supports the description of both the structure and the semantics of multimedia; however, the generation and consumption of MPEG-7 structural and semantic descriptions are outside the scope of the standard. This article presents two research prototype systems that demonstrate the generation and consumption of MPEG-7 structural and semantic descriptions in retrieval applications. The active system for MPEG-4 video object segmentation (AMOS) is a video object segmentation and retrieval system that segments, tracks, and models objects in videos (e.g., a person or a car) as a set of regions with corresponding visual features and spatiotemporal relations. The region-based model provides an effective base for similarity retrieval of video objects. The second system, the Intelligent Multimedia Knowledge Application (IMKA), uses the novel MediaNet framework for representing semantic and perceptual information about the world using multimedia. MediaNet knowledge bases can be constructed automatically from annotated collections of multimedia data and used to enhance the retrieval of multimedia. Jorge Hirsch (2005a, 2005b) recently proposed the h index to quantify the research output of individual scientists. The new index has attracted a lot of attention in the scientific community. The claim that the h index in a single number provides a good representation of the scientific lifetime achievement of a scientist, as well as the (supposedly) simple calculation of the h index using common literature databases, leads to the danger of improper use of the index. We describe the advantages and disadvantages of the h index and summarize the studies on the convergent validity of this index. We also introduce corrections and complements as well as single-number alternatives to the h index. 
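Hirsch's h index discussed above is simple to compute from a list of per-paper citation counts: h is the largest number such that h of the papers have at least h citations each. A minimal sketch:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still supports an h of `rank`
        else:
            break
    return h
```

For example, `h_index([10, 8, 5, 4, 3])` returns 4: four papers have at least four citations each, but there are not five papers with at least five. The "danger of improper use" the abstract warns about lies partly in how sensitive this single number is to database coverage of the citation counts fed in.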
This study investigates the scientific output and publication patterns of Korean biotechnology before and after the start of the Korean Biotechnology Stimulation Plans (1994-2007), and then compares the results with publication data from the same time periods for Japan, the People's Republic of China, Taiwan, and Singapore. For this study, 14,704 publications, published by at least one researcher from one of the five Asian nations (indexed by SCI Expanded during the years 1990-1993 and the years 2000-2003), were considered. A marked increase in Korean research output in biotechnology was largely influenced by an increasing tendency for researchers to enter the field of biotechnology and by increased expenditures for R&D activity through the Korean Biotechnology Stimulation Plans. In addition, the SCI Expanded coverage of national journals affected the scientific output and publication patterns of Japanese and Korean researchers. Looking at the Korean publications by collaboration type, international collaboration led to more publications in mainstream journals with high impact factors than local and domestic collaborations did in both periods. However, although the Korean Biotechnology Stimulation Plans were followed by a remarkable increase in South Korea's research output, this increase has not been accompanied by growth in the quality of those publications in terms of the impact factors of the journals in which they appeared. Cross-field comparison of citation measures of scientific achievement or research quality is severely hindered by the diversity of the stages of development and citation habits of different disciplines or fields. Based on the same principles as RCR (Relative Citation Rate) and RW (Relative Subfield Citedness), a new dimension, the Relative Superiority Coefficient (SC(n)) in research quality, was introduced. 
This coefficient can clearly indicate the relative research level of research groups at multiple levels in their respective fields by consistent criteria in terms of research quality. A comparison of the SC(n) within and across 22 broad fields among 5 countries is presented as an application model. Hierarchical cluster analysis and one-way ANOVA were applied, processed with the statistical program SPSS. All original data were from Essential Science Indicators (ESI) 1996-2006. The purpose of this study is to explore the character and pattern of the linkage between science and technology in China, based on the database of the United States Patent and Trademark Office (USPTO). The analysis is focused on the period 1995-2004, a period of rapid increase for Chinese US patents. Using the scientific non-patent references (NPRs) within patents, we investigate the science-technology connection in the context of Chinese regions as well as industrial sectors classified by the International Patent Classification (IPC). Eleven technological domains have been selected to describe the science intensity of the technology. The results suggest that the patents and the corresponding scientific citations are related in different ways. Finally, we match the scientific NPRs to the publications covered by the Science Citation Index (SCI) to identify the core journals and categories. This reveals that the scientific references covered by the SCI show a skewed distribution not only across journals but also across categories. Self-citations, those where authors cite their own works, account for a significant portion of all citations. These self-references may result from the cumulative nature of individual research, the need for personal gratification, or the value of self-citation as a rhetorical and tactical tool in the struggle for visibility and scientific authority. In this article we examine the incentives that underlie self-citation by studying how authors' references to their own works affect the citations they receive from others. 
We report the results of a macro study of more than half a million citations to articles by Norwegian scientists that appeared in the Science Citation Index. We show that the more one cites oneself, the more one is cited by other scholars. Controlling for numerous sources of variation in cumulative citations from others, our models suggest that each additional self-citation increases the number of citations from others by about one after one year, and by about three after five years. Moreover, there is no significant penalty for the most frequent self-citers - the effect of self-citation remains positive even for very high rates of self-citation. These results carry important policy implications for the use of citations to evaluate performance and distribute resources in science, and they represent new information on the role and impact of self-citations in scientific communication. This paper describes an analysis of coverage of the risks from agricultural and food genetically modified organisms (GMOs) from April 2002 to April 2004 in 14 news media from six countries (Canada, France, Germany, Spain, the UK and the USA), conducted as part of a review of risk-communication management for the European Commission. A total of 597 relevant news articles were found and coded for their presentational tone, the types of risk (environmental, financial, health and political, in that order), the organisms described (mainly maize, rape and beet crops), and the documents, people and organisations cited. UK news media tended to be the most "scary" and Spanish ones the most "robust". Articles quoting public perceptions, non-governmental environmental organisations and politicians tended to emphasize the risks of GMOs; those quoting scientists tended to downplay the risks and describe their potential benefits. 
Some suggestions for possible action by the European Commission are put forward, such as the facilitation of contact between journalists and scientists, but it is recognized that for some newspapers, the editorial wish to campaign will inevitably override their reporters' wish to present the truth. The senior author is usually last on the byline of scientific publications, yet generally has made the second most important contribution. The explosion in the number of authors per scientific paper has necessitated limits on the number of authors allowed in cited references, frequently resulting in senior-author truncation. Would potential visibility gained from citations in top-tier journals be offset by senior-author omission? We found evidence for this in a sample of 208 journals, showing significant associations between author limits in cited references and various measures of journal quality. These associations, however, differed among biological science, physical science, and interdisciplinary journals. This paper introduces a new approach to detecting scientists' field mobility by focusing on an author's self-citation network, and on the co-authorships and keywords in self-citing articles. Contrary to much previous literature on self-citations, we show that an author's self-citation patterns reveal important information on the development and emergence of new research topics over time. More specifically, we discuss self-citations as a means to detect scientists' field mobility. We introduce a network-based definition of field mobility, using the Optimal Percolation Method (Lambiotte & Ausloos, 2005; 2006). The results of the study can be extended to self-citation networks of groups of authors and, more generally, to other types of networks. 
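The field-mobility idea in the preceding abstract lends itself to a small illustration. The sketch below is not the Optimal Percolation Method of Lambiotte & Ausloos; it merely approximates the intuition with a simple keyword-overlap measure on an author's self-citation links. All paper identifiers, keyword sets and the threshold are invented for the example.

```python
# Illustrative sketch only: "field mobility" is approximated here by how little
# a paper's keywords overlap with the earlier papers it self-cites.
# Data structures and threshold are hypothetical, not from the study.

def keyword_overlap(a, b):
    """Jaccard overlap between two keyword sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def flag_field_moves(papers, self_cites, threshold=0.2):
    """papers: {id: keyword set}; self_cites: {id: list of earlier own ids}.
    Returns ids whose mean overlap with self-cited work falls below the
    threshold - a crude signal that the author may have changed topic."""
    moves = []
    for pid, cited in self_cites.items():
        if not cited:
            continue
        mean_overlap = sum(keyword_overlap(papers[pid], papers[c])
                           for c in cited) / len(cited)
        if mean_overlap < threshold:
            moves.append(pid)
    return moves

papers = {
    "p1": {"magnetism", "ising"},
    "p2": {"magnetism", "ising", "percolation"},
    "p3": {"econophysics", "markets"},   # topic shift away from p1/p2
}
self_cites = {"p2": ["p1"], "p3": ["p1", "p2"]}
print(flag_field_moves(papers, self_cites))   # only p3 is flagged
```

The point of the sketch is that a self-citation carries topical information: when an author keeps self-citing work whose keywords no longer resemble the citing paper, the self-citation link itself exposes the move.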
The main objective of this paper is to provide empirical insight into the changes in the basic characteristics of the knowledge production mode and of scientific productivity in the Croatian research system during the transitional period. The empirical analysis is based on the results of two comparable questionnaire studies. The first survey was conducted in 1990 with a sample of 921 respondents, while the second was conducted in 2004 with a sample of 915 respondents. The central characteristics of the knowledge production mode and of productivity confirm an expected duality: on the one hand, features that accompany the introduction of a competitive system of research funding and evaluation, and on the other, both anachronistic and newly acquired peculiarities of the research system. Thus, the gap between the improved scientific performance of the researchers and the conditions in which they work has deepened. Scientific productivity still lags behind that of the developed countries. Though Croatian researchers publish less, they follow basic global trends in the structure of publications, especially the rise in foreign and co-authored works. Characteristics of highly and poorly cited research articles (with Abstracts) published in The Lancet over a three-year period were examined. These characteristics included numerical (numbers of authors, references, citations, Abstract words, journal pages), organizational (first-author country, institution type, institution name), and medical (medical condition, study approach, study type, sample size, study outcome) attributes. Compared to the least cited articles, the most cited have three to five times the median number of authors per article, a 50% to 600% greater median number of references per article, 110 to 490 times the median number of citations per article, 2.5 to almost seven times the median number of Abstract words per article, and 2.5 to 3.5 times the median number of pages per article. 
The most cited articles' medical themes emphasize breast cancer, diabetes, coronary circulation, and HIV immune-system problems, focusing on large-scale clinical trials of drugs. The least cited articles' themes essentially do not address the above medical issues, especially from a clinical-trials perspective; they cover a much broader range of topics and place much more emphasis on social and reproductive health issues. Finally, for the sample sizes of clinical trials specifically, those of the most cited articles ranged from a median of about 1500 to 2500, whereas those of the least cited articles ranged from 30 to 40. The CD-ROM and web versions of the Science Citation Index databases are compared as to their content and format features. Several differences have been detected, such as the use of different punctuation marks in the two versions and a different organisation of authors' affiliation data. These differences make automatic comparisons of ISI products difficult and should be considered when matching the two databases. Some recommendations to ensure greater normalisation and reliability of data are pointed out. Based on the citation data of journals covered by the China Scientific and Technical Papers and Citations Database (CSTPCD), we obtained aggregated journal-journal citation environments by applying routines developed specifically for this purpose. The local citation impact of a journal is defined as its share of the total citations in a local citation environment; this share is expressed as a ratio and can be visualized by the size of the nodes. The vertical size of a node varies proportionally to the journal's total citation share, while the horizontal size provides citation information after correction for within-journal (self-)citations. In the "citing" environment, the equivalent of the local citation performance can also be considered as a citation activity index. 
Using the "citing" patterns as variables, one can map how the relevant journal environments are perceived by the collective of authors of a journal, while the "cited" environment reflects the impact of journals in a local environment. In this study, we analyze the citation impacts of three Chinese journals in mathematics and compare local citation impacts with impact factors. Local citation impacts reflect a journal's status and function better than (global) impact factors. We also found that authors in Chinese journals prefer international journals to domestic ones as sources for their citations. As the population ages in Taiwan, stroke research has received greater attention in recent years. Strokes have significant impacts on the health and well-being of the elderly. To formulate future research policy, information on stroke publications should be collected. In this research, we studied stroke-related research articles published by Taiwan researchers and indexed in the Science Citation Index from 1991 to 2005. We found that the quantity of publications has increased at a quicker pace than the worldwide trend. Over the years, there has been an increase in international collaboration, mainly with researchers in the U.S. Article visibility, measured as the frequency of being cited, also increased during the period. It appears that stroke research in Taiwan has become more globally connected and has also improved in quality. The publication output was concentrated in a few institutes, but there was wide variation among these institutes in the ability to independently conduct research. A wide array of keywords indicated a probable lack of continuity in research. Nevertheless, there was an inverse relationship between stroke mortality and the number of published articles in Taiwan. 
To improve the quality and efficiency of stroke research, continuity in research focus needs to be maintained, and thus funding should be allocated on a long-term basis to institutes with a proven record of success. The issue of research continuance in a scientific discipline was analyzed and applied to the field of terrorism. The growing amount of literature in this field is produced mostly by one-timers who "visit" the field, contribute one or two articles, and then move to another subject area. This research pattern does not contribute to the regularity and constancy of publication by which a scientific discipline is formed and the theories and paradigms of the field are created. This study observed the research continuance and transience of scientific publications in terrorism by using available "most prolific terrorism authors" lists compiled at different points in time. These lists, designed by several terrorism researchers, presented a few researchers who contributed to the field continuously and many others whose main research interest lay in another discipline. The four lists observed included authors who were continuants, transients, newcomers, and terminators (those who left the field). The lack of continuous, full-time research in a research field is typical of many disciplines, but the influence of this research pattern on a field's growth and stability differs between older, established disciplines and new, formative fields of study. Within the former, intellectual mobility could contribute to the rise of new topics and probably enrich the particular scientific field; within the latter, by contrast, it could hamper the formation and growth of the field. 
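The four author categories named in the continuance study (continuants, transients, newcomers, terminators) can be sketched as a simple classification over per-period author lists. The rules below are one plausible reading of those labels, not the study's exact procedure; the author names are invented.

```python
# A minimal sketch, assuming presence/absence in successive observation
# periods is enough to assign the four categories from the study.

def classify_authors(periods):
    """periods: list of sets of author names, one per observation period.
    - 'newcomer'  : first appears in the final period
    - 'transient' : appears in exactly one earlier period only
    - 'terminator': appeared repeatedly before, but not in the final period
    - 'continuant': appears repeatedly, including the final period
    """
    first, last, count = {}, {}, {}
    for i, authors in enumerate(periods):
        for a in authors:
            first.setdefault(a, i)
            last[a] = i
            count[a] = count.get(a, 0) + 1
    final = len(periods) - 1
    cats = {}
    for a in count:
        if first[a] == final:
            cats[a] = "newcomer"
        elif count[a] == 1:
            cats[a] = "transient"
        elif last[a] < final:
            cats[a] = "terminator"
        else:
            cats[a] = "continuant"
    return cats

# A appears throughout, B and C contribute once, E drops out, D is new.
periods = [{"A", "B", "E"}, {"A", "C", "E"}, {"A", "D"}]
print(classify_authors(periods))
```

With real data the "periods" would come from successive prolific-author lists, which is essentially how the study compares its four lists over time.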
Because in today's knowledge society the Internet plays an important role in the information literacy of university students, the goal of this paper is to analyse, after its first year on the Web, the informational impact of an e-learning resource developed by University of Granada lecturers (the e-COMS educational portal), a pioneer in Spain for training in information literacy. From the objective and subjective data provided by the portal itself and by its users, two different and complementary kinds of analysis (functional and user-based) are performed. An assessment of various capabilities, among which visibility and usability stand out, is provided. The highly positive, though improvable, results yield a detailed analysis of the functional aspects of the portal itself and of the users' relations with this information resource. From these analyses strengths and weaknesses are extracted, and some proposals for improvement are derived. What is the value of a scientist's work, and what is its impact upon scientific thinking? How can we measure the prestige of a journal or a conference? The evaluation of the scientific work of a scientist and the estimation of the quality of a journal or conference have long attracted significant interest, owing to the benefits of obtaining an unbiased and fair criterion. Although it appears simple, defining a quality metric is not an easy task. To overcome the disadvantages of the existing metrics used for ranking scientists and journals, J. E. Hirsch proposed a pioneering metric, the now famous h-index. In this article we demonstrate several inefficiencies of this index and develop a pair of generalizations and effective variants of it to deal with scientist ranking and publication forum ranking. The new citation indices are able to disclose trendsetters in scientific research, as well as researchers who constantly shape their field with their influential work, no matter how old they are. 
We exhibit the effectiveness and the benefits of the new indices in unfolding the full potential of the h-index, with extensive experimental results obtained from the DBLP, a widely known on-line digital library. I describe a method to separate the articles of different authors with the same name. It is based on a distance between any two publications, defined in terms of the probability that they would have as many coincidences if they were drawn at random from all published documents. Articles with a given author name are then clustered according to their distance, so that all articles in a cluster very likely belong to the same author. The method has proven very useful in generating groups of papers that are then selected manually. This considerably simplifies citation analysis when author publication lists are not available. The number of citations of journal papers is an important measure of the impact of research. Thus, the modeling of citation behavior needs attention. Burrell, Egghe, Rousseau and others pioneered this type of modeling. Several models have been proposed for the citation distribution. In this note, we derive the most comprehensive collection of formulas for the citation distribution, covering some 17 flexible families. The corresponding estimation procedures are also derived by the method of moments. We feel that this work could serve as a useful reference for the modeling of citation behavior. The research questions are as follows: to what extent do Canadian medical school faculty members have person-to-person interactions with individuals working in public and private sector organizations? What are the characteristics of Canadian medical school faculty members who interact with individuals working in these work settings? Are these different network patterns complements or substitutes? The data used for this study are from a cross-sectional survey of Canadian medical school faculty members (n = 907). 
Structural multivariate ordered probit models were estimated to explore the characteristics of faculty members with different network patterns and to determine whether these network patterns are complements or substitutes. Study results suggest that the different network patterns considered in the study are not conflicting, but that some patterns correspond to different faculty member profiles. A methodology for creating bibliometric impact profiles is described. The advantages of such profiles as a management tool to supplement the reporting power of traditional average impact metrics are discussed. The impact profile for the UK as a whole reveals the extent to which the median and modal UK impact values differ from, and are significantly below, average impact. Only one-third of UK output for 1995-2004 is above world average impact, although the UK's average world-normalised impact is 1.24. Time-categorised impact profiles are used to test hypotheses about changing impact and confirm that the increase in average UK impact is due to real improvement rather than a reduction in low-impact outputs. The impact profile methodology has been applied across disciplines as well as years and is shown to work well in all subject categories. It reveals substantial variations in performance between disciplines. The value of calculating the profile median and mode as well as the average impact is demonstrated. Finally, the methodology is applied to a specific data set to compare the impact profile of the elite Laboratory of Molecular Biology (Cambridge) with the relevant UK average. This demonstrates an application of the methodology by identifying where the institute's exceptional performance is located. The value of impact profiles lies in their role as an interpretive aid for non-specialists, not as a technical transformation of the data for scientometricians. 
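The gap the impact-profile abstract describes between average and median impact is easy to reproduce on toy data: a few highly cited papers pull the mean above 1 while most output sits below world average. A minimal sketch, using an invented skewed distribution of world-normalised impact values (1.0 = world average):

```python
# Sketch of the profile statistics discussed above. The impact values are
# invented for illustration; real profiles come from citation databases.
from statistics import mean, median

def impact_profile(normalised_impacts):
    """Summarise a distribution of world-normalised impact values."""
    above = sum(1 for x in normalised_impacts if x > 1.0)
    return {
        "mean": mean(normalised_impacts),
        "median": median(normalised_impacts),
        "share_above_world_average": above / len(normalised_impacts),
    }

# skewed toy distribution: many low-impact papers, a few highly cited ones
impacts = [0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9, 1.5, 2.5, 4.4]
print(impact_profile(impacts))   # mean above 1, median well below 1
```

This is the abstract's point in miniature: an average normalised impact above 1 is compatible with only a minority of outputs exceeding world average, which is why the median and mode are reported alongside the mean.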
The purpose of the study proposed in this paper is to evaluate the Spanish public university websites dedicated to the European Higher Education Area (EHEA). To do so, the quality of these resources has been analysed in the light of data provided by a series of indicators grouped under seven criteria, most of which were used to determine what information is made available and in what way. The criteria used in our analysis are: visibility, authority, updatedness, accessibility, correctness and completeness, quality assessment, and navigability. All in all, the results allow us to carry out an overall diagnosis of the situation and also provide us with information about the situation at each university, thus revealing their main strengths, namely authority and navigability, and also their chief shortcomings: updatedness, accessibility and quality assessment. In this way it is possible to detect the best practices in each of the aspects evaluated so that they can serve as an example and guide for universities with greater deficiencies and thus help them to improve their EHEA websites. Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive algorithm that learns from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d^2(log d + log n)) over the discretized vector space {0, ..., n - 1}^d when the inner-product similarity measure is used. 
The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier (q, theta) over {0, ..., n - 1}^d can be improved to at most 1 + 2k(n - 1)(log d + log(n - 1)), where q is the query vector and k is the number of its nonzero components. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound of Omega(2^d log n) on its learning complexity over the Boolean vector space {0, 1}^d. The rapid growth of the numbers of images and their users, a result of the reduction in cost and increase in efficiency of the creation, storage, manipulation, and transmission of images, poses challenges to those who organize and provide access to images. One of these challenges is similarity matching, a key component of current content-based image retrieval systems. Similarity matching is often implemented through similarity measures based on geometric models of similarity whose metric axioms are not satisfied by human similarity-judgment data. This study is significant in that it is among the first known to test Tversky's contrast model, which equates the degree of similarity of two stimuli to a linear combination of their common and distinctive features, in the context of image representation and retrieval. Data were collected from 150 participants who performed an image description and a similarity judgment task. Structural equation modeling, correlation, and regression analyses confirmed the relationships between perceived features and similarity of objects hypothesized by Tversky. The results hold implications for future research that will attempt to further test the contrast model, and they can assist designers of image organization and retrieval systems by pointing toward alternative document representations and similarity measures that more closely match human similarity judgments. 
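Tversky's contrast model, which the image-retrieval study above tests, can be sketched directly: similarity is a weighted combination of common and distinctive features rather than a geometric distance. The feature sets and weights below are illustrative only, with set cardinality standing in for Tversky's salience function f.

```python
# A minimal sketch of Tversky's contrast model:
#   sim(A, B) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A)
# Here f is simply set size; real applications would weight features
# by salience. Feature sets and weights are invented for illustration.

def tversky_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity of feature sets a and b."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

dog_photo  = {"animal", "dog", "outdoor", "grass"}
wolf_photo = {"animal", "wolf", "outdoor", "snow"}
car_photo  = {"vehicle", "car", "road"}

print(tversky_similarity(dog_photo, wolf_photo))  # 2 common, 2+2 distinctive
print(tversky_similarity(dog_photo, car_photo))   # no common features at all
```

Unlike a metric distance, the model allows asymmetry: with alpha != beta, sim(A, B) and sim(B, A) can differ, which is one reason geometric models fail to fit human similarity judgments.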
The purpose of this study was to develop a method for the automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain in this study. A variable-based framework was proposed for integrating and organizing research concepts and relationships, as well as research methods and contextual relations, extracted from different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. The focus of this article is on the user evaluation that was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method, with or without the use of a taxonomy, were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract, and against another sentence-based summary generated using the MEAD system, which extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy. Through this article, we highlight that there are discernibly different patterns among conceptualizations of information, technology, and people across the information systems and information science literatures. We do this to clarify the differences between these two areas of scholarship and to further encourage the substantial overlap possible, but not yet engaged, in the research pursued in these areas. 
We pursue this by analyzing published literature in these areas to frame our discussion of the challenges and opportunities for scholars in the information science and information systems disciplines to engage in collaborative work. This article contrasts Bates' understanding of information as an observer-independent phenomenon with an understanding of information as situational, put forward by, among others, Bateson, Yovits, Spang-Hanssen, Brier, Buckland, Goguen, and Hjorland. The conflict between objective and subjective ways of understanding information corresponds to the conflict between an understanding of information as a thing or a substance and an understanding of it as a sign. It is a fundamental distinction that involves a whole theory of knowledge, and it has roots in the different metaphors applied in Shannon's information theory. It is argued that a subject-dependent, situation-specific understanding of information is best suited to fulfill the needs of information science, and that it is urgent for us to base Information Science (IS; or Library and Information Science, LIS) on this alternative theoretical frame. Although the production of laboratory and field records is fundamental to the conduct of contemporary science, there has been little research into this topic in information studies. This article reports on a study in which, using ethnographic methods, the author studied record-keeping as it is practiced in a basic-research science laboratory. The process by which the record is created to reflect both personal need and professional norms is framed as a series of acts of selection, synthesis, and standardization. The article concludes with reflections on the role of a deep understanding of scientific record-keeping for other disciplines and for the design of digital laboratory technologies. 
Prior research on Web animation typically focuses on banner ads, where the findings suggest that users are capable of ignoring the animation when performing online tasks. Research on non-banner-ads animation (e.g., animation applied to the main content of an e-commerce Web site), however, is relatively scarce with inconclusive results. We propose that the effects of non-banner-ads animation are moderated by task type and the Web user's experience with the animation. Drawing upon divided attention theories, especially the central capacity theory, this research investigates the effects of non-banner-ads animation on Web users' clicking behavior, task performance, and perceptions through an online shopping experiment. The results show that non-banner-ads animation does attract Web users' attention, with the animated item more likely to be clicked first and also more likely to be purchased when users are performing browsing tasks. Meanwhile, Web users' task performance and perceptions are negatively affected in the presence of animation. Moreover, the negative effects of animation on task performance are greater in browsing tasks than in searching tasks. Finally, experience can help Web users to reduce the distraction from animation and is more effective when users are engaged in searching tasks than when they are engaged in browsing tasks. The authors present results from a real-world study depicting remote collaboration trends of a community of more than 87,000 scientists over 30 years. They utilize publication records of more than 200,000 scholarly journal articles, together with affiliations of the authors to infer distance collaborations. The longevity of their study is of interest because it covers several years before and after the birth of the Internet and computer-supported collaborative work (CSCW) technologies. Thus, they provide one lens through which the impact of computer-assisted collaborative work technologies can be viewed. 
Their results show that there has been steady and constant growth in the frequency of both inter-institute and cross-country collaborations in a particular physics domain, regardless of the introduction of these technologies. This suggests that we are witnessing an evolution, rather than a revolution, with respect to long-distance collaborative behavior. An interdisciplinary approach, combining numerical statistics, graph visualizations, and social network measurements, facilitates their remarks on the changes in the size and structure of these collaborations over this period of history. This study details the activities and strategies that 11th-grade students with high academic abilities used during their information seeking and use to complete class projects in a Persuasive Speech class. The study took place in a suburban high school in Maryland, and participants included 21 junior honors students, their teacher, and their library media specialist. Each student produced a 5-7-minute speech on a self-chosen topic. Conducted in the framework of qualitative research in a constructivist paradigm (E.G. Guba & Y.S. Lincoln, 1998), the study used data collected from observations, individual interviews, and documents students produced for their projects: concept maps, paragraphs, outlines, and research journals. Interview and observation data were analyzed using the constant comparative method (B. Glaser & A. Strauss, 1967) with the help of QSR NVivo 2 (QSR International Pty Ltd, 2002); students' documents were analyzed manually. The findings show that students' understanding, strategies, and activities during information seeking and use were interactive and serendipitous, and that students learned about their topics as they searched. The research suggests that high school honors students in an information-rich environment are especially confident with learning tasks requiring an exploratory mode of learning. 
This study compares the preparation and response efforts to Katrina and Rita through a knowledge management (KM) perspective. To achieve this objective, a theoretical KM framework is developed to examine the KM processes that underpin disaster management activities. The framework is then used to identify different dimensions along which the two disasters can be compared. The data, totaling some 500 documents, were drawn from a wide variety of news, congressional, and Internet sources. The findings show that the nonchalance towards the disaster's imminence, the grossly inadequate preparations, and the chaotic responses seen in Katrina stood in stark contrast to the colossal scale of precautionary measures and response operations primed for Rita. The article concludes by highlighting three KM implications for managing large-scale natural disasters. A conceptually relaxed utilization of the variable "search experience" makes it difficult for researchers to perform meaningful cross-study comparisons. The purpose of this study was to examine how search experience is defined and measured when used as a research variable. We implemented a qualitative analysis of 32 library and information science (LIS) research articles. We found that terminology usage and measurements were inconsistent. Specifically, there were 21 unique labels describing search experience and 18 different measurements. The majority of the studies used the generic label "search experience" and relied on the reader to grasp, from the description of the overall research design, the specific context of the electronic information-retrieval environment to which the variable applies. In addition, there was a strong preference for measures that represented subjective self-reporting about the level of exposure to some information retrieval system. It is evident that articles need to contain detailed definitions of search-experience variables for readers to truly understand the findings. 
The h-index (Hirsch, 2005) is robust, remaining relatively unaffected by errors in the long tails of the citations-rank distribution, such as typographic errors that shortchange frequently cited articles and create bogus additional records. This robustness, and the ease with which h-indices can be verified, support the use of a Hirsch-type index over alternatives such as the journal impact factor. These merits of the h-index apply both to individuals and to journals. Contrary to Burrell's statements, Egghe's theory of continuous concentration does include the construction of a standard Lorenz curve. No bibliometric analysis of tsunami research exists in the literature. The objective of this study was to perform a bibliometric analysis of all tsunami-related publications in the Science Citation Index (SCI). Analyzed parameters included document type, language of publication, publication output, authorship, publication patterns, distribution of subject category, distribution of author keywords, country of publication, most frequently cited article, and document distribution after the Indonesia tsunami. The US and Japan produced 53% of the total output, and the seven major industrial countries accounted for the majority of the total production. English was the dominant language, comprising 95% of articles. A simulation model was applied to describe the relationships between the number of authors and the number of articles, the number of journals and the number of articles, and the percentage of total articles and the number of times a certain keyword was used. Moreover, the tsunami publication patterns in the first 8 months after the Indonesia tsunami of 26 December 2004 indicated a high percentage of non-article publications and more documents being published in journals with higher impact factors. Hirsch's h-index gives a single number that in some sense summarizes an author's research output and its impact. 
Since an individual author's h-index will be time-dependent, we propose instead the h-rate which, according to theory, is (almost) constant. We re-analyse a previously published data set (LIANG, 2006) which, although not of the precise form to properly test our model, reveals that in many cases we do not have a constant h-rate. On the other hand, this then suggests ways in which deeper scientometric investigations could be carried out. This work should be viewed as complementary to that of LIANG (2006). The increase in patents is a main driving force for discussions of international competitiveness, knowledge spillovers, patent office efficiencies, and other topics. However, to the author's knowledge, no work has investigated the impact of the growth in the number of patents on patent-related scholarly (peer-reviewed) and media (e.g., press release) literatures, or evidence of inter-relatedness among these three literatures and the US market indices (viz., Dow, S&P500, NASDAQ). Here, I report that the growth in the number of US issued patents, the patent-related media and peer-reviewed publications, and these indices are statistically correlated, but with drastically different growth rates. This general result affords data supporting a hypothesis that publicly traded companies, as drivers of innovation, are priming a new research area within the scholarly communities and simultaneously affecting market value through what may be called "patent journalism." Introduction: The present study endeavours to provide information on the research interests of Brazilian Public Health and how its authors can be ranked. Methods: ISI data on post-graduate faculty members are analysed according to regions. Number of papers and their citations, papers' type-complexity-cooperation, Bradford's Law, Shannon's indexes, time dynamic functions, Lotka's Law, and ranking functions are examined. 
Results: Current production was built up in the last 30 years at a rate of 9.6% articles/year and 12.6% citations/year. 66% of potential authors were present in ISI data records, and 64% achieved at least one citation. Research fields do not much depart from the traditional PH purview. More than 66% of authors have just one paper, and the decrease is steep. Subtle differences call attention to the South region. Conclusion: Brazilian PH is mainly committed to classical research fields and ranking among authors is narrow. Our aim is to compare the coverage of the Scopus database with that of Ulrich's Directory, to determine just how homogeneous it is in the academic world. The variables taken into account were subject distribution, geographical distribution, distribution by publishers, and the language of publication. The analysis of the coverage of a product of this nature should be done in relation to an accepted model, the optimal choice being Ulrich's Directory, considered the international point of reference for the most comprehensive information on journals published throughout the world. The results described here allow us to draw a profile of Scopus in terms of its coverage by areas - geographic and thematic - and the significance of peer review in its publications. Both these aspects are highly pragmatic considerations for information retrieval, the evaluation of research, and the design of policies for the use of scientific databases in scientific promotion. Impact factors are a widely accepted means for the assessment of journal quality. However, journal editors can influence the impact factor of their journals, for example, by requesting that authors cite additional papers published recently in that journal, thus increasing the self-citation rate. I calculated self-citation rates of journals ranked in the Journal Citation Reports of ISI in the subject category "Ecology" (n = 107). 
On average, self-citation was responsible for 16.2 +/- 1.3% (mean +/- SE) of the impact factor in 2004. The self-citation rates decrease with increasing journal impact, but even high-impact journals show large variation. Six journals suspected of requesting additional citations showed high self-citation rates, which increased over the last seven years. To avoid further deliberate increases in self-citation rates, I suggest taking journal-specific self-citation rates into account for journal rankings. The authors present ranked lists of the world's countries - with a main focus on EU countries (together with newly acceded and candidate countries) - by their h-index in various science fields. Thomson Scientific's Essential Science Indicators (ESI) database was used as the main source of data. EU countries have strong positions in each field, but none of them can successfully compete with the USA. The modest position of the newly acceded and candidate countries illustrates the importance of a supportive economic and political background in order to achieve scientific success. An attempt is made to fit a recent theoretical model relating the h-index to two traditional scientometric indicators: the number of publications and the mean citation rate. In contemporary query languages, the user is responsible for navigation among semantically related data. Because of the huge amount of data and the complex structural relationships among data in modern applications, it is unrealistic to suppose that the user could know completely the content and structure of the available information. There are several query languages whose purpose is to facilitate navigation in unknown structures of databases. However, the background assumption of these languages is that the user knows how data are related to each other semantically in the structure at hand. So far only little attention has been paid to how unknown semantic associations among available data can be discovered. 
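The self-citation contribution to the impact factor reported in the ecology study above can be reproduced in outline: take each journal's self-citations as a share of the citations counted in its impact factor, then summarize across journals. The rates below are hypothetical stand-ins, not the study's data:

```python
import statistics

def self_citation_share(self_cites, total_cites):
    """Fraction of a journal's impact-factor citations that are self-citations."""
    return self_cites / total_cites if total_cites else 0.0

# Hypothetical per-journal counts, standing in for the n = 107 ecology journals:
rates = [self_citation_share(s, t)
         for s, t in [(12, 100), (20, 100), (15, 100), (18, 100)]]
mean = statistics.mean(rates)
se = statistics.stdev(rates) / len(rates) ** 0.5  # standard error of the mean
print(f"{mean:.3f} +/- {se:.3f}")
```

The mean +/- SE form printed here mirrors how the study reports its 16.2 +/- 1.3% figure.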
We address this problem in this article. A semantic association between two entities can be constructed if a sequence of relationships expressed explicitly in a database can be found that connects these entities to each other. This sequence may contain several other entities through which the original entities are connected to each other indirectly. We introduce an expressive and declarative query language for discovering semantic associations. Our query language is able, for example, to discover semantic associations between entities for which only some of the characteristics are known. Further, it integrates the manipulation of semantic associations with the manipulation of documents that may contain information on entities in semantic associations. The development of a facet analysis system to code and analyze data in a mixed-method study is discussed. The research goal was to identify the dimensions of interaction that contribute to student satisfaction in online Web-supported courses. The study was conducted between 2000 and 2002 at the Florida State University School of Information Studies. The researchers developed a facet analysis system that meets S. R. Ranganathan's (1967) requirements for articulation on three planes (idea, verbal, and notational). This system includes a codebook (verbal), coding procedures, and formulae (notational) for quantitative analysis of logs of chat sessions and postings to discussion boards for eight master's level courses taught online during the fall 2000 semester. Focus group interviews were subsequently held with student participants to confirm that results of the facet analysis reflected their experiences with the courses. The system was developed through a process of emergent coding. The researchers have been unable to identify any prior use of facet analysis for the analysis of research data as in this study. 
Identifying the facet analysis system was a major breakthrough in the research process, which, in turn, provided the researchers with a lens through which to analyze and interpret the data. In addition, identification of the faceted nature of the system opens up new possibilities for automation of the coding process. This article examines the evolution of a collection of open access journals (OAJs) indexed by the Science Citation Index (SCI; Thomson Scientific, Philadelphia, PA) against four validity criteria, including a free, immediate, full, and constant access policy for at least 5 years. A few journals are found to be wrongly identified as OAJs or to have a dubious access policy. Some delayed journals evolved into gold OA; however, these are scarce compared to the number of journals that withdrew from gold OA to become embargoed or partially OA journals. A majority of the journals meet three of the criteria, as they provide free and immediate access to their entire contents. Although many are found to follow a constant policy, a large number have an OA lifetime shorter than 5 years, due to the high frequency of newly launched or newly converted journals. That is the major factor affecting the validity of the collection. Only half of the collection meets all the requirements. The present two-part article introduces matrix comparison as a formal means of evaluation in informetric studies such as cocitation analysis. In this first part, the motivation behind introducing matrix comparison to informetric studies, as well as two important issues influencing such comparisons, are introduced and discussed. The motivation is spurred by the recent debate on the choice of proximity measures and their potential influence upon clustering and ordination results. The two important issues discussed here are matrix generation and the composition of proximity measures. 
The article demonstrates, for the same data set, that the approach to matrix generation, i.e., how data are represented and transformed in a matrix, evidently determines the behavior of proximity measures. Two different matrix generation approaches, in all probability, will lead to different proximity rankings of objects, which further lead to different ordination and clustering results for the same set of objects. Further, a resemblance in the composition of formulas indicates whether two proximity measures may produce similar ordination and clustering results. However, as shown in the case of the angular correlation and cosine measures, a small deviation in otherwise similar formulas can lead to different rankings depending on the contour of the data matrix transformed. Eventually, the behavior of proximity measures, that is, whether they produce similar rankings of objects, is more or less data-specific. Consequently, the authors recommend the use of empirical matrix comparison techniques for individual studies to investigate the degree of resemblance between proximity measures or their ordination results. In part two of the article, the authors introduce and demonstrate two related statistical matrix comparison techniques, the Mantel test and Procrustes analysis, respectively. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures. The present two-part article introduces matrix comparison as a formal means for evaluation purposes in informetric studies such as cocitation analysis. In the first part, the motivation behind introducing matrix comparison to informetric studies, as well as two important issues influencing such comparisons, matrix generation and the composition of proximity measures, are introduced and discussed. 
In this second part, the authors introduce and thoroughly demonstrate two related matrix comparison techniques, the Mantel test and Procrustes analysis, respectively. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. Common to these techniques is the application of permutation procedures to test hypotheses about matrix resemblances. The choice of technique is related to the validation at hand. In the case of the Mantel test, the degree of resemblance between two measures forecasts their potentially different effect upon ordination and clustering results. In principle, two proximity measures with a very strong resemblance most likely produce identical results; thus, the choice of measure between the two becomes less important. Alternatively, or as a supplement, Procrustes analysis compares the actual ordination results, without investigating the underlying proximity measures, by matching two configurations of the same objects in a multidimensional space. An advantage of Procrustes analysis, though, is the graphical solution provided by the superimposition plot and the resulting decomposition of variance components. Accordingly, Procrustes analysis provides not only a measure of general fit between configurations, but also values for individual objects, enabling more elaborate validations. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures. This study explores an image-based retrieval interface for drug information, focusing on usability for a specific population: seniors. Qualitative, task-based interviews examined participants' health information behaviors and documented search strategies using an existing database (www.drugs.com) and a new prototype that uses similarity-based clustering of pill images for retrieval. 
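The permutation logic behind the Mantel test described in the two-part article can be illustrated compactly (a generic sketch, not the authors' implementation): correlate the upper triangles of two symmetric proximity matrices, then assess significance by randomly relabeling the objects of one matrix:

```python
import random

def mantel(d1, d2, permutations=999, seed=0):
    """Permutation Mantel test: correlation between two symmetric proximity
    matrices, with a p-value from random relabelings of the second matrix."""
    n = len(d1)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def corr(a, b, order):
        xs = [a[i][j] for i, j in idx]
        ys = [b[order[i]][order[j]] for i, j in idx]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs)
               * sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den if den else 0.0

    rng = random.Random(seed)
    observed = corr(d1, d2, list(range(n)))
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if corr(d1, d2, order) >= observed:
            hits += 1
    p = (hits + 1) / (permutations + 1)
    return observed, p
```

A high observed correlation with a small p-value indicates that the two proximity measures rank object pairs in close agreement, so ordination results based on either should be similar.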
Twelve participants (aged 65 and older), reflecting a diversity of backgrounds and experience with Web-based resources, located pill information using the interfaces and discussed navigational and other search preferences. Findings point to design features (e.g., image enlargement) that meet seniors' needs in the context of other health-related information-seeking strategies (e.g., contacting pharmacists). Elementary-age children (ages 6-11) are among the largest user groups of computers and the Internet. Therefore, it is important to design searching and browsing tools that support them. However, many interfaces for children do not consider their skills and preferences. Children are capable of creating Boolean queries using category browsers, but have difficulty with the hierarchies used in many category-browsing interfaces because different branches of the hierarchy must be navigated sequentially and top-level categories are often too abstract for them to understand. Based on previous research, the authors believe using a flat category structure, where only leaf-level categories are available and can be viewed simultaneously, might better support children. However, this design introduces many more items on the screen and the need for paging or scrolling, all of which are potential usability problems. To evaluate these tradeoffs, the authors conducted two studies with children searching and browsing using two types of category browsers in the International Children's Digital Library. Their results suggest that a flat, simultaneous interface provides advantages over a hierarchical, sequential interface for children in both Boolean searching and casual browsing. These results add to our understanding of children's searching and browsing skills and preferences, and may serve as guidelines for other designers of children's interfaces. 
Although the analysis of citations in the scholarly literature is now an established and relatively well understood part of information science, not enough is known about citations that can be found on the Web. In particular, are there new Web citation types, and if so, are these trivial or potentially useful for studying or evaluating research communication? We sought evidence based upon a sample of 1,577 Web citations of the URLs or titles of research articles in 64 open-access journals from biology, physics, chemistry, and computing. Only 25% represented intellectual impact, from references of Web documents (23%) and other informal scholarly sources (2%). Many of the Web/URL citations were created for general or subject-specific navigation (45%) or for self-publicity (22%). Additional analyses revealed significant disciplinary differences in the types of Google unique Web/URL citations as well as some characteristics of scientific open-access publishing on the Web. We conclude that the Web provides access to a new and different type of citation information, one that may therefore enable us to measure different aspects of research, and the research process in particular; but to obtain good information, the different types should be separated. The main aim of the article is the presentation of the construction process of a stopword list for a non-Latin language and the evaluation of the effect of stopword elimination from user queries. The article presents the phases of engineering a stopword list for the Greek language as well as the problems faced and the inferences deduced from this procedure. A set of 32 authentic queries proposed by users is run in Google with and without the stopwords. The importance of eliminating the stopwords from the user queries is then evaluated, in terms of relevance, in the top-10 results from Google. 
Due to computer technology, a forced-response format can easily be implemented in online questionnaires and is frequently used to gather complete datasets. An Internet-based quasi-experiment was conducted on the student server at the University of Vienna to study the influence of forced-response on dropout, demographic reports, and the content of the results. Forced-response was shown to substantially increase dropout. In addition, forced-response interacted with reported sex in eliminating a naturally occurring sex difference in dropout that was observed for the questionnaire whenever responses did not need to be enforced. Also, reported sex turned out to have a mediating effect on the time of dropout: men dropped out earlier than women. Further analyses revealed a reactance effect, as predicted by reactance theory. It is concluded that data from online questionnaires with forced-response designs are in danger of being hampered by dropout and reactance. Recently we proposed a model in which, when a scientist writes a manuscript, he picks up several random papers, cites them, and also copies a fraction of their references. The model was stimulated by our finding that a majority of scientific citations are copied from the lists of references used in other papers. It accounted quantitatively for several properties of the empirically observed distribution of citations; however, important features such as power-law distributions of citations to papers published during the same year and the fact that the average rate of citing decreases with the aging of a paper were not accounted for by that model. Here, we propose a modified model: When a scientist writes a manuscript, he picks up several random recent papers, cites them, and also copies some of their references. The difference with the original model is the word recent. 
We solve the model using methods of the theory of branching processes, and find that it can explain the aforementioned features of citation distribution, which our original model could not account for. The model can also explain "sleeping beauties" in science, that is, papers that are little cited for a decade or so and later "awaken" and receive many citations. Although much can be understood from purely random models, we find that to obtain a good quantitative agreement with empirical citation data, one must introduce a Darwinian fitness parameter for the papers. Currently, there exists little evidence concerning how various characteristics of research cultures are associated with patterns of use of electronic library resources. The present study addresses this gap by exploring how research-group membership, across-fields scattering of literature, and degree of establishment of a research area are related to patterns of digital library use. The analytic dimensions are derived from Richard Whitley's (1984) theory of the social and intellectual organization of academic fields. The article represents a first attempt to operationalize Whitley's concepts in a large-scale study of e-resources use. The data used in the study were gathered in 2004 by the Finnish Electronic Library (FinElib) through a nationwide Web-based user questionnaire (N = 900). Membership in a research group significantly increased searching in journal databases, the importance of colleagues as sources of information about electronic articles and journals, and the use of alert services. A significant interaction effect was found between the degree of across-fields scattering of relevant resources and the degree of establishment of research fields. A high degree of across-fields scattering of relevant literature increased the number of journal databases used mainly in less established research areas, whereas it influenced the use of journal databases less in established fields. 
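The modified copying model of citation growth described earlier (cite a few random recent papers and copy some of their references) is easy to caricature in code; all parameter names and values here are illustrative, not the authors':

```python
import random

def simulate(papers=300, direct=3, copy_prob=0.5, recent_window=30, seed=1):
    """Toy reference-copying model: each new paper cites a few random
    *recent* papers and copies a fraction of their references."""
    rng = random.Random(seed)
    refs = {0: []}   # paper id -> its reference list
    cites = {0: 0}   # paper id -> times cited
    for p in range(1, papers):
        window = list(range(max(0, p - recent_window), p))
        picked = set(rng.sample(window, min(direct, len(window))))
        copied = {r for q in picked for r in refs[q] if rng.random() < copy_prob}
        reference_list = picked | copied
        refs[p] = sorted(reference_list)
        cites[p] = 0
        for r in reference_list:
            cites[r] += 1
    return cites
```

Restricting the direct picks to a recent window is the one-word change ("recent") that the abstract emphasizes; dropping the window restriction recovers the original model.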
This research contributes to our understanding of the complex set of interacting factors influencing patterns of use of e-resources. In our query language introduced in Part I (Niemi & Jamsen, in press) the user can formulate queries to find out (possibly complex) semantic relationships among entities. In this article we demonstrate the usage of our query language and discuss the new applications that it supports. We categorize several query types and give sample queries. The query types are categorized based on whether the entities specified in a query are known or unknown to the user in advance, and whether text information in documents is utilized. Natural language is used to represent the results of queries in order to facilitate correct interpretation by the user. We discuss briefly the issues related to the prototype implementation of the query language and show that an independent operation like Rho (Sheth et al., 2005; Anyanwu & Sheth, 2002, 2003), which presupposes entities of interest to be known in advance, is exceedingly inefficient in emulating the behavior of our query language. The discussion also covers potential problems and challenges for future work. In a recent article in JASIST, L. Leydesdorff and L. Vaughan (2006) asserted that raw cocitation data should be analyzed directly, without first applying a normalization such as the Pearson correlation. In this communication, it is argued that there is nothing wrong with the widely adopted practice of normalizing cocitation data. One of the arguments put forward by Leydesdorff and Vaughan turns out to depend crucially on incorrect multidimensional scaling maps that are due to an error in the PROXSCAL program in SPSS. This study focuses on the ways in which people define their source preferences in the context of seeking orienting information for nonwork purposes. The conceptual framework of the study combines ideas drawn from social phenomenology and information-seeking studies. 
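The normalization at issue in the Leydesdorff-Vaughan exchange replaces raw cocitation counts with Pearson correlations between the row profiles of the cocitation matrix; a minimal sketch:

```python
def pearson_matrix(cocit):
    """Replace each cell (i, j) of a raw cocitation matrix with the
    Pearson correlation of row profiles i and j."""
    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0
    n = len(cocit)
    return [[pearson(cocit[i], cocit[j]) for j in range(n)] for i in range(n)]

# A tiny hypothetical 3 x 3 raw cocitation matrix:
raw = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
R = pearson_matrix(raw)
```

The resulting matrix is symmetric with unit diagonal, which is what downstream multidimensional scaling or clustering would then consume in the normalized-data camp's workflow.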
The study utilizes Alfred Schutz's model describing the ways in which actors structure everyday knowledge into regions of decreasing relevance. It is assumed that this structuring, based on the actor's interest at hand, is also reflected in the ways in which an actor prefers information sources and channels. The concept of information source horizon is used to elicit articulations of source preferences. The empirical part of the study draws on interviews with 20 individuals active in environmental issues. Printed media (newspapers), the Internet, and broadcast media (radio, television) were preferred in seeking orienting information. The major criteria of source preference were content of information, and availability and accessibility. Usability of information sources, user characteristics such as media habits, and situational factors were mentioned less frequently as preference criteria. One cannot manage information quality (IQ) without first being able to measure it meaningfully and establishing a causal connection between the source of IQ change, the IQ problem types, the types of activities affected, and their implications. In this article we propose a general IQ assessment framework. In contrast to context-specific IQ assessment models, which usually focus on a few variables determined by local needs, our framework consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices. The framework can be used as a knowledge resource and as a guide for developing IQ measurement models for many different settings. The framework was validated and refined by developing specific IQ measurement models for two large-scale collections of two large classes of information objects: Simple Dublin Core records and online encyclopedia articles. Studies of immigrant information behavior need to be situated within the dynamic contexts of globalization and diaspora. 
Most immigrant-focused information-science research has concentrated on distinctly local, place-based scenarios, while diasporic research on information behavior, in contrast, focuses mainly on issues of transnational identity online. This article suggests a methodology that mediates between these two poles by recognizing the place-based, lived realities of immigrant communities while also acknowledging the existence of complex, globalized diasporic information environments. We refer to this methodology as the Diasporic Information Environment Model (DIEM), and argue that local information-science research can be extended to address the globalized experiences of immigrant communities. Recreational reading among young people is reportedly on the decline in the United States. Some researchers have suggested that supporting children's strategies for book selection is crucial to encouraging children to engage with books, indicating that improving these strategies might increase the amount of reading they do. In response, this study explores how elementary-school children select books for recreational reading using a digital library. The work extends traditional models of relevance assessment with reader-response theory, employing the concept of "aesthetic relevance": the potential of a document to provide a suitable reading experience. Individuals define aesthetic relevance in personal terms and apply it as they assess documents, much as they do in traditional relevance assessment. This study identified a total of 46 factors organized along seven dimensions that influence children's assessment of the aesthetic relevance of books during selection. The analysis yielded differences in the prevalence of the aesthetic-relevance factors that children mention at various stages of book selection. In addition, the children exhibited differences by age and subtle differences by gender in the frequency of mention of various aesthetic-relevance factors. 
Recommendations drawn from the findings are offered to improve systems design and literacy education in order to enhance children's access to books and to promote recreational reading. This article introduces a team-based model of researchers in a specialty and investigates the manifestation of such teams in a specialty's literature. The proposed qualitative behavioral model, with its mathematical expression as a growth model, is significant because it simultaneously describes the two phenomena of collaboration and author productivity (Lotka's law) in a specialty. The model is nested: A team process models the creation of research teams and the success-breeds-success process of their production of articles, while at a lower level the productivity of authors within teams is also modeled as a success-breeds-success process. Interteam collaboration (weak ties) is modeled as random events. This simple growth model is shown to faithfully mimic six network metrics of bipartite article-author networks. The model is demonstrated on three example article collections from specialties that have a wide range of degree of collaboration: (a) a distance education collection with low collaboration degree, (b) a complex networks collection with typical collaboration degree, and (c) an atrial ablation collection with heavy collaboration degree. Selection power is taken as the fundamental value for information retrieval systems. Selection power is regarded as produced by selection labor, which itself separates historically into description and search labor. As forms of mental labor, description and search labor participate in the conditions for labor and for mental labor. Concepts and distinctions applicable to physical and mental labor are indicated, including the necessity of labor for survival, the idea of technology as a human construction, and the possibility of the transfer of human labor to technology. 
Distinctions specific to mental labor, particularly between semantic and syntactic labor, are introduced. Description labor is exemplified by cataloging, classification, and database description; it can be more formally understood as the labor involved in the transformation of objects for description into searchable descriptions, and is also understood to include interpretation. The costs of description labor are discussed. Search labor is conceived as the labor expended in searching systems. For both description and search labor, there has been a progressive reduction in direct human labor, with its syntactic aspects transferred to technology, effectively compelled by the high relative costs of direct human labor compared to machine processes. In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach. We present an approach to enhancing information access through Web structure mining, in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD 'x -> y' declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. 
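The Web FD 'x -> y' just defined can be checked by brute force over a site's paths; a small sketch with hypothetical link labels:

```python
def satisfied_fds(paths):
    """Return label pairs (x, y) forming Web FDs 'x -> y': every path
    containing a link labeled x also contains a link labeled y."""
    labels = {lab for path in paths for lab in path}
    fds = set()
    for x in labels:
        containing = [p for p in paths if x in p]
        for y in labels:
            if y != x and containing and all(y in p for p in containing):
                fds.add((x, y))
    return fds

# Hypothetical site paths, each a sequence of hyperlink labels:
paths = [["products", "specs", "buy"],
         ["products", "buy"],
         ["about", "contact"]]
fds = satisfied_fds(paths)
```

Here 'products -> buy' holds (every path through products also reaches buy) but 'products -> specs' does not. Real mining, as in the article, would need to scale beyond this quadratic enumeration.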
The complete set of FDs satisfied by a site helps characterize (flexible and expressive) interaction paradigms supported by the site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites, and present several interface designs that can exploit such FDs to provide compelling user experiences. Modern information retrieval systems use keywords within documents as indexing terms for the search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white space. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus, which is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates boundaries of words in the text; our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. 
Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words. Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to inform their purchases. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the development of opinions along spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged at different granularities, from words and sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level. With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. 
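The f-measure figures quoted above are the standard balanced harmonic mean of precision and recall, F = 2PR/(P + R); a minimal sketch:

```python
def f_measure(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A system that finds 80% of the opinion sentences (recall) with 60% of its
# extractions correct (precision):
print(round(f_measure(0.6, 0.8), 4))  # -> 0.6857
```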
Considering that there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and a mixture-of-Markov-models approach is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm that updates the model parameters without recomputing them from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset generated by a Web-based knowledge management system, Livelink. With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines hence have become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get the expected results. Therefore, it is often desirable for search engines to give suggestions on related queries to users. 
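The building block of such a mixture is a first-order Markov chain fitted to access sequences. A minimal sketch of estimating its transition probabilities from click histories follows; the page labels and function name are illustrative, and a full mixture would fit several such chains with EM, which this sketch omits.

```python
from collections import defaultdict

def fit_markov(sequences):
    """Maximum-likelihood transition probabilities P(next | current) from access sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):  # count each observed transition
            counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

sequences = [
    ["home", "news", "sports"],
    ["home", "news", "home"],
    ["home", "mail"],
]
P = fit_markov(sequences)
print(P["home"])  # roughly {'news': 0.67, 'mail': 0.33}
```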
Moreover, by identifying related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are based on the query log of previously submitted queries by human users, and can be identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many rival techniques, it also performs reasonably well on less frequent input queries. Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous work used hyperlink analysis algorithms to solve this problem. However, little research has focused on query-independent Web data cleansing for Web IR. In this paper, we first provide an analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and that retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. This makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of the pages on the Web. 
These results may help Web search engines make better use of their limited storage and computation resources to improve search performance. English is typically considered not only the language of business, but also the language of the Internet. This brief communication explores the cultural implications of the power position of English as the language of the Internet and discusses the likelihood of its continued dominance. This communication comments upon the article "Measuring the Utility of Journals in the Crime-Psychology Field: Beyond the Impact Factor" by Walters (2006), recently published in JASIST. Walters' article is utilized as a vehicle to demonstrate certain misconceptions and incorrect value judgments common in the literature on the impact factor, arising from a lack of understanding of this measure's role in the thought structure of its creator, Eugene Garfield, as well as of his utilization of this measure. Background: Publication and citation rates mark the research activity and research quality of scientists. Question: Are bibliometric indicators valid instruments for the early recognition of high-quality researchers? Subjects and methods: The number of publications and citations of 26 assistant, associate and full professors of German psychiatry born after 1947 was analysed in their 30th and 31st years of age and between 1996 and 2000. Results: 58% of the selected 30- or 31-year-old scientists had at least one publication in a journal with an impact factor, 93% of these as first or single author. 42% in this age group were cited at least once. Publication and citation rates in the early stage of a career provide hints about the later bibliometric data and the academic degree of scientists. Conclusion: High-quality researchers can be recognised early in their careers by means of worldwide accessible bibliometric indicators. 
Data have been collected on 55 members of the AEA Executive Committees for the years 1950-1960 (inclusive) on a variety of variables that measure the merit and non-merit characteristics of the economists. A logit is estimated in which the dependent variable is a dummy variable for whether an Executive Committee member was ever elected President of the American Economic Association (AEA). The number of publications and citations are important determinants of election. Receiving a PhD from one of the top three schools does not help, and living in the South does not hurt. Economists who were older in 1956 were more likely to have eventually been elected to the AEA Presidency. In a recently published article, HARGENS & HERTING (2006) apply the row-column (RC) association model to peer review to analyze the association between two referees' recommendations and an editor's decision at two scholarly journals. In the present study we analyze 1,954 applications to the Boehringer Ingelheim Fonds (B.I.F.) for doctoral and post-doctoral fellowships, which the B.I.F. evaluates in three stages (first stage: evaluation by an external reviewer; second stage: evaluation by an internal reviewer (staff member); third stage: final decision by the B.I.F. Board of Trustees). Using the RC association model, we show - in accordance with the results of HARGENS & HERTING (2006) - that a single latent dimension is sufficient to account for the association between (internal and external) reviewers' recommendations and the fellowship award decision by the Board. This result indicates that the latent dimension underlying reviewers' recommendations and the Board's decisions reflects the merit of the application being evaluated. 
While the statistical analyses establish that, overall, favorable evaluations by the reviewers correspond with favorable decisions by the Board (and vice versa), the ordering of the scale values yielded by the estimation of the RC association model also shows that internal reviewers' recommendations have a greater influence on the Board's decisions than recommendations by external reviewers. This paper examines Asian R&D output on spices, the correlation between Asian countries and the subfields of spices research, and the dynamic changes, if any, in their research priorities. The chosen study period is two decades: 1983-2002. Hort CD is the source database for this research. On these premises, the frequency of keywords found in the Descriptor Field of each record in the chosen database was analysed. A mapping technique was adopted for the analysis using Data and Text Mining (DTM) software. This made it possible to correlate the countries with the subject priorities amongst the Asian countries during the study period. The inferences drawn are reported along with the interpretations. It is a well-known empirical fact that when informetric processes are observed over an extended period of time, the entire shape of the distribution changes. In particular, it has been shown that concentration aspects change. In this paper the recently introduced co-concentration coefficient (C-CC) is examined via simple stochastic models of informetric processes to investigate its time-dependence. It is shown that it is important to distinguish between situations where the zero-producers can be counted and those where they cannot. A previously published data set is used to illustrate how the empirical C-CC develops in time, and the general features are compared with those derived from the theoretical model. The literature on publication counting demonstrates the use of various terminologies and methods. 
In many scientific publications, no information at all is given about the counting methods used. There is a lack of knowledge and agreement about the sort of information provided by the various methods, about the theoretical and technical limitations of the different methods, and about the size of the differences obtained by using various methods. The need for precise definitions and terminology has been expressed repeatedly, but with no success. Counting methods for publications are defined and analysed with the use of set and measure theory. The analysis depends on definitions of basic units of analysis (three chosen for examination), objects of study (three chosen for examination) and score functions (five chosen for examination). The score functions define five classes of counting methods. However, in a number of cases different combinations of basic units of analysis, objects of study and score functions give identical results. Therefore, the result is the characterization of 19 counting methods: five complete counting methods, five complete-normalized counting methods, two whole counting methods, two whole-normalized counting methods, and five straight counting methods. When scores for objects of study are added, the value obtained can be identical with or higher than the score for the union of the objects of study. Therefore, some classes of counting methods, including the classes of complete, complete-normalized and straight counting methods, are additive; others, including the classes of whole and whole-normalized counting methods, are non-additive. An analysis of the differences between scores obtained by different score functions, and therefore the differences obtained by different counting methods, is presented. In this analysis we introduce a new kind of object of study, the class of cumulative-turnout networks for objects of study, containing full information on cooperation. 
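The best-known counting schemes behind these classes can be illustrated with a small sketch. The author names are invented, and only three textbook variants are shown (complete, complete-normalized, and straight counting at the author level); whole counting differs from complete counting mainly for aggregated objects such as institutions or countries, which this sketch does not model, and the paper's full characterization covers 19 methods.

```python
from collections import defaultdict

def score_authors(papers, method):
    """Credit per author over a list of papers (each paper = ordered author list)."""
    scores = defaultdict(float)
    for authors in papers:
        for i, a in enumerate(authors):
            if method == "complete":              # every author gets 1 per paper
                scores[a] += 1
            elif method == "complete-normalized":  # credit 1/n shared among n authors
                scores[a] += 1 / len(authors)
            elif method == "straight":             # only the first author gets 1
                scores[a] += 1 if i == 0 else 0
    return dict(scores)

papers = [["Ann", "Bob"], ["Ann", "Cyd", "Bob"], ["Bob"]]
print(score_authors(papers, "complete-normalized"))
# Ann ~0.83, Bob ~1.83, Cyd ~0.33
```

The additivity point in the text is visible here: complete-normalized scores sum to the number of papers (3), whereas complete counting sums to the number of authorships (6).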
Cumulative-turnout networks are all authors, institutions or countries contributing to the publications of an author, an institute or a country. The analysis leads to an interpretation of the results of score functions and to the definition of new indicators for scientific cooperation. We also define a number of other networks: internal cumulative-turnout networks, external cumulative-turnout networks, underlying networks, internal underlying networks and external underlying networks. These networks open new opportunities for quantitative studies of scientific cooperation. Using the MACTOR (Matrix of Alliances and Conflicts: Tactics, Objectives and Recommendations) method, a set of 13 related journals covering the subject category "Chemistry, Multidisciplinary" was analyzed in terms of direct and indirect reciprocal influences (measured by relatedness indexes Rji), their positions towards a generic set of common objectives (total cites; impact factor; immediacy index; number of published articles; cited half-life) and the convergences (Actors x Actors and Actors x Objectives) existing in the above-mentioned relatedness network. The study identified 4 types of actors: dominant (3), independent (8), relay (1) and dominated (1). Maps of influences and dependences between actors, convergences between actors, net distances between actors, and actor-objective relationships are presented, together with short interpretations. Defining scientific journals as actors on a specific "knowledge market", identifying influences and dependences between them, and positioning these journals towards a set of measurable objectives creates an interesting possibility to define "relationships of power" of a strategic nature and enables the introduction of more complex future-oriented scientometric analyses than those based solely on standard bibliometric indicators such as the impact factor. 
During the 1974-2004 period, the sleep literature quadrupled (2384 publications in 1974, and 9721 in 2004) while overall scientific productivity only doubled. The set of the seven most productive countries (USA, Japan, United Kingdom, Germany, France, Canada and Italy) in sleep research, and the geographical region distribution, remained stable over the three decades. On the other hand, several indicators appeared in the sleep research literature during the 1990s: the increasing productivity of sleep researchers; the growing number of countries publishing on sleep; the continuous creation of sleep-focused journals; the scattering of sleep publications among increasingly more scientific journals; the turnover among the leading journals; and the emergence of new entities such as China, Turkey, and the European Union. Many scientific fields share common interests for research and education. Yet, very often, these fields do not communicate with each other and are unaware of the work in other fields. Understanding the commonalities and differences among related fields can broaden our understanding of the phenomena of interest from various perspectives, better utilize resources, enhance collaboration, and eventually move the related fields forward together. In this article, we present a conceptual framework, namely the Information-Model or I-model, to describe various aspects of information-related fields. We consider this a timely effort in light of the evolution of several information-related fields and a set of questions related to the identities of these fields. It is especially timely in defining the newly formed Information Field from a community of some twenty information schools. We posit that the information-related fields are built on a number of other fields but have their own unique foci and concerns. 
That is, core components from other fundamental fields interact and integrate with each other to form dynamic and interesting information-related fields that all have to do with information, technology, people, and organization/society. The conceptual framework can have a number of uses. Besides providing a unified view of these related fields, it can be used to examine old case studies, recent research projects, educational programs and curricular concerns, as well as to illustrate the commonalities and differences within the information-related fields. We explore the relationship between creativity and both chronological and professional age in information science using a novel bibliometric approach that allows us to capture the shape of a scholar's career. Our approach draws on D.W. Galenson's (2006) analyses of artistic creativity, notably his distinction between conceptual and experimental innovation, and also H.C. Lehman's (1953) seminal study of the relationship between stage of career and outstanding performance. The data presented here suggest that creativity is expressed in different ways, at different times, and with different intensities in academic information science. This article shows how finite-state methods can be employed in a new and different task: the conflation of personal name variants into standard forms. In bibliographic databases and citation index systems, variant forms create problems of inaccuracy that affect information retrieval, the quality of information from databases, and the citation statistics used for the evaluation of scientists' work. A number of approximate string matching techniques have been developed to validate variant forms, based on similarity and equivalence relations. We classify personal name variants as nonvalid and valid forms. In establishing an equivalence relation between valid variants and the standard form of their equivalence class, we defend the application of finite-state transducers. 
The process of variant identification requires the elaboration of (a) binary matrices and (b) finite-state graphs. This procedure was tested on samples of author names from bibliographic records, selected from the Library and Information Science Abstracts and Science Citation Index Expanded databases. The evaluation involved calculating measures of precision and recall, based on completeness and accuracy. The results demonstrate the usefulness of this approach, although it should be complemented with methods based on similarity relations for the recognition of spelling variants and misspellings. In this article, the author aims to clarify some of the issues surrounding the discussion regarding the usefulness of a substantive classification theory in information science (IS) by means of a broad perspective. By utilizing a concrete example from the High Accuracy Retrieval from Documents (HARD) track of a Text Retrieval Conference (TREC), the author suggests that the "bag of words" approach to information retrieval (IR) and techniques such as relevance feedback have significant limitations in expressing and resolving complex user information needs. He argues that a comprehensive analysis of information needs involves explicating often implicit assumptions made by the authors of scholarly documents, as well as everyday texts such as news articles. He also argues that progress in IS can be furthered by developing general theories that are applicable to multiple domains. The concrete example of the application of the domain-analytic approach to subject analysis in IS to the aesthetic evaluation of works of information arts is used to support this argument. Although the body of literature pertaining to the study of micro, context-specific behavior on the Web is growing, a global, macro-level analysis of the behavioral dimensions of Web users' online activities is lacking. 
In an attempt to fill this gap, this study proposes a three-dimensional, cubic typology for the characterization of Web users' online information behavior. We discuss a set of hypotheses concerning the relationships among these dimensions as well as those between these dimensions and related behavioral aspects. Online panel data consisting of month-long clickstreams of 2,022 Web users, obtained from InsightXplore, Taiwan, are used for the empirical validation of the hypotheses. We found that a Web user's width (i.e., number of categories of Web sites explored), length (i.e., number of sites visited per category), and depth (i.e., number of pages downloaded per site) of online information behavior are highly correlated. Furthermore, these three dimensions of the behavioral "cube" are positively associated with speed of navigation, but negatively associated with the Web users' explicit online information search propensity and the degree of relatedness among the sites they visited. The study adopts a naturalistic approach to investigate users' interaction with a browsable MeSH (medical subject headings) display designed to facilitate query construction for the PubMed bibliographic database. The purpose of the study is twofold: first, to test the usefulness of a browsable interface utilizing the principle of faceted classification; and second, to investigate users' preferred query submission methods in different problematic situations. An interface was constructed that incorporated multiple query submission methods: the conventional single-line query box as well as methods associated with the faceted classification display. Participants' interactions with the interface were monitored remotely over a period of 10 weeks; information about their problematic situations and information retrieval behaviors was also collected during this time. 
The traditional controlled experiment was not adequate for answering the author's research questions; hence, the author provides his rationale for a naturalistic approach. The study's findings show that there is indeed a selective compatibility between the query submission methods provided by the MeSH display and users' problematic situations. The query submission methods associated with the display were found to be the preferred search tools when users' information needs were vague and the search topics unfamiliar. The findings support the theoretical proposition that users engaging in an information retrieval process with a variety of problematic situations need different approaches. The author argues that rather than treating the information retrieval system as a general-purpose tool, more attention should be given to the interaction between the functionality of the tool and the characteristics of users' problematic situations. The purpose of this study is to investigate the role that domain knowledge plays in users' interactions with information systems. Two groups of users with two different areas of expertise were recruited for 34 experimental sessions to answer two research questions: (a) Does one group's domain knowledge (Geography majors) affect their performance on an information system more than another group's domain knowledge (Computer Science majors)? (b) Are there any differences and/or similarities in the performance of the two groups in terms of their information problem-solving processes? Task completion time, task completeness, and mouse movements were collected while users performed six tasks during the experimental sessions. Data were analyzed through repeated measures. An ANOVA was used for task completion time and task completeness. GOMS (Goals, Operators, Methods, and Selection rules) was also applied to the mouse movements to identify some of the similarities and differences between the two groups' information problem-solving processes. 
The GOMS analysis found the two groups' processing activities to be remarkably similar. The ANOVA results indicate that expertise type was not a major factor influencing user performance, but that task, and task combined with the type of expertise, played a significant role in the users' interactions with the interface. External operators, goal decompositions, and methods related to the problem-solving process identified through GOMS are also presented. This article statistically analyzes how the citation impact of articles deposited in the Condensed Matter section of the preprint server ArXiv (hosted by Cornell University), and subsequently published in a scientific journal, compares to that of articles in the same journal that were not deposited in the archive. Its principal aim is to further illustrate and roughly estimate the effect of two factors, "early view" and "quality bias," on differences in citation impact between these two sets of papers, using citation data from Thomson Scientific's Web of Science. It presents estimates for a number of journals in the field of condensed matter physics. To discriminate between an "open access" effect and an early view effect, longitudinal citation data were analyzed covering a time period as long as 7 years. Quality bias was measured by calculating ArXiv citation impact differentials at the level of individual authors publishing in a journal, taking into account coauthorship. The analysis provided evidence of a strong quality bias and early view effect. Correcting for these effects, there is, in a sample of six condensed matter physics journals studied in detail, no sign of a general "open access advantage" for papers deposited in ArXiv. The study does provide evidence that ArXiv accelerates citation, because ArXiv makes papers available earlier, not because it makes them freely available. 
Citation and content analyses of eight American Chemical Society (ACS) journals in a range of fields of chemistry were used to describe the use of Web-based information resources by the authors and readers of the scholarly literature of chemistry. The analyses indicate that even though the number of Web-based information resources has grown steadily over the past decade, chemists are not taking full advantage of freely available Web-based resources. They are, however, making use of the ACS Electronic Supporting Information archive. The content of the Web-based resources that are used is primarily text based, and the URLs are provided in the articles' reference lists and experimental sections. The presence of a reference to a Web-based resource in a chemistry article does not influence its rate of citation, even though the viability of the URLs was found to erode with time. Comparison of citation and online access data reveals that at the highest levels of citation, articles also garner high levels of online access. This was especially true for articles describing a technique or methodology. Even though chemists do not incorporate large numbers of freely available Web-based resources into their publications, an increasingly important component of a chemist's information behavior for the direct support of his or her research is unfettered bench-top access via the Web. Multimedia Messaging Services (MMS) is a new medium that enriches people's personal communication with their business partners, friends, or family. Following the success of Short Message Services, MMS has the potential to be the next mobile commerce "killer application" which is useful and popular among consumers; however, little is known about why people intend to accept and use it. 
Building upon motivational theory and media richness theory, the research model captures both extrinsic (e.g., perceived usefulness and perceived ease of use) and intrinsic (e.g., perceived enjoyment) motivators, as well as perceived media richness, to explain user intention to use MMS. An online survey was conducted and 207 completed questionnaires were collected. By integrating the motivation and the media richness perspectives, the research model explains 65% of the variance. In addition, the results provide strong support for the existing theoretical links as well as for those newly hypothesized in this study. Implications of the current investigation for research and practice are provided. This article summarizes much of what is known from the communication and information literacy fields about the skills that Internet users need to assess the credibility of online information. The article reviews current recommendations for credibility assessment, surveys empirical research on how users determine the credibility of Internet information, and describes several cognitive models of online information evaluation. Based on the literature review and a critique of existing models of credibility assessment, recommendations for future online credibility education and practice are provided to assist users in locating reliable information online. The article concludes by offering ideas for research and theory development on this topic in an effort to advance knowledge in the area of credibility assessment of Internet-based information. The article reports a field study which examined the mental models of 80 undergraduates seeking information for either a history or psychology course essay when they were in an early, exploration stage of researching their essay. 
This group is presently at a disadvantage when using thesaurus-type schemes in indexes and online search engines, because there is a disconnect between how domain-novice users of IR systems represent a topic space and how this space is represented in the standard IR system thesaurus. The study attempted to (a) ascertain the coding language used by the 80 undergraduates in the study to mentally represent their topic and then (b) align the mental models with the hierarchical structure found in many thesauri. The intervention focused the undergraduates' thinking about their topic from a topic statement to a thesis statement. The undergraduates were asked to produce three mental model diagrams for their real-life course essay at the beginning, middle, and end of the interview, for a total of 240 mental model diagrams, from which we created a 12-category mental model classification scheme. Findings indicate that at the end of the intervention, (a) the percentage of vertical mental models increased from 24% to 35% of all mental models; but that (b) 3rd-year students had fewer vertical mental models than did 1st-year undergraduates in the study, which is counterintuitive. The results indicate that there is justification for pursuing our research based on the hypothesis that rotating a domain novice's mental model into a vertical position would make it easier for him or her to cognitively connect with the thesaurus's hierarchical representation of the topic area. The Institute for Scientific Information's (ISI, now Thomson Scientific, Philadelphia, PA) citation databases have been used for decades as a starting point, and often as the only tools, for locating citations and/or conducting citation analyses. The ISI databases (or Web of Science (WoS)), however, may no longer be sufficient because new databases and tools that allow citation searching are now available. 
Using citations to the work of 25 library and information science (LIS) faculty members as a case study, the authors examine the effects of using Scopus and Google Scholar (GS) on the citation counts and rankings of scholars as measured by WoS. Overall, more than 10,000 citing and purportedly citing documents were examined. Results show that Scopus significantly alters the relative ranking of those scholars that appear in the middle of the rankings and that GS stands out in its coverage of conference proceedings as well as international, non-English-language journals. The use of Scopus and GS, in addition to WoS, helps reveal a more accurate and comprehensive picture of the scholarly impact of authors. The WoS data took about 100 hours of collecting and processing time, Scopus consumed 200 hours, and GS a grueling 3,000 hours. In a recent article (Burrell, 2006), the author pointed out that the version of Lorenz concentration theory presented by Egghe (2005a, 2005b) does not conform to the classical statistical/econometric approach. Rousseau (2007) asserts confusion on our part and a failure to grasp Egghe's construction, even though we simply reported what Egghe stated. Here the author shows that Egghe's construction, rather than "including the standard case," as claimed by Rousseau, actually leads to the Leimkuhler curve of the dual function, in the sense of Egghe. (Note that here we distinguish between the Lorenz curve, a convex form arising from ranking from smallest to largest, and the Leimkuhler curve, a concave form arising from ranking from largest to smallest. The two presentations are equivalent. See Burrell, 1991, 2005; Rousseau, 2007). Soil science is a relatively young and specialised field of science. This note discusses the use of the h index as a scientific output measure in soil science. 
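For reference, the h index discussed here is Hirsch's measure: a scientist has index h if h of his or her papers have at least h citations each. A minimal sketch:

```python
def h_index(citations):
    """Hirsch's h index: the largest h such that h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4 (four papers have at least 4 citations)
```

Under the average relation the note reports for soil science, h = 0.7t, a scientist of scientific age t = 20 years would be expected to have h of roughly 14.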
We explore the governing factors of the h index in soil science: the number of soil scientists, the number of papers published, the average number of citations, and the age of the scientist. We found an average relationship between the h index and scientific age t for soil science: h = 0.7t. The h index for soil science is smaller than that of other major science disciplines, but norms for h need to be established. Multivariate methods were successfully employed in a comprehensive scientometric analysis of geostatistics research; the publication data for this research came from the Science Citation Index and spanned the period from 1967 to 2005. Hierarchical cluster analysis (CA) was used to identify publication patterns based on different types of variables. A backward discriminant analysis (DA) with appropriate statistical tests was then conducted to confirm the CA results and evaluate the variations of the various patterns. For the authorship pattern, the 50 most productive authors were classified by CA into 4 groups representing different levels, and DA produced 92.0% correct assignment with high reliability; the discriminant parameters were mean impact factor (MIF), annual citations per publication (ACPP), and the number of publications by the first author. For the country/region pattern, CA divided the top 50 most productive countries/regions into 4 groups with 95.9% correct assignments, and the discriminant parameters were MIF, ACPP, and independent publications (IP); for the institute pattern, 3 groups were identified from the top 50 most productive institutes with nearly 88.0% correct assignment, and the discriminant parameters were MIF, ACPP, IP, and international collaborative publications; finally, for the journal pattern, the top 50 most productive journals were classified into 3 groups with nearly 98.0% correct assignment, and the discriminant parameters were total citations, impact factor, and ACPP. 
We also analyzed general patterns of publication document type, language, subject category, and publication growth. In the fields of physics, astronomy, geophysics, mathematics, and chemistry, the numbers of American papers published depend only on the membership numbers of their scientific societies and not upon improved facilities or instrumental breakthroughs, although those improvements have made the scientific contents of those papers far better in recent decades. In the past 30-35 years there have been no increases in the average annual number of published papers per scientist in those fields. The Transtheoretical Model (TTM) of behaviour change is currently one of the most promising models in terms of understanding and promoting behaviour change related to the acquisition of healthy living habits. By means of a bibliographic search of papers adopting a TTM approach to obesity, the present bibliometric study enables the scientific output in this field to be evaluated. The results obtained reveal a growing interest in applying this model to both the treatment of obesity and its prevention. In addition, author and journal outputs fit the models proposed by Lotka and Bradford, respectively. We propose a simple way to put on a common scale the h values of researchers working in different scientific ISI fields, so that the foreseeable misuse of this index for inter-area comparisons might be prevented, or at least alleviated. Prompted by discussions about manipulating the impact factor by requesting that authors increase their citations to the publishing journal, we theoretically establish in this paper a mathematical expression relating the journal self-citation rate to its impact factor by the single-factor method. 
Based on self-citation data for some journals in the JCR and the observed relation between journal impact factor and self-cited rate, we analyze the possibility that journal editors manipulate the impact factors of their journals by raising the self-cited rate. Finally, we make some suggestions for curbing this crude way of actively manipulating the impact factor. The internationalization of ten of China's English-language scientific journals is analyzed based on their Impact Factor, Total Citation, JCR list rank, international paper proportion, and international citation proportion. Six of these journals were financed three times by the National Natural Science Foundation of China (NNSF) between 2001 and 2006, and four journals maintained a relatively high impact factor (> 1.0) in 2003-2005. The data show that though Impact Factor and Total Citation show an overall rising trend, the journals' subject ranks have declined slightly. Moreover, the proportions of international papers and international citations do not match their JCR rank and IF: high-rank journals have a low proportion of international papers (Chinese Phys Lett, Chinese Phys) and low-rank journals have a high Impact Factor (Cell Res, Asian J Androl). This inconsistency may result from insufficient internationalization, either in international paper proportion (less than 20%) or in the supply of high-quality manuscripts, probably caused by these journals' local titles, circulation, and low IF. Suggested means of improving internationalization include encouraging Chinese scientists to cite more home journals when they publish their papers in foreign journals; soliciting internationally co-authored submissions where purely foreign-authored ones are unavailable; cooperating with internationally recognized publishers to utilize their globalization platforms; employing overseas scientists to recruit international papers; and improving writing style and content to enable greater accessibility to worldwide readers. 
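The self-citation abstract above argues that editors can inflate the impact factor by raising the self-cited rate. A minimal sketch of that mechanism (a toy model of my own, not the paper's single-factor derivation; the citation and article counts are invented): if external citations C_ext and the number of citable items P are held fixed, a self-cited rate s implies total citations C_ext / (1 - s), so the impact factor grows as s rises.

```python
# Toy illustration (not the paper's actual derivation): how raising the
# self-cited rate s inflates the impact factor when external citations
# c_ext and the citable-item count p stay fixed.
def impact_factor(c_ext, p, s):
    """IF under the assumption that total citations = c_ext / (1 - s)."""
    if not 0 <= s < 1:
        raise ValueError("self-cited rate must lie in [0, 1)")
    total_citations = c_ext / (1.0 - s)
    return total_citations / p

# A hypothetical journal: 200 external citations to 100 citable items.
for s in (0.0, 0.1, 0.2, 0.4):
    print(f"s = {s:.1f} -> IF = {impact_factor(200, 100, s):.2f}")
```

In this toy model, a journal that pushes its self-cited rate from 10% to 40% inflates its IF by half, which is why monitoring the self-cited rate, as the abstract suggests, is a natural supervisory check.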
In a follow-up study of a previous analysis concerning the period 1980-1999, we found that inter-/multidisciplinary remained a highlighted title term in both science and social science papers. It is suggested that science policy should give proper priority to inter- and multidisciplinary research. The fractions of single-authored papers in four science fields (astronomy, physics, chemistry, and biology) were determined at five-year intervals during 1975-2005. In each case the distribution is best fitted with an exponential function that never reaches zero, implying that single-authored papers will continue to be published in the foreseeable future. This is contrary to the prediction that they would become extinct. This study proposes a new methodology that allows for the generation of scientograms of major scientific domains, constructed on the basis of cocitation of Institute for Scientific Information categories, pruned using Pathfinder networks, with a layout determined by algorithms of the spring-embedder type (Kamada-Kawai), and then corroborated structurally by factor analysis. We present the complete scientogram of the world for the year 2002. It integrates the natural sciences, the social sciences, and the arts and humanities. Its basic structure and the essential relationships therein are revealed, allowing us to simultaneously analyze the macrostructure, microstructure, and marrow of worldwide scientific output. A survey of 4,032 youth in grades 5 through 12 was conducted to determine the impact youth's use of the Internet was having on their use of the public library. Results indicated that 100% of the youth had access to the Internet from one or more locations, and that although one quarter of the youth accessed the Internet at the public library, the public library was the least frequently used source of Internet access. For youth without Internet access at home, the public library was also the least used alternate source of access. 
Approximately 69% of the youth reported that they had visited a public library during the school year. Having Internet access at home did not affect whether or not youth visited the library; however, it does appear to have affected the frequency of their visits: youth without Internet access at home visited the library more frequently, whereas youth with Internet access at home visited less frequently. Use of the Internet also appeared to have diminished youth's need to use the public library as a source of personal information; however, it appeared not to have affected their use of the public library for school work or for recreation. Among youth, use of the Internet and use of the public library appear to be complementary activities. Latent semantic analysis has been used for several years to improve the performance of document library searches. We show that latent semantic analysis, augmented with a part-of-speech tagger, may also be an effective algorithm for classifying textual documents. Using Brill's part-of-speech tagger, we truncate the singular value decomposition used in latent semantic analysis to reduce the size of the word-frequency matrix. This method is then tested on a toy problem and is shown to increase search accuracy. We then relate these results to natural language processing and show that latent semantic analysis can be combined with context-free grammars to infer semantic meaning from natural language; English is the natural language currently used. The information seeking behavior of academic scientists is being transformed by the availability of electronic resources for searching, retrieving, and reading scholarly materials. A census survey was conducted of academic science researchers at the University of North Carolina at Chapel Hill to capture their current information seeking behavior. 
Nine hundred two subjects (26%) completed responses to a 15-minute Web-based survey. The survey questions were designed to quantify the transition to electronic communications and how this affects different aspects of information seeking. Significant changes in information seeking behavior were found, including increased reliance on Web-based resources, fewer visits to the library, and almost entirely electronic communication of information. The results can guide libraries and other information service organizations as they adapt to meet the needs of today's information searchers. Simple descriptive statistics are reported for the individual questions. Additionally, analysis of results is broken out by basic science and medical science departments. The survey tool and protocol used in this study have been adopted for use in a nationwide survey of the information seeking behavior of academic scientists. This exploratory study compares two approaches to understanding the "collaboration propensity" of individual researchers. On the one hand, social comparisons of disciplines would suggest that collaboration is a function of orientation toward individual versus collective responsibility for discovery. A contrasting approach would hold that collaboration depends on the work researchers are engaged in: when it is useful to collaborate, they will do so regardless of the social climate. Results presented here suggest that this latter approach is potentially more powerful but that there are complexities in measurement and operationalization that urge a more nuanced treatment of collaboration propensity. Advocates of geographic information technologies (GIT) have long claimed significant advantages to bringing a spatially oriented perspective to bear on organizational and policy decision making; however, for a variety of reasons, these advantages have been more difficult to realize in practice than might be supposed. 
In this article, we argue that awareness and appreciation of the potential value of GIT changed dramatically as a result of the World Trade Center (WTC) attacks on September 11, 2001. We use a structurationist theoretical perspective to show that GITs were "enacted" in a variety of novel ways by social actors thrust together by the demands of the crisis to form interorganizational systems, and we illustrate this process through three extended examples of GIT adaptation and innovation during the crisis. One lasting consequence of this episode is that GITs have moved from serving as a relatively static reference tool to a dynamic decision-making tool for emergency situations. We conclude by suggesting that the crisis was a catalyst for change in the use of GIT and, reciprocally, in the social structures in which GIT will be deployed in the future. Information systems (IS) project failure is a costly and common problem, despite advances in development tools and technologies. In this article, we argue that one reason for this is the failure of project post-mortems to generate constructive "lessons learned" from previous projects. Over time, ineffective practices persist in the organization, rendering it resistant to change. The attribution theory literature serves as one of the few promising theoretical bases for explaining why project post-mortems fail. A case study of a project post-mortem undertaken for an abandoned electronic procurement system project is discussed and analyzed. We identify five antecedent conditions of attribution error: the presence of self-appointed mindsets, the general persistence of negative beliefs, memory decay, selective recall of project events, and the influence of power dynamics within the organization. We discuss the research and practical implications of these findings and suggest how the problem of attribution error may be minimized in project post-mortems. 
User studies provide libraries with invaluable insight into their users' information needs and behaviors, allowing them to develop services that correspond to these needs. This insight has become even more important for libraries since the advent of the Internet, which has brought about a development of information technologies and electronic information sources that have had a great impact on both the ways users search for information and the ways libraries manage information. Although humanists represent an important group of users for academic libraries, research studies into their information-seeking behavior have been quite scarce in the past decade (Ellis & Oldman, 2005). This study presents updated research on a group of humanists, Jewish studies scholars living in Israel, as information users in the digital age, based on two categories: (a) the use of formal and informal information channels, and (b) the use of information technologies and their impact on humanistic research. Inspired by the dependency degree gamma, a traditional measure in Rough Set Theory, we propose a generalized dependency degree, Gamma, between two given sets of attributes, which counts both deterministic and indeterministic rules, while gamma counts only deterministic rules. We first give its definition in terms of equivalence relations, then interpret it in terms of minimal rules, and further describe the algorithm for its computation. To understand Gamma better, we investigate its various properties. We further extend Gamma to incomplete information systems. To show its advantage, we make a comparative study with the conditional entropy and gamma in a number of experiments. Experimental results show that the speed of the new C4.5 using Gamma is greatly improved when compared with the original C4.5R8 using conditional entropy, while the prediction accuracy and tree size of the new C4.5 are comparable with the original one. 
Moreover, Gamma achieves better results on attribute selection than gamma. The study shows that the generalized dependency degree is an informative measure in decision trees and in attribute selection. This article discusses the rapidly emerging field of computer-based assessment for adaptive content in e-learning (National Research Council, 2002), which we call differentiated e-learning. In e-learning products, a variety of assessment approaches are being used for such diverse purposes as adaptive delivery of content, individualizing learning materials, dynamic feedback, cognitive diagnosis, score reporting, and course placement (Gifford, 2001). A recent paper at the General Teaching Council Conference in London, England, on teaching, learning, and accountability described assessment for personalized learning through e-learning products as a "quiet revolution" taking place in education (Hopkins, 2004). In our study, we examine approaches to the use of assessment evidence in e-learning in four case studies. The products in the case studies were selected for exhibiting at least one exemplary aspect regarding assessment and measurement. The principles of the Berkeley Evaluation & Assessment Research Center Assessment System (Wilson & Sloane, 2000) are used as a framework for analyzing these products with respect to key measurement principles. This study expands the perspective of knowledge sharing by categorizing the different types of knowledge that individuals shared with one another and examining the patterns of motivators and barriers to knowledge sharing across three online environments pertaining to the following professional practices: advanced nursing practice, Web development, and literacy education. The patterns indicate the different possible combinations of motivators or barriers that may exist in individuals. Data were gathered through online observations and semi-structured interviews with 54 participants. 
The cross-case analysis shows that the most common type of knowledge shared across all three environments was practical knowledge. Overall, seven motivators were found. Analysis also suggests that the most common combination of motivators for knowledge sharing was collectivism and reciprocity. A total of eight barriers were identified; the most common combination of barriers varied in each online environment. Discussion of how the types of professional practice may contribute to the different results is provided, along with implications and possible future research directions. This study develops three alternative models of academic library Web site usage based on the Technology Acceptance Model (TAM). The three alternative models depict relationships among various intrinsic and extrinsic determinant factors of an academic library's Web site usage. The four factors included in the models are perceived ease-of-use, perceived usefulness, service functionality, and task functionality. These four factors are hypothesized to affect, directly or indirectly, both satisfaction and intention-to-use. LISREL analysis using survey data shows that the best-fit model is the "Dual Mediation Impact" model. Research and managerial implications for the academic library are discussed. Future research directions and limitations are also provided. Previous research assessing the effectiveness of structured abstracts has been limited in two respects. First, when comparing structured abstracts with traditional ones, investigators have usually rewritten the original abstracts, and thus confounded changes in the layout with changes in both the wording and the content of the text. Second, investigators have not always included the title of the article together with the abstract when asking participants to judge the quality of the abstracts, yet titles alert readers to the meaning of the materials that follow. The aim of this research was to redress these limitations. 
Three studies were carried out. Four versions of each of four abstracts were prepared. These versions consisted of structured/traditional abstracts matched in content, with and without titles. In Study 1, 64 undergraduates each rated one of these abstracts on six separate rating scales. In Study 2, 225 academics and research workers rated the abstracts electronically, and in Study 3, 252 information scientists did likewise. In Studies 1 and 3, the respondents rated the structured abstracts significantly more favorably than they did the traditional ones, but the presence or absence of titles had no effect on their judgments. In Study 2, no main effects were observed for structure or for titles. The layout of the text, together with the subheadings, contributed to the higher ratings of effectiveness for structured abstracts, but the presence or absence of titles had no clear effects in these experimental studies. It is likely that this spatial organization, together with the greater amount of information normally provided in structured abstracts, explains why structured abstracts are generally judged to be superior to traditional ones. Researchers have traditionally used bibliographic databases to search for information. Today, the full text of resources is increasingly available for searching, and more researchers are performing full-text searches. This study compares differences in the number of articles discovered between metadata and full-text searches of the same literature cohort when searching for gene names in two biomedical literature domains. Three reviewers additionally ranked 100 articles in each domain. Significantly more articles were discovered via full-text searching; however, the precision of full-text searching is also significantly lower than that of metadata searching. Certain features of articles correlated with higher relevance ratings. 
A significant feature was the number of matches of the search term in the full text of the article, with a larger number of matches yielding a statistically significantly higher usefulness (i.e., relevance) rating. By using the number of hits of the search term in the full text to rank the importance of the article, the performance of full-text searching was improved so that both recall and precision were as good as or better than those for metadata searching. This suggests that full-text searching alone may be sufficient, and that metadata searching as a surrogate is not necessary. In this presentation, we propose a novel integrated information retrieval approach that provides a unified solution to two challenging problems in the field of information retrieval. The first problem is how to build an optimal vector space corresponding to users' different information needs when applying the vector space model. The second is how to smoothly incorporate the advantages of machine learning techniques into the language modeling approach. To solve these problems, we designed the language-modeling kernel function, which has all the modeling power provided by language modeling techniques. In addition, for each information need, this kernel function automatically determines an optimal vector space, in which a discriminative learning machine, such as the support vector machine, can be applied to find an optimal decision boundary between relevant and nonrelevant documents. Large-scale experiments on standard test-beds show that our approach makes significant improvements over other state-of-the-art information retrieval methods. The politicization of science is not a new phenomenon, but the disputes surrounding global climate change have been particularly subject to ideological positioning. The work conducted by researchers on the description of, and possible causes for, climate change is reflected in the formal record of scientific discourse. 
The political and ideological claims about climate change are themselves reflected in the governmental and popular records. With regard to the particular work of Michael Mann and his colleagues, the three records (scientific, governmental, and popular) collide. Close examination of the totality of the record demonstrates the background, nature, and bases of the claims made on all sides. The examination further demonstrates that the governmental and popular records are informed not by scientific research and communication but by ideological stances. The second half of the twentieth century saw the emergence of three knowledge-system models: Mode 2 knowledge production, the Triple Helix, and Post-Normal Science (PNS). Today, this emphasis on knowledge use is the focus of such important health movements as evidence-based medicine. Building on the methodological work of Shinn (2002) and the theoretical work of Holzner and Marx (1979), we conducted a bibliometric study of the extent to which the three knowledge-system models are used by researchers to frame problems of health-knowledge use. By doing so, we reveal how these models fit into a larger knowledge system of health and evidence-based decision making. The study results show clearly that although these knowledge models are extremely popular for contextualizing research, there is a distinct lack of emphasis on use of the models in knowledge utilization or evidence-based medicine. We recommend using these models for further research in three specific dimensions of health systems analysis: (a) differences in language use, (b) transformative thinking about health-knowledge functions, and (c) ethical analysis of institutional linkages. This article reports on the development of a novel method for the analysis of Web logs. The method uses techniques that look for similarities between queries and identify sequences of "query transformations". 
It allows sequences of query transformations to be represented as graphical networks, thereby giving a richer view of search behavior than is possible with the usual sequential descriptions. We also perform a basic analysis of the correlations between observed transformation codes, with results that appear to show evidence of habitual search behavior. The method was developed using transaction logs from the Excite search engine to provide a tool for an ongoing research project that is endeavoring to develop a greater understanding of Web-based searching by the general public. We address the issue of differentiation of the profiles of universities and offer a set of new indicators based on microdata at the individual level and the application of robust nonparametric efficiency measures. In particular, we use efficiency measures in order to characterize the way in which universities use their inputs (academic and non-academic staff, funding) in the effort to position themselves in the space of outputs (undergraduate teaching, postgraduate education, fundamental research, contract research, third mission), while keeping efficiency under control. The strategic problem of universities is defined as making the best use of existing resources in the short run, while enlarging the scope of autonomy in procuring additional resources in the long run. In order to make the best use of resources, universities are led to increase their specialization and differentiate their offering profiles. This happens even though the European institutional landscape does not encourage universities to differentiate. There are increasing moves to deploy quantitative indicators in the assessment of research, particularly in the university sector. In Australia, discussions surrounding their use have long acknowledged the unsuitability of many standard quantitative measures for most humanities, arts, social science, and applied science disciplines. To fill this void, several projects are running concurrently. 
This paper details the methodology and initial results for one of these projects, which aims to rank conferences into prestige tiers and is fast gaining a reputation for best practice in such exercises. The study involves a five-stage process: identifying conferences; constructing a preliminary ranking of these; engaging in extensive consultation; testing performance measures based on the rankings on 'live' data; and assessing the measures. In the past, many similar attempts to develop a ranking classification for publication outlets have faltered due to the inability of researchers to agree on a hierarchy. However, the Australian experience suggests that when researchers are faced with the imposition of alternative metrics that are far less palatable, consensus is more readily achieved. Q-measures for binary divided networks were introduced in 2004. These measures can value the status of nodes as linkages (or bridges) between two groups in a connected undirected network. We collected data from the Web of Science and used a computer programme in order to study Q-measures for an England-Germany collaboration network in fluid mechanics. The results indicate that Cambridge University, Manchester University, Technische Universität Berlin, the Max Planck Institute, Stuttgart University, and Forschungszentrum Karlsruhe play the most important roles as bridges between England and Germany. It is shown that having a high degree centrality and being a key node are important factors explaining the ranking of nodes in a network according to Q-value. It is observed that institutes with a high Q-value have, on average, a higher production than those with a lower Q-value. The US-EU race for world leadership in science and technology has become the favourite subject of recent studies. 
Studies issued by the European Commission reported the increase of the European share in the world's scientific production and announced world leadership of the EU in scientific output at the end of the last century. In order to be able to monitor those types of global changes, the present study is based on the 15-year period 1991-2005. A set of bibliometric and technometric indicators is used to analyse activity and impact patterns in science and technology output. This set comprises publication output indicators such as (1) the share in the world total, (2) subject-based publication profiles, (3) citation-based indicators like journal- and subject-normalised mean citation rates, (4) international co-publications and their impact, as well as (5) patent indicators and publication-patent citation links (in both directions). The evolution of national bibliometric profiles, 'scientific weight', and science-technology linkage patterns is discussed as well. The authors show, using the mirror of science and technology indicators, that the triad model no longer holds in the 21st century. China is challenging the leading sciento-economic powers, and the time is approaching when this country will represent the world's second largest potential in science and technology. China and other emerging scientific nations like South Korea, Taiwan, Brazil and Turkey are already changing the balance of power as measured by scientific production, as they are at least in part responsible for the relative decline of the former triad. The objective of this paper is to depict the knowledge array of standards. This is done by identifying and analyzing external effects, specifically spillover effects. The database used is Perinorm. We use a cluster analysis in order to create groups of technology fields for German standards according to the fields of the International Classification of Standards. 
Methodologically, the distances between these objects or clusters are defined by the chosen distance measure, which in turn is determined by the sum of their cross-references. The joining clustering method applied here uses these distances between the objects and allows the data to be mapped within a two-dimensional space. The results of this mapping show the existence of structures within the standards data that fit the well-known structure of patent spillovers. Although the writing of a thesis is a very important step for scientists undertaking a career in research, little information exists on the impact of theses as a source of scientific information. Knowing the impact of theses is relevant not only for students undertaking graduate studies, but also for the building of repositories of electronic theses and dissertations (ETDs) and the substantial investment this involves. This paper shows that the impact of theses as information sources has been generally declining over the last century, apart from during the 'golden years' of research, 1945 to 1975. There is no evidence of ETDs having a positive impact; on the contrary, since their introduction the impact of theses has actually declined more rapidly. This raises questions about the justification for ETDs and the appropriateness of writing monograph-style theses, as opposed to publishing a series of peer-reviewed papers, as the requirement for fulfilment of graduate studies. This study of co-authorship networks in the area of nanostructured solar cells aims to contribute to a further understanding of the use of research evaluation measures of science output, impact, and structure in an emerging research field. The study combines quantitative bibliometric methods of analysis and social network analysis with a qualitative case study research approach. 
Conclusions drawn from the results emphasise, firstly, the importance of distinguishing between early and later phases of the evolution of a novel research field, and secondly, the application of a systemic view on learning processes and knowledge diffusion in a science-based technology field.

The aim of this study is to reveal the possible linkage among the 40 primary organizations in Genetic Engineering Research by taking the Patent Coupling approach. The primary organizations were defined by their productivity and identified by patent count and Bradford's Law. The author analyzed the cited patents of the patents granted by the United States Patent and Trademark Office (USPTO) from 1991 to 2002 to the 40 primary organizations (assignees) in Genetic Engineering Research to establish the correlation. The 40 primary organizations formed 780 coupling pairs, and a Coupling Index and Coupling Strength were calculated for each pair and primary organization. Correlation Analysis and Multidimensional Scaling were applied further based on the Coupling Index. Technological clusters were found in the results of the analyses.

A longitudinal analysis of UK science covering almost 20 years revealed three distinct bibliometric patterns in the years prior to a Research Assessment Exercise (RAE 1992, 1996 and 2001), which can be interpreted in terms of scientists' responses to the principal evaluation criteria applied in a RAE. When in the RAE 1992 total publication counts were requested, UK scientists substantially increased their article production. When a shift in evaluation criteria from 'quantity' to 'quality' was announced for the RAE 1996, UK authors gradually increased their number of papers in journals with a relatively high citation impact. And during 1997-2000, institutions raised their number of active research staff by stimulating their staff members to collaborate more intensively, or at least to co-author more intensively, although their joint paper productivity did not rise.
This finding suggests that, along the way towards the RAE 2001, evaluated units in a sense shifted back from 'quality' to 'quantity'. The analysis also observed a slight upward trend in overall UK citation impact, corroborating conclusions from an earlier study. The implications of the findings for the use of citation analysis in the RAE are briefly discussed.

This paper examines policy-relevant effects of a yearly public ranking of individual researchers and their institutes in economics by means of their publication output in international top journals. In 1980, a grassroots ranking ('Top 40') of researchers in the Netherlands by means of their publications in international top journals started a competition among economists. The objective was to improve economics research in the Netherlands to an internationally competitive level. The ranking lists did stimulate output in prestigious international journals. Netherlands universities tended to perform well compared to universities elsewhere in the EU concerning volume of output in ISI source journals, but their citation impact was average. Limitations of ranking studies and of bibliometric monitoring in the field of economics are discussed.

Stressing that some universities have adopted unrealistic requirements for tenure of information systems (IS) faculty members, a recent editorial in MIS Quarterly contends that the group of premier IS journals needs to be generally recognized as having more than just two members. This article introduces the publication power approach to identifying the premier IS journals, and it does indeed find that there are more than two. A journal's publication power is calculated from the actual publishing behaviors of full-time, tenured IS faculty members at a sizable set of leading research universities.
The underlying premise is that these researchers produce excellent work, collectively spanning the IS field's subject matter, and that the greatest concentrations of their collective work appear in the highest-visibility, most important journals suitable for that subject matter. The new empirically based approach to identifying premier IS journals (and, more broadly, identifying journals that figure most prominently in the publishing activity of tenured IS researchers) offers an attractive alternative to promulgations by individuals or cliques (possibly based on outdated tradition or vested interests), to opinion surveys (subjective, possibly ill-informed, vague about rating criteria, and/or biased in various ways), and to citation analyses (which ignore semantics of references and, in the case of ISI impact factors, have additional problems that cast considerable doubt on their meaningfulness within the IS field and its subdisciplines). Results of the publication power approach can be applied and supplemented according to the needs of a particular university in setting its evaluation standards for IS tenure, promotion, and merit decisions.

Recently, lecture videos have been widely used in e-learning systems. Envisioning intelligent e-learning systems, this article addresses the challenge of information seeking in lecture videos by retrieving relevant video segments based on user queries, through dynamic segmentation of lecture speech text. In the proposed approach, shallow parsing such as part-of-speech tagging and noun phrase chunking are used to parse both questions and Automated Speech Recognition (ASR) transcripts. A sliding-window algorithm is proposed to identify the start and end boundaries of returned segments. Phonetic and partial matching is utilized to correct the errors from automated speech recognition and noun phrase chunking. Furthermore, extra knowledge such as lecture slides is used to facilitate the ASR transcript error correction.
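A minimal sketch of such a sliding-window boundary search follows, with plain query-term overlap as a simplified stand-in for the phrase and phonetic matching described above; the transcript sentences and scoring are illustrative assumptions, not the system's actual algorithm:

```python
def segment_for_query(sentences, query_terms, window=3):
    """Slide a fixed-size window over transcript sentences and return the
    (start, end) indices of the window with the most query-term matches.
    A simplified stand-in for the matching described in the abstract."""
    best, best_score = (0, min(window, len(sentences))), -1
    for start in range(max(1, len(sentences) - window + 1)):
        end = start + window
        score = sum(
            1
            for s in sentences[start:end]
            for t in query_terms
            if t in s.lower()
        )
        if score > best_score:
            best, best_score = (start, end), score
    return best

# Hypothetical ASR transcript sentences.
asr = [
    "today we discuss gradient descent",
    "the learning rate controls the step size",
    "a small learning rate converges slowly",
    "next week we cover regularization",
]
print(segment_for_query(asr, ["learning", "rate"]))
```

A real implementation would score windows with the chunked noun phrases and phonetic matches rather than raw substring hits, but the boundary search itself has this shape.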
The approach also makes use of proximity to approximate the deep parsing and structure match between question and sentences in ASR transcripts. The experimental results showed that both phonetic and partial matching improved the segmentation performance, slides-based ASR transcript correction improved information coverage, and proximity was also effective in improving the overall performance.

The information science research community is characterized by a paradigm split, with a system-centered cluster working on information retrieval (IR) algorithms and a user-centered cluster working on user behavior. The two clusters rarely leverage each other's insight and strength. One major suggestion from user-centered studies is to treat the relevance judgment of documents as a subjective, multidimensional, and dynamic concept rather than treating it as objective and based on topicality only. This study explores the possibility to enhance users' topicality-based relevance judgment with subjective novelty judgment in interactive IR. A set of systems is developed which differ in the way the novelty judgment is incorporated. In particular, this study compares systems which assume that users' novelty judgment is directed to a certain subtopic area with those which assume that users' novelty judgment is undirected. This study also compares systems which assume that users judge a document based on topicality first and then novelty in a stepwise, noncompensatory fashion with those which assume that users consider topicality and novelty simultaneously and as compensatory to each other. The user study shows that systems assuming directed novelty in general have higher relevance precision, but systems assuming a stepwise judgment process and systems assuming a compensatory judgment process are not significantly different.

The objective of this study was to integrate the information, business, and production process of a powder coating manufacturer via computer simulation.
Previous studies considered only conventional customer lead-time, which is defined as the time for the customer to receive goods or services. The integrated approach of this study is capable of evaluating customer lead-times in six different dimensions. Furthermore, the integrated simulation approach considers conventional customer lead-time (from when the customer places an order) in addition to five other customer indices. This is the first study to consider the integrated modeling of the information, business, and production process. Previous studies mostly considered individual simulation modeling of the production system, information system, or business system in various settings. It is claimed that by integrating information and business systems and production systems through simulation, major and minor organization and production issues become visible. This study also shows perceived improvements through integration of the information system and production process modeling. In summary, the unique features of this study are 3-fold. First, the integrated approach of this study identifies major bottlenecks of the production process, information system, and business process concurrently. Second, the integrated approach models and produces several dimensions of customer satisfaction. Finally, the integrated approach allows the effects of business process reengineering and information technology to be evaluated before actual implementation. In addition, through the integrated modeling of this study, the hidden and concurrent effects of the business and production processes are identified and improved.

Personal Information Management (PIM) is an activity in which an individual stores personal information items to retrieve them later. In a previous article, we suggested the user-subjective approach, a theoretical approach proposing design principles with which PIM systems can systematically use subjective attributes of information items.
In this consecutive article, we report on a study that tested the approach by exploring the use of subjective attributes (i.e., project, importance, and context) in current PIM systems, and its dependence on design characteristics. Participants were 84 personal computer users. Tools included a questionnaire (N = 84), a semistructured interview that was transcribed and analyzed (n = 20), and screen captures taken from this subsample. Results indicate that participants tended to use subjective attributes when the design encouraged them to; however, when the design discouraged such use, they either found their own alternative ways to use them or refrained from using them altogether. This constitutes evidence in support of the user-subjective approach as it implies that current PIM systems do not allow for sufficient use of subjective attributes. The article also introduces seven novel system design schemes, suggested by the authors, which demonstrate how the user-subjective principles can be implemented.

We study the coupling of basic quantitative portfolio selection strategies with a financial news article prediction system, AZFinText. By varying the degrees of portfolio formation time, we found that a hybrid system using both quantitative strategy and a full set of financial news articles performed the best. With a 1-week portfolio formation period, we achieved a 20.79% trading return using a Momentum strategy and a 4.54% return using a Contrarian strategy over a 5-week holding period. We also found that trader overreaction to these events led AZFinText to capitalize on these short-term surges in price.

In this article, we assess the effectiveness of Contextual Document Clustering (CDC) as a means of indexing within a dynamic and rapidly changing environment. We simulate a dynamic environment by splitting two chronologically ordered datasets into time-ordered segments and assessing how the technique performs under two different scenarios.
The first is when new documents are added incrementally without reclustering [incremental CDC (iCDC)], and the second is when reclustering is performed [nonincremental CDC (nCDC)]. The datasets are very large, are independent of each other, and belong to two very different domains. We show that CDC itself is effective at clustering very large document corpora, and that, significantly, it lends itself to a very simple, efficient incremental document addition process that is seen to be very stable over time despite the size of the corpus growing considerably. It was seen to be effective at incrementally clustering new documents even when the corpus grew to six times its original size. This is in contrast to what other researchers have found when applying similar simple incremental approaches to document clustering. The stability of iCDC is accounted for by the unique manner in which CDC discovers cluster themes.

In "A Brief History of Information Ethics," Thomas Froehlich (2004) quickly surveyed under several broad categories some of the many issues that constitute information ethics: under the category of librarianship: censorship, privacy, access, balance in collections, copyright, fair use, and codes of ethics; under information science, which Froehlich sees as closely related to librarianship: confidentiality, bias, and quality of information; under computer ethics: intellectual property, privacy, fair representation, nonmaleficence, computer crime, software reliability, artificial intelligence, and e-commerce; under cyberethics (issues related to the Internet, or "cyberspace"): expert systems, artificial intelligence (again), and robotics; under media ethics: news, impartiality, journalistic ethics, deceit, lies, sexuality, censorship (again), and violence in the press; and under intercultural information ethics: the digital divide and the ethical role of the Internet for social, political, cultural, and economic development.
Many of the debates in information ethics, on these and other issues, have to do with specific kinds of relationships between subjects. The most important subject and a familiar figure in information ethics is the ethical subject engaged in moral deliberation, whether appearing as the bearer of moral rights and obligations to other subjects, or as an agent whose actions are judged, whether by others or by oneself, according to the standards of various moral codes and ethical principles. Many debates in information ethics revolve around conflicts between those acting according to principles of unfettered access to information and those finding some information offensive or harmful. Subjectivity is at the heart of information ethics. But how is subjectivity understood? Can it be understood in ways that broaden ethical reflection to include problems that remain invisible when subjectivity is taken for granted and when how it is created remains unquestioned? This article proposes some answers by investigating the meaning and role of subjectivity in information ethics.

Aging of publications, percentage of self-citations, and impact vary from journal to journal within fields of science. The assumption that citation and publication practices are homogenous within specialties and fields of science is invalid. Furthermore, the delineation between fields and among specialties is fuzzy. Institutional units of analysis and persons may move between fields or span different specialties. The match between the citation index and institutional profiles varies among institutional units and nations. The respective matches may heavily affect the representation of the units. Non-Institute for Scientific Information (ISI) journals are increasingly cornered into "transdisciplinary" Mode-2 functions, with the exception of specialist journals publishing in languages other than English. An "externally cited impact factor" can be calculated for these journals.
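Such an "externally cited impact factor" can be sketched as a standard two-year impact-factor ratio restricted to citations coming from journals other than the journal itself. The citation records and counts below are hypothetical, assuming a simple record format rather than any particular citation database:

```python
def external_impact_factor(journal, year, citations, items_published):
    """Two-year impact-factor-style ratio using only citations that come
    from journals other than `journal` itself (i.e. external citations).
    `citations` lists (citing_journal, citing_year, cited_year) records for
    articles of `journal`; `items_published` maps year -> citable items."""
    window = (year - 2, year - 1)
    external = sum(
        1 for citing_journal, citing_year, cited_year in citations
        if citing_journal != journal
        and citing_year == year
        and cited_year in window
    )
    citable = items_published[year - 1] + items_published[year - 2]
    return external / citable

# Hypothetical citation records for the journal.
cites = [
    ("Other Journal A", 2007, 2005),
    ("Other Journal B", 2007, 2006),
    ("Science and Public Policy", 2007, 2006),  # self-citation: excluded
    ("Other Journal C", 2006, 2005),            # outside census year: excluded
]
print(external_impact_factor("Science and Public Policy", 2007, cites,
                             {2005: 10, 2006: 10}))  # 2 / 20 = 0.1
```

The only change from the ordinary two-year impact factor is the `citing_journal != journal` filter in the numerator.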
The citation impact of non-ISI journals will be demonstrated using Science and Public Policy as the example.

Despite a very large number of studies on the aging and obsolescence of scientific literature, no study has yet measured, over a very long time period, the changes in the rates at which scientific literature becomes obsolete. This article studies the evolution of the aging phenomenon and, in particular, how the age of cited literature has changed over more than 100 years of scientific activity. It shows that the average and median ages of cited literature have undergone several changes over the period. Specifically, both World War I and World War II had the effect of significantly increasing the age of the cited literature. The major finding of this article is that contrary to a widely held belief, the age of cited material has risen continuously since the mid-1960s. In other words, during that period, researchers were relying on an increasingly old body of literature. Our data suggest that this phenomenon is a direct response to the steady-state dynamics of modern science that followed its exponential growth; however, we also have observed that online preprint archives such as arXiv have had the opposite effect in some subfields.

This study examines cancer patients' and companions' uses and gratifications of blogs and the relationship between different types of blogging activities and gratification outcomes. In an online survey of 113 respondents, cancer patients were found to be more likely than their companions to host their own blogs. Four areas emerged as gratifications of blog use: prevention and care, problem-solving, emotion management, and information-sharing. Cancer patients and companions both found blogging activity to be most helpful for emotion management and information-sharing. Further, cancer patients were more gratified than their companions in the areas of emotion management and problem-solving.
Regression analyses indicate that perceived credibility of blogs, posting comments on others' blogs, and hosting one's own blog significantly increased the explanatory power of the regression models for each gratification outcome.

This study revisits managers who were first interviewed more than 10 years ago to identify their personal information management (PIM) behaviors. The purpose of this study was to see how advances in technology and access to the Web may have affected their PIM behaviors. PIM behaviors seem to have changed little over time, suggesting that technological advances are less important in determining how individuals organize and use information than are the tasks that they perform. Managers identified the increased volume of e-mail and the frustration of having to access multiple systems with different, unsynchronized passwords as their greatest PIM challenges. Organizational implications are discussed.

Wikipedia is fast becoming a key information source for many despite criticism that it is unreliable and inaccurate. A number of recommendations have been made to sort the chaff from the wheat in Wikipedia, among which is the idea of color-coding article segment edits according to age (Cross, 2006). Using data collected as part of a wider study published in Nature, this article examines the distribution of errors throughout the life of a select group of Wikipedia articles. The survival time of each "error edit" in terms of edit counts and days was calculated, and the hypothesis that surviving material added by older edits is more trustworthy was tested. Surprisingly, we find that roughly 20% of errors can be attributed to surviving text added by the first edit, which confirmed the existence of a "first-mover" effect (Viegas, Wattenberg, & Kushal, 2004) whereby material added by early edits is less likely to be removed.
We suggest that the sizable number of errors added by early edits is simply a result of more material being added near the beginning of the life of the article. Overall, the results do not provide support for the idea of trusting surviving segments attributed to older edits, because such edits tend to add more material and hence contain more errors, which do not seem to be offset by greater opportunities for error correction by later edits.

Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities. A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss the influence on outcomes of using two methods of normalization of term weights (i.e., means and ranges) for the different weighting methods.

We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources.
Inspired by the success of the Open Archives Initiative's (OAI) Protocol for Metadata Harvesting, the approach occupies middle ground between content crawling and distributed retrieval. As in crawling, some data move toward the retrieval process, but it is statistics about the content rather than content itself; this grants more efficient use of network resources and wider scope of application. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval; this reduces the costs of content provision while promoting the simplicity, effectiveness, and responsiveness of retrieval. Overall, we argue that the approach retains the good properties of centralized retrieval without renouncing cost-effective, large-scale resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure. In particular, we define a minimal extension of the OAI protocol which supports the coordinated harvesting of full-content indices and descriptive metadata for content resources. Finally, we report on the implementation of a proof-of-concept prototype service for multimodel content-based retrieval of distributed file collections.

In this article, we describe the development of an extension to the Simple Knowledge Organization System (SKOS) to accommodate the needs of vocabulary development applications (VDA) managing metadata schemes and requiring close tracking of change to both those schemes and their member concepts. We take a neo-pragmatic epistemic stance in asserting the need for an entity in SKOS modeling to mediate between the abstract concept and the concrete scheme. While the SKOS model sufficiently describes entities for modeling the current state of a scheme in support of indexing and search on the Semantic Web, it lacks the expressive power to serve the needs of VDA needing to maintain scheme historical continuity.
We demonstrate preliminarily that conceptualizations drawn from empirical work in modeling entities in the bibliographic universe, such as works, texts, and exemplars, can provide the basis for SKOS extension in ways that support the more rigorous demands of capturing concept evolution in VDA.

Although designed for general Web searching, commercial search engines are also used in Webometrics and related research to produce estimated hit counts or lists of URLs matching a query. Unfortunately, however, they do not return all matching URLs for a search and their hit count estimates are unreliable. In this article, we assess whether it is possible to obtain complete lists of matching URLs from Windows Live, and whether any of its hit count estimates are robust. As part of this, we introduce two new methods to extract extra URLs from search engines: automated query splitting and automated domain and TLD searching. Both methods successfully identify additional matching URLs, but the findings suggest that there is no way to get complete lists of matching URLs or accurate hit counts from Windows Live, although some estimating suggestions are provided.

The old Asian legend about the blind men and the elephant comes to mind when looking at how different authors of scientific papers describe a piece of related prior work. It turns out that different citations to the same paper often focus on different aspects of that paper and that no single one provides a full description of its complete set of contributions. In this article, we will describe our investigation of this phenomenon. We studied citation summaries in the context of research papers in the biomedical domain. A citation summary is the set of citing sentences for a given article and can be used as a surrogate for the actual article in a variety of scenarios. It contains information that was deemed by peers to be important.
Our study shows that citation summaries overlap to some extent with the abstracts of the papers and that they also differ from them in that they focus on different aspects of these papers than do the abstracts. In addition to this, co-cited articles (which are pairs of articles cited by another article) tend to be similar. We show results based on a lexical similarity metric called cohesion to justify our claims.

Personal Web portfolios have become a popular information source and an effective method for individuals to present themselves to others in cyberspace. Thus, the quality of personal Web portfolios is critical and affects the perception that others have of the individuals. But how do we measure the quality of personal Web portfolios? What are the important factors affecting the quality of personal Web portfolios? This study presents the development of an instrument measuring factors affecting the information quality of personal Web portfolios. The proposed instrument, based on the Information Quality framework, was refined and validated to assess its construct validity, convergent validity, and discriminant validity. The proposed instrument can be used to guide those who want to design their personal Web portfolios and also to help those who need to evaluate the quality of personal Web portfolios.

The debate about which similarity measure one should use for the normalization in the case of Author Co-citation Analysis (ACA) is further complicated when one distinguishes between the symmetrical co-citation (or, more generally, co-occurrence) matrix and the underlying asymmetrical citation occurrence matrix. In the Web environment, the approach of retrieving original citation data is often not feasible. In that case, one should use the Jaccard index, but preferentially after adding the number of total citations (i.e., occurrences) on the main diagonal.
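A sketch of this normalisation follows, assuming only the symmetric co-citation counts and each author's total citation count (the main-diagonal value) are available; the author names and counts are hypothetical:

```python
def jaccard_cocitation(co, totals, a, b):
    """Jaccard index for two authors from a co-citation matrix whose main
    diagonal holds each author's total citation count.
    co[(a, b)] = number of documents citing both a and b;
    totals[a]  = number of documents citing a at all."""
    inter = co.get((a, b)) or co.get((b, a)) or 0
    # |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|.
    return inter / (totals[a] + totals[b] - inter)

# Hypothetical co-citation and total citation counts.
co = {("Salton", "vanRijsbergen"): 12, ("Salton", "Garfield"): 3}
totals = {"Salton": 40, "vanRijsbergen": 20, "Garfield": 50}
print(jaccard_cocitation(co, totals, "Salton", "vanRijsbergen"))  # 12/48 = 0.25
```

Placing the total citation counts on the diagonal is what lets the union term be recovered from co-occurrence data alone, without the underlying asymmetric citation matrix.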
Unlike Salton's cosine and the Pearson correlation, the Jaccard index abstracts from the shape of the distributions and focuses only on the intersection and the sum of the two sets. Since the correlations in the co-occurrence matrix may be spurious, this property of the Jaccard index can be considered an advantage in this case.

XML documents combine features from classical IR systems allowing free text, with explicit structures as in databases. Many query languages have been specially designed for IR applications on XML documents. This work concentrates on a special type of language for which the problem of processing queries including metrical constraints is investigated. The main question is how to define the distance between terms in different locations of the XML tree in an intuitively justifiable way, without jeopardizing the ability to get good retrieval results in terms of recall and precision. A new definition is given and its usefulness is shown in several examples from the INEX collection.

Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.

On the Web, there are limited ways of finding people sharing similar interests with a given person. The current methods are either ineffective or time-consuming. In this paper, we present a new approach for searching for people sharing similar interests on the Web.
Given a person, to find similar people from the Web, there are two major research issues: person representation and matching persons. In this study, we propose a person representation method which uses a person's website to represent that person. Our design of the matching process takes person representation into consideration, allowing the same representation to be used when composing the query. Under this person representation method, the proposed algorithm integrates the textual content and hyperlink information of all the pages belonging to a personal website to represent and match persons. Other algorithms are also explored and compared to the proposed algorithm. Experimental results are presented.

This article presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study unsupervised learning and active learning strategies that minimize human supervision in terms of data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.

There exist ample demonstrations that indicators of scholarly impact analogous to the citation-based ISI Impact Factor can be derived from usage data; however, so far, usage can practically be recorded only at the level of distinct information services. This leads to community-specific assessments of scholarly impact that are difficult to generalize to the global scholarly community. In contrast, the ISI Impact Factor is based on citation data and thereby represents the global community of scholarly authors.
The objective of this study is to examine the effects of community characteristics on assessments of scholarly impact from usage. We define a journal Usage Impact Factor that mimics the definition of the Thomson Scientific ISI Impact Factor. Usage Impact Factor rankings are calculated on the basis of a large-scale usage dataset recorded by the linking servers of the California State University system from 2003 to 2005. The resulting journal rankings are then compared to the Thomson Scientific ISI Impact Factor, which is used as a reference indicator of general impact. Our results indicate that the particular scientific and demographic characteristics of a discipline have a strong effect on resulting usage-based assessments of scholarly impact. In particular, we observed that as the number of graduate students and faculty increases in a particular discipline, Usage Impact Factor rankings will converge more strongly with the ISI Impact Factor.

The main fields of research in Library Science and Documentation are identified by quantifying the frequency of appearance and analysing the co-occurrence of the descriptors assigned to 11,273 works indexed in the Library and Information Science Abstracts (LISA) database for the 2004-2005 period. The analysis has enabled three major core research areas to be identified: World Wide Web, Libraries and Education. There are a further 12 areas of research with specific development, one connected with the library sphere and another 11 connected with the World Wide Web and Internet: Networks, Computer Security, Information Technologies, Electronic Resources, Electronic Publications, Bibliometrics, Electronic Commerce, Computer Applications, Medicine, Searches and Online Information Retrieval.

The present work shows the application of successive H indices in the evaluation of a scientific institution, using the researcher-department-institution hierarchy as the level of aggregation.
The scientific production covered by the Web of Science of the research staff of the Cuban National Scientific Research Center, during the period 2001-2005, was studied. The Hirsch index (h-index; J.E. Hirsch, 2005) was employed to calculate the individual performance of the staff, using the g-index created by Leo Egghe (2006) and the A-index developed by Jin Bi-Hui (2006) as complementary indicators. The successive H indices proposed by Andras Schubert (2007) were used to determine the scientific performance of each department as well as the general performance of the institution. The possible advantages of the method for institutional evaluation processes are discussed. In a replication of the high-profile contribution by Wenneras and Wold on grant peer review, we investigate new applications processed by the Medical Research Council in Sweden. Introducing a normalisation method for ranking applications that takes into account the differences between committees, we also use a normalisation of bibliometric measures by field. Finally, we perform a regression analysis with interaction effects. Our results indicate that female principal investigators (PIs) receive a bonus of 10% on scores, in relation to their male colleagues. However, male and female PIs having a reviewer affiliation collect an even higher bonus, approximately 15%. Nepotism seems to be a persistent problem in the Swedish grant peer review system. The term "European Paradox" describes the perceived failure of the EU to capture full benefits of its leadership of science as measured by publications and some other indicators. This paper investigates what might be called the "American Paradox," the decline in scientific publication share of the U.S. despite world-leading investments in research and development (R&D) - particularly as that decline has accelerated in recent years. 
A multiple linear regression analysis was made of which inputs to the scientific enterprise are most strongly correlated with the number of scientific papers produced. Research investment was found to be much more significant than labor input, government investment in R&D was much more significant than that by industry, and government non-defense investment was somewhat more significant than its defense investment. Since the EU actually leads the U.S. in this key component, this could account for gradual loss of U.S. paper share and EU assumption of leadership of scientific publication in the mid-1990s. More recently the loss of U.S. share has accelerated, and three approaches analyzed this phenomenon: (1) A companion paper shows that the SCI database has not significantly changed to be less favorable to the U.S.; thus the decline is real and is not an artifact of the measurement methods. (2) Budgets of individual U.S. research agencies were correlated with overall paper production and with papers in their disciplines. Funding for the U.S. government civilian, non-healthcare sector was flat in the last ten years, resulting in declining share of papers. Funding for its healthcare sector sharply increased, but there were few additional U.S. healthcare papers. While this inefficiency contributes to loss of U.S. share, it is merely a specific example of the general syndrome that increased American investments have not produced increased publication output. (3) In fact the decline in publication share appears to be due to rapidly increasing R&D investments by China, Taiwan, S. Korea, and Singapore. A model shows that in recent years it is a country's share of world investment that is most predictive of its publication share. While the U.S. has increased its huge R&D investment, its investment share still declined because of even more rapidly increasing investments by these Asian countries. 
This has likely led to their sharply increased share of scientific publication, which must result in declines in the shares of others - the U.S. and, more recently, the EU. We explore an empirical approach to studying the social and political implications of science by gathering scientists' perceptions of the social impacts of their research. It was found that 78 percent of surveyed scientists from a variety of fields indicated that the research performed in connection with a recent highly cited paper had such implications. Health-related implications were the most common, but other types of implications encountered included technological spin-offs, public understanding, and economic and policy benefits. Surprisingly, many scientists considered the advancement of science itself to be a social implication of their research. The relations of these implications to the field and topics of research are examined, and a mapping of implications gives an overview of the major dimensions of the social impacts of science. In the present study we propose a solution for a common problem in benchmarking tasks at the institutional level. The usage of bibliometric indicators, even after standardisation, cannot disguise the fact that comparing institutes often remains like comparing apples with pears. We developed a model to assign institutes to one of 8 different groups based on their research profile. Each group has a different focus: 1. Biology, 2. Agricultural Sciences, 3. Multidisciplinary, 4. Geo & Space Sciences, 5. Technical and Natural Sciences, 6. Chemistry, 7. General and Research Medicine, 8. Specialised Medicine. Two applications of this methodology are described. In the first application we compare the composition of clusters at national level with the national research profiles. This gives a deeper insight into the national research landscape. In a second application we look at the dynamics of institutes by comparing their subject clustering at two different points in time. 
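The profile-based grouping idea in the benchmarking study above can be sketched as nearest-profile assignment by cosine similarity over field shares. This is a minimal illustration, not the paper's actual method; the group profiles and field codes below are hypothetical placeholders, not the study's cluster centroids:

```python
import math

# Hypothetical group profiles: field code -> share of an idealized
# publication profile for that research-focus group.
GROUPS = {
    "Biology": {"BIO": 0.7, "MED": 0.2, "CHE": 0.1},
    "Chemistry": {"CHE": 0.8, "BIO": 0.1, "TEC": 0.1},
    "General and Research Medicine": {"MED": 0.8, "BIO": 0.2},
}

def cosine(p, q):
    # Cosine similarity between two sparse field-share vectors.
    fields = set(p) | set(q)
    dot = sum(p.get(f, 0.0) * q.get(f, 0.0) for f in fields)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def assign_group(profile):
    # Assign an institute to the group with the most similar profile.
    return max(GROUPS, key=lambda g: cosine(profile, GROUPS[g]))
```

Comparing an institute's profile at two points in time with `assign_group` would then reveal the kind of subject-cluster dynamics the second application describes.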
Significant discrepancies were found in the ratio and relative impact of the journal papers of several scientific fields of some Central and Eastern European (CEE) countries compared to the European Community member states, the US and Japan (EUJ countries). A new indicator, characterizing the Mean Structural Difference of scientific fields between countries, has been introduced and calculated for CEE countries. For EUJ countries, the correlation between the GDP and number of publications of a given year proved to be non-significant. Longitudinal studies showed, however, significant correlations between the yearly values of GDP and the number of papers published. Studying data referring to consecutive time periods revealed that there is no direct relationship between the GDP and the information production of countries. It may be assumed that grants for R&D do not actually depend on real needs; the fact is that rich countries can afford to spend more money on scientific research, whilst poor countries can spend only less. This paper compares the h-indices of a list of highly-cited Israeli researchers based on citation counts retrieved from the Web of Science, Scopus and Google Scholar respectively. In several cases, the results obtained through Google Scholar are considerably different from the results based on the Web of Science and Scopus. Data cleansing is discussed extensively. For practical reasons, bibliographic databases can only contain a subset of the scientific literature. The ISI citation databases are designed to cover the highest impact scientific research journals as well as a few other sources chosen by the Institute for Scientific Information (ISI). Google Scholar also contains citation information, but includes a less quality-controlled collection of publications from different types of web documents. We define Google Scholar unique citations as those retrieved by Google Scholar which are not in the ISI database. 
We took a sample of 882 articles from 39 open access ISI-indexed journals in 2001 from biology, chemistry, physics and computing and classified the type, language, publication year and accessibility of the Google Scholar unique citing sources. The majority of Google Scholar unique citations (70%) were from full-text sources and there were large disciplinary differences between types of citing documents, suggesting that a wide range of non-ISI citing sources, especially from non-journal documents, are accessible by Google Scholar. This might be considered to be an advantage of Google Scholar, since it could be useful for citation tracking in a wider range of open access scholarly documents and to give a broader type of citation impact. An important corollary from our study is that Google Scholar's wider coverage of Open Access (OA) web documents is likely to give a boost to the impact of OA research and the OA movement. This paper shows maps of the web presence of the European Higher Education Area (EHEA) on the level of universities using hyperlinks and analyses the topology of the European academic network. Its purpose is to combine methods from Social Network Analysis (SNA) and cybermetric techniques in order to ask for tendencies of integration of the European universities visible in their web presence and the role of different universities in the process of the emergence of an European Research Area. We find as a main result that the European network is set up by the aggregation of well-defined national networks, whereby the German and British networks are dominant. The national networks are connected to each other through outstanding national universities in each country. Google Scholar was used to generate citation counts to the web-based research output of New Zealand Universities. Total citations and hits from Google Scholar correlated with the research output as measured by the official New Zealand Performance-Based Research Fund (PBRF) exercise. 
The article discusses the use of Google Scholar as a cybermetric tool and methodology issues in obtaining citation counts for institutions. Google Scholar is compared with other tools that provide web citation data: Web of Science, SCOPUS, and the Wolverhampton Cybermetric Crawler. A sample of 1,483 publications, representative of the scholarly production of LIS faculty, was searched in Web of Science (WoS), Google, and Google Scholar. The median number of citations found through WoS was zero for all types of publications except book chapters; the median for Google Scholar ranged from 1 for print/subscription journal articles to 3 for books and book chapters. For Google, the median number of citations ranged from 9 for conference papers to 41 for books. A sample of the web citations was examined and classified as representing intellectual or non-intellectual impact. Almost 92% of the citations identified through Google Scholar represented intellectual impact, primarily citations from journal articles. Bibliographic services (non-intellectual impact) were the largest single contributor of citations identified through Google. Open access journal articles attracted more web citations, but the citations to print/subscription journal articles more often represented intellectual impact. In spite of problems with Google Scholar, it has the potential to provide useful data for research evaluation, especially in a field where rapid and fine-grained analysis is desirable. The purpose of this study is to provide results from experiments designed to investigate the cross-validation of an artificial neural network application to automatically identify topic changes in Web search engine user sessions by using data logs of different Web search engines for training and testing the neural network. Sample data logs from the FAST and Excite search engines are used in this study. 
The results of the study show that identification of topic shifts and continuations on a particular Web search engine user session can be achieved with neural networks that are trained on a different Web search engine data log. Although FAST and Excite search engine users differ with respect to some user characteristics (e.g., number of queries per session, number of topics per session), the results of this study demonstrate that both search engine users display similar characteristics as they shift from one topic to another during a single search session. The key finding of this study is that a neural network that is trained on a selected data log could be universal; that is, it can be applicable on all Web search engine transaction logs regardless of the source of the training data log. Trust in information is developing into a vitally important topic as the Internet becomes increasingly ubiquitous within society. Although many discussions of trust in this environment focus on issues like security, technical reliability, or e-commerce, few address the problem of trust in the information obtained from the Internet. The authors assert that there is a strong need for theoretical and empirical research on trust within the field of information science. As an initial step, the present study develops a model of trust in digital information by integrating the research on trust from the behavioral and social sciences with the research on information quality and human-computer interaction. The model positions trust as a key mediating variable between information quality and information usage, with important consequences for both the producers and consumers of digital information. The authors close by outlining important directions for future research on trust in information science and technology. 
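The train-on-one-log, test-on-another idea in the topic-shift study above can be illustrated with a toy linear classifier. This is a stand-in sketch, not the authors' neural network: the two features (term overlap with the previous query, normalized time gap) and all example data are hypothetical, chosen only to show a model fitted on one engine's sessions being applied to another's:

```python
def train_perceptron(examples, epochs=100, lr=0.1):
    # examples: list of ((overlap, gap), label) pairs, with label 1 = topic
    # shift and 0 = continuation; both features are scaled to [0, 1].
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = label - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(model, features):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, features)) + b > 0 else 0

# Train on sessions styled after one engine's log...
log_a = [((0.9, 0.02), 0), ((0.8, 0.03), 0), ((0.1, 0.50), 1), ((0.0, 0.75), 1)]
model = train_perceptron(log_a)
# ...then apply the trained model to sessions styled after another engine's log.
log_b = [((0.85, 0.05), 0), ((0.05, 0.60), 1)]
```

If shift-versus-continuation behavior is similar across engines, as the study reports, a model fitted on `log_a` should also label `log_b` correctly.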
Using off-the-shelf search technology provides a single point of access into library resources, but we found that such commercial systems are not entirely satisfactory for the academic library setting. In response to this, Oregon State University (OSU) Libraries designed and deployed LibraryFind, a metasearch system. We conducted a usability experiment comparing LibraryFind, the OSU Libraries Web site, and Google Scholar. Each participant used all three search systems in a controlled setting, and we recorded their behavior to determine the effectiveness and efficiency of each search system. In this article, we focus on understanding what factors are important to undergraduates in choosing their primary academic search system for class assignments. Based on a qualitative and quantitative analysis of the results, we found that mimicking commercial Web search engines is an important factor to attract undergraduates; however, when undergraduates use these kinds of search engines, they expect similar performance to Web search engines, including factors such as relevance, speed, and the availability of a spell checker. They also expected to be able to find out what kinds of content and materials are available in a system. Participants' prior experience using academic search systems also affected their expectations of a new system. We propose a new optimal clustering effectiveness measure, called CS1, based on a combination of clusters rather than selecting a single optimal cluster as in the traditional MK1 measure. For hierarchical clustering, we present an algorithm to compute CS1, defined by seeking the optimal combinations of disjoint clusters obtained by cutting the hierarchical structure at a certain similarity level. By reformulating the optimization to a 0-1 linear fractional programming problem, we demonstrate that an exact solution can be obtained by a linear time algorithm. 
We further discuss how our approach can be extended to more general problems involving overlapping clusters, and we show how optimal estimates can be obtained by greedy algorithms. In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on the retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing. This paper introduces a grammar for Arabic nominal sentences written in the formalism of Head-Driven Phrase Structure Grammar (HPSG). This grammar covers simple Arabic nominal sentences, though the same approach can be extended to cover other types of Arabic nominal sentences. The formalization has been implemented using the Linguistic Knowledge Building (LKB) system. Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (extensible markup language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know in much detail the structure of the documents to be processed and are, thus, effectively impractical in many real-world data integration tasks. 
Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We shall, however, show that this approach leads to undesirable results in the context of certain, not uncommon, types of XML documents. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration. This paper provides empirical support for some of the key assumptions guiding the design of data fusion methods. It computes and analyzes the overlap structures between the search results of retrieval systems that participated in the short, long, and manual tracks in TREC 3, 6, 7, and 8 to examine what can be learned to infer a document's probability of being relevant. This paper shows that the potential relevance of a document increases exponentially as the number of systems retrieving it increases, called the Authority Effect. It also shows that documents higher up in ranked lists and found by more systems are more likely to be relevant, called the Ranking Effect. A contribution of this paper is that it shows that the Authority and Ranking Effects can be observed regardless of whether a query is generated manually or automatically and whether short or long queries are used. Further, it is illustrated that the Authority and Ranking Effects can be observed if the result sets of random groupings of five retrieval systems are compared and only the top 50 results are used in the overlap computation. Also discussed is how the Authority and Ranking Effects can help explain why major data fusion methods perform well. 
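The overlap computation behind the Authority Effect described above can be sketched in a few lines: count how many systems retrieve each document, then measure the fraction of relevant documents at each overlap level. The toy result lists and relevance judgments here are invented for illustration; the actual study used TREC runs with a top-50 cutoff:

```python
from collections import Counter

def authority_effect(result_lists, relevant):
    # result_lists: one retrieved-document list per system (e.g. its top 50)
    # relevant: set of documents judged relevant
    overlap = Counter(doc for lst in result_lists for doc in set(lst))
    stats = {}  # overlap count -> (relevant hits, total documents)
    for doc, n in overlap.items():
        hits, total = stats.get(n, (0, 0))
        stats[n] = (hits + (doc in relevant), total + 1)
    # Fraction of relevant documents at each overlap level; the Authority
    # Effect predicts this fraction rises as more systems retrieve a document.
    return {n: hits / total for n, (hits, total) in sorted(stats.items())}

# Invented example: four systems, two relevant documents ("a" and "b").
runs = [["a", "b", "c"], ["a", "b", "e"], ["a", "d"], ["d", "f"]]
rates = authority_effect(runs, relevant={"a", "b"})
```

In this toy example the relevance rate climbs from 0.0 for documents found by one system to 1.0 for the document found by three, mirroring the effect the paper reports at scale.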
The statistical properties of bibliometric indicators related to research performance, field citation density, and journal impact were studied for the 100 largest European research universities. A size-dependent cumulative advantage was found for the impact of universities in terms of total number of citations. In the author's previous work, a similar scaling rule was found at the level of research groups. Therefore, this scaling rule is conjectured to be a prevalent property of the science system. The lower performance universities have a larger size-dependent cumulative advantage for receiving citations than top performance universities. For the lower performance universities, the fraction of noncited publications decreases considerably with size. Generally, the higher the average journal impact of the publications of a university, the lower the number of noncited publications. The average research performance was found not to dilute with size. Evidently, large universities, particularly top performance universities are characterized by being "big and beautiful." They succeed in keeping a high performance over a broad range of activities. This most probably is an indication of their overall attractive scientific and intellectual power. It was also found that particularly for the lower performance universities, the field citation density provides a strong cumulative advantage in citations per publication. The relation between number of citations and field citation density found in this study can be considered as a second basic scaling rule of the science system. Top performance universities publish in journals with significantly higher journal impact as compared to the lower performance universities. A significant decrease of the fraction of self-citations with increasing research performance, average field citation density, and average journal impact was found. 
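The size-dependent cumulative advantage described above amounts to a power-law scaling, total citations growing as size raised to an exponent greater than one. A minimal sketch of estimating such a scaling exponent by ordinary least squares on log-transformed data follows; the sample numbers are invented for illustration, not the study's data:

```python
import math

def scaling_exponent(sizes, citations):
    # Fit log C = alpha * log S + beta by ordinary least squares.
    # alpha > 1 would indicate a size-dependent cumulative advantage.
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in citations]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Invented illustration: citation counts growing faster than linearly
# with university size (here, exactly as size to the power 1.3).
sizes = [10, 20, 40, 80]
citations = [s ** 1.3 for s in sizes]
```

Fitting separate exponents for top-performance and lower-performance groups, as the study does, would then expose their different cumulative advantages.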
In this article, the authors present evaluation results for transitive dictionary-based cross-language information retrieval (CLIR) using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. Source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish) via an intermediate (or pivot) language. Effectiveness of the transitively translated queries was compared to that of the directly translated and monolingual Finnish queries. Pseudo-relevance feedback (PRF) was also used to expand the original transitive target queries. Cross-language information retrieval performance was evaluated on three relevance thresholds: stringent, regular, and liberal. The transitive translations performed well, achieving, on average, 85-93% of the direct translation performance and 66-72% of monolingual performance. Moreover, PRF was successful in raising the performance of transitive translation routes in absolute terms as well as in relation to monolingual and direct translation performance applying PRF. A modified approach to algorithmic historiography is used to investigate the changing influence of the work of Conrad Hal Waddington over the period 1945-2004. Overall, Waddington's publications were cited by almost 5,500 source items in the Web of Science (Thomson Scientific, formerly Thomson ISI, Philadelphia, PA). Rather than simply analyzing the data set as a whole, older works by Waddington are incorporated into a series of historiographic maps (networks of highly cited documents), which show long-term and short-term research themes grounded in Waddington's work. Analysis by 10-20-year periods and the use of social network analysis software reveals structures (thematic networks and subnetworks) that are hidden in a mapping of the entire 60-year period. 
Two major Waddington-related themes emerge: canalization/genetic assimilation and embryonic induction. The first persists over the 60 years studied, while active, visible research in the second appears to have declined markedly between 1965 and 1984, only to reappear in conjunction with the emergence of a new research field, Evolutionary Developmental Biology. This article examines the impact of the Internet and the age distribution of research scholars on academic citation age with a mathematical model proposed by Barnett, Fink, and Debus (1989) and a revised model that incorporates information about the online environment and scholar age distribution. The modified model fits the data well, accounting for 99.6% of the variance for science citations and 99.8% for social science citations. The Internet's impact on the aging process of academic citations has been very small, accounting for only 0.1% for the social sciences and 0.8% for the sciences. Rather than resulting in the use of more recent citations, the Internet appears to have lengthened the average life of academic citations by 6 to 8 months. The aging of scholars seems to have a greater impact, accounting for 2.8% of the variance for the sciences and 0.9% for the social sciences. However, because the diffusion of the Internet and the aging of the professoriate are correlated over this time period, differentiating their effects is somewhat problematic. Research is divided about the potential of e-service to bridge communication gaps, particularly to diverse user groups. According to the existing body of literature, e-service may either increase or decrease the quality of service received. This study analyzes the level of service received by different genders and ethnic groups when academic and public librarians answered 676 online reference queries. Quality of e-service was evaluated along three dimensions: timely response, reliability, and courtesy. 
This study found no significant differences among different user groups along any of these dimensions, supporting the argument that the virtual environment facilitates equitable service and may overcome some challenges of diverse user groups. The authors describe a large-scale, longitudinal citation analysis of intellectual trading between information studies and cognate disciplines. The results of their investigation reveal the extent to which information studies draws on and, in turn, contributes to the ideational substrates of other academic domains. Their data show that the field has become a more successful exporter of ideas as well as less introverted than was previously the case. In the last decade, information studies has begun to contribute significantly to the literatures of such disciplines as computer science and engineering on the one hand and business and management on the other, while also drawing more heavily on those same literatures. A representation of science as a citation density landscape is proposed and scaling rules with the field-specific citation density as a main topological property are investigated. The focus is on the size-dependence of several main bibliometric indicators for a large set of research groups while distinguishing between top-performance and lower-performance groups. It is demonstrated that this representation of the science system is particularly effective to understand the role and the interdependencies of the different bibliometric indicators and related topological properties of the landscape. The authors report on the findings of a usability test conducted to evaluate the usability of the VeriaGrid online system. The VeriaGrid (www.theveriagrid.org) is a prototype virtual map that focuses on the provision of information related to the cultural heritage of the city of Veria (Greece). It has been developed under the Light Project by the Central Public Library of Veria (www.libver.gr). 
It is an interactive application that includes various functional or thematic areas such as an interactive digital map of Veria, image gallery, videoclips, panoramic site photos, and general information about the city of Veria. The findings of the usability test revealed that users had some difficulties in using novel features of the digital map (such as the Recommended Points and the Routes functions) and finding textual information about cultural heritage of the city of Veria. Users, however, were satisfied with the overall usability of the system. In light of these findings, some recommendations for improving the usability of the system are made. This study explores how the use of electronic information resources has influenced scholars' opinion of their work, and how this is connected to their publication productivity. The data consist of a nationwide Web-based survey of the end-users of FinELib, the Finnish Electronic Library, at all universities in Finland. Scholars feel that the use of electronic literature has improved their work considerably in several ways. This influence can be differentiated into two dimensions. The first one is improved accessibility and availability of literature, and the second is more directly related to the content and quality of scholarly work. The perceived improved access is positively associated with the number of international publications produced, among doctoral students in particular. The more direct influence of e-resource use on the content of scholarly work is, however, not associated with publication productivity. The results seem to imply that investments in academic digital libraries are beneficial for the researchers and for the universities. The authors describe their research which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). 
Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components, while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural language understanding of user queries were employed. The software component descriptions were then compared by matching the resulting semantic representations of the user queries to the semantic representations of the software components, to search for software components that best match the user queries. A proof-of-concept system was developed to test the authors' approach. The results of this proof-of-concept system were compared to human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated semantic-based approach for software reusable component classification and retrieval is successful when compared to the labor-intensive results from the experts, thus showing that this approach can significantly benefit software reuse classification and retrieval. Web links have been used for around ten years to explore the online impact of academic information and information producers. Nevertheless, few studies have attempted to relate link counts to relevant offline attributes of the owners of the targeted Web sites, with the exception of research productivity. This article reports the results of a study to relate site inlink counts to relevant owner characteristics for over 400 European life-science research group Web sites. The analysis confirmed that research-group size and Web-presence size were important for attracting Web links, although research productivity was not. Little evidence was found for significant influence of any of an array of factors, including research-group leader gender and industry connections. 
In addition, the choice of search engine for link data created a surprising international difference in the results, with Google perhaps giving unreliable results. Overall, the data collection, statistical analysis and results interpretation were all complex and it seems that we still need to know more about search engines, hyperlinks, and their function in science before we can draw conclusions on their usefulness and role in the canon of science and technology indicators. The overhead of assessing technology readiness for deployment and investment purposes can be costly to both large and small businesses. Recent advances in the automatic interpretation of technology readiness levels (TRLs) of a given technology can substantially reduce the risk and associated cost of bringing these new technologies to market. Using vector-space information-retrieval models, such as latent semantic indexing, it is feasible to group similar technology descriptions by exploiting the latent structure of term usage within textual documents. Once the documents have been semantically clustered (or grouped), they can be classified based on the TRL scores of (known) nearest-neighbor documents. Three automated (no human curation) strategies for assigning TRLs to documents are discussed with accuracies as high as 86% achieved for two-class problems. This article presents a bilingual ontology-based dialog system with multiple services. An ontology-alignment algorithm is proposed to integrate ontologies of different languages for cross-language applications. A domain-specific ontology is further extracted from the bilingual ontology using an island-driven algorithm and a domain corpus. This study extracts the semantic words/concepts using latent semantic analysis (LSA). Based on the extracted semantic words and the domain ontology, a partial pattern tree is constructed to model the speech act of a spoken utterance. 
The partial pattern tree is used to deal with the ill-formed sentence problem in a spoken-dialog system. Concept expansion based on domain ontology is also adopted to improve system performance. For performance evaluation, a medical dialog system with multiple services, including registration information, clinic information, and FAQ information, is implemented. Four performance measures were used separately for evaluation. The speech act identification rate was 86.2%. A task success rate of 77% was obtained. The contextual appropriateness of the system response was 78.5%. Finally, the rate for correct FAQ retrieval was 82%, an improvement of 15% over the keyword-based vector-space model. The results show the proposed ontology-based speech-act identification is effective for dialog management. A large-scale study was set up aimed at clarifying the influence of demographic and musical background on the semantic description of music. Our model for rating high-level music qualities distinguishes between affective/emotive, structural and kinaesthetic descriptors. The focus was on understanding the most important attributes of music in view of the development of efficient search and retrieval systems. We emphasized who the users of such systems are and how they describe their favorite music. Particular attention was paid to inter-subjective similarities among listeners. The results from our study suggest that gender, age, musical expertise, active musicianship, broadness of taste and familiarity with the music have an influence on the semantic description of music. In this study, we attempt to solve the problem of determining how authentic published images on the Internet are, and to what degree they may be identified by comparison to the original image. The technique proposed aims to serve the new requirements of libraries. 
One of these is the development of computational tools for the control and preservation of intellectual property such as digital objects, and especially of digital images. For this purpose, this article proposes the use of a serial number extracted using a previously tested semantic-properties method. This method, based on the multilayers of a set of arithmetic points, assures the following two properties: the uniqueness of the final extracted number, and the semantic dependence of this number on the image used as the method's input. The major advantage of this method is that it can serve as the authentication for a published image or detect partial modifications to a reliable degree. Also, it requires the better of the known hash functions that the digital-signature schemes use, and produces alphanumeric strings for checking authenticity and the degree of similarity between an unknown image and an original image. As an example of a possible application, this article suggests that this method could be incorporated into the well-known DOI system in order to provide a reliable tool for identification and comparison of digital images. A recent development in biological research is the emergence of bioinformatics, which employs novel informatics techniques to handle biological data. Although the importance of bioinformatics training is widely recognized, little attention has been paid to its effect on the acceptance of bioinformatics by biologists. In this study, the effect of training on biologists' acceptance of bioinformatics tools was tested using the technology acceptance model (TAM) as a theoretical framework. Ninety individuals participated in a field experiment during seven bioinformatics workshops. 
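The serial-number method itself is specific to the cited work, but the general mechanism, deriving a compact fingerprint whose bitwise agreement measures how much of an image survives unchanged, can be sketched with a simple difference hash. This is an assumption-laden stand-in, not the authors' semantic-properties method:

```python
import numpy as np

def dhash_bits(img):
    """Difference hash: one bit per pixel, comparing it to its right neighbour."""
    return (img[:, 1:] > img[:, :-1]).flatten()

def similarity(a, b):
    """Fraction of matching fingerprint bits between two equal-sized images."""
    ha, hb = dhash_bits(a), dhash_bits(b)
    return float(np.mean(ha == hb))

# Toy 8x9 "grayscale images" built from random pixel values.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 9))
untouched = original.copy()
tampered = original.copy()
tampered[:4, :] = rng.integers(0, 256, size=(4, 9))  # partially modified copy
```

An exact copy matches the fingerprint perfectly, while a partial modification lowers the match fraction in proportion to the altered region, which is the behaviour the article requires for detecting partial modifications to a reliable degree.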
Pre- and post-intervention tests were used to measure perceived usefulness, perceived ease of use, and intended use of bioinformatics tools for primer design and microarray analysis, a simple versus a complex tool used for a simple and a complex task, respectively. Perceived usefulness and ease of use were both significant predictors of intended use of bioinformatics tools. After hands-on experience, intention to use both tools decreased. The perceived ease of use of the primer design tool increased but that of the microarray analysis tool decreased. It is suggested that hands-on training helps researchers to form realistic perceptions of bioinformatics tools, thereby enabling them to make informed decisions about whether and how to use them. This article synthesizes the labor theoretic approach to information retrieval. Selection power is taken as the fundamental value for information retrieval and is regarded as produced by selection labor. Selection power remains relatively constant while selection labor modulates across oral, written, and computational modes. A dynamic, stemming principally from the costs of direct human mental labor and effectively compelling the transfer of aspects of human labor to computational technology, is identified. The decision practices of major information system producers are shown to conform with the motivating forces identified in the dynamic. An enhancement of human capacities, from the increased scope of description processes, is revealed. Decision variation and decision considerations are identified. The value of the labor theoretic approach is considered in relation to preexisting theories, real-world practice, and future possibilities. Finally, the continuing intractability of information retrieval is suggested. Managing knowledge means managing the processes of creation, development, distribution and utilisation of knowledge in order to improve organizational performance and increase competitive capacity. 
However, serious difficulties arise when attempts are made to implement knowledge management in enterprises. One of the reasons behind this situation is the lack of suitable methodologies for guiding the process of development and implementation of a knowledge management system (KMS), which is a computer system that allows the processes of creating, collecting, organising, accessing and using knowledge to be automated as far as possible. In this article we propose a methodology for directing the process of developing and implementing a knowledge management system in any type of organization. The methodology is organised in phases and outlines the activities to be performed, the techniques and supporting tools to be used, and the expected results for each phase. In addition, we show how the proposed methodology can be applied to the particular case of an enterprise. Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction. 
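A composite kernel of the kind evaluated above is, at its simplest, a convex combination of base kernels (a weighted sum of valid kernels is itself a valid kernel). The toy word and bigram kernels below are stand-ins for the word, sequence, and tree kernels compared in the study:

```python
from collections import Counter

def word_kernel(s, t):
    """Bag-of-words dot product between two tokenised sentences."""
    cs, ct = Counter(s), Counter(t)
    return sum(cs[w] * ct[w] for w in cs)

def seq_kernel(s, t):
    """Shared contiguous-bigram count, a crude sequence kernel."""
    bs = Counter(zip(s, s[1:]))
    bt = Counter(zip(t, t[1:]))
    return sum(bs[b] * bt[b] for b in bs)

def composite_kernel(s, t, lam=0.5):
    """Convex combination of two kernels; the result is again a valid kernel."""
    return lam * word_kernel(s, t) + (1 - lam) * seq_kernel(s, t)

# Invented example sentences with placeholder protein names.
a = "PROT1 interacts with PROT2".split()
b = "PROT1 binds with PROT2".split()
print(composite_kernel(a, b))  # 0.5*3 + 0.5*1 = 2.0
```

In the study itself the composite combines a trace-augmented tree kernel with flat kernels, but the combination rule is the same; the composite score feeds a kernel machine such as an SVM for relation detection and classification.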
The number and size of digital repositories containing visual information (images or videos) is increasing and thereby demanding appropriate ways to represent and search these information spaces. Their visualization often relies on reducing the dimensions of the information space to create a lower-dimensional feature space which, from the point-of-view of the end user, will be viewed and interpreted as a perceptual space. Critically for information visualization, the degree to which the feature and perceptual spaces correspond is still an open research question. In this paper we report the results of three studies which indicate that distance (or dissimilarity) matrices based on low-level visual features, in conjunction with various similarity measures commonly used in current CBIR systems, correlate with human similarity judgments. Composite indexes of digital preparedness, such as the Networked Readiness Index (NRI) and the Digital Opportunity Index (DOI), have caused a great deal of confusion in the more general literature on the digital divide. For whereas one would expect preparedness to be an input into the utilization of information technologies (the digital divide), the recent indicators add inputs and outputs, or means and ends. I suggest instead two separate indexes for means and ends, which can be more usefully related to one another in terms of productivity (one index divided by the other), or as dependent and independent variables (one index in a functional relationship to the other). This research explores the link between information culture and information use in three organizations. We ask if there is a way to systematically identify information behaviors and values that can characterize the information culture of an organization, and whether this culture has an effect on information use outcomes. 
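The correspondence between a low-level feature space and a perceptual space, as tested in the three studies above, can be quantified by correlating the two dissimilarity matrices entry-wise over their upper triangles (a significance test would normally use a Mantel-style permutation, omitted here). A minimal sketch with invented data:

```python
import numpy as np

def upper(m):
    """Upper-triangle entries (excluding the diagonal) of a square matrix."""
    iu = np.triu_indices_from(m, k=1)
    return m[iu]

def matrix_correlation(feature_dist, human_dist):
    """Pearson correlation between two dissimilarity matrices."""
    return float(np.corrcoef(upper(feature_dist), upper(human_dist))[0, 1])

# Four toy images described by two low-level features each.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
feature_dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)

# Hypothetical human judgments, here a perfect linear function of the
# feature distances, so the correlation is exactly 1.
human_dist = 2.0 * feature_dist + 0.1
r = matrix_correlation(feature_dist, human_dist)
```

Real human judgments are of course noisier; the studies' finding is that feature-based distances under common CBIR similarity measures correlate with, rather than coincide with, the perceptual ones.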
The primary method of data collection was a questionnaire survey that was applied to a national law firm, a public health agency, and an engineering company. Over 650 persons in the three organizations answered the survey. Data analysis suggests that the questionnaire instrument was able to elicit information behaviors and values that denote an organization's information culture. Moreover, the information behaviors and values of each organization were able to explain 30-50% of the variance in information use outcomes. We conclude that it is possible to identify behaviors and values that describe an organization's information culture, and that the sets of identified behaviors and values can account for significant proportions of the variance in information use outcomes. Open-access online publication has made available an increasingly wide range of document types for scientometric analysis. In this article, we focus on citations in online presentations, seeking evidence of their value as nontraditional indicators of research impact. For this purpose, we searched for online PowerPoint files mentioning any one of 1,807 ISI-indexed journals in ten science and ten social science disciplines. We also manually classified 1,378 online PowerPoint citations to journals in eight additional science and social science disciplines. The results showed that very few journals were cited frequently enough in online PowerPoint files to make impact assessment worthwhile, with the main exceptions being popular magazines like Scientific American and Harvard Business Review. Surprisingly, however, there was little difference overall in the number of PowerPoint citations to science and to the social sciences, and also in the proportion representing traditional impact (about 60%) and wider impact (about 15%). 
It seems that the main scientometric value for online presentations may be in tracking the popularization of research, or for comparing the impact of whole journals rather than individual articles. In software engineering (SE) and in the computing disciplines, papers presented at conferences are considered as formal papers and counted when evaluating research productivity of academic staff. In spite of this, conference papers may still be extended for publication in academic journals. In this research, we have studied the process of extension from conference to journal publication, and tried to explain the different purposes these two forms of publication serve in the field. Twenty-two editors in chief and associate editors in chief of major publications in SE and related fields were interviewed, and 122 authors of extended versions of conference papers answered a Web questionnaire regarding the extension of their papers. As a result, the process of extending conference papers for journal publication in SE is recorded. In the conclusion, we comment on the following: (a) the role of the conference in the development of the research work; (b) the review process at the conference and at the journal stage; and (c) the different purposes conference and journal publication fulfill in SE. In this study, we examined empirical results on the h index and its most important variants in order to determine whether the variants developed are associated with an incremental contribution for evaluation purposes. The results of a factor analysis using bibliographic data on postdoctoral researchers in biomedicine indicate that regarding the h index and its variants, we are dealing with two types of indices that load on one factor each. One type describes the most productive core of a scientist's output and gives the number of papers in that core. The other type of indices describes the impact of the papers in the core. 
Because an index for evaluative purposes is a useful yardstick for comparison among scientists if the index corresponds strongly with peer assessments, we calculated a logistic regression analysis with the two factors resulting from the factor analysis as independent variables and peer assessment of the postdoctoral researchers as the dependent variable. The results of the regression analysis show that peer assessments can be predicted better using the factor 'impact of the productive core' than using the factor 'quantity of the productive core.' The phenomenon that different persons may have the same author name (homonymy) represents a major problem for publication analysis at individual levels and for retrieving publications based on author names more generally. In such cases, all publications from the persons sharing the name will be collected in search results. This makes it difficult to provide a true picture of a researcher's publication output. The present study examines how frequently homonyms occur in a population of more than 30,000 individuals. The population represents the entire set of research personnel in Norway. It is found that 14% of the persons share their author name with one or more other individuals. For the remaining 86% there is a one-to-one correspondence. Thus, for the large majority of persons, homonyms do not represent a problem. In the final part of the article, potential practical applications of these findings are given particular attention. In a recent article, Birger Hjorland (2007) critiqued the author's efforts in defining and conceptualizing information as a core concept in information science (Bates, 2005, 2006). It is argued that Hjorland has seriously misrepresented and confused the actual line of argument in those articles. Specifics of that case are presented, and the reader is urged to return to the original Bates articles to understand her claims. 
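The homonym analysis described above boils down to counting how many persons map to a non-unique author name. A sketch with invented names:

```python
from collections import Counter

def homonym_share(persons):
    """Fraction of persons whose author name is shared with at least one other.

    `persons` maps a person identifier to the author name they publish under."""
    counts = Counter(persons.values())
    shared = sum(1 for name in persons.values() if counts[name] > 1)
    return shared / len(persons)

# Hypothetical registry: three distinct people publish as "Hansen, J.".
people = {1: "Hansen, J.", 2: "Hansen, J.", 3: "Olsen, K.",
          4: "Berg, A.", 5: "Hansen, J.", 6: "Lund, P."}
print(homonym_share(people))  # 3 of 6 persons share a name -> 0.5
```

Applied to the Norwegian research population, this kind of count yields the reported 14% of persons with a shared author name versus 86% with a one-to-one correspondence.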
In those articles, Bates attempted to develop a broad conception of information, as well as a number of subtypes of information, for use in the field of information science. The development of information was related to evolutionary processes, with emergence as a significant theme. The use of digital libraries has seen steady growth in the past two decades. However, as with other new technologies, effective use of digital libraries depends on user acceptance, which in turn is affected by users' perception of the system's ease of use. Since the introduction of new technologies often involves some form of change for users, the recent identification of the resistance to change (RTC) personality trait, and the development of a scale to measure it, provides an opportunity to assess the impact of RTC on new users of a digital library system. Drawing on prior research focused on personal differences and system characteristics as determinants of perceived ease of use, in the present study we explore the relationship between RTC and perceived ease of use of a university digital library. The results of a survey of 170 new users of the library system suggest that RTC is a significant determinant of perceived ease of use, and improves the explanatory power of previous technology-acceptance models. Implications of the findings are discussed. This paper describes science and technology (S&T) metrics, especially impact of metrics on strategic management. 
The main messages to be conveyed from this paper are: (1) metrics play many roles in supporting management of the S&T enterprise; (2) metrics can influence S&T development incentives; (3) incorrect selection and implementation of metrics can have negative unintended consequences on the research and research documentation generated; and (4) before implementing metrics, an organization should identify and evaluate the intended and unintended consequences of the specific metrics' implementation, and identify the impact of these consequences on the organization's core mission. (c) 2007 Elsevier Ltd. All rights reserved. Evolution of information production processes (IPPs) can be described by a general transformation function for the sources and for the items. It generalises the Fellman-Jakobsson transformation, which acts only on the items. In this paper the dual informetric theory of this double transformation, defined by the rank-frequency function, is described, e.g. by determining the new size-frequency function. The special case of power law transformations is studied, thereby showing that a Lotkaian system is transformed into another Lotkaian system, described by a new Lotka exponent. We prove that the new exponent is smaller (larger) than the original one if and only if the change in the sources is smaller (larger) than that of the items. Applications to the study of the evolution of networks are given, including cases of deletion of nodes and/or links, as well as applications to other fields. (c) 2007 Elsevier Ltd. All rights reserved. A model is proposed for the creation and transmission of scientific knowledge, based on the network of citations among research articles. The model allows one to assign to each article a non-negative value for its creativity, i.e. its creation of new knowledge. 
If the entire publication network is truncated to the first neighbors of an article (the n references that it makes and the m citations that it receives), its creativity value becomes a simple function of n and m. After splitting the creativity of each article among its authors, the cumulative creativity of an author is then proposed as an indicator of her or his research merit. In contrast with other merit indicators, this creativity index yields similar values for the top scientists in two very different areas (life sciences and physics), thus offering good promise for interdisciplinary analyses. (c) 2007 Elsevier Ltd. All rights reserved. Really simple syndication (RSS) is becoming a ubiquitous technology for notifying users of new content in frequently updated web sites, such as blogs and news portals. This paper describes a feature-based, local clustering approach for generating overview timelines for major events, such as the tsunami tragedy, from a general-purpose corpus of RSS feeds. In order to identify significant events, we automatically (1) selected a set of significant terms for each day; (2) built a set of (term, co-term) pairs and (3) clustered the pairs in an attempt to group contextually related terms. The clusters were assessed by 10 people, finding that the average percentage apparently representing significant events was 68.6%. Using these clusters, we generated overview timelines for three major events: the tsunami tragedy, the US election and bird flu. The results indicate that our approach is effective in identifying predominantly genuine events, but can only produce partial timelines. (c) 2007 Elsevier Ltd. All rights reserved. Researchers worldwide are increasingly being assessed by the citation rates of their papers. These rates have potential impact on academic promotions and funding decisions. 
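The three-step RSS pipeline described above (select a day's significant terms, pair them with co-occurring terms, cluster the pairs) can be sketched as follows; the scoring formula and the toy feed items are assumptions, not the paper's exact method:

```python
from collections import Counter, defaultdict
from itertools import combinations

def significant_terms(day_docs, background, top_n=4):
    """Step 1: terms whose frequency today most exceeds their background rate."""
    day = Counter(w for doc in day_docs for w in doc)
    ranked = sorted(day, key=lambda w: day[w] / (background.get(w, 0) + 1),
                    reverse=True)
    return set(ranked[:top_n])

def cooccurrence_clusters(day_docs, terms):
    """Steps 2-3: link terms that co-occur in one item, take connected components."""
    adj = defaultdict(set)
    for doc in day_docs:
        present = set(doc) & terms
        for a, b in combinations(sorted(present), 2):
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for t in sorted(terms):
        if t in seen:
            continue
        stack, comp = [t], set()
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                comp.add(u)
                stack.extend(adj[u] - seen)
        clusters.append(comp)
    return clusters

# Invented one-day feed items; "wave" is also common on ordinary days.
day_docs = [["tsunami", "wave"], ["tsunami", "relief"], ["election", "vote"]]
background = Counter({"wave": 3})
terms = significant_terms(day_docs, background)
clusters = cooccurrence_clusters(day_docs, terms)
```

The two resulting clusters group the tsunami-related and the election-related terms, which is exactly the "contextually related terms" grouping the assessors rated.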
Currently there are several different ways that citation rates are being calculated, with the state-of-the-art indicator being the crown indicator. This indicator has flaws and improvements could be considered. An item-oriented field-normalized citation score average (c̄_f) is an incremental improvement: it differs from the crown indicator insofar as normalization takes place at the level of the individual publication (or item) rather than at aggregated levels, and it therefore assigns equal weight to each publication. The normalization at item level also makes it possible to calculate the second suggested indicator: the total field-normalized citation score (Σc_f). A more radical improvement (or complement) is suggested in the item-oriented field-normalized logarithm-based citation z-score average (c̄_fz[ln], or citation z-score). This indicator assigns equal weight to each included publication and takes into account both the citation rate variability of different fields and the skewed distribution of citations over publications. Even though the citation z-score could be considered a considerable improvement, it should not be used as a sole indicator of research performance. Instead it should be used as one of many indicators serving as input for informed peer review. (c) 2007 Elsevier Ltd. All rights reserved. An empirical law for the rank-order behavior of journal impact factors is found. Using an extensive database on impact factors including journals on education, agrosciences, geosciences, mathematics, chemistry, medicine, engineering, physics, biosciences and environmental, computer and material sciences, we have found extremely good fits, outperforming other rank-order models. Based on our results, we propose a two-exponent Lotkaian Informetrics. Some extensions to other areas of knowledge are discussed. (c) 2007 Elsevier Ltd. All rights reserved. 
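The difference between the crown indicator and the item-oriented average discussed above is only where the normalization happens: a ratio of sums versus a mean of per-paper ratios. A minimal numerical illustration with invented citation counts and field baselines:

```python
def crown_indicator(cites, expected):
    """Crown-style indicator: sum of citations over sum of field baselines."""
    return sum(cites) / sum(expected)

def item_oriented_mean(cites, expected):
    """Item-oriented average: normalize each paper first, then average."""
    return sum(c / e for c, e in zip(cites, expected)) / len(cites)

cites = [10, 1]          # citations received by two papers
expected = [10.0, 0.5]   # field-expected citations for each paper

print(crown_indicator(cites, expected))     # 11 / 10.5, about 1.05
print(item_oriented_mean(cites, expected))  # (1.0 + 2.0) / 2 = 1.5
```

The aggregate ratio is dominated by the paper from the high-baseline field, whereas the item-oriented mean gives each publication equal weight, which is precisely the incremental improvement the abstract argues for.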
The Web of Science is no longer the only database which offers citation indexing of the social sciences. Scopus, CSA Illumina and Google Scholar are new entrants in this market. The holdings and citation records of these four databases were assessed against two sets of data, one drawn from the 2001 Research Assessment Exercise and the other from the International Bibliography of the Social Sciences. Initially, CSA Illumina's coverage at journal title level appeared to be the most comprehensive. But when recall and average citation count were tested at article level and rankings extrapolated by submission frequency to individual journal titles, Scopus was ranked first. When issues of functionality, the quality of record processing and depth of coverage are taken into account, Scopus and Web of Science have a significant advantage over the other two databases. From this analysis, Scopus offers the best coverage from amongst these databases and could be used as an alternative to the Web of Science as a tool to evaluate research impact in the social sciences. (c) 2007 Charles Oppenheim. Published by Elsevier Ltd. All rights reserved. Hirsch's h-index seeks to give a single number that in some sense summarizes an author's research output and its impact. Essentially, the h-index seeks to identify the most productive core of an author's output in terms of most received citations. This most productive set we refer to as the Hirsch core, or h-core. Jin's A-index relates to the average impact, as measured by the average number of citations, of this "most productive" core. In this paper, we investigate both the total productivity of the Hirsch core - what we term the size of the h-core - and the A-index using a previously proposed stochastic model for the publication/citation process, emphasising the importance of the dynamic, or time-dependent, nature of these measures. We also look at the inter-relationships between these measures. 
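The quantities just introduced, the h-index, the Hirsch core, and Jin's A-index, are easy to state computationally. A minimal sketch (the stochastic publication/citation model itself is not reproduced here):

```python
def h_index(cites):
    """Largest h such that h papers each have at least h citations."""
    c = sorted(cites, reverse=True)
    return max((i + 1 for i, v in enumerate(c) if v >= i + 1), default=0)

def h_core(cites):
    """Citation counts of the h most-cited papers (the Hirsch core)."""
    return sorted(cites, reverse=True)[:h_index(cites)]

def a_index(cites):
    """Jin's A-index: average citations of the papers in the h-core."""
    core = h_core(cites)
    return sum(core) / len(core) if core else 0.0

cites = [10, 8, 5, 4, 3, 0]   # invented citation record
print(h_index(cites))  # 4 (the 4th-ranked paper has 4 >= 4 citations)
print(a_index(cites))  # (10 + 8 + 5 + 4) / 4 = 6.75
```

The "size of the h-core" in the abstract's sense is then the total productivity of that core (here `sum(h_core(cites))`), and the paper studies how h, this size, and A evolve together over time.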
Numerical investigations suggest that the A-index is a linear function of time and of the h-index, while the size of the Hirsch core has an approximate square-law relationship with time, and hence also with the A-index and the h-index. (c) 2007 Elsevier Ltd. All rights reserved. The majority of author cocitation analysis (ACA) studies have relied on the Institute for Scientific Information (ISI) citation databases. ISI convention allows only the retrieval of papers that cite works of which the author is first or sole author. Non-primary authors (authors whose names appear in second or later position) are not counted when assembling a cocitation frequency matrix. This has therefore been a methodological issue in ACA studies. This paper empirically examines the impact of the ISI convention on the results of ACA. Previous research has shed light on some of these methodological issues, but failed to address questions such as the extent to which the use of different approaches results in different outcomes in terms of the actual intellectual structure of a given academic discipline. Using our data and cocitation matrix generation systems, we compare the differences in the process and outcomes of using different cocitation matrices. Our study concludes that all-author-based ACA is better than first-author-based ACA at capturing all influential researchers in a field. It also identifies more research subspecialties. Finally, all-author-based ACA and first-author-based ACA produce little difference in the stress values of MDS outputs. Published by Elsevier Ltd. A bibliometric analysis was applied in this work to evaluate the global scientific production of geographic information system (GIS) papers from 1997 to 2006 in any journal of all the subject categories of the Science Citation Index compiled by the Institute for Scientific Information (ISI), Philadelphia, USA. 'GIS' and 'geographic information system' were used as keywords to search parts of titles, abstracts, or keywords. 
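The methodological difference at issue, first-author versus all-author counting, shows up directly when assembling the cocitation frequency matrix. A toy sketch with invented reference lists:

```python
from collections import defaultdict
from itertools import combinations

def cocitation_matrix(papers, first_author_only=False):
    """Count how often pairs of cited authors appear together in reference lists.

    `papers` is a list of reference lists; each reference is a tuple of authors."""
    counts = defaultdict(int)
    for refs in papers:
        cited = set()
        for authors in refs:
            cited.update(authors[:1] if first_author_only else authors)
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical citing papers; McCain is a non-primary author in the first reference.
papers = [
    [("White", "McCain"), ("Small",)],
    [("McCain",), ("Small",)],
]
all_auth = cocitation_matrix(papers)
first_only = cocitation_matrix(papers, first_author_only=True)
```

Under the ISI-style first-author convention the McCain-White cocitation disappears entirely, which is exactly why all-author-based ACA recovers more influential researchers.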
The published output analysis showed that GIS research increased steadily over the past 10 years and that the annual paper production in 2006 was about three times 1997's paper production. There are clear distinctions among author keywords used in publications from the five most productive countries (USA, UK, Canada, Germany and China) in GIS research. Bibliometric methods can quantitatively characterize the development of global scientific production in a specific research field. The analytical results eventually provide several key findings. (C) 2007 Elsevier Ltd. All rights reserved. The Maximum Entropy Principle (MEP) maximizes the entropy provided that the effort remains constant. The Principle of Least Effort (PLE) minimizes the effort provided that the entropy remains constant. The paper investigates the relation between these two principles. For some kinds of effort functions, called admissible, it is shown that these two principles are equivalent. The results are illustrated by the size-frequency statistical distribution found in informetrics in Information Production Processes. (C) 2007 Elsevier Ltd. All rights reserved. When carrying out a research project, some materials may not be available in-house. Thus, investigators resort to external providers for conducting their research. To that end, the exchange may be formalised through material transfer agreements. In this context, industry, government and academia have their own specific expectations regarding compensation for the help they provide when transferring the research material. This paper assesses whether these contracts might have had an impact on the visibility of researchers. Visibility is thereby operationalised on the basis of a bibliometric approach. In the sample utilised, researchers that availed themselves of these contracts were more visible compared to those who did not use them, controlling for seniority and co-authorship. 
Nonetheless, providers and receivers could not be differentiated in terms of visibility, but only by research sector and co-authorship. Being a user of these contracts might, to some extent, be the reflection of systematic differences in the stratification of science based on visibility. (C) 2007 Elsevier Ltd. All rights reserved. Egghe and Proot [Egghe, L., & Proot, G. (2007). The estimation of the number of lost multi-copy documents: A new type of informetrics theory. Journal of Informetrics] introduce a simple probabilistic model to estimate the number of lost multi-copy documents based on the numbers of retrieved ones. We show that their model in practice can essentially be described by the well-known Poisson approximation to the binomial. This enables us to adopt a traditional maximum likelihood estimation (MLE) approach, which allows the construction of (approximate) confidence intervals for the parameters of interest, thereby resolving an open problem left by the authors. We further show that the general estimation problem is a variant of the well-known unseen species problem. This work should be viewed as supplementing that of Egghe and Proot, and it turns out that their results are broadly in line with those produced by this rather more robust statistical analysis. (C) 2007 Elsevier Ltd. All rights reserved. A theoretical model of the dependence of Hirsch-type indices on the number of publications and the average citation rate is tested successfully on empirical samples of journal h-indices. In this paper, we propose two methods for scoring scientific output based on statistical quantile plotting. First, a rescaling of journal impact factors for scoring scientific output on a macro level is proposed. 
It is based on normal quantile plotting, which allows one to transform impact data over several subject categories to a standardized distribution. This can be used in comparing the scientific output of larger entities such as departments working in quite different areas of research. Next, as an alternative to the Hirsch index [Hirsch, J.E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569-16572], the extreme value index is proposed as an indicator for assessment of the research performance of individual scientists. In the case of Lotkaian-Zipf-Pareto behaviour of the citation counts of an individual, the extreme value index can be interpreted as the slope in a Pareto-Zipf quantile plot. This index, in contrast to the Hirsch index, is not influenced by the number of publications but stresses the decay of the statistical tail of citation counts. It appears to be much less sensitive to the science field than the Hirsch index. (c) 2007 Elsevier Ltd. All rights reserved. The relationship of the h-index with other bibliometric indicators at the micro level is analysed for Spanish CSIC scientists in Natural Resources, using publications downloaded from the Web of Science (1994-2004). Different activity and impact indicators were obtained to describe the research performance of scientists in different dimensions, the h-index being located through factor analysis in a quantitative dimension highly correlated with the absolute number of publications and citations. The need to include the remaining dimensions in the analysis of the research performance of scientists and the risks of relying only on the h-index are stressed. The hypothesis that the achievement of some highly visible but intermediate-productive authors might be underestimated when compared with other scientists by means of the h-index is tested. (c) 2007 Elsevier Ltd. All rights reserved. Hirsch [Hirsch, J. E. 
(2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569-16572] has proposed the h index as a single-number criterion to evaluate the scientific output of a researcher. We investigated the convergent validity of decisions for awarding long-term fellowships to post-doctoral researchers as practiced by the Boehringer Ingelheim Fonds (B.I.F.) by using the h index. Our study examined 414 B.I.F. applicants (64 approved and 350 rejected) with a total of 1586 papers. The results of our study show that the applicants' h indices correlate substantially with standard bibliometric indicators. Even though the h indices of approved B.I.F. applicants are on average (arithmetic mean and median) higher than those of rejected applicants (and thus fundamentally confirm the validity of the funding decisions), the distributions of the h indices partly overlap, and we categorized these overlaps as type I errors (falsely drawn approval) or type II errors (falsely drawn rejection). Approximately one-third of the decisions to award a fellowship to an applicant show a type I error, and about one-third of the decisions not to award a fellowship to an applicant show a type II error. Our analyses of possible reasons for these errors show that the applicant's field of study, but not personal ties between the B.I.F. applicant and the B.I.F., can increase or decrease the risks for type I and type II errors. (c) 2007 Elsevier Ltd. All rights reserved. Never before in history has mankind produced and had access to so much data, information, knowledge, and expertise as today. To organize, access, and manage these valuable assets effectively, we use taxonomies, classification hierarchies, ontologies, controlled vocabularies, and other approaches. We create directory structures for our files. We use organizational hierarchies to structure our work environment. 
However, the design and continuous update of these organizational schemas, with potentially thousands of class nodes organizing millions of entities, is challenging for any human being. The taxonomy visualization and validation (TV) tool introduced in this paper supports the semi-automatic validation and optimization of organizational schemas such as file directories, classification hierarchies, taxonomies, or other structures imposed on a data set for organization, access, and naming. By showing the "goodness of fit" for a schema and the potentially millions of entities it organizes, the TV tool eases the identification and reclassification of misclassified information entities, the identification of classes that grow too large, the evaluation of the size and homogeneity of existing classes, the examination of the "well-formedness" of an organizational schema, and more. As a demonstration, the TV tool is applied to display and examine the United States Patent and Trademark Office patent classification, which organizes more than three million patents into about 160,000 distinct patent classes. The paper concludes with a discussion and an outlook on future work. (c) 2007 Elsevier Ltd. All rights reserved. Narrative reviews of peer review research have concluded that there is negligible evidence of gender bias in the awarding of grants based on peer review. Here, we report the findings of a meta-analysis of 21 studies providing, to the contrary, evidence of robust gender differences in grant award procedures. Even though the estimates of the gender effect vary substantially from study to study, the model estimation shows that, all in all, men applying for grants have about 7% greater odds of receiving them than women, a statistically significant difference. (c) 2007 Elsevier Ltd. All rights reserved. Basic publication-citation matrices are used to calculate informetric indicators such as journal impact factors or R-sequences.
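As a concrete illustration of how a publication-citation matrix yields an indicator, the sketch below computes the standard two-year journal impact factor from hypothetical publication and citation counts; the data and function names are illustrative, not taken from the article.

```python
# Sketch: computing a two-year journal impact factor from
# publication-citation data. All numbers are hypothetical.

def impact_factor(pubs_by_year, cites_in_target_by_pub_year, target_year):
    """Two-year impact factor for `target_year`: citations received in
    target_year to items published in the two prior years, divided by
    the number of citable items published in those years."""
    prior = (target_year - 1, target_year - 2)
    cites = sum(cites_in_target_by_pub_year.get(y, 0) for y in prior)
    items = sum(pubs_by_year.get(y, 0) for y in prior)
    return cites / items

pubs = {2005: 120, 2006: 130}        # citable items per publication year
cites_2007 = {2005: 240, 2006: 210}  # citations received in 2007, by pub year
print(round(impact_factor(pubs, cites_2007, 2007), 3))  # -> 1.8
```

Here (240 + 210) / (120 + 130) = 1.8; the matrix view simply generalizes this to all (publication year, citation year) cells at once.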
Transforming these publication-citation matrices clarifies the construction of other indicators. In this article, some transformations are highlighted together with some of their invariants. Such invariants offer a rigorous, mathematically founded way of comparing informetric matrices before and after a transformation. (c) 2007 Elsevier Ltd. All rights reserved. A probabilistic model is presented to estimate the number of lost multi-copy documents, based on retrieved ones. For this, we only need the number of retrieved documents of which we have one copy and the number of retrieved documents of which we have two copies. If we also have the number of retrieved documents of which we have three copies, then we are also able to estimate the number of copies of the documents that ever existed (assuming that this number is fixed over all documents). Simulations confirm the stability of the model. The model is applied to the estimation of the number of lost printed programmes of Jesuit theatre plays in the Provincia Flandro-Belgica before 1773. This Jesuit province was an administrative entity of the order, which was territorially slightly larger in extent than present-day Flanders, the northern, Dutch-speaking part of Belgium. It is noted that the functional model P_j for the fraction of retrieved documents with j copies is a size-frequency function satisfying (P_{j+1}/P_j)/(P_j/P_{j-1}) < 1 for all j. It is further noted that the "classical" size-frequency functions are different: Lotka's function satisfies the opposite inequality, and the decreasing exponential function always gives 1 for the above ratio, hence showing that we are in a new type of informetrics theory. We also provide a mathematical rationale for the "book historical law" stating that the probability of losing a copy of a multi-copy document (i.e. an edition) is an increasing function of the size of the edition.
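The article's exact model is not reproduced here, but the flavour of such an estimate can be sketched under a simple binomial survival assumption: every edition had n copies, and each copy survived independently with probability p. Under that assumption, the retrieved-copy frequencies f1, f2, f3 determine n, the survival odds, and the expected number of fully lost editions.

```python
# Sketch of a lost-edition estimate under a binomial survival model
# (an illustrative assumption; the paper's exact model may differ).
# f_j = number of editions retrieved with exactly j surviving copies.

def estimate_lost(f1, f2, f3):
    """Return (n, odds, lost): edition size n, survival odds p/(1-p),
    and the estimated number of editions with zero surviving copies."""
    a, b = f2 / f1, f3 / f2
    # Under the binomial model f_{j+1}/f_j = (n-j)/(j+1) * p/(1-p),
    # so two successive ratios determine n and the odds p/(1-p).
    n = (4 * a - 3 * b) / (2 * a - 3 * b)
    odds = 2 * a / (n - 1)
    lost = f1 / (n * odds)          # from f_0/f_1 = (1-p)/(n*p)
    return n, odds, lost

# Synthetic check: 1000 editions, n = 4, p = 0.5 gives f1 = 250,
# f2 = 375, f3 = 250, and 62.5 expected fully lost editions.
n, odds, lost = estimate_lost(250, 375, 250)
print(round(n, 2), round(odds, 2), round(lost, 1))  # -> 4.0 1.0 62.5
```

With only f1 and f2 (and n known), the same ratio identity yields the odds and hence the lost count, matching the two-stage use of the data described above.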
The paper closes with some open problems and a description of other potential applications of this probabilistic model. (c) 2007 Published by Elsevier Ltd. Under the National Innovation System (NIS) framework, knowledge stock has been recognized as a key factor for enhancing national innovative capabilities. However, despite the importance of patents and papers for measuring knowledge, previous research has not fully utilized patent and paper databases, and has instead relied on research and development (R&D) data. Therefore, in this research, I introduce a way to utilize both types of useful data when measuring industrial knowledge stocks. As primary data sources, the United States Patent and Trademark Office (USPTO) Web site for patents and the Science Citation Index (SCI) for papers are used. In the case of Korea, the amount of knowledge stock proxied by patents and papers is different from that proxied by R&D, which indicates in turn that using a single indicator such as R&D may be misleading. Although the result may vary depending on the selected nation, the proposed method will be useful for gauging knowledge stocks in a more complementary way. (c) 2007 Elsevier Ltd. All rights reserved. We focus on the statistics of word occurrences and of the waiting times between such occurrences in blogs. Due to the heterogeneity of word frequencies, the empirical analysis is performed by studying classes of "frequency-equivalent" words, i.e. by grouping words depending on their frequencies. Two limiting cases are considered: the dilute limit, i.e. for those words that are used less than once a day, and the dense limit for frequent words. In both cases, extreme events occur more frequently than expected from the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events that are associated with bursts of activity.
The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features. (c) 2007 Elsevier Ltd. All rights reserved. Based on previous findings and theoretical considerations, it was suggested that bibliographic coupling could be combined with a cluster method to provide a method for science mapping complementary to the prevailing co-citation cluster analytical method. The complete link cluster method was on theoretical grounds assumed to provide a suitable cluster method for this purpose. The objective of the study was to evaluate the proposed method's capability to identify coherent research themes. Applying a large multidisciplinary test bed comprising more than 600,000 articles and 17 million references, the proposed method was tested in accordance with two lines of mapping. In the first line of mapping, all significant (strong) links connecting 'core documents' (strongly and frequently coupled documents) in clusters with any other core document were mapped. This resulted in a depiction of all significant artificially broken links between core documents in a cluster and core documents extrinsic to that cluster. The second line of mapping involved the application of links between clusters only. These links were used to successively merge clusters on two subsequent levels of fusion, where the first generation of clusters was considered the object of a second clustering, and the second generation of clusters gave rise to a final cluster fusion. Changes in cluster composition on the three levels were evaluated with regard to several variables. Findings showed that the proposed method could provide valid depictions of current research, though some severe restrictions apply to its application. (c) 2007 Elsevier Ltd. All rights reserved. The original Lotka's Law refers to the single-scientist distribution, i.e.
the frequency of authors A(i) with i publications per author is a function of i: A(i) = f(i). However, with increasing collaboration in science and in technology, the study of the frequency of pairs or triples of co-authors is highly relevant. Starting with the pair distribution, well-ordered collaboration structures of co-author pairs will be presented, i.e. the frequency of co-author pairs N_ij between authors with i publications per author and authors with j publications per author is a function of i and j: N_ij = f(i,j), using the normal count procedure for counting i or j. We have assumed that the distribution of co-author pair frequencies can be considered a reflection of a social Gestalt and therefore can be described by a corresponding mathematical function based on well-known general characteristics of structures in interpersonal relations in social networks. We have shown that this model of social Gestalts explains the distribution of co-author pairs better than a simple bivariate function in analogy to Lotka's Law. This model is based on both the Gestalt theory and the old Chinese Yin/Yang theory. (c) 2007 Elsevier Ltd. All rights reserved. The method of bibliographic coupling in combination with the complete link cluster method was applied to mapping the field of organic chemistry, with the purpose of testing the applicability of a proposed mapping method at the field level. The method put forward aimed at the generation of cognitive cores of documents, so-called 'bibliographic cliques', in the network of bibliographically coupled research articles. The defining feature of these cliques is that they can be considered complete graphs where each bibliographic coupling link ties an unordered pair of documents. In this way, it was presumed that coherent groups of documents in the research front would be found and that these groups would be intellectually coherent as well.
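A minimal sketch of the bibliographic-clique idea: two articles are coupled when their reference lists overlap, and a clique is a set of articles that are pairwise coupled (a complete subgraph). The reference lists, coupling threshold, and brute-force search below are illustrative assumptions, not the paper's procedure.

```python
from itertools import combinations

# Hypothetical data: article -> set of cited references.
refs = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r3", "r4", "r5"},
    "D": {"r6", "r7"},
}

def coupled(x, y, threshold=2):
    """Two articles are bibliographically coupled if they share at
    least `threshold` references."""
    return len(refs[x] & refs[y]) >= threshold

def cliques(size, threshold=2):
    """Brute-force search for article sets whose members are all
    pairwise coupled, i.e. complete subgraphs of the coupling graph."""
    return [set(group) for group in combinations(refs, size)
            if all(coupled(x, y, threshold) for x, y in combinations(group, 2))]

print([sorted(c) for c in cliques(2)])  # -> [['A', 'B'], ['B', 'C']]
```

Real mappings replace the brute-force search with scalable clique or cluster detection, but the coupling criterion is the same.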
Statistical analysis and subject specialist evaluations confirmed these presumptions. The study also elaborates on the choice of observation period and the application of thresholds in relation to the size of document populations. (c) 2007 Elsevier Ltd. All rights reserved. We show that the influence of missing highly cited articles on the Hirsch index is usually much smaller than the number of missing articles. This statement is shown by a combinatorial argument. We further show, by using a continuous power law model, that the influence of missing articles is largest when the total number of publications is small, and non-existent when the number of publications is very large. The same conclusion can be drawn for missing citations. Hence, the h-index is resilient to missing articles and to missing citations. (c) 2006 Elsevier Ltd. All rights reserved. We apply the Google PageRank algorithm to assess the relative importance of all publications in the Physical Review family of journals from 1893 to 2003. While the Google number and the number of citations for each publication are positively correlated, outliers from this linear relation identify some exceptional papers or "gems" that are universally familiar to physicists. (c) 2006 Published by Elsevier Ltd. We propose a simple stochastic model for an author's production/citation process in order to investigate the recently proposed h-index for measuring an author's research output and its impact. The parametric model distinguishes between an author's publication process and the subsequent citation processes of the published papers. This allows us to investigate different scenarios, such as varying the production/publication rates and citation rates as well as the researcher's career length. We are able to draw tentative results regarding the dependence of Hirsch's h-index on each of these fundamental parameters.
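For reference, the h-index discussed throughout these studies is simple to compute: it is the largest h such that h of an author's papers have at least h citations each. The sketch below (with made-up citation counts) also illustrates the resilience result mentioned above, since dropping the single most-cited paper can leave h unchanged.

```python
# Sketch: Hirsch's h-index from per-paper citation counts.

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

full    = [25, 10, 8, 5, 4, 3]
missing = [10, 8, 5, 4, 3]        # the most-cited article is missing
print(h_index(full), h_index(missing))  # -> 4 4
```

Removing the paper with 25 citations removes 25 citations but changes h not at all, which is the combinatorial point made in the abstract.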
We conjecture that the h-index is, according to this model, (approximately) linear in career length, log publication rate and log citation rate, at least for moderate citation rates. (c) 2006 Elsevier Ltd. All rights reserved. Citation analysis was traditionally based on data from the ISI citation indexes. Now, with the appearance of Scopus and of the free citation tool Google Scholar, methods and measures are needed for comparing these tools. In this paper we propose a set of measures for computing the similarity between rankings induced by ordering the retrieved publications in decreasing order of the number of citations as reported by the specific tools. The applicability of these measures is demonstrated, and the results show high similarities between the rankings of the ISI Web of Science and Scopus and lower similarities between Google Scholar and the other tools. (c) 2006 Elsevier Ltd. All rights reserved. This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real-world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The first method is an algorithm based on a reward function, and the second is a statistical method. Experiments were carried out to validate the techniques proposed. The results show that both methods for threshold estimation produce similar results. The output of such methods was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set. (c) 2006 Elsevier Ltd. All rights reserved. This paper investigates the mechanism of the Journal Impact Factor (JIF).
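The ranking-similarity measures proposed in the citation-tool comparison above are not reproduced here, but one simple member of this family, the overlap of the top-k results of two citation-ranked lists, can be sketched as follows; the tool rankings are hypothetical.

```python
# Sketch of one simple ranking-similarity measure (top-k overlap)
# for comparing citation-ranked result lists from different tools.
# The rankings below are hypothetical, not real tool output.

def topk_overlap(rank_a, rank_b, k):
    """Fraction of items shared by the top-k of two rankings."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

wos     = ["p1", "p2", "p3", "p4", "p5"]   # ranked by citation count
scopus  = ["p1", "p3", "p2", "p4", "p6"]
scholar = ["p7", "p1", "p8", "p2", "p9"]

print(topk_overlap(wos, scopus, 4))    # -> 1.0  (same top-4 set)
print(topk_overlap(wos, scholar, 4))   # -> 0.5
```

Set overlap ignores order within the top-k; rank-aware variants (e.g. Spearman footrule distance) penalize reordering as well.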
Although created as a journal selection tool, the indicator is probably the central quantitative indicator for measuring journal quality. The focus is on journal self-citations, as their treatment in analyses and evaluations is highly disputed. The role of self-citations (both the self-citing rate and the self-cited rate) is investigated on a larger scale in this analysis in order to obtain statistically reliable material that can further qualify that discussion. Some of the hypotheses concerning journal self-citations are supported by the results and some are not. (c) 2006 Elsevier Ltd. All rights reserved. Statistical distributions in the production of information are most often studied in the framework of Lotkaian informetrics. In this paper, we recall some results of the basic theory of Lotkaian informetrics, then we transpose methods (Theorem 1) applied to Lotkaian distributions by Leo Egghe (Theorem 2) to the exponential distributions (Theorem 3, Theorem 4). We give examples and compare the results (Theorem 5). Finally, we propose to widen the problem using the concept of an exponential informetric process (Theorem 6). (c) 2006 Elsevier Ltd. All rights reserved. The peer-review process, in its present form, has been repeatedly criticized. Of the many critiques, ranging from publication delays to referee bias, this paper focuses specifically on the issue of how submitted manuscripts are distributed to qualified referees. Unqualified referees, without the proper knowledge of a manuscript's domain, may reject a perfectly valid study or, potentially more damaging, unknowingly accept a faulty or fraudulent result. In this paper, referee competence is analyzed with respect to referee bid data collected from the 2005 Joint Conference on Digital Libraries (JCDL). The analysis of the referee bid behavior validates the intuition that referees bid on conference submissions with regard to the subject domain of the submission.
Unfortunately, this relationship is not strong, which suggests that factors beyond subject domain may be influencing referees to bid for particular submissions. (c) 2006 Elsevier Ltd. All rights reserved. Aim: The scientific norm of universalism prescribes that external reviewers recommend the allocation of awards to young scientists solely on the basis of their scientific achievement. Since the evaluation of grants utilizes scientists with different personal attributes, it is natural to ask whether the norm of universalism reflects the actual evaluation practice. Subjects and methods: We investigated the influence of three attributes of external reviewers on their ratings in the selection procedure followed by the Boehringer Ingelheim Fonds (B.I.F.) for awarding long-term fellowships to doctoral and post-doctoral researchers in biomedicine: (i) the number of applications assessed in the past for the B.I.F. (reviewers' evaluation experience), (ii) the reviewers' country of residence and (iii) the reviewers' gender. To analyze the reviewers' ratings (1: award; 2: maybe award; 3: no award) in an ordinal regression model (ORM), the following were considered in addition to the three attributes: (i) the scientific achievements of the fellowship applicants, (ii) interaction effects between reviewers' and applicants' attributes and (iii) judgmental tendencies of reviewers. Results: The results of the model estimations show no significant effect of the reviewers' attributes on the evaluation of B.I.F. fellowship applications. The ratings of the external reviewers are mainly determined by the applicants' scientific achievement prior to application. Conclusions: The results suggest that the external reviewers of the B.I.F. indeed achieved the foundation's goal of recommending applicants with higher scientific achievement for fellowships and of recommending those with lower scientific achievement for rejection.
(c) 2006 Elsevier Ltd. All rights reserved. In an earlier paper by Glanzel and Schubert [Glanzel, W., & Schubert, A. (1988a). Characteristic scores and scales in assessing citation impact. Journal of Information Science, 14(2), 123-127; Glanzel, W., & Schubert, A. (1988b). Theoretical and empirical studies of the tail of scientometric distributions. In L. Egghe, & R. Rousseau (Eds.), Informetrics: Vols. 87/88, (pp. 75-83). Elsevier Science Publisher B. V.], a method for classifying ranked observations into self-adjusting categories was developed. This parameter-free method, which was called method of characteristic scores and scales, is independent of any particular bibliometric law. The objective of the present study is twofold. In the theoretical part, the analysis of its properties for the general form of the Pareto distribution will be extended and deepened; in the empirical part the citation history of individual scientific disciplines will be studied. The chosen citation window of 21 years makes it possible to analyse dynamic aspects of the method, and proves sufficiently large to also obtain stable patterns for each of the disciplines. The theoretical findings are supplemented by regularities derived from the long-term observations. (c) 2006 Elsevier Ltd. All rights reserved. A dialogue-based interface for information systems is considered a potentially very useful approach to information access. A key step in computer processing of natural-language dialogues is dialogue-act (DA) recognition. In this paper, we apply a feature-based classification approach for DA recognition, by using the maximum entropy (ME) method to build a classifier for labeling utterances with DA tags. The ME method has the advantage that a large number of heterogeneous features can be flexibly combined in one classifier, which can facilitate feature selection. 
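A maximum entropy classifier of this kind is equivalent to logistic regression over the features. The toy sketch below trains a binary ME/logistic classifier with three hand-made utterance features by plain gradient ascent; the features, data, and two-tag setup (QUESTION vs. STATEMENT) are invented for illustration and are far smaller than the Switchboard-scale setting described above.

```python
import math

# Toy maximum-entropy (logistic-regression) dialogue-act classifier.
# Features, training data, and tags are hypothetical illustrations.

def features(utterance):
    words = utterance.lower().rstrip("?.").split()
    return {"has_qmark": utterance.endswith("?"),
            "starts_wh": words[0] in {"what", "where", "who", "how"},
            "bias": True}

TRAIN = [("what time is it?", 1), ("where are you?", 1),   # 1 = QUESTION
         ("i am at home.", 0), ("the meeting starts at noon.", 0)]

def train(data, epochs=200, lr=0.5):
    """Gradient ascent on the log-likelihood of the logistic model."""
    w = {}
    for _ in range(epochs):
        for text, label in data:
            f = features(text)
            z = sum(w.get(k, 0.0) for k, v in f.items() if v)
            p = 1.0 / (1.0 + math.exp(-z))     # P(QUESTION | features)
            for k, v in f.items():
                if v:
                    w[k] = w.get(k, 0.0) + lr * (label - p)
    return w

def predict(w, text):
    z = sum(w.get(k, 0.0) for k, v in features(text).items() if v)
    return int(z > 0)

w = train(TRAIN)
print(predict(w, "who called me?"), predict(w, "the report is done."))  # -> 1 0
```

The point of the ME framing is visible even at this scale: heterogeneous binary features (punctuation, lexical cues) combine in one weight vector without any generative independence assumptions.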
A unique characteristic of our approach is that it does not need to model the prior probability of DAs directly, and thus avoids the use of a discourse grammar. This simplifies the implementation of the classifier and improves the efficiency of DA recognition, without sacrificing classification accuracy. We evaluate the classifier using a large data set based on the Switchboard corpus. Encouraging performance is observed; the highest classification accuracy achieved is 75.03%. We also propose a heuristic to address the problem of sparseness of the data set. This problem has resulted in poor classification accuracies for some DA types that have very low occurrence frequencies in the data set. Preliminary evaluation shows that the method is effective in improving the macroaverage classification accuracy of the ME classifier. This article is concerned with the difficulty of crossword puzzles. A model is proposed that quantifies the difficulty of a puzzle P with respect to its clues. Given a clue-answer pair (c, a), we model the difficulty of guessing a based on c using the conditional probability P(a|c); easier mappings should enjoy a higher conditional probability. The model is tested by two experiments, each of which involves estimating the difficulty of puzzles taken from The New York Times. Additionally, we discuss how the notion of information implicit in our model relates to more easily quantifiable types of information that figure in crossword puzzles. Many automatic text summarization models have been developed in recent decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take this behavior into account and instead treat the document only as a sequence of sentences when extracting sentences for a summary.
In general, a document exhibits a well-defined hierarchical structure that can be described by fractals, mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the information coverage and the precision of the summary. User evaluations have been conducted. Results indicate that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents. This study provides the first nationwide analysis of states' e-government support for domestic violence (DV) survivors, identifying characteristics and patterns of domestic violence content and access to this content on all state government Web sites (50 states plus the District of Columbia). Using a systematic examination of click paths and site search results, DV content was located, examined, and codified in terms of information type (e.g., shelter access), accessibility (e.g., language), and type of authoring agency (e.g., law enforcement). General DV resources such as hotline/referral services were more prevalent than content related to specific needs such as child custody. States provide substantially more information on immediate emergency needs, which are actually met at the local level, than on intermediate or long-term support.
Accessibility was hampered by both cognitive concerns (e.g., English-only sites) and affective concerns (e.g., a tone focused on data transmission rather than on information use). Legal/law enforcement agencies, rather than social service or medical agencies, consistently provided the most information as well as the largest numbers of connections to other sites, both within and beyond the state government site. Using an enriched author cocitation analysis (ACA), we map information science (IS) for 1996-2005, a decade of explosive development of the World Wide Web, to examine its development since the landmark study by White and McCain (1998). The Web, we find, has had a profound impact on IS, driving the creation of new disciplines and the revitalization or obsolescence of old ones, and, most importantly, bridging the chasm between the "literatures" and "retrieval" IS camps. Simultaneously, the development of IS towards cognitive aspects has intensified. Our study enriches classic ACA in that it employs both orthogonal and oblique rotations in the factor analysis (FA), and reports both pattern and structure matrices for the latter, thus enabling a comparison between these several FA methods in ACA. Each method provides interesting information not available from the others, we find, especially when results are also visualized in the novel manner we introduce here. The purpose of this article is to critically deconstruct the term engagement as it applies to people's experiences with technology. Through an extensive, critical multidisciplinary literature review and an exploratory study of users of Web searching, online shopping, Webcasting, and gaming applications, we conceptually and operationally defined engagement. Building on past research, we conducted semistructured interviews with the users of four applications to explore their perception of being engaged with the technology.
Results indicate that engagement is a process comprising four distinct stages: point of engagement, period of sustained engagement, disengagement, and reengagement. Furthermore, the process is characterized by attributes of engagement that pertain to the user, the system, and user-system interaction. We also found evidence of the factors that contribute to nonengagement. Emerging from this research is a definition of engagement (a term not defined consistently in past work) as a quality of user experience characterized by attributes of challenge, positive affect, endurability, aesthetic and sensory appeal, attention, feedback, variety/novelty, interactivity, and perceived user control. This exploratory work provides the foundation for future work to test the conceptual model in various application areas, and to develop methods to measure engaging user experiences. Hirsch's (2005) h index of scholarly output has generated substantial interest and wide acceptance because of its apparent ability to quantify scholarly impact simply and accurately. We show that the excitement surrounding h is premature for three reasons: h stagnates with increasing scientific age; it is highly dependent on publication quantity; and it is highly dependent on field-specific citation rates. Thus, it is not useful for comparing scholars across disciplines. We propose the scholarly "index of quality and productivity" (IQp) as an alternative to h. The new index takes into account a scholar's total impact and also corrects for field-specific citation rates, scholarly productivity, and scientific age.
The IQp accurately predicts group membership on a common metric, as tested on a sample of 80 scholars from three populations: (a) Nobel winners in physics (n = 10), chemistry (n = 10), medicine (n = 10), and economics (n = 10), together with towering psychologists (n = 10); and scholars who have made more modest contributions to science, including randomly selected (b) fellows (n = 15) and (c) members (n = 15) of the Society for Industrial and Organizational Psychology. The IQp also correlates better with expert ratings of greatness than does the h index. This article reports the results of a study into the use of discrete journal-article components, particularly tables and figures extracted from published scientific journal articles, and their application to teaching and research. Sixty participants were introduced to and asked to perform searches in a journal-article component prototype that presents individual tables and figures as the items returned in the search results set. Multiple methods, including questionnaires, observations, and structured diaries, were used to collect data. The results are analyzed in the context of previous studies on the use of scientific journal articles and in terms of research on scientists' use of specific journal-article components to find information, assess its relevance, read, interpret, and disaggregate the information found, and reaggregate components into new forms of information. Results indicate that scientists believe searching for journal-article components has value in terms of (a) higher-precision result sets, (b) a better match between the granularity of the prototype's index and the granularity of the information sought for particular tasks, and (c) fit between journal-article component searching and the established teaching and research practices of scientists. The classic problem within the information quality (IQ) research and practice community has been the problem of defining IQ.
It has been found repeatedly that IQ is context-sensitive and cannot be described, measured, and assured with a single model. There is a need for empirical case studies of IQ work in different systems to develop systematic knowledge that can then inform and guide the construction of context-specific IQ models. This article analyzes the organization of IQ assurance work in a large-scale, open, collaborative encyclopedia, Wikipedia. What is special about Wikipedia as a resource is that its quality discussions and processes are strongly connected to the data itself and are accessible to the general public. This openness makes it particularly easy for researchers to study a particular kind of collaborative work that is highly distributed and that has a particularly substantial focus not just on error detection but also on error correction. We believe that the study of those evolving debates and processes, and of the IQ assurance model as a whole, has useful implications for the improvement of quality in other, more conventional databases. The author describes an exploratory analysis of the influence of place and proximity on collaboration. Bibliometric data and biographical information are combined to reveal the extent to which co-authorship relationships are a function of physical collocation. This article focuses on why academic writers in computer science and sociology sometimes supply the reader with more details of citees' names than they need to: Why do citers name citees when using the Footnote System, and why do citers include citees' first names when using the Harvard System? These questions were investigated as part of a qualitative, interview-based study of citation behavior.
A number of motivations were advanced by informants, including the desire for stylistic elegance, for informality, to make the text accessible to less informed readers, to mark a close relationship between citer and citee, to alert readers to a little-known citee, and to acknowledge seminal sources. In a number of cases, however, informants were unable to offer any motivation, reporting that their behavior had been unconscious or accidental. The study underlines B. Cronin's (1984, 2005) argument that citation is a private and subjective process, and shows that interview-based studies afford the analyst insights into writers' citing practices which alternative methodologies cannot. The intellectual structure and main research fronts of the Faculty of Natural Sciences and Museum of the National University of La Plata, Argentina are studied, based on the cocitation analysis of the subject categories, journals and authors of their scientific publications collected in the Science Citation Index (CD-ROM version) for the period 1991-2000. The objective of this study is to test the utility of those techniques for exploring and visualizing the intellectual structure and research fronts of multidisciplinary institutional domains. Special emphasis is placed on the identification of multilevel structures, by means of combined subject-category cocitation analysis and journal cocitation analysis. Beginning from the premise that research competitiveness at the university level is the starting point for national competitiveness as a whole, this paper analyzes the correlation between university research-related performance and the scholarly or academic resources available through a country's library system.
An analysis of this correlation from two different angles, a macroscopic approach considering universities in OECD nations and a microscopic approach focusing only upon universities in Korea, found that there is indeed a significant correlation between university research performance and the scholarly information available at libraries. A regression analysis of the two approaches also found that the more journal titles subscribed to by university libraries, and the higher their budget for materials, the greater the contribution university libraries make to university research competitiveness, in Korea as well as in other OECD countries. In this light, in order for Korea to reach a level of research competitiveness comparable to other OECD members, policies need to be created that will effectively increase the number of journals subscribed to by university libraries. This article deals with the role of internationally co-authored papers (co-publications). Specifically, we compare, within a data set of German research units, citation and co-publication indicators as proxies for the unobserved quality dimension of scientific research. In that course we also deal with the question of whether citations and co-publications are considerably related. Our results suggest that, although there is a strong partial correlation between citations and co-publications within a multivariate setting, we cannot use reasonably normalised co-publication indicators as an alternative proxy for quality. Thus, concerning quality assessment, citation analysis retains its primacy. Introduction: Publication delay, the chronological distance between completion of a scientific work and distribution of its achievements as a peer-reviewed paper, is a negative phenomenon in scientific information dissemination. It can be further subdivided into successive stages corresponding to the peer review process and the technical preparation of accepted manuscripts.
Formal online posting in the electronic versions of journals has been considered a way of shortening the process. Objectives: To determine publication delay in a group of leading Food Research journals, as well as the factors affecting this lag, and to compute the effect of formal online posting on the distribution of papers in electronic form. A secondary objective is to study the possible effect of informal posting of papers through repositories on the publication delay in the field. Methods: 14 Food Research journals were selected and 4836 papers published in 2004 were examined. Dates of first submission, submission of revised manuscripts, acceptance, online posting and final publication were recorded for each paper. Analysis: The data collected were analyzed using SPSS and SigmaPlot. Parametric correlations between some variables were determined, and ANOVA was performed with the BMDP package for significance analysis of differences among journals. Results: The average publication delay of papers submitted to the set of selected journals is 348 ± 104 days, with European Food Research and Technology and Journal of Agricultural and Food Chemistry showing the shortest delays. Total delay strongly depends on the peer review process. On average, 85.75% of manuscripts are corrected prior to their acceptance by journals. Online posting of papers prior to their print publication reduces total delay by about 29%. On average, a paper is posted online 260 days after its submission to the set of journals. Conclusions: Publication delay of papers is strongly dependent on the peer review process, which affects most of the manuscripts in the Food Research field. Advance online publication through formal posting at editors' sites only slightly reduces the time between reception and final publication of papers. Although universities' world rankings are popular, their design and methods still require considerable elaboration. 
The paper demonstrates some shortcomings in the ranking methods of the Academic Ranking of World Universities (ARWU, Shanghai Jiao Tong University). One deficiency is that differences in universities' scale are neglected, because the whole input side is omitted. By resampling and reanalyzing the ARWU data, the paper proposes an input-output analysis for measuring universities' scientific productivity, with special emphasis on those universities which meet the productivity threshold (i.e. share of output exceeds share of input) in a certain group of universities. The productivity analysis of Scandinavian universities evaluates multidisciplinary and specialized universities on their own terms; consequently, the ranking based on scientific productivity deviates significantly from the ARWU. We investigated the publication trends in the international earth science literature coming out of Turkey in the period 1970-2005 using the Science Citation Index Expanded database. A database of 2310 earth science publications with at least one author with an address in Turkey was compiled. The number of earth science publications from Turkey shows a very rapid increase starting in the 1990s, in parallel with the increase in the total scientific output of Turkey. In the last decade the annual growth rate has been 16%. There was also a concomitant increase in the number of citations. The causes of the sharp increase in publication numbers are, in order of importance, changes in the rules of academic promotion and appointment, changes in academic attitudes towards publishing, increasing support for research, financial incentives for publishing, and the expansion of higher education. However, the sharp increase in publication numbers was not accompanied by a similar increase in the impact of the publications as measured by citations. 
Although publications with first authors from outside Turkey make up only 20% of the Turkish earth science publications in the period 1970-2005, they account for 38% of the total citations and constitute 48 of the 100 most cited papers. We investigate the possible effects of a strong encouragement of a large number of publications on the scientific production of a Brazilian cell biology department. An average increase in individual absolute production and a concomitant decrease in individual participation in each paper were detected by traditional bibliometric parameters, such as number of publications, citations, impact factors and the h-index, combined with their "effective" versions, in which co-authorship is taken into consideration. The observed situation, which might well represent a national trend, should be considered a strong warning against current criteria of scientific evaluation heavily based on uncritical counting of publications. This paper presents a study of possible changes in the patterns of document types in economics journals since the mid-1980s. Furthermore, the study includes an analysis of a possible relation between the profile of a journal, in terms of the composition of document types, and factors such as place of publication and JIF. The results provide little evidence that journal editors have succeeded in manipulating the distribution of document types. Furthermore, there is little support for the hypothesis that journal editors decrease the number of publications included in the calculation of the JIF, or, for that matter, for the hypothesis that journal editors increase the number of publications not included in the calculation of the JIF. The results of the analyses show that there is a clear distinction between journals based on place of publication and JIF. 
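As background to the document-type analysis above, the standard two-year journal impact factor can be sketched in a few lines. This is a minimal illustration with hypothetical counts, assuming the conventional JCR definition: all citations in year Y to items from the two preceding years count in the numerator, while only "citable" items (articles and reviews) enter the denominator, which is precisely why the document-type composition of a journal matters for its JIF.

```python
# Minimal sketch of the two-year journal impact factor (JIF).
# Counts below are hypothetical; only articles and reviews are
# treated as "citable" items, following JCR convention.

CITABLE_TYPES = {"article", "review"}

def two_year_jif(citations_to_prev_two_years, items_prev_two_years):
    """items_prev_two_years: list of (document_type, count) pairs."""
    citable = sum(n for doc_type, n in items_prev_two_years
                  if doc_type in CITABLE_TYPES)
    return citations_to_prev_two_years / citable if citable else 0.0

# Editorials and letters can attract citations (numerator) without
# adding to the citable-item denominator.
items = [("article", 180), ("review", 20), ("editorial", 40), ("letter", 30)]
print(two_year_jif(500, items))  # 500 / 200 citable items -> 2.5
```

Shifting publications from articles to editorial material thus raises the computed JIF without changing the citation count, which is the manipulation hypothesis the study above tests.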
All reference data were extracted from the annual volumes of the CD edition of the Science Citation Index (SCI) and the Web of Science of the Institute for Scientific Information (ISI); the journal citation and self-citation data were extracted from the Journal Citation Reports (JCR); and the self-citing and self-cited rates were calculated following the JCR method. To determine the trend in the mean number of references per paper throughout 1970-2005, 10,000 records were randomly chosen for each year under study, and the mean number of references per paper was calculated. To determine the growth of journal IFs, 5,499 journals were chosen from the JCR in 2002, together with the same set of journals in 2004. To show the trend in journal IFs, all journals indexed in the JCR throughout 1999-2005 were extracted and the mean values of their IFs were calculated annually. The study showed that the number of references per paper has steadily increased from 1970 to 2005, rising from 8.40 in 1970 to 34.63 in 2005, an increase of more than four times. The majority of publications (76.17%) were in the form of journal articles. After articles, meeting abstracts (9.46%), notes (3.90%) and editorial material (3.78%) were the most frequent publication forms. 94.57% of all publications were in English. After English, German (1.50%), Russian (1.48%) and French (1.37%) were the most frequent languages. The study furthermore showed that there is a significant correlation between the IF and total citations of journals in the JCR, and that there is an important hidden correlation between the IF and the self-citation of journals. This phenomenon elevates journal IFs. The more often a journal cites other journals, the more often it is also cited by others (by a factor of 1.5). In consequence, a growing percentage of journal self-citation is followed by growing journal self-citedness, which can be considered a Matthew effect. 
There is a linear correlation between journal self-citing and journal self-cited values, and the mean self-cited rate always stays higher than the self-citing rate. The mean self-cited rate in 2000 was 11.4% and the mean self-citing rate was 6.61%, whereas the mean self-cited rate in 2005 was 12% and the mean self-citing rate was 7.81%. While implementing a large-scale research project, it is necessary to appoint some principal scientists and let each principal scientist lead a research group. In a scientific collaboration community, different scientists perform different roles while they implement the project, and some scientists may be more active than others; these active scientists often undertake the role of leader or key coordinator in the project. Obviously, the role of principal scientist should be assigned to the active actors in the community. In this paper, we present a model and algorithms for locating active actors in the community based on analyses of scientists' interaction topology; the actors with high connection degrees in the interaction topology can be considered active ones. Finally, we present some case studies for our model and algorithms. This paper is an investigation of the knowledge sources of Korean innovation studies using citation analysis, based on a Korean database covering 1993-2004. About two thirds of this knowledge has come from foreign sources, and 94% of those sources are English-language materials. Research Policy is the most frequently cited journal, followed by Harvard Business Review, R&D Management and American Economic Review. An analysis of who cites the most highly cited journal is also included. Neo-Schumpeterians in Korea cite more papers from Research Policy than general researchers do, and there is no difference between the groups in the year of citation. 
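The self-citing and self-cited rates used in the self-citation study above follow the JCR convention: the self-citing rate is the share of a journal's outgoing references that cite the journal itself, while the self-cited rate is the share of the citations it receives that come from itself. A minimal sketch, with hypothetical counts:

```python
def self_citing_rate(self_citations, total_references_given):
    """Share of the journal's outgoing references that cite itself."""
    return self_citations / total_references_given

def self_cited_rate(self_citations, total_citations_received):
    """Share of the journal's incoming citations that come from itself."""
    return self_citations / total_citations_received

# Hypothetical journal: 80 self-citations, 1000 outgoing references,
# 640 citations received in total. The same self-citation count yields
# two different rates because the denominators differ.
print(f"{self_citing_rate(80, 1000):.2%}")  # 8.00%
print(f"{self_cited_rate(80, 640):.2%}")    # 12.50%
```

Because a journal typically receives fewer total citations than it gives references, the same number of self-citations produces a higher self-cited rate than self-citing rate, consistent with the pattern reported above.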
Although many Indian surnames are common across the whole country, some are specifically associated with just one or a few of the 35 states and union territories that comprise India today. For example, Reddy comes from Andhra Pradesh, and Das, Ghosh and Roy from West Bengal. We investigated the extent to which researchers with names associated with some of the larger states were writing scientific papers in those states and in other ones, and how these concentrations (relative to the whole of India) had changed since the early 1980s. We found that West Bengalis, for example, were now significantly less concentrated in their home state than formerly, and that their concentrations elsewhere were strongly influenced by a state's geographical distance from West Bengal and, to a lesser extent, by the correlation between the scientific profile of their host state and their own preferences (which favoured physics and engineering over biology and mathematics). Thus they were strongly represented in nearby Bihar, Assam and Orissa, and much less so in Tamil Nadu and Kerala. We propose new methods to detect paradigmatic fields through simple statistics over a scientific content database. We propose an asymmetric paradigmatic proximity metric between terms which provides insight into the hierarchical structure of scientific activity, and test our methods in a case study with a database of several million resources. We also propose an overlapping categorization to describe paradigmatic fields as sets of terms that may have several different usages. Terms can also be dynamically clustered, providing a high-level description of the evolution of paradigmatic fields. The aim of the paper was to establish an easy and effective method for investigating and developing a specific technological field from Japanese patent information. 
The walking technique of the biped humanoid robot was used as an example to study the relative research capabilities and patent citation conditions of patent owners, and to construct a patent map, by searching on the theme codes of the FI (File Index) and F-term classification systems of the Japanese Patent Office (JPO). A formulated technical matrix of the patent map was established, indicating that ZMP (Zero Moment Point) control was the main technology for achieving stabilized walking control of the biped humanoid robot. This method can help establish a specific technological matrix from selected term codes (single viewpoint or multiple viewpoints) of the F-term list in the theme code of the JPO system through Boolean logical operations. The particular technical fields that result can be developed to improve technological capability or to seek opportunities in merging technologies. The objective of the study was to perform a bibliometric analysis of all pentachlorophenol-related publications in the Science Citation Index (SCI). Analyzed parameters included document type, language of publication, page count, publication output, authorship, Keywords Plus, publication pattern, citations and country of publication. The US produced 29% of the total single-country publications, while the seven major industrial countries accounted for the majority of the total production (66%). A citations-per-publication indicator was successfully applied in this study to evaluate the impact of the number of authors, countries, and journals. The mean citations per publication of collaborative papers was higher than that of single-country publications. In addition, an analysis of Keywords Plus in different periods was applied to indicate research trends. Bibliometric indicators are widely used to compare performance between units operating in different fields of science. 
For cross-field comparisons, article citation rates have to be normalised to baseline values, because citation practices vary between fields in respect of timing and volume. Baseline citation values vary according to the level at which articles are aggregated (journal, sub-field, field). Consequently, the normalised citation performance of each research unit will depend on the level of aggregation, or 'zoom', that was used when the baselines were calculated. Here, we calculate the citation performance of UK research units for each of three levels of article aggregation. We then compare this with the grade awarded to each unit by external peer review. We find that the correlation between average normalised citation impact and peer-reviewed grade does indeed vary according to the selected level of zoom. The possibility that the level of 'zoom' will affect our assessment of relative impact is an important insight. The fact that more than one view, and hence more than one interpretation of performance, might exist would need to be taken into account in any evaluation methodology. This is likely to be a serious challenge unless a reference indicator is available, and will generally require any evaluation to be carried out at multiple levels for a reflective review. This is the first article using bibliometrics to study the field of contingent valuation research. The purpose of this study was to evaluate contingent valuation research performance based on all the related articles in the SCI and SSCI databases from 1991 to 2005. An indicator named citations per publication (CPP) was presented in this study to assess the impact of article output per year and of different countries, institutes, and authors worldwide. Publications per institute (PPI) within a country was used as an indicator to compare institutes' research performance by country. Citation analysis was used to select the most frequently cited articles of each year, from publication to 2005. 
A citation model was applied to describe the relationship between the cumulative number of citations and article life. The results indicate that, with the increase in article output per year, the CPP has decreased slightly since 1997. The USA produced 55% of all pertinent articles. Institutes from the UK had a higher PPI. The most prolific institutes and authors, and the most frequently cited articles per year, are all listed. In addition, a citation model was successfully applied to evaluate the performance of each year, and the most frequently cited articles of each year were also compared using the model. A keyword analysis was applied in this work to evaluate research trends in DDT (1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane) papers published between 1991 and 2005 in any journal of all the subject categories of the Science Citation Index compiled by ISI (Institute for Scientific Information, Philadelphia, USA). DDT was used as a keyword to search titles, abstracts, and keywords. The published output analysis showed that DDT research steadily increased over the past 15 years, and the annual publication output in 2005 was about twice that of 1991. The two peaks in 1997 and 2000 were closely related to two new research fields on DDT, namely endocrine disruption and persistent organic pollutants (POPs). A paper entitled "Persistent DDT metabolite p,p'-DDE is a potent androgen receptor antagonist", published in 1995 in Nature by Kelce et al., first revealed DDT's toxicity to humans. As a result, public concerns regarding DDT ballooned and now play a key role in DDT research. Keyword analysis indicated that research interests changed remarkably from 1991 to 2005. "Endocrine disruption" was one of the most frequently used author keywords in the period between 2002 and 2005, whilst it did not appear before 1997. The new conception of POPs showed the same trend. Overall, the papers published by India and Mexico ranked 6th and 13th. 
This showed that DDT research is often related to DDT's risks and benefits to humans. The study analyses 27,018 research papers published by India in condensed matter physics, as seen from the Science Citation Index-Expanded (SCIE) (Web of Science) database, for the periods 1993-1995, 1996-1998 and 1999-2001. The study reports that condensed matter physics is the most sought-after branch of physics research in India, accounting for a 20% share of the country's output in physics. The University & College sector as well as the R&D sector are the major contributors to condensed matter physics. However, the country's growth in this field, computed on a six-yearly basis, has still been negative (-1%), compared with 17.4% growth in overall physics during the same period, 1993-1995 to 1999-2001. The study also maps condensed matter physics research along other dimensions, such as institutional productivity, the nature of collaboration in research, and institutional specialization. It examines highly cited papers, and lists prominent and productive scientists in this field. It also provides suggestions for accelerating condensed matter research in India. Impact factors, publication-citation patterns and growth dynamics were analyzed for the Latin America and Caribbean journals covered by the Science Citation Index (SCI) and Social Science Citation Index from 1995 to 2003. Two main journal groups were identified: those publishing mainly in English, with substantial contributions from outside the region, and those publishing in local languages, principally by the local community and on subjects of local interest. We found little inter-citation among the local papers, while the highest number of citations by extra-regional authors was to papers published in English. Quantitative indicators show that LA-C journals are better positioned in the mainstream literature than ever before. We extend the pioneering work of J. E. 
Hirsch, the inventor of the h-index, by proposing a simple and seemingly robust approach for comparing the scientific productivity and visibility of institutions. Our main findings are that (i) while the h-index is a sensible criterion for comparing scientists within a given field, it does not directly extend to the ranking of institutions of disparate sizes or of journals; (ii) however, the h-index, which always increases with paper population, has a universal growth rate for large numbers of papers; (iii) thus the h-index of a large population of papers can be decomposed into the product of an impact index and a factor depending on the population size; (iv) as a complement to the h-index, this new impact index provides an interesting way to compare the scientific production of institutions (universities, laboratories or journals). The purpose of this article is to provide information about author productivity as reflected in the number of occurrences of personal name headings in the Slovenian online catalogue COBIB. Only authors associated with monographs are treated. Thus, author productivity for monographs, which has not been widely researched, is empirically examined to determine conformity or nonconformity to Lotka's law. A random sample of 1,600 Slovenian authors is drawn from the authority file CONOR. Next, the authors are searched in COBIB and each is attributed a number of monographs. Using the formula x^n y = c, the values of the exponent n and the constant c are computed and the Kolmogorov-Smirnov test is applied. The paper shows that the author productivity distribution predicted by Lotka also holds for the occurrences of personal name headings in COBIB. The structure of scientific collaboration networks in scientometrics is investigated at the level of individuals, using bibliographic data for all papers published in the international journal Scientometrics, retrieved from the Science Citation Index (SCI) for the years 1978-2004. 
A combination of social network analysis (SNA), co-occurrence analysis, cluster analysis and word frequency analysis is explored to reveal: (1) the microstructure of the collaboration network among scientists in scientometrics; (2) the major collaborative fields of the whole network and of different collaborative sub-networks; (3) the collaborative center of the collaboration network in scientometrics. Recently, philosophers of science have argued that the epistemological requirements of different scientific fields lead necessarily to differences in scientific method. In this paper, we examine possible variation in how language is used in peer-reviewed journal articles from various fields, to see whether features of such variation may help to elucidate and support claims of methodological variation among the sciences. We hypothesize that significant methodological differences will be reflected in related differences in scientists' language style. This paper reports a corpus-based study of peer-reviewed articles from twelve separate journals in six fields of the experimental and historical sciences. Machine learning methods were applied to compare the discourse styles of articles in different fields, based on easily extracted linguistic features of the text. Features included function word frequencies, as often used in computational stylistics, as well as lexical features based on systemic functional linguistics, which affords rich resources for comparative textual analysis. We found that indeed the style of writing in the historical sciences is readily distinguishable from that of the experimental sciences. Furthermore, the most significant linguistic features of these distinctive styles are directly related to the methodological differences posited by philosophers of science between historical and experimental sciences, lending empirical weight to their contentions. This paper introduces a new method for evaluating national publication activities. 
This new indicator, thought leadership, captures whether a nation is a thought leader (building on the more recently cited literature for a field) or a follower (building on the older cited literature for that field). Publication data for 2003 are used to illustrate which nations tend to build on the more recent discoveries in chemistry and clinical medicine. Implications for national and laboratory policy are discussed. This article proposes a new index, the "Article-Count Impact Factor" (ACIF), for evaluating journal quality in light of citation behaviour, in comparison with the ISI journal impact factors. The ACIF index is the ratio of the number of articles that were cited in the current year to the source items published in that journal during the previous two years. In this work, we used 171 journal titles in materials categories, published in the years 2001-2004 in international journals indexed in the Science Citation Index Expanded (SCI) database, as the data source. It was found that the ACIF index could be used as an alternative tool for assessing journal quality, particularly where the assessed journals had the same (equal or similar) JIF values. The experimental results suggested that the higher the ACIF value, the larger the number of articles being cited. Changes in ACIF values were more dependent on the JIF values than on the total number of articles. Polymer Science had the greatest ACIF values, suggesting that the articles in Polymer Science had higher "citations per article" than those in Metallurgical Engineering and Ceramics. It was also suggested that, in order to attain a JIF value of 1.000, the Ceramics category required more articles to be cited than the Metallurgical Engineering and Polymer Science categories. The aim of this research is to gain insight into the international recognition of Croatian STM (Science, Technology, and Medicine) journals, measured by citations in the SCI-Expanded database. 
The sample for the research was a citation analysis of 142 journals over the time span 1975-2001, for papers published in 1975-1998. More than 90% of those journals are not indexed by SCI-Expanded. For the purpose of this research we introduced a new scientometric indicator, the Normalized number of Citations per 100 Papers (NCP), which allows direct comparison of journals from various categories (NCP = (100C/P) / IF1989). We chose the year 1989 as the midpoint of the time span 1975-2001. By citation analysis we established the influence of errors on the recognition of Croatian journals and their articles. The results obtained show that an article-to-article link is not found for 32% of cited items. The most frequent type of error is in the journal title (37%), which indicates that approximately one third of Croatian journals cannot be found when searching by journal title only. Some Croatian journals, even though not indexed by SCI-Expanded, showed relatively high impact, i.e. their NCP is higher than 100 and their number of citations per paper is higher than 1. This paper reviews the literature on the concerns stemming from university patenting and licensing activities. Scholars have investigated threats to scientific progress due to increasing disclosure restrictions; changes in the nature of research (declining quality of patents and publications, the skewing of research agendas toward commercial priorities, and crowding-out between patents and publications); and the diversion of energy from teaching activity, reducing its quality. A small section explores problems lamented by industry. Each of these issues is presented and discussed, based on 82 papers published from 1980 to 2006. Some suggestions for further research conclude the essay. This paper studies the main characteristics of the citation indexes currently developed in Spain. 
The paper compares the impact factors offered by the Spanish citation indexes with the impact factors of Spanish journals also covered by the ISI's JCRs (SCI and SSCI) over a five-year period (2001-2005). Spanish journals published in English have higher impact factor scores in the ISI's JCR databases than in the Spanish citation indexes. The renewal of patents and their geographical scope of protection constitute two essential dimensions of a patent's life, and probably the most frequently used patent value indicators. The intertwining of these dimensions (the geographical scope of protection may vary over time) makes their analysis complex, as any measure along one dimension requires an arbitrary choice on the second. This paper proposes a new indicator of patent value, the scope-year index, combining the two dimensions. The index is computed for patents filed at the EPO from 1980 to 1996 and validated in its member states. It shows that the average value of patent filings increased in the early eighties but decreased constantly from the mid-eighties until the mid-nineties, despite the institutional expansion of the EPO. This result sheds a new and worrying light on the worldwide boom in patent filings. We develop and discuss the theoretical basis of a new criterion for ranking scientific institutions. Our novel index, which is related to the h-index, provides a metric that removes the size dependence. We discuss its mathematical properties, such as the rules for merging two sets of papers, and analyze the relations between the underlying rank/citation-frequency law and the h-index. The proposed index should be seen as a complement to the h-index, for comparing the scientific production of institutions (universities, laboratories or journals) that may be of disparate sizes. Bioinformatics is an emerging and rapidly evolving discipline. The bioinformatics literature is growing exponentially. 
This paper aims to provide an integrated bibliometric study of the knowledge base of the Chinese research community, based on bibliometric information in the field of bioinformatics from the SCI-Expanded database for the period 2000-2005. It is found that China is productive in bioinformatics as far as publication activity in international journals is concerned. For comparative purposes, the results are benchmarked against findings for five other major nations in the field of bioinformatics: the USA, the UK, Germany, Japan and India. In terms of collaboration profile, the findings imply that the collaborative scope of China has gradually transcended organizational, regional and national boundaries. Finally, further analyses of citation share and some surrogate scientometric indicators show that the publications of Chinese authors suffer from the lowest international visibility among the six countries. Strikingly, Japan has achieved the most remarkable publication impact relative to the research effort devoted to bioinformatics among the six countries. The policy implication of these findings is that the Chinese scientific community needs to do much work on improving its research impact and to pay more attention to strengthening academic linkages between China and other nations, particularly scientifically advanced countries. We have compared patenting propensity in the Czech Republic with eight EU countries: Germany, Austria, Hungary, Poland, Finland, Belgium, Ireland and Greece. In a comparison based on EPO and USPTO patents per million inhabitants, the Czech Republic ranks rather low. The Czech Republic also generated fewer patents per R&D employee than most of the other countries. The time series data show a decrease in the number of Czech patents after 1990, with some revival after 1996. 
As our analysis indicated, the decrease was partially caused by the dissolution or transformation of major patent generators, but the most important cause may lie in the limited interest of local enterprises. We rank economics departments in the Republic of Ireland according to the number of publications, the number of citations, and the successive h-index of research-active staff. We increase the discriminatory power of the h(1)-index by introducing three generalizations, each of which is a rational number. The first (h(1)(+)) measures the excess over the actual h-index, while the other two (h(1)*, h(1)(Delta)) measure the distance to the next h-index. At the individual level, h* and h(Delta) coincide, while h(+) is undefined. Keyphrases are widely used in both physical and digital libraries as brief but precise summaries of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with that of a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and on French and Spanish documents. This article surveys the use of eye tracking in investigations of online search. Three eye tracking experiments that we undertook are discussed and compared to additional work in this area, revealing recurring behaviors and trends. 
The first two studies are described in greater detail in Granka, Joachims, & Gay (2004), Lorigo et al. (2006), and Pan et al. (2007), and the third study is described for the first time in this article. These studies reveal how users view the ranked results on a search engine results page (SERP), the relationship between the search result abstracts viewed and those clicked on, and whether gender, search task, or search engine influence these behaviors. In addition, we discuss a key challenge that arose in all three studies and that applies generally to the use of eye tracking in studying online behavior: the limited support for analyzing scanpaths, or sequences of eye fixations. To meet this challenge, we present a preliminary approach that involves a graphical visualization to compare a path with a group of paths. We conclude by summarizing our findings and discussing future work in further understanding online search behavior with the help of eye tracking. This research explores the use of genre as a document descriptor in order to improve the effectiveness of Web searching. A major issue to be resolved is the identification of what document categories should be used as genres. As genre is a kind of folk typology, document categories must enjoy widespread recognition by their intended user groups in order to qualify as genres. Three user studies were conducted to develop a genre palette and show that it is recognizable to users. (Palette is a term used to denote a classification, attributable to Karlgren, Bretan, Dewe, Hallberg, and Wolkert, 1998.) To simplify the users' classification task, it was decided to focus on Web pages from the edu domain. The first study was a survey of user terminology for Web pages. Three participants separated 100 Web page printouts into stacks according to genre, assigning names and definitions to each genre. 
The second study aimed to refine the resulting set of 48 (often conceptually and lexically similar) genre names and definitions into a smaller palette of user-preferred terminology. Ten participants classified the same 100 Web pages. A set of five principles for creating a genre palette from individuals' sortings was developed, and the list of 48 was trimmed to 18 genres. The third study aimed to show that users would agree on the genres of Web pages when choosing from the genre palette. In an online experiment in which 257 participants categorized a new set of 55 pages using the 18 genres, on average, over 70% agreed on the genre of each page. Suggestions for improving the genre palette and future directions for the work are discussed. Current model-driven Web engineering approaches provide methods and compilers for the effective design and development of Web applications. However, these proposals also have some limitations, especially when it comes to exchanging model specifications or adding further concerns such as architectural styles, technology independence, or distribution. One solution to these issues is to make Web engineering proposals interoperate, complement each other, and exchange models between their tools. We analyze how a common reference model shared by Web engineering proposals can be effectively used to achieve the desired interoperability. We also examine how such a common reference model can be used to combine models coming from different proposals, and discuss the problems that can occur when integrating these separate models. Finally, we show how high-level model transformations allow these problems to be solved efficiently. Traditional collaborative filtering (CF) systems perform filtering tasks on existing databases; however, data collected for recommendation purposes may be split between different online vendors. 
To generate better predictions, offer richer recommendation services, enhance mutual advantages, and overcome problems caused by inadequate data and/or sparseness, e-companies want to integrate their data. Due to privacy, legal, and financial reasons, however, they do not want to disclose their data to each other. Providing privacy measures is vital to accomplishing distributed data-based top-N recommendation (TN) while preserving data holders' privacy. In this article, the authors present schemes for binary ratings-based TN on distributed data (horizontally or vertically partitioned), and provide accurate referrals without greatly exposing data owners' privacy. Our schemes make it possible for online vendors, even competing companies, to collaborate and conduct TN with privacy, using the joint data while introducing reasonable overhead costs. With the rapidly changing web-enabled world, the already existing dichotomy between knowing of and knowing about, or information naivete, widens daily. This article explores the ethical dilemmas that can result from the lack of information literacy. The article also discusses conditions and consequences of information naivete, media bias, possessive memory, and limited contexts and abilities. To help avoid information failure, the author recommends that producers, contributors, disseminators, and aggregators of information be less information naive. This essay explores the question of whether records professionals are as aware of the ethical dimensions of their work as they should be. It considers first the historical and professional context of archival ethics, then examines a recent case about business archives involving the author that suggests the need for renewed attention to professional ethics, and concludes with a discussion about how archivists might reconsider the ethical dimensions of their work. 
This article introduces a normative principle for the behavior of contemporary computing and communication systems and considers some of its consequences. The principle, named the principle of distribution, says that in a distributed multi-agent system, control resides as much as possible with the individuals constituting the system rather than in centralized agents; and when that is unfeasible or becomes inappropriate due to environmental changes, control evolves upwards from the individuals to an appropriate intermediate level rather than being imposed from above. The setting for the work is the dynamically changing global space resulting from ubiquitous communication. Accordingly, the article begins by determining the characteristics of the distributed multi-agent space it spans. It then fleshes out the principle of distribution, with examples from daily life as well as from Computer Science. The case is made for the principle of distribution to work at various levels of abstraction of system behavior: to inform the high-level discussion that ought to precede the more low-level concerns of technology, protocols, and standardization, but also to facilitate those lower levels. Of the more substantial applications given here of the principle of distribution, a technical example concerns the design of secure ad hoc networks of mobile devices, achievable without any form of centralized authentication or identification but in a solely distributed manner. Here, the context is how the principle can be used to provide new and provably secure protocols for genuinely ubiquitous communication. A second, more managerial example concerns the distributed production and management of open-source software, and a third investigates some pertinent questions involving the dynamic restructuring of control in distributed systems, important in times of disaster or malevolence. 
Once taken for granted as morally legitimate, legal protection of intellectual property rights has come under fire in the last 30 years as new technologies have evolved and severed the link between expression of ideas and such traditional material-based media as books and magazines. These advances in digital technology have called attention to unique features of intellectual content that problematize intellectual property protection; any piece of intellectual content, for example, can be simultaneously appropriated by everyone in the world without thereby diminishing the supply of that content available to others. This essay provides an overview and assessment of the arguments and counterarguments on the issue of whether intellectual property should be legally protected. The first part of this article deals with some initiatives concerning the role of information ethics for Africa, such as the New Partnership for Africa's Development, United Nations Information Communications Technology (ICT), and the African Information Society Initiative, particularly since the World Summit on the Information Society. Information Ethics from Africa is a young academic field, and not much has been published so far on the impact of ICT on African societies and cultures from a philosophical perspective. The second part of the article analyzes some recent research on this matter, particularly with regard to the concept of ubuntu. Finally, the article addresses some issues of the African Conference on Information Ethics held February 3-5, 2007, in Pretoria, South Africa. This article discusses social justice as a moral norm that can be used to address the ethical challenges facing us in the global Information Society. The global Information Society is seen as a continuation of relationships which have been altered by the use of modern information and communication technologies (ICTs). Four interrelated characteristics of the global Information Society are also identified. 
After a brief overview of the main socioethical issues facing the global Information Society, the article discusses the application of social justice as a moral tool that has universal moral validity and which can be used to address these ethical challenges. It is illustrated that the scope of justice is no longer limited to domestic issues. Three core principles of justice are furthermore distinguished, and based on these three principles, seven categories of justice are introduced. It is illustrated how these categories of justice can be applied to address the main ethical challenges of the Information Society. A brief communication appearing in this journal ranked UK-based LIS and (some) IR academics by their h-index using data derived from the Thomson ISI Web of Science (TM) (WoS). In this brief communication, the same academics were re-ranked using other popular citation databases. It was found that for academics who publish more in computer science forums, their h was significantly different due to highly cited papers missed by WoS; consequently, their rank changed substantially. The study was widened to a broader set of UK-based LIS and IR academics, in which the results showed similar statistically significant differences. A variant of h, h(mx), was introduced that allowed a ranking of the academics using all citation databases together. This paper describes a new algorithm for finding and tracking different subjects within an ongoing debate. The algorithm finds blocks of co-occurring terms, representing subjects, including blocks for which the term co-occurrence pattern forms a ring topology. We used short online debate forum data and longer summary bulletins to assess the extent to which the algorithm could correctly detect subjects, according to the judgements of human evaluators. 
The results show that it could normally detect subject-shifting and track different subjects over time in online debate forums and, with adjustments, could find subjects in bulletins, but it could not track the subjects in the bulletins because the interlinking between subjects was too dense in the longer documents. (C) 2008 Elsevier Ltd. All rights reserved. The paper introduces a new journal impact measure called the Reference Return Ratio (3R). Unlike the traditional Journal Impact Factor (JIF), which is based on calculations of publications and citations, the new measure is based on calculations of bibliographic investments (references) and returns (citations). A comparative study of the two measures shows a strong relationship between the 3R and the JIF. Yet, the 3R appears to correct for citation habits, citation dynamics, and composition of document types - problems that are typically raised against the JIF. In addition, contrary to traditional impact measures, the 3R cannot be manipulated ad infinitum through journal self-citations. (C) 2007 Elsevier Ltd. All rights reserved. General results on transformations of information production processes (IPPs), involving transformations of the h-index and related indices, are applied in concrete, simple cases: doubling the production per source, doubling the number of sources, doubling the number of sources but halving their production, halving the number of sources but doubling their production (fusion of sources) and, finally, special cases of general power law transformations. In each case we calculate concrete transformation formulae for the h-index h (transformed into h*) and we discuss when we have h* < h, h* = h or h* > h. These results are then extended to some other h-type indices such as the g-index, the R-index and the weighted h-index. (C) 2007 Elsevier Ltd. All rights reserved. A rational, successive g-index is proposed and applied to economics departments in Ireland. 
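The h- and g-type indices discussed in the surrounding abstracts can be computed directly from a ranked list of citation counts. A minimal sketch (the citation counts used below are invented for illustration):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each (Hirsch)."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def successive_h(h_values):
    """Schubert-style successive h-index: the h-index of a group's individual h-indices."""
    return h_index(h_values)

print(h_index([10, 8, 5, 4, 3]))   # 4
print(g_index([10, 8, 5, 4, 3]))   # 5
print(successive_h([4, 4, 3, 2]))  # 3
```

Note that this sketch caps g at the number of papers; some formulations allow g to exceed it by padding the list with fictitious uncited papers. Successive indices based on publication or citation counts follow the same pattern, applying h_index to the corresponding per-author totals.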
The successive g-index has greater discriminatory power than the successive h-index, and the rational index performs better still. The rational, successive g-index is also more robust to differences in department size. (C) 2008 Elsevier Ltd. All rights reserved. In this paper, we provide a general definition of the Leimkuhler curve in terms of the theoretical cumulative distribution function. The definition applies to discrete, continuous and mixed random variables. Several examples are given to illustrate the use of the formula. (C) 2008 Elsevier Ltd. All rights reserved. The impact factor of a journal reflects the frequency with which the journal's articles are cited. It is the best available measure of journal quality. For the calculation of the impact factor, we just count the number of citations, no matter how prestigious the citing journal is. We think that the impact factor, as a measure of journal quality, may be improved if, in its calculation, we not only take into account the number of citations, but also incorporate a factor reflecting the prestige of the citing journals relative to the cited journal. In the calculation of this proposed "weighted impact factor," each citation has a coefficient (weight) whose value is 1 if the citing journal is as prestigious as the cited journal; is > 1 if the citing journal is more prestigious than the cited journal; and is < 1 if the citing journal has a lower standing than the cited journal. In this way, journals receiving many citations from prestigious journals are considered prestigious themselves, and those cited by low-status journals receive little credit. By considering both the number of citations and the prestige of the citing journals, we expect the weighted impact factor to be a better scientometric measure of journal quality. (C) 2008 Elsevier Ltd. All rights reserved. Despite the rapid growth of text-based computer-mediated communication (CMC), its limitations have rendered the media highly incoherent. 
This poses problems for content analysis of online discourse archives. Interactional coherence analysis (ICA) attempts to accurately identify and construct CMC interaction networks. In this study, we propose the Hybrid Interactional Coherence (HIC) algorithm for the identification of web forum interaction. HIC utilizes a variety of system and linguistic features, including message header information, quotations, direct address, and lexical relations. Furthermore, several similarity-based methods, including a Lexical Match Algorithm (LMA) and a sliding window method, are utilized to account for interactional idiosyncrasies. Experimental results on two web forums revealed that the proposed HIC algorithm significantly outperformed comparison techniques in terms of precision, recall, and F-measure at both the forum and thread levels. Additionally, an example was used to illustrate how the improved ICA results can facilitate enhanced social network and role analysis capabilities. The objective of this study was to characterize the types of tissue-centric users based on tissue use, requirements, and their job or work-related variables at the University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA. A self-reporting questionnaire was distributed to biomedical researchers at the UPMC. Descriptive and cluster analyses were performed to identify and characterize the complex types of tissue-based researchers. A total of 62 respondents completed the survey, and two clusters were identified based on all variables. Two distinct groups of tissue-centric users made direct use of tissue samples for their research as well as associated information, while a third group of indirect users required only the associated information. The study shows that tissue-centric users were composed of various types. These types were distinguished in terms of tissue use and data requirements, as well as by their work or research-related activities. 
Recently, several researchers have questioned the power of intention to predict actual system usage (Burton-Jones & Straub, 2006; Jasperson, Carter, & Zmud, 2005; Kim & Malhotra, 2005; Kim, Malhotra, & Narasimhan, 2005; Limayem & Hirt, 2003). In this article, we report a study that investigates the gap between intention and usage by observing an Internet-based knowledge management system, SCTNet, from the perspective of volitional control. Relying on the theory of planned behavior and the theory of action control, we investigate four types of volitional control mechanisms that may impact people's knowledge-sharing practices. Our results show that in knowledge-management-based knowledge sharing, people do not always perform in a manner consistent with their espoused beliefs. This intention-action inconsistency can be explained by perceived self-efficacy, but not by intention and controllability. In addition, a person's action/state orientation moderates the enactment of his or her knowledge-sharing intentions into behavior. The main theoretical implication of this study is that the study of knowledge-management-based knowledge sharing has to go beyond intention to include the investigation of both the actual behaviors of knowledge sharing and the volitional control constructs that predict these behaviors. Furthermore, previous research has shown that the environment interacts reciprocally with individuals' psychological control mechanisms in regulating their behaviors. Thus, management must focus on the social and cultural attributes of organizational settings that may strengthen people's volitional control in practicing knowledge sharing. Information technology provides resources with which human actors can change the patterns of social action in which they participate. Studies of genre change have been among those that have focused on such change. Those studies, however, have tended not to focus on creative genres. 
Producers in creative genres often produce their work in an atmosphere with little or no central control and tend to hold values that set their work apart from work found in commercial settings. This study focuses on information technology use by literary authors and asks whether literary authors are likely to use information technology in ways that will reinforce or alter the traditional values seen in American literary publishing. Affect factors have gained researchers' attention in a number of fields. The Information Systems (IS) literature, however, shows some gaps and inconsistencies regarding the role of affect factors in human-computer interaction. Building upon prior research, this study aims at a better understanding of affect factors by clarifying their relationships with each other and with other primary user acceptance factors. Two affect variables that are different in nature were examined: computer playfulness (CP) and perceived enjoyment (PE). We theoretically clarified and methodologically verified their mediating effects and causal relationships with other primary factors influencing user technology acceptance, namely perceived ease of use (PEOU), perceived usefulness (PU), and behavioral intention (BI). Quantitative data were analyzed using R.M. Baron and D. Kenny's (1986) method for mediating effects and P.R. Cohen, A. Carlsson, L. Ballesteros, and R.S. Amant's (1993) path analysis method for causal relationships. These analyses largely supported our hypotheses. Results from this research indicate that a PE -> PEOU causal direction is favored, and PEOU partially mediates PE's impact on PU whereas PE fully mediates CP's impact on PEOU. With the increased interest in various affect factors in user technology acceptance and use, our study sheds light on the role of affect factors from both theoretical and methodological perspectives. Practical implications are discussed as well. 
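The Baron and Kenny (1986) mediation logic cited in the preceding abstract can be illustrated with three regressions: the mediator is regressed on the predictor, and the outcome is regressed on the predictor with and without the mediator; partial mediation shows up as a shrunken but still nonzero direct effect. A minimal sketch on synthetic data; the variable names only mirror the study's constructs, and the effect sizes are invented:

```python
import numpy as np

def slope_coefs(X, y):
    """Ordinary least-squares coefficients with an intercept (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 500
pe = rng.normal(size=n)                                      # predictor (cf. PE)
peou = 0.8 * pe + rng.normal(scale=0.5, size=n)              # mediator (cf. PEOU)
pu = 0.6 * peou + 0.2 * pe + rng.normal(scale=0.5, size=n)   # outcome (cf. PU)

total = slope_coefs(pe.reshape(-1, 1), pu)[1]                # total effect of predictor
a = slope_coefs(pe.reshape(-1, 1), peou)[1]                  # predictor -> mediator
direct, b = slope_coefs(np.column_stack([pe, peou]), pu)[1:] # direct effect, mediator effect

# With these synthetic effects, the direct effect is smaller than the
# total effect but still positive: partial mediation.
print(total > direct and direct > 0)
```

Full mediation would correspond to the direct effect vanishing once the mediator is included, which is the pattern the abstract reports for CP's impact on PEOU via PE.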
The order effect of relevance judgment refers to the different relevance perceptions of a document when it appears in different positions in a list. Although the order effect of relevance judgment has significant theoretical and practical implications, the extant literature is inconclusive regarding its existence and forming mechanisms. This study proposes a set of order effect forming mechanisms, including the learning effect, the subneed scheduling effect, and the cursoriness effect, based on the conceptualization of dynamic relevance and the psychology of cognitive elaboration. Our empirical study indicates that in an interactive information retrieval setting, when a document list is reasonably long, order effects demonstrate a curvilinear pattern that conforms to the combined effect of the three mechanisms. Moreover, the curvilinear pattern of the order effect can differ for documents of different relevance levels. In this article, for any group of authors, we define three different h-indices. First, there is the successive h-index h(2) based on the ranked list of authors and their h-indices h, as defined by Schubert (2007). Next, there is the h-index h(p) based on the ranked list of authors and their number of publications. Finally, there is the h-index h(c) based on the ranked list of authors and their number of citations. We present formulae for these three indices in Lotkaian informetrics, from which it also follows that h(2) < h(p) < h(c). We give a concrete example of a group of 167 authors on the topic "optical flow estimation." Besides these three h-indices, we also calculate the pairwise Spearman rank correlation coefficients and prove that these rankings are significantly related. This study develops evaluation indicators for a consortium of Korean institutional repositories called "dCollection" and validates the indicators against actual data from the participants of this consortium. 
The literature review reveals a conceptual framework for institutional repository (IR) evaluation with four categories and 19 items. In developing the initial framework, equal emphasis is put on the assessment of procedural achievement and actual performance, to pinpoint the procedural weaknesses of each IR and to determine its customized solution. A three-round Delphi study with IR librarians reveals a converging tendency in the measures of importance ascribed to the categories and items. Through a focus-group interview with middle- to top-level managers, 39 indicators derived from the 19 items are identified as possessing relevancy, measurability, data availability, and differentiability. Validation of the evaluation indicators employs actual evaluation data from 32 university IRs. Factor analysis shows a simpler structural pattern, containing 12 factors, than the 19-item structure of the conceptual framework. Correlation analysis using the factor scores identifies six key factors: Registration Rate, Archiving, Resource Allocation, System Performance, Multifunctionality, and Use Rate. The results from regression analyses suggest that two different approaches can be employed to promote the Use Rate factor. In the content-oriented approach, the Registration Rate factor is crucial, while in the policy-oriented approach the Archiving factor assumes this role; however, the System Performance factor plays a mediating role for the key factors, thus forming a contingency for either approach. The main purpose of this article is to explore the determinants of book borrowing demand from local public libraries in Norway, using balanced panel data for the period 2001-2004. The more striking results of these calculations show the basic differences between children and adults in the effects of the main borrowing determinants. 
While income is quite important and the shadow price quite unimportant for children, the opposite is true for adults. A likely explanation of this finding is that the real shadow price is higher for adults and that it is also higher in communities with high income levels. It was found that both the stock and the growth of the stock are important factors for book loans as well as loans of other media. There is a basic difference, however, in the effect of books on the demand for other media as compared with the opposite cross-effect: the book stock has a substantial and significantly negative impact on the demand for other media. Taken at face value, this finding implies that there is actually a crowding-out effect of books on other media, while it is usually expected to be the other way around. In a previous article, we introduced a general transformation on sources and one on items in an arbitrary information production process (IPP). In this article, we investigate the influence of these transformations on the h-index and on the g-index. General formulae that describe this influence are presented. These are applied to the case that the size-frequency function is Lotkaian (i.e., is a decreasing power function). We further show that the h-index of the transformed IPP belongs to the interval bounded by the two transformations of the h-index of the original IPP, and we also show that this property is not true for the g-index. This study surveyed 536 CAL publications in 71 SSCI (Social Science Citation Index) journals from 1998 to February 2006 to identify trends and lacunae for future research. The parameters and keywords employed by the authors are first presented, followed by a description of the study's general findings. 
A comparison is then drawn between CAL and recent depictions of the "blogosphere," for the majority of the contributors to the field produced only a few articles, and authors of individual publications demonstrated a far greater collective influence on the field than the more frequently cited authors. Results also revealed that the number of articles pertaining to the aged, disabled children, and home schooling was significantly lower than the number relating to school students' learning. This study offers an interesting snapshot of a field that is apparently on the rise; moreover, it raises some issues to be addressed in further research on CAL-related topics. In 2007, the social networking Web site MySpace apparently overtook Google as the most visited Web site for U.S. Web users. If this heralds a new era of widespread online social networking, then it is important to investigate user behaviour and attributes. Although there has been some research into social networking already, basic demographic data are essential to set previous results in a wider context and to give insights to researchers, marketers and developers. In this article, the demographics of MySpace members are explored through data extracted from two samples of 15,043 and 7,627 member profiles. The median declared age of users was surprisingly high at 21, with a small majority of females. The analysis confirmed some previously reported findings and conjectures about social networking, for example, that female members tend to be more interested in friendship and males more interested in dating. In addition, there was some evidence of three different friending dynamics, oriented towards close friends, acquaintances, or strangers. Perhaps unsurprisingly, female and younger members had more friends than others, and females were more likely to maintain private profiles, but both males and females seemed to prefer female friends, with this tendency more marked in females for their closest friend. 
The typical MySpace user is apparently female, 21, single, with a public profile, interested in online friendship and logging on weekly to engage with a mixed list of mainly female "friends" who are predominantly acquaintances. In this article, the authors report a series of evaluations of two metadata schemes developed for Moving Image Collections (MIC), an integrated online catalog of moving images. Through two online surveys and one experiment spanning various stages of metadata implementation, the MIC evaluation team explored a user-centered approach in which the four generic user tasks suggested by the IFLA FRBR (International Federation of Library Associations' Functional Requirements for Bibliographic Records) were embedded in data collection and analyses. Diverse groups of users rated the usefulness of individual metadata fields for finding, identifying, selecting, and obtaining moving images. The results demonstrate a consistency across these evaluations with respect to (a) identification of a set of useful metadata fields highly rated by target users for each of the FRBR generic tasks, and (b) indication of a significant interaction between MIC metadata fields and the FRBR generic tasks. The findings provide timely feedback for the MIC implementation specifically, and valuable suggestions for other similar metadata application settings in general. They also suggest the feasibility of using the four IFLA FRBR generic tasks as a framework for user-centered functional metadata evaluations. While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the "Dark Web," the reverse side of the Web used by terrorists. Currently, the problems of information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. 
To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collecting and analyzing information on 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, having a high potential to assist in the investigation and understanding of terrorist activities by producing results that could potentially help guide both policymaking and intelligence research. In reference to an exemplary bibliometric publication and citation analysis for a university Department of Psychology, some general conceptual and methodological considerations on the evaluation of university departments and their scientists are presented. Data refer to publication and citation-by-others analyses (PsycINFO, PSYNDEX, SSCI, and SCI) for 36 professorial and non-professorial scientists from the tenured staff of the department under study, as well as confidential interviews on self- and colleague-perceptions with seven members of the sample under study. 
The results point to (1) skewed (Pareto) distributions of all bibliometric variables, demanding nonparametric statistical analyses, (2) three outliers (the same persons across variables) which must be excluded from some statistical analyses, (3) rather low rank-order correlations of publication and citation frequencies, having approximately 15% common variance, (4) only weak interdependences of bibliometric variables with age, occupational experience, gender, academic status, and engagement in basic versus applied research, (5) the empirical appropriateness and utility of a normative typological model for the evaluation of scientists' research productivity and impact, which is based on cross-classifications with reference to the number of publications and the frequency of citations by other authors, and (6) low interrater reliabilities and validity of ad hoc evaluations within the department's staff. Conclusions refer to the utility of bibliometric data for external peer reviewing and for feedback within scientific departments, in order to make colleague-perceptions more reliable and valid. What science is, is not only intellectually interesting but also politically crucial for the proper allocation of budgets. As science does not define itself, and only philosophy defines everything including science, this paper first sketches the philosophical view of science. Then, hypotheses are presented as to what definition is actually given to science by scientific circles themselves. The hypotheses are tested in a scientometric way by observing the trend in the magazine Science. Unexpected results are obtained. The actual trend in Science does not reflect what has long been considered about science. Specifically, chemistry is at the top in the number of papers, far above physics. More papers are in the historical sciences (part of the humanities) than in mathematics, computer science and social science. 
It is discussed in what respect chemistry is the most scientific, and in what respect the humanities are more scientific than the three above-mentioned scientific fields. It is interpreted that, of the two aspects in Galilei's view of science (metodo compositivo and metodo risolutivo), the latter (empirical solution of problems by using technical instruments) dominates the former (systematic theory using mathematics) in Science. This study evaluates trends in the quality of nanotechnology and nanoscience papers produced by South Korean authors. The metric used to gauge quality is the ratio of highly cited nanotechnology papers to total nanotechnology papers produced in sequential time frames. In the first part of this paper, citations (and publications) for nanotechnology documents published by major producing nations and major producing global institutions in four uneven time frames are examined. All nanotechnology documents in the Science Citation Index [SCI, 2006] for 1998, 1999-2000, 2001-2002, 2003 were retrieved and analyzed in March 2007. In the second part of this paper, all the nanotechnology documents produced by South Korean institutions were retrieved and examined. All nanotechnology documents produced in South Korea (each document had at least one author with a South Korean address) in each of the above time frames were retrieved and analyzed. The South Korean institutions were extracted, and their fraction of total highly cited documents was compared to their fraction of total published documents. Non-Korean institutions that co-authored papers were included as well, to offer some perspective on the value of collaboration. The Open Access movement has proven capable of enhancing the recognition of scientific outputs by improving their visibility. 
However, it is not clear how different entities benefit from the Open Access advantage, because the recognition process is subject to psychological and practical biases, resulting in an unequal distribution of citations among entities. These biases may be exacerbated in the Open Access world, e.g. due to scientists' uncertainty about the quality of Open Access materials, or the quantitatively or qualitatively unequal presence of countries. Consequently, although Open Access enables publications to achieve their potential citations, it may also increase inequalities, further depriving the already deprived "have-nots". To illuminate how countries are benefiting from the Open Access advantage, this study compares the citation performance of the world's countries in two journal sets, i.e. Open Access and non-Open Access journals. The results of the analyses conducted at subject field level show that the overall recognition gap between the developed and less-developed blocks is widened by publishing in Open Access journals. An examination of individual countries' performances confirms this finding, revealing that open-access-advantaged nations are mainly developed ones. However, some open-access-advantaged instances come from the less-developed block, which may be an early sign of the potential of Open Access to win recognition for "lost sciences" and, in the future, to narrow the gap. To improve the quality of journals in Taiwan, the National Science Council (NSC) of the Republic of China evaluates journals in the fields of humanities and social sciences periodically. This paper describes the evaluation of 46 management journals conducted by the authors, as authorized by the NSC. 
Both a subjective approach, with judgments solicited from 345 experts, and an objective approach, with data collected on four indicators (journal cross-citation, dissertation citation, authors' scholastic reputation, and author diversity), were used to make a comprehensive evaluation. Performance on the four indicators was aggregated using weights which were most favourable to all journals, in a compromise sense, to produce the composite indices. The subjective evaluation reflects the general image, or reputation, of journals while the objective evaluation discloses blind spots which have been overlooked by experts. The results show that using either approach alone would have produced misleading results, which suggests that both approaches should be used. All of the editors of the journals being evaluated agreed that the evaluation was appropriate and the results are reasonable. In a general framework, given a set of articles and their received citations (time periods of publication or citation are not important here), one can define the impact factor (IF) as the total number of received citations divided by the total number of publications (articles). The uncitedness factor (UF) is defined as the fraction of the articles that received no citations. It is intuitively clear that IF should be a decreasing function of UF. This is confirmed by the results in [VAN LEEUWEN & MOED, 2005], but all the given examples show a typical shape, seldom seen in informetrics: a horizontal S-shape (first convex then concave). Adopting a simple model for the publication-citation relation, we prove this horizontal S-shape in this paper, showing that this general functional relationship can be explained. Scientific journals represent a significant and growing part of library collections, and many researchers have attempted to measure their use by various methodological approaches. 
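The impact factor (IF) and uncitedness factor (UF) defined in the preceding abstract admit a direct computation; a minimal sketch, with hypothetical citation counts:

```python
def impact_factor(citations):
    """IF: total received citations divided by the total number of articles."""
    return sum(citations) / len(citations)

def uncitedness_factor(citations):
    """UF: fraction of the articles that received no citations."""
    return sum(1 for c in citations if c == 0) / len(citations)

# Hypothetical citation counts for a set of ten articles.
counts = [0, 0, 0, 1, 2, 2, 3, 5, 8, 19]
print(impact_factor(counts))      # 4.0
print(uncitedness_factor(counts)) # 0.3
```

Computed over many such article sets, plotting IF against UF would trace the decreasing (here, horizontally S-shaped) relationship the abstract discusses.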
In this paper, the author reviews the methodologies employed by researchers working on scientific journal usage. It aims to present an overall picture of the research methods used in the area, in a way that will be of value to anyone seeking to study scientific journals. The author reviews the four main research methodologies used for profiling scientific journal usage: questionnaires, interviews, citation analysis and transaction log analysis. In this paper, we present several modifications of the classical PageRank formula adapted for bibliographic networks. Our versions of PageRank take into account not only the citation but also the co-authorship graph. We verify the viability of our algorithms by applying them to the data from the DBLP digital library and by comparing the resulting ranks of the winners of the ACM E. F. Codd Innovations Award. Rankings based on both the citation and co-authorship information turn out to be "better" than the standard PageRank ranking. The journal set which provides a representation of nanoscience and nanotechnology at the interfaces among applied physics, chemistry, and the life sciences is developing rapidly because of the introduction of new journals. The relevant contributions of nations can be expected to change according to the representations of the relevant interfaces among journal sets. In the 2005 set the position of the USA decreased more than in the 2004 set, while the EU-27 gained in terms of its percentage of world share of citations. The tag "Y01N", which was newly added to the EU classification system for patents, allows for the visualization of national profiles of nanotechnology in terms of relevant patents and patent classes. Factors contributing to citation impact in social-personality psychology were examined in a bibliometric study of articles published in the field's three major journals. 
Impact was operationalized as citations accrued over 10 years by 308 articles published in 1996, and predictors were assessed using multiple databases and trained coders. Predictors included author characteristics (i.e., number, gender, nationality, eminence), institutional factors (i.e., university prestige, journal prestige, grant support), features of article organization (i.e., title characteristics, number of studies, figures and tables, number and recency of references), and research approach (i.e., topic area, methodology). Multivariate analyses demonstrated several strong predictors of impact, including first author eminence, having a more senior later author, journal prestige, article length, and number and recency of references. Many other variables (e.g., author gender and nationality, collaboration, university prestige, grant support, title catchiness, number of studies, experimental vs. correlational methodology, topic area) did not predict impact. The g index was introduced by Leo Egghe as an improvement of Hirsch's index h for measuring the overall citation record of a set of articles. It takes the highly skewed frequency distribution of citations into account better than the h index does. I propose to sharpen this g index by excluding the self-citations. I have worked out nine practical cases in physics and compare the h and g values with and without self-citations. As expected, the g index characterizes the data set better than the h index. The influence of the self-citations appears to be more significant for the g index than for the h index. University web sites play an important role in facilitating a wide range of types of communication. This paper reports an analysis of international academic linking in Europe, with particular reference to European Union (EU) integration. The Microsoft search service was used to calculate international interlinking to and from universities. 
Four different web topologies were found for the link structure data and poorly connected countries were identified. The results show the expected EU dominance of the large richer Western European nations, particularly the UK and Germany. The new EU countries are not yet integrated into the EU web, but some show strong regional connections. The present study explores the characteristics of the hydrogen energy literature from 1965 to 2005, based on the Science Citation Index Expanded (SCIE) database, and its implications, using bibliometric techniques. The results of this work reveal that the literature on hydrogen energy has grown exponentially, with an annual growth rate of about 18% over the last decade. Most documents are journal articles or meeting abstracts, constituting 90.17% of the total literature, and English is the predominant language (94.66%). USA, Japan and China are the three biggest contributors to hydrogen energy literature publishing, with 25.8%, 14.9% and 7.7%, respectively. The Chinese Academy of Sciences in China is the largest contributor, publishing 308 papers. The journal literature on hydrogen energy does not confirm the typical S-shape for the Bradford-Zipf plot, but five core journals, i.e. International Journal of Hydrogen Energy, Journal of Power Sources, Journal of the Electrochemical Society, Solid State Ionics, and Electrochimica Acta, together contributing about 41%, can be identified. Journals with highly cited articles and the most highly cited articles are also identified, among which the most highly cited article has received more than 1,000 citations. During the last decade, we have witnessed a sustained growth of South Korea's research output in terms of the world share of publications in the Science Citation Index database. However, Korea's citation performance is not yet as competitive as its publication performance. 
In this study, the authors examine the intellectual structure of the Korean S&T field based on a social network analysis of journal-journal citation data, using the ten Korean SCI journals as seed journals. The results reveal that Korean SCI journals function more as publication venues than as research channels or information sources among national scientists. Thus, these journals may provide Korean scholars with access to international scientific communities by easing the respective entry barriers. However, no citation relations arise from their shared Korean background. Furthermore, we draw some policy implications which may help increase Korea's research potential. The quality and value of a patent can be represented by several proxies, such as how often the patent is cited in other patents, whether it is licensed, and the age of the patent. The paper uses a binary choice model to investigate factors affecting patent licensing, and it uses double-bounded tobit and duration models to investigate factors affecting patent life. Explanatory variables and dependent variables are extracted from U.S. patent information and related data. Findings suggest that research collaboration has a positive effect on both patent licensing and patent life. Other characteristics, such as invention size, namely the scope of the invention measured by the number of claims, and organizational technological cumulativeness, measured by self-citation counts, also affect patent life. The mobility of Spanish biochemists from Europe to the United States over the past 80 years (1927-2006) is approached from a historical perspective. The academic community in human genetics has recognized this emigrant Spanish community with the Nobel Prize as well as other awards from European foundations. The vertical/horizontal integration methodology offers an opportunity to understand the extremely satisfactory history of a small European community overseas. 
To piece the puzzle together, continuous reference is made to systems theory. To test and use this holistic history, the circulation of the knowledge produced on cancer has been studied as intrinsically related to time, using algorithmic historiography. Francisco Duran Reynals and Severo Ochoa have been selected as examples of vertical integration: the former because he was the director of an important collaborator, his own wife; the latter as the founder of a specifically Spanish research school in America based on his own work. The simultaneous stay of several young Spanish scientists at Columbia University (Mariano Barbacid, Manuel Perucho and Angel Pellicer) serves to trace the horizontal integration, to create a holon hierarchy reflecting the criteria of subsidiarity and acceptability, and to focus on the Spanish discoveries and contributions to cancer research. The transatlantic flows of knowledge generated by the Spanish elite of biochemists in the USA from 1927 on define a network of geographical displacements. As a result, the social structure visualizes the international mobility of scientists who leave Europe for the USA, and their return to Spain. A model of the brain drain of professionals to the USA, which retains 80% of the Spanish cancer researchers, is developed. We analyze the temporal evolution of emerging fields within several scientific disciplines in terms of numbers of authors and publications. From bibliographic searches we construct databases of authors, papers, and their dates of publication. We show that the temporal development of each field, while different in detail, is well described by population contagion models, suitably adapted from epidemiology to reflect the dynamics of scientific interaction. Dynamical parameters are estimated and discussed to reflect fundamental characteristics of the field, such as time of apprenticeship and recruitment rate. 
We also show that fields are characterized by simple scaling laws relating numbers of new publications to new authors, with exponents that reflect increasing or decreasing returns in scientific productivity. Conference proceedings are one of the key communication channels in computer science. This paper aims to analyze Chinese output in the context of conference papers in computer science through an exploration of the conference proceedings series Lecture Notes in Computer Science (LNCS) in the period 1997-2005. Results indicate that: 1. The number of Chinese papers in LNCS kept growing in the studied period, and the share of Chinese papers in LNCS in recent years is much higher than that of Chinese SCI papers in the world; in sharp contrast with this remarkable growth, the share of SCI articles in top journals of computer science published by scientists of mainland China is negligible during the same period. 2. Chinese researchers are more likely to collaborate with domestic fellows. 3. In spite of the increasing number of Chinese papers in LNCS, they receive only a few citations. 4. The articles are cited strikingly more often by their own authors, and international authors' citations exceed Chinese authors' non-self-citations in the first three years after publication. 5. Based on the new Impact Index (II) indicator the authors propose, the relative impact of Chinese articles in LNCS is increasing, although the average impact of Chinese papers in LNCS is clearly lower than that of LNCS publications overall in each year of the studied period. It is well known from previous research that R&D collaboration among economic actors for knowledge production is very important. An accompanying analysis of the impact of R&D collaboration on innovative performance has to be conducted for transferring knowledge to the globalized knowledge-based economy. 
When we investigated previous research concerning R&D collaboration, we found some limitations in the analysis methodology. In order to overcome these limitations, we applied a Bayesian network to analyze the impact of R&D collaboration in Korean firms on their innovative performance. The increase in co-authored papers is a recognized fact. At the same time, the factors influencing this change are not well known. This article aims at studying the patterns of EU science co-authorship. We analyzed articles published in 18 EU countries and their intra-EU (within the EU) and extra-EU (with partners outside the EU) co-publication patterns in five scientific fields. The results point to a Europeanisation of shared co-authorship rather than an internationalisation outside Europe. Smaller countries co-authored more with other EU countries than bigger countries did, while the co-authorship rate with extra-EU partners did not depend on a country's size. The co-authorship patterns were also found to depend on the scientific field. Engineering and Computing & Technology was the field with the highest level of national publications, and Physical, Chemical & Earth Sciences the field with the highest level of both intra-EU and extra-EU collaboration. These results support the view that a single market for research is developing within the EU, with a seamless extension of national systems into those of other Member States. Q-measures express the bridging function of nodes in a network subdivided into two groups. An approach to Q-measures in the context of weighted or valued directed networks is proposed. This new approach uses flow centrality as the main concept. Simple examples illustrate the definition. In this paper we examine the applicability of the concept of the h-index to topics, where a topic has index h if there are h publications that received at least h citations each and the rest of the publications on the topic received at most h citations each. 
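The topic-level definition just quoted is computationally the same as Hirsch's original index, only applied to the publications on a topic; a minimal sketch, with hypothetical citation counts:

```python
def h_index(citations):
    """Largest h such that h publications received at least h citations each."""
    h = 0
    # Rank publications by citations in decreasing order; h is the last rank
    # at which the citation count still reaches the rank.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for the publications on one topic.
print(h_index([10, 8, 5, 4, 3, 2]))  # 4
```

Here four publications have at least 4 citations each, and no five publications have at least 5 each, so the topic's index is 4.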
We discuss methodological issues related to the computation of the h-index of topics (denoted the h-b index by BANKS [2006]). Data collection for computing the h-b index is much more complex than for authors, research groups and/or journals, and has several limitations. We demonstrate the methods on a number of informetric topics, among them the h-index itself. Previous studies have shown that hybrid clustering methods that incorporate textual content and bibliometric information can outperform clustering methods that use only one of these components. In this paper we apply a hybrid clustering method based on Fisher's inverse chi-square to integrate full text with citations and to provide a mapping of the field of information science. We quantitatively and qualitatively assess the added value of such an integrated analysis and we investigate whether the clustering outcome is a better representation of the field by comparing it with a text-only clustering and with another hybrid method based on a linear combination of distance matrices. Our data set consists of almost 1,000 articles and notes published in the period 2002-2004 in 5 representative journals. The optimal number of clusters for the field is 5, determined by using a combination of distance-based and stability-based methods. Term networks present the cognitive structure of the field and are complemented by the most representative publications. Three large traditional sub-disciplines (information retrieval, bibliometrics/scientometrics, and the more social aspects of the field) and two smaller clusters (patent analysis and webometrics) can be distinguished. This paper examines the probability structure of the 2005 Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) Journal Citation Reports (JCR) by analyzing the Impact Factor distributions of their journals. 
The distribution of the SCI journals corresponded with a distribution generally modeled by the negative binomial distribution, whereas the SSCI distribution fit the Poisson distribution modeling random, rare events. Both Impact Factor distributions were positively skewed (the SCI much more so than the SSCI), indicating excess variance. One of the causes of this excess variance was that the journals highest in Impact Factor in both JCRs tended to fall into subject categories well funded by the National Institutes of Health. The main reason for the SCI Impact Factor distribution being more skewed than the SSCI one was that review journals defining disciplinary paradigms play a much more important role in the sciences than in the social sciences. Information search and retrieval interactions usually involve information content in the form of document collections, information retrieval systems and interfaces, and the user. To fully understand information search and retrieval interactions between users' cognitive space and the information space, researchers need to turn to cognitive models and theories. In this article, the authors use one of these theories, the basic level theory. Use of the basic level theory to understand human categorization is both appropriate and essential to user-centered design of taxonomies, ontologies, browsing interfaces, and other indexing tools and systems. Analyses of data from two studies involving free sorting of 100 images by 105 participants were conducted. The types of categories formed and the category labels were examined. Results of the analyses indicate that image category labels generally range from the superordinate to the basic level, and are generic and interpretive. Implications for research on theories of cognition and categorization, and for the design of image indexing, retrieval and browsing systems, are discussed. 
This study assessed the combination of local citation analysis and a survey of journal use and reading patterns for evaluating an academic library's research collection. Journal articles and their cited references from faculties at the University of New South Wales were downloaded from the Web of Science (WoS), and journal impact factors from the Journal Citation Reports. The survey of University of New South Wales (UNSW) academic staff asked both reader-related and reading-related questions. Both methods showed that academics in medicine published more and had more coauthors per paper than academics in the other faculties; however, when correlated with the number of students and academic staff, science published more and engineering published in higher-impact journals. When "recalled" numbers of articles published were compared to "actual" numbers, all faculties over-estimated their productivity by nearly twofold. The distribution of cited serial references was highly skewed, with over half of the titles cited only once. The survey results corresponded with U.S. university surveys with one exception: engineering academics reported the highest number of article readings and read mostly for research-related activities. Citation analysis data showed that the UNSW library provided the majority of the journals in which researchers published and which they cited, mostly in electronic formats. However, the availability of non-journal cited sources was low. The joint methods provided both confirmatory and contradictory results and proved useful in evaluating library research collections. Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naive Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. 
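A vocabulary-based naive Bayes difficulty classifier of the kind just described can be sketched as follows. This is a generic reconstruction, not the authors' implementation: the miniature three-level training set, the whitespace tokenization, and the Laplace smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, level) pairs. Returns class priors, per-class
    word counts, and the overall vocabulary."""
    priors, words = Counter(), defaultdict(Counter)
    for text, level in docs:
        priors[level] += 1
        words[level].update(text.lower().split())
    vocab = {w for counter in words.values() for w in counter}
    return priors, words, vocab

def classify(text, priors, words, vocab):
    """Pick the difficulty level with the highest posterior log-probability."""
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for level in priors:
        lp = math.log(priors[level] / total)
        denom = sum(words[level].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((words[level][w] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = level, lp
    return best

# Hypothetical miniature training set with three difficulty levels.
docs = [
    ("take this medicine with food", "easy"),
    ("see your doctor if pain continues", "easy"),
    ("dosage depends on body weight and age", "intermediate"),
    ("side effects may include dizziness and nausea", "intermediate"),
    ("contraindicated in hepatic insufficiency", "difficult"),
    ("pharmacokinetics of the compound remain uncharacterized", "difficult"),
]
model = train(docs)
print(classify("ask your doctor about this medicine", *model))  # easy
```

The real classifier was trained on a far larger labeled corpus; the point of the sketch is only that difficulty is inferred from vocabulary frequencies rather than from sentence-length-based readability formulas.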
We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to the current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did. The international standard ISO 214:1976 defines an abstract as "an abbreviated, accurate representation of the contents of a document" (p. 1) that should "enable readers to identify the basic content of a document quickly and accurately to determine relevance" (p. 1). It also should be useful in computerized searching. The ISO standard suggests including the following elements: purpose, methods, results, and conclusions. Researchers have often challenged this structure and found that different disciplines and cultures prefer different information content. These claims are partially supported by the findings of our research into the structure of pharmacology, sociology, and Slovenian language and literature abstracts of papers published in international and Slovenian scientific periodicals. The three disciplines have different information content. Slovenian pharmacology abstracts differ in content from those in international periodicals, while the differences between international and Slovenian abstracts are small in sociology. In the field of Slovenian language and literature, only domestic abstracts were studied. 
The identified differences can in part be attributed to the disciplines, but also to the different roles of journals and papers in the professional society and to differences in the perception of the role of abstracts. The findings raise questions about the structure of abstracts required by some publishers of international journals. Research evaluation is increasingly popular and important among research funding bodies and science policy makers. Various indicators have been proposed to evaluate the standing of individual scientists, institutions, journals, or countries. A simple and popular one among these indicators is the h-index, the Hirsch index (Hirsch 2005), which is an indicator of the lifetime achievement of a scholar. Several other indicators have been proposed to complement or balance the h-index. However, these indicators have no conception of aging. The AR-index (Jin et al. 2007) incorporates aging but divides the received citation counts by the raw age of the publication. Consequently, the decay of a publication is very steep and insensitive to disciplinary differences. In addition, we believe that a publication becomes outdated only when it is no longer cited, not because of its age. Finally, all indicators treat citations as equally material, when one might reasonably think that a citation from a heavily cited publication should weigh more than a citation from a non-cited or little-cited publication. We propose a new indicator, the Discounted Cumulated Impact (DCI) index, which devalues old citations in a smooth way. It rewards an author for receiving new citations even if the publication is old. Further, it allows weighting of the citations by the citation weight of the citing publication. DCI can be used to calculate research performance on the basis of the h-core of a scholar or any other publication data set. Finally, it supports comparing research performance to the average performance in the domain, and across domains as well. 
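The exact DCI formula is not reproduced in the abstract above, but its ingredients (smooth devaluation by citation age, reward for new citations even to old publications, and weighting by the citing publication's own citation weight) can be illustrated with an assumed logarithmic discount. Both the discount function and the sample data below are illustrative assumptions, not the authors' definition:

```python
import math

def dci_sketch(citations, b=2):
    """DCI-style score sketch: each citation contributes its citing
    publication's weight, devalued smoothly by the citation's age.
    The log-based discount (in the spirit of discounted cumulated gain)
    is an assumption, not the published DCI formula."""
    return sum(w / (1 + math.log(1 + age, b)) for age, w in citations)

# Hypothetical citations as (years since the citation was made,
# citation weight of the citing publication).
recent_from_heavily_cited = [(1, 3.0), (2, 3.0)]
old_from_little_cited = [(9, 1.0), (10, 1.0)]
print(dci_sketch(recent_from_heavily_cited) > dci_sketch(old_from_little_cited))  # True
```

The key behaviors survive the simplification: a recent citation from a heavily cited source counts for much more than an old citation from a little-cited one, and an old publication still gains score whenever it attracts a fresh citation.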
This study utilized bibliometric tools to analyze the relationship between two separate, but related, fields: Library and Information Science (LIS) and Management Information Systems (MIS). The top-ranked 48 journals in each field were used as the unit of analysis. Using these journals, field cocitation was introduced as a method for evaluating the relationships between the two fields. The three-phased study evaluated (a) the knowledge imported/exported between LIS and MIS, (b) the body of knowledge influenced by both fields, and (c) the overlap between the fields as demonstrated by multidimensional scaling. Data collection and analysis were performed using DIALOG and SPSS programs. The primary findings from this study indicate that (a) the MIS impact on LIS is greater than the reverse, (b) there is a growing trend toward shared impact between the two disciplines, and (c) the area of overlap between the two fields is predominately those journals focusing on technology systems and digital information. Additionally, this study validated field cocitation as a method by which to evaluate relationships between fields. In this article we offer a new approach to evaluating Organizational Memory (OM). Our proposed evaluation methodology, named KnowledgeEco, is based on an ontology for the domain of OM. Its key steps are: 1) mapping the OM in the evaluated organization onto the ontology concepts; 2) noting which entities from the ontology are missing in the OM; and 3) applying a series of rules that help assess the impact of the OM on organizational learning. This systematic evaluation thus helps to propose ways to improve the evaluated OM. We present three case studies that demonstrate the feasibility of KnowledgeEco for evaluating OM and for suggesting improvements. We also identify some weaknesses in the OMs common to the three organizations cited in the case studies. 
Finally, we discuss how the KnowledgeEco ontology-based methodology establishes utility and contributes to further research in the field of OM. Fundamental mathematical properties of rhythm sequences are studied. In particular, a set of three axioms for valid rhythm indicators is proposed, and it is shown that the R-indicator satisfies only two out of three, but that the R'-indicator satisfies all three. This fills a critical logical gap in the study of these indicator sequences. Matrices leading to a constant R'-sequence are called baseline matrices. They are characterized as matrices with constant w-year diachronous impact factors. The relation with classical impact factors is clarified. Using regression analysis, matrices with a rhythm sequence that is on average equal to 1 (smaller than 1, larger than 1) are characterized. Text representation, central to information processing, must be descriptive and discriminative. Although some of the many techniques to construct document representations may outperform others for certain tasks, none is consistently better than the others. Representations are still problematic. Evaluation techniques are needed to penetrate foundational questions about term behavior in representation. A study that applies the shape recovery analysis method as an evaluative tool to compare different indexing schemes is reported here. Three weight coefficients are used to rank indexing terms and are compared to the documents' full text. Two of the weight coefficients are novel and the third relies on the chi-squared distribution. Multidimensional scaling reduces the dimensional space of the document surrogates to a two-dimensional Cartesian space. Ten concentric circles, evenly separated at 10% intervals of relevant data points starting at the centroid, are used to construct a precision-recall curve. ANOVA is used for a straightforward computation of the 4 x 11 matrix of test data to see whether the four treatments yield the same P-R result. 
A post hoc Tukey HSD multiple comparisons test among pairwise treatments is also used to discover homogeneous groups. The findings show the value of the methodology to study term weighting schemes, and their descriptiveness and discriminative power, as well as the potential strength of the novel coefficients introduced. J.E. Hirsch (2005) introduced the h-index to quantify an individual's scientific research output by the largest number h of a scientist's papers that received at least h citations. To take into account the highly skewed frequency distribution of citations, L. Egghe (2006a) proposed the g-index as an improvement of the h-index. I have worked out 26 practical cases of physicists from the Institute of Physics at Chemnitz University of Technology, and compare the h and g values in this study. It is demonstrated that the g-index discriminates better between different citation patterns. This also can be achieved by evaluating B.H. Jin's (2006) A-index, which reflects the average number of citations in the h-core, and interpreting it in conjunction with the h-index. h and A can be combined into the R-index to measure the h-core's citation intensity. I also have determined the A and R values for the 26 datasets. For a better comparison, I utilize interpolated indices. The correlations between the various indices as well as with the total number of papers and the highest citation counts are discussed. The largest Pearson correlation coefficient is found between g and R. Although the correlation between g and h is relatively strong, the arrangement of the datasets is significantly different depending on whether they are put into order according to the values of either h or g. New mass publishing genres, such as blogs and personal home pages, provide a rich source of social data that is yet to be fully exploited by the social sciences and humanities. 
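The h-, g-, A-, and R-indices described above can all be computed from a plain list of citation counts. A minimal sketch (the citation record below is invented for illustration, not taken from the 26 datasets):

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each (Hirsch, 2005)."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations (Egghe, 2006)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def a_index(citations):
    """Average number of citations of the papers in the h-core (Jin, 2006)."""
    h = h_index(citations)
    return sum(sorted(citations, reverse=True)[:h]) / h if h else 0.0

def r_index(citations):
    """Square root of the total number of citations in the h-core."""
    h = h_index(citations)
    return math.sqrt(sum(sorted(citations, reverse=True)[:h]))

cites = [10, 8, 5, 4, 3, 0]   # hypothetical citation record
# h = 4, g = 5, A = 27/4, R = sqrt(27)
```

Note that R^2 = h * A by construction, which is why the abstract treats the R-index as a combination of h and A.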
Information-centered research (ICR) not only provides a genuinely new and useful information science research model for this type of data, but can also contribute to the emerging e-research infrastructure. Nevertheless, ICR should not be conducted on a purely abstract level, but should relate to potentially relevant problems. Until recently, Chinese texts could not be studied using co-word analysis because the words are not separated by spaces in Chinese (and Japanese). A word can be composed of one or more characters. The online availability of programs that separate Chinese texts makes it possible to analyze them using semantic maps. Chinese characters contain not only information but also meaning. This may enhance the readability of semantic maps. In this study, we analyze 58 words which occur 10 or more times in the 1,652 journal titles of the China Scientific and Technical Papers and Citations Database. The word-occurrence matrix is visualized and factor-analyzed. The paper presents a new national level indicator based on shares of OECD aggregate 'external' patent applications world-wide. It provides the first reliable trend data for patent applications since new patent application procedures were introduced in the 1980s. The trends show a strong correlation with business R&D expenditure (BERD) trend data similarly based on shares of OECD aggregate BERD, reaffirming a relationship observed in previous studies using granted patents. However, the reliability of the current indicator over an extended 20-year period shows that in two cases, the US and UK, there is divergence in correlation over part of the period studied. This aspect of the study provides evidence that the surge in external patenting in the US, over the period 1989 to 1996, is not driven by BERD and strongly suggests public sector research as a driver. This result shows that the new patent applications indicator can monitor factors in national systems not easily observed by other means. 
In this case it shows potential for monitoring the success of policies in driving public sector research towards commercial outcomes. This paper presents a methodology for measuring the technical efficiency of research activities. It is based on the application of data envelopment analysis to bibliometric data on the Italian university system. For that purpose, different input values (research personnel by level and extra funding) and output values (quantity, quality and level of contribution to actual scientific publications) are considered. Our study aims at overcoming some of the limitations connected to the methodologies that have so far been proposed in the literature, in particular by surveying the scientific production of universities by authors' name. This article evaluates the scientific research competitiveness of world universities in computer science. The data source is the Essential Science Indicator (ESI) database with a time span of more than 10 years, from 01/01/1996 to 08/31/2006. We establish a hierarchical indicator system including four primary indicators, which consist of scientific research production, influence, innovation and development, and six secondary indicators, which consist of the number of papers, total citations, highly cited papers, hot papers, average citations per paper and the ratio of highly cited papers to total papers. We then assign them appropriate weights. Based on these, we obtain the rankings of university and country/territory competitiveness in computer science. We hope this paper can contribute to further study in the evaluation of individual subjects or whole universities. This study explores the representation of scientific journals from Italy, Hungary, Slovenia, Croatia, and Serbia and Montenegro in Thomson Scientific's 2005 Journal Citation Reports (JCR). 
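One common way to combine primary and secondary indicators into a single ranking, as in the competitiveness study above, is a weighted sum of normalized indicator values. A minimal sketch with invented entities, values, and weights (the abstract does not disclose the actual weighting scheme):

```python
def composite_ranking(indicators, weights):
    """Rank entities by a weighted sum of min-max normalized indicator values.

    indicators: {indicator_name: {entity: raw_value}}
    weights:    {indicator_name: weight}, assumed here to sum to 1
    """
    scores = {}
    for name, values in indicators.items():
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0   # guard against a constant indicator
        for entity, v in values.items():
            scores[entity] = scores.get(entity, 0.0) + weights[name] * (v - lo) / span
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical numbers for three universities
indicators = {"papers": {"U1": 100, "U2": 50, "U3": 10},
              "citations": {"U1": 400, "U2": 600, "U3": 100}}
weights = {"papers": 0.4, "citations": 0.6}
ranking = composite_ranking(indicators, weights)
```

With these invented weights, U2's citation lead outweighs U1's paper lead, illustrating how the choice of weights drives the final ordering.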
The number of journals covered by JCR was analyzed in relation to scientific productivity of selected countries and the size of their economies, and no apparent relationship between these factors was found. Our findings suggest that other factors, including the quality of individual journals, may influence how many journals a country will have in the JCR. This paper deals with two document-document similarity approaches in the context of science mapping: bibliographic coupling and a text approach based on the number of common abstract stems. We used 43 articles, published in the journal Information Retrieval, as test articles. An information retrieval expert performed a classification of these articles. We used the cosine measure for normalization, and the complete linkage method was used for clustering the articles. A number of article pairs were ranked (1) according to descending normalized coupling strength, and (2) according to descending normalized frequency of common abstract stems. The degree of agreement between the two obtained rankings was low, as measured by Kendall's tau. The agreement between the two cluster solutions, one for each approach, was fairly low, according to the adjusted Rand index. However, there were examples of perfect agreement between the coupling solution and the stems solution. The classification generated by the expert contained larger groups compared to the coupling and stems solutions, and the agreement between the two solutions and the classification was not high. According to the adjusted Rand index, though, the stems solution was a better approximation of the classification than the coupling solution. With respect to cluster quality, the overall Silhouette value was slightly higher for the stems solution. Examples of homogeneous cluster structures, as well as negative Silhouette values, were found with regard to both solutions. 
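The normalized coupling strength used in the similarity study above can be sketched with Salton's cosine: the number of shared references divided by the geometric mean of the two reference-list lengths. The reference identifiers below are invented:

```python
import math

def coupling_strength(refs_a, refs_b):
    """Raw bibliographic coupling: number of cited references two papers share."""
    return len(set(refs_a) & set(refs_b))

def cosine_coupling(refs_a, refs_b):
    """Coupling strength normalized by the cosine (Salton) measure."""
    return coupling_strength(refs_a, refs_b) / math.sqrt(len(set(refs_a)) * len(set(refs_b)))

paper_a = ["r1", "r2", "r3", "r4"]   # hypothetical reference lists
paper_b = ["r2", "r3", "r5"]
# shared references: r2, r3 -> strength 2, cosine 2/sqrt(4*3)
```

The same cosine normalization applies to the common-abstract-stems approach, with stem sets in place of reference sets.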
The expert classification indicates that the field of information retrieval, as represented by one volume of articles published in Information Retrieval, is fairly heterogeneous regarding research themes, since the classification is associated with 15 themes. The complete linkage method, in combination with the upper tail rule, gave rise to a fairly good approximation of the classification with respect to the number of identified groups, especially in case of the stems approach. This study aims to describe international scientific production and collaboration in Epidemiology and Public Health. It is a bibliometric analysis of articles published during 1997-2002 in 39 international journals. The United States has the greatest production in absolute terms, participating in 46% of the articles studied. Next come Great Britain (13.3%) and Canada (6.8%). In all, 34.8% of the articles involved participation by at least one of the 15 European Union countries. After adjustment for population and GDP, the Scandinavian countries, The Netherlands, and Australia hold the leading positions. In terms of collaboration, groups of countries with similar profiles are observed. With the data retrieved from the search engines of Yahoo and SOGOU, this article aims to study the total backlink counts, external backlink counts and the Web Impact Factors for government websites of Chinese provincial capitals. By studying whether the backlink counts and WIFs of websites associate with the comprehensive ratings for these websites, the article demonstrates that the backlink counts can be a better evaluation measure for government websites than WIFs. Finally, this article also discusses the correlation between backlink counts and economic capacity, and illustrates that backlink counts can also be an indicator for economic status. 
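A Web Impact Factor, as used in the backlink study above, is simply an inlink count divided by a page count, and a site's rank by WIF can differ from its rank by raw backlinks. A minimal sketch with invented counts:

```python
def web_impact_factor(external_inlinks, pages):
    """External-link WIF: inlinks from outside the site divided by the site's page count."""
    return external_inlinks / pages if pages else 0.0

# hypothetical (external inlinks, page count) for three government sites
sites = {"cityA": (12000, 4000), "cityB": (9000, 1500), "cityC": (3000, 600)}
wif = {name: web_impact_factor(*counts) for name, counts in sites.items()}
by_backlinks = sorted(sites, key=lambda s: sites[s][0], reverse=True)
by_wif = sorted(wif, key=wif.get, reverse=True)
# cityA leads on raw backlinks, but cityB has the highest WIF
```

The divergence between the two orderings is exactly why the study compares both measures against the comprehensive ratings.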
Since the term "co-link" was put forward, many scholars have done exploratory investigations to prove the applicability and validity of co-link analysis used in mapping internet structure and mining relationships among internet colonies. All of these studies are based on the whole links in the web, called "total co-link analysis". However, there are both substantive and non-substantive links in the web, and the number of the latter outweighs that of the former, which makes the preconditions of total co-link analysis untenable. In this paper, we present "substantive co-link analysis", and believe it is more sound and valid than "total co-link analysis". Then exploratory investigations on both total and substantive co-link analysis are carried out with the sample of 20 academic blogs on Library and Information Science, the results of which support our assumption that "substantive co-link analysis" is more efficient and reasonable than "total co-link analysis". Based on a face-to-face survey of 312 scientists from government research institutes and state universities in two Philippine locations - Los Banos, Laguna and Munoz, Nueva Ecija - we examine how graduate training and digital factors shape the professional network of scientists at the "Global South." Results suggest that scientists prefer face-to-face interaction; there is no compelling evidence that digitally-mediated interaction will replace meaningful face-to-face interaction. What is evident is that, among non-face-to-face modes of communication, a reordering may be in progress. The effect of digital factors - expressed through advanced hardware-software-user interaction skills - is seen in network features pertaining to size, proportion of male and of core-based alters, and locational diversity. International graduate training and ascribed factors (gender and number of children) also configure the professional network of scientists - actors traditionally viewed as the epitome of rationality and objectivity. 
We argue that these factors influence knowledge production through a system of patronage and a culture that celebrates patrifocality. We forward the hypothesis that knowledge production at the "Global South" closely fits Callon's [1995] extended translation model of science. Condensing the work of any academic scientist into a one-dimensional indicator of scientific performance is a difficult problem. Here, we employ Bayesian statistics to analyze several different indicators of scientific performance. Specifically, we determine each indicator's ability to discriminate between scientific authors. Using scaling arguments, we demonstrate that the best of these indicators require approximately 50 papers to draw conclusions regarding long term scientific performance with usefully small statistical uncertainties. Further, the approach described here permits statistical comparison of scientists working in distinct areas of science. Currently the Journal Impact Factor (JIF) attracts considerable attention as a component in the evaluation of the quality of research in and between institutions. This paper reports on a questionnaire study of publishing behaviour and researchers' preferences for seeking new knowledge, and of the possible influence of JIF on these variables. 54 Danish medical researchers active in the field of diabetes research took part. We asked the researchers to prioritise a series of scientific journals with respect to which journals they prefer for publishing research and gaining new knowledge. In addition we requested the researchers to indicate whether or not the JIF of the prioritised journals has had any influence on these decisions. Furthermore we explored the perception of the researchers as to what degree the JIF could be considered a reliable, stable or objective measure for determining the scientific quality of journals. Moreover we asked the researchers to judge the applicability of JIF as a measure for doing research evaluations. 
One remarkable result is that approximately 80% of the researchers share the opinion that JIF does indeed have an influence on which journals they would prefer for publishing. As such we found a statistically significant correlation between how the researchers ranked the journals and the JIF of the ranked journals. Another notable result is that no significant correlation exists between journals where the researchers actually have published papers and journals in which they would prefer to publish in the future, as measured by JIF. This could be taken as an indicator of the actual motivational influence on the publication behaviour of the researchers. That is, the impact factor actually works in our case. It seems that the researchers find it fair and reliable to use the Journal Impact Factor for research evaluation purposes. The relationship between science and technology has been extensively studied from both theoretical and quantitative perspectives. Quantitative studies typically use patents as proxy for technology and scientific papers as proxy for science, and investigate the relationship between the two. Most such studies have been limited to a single discipline or country. In this paper, we investigate science-technology interaction over a broad range of science and technology by identifying and validating a set of 18,251 inventor-authors through matching of rare names obtained from paper and patent data. These inventor-authors are listed as inventors on nearly 56,000 US patents between 2002 and 2006. Analysis of the distribution of these patents over classes shows that this 6.7% sample is a suitable sample for further analysis. In addition, a map of 290 IPC patent subclasses was created, showing the relationship between patent classes and industries as well as the distribution of patent classes with high science orientation and low science orientation. (C) 2008 Elsevier Ltd. All rights reserved. 
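A rank correlation between researchers' journal rankings and JIF, of the kind reported in the questionnaire study above, can be sketched with Spearman's rho; the preference ranks and impact factors below are invented. With rank 1 as the most preferred journal, a strong preference-for-high-JIF association shows up as a rho close to -1:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation; this sketch assumes no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

preference_rank = [1, 2, 3, 4, 5]   # hypothetical: 1 = most preferred journal
jif = [9.1, 7.4, 8.0, 2.2, 1.5]     # hypothetical impact factors
rho = spearman_rho(preference_rank, jif)
```

Here rho comes out strongly negative, i.e. the journals given the best (lowest) preference ranks tend to have the highest JIFs.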
This research presents a new metric to measure and assess the scientific performance of public research institutes, which improves on models based on standard multivariate techniques. These models, called adjusted Research Lab Evaluation (RELEV) models, are successfully applied to Italian public research institutes operating in five scientific fields. In addition, the paper presents a comparison between this method and Data Envelopment Analysis to show some analogies and differences in the results. (C) 2008 Elsevier Ltd. All rights reserved. This article presents a study that compares detected structural communities in a coauthorship network to the socioacademic characteristics of the scholars that compose the network. The coauthorship network was created from the bibliographic record of a multi-institution, interdisciplinary research group focused on the study of sensor networks and wireless communication. Four different community detection algorithms were employed to assign a structural community to each scholar in the network: leading eigenvector, walktrap, edge betweenness and spinglass. Socioacademic characteristics were gathered from the scholars and include such information as their academic department, academic affiliation, country of origin, and academic position. A Pearson chi-squared test, with a simulated Monte Carlo procedure, revealed that structural communities best represent groupings of individuals working in the same academic department and at the same institution. A generalization of this result suggests that, even in interdisciplinary, multi-institutional research groups, coauthorship is primarily driven by departmental and institutional affiliation. (C) 2008 Elsevier Ltd. All rights reserved. The structure of different types of time series in citation analysis is revealed, using an adapted form of the Frandsen-Rousseau notation. Special cases where this approach can be used include time series of impact factors and time series of h-indices, or h-type indices. 
This leads to a tool describing dynamic aspects of citation analysis. Time series of h-indices are calculated in some specific models. (C) 2008 Elsevier Ltd. All rights reserved. In order to take multiple co-authorship appropriately into account, a straightforward modification of the Hirsch index was recently proposed. Fractionalised counting of the papers yields an appropriate measure which is called the h(m)-index. The effect of this procedure is compared in the present work with other variants of the h-index and found to be superior to the fractionalised counting of citations and to the normalization of the h-index with the average number of authors in the h-core. Three fictitious examples for model cases and one empirical case are analysed. (C) 2008 Elsevier Ltd. All rights reserved. Although it is generally understood that different citation counting methods can produce quite different author rankings, and although "optimal" author co-citation counting methods have been identified theoretically, studies that compare author co-citation counting methods in author co-citation analysis (ACA) studies are still rare. The present study applies strict all-author-based ACA to the Information Science (IS) field, in that all authors of all cited references in a classic IS dataset are counted, and in that even the diagonal values of the co-citation matrix are computed in their theoretically optimal form. Using Scopus instead of SSCI as the data source, we find that results from a theoretically optimal all-author ACA appear to be excellent in practice, too, although in a field like IS where co-authorship levels are relatively low, its advantages over classic first-author ACA appear considerably smaller than in the more highly collaborative ones targeted before. 
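The h(m)-index mentioned above counts each paper fractionally, contributing 1/(number of authors) when accumulating ranks. A minimal sketch with invented papers:

```python
def hm_index(papers):
    """h(m)-index: papers ranked by citations contribute fractional ranks 1/n_authors.

    papers: list of (citations, n_authors) tuples.
    Returns the largest effective rank r_eff with citations >= r_eff.
    """
    effective_rank, hm = 0.0, 0.0
    for citations, n_authors in sorted(papers, key=lambda p: p[0], reverse=True):
        effective_rank += 1.0 / n_authors
        if citations >= effective_rank:
            hm = effective_rank
        else:
            break
    return hm

papers = [(9, 3), (5, 2), (3, 1), (1, 4)]   # hypothetical (citations, authors)
# effective ranks 1/3, 5/6, 11/6, 25/12; the first three satisfy c >= r_eff, so h(m) = 11/6
```

For single-author papers the effective ranks are the ordinary ranks 1, 2, 3, ..., so h(m) reduces to the plain h-index.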
Nevertheless, we do find some differences between the two approaches, in that first-author ACA appears to favor theorists who presumably tend to work alone, while all-author ACA appears to paint a somewhat more recent picture of the field, and to pick out some collaborative author clusters. (C) 2008 Elsevier Ltd. All rights reserved. Informetric researchers have long chafed at the limitations of bibliographic databases for their analyses, without being able to visualize or develop real solutions to the problem. This paper describes a solution developed to provide for the specialist needs of informetric researchers. In a collaborative exercise between the fields of computer science and informetrics, data modelling was used in order to address the requirements of complex and dynamic informetric data. This paper reports on this modelling experience with its aim of building an object-relational database (ORDB) for informetric research purposes. The paper argues that ORM (object-relational model) is particularly suitable because it allows for the modelling of complex data and accommodates the various data source formats and standards used by a variety of bibliographic databases. Further, ORM captures the dynamic nature of informetric data by allowing user-defined data types and by embedding basic statistical calculating tools as object functions in these user-defined data types. The main ideas of the paper are implemented in an Oracle database management system. (C) 2008 Elsevier Ltd. All rights reserved. Each information production process has a unique h-index. This paper studies the problem: what are possible h-index values if we merge two or more IPPs? First the paper gives examples of IPP mergings. There are at least two types: one where common sources add their number of items and one where common sources get the maximum of their number of items in the two IPPs. 
In each case we show that max(h_1, h_2) <= h <= h_1 + h_2 for both types of mergings (h_i, i = 1, 2, being the h-index of the two IPPs, and h the h-index of the merged one). We show that the above inequalities cannot be improved (in both merging types). We also show that the same inequalities are true for the g-index and the R-index but that they are false for the weighted h-index. For the R-index we can even refine the above inequality: R <= sqrt(R_1^2 + R_2^2) < R_1 + R_2. (C) 2008 Elsevier Ltd. All rights reserved. This article presents findings from a research study (Trace, 2004) that looked at a particular aspect of human information behavior: children's information creation in a classroom setting. In the portion of the study described here, naturalism and ethnomethodology are used as theoretical frameworks to investigate informal documents as an information genre. Although previous studies have considered the role of informal documents within the classroom, little sustained attention has been paid to pre-adolescents, particularly in terms of how they create unofficial or vernacular literacies both to navigate their growing awareness of the formal (albeit sometimes "hidden") curriculum and, on occasion, to subvert it, positing an alternative economy that itself can be "hidden" via surreptitious use of informal documents. Making explicit the ties that exist between these objects and the worlds in which they are embedded demonstrates that informal documents hold a particular relevance for children within this social context (Garfinkel & Bittner, 1999). Furthermore, this article demonstrates that an ethnomethodologically informed viewpoint of information creation brings a level of dignity and determination to an individual's human information behavior, allowing us to appreciate the human ability to recontextualize or reenvisage sanctioned or official information genres to meet our own needs and purposes. 
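The merging bounds for the h-index stated earlier are easy to check numerically. A minimal sketch, with invented IPPs given as source-to-item-count maps, covering both merging types (adding the counts of common sources, or taking their maximum):

```python
def h_index(item_counts):
    """Largest h such that h sources have at least h items each."""
    ranked = sorted(item_counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def merge(ipp1, ipp2, mode):
    """Merge two IPPs; common sources either add their counts or keep the maximum."""
    merged = dict(ipp1)
    for source, n in ipp2.items():
        if mode == "add":
            merged[source] = merged.get(source, 0) + n
        else:
            merged[source] = max(merged.get(source, 0), n)
    return merged

ipp1 = {"a": 5, "b": 4, "c": 1}   # hypothetical source -> item counts
ipp2 = {"b": 6, "c": 3, "d": 2}
h1, h2 = h_index(ipp1.values()), h_index(ipp2.values())
for mode in ("add", "max"):
    h = h_index(merge(ipp1, ipp2, mode).values())
    assert max(h1, h2) <= h <= h1 + h2   # the bounds from the text hold in both cases
```

In this example h1 = h2 = 2 and both merged h-indices equal 3, which sits strictly between the lower bound 2 and the upper bound 4.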
The purpose of this study was to develop concepts from information science to understand information behavior between multiple actors in high reliability operations. Based on previous research and framework development in human information behavior, the Distributed Information Behavior System was designed to assess the social practice of information identification, gathering, and use. In this study, flight crews were used as the test bed. The goal of this research was to assess if different information behaviors are practiced by accident-involved crews and crews not involved in an accident. The results indicate that differences indeed exist in the way crews identify, gather, and use information based on their performance level. This study discerns that high performing crews practice substantially different information behaviors than low performing and accident-involved crews. This work serves as a way to understand the social practice of information structuring within high reliability operations. Subsequently, this may aid researchers to identify the role sequencing plays in critical information negotiation. This work also serves as a tool to inform training and is applicable to other domains where work is supported through distributed collective practice. Bibliographic databases (including databases based on open access) are routinely used for bibliometric research. The value of a specific database depends to a large extent on the coverage of the discipline(s) under study. A number of studies have determined the coverage of databases in specific disciplines focusing on interdisciplinary differences; however, little is known about the potential existence of intradisciplinary differences in database coverage. Focusing on intradisciplinary differences, the article documents large database-coverage differences within two disciplines (economics and psychology). The point extends to include both the uneven coverage of specialties and research traditions. 
The implications for bibliometric research are discussed, and precautions which need to be taken are outlined. Using the 138,751 patents filed in 2006 under the Patent Cooperation Treaty, co-classification analysis is pursued on the basis of three- and four-digit codes in the International Patent Classification (IPC, 8th ed.). The co-classifications among the patents enable us to analyze and visualize the relations among technologies at different levels of aggregation. The hypothesis that classifications might be considered as the organizers of patents into classes, and therefore that co-classification patterns, more than co-citation patterns, might be useful for mapping, is not corroborated. The classifications hang weakly together, even at the four-digit level; at the country level, more specificity can be made visible. However, countries are not the appropriate units of analysis because patent portfolios are largely similar in many advanced countries in terms of the classes attributed. Instead of classes, one may wish to explore the mapping of title words as a better approach to visualize the intellectual organization of patents. Developed countries have an even distribution of published papers on the seventeen model organisms. Developing countries have biased preferences for a few model organisms which are associated with endemic human diseases. A variant of the Hirsch index, that we call the mean (mo)h-index ("model organism h-index"), shows an exponential relationship with the number of papers published in each country on the selected model organisms. Developing countries cluster together with low mean (mo)h-indexes, even those with high numbers of publications. The growth curves of publications on the recent model Caenorhabditis elegans in developed countries show different shapes. We also analyzed the growth curves of indexed publications originating from developing countries. Brazil and South Korea were selected for this comparison. 
The most prevalent model organisms in those countries show different growth curves when compared to a global analysis, reflecting the size and composition of their research communities. This article studies the h-index (Hirsch index) and the g-index of authors when one counts authorship of the cited articles in a fractional way. There are two ways to do this: one counts the citations to these papers in a fractional way, or one counts the ranks of the papers in a fractional way as credit for an author. In both cases, we define the fractional h- and g-indexes, and we present inequalities (both upper and lower bounds) between these fractional h- and g-indexes and their corresponding unweighted values (also involving, of course, the coauthorship distribution). Wherever applicable, examples and counterexamples are provided. In a concrete example (the publication citation list of the present author), we make explicit calculations of these fractional h- and g-indexes and show that they are not very different from the unweighted ones. User satisfaction has become one of the most important measures of the success or effectiveness of information systems (IS). In the current study, the dimensionality of Web-based information systems (WIS) satisfaction was first examined. Two composite latent variable models with higher-order factor structures were then empirically tested and compared to describe the relationships among observable variables concerned with WIS satisfaction. Using data from a sample of 515 university students, a third-order composite latent variable model was retained based on statistical and theoretical criteria. At the third-order level, WIS satisfaction is determined by two second-order constructs: Web information satisfaction and Web system satisfaction. Web information satisfaction is determined by understandability, reliability, and usefulness, while Web system satisfaction is determined by access, usability, and navigation. 
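The fractional citation-counting variant described above divides each paper's citations by its number of authors before computing h. A minimal sketch with invented data, contrasted with the unweighted h-index:

```python
def h_index(citations):
    """Unweighted h-index: largest h with h papers cited at least h times."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def fractional_h(papers):
    """h-index computed on fractional citation counts c / n_authors.

    papers: list of (citations, n_authors) tuples.
    """
    return h_index([c / n for c, n in papers])

papers = [(30, 2), (12, 3), (9, 1), (8, 4)]   # hypothetical (citations, authors)
# unweighted h = 4; fractional counts 15, 4, 9, 2 give fractional h = 3
```

As the abstract's inequalities imply, the fractional value can only fall at or below the unweighted one, since every fractional count is at most the raw count.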
Overall, the model provides a good fit to the data and is theoretically valid, reflecting logical consistency. Implications of the current investigation for practice and research are provided. Previous research has demonstrated that lower performance groups have a larger size-dependent cumulative advantage for receiving citations than do top-performance groups. Furthermore, regardless of performance, larger groups have fewer uncited publications. Particularly for the lower performance groups, the fraction of uncited publications decreases considerably with size. These phenomena can be explained with a model in which self-citation acts as a promotion mechanism for external citations. In this article, we show that for self-citations, similar size-dependent scaling rules apply as for citations, but generally the power law exponents are higher for self-citations as compared to citations. We also find that the fraction of self-citations is smaller for the higher performance groups, and this fraction decreases more rapidly with increasing journal impact than that for lower performance groups. An interesting novel finding is that the variance in the correlation of the number of self-citations with size is considerably less than the variance for external citations. This is a clear indication that size is a stronger determinant for self-citations than it is for external citations. Both higher and particularly lower performance groups have a size-dependent cumulative advantage for self-citations, but for the higher performance groups only in the lower impact journals and in fields with low citation density. The concept of the "work" in art differs from and challenges traditional concepts of the "work" in bibliography. 
Whereas the traditional bibliographic concept of the work takes an ideational approach that incorporates mentalist epistemologies, container-content metaphors, and the conduit metaphor of information transfer and re-presentation, the concept of the work of art as is presented here begins with the site-specific and time-valued nature of the object as a product of human labor and as an event that is emergent through cultural forms and from social situations. The account of the work, here, is thus materialist and expressionist rather than ideational. This article takes the discussion of the work in the philosopher Martin Heidegger's philosophical-historical account and joins this with the concept of the work in the modern avant-garde, toward bringing into critique the traditional bibliographic conception of the work and toward illuminating a materialist perspective that may be useful in understanding cultural work-objects, as well as texts proper. We provide in this article a number of new insights into the methodological discussion about author co-citation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors' co-citation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. We show by means of an example that the choice of an appropriate similarity measure has a high practical relevance. Finally, we discuss the use of similarity measures for statistical inference. Wikipedia (the "free online encyclopedia that anyone can edit") is having a huge impact on how a great many people gather information about the world. 
So, it is important for epistemologists and information scientists to ask whether people are likely to acquire knowledge as a result of having access to this information source. In other words, is Wikipedia having good epistemic consequences? After surveying the various concerns that have been raised about the reliability of Wikipedia, this article argues that the epistemic consequences of people using Wikipedia as a source of information are likely to be quite good. According to several empirical studies, the reliability of Wikipedia compares favorably to the reliability of traditional encyclopedias. Furthermore, the reliability of Wikipedia compares even more favorably to the reliability of those information sources that people would be likely to use if Wikipedia did not exist (viz., Web sites that are as freely and easily accessible as Wikipedia). In addition, Wikipedia has a number of other epistemic virtues (e.g., power, speed, and fecundity) that arguably outweigh any deficiency in terms of reliability. Even so, epistemologists and information scientists should certainly be trying to identify changes (or alternatives) to Wikipedia that will bring about even better epistemic consequences. This article suggests that to improve Wikipedia, we need to clarify what our epistemic values are and to better understand why Wikipedia works as well as it does. Proper nouns may be considered the most important query words in information retrieval. If the two languages use the same alphabet, the same proper nouns can be found in either language. However, if the two languages use different alphabets, the names must be transliterated. Short vowels are not usually marked on Arabic words in almost all Arabic documents (except very important documents like the Muslim and Christian holy books). 
Moreover, most Arabic words have a syllable consisting of a consonant-vowel combination (CV), which means that most Arabic words contain a short or long vowel between two successive consonant letters. That makes it difficult to create English-Arabic transliteration pairs, since some English letters may not be matched with any romanized Arabic letter. In the present study, we present different approaches for extraction of transliteration proper-noun pairs from parallel corpora based on different similarity measures between the English and romanized Arabic proper nouns under consideration. The strength of our new system is that it works well for low-frequency proper noun pairs. We evaluate the new approaches presented using two different English-Arabic parallel corpora. Most of our results outperform previously published results in terms of precision, recall, and F-measure. The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T^(1/alpha), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and alpha is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D*10^(-N/alpha), where D is a constant; this is a convexly decreasing function of N, which is confirmed in our experiments. Nonlinear regression on h = T^(1/alpha) gives an estimation of alpha, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S^(1/alpha), where S is the total number of documents in the database. Search engines are normally used to find information or Web sites, but Webometric investigations use them for quantitative data such as the number of pages matching a query and the international spread of those pages. 
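The extrapolation step described above for h = T^(1/alpha) can be sketched as follows. Taking logarithms gives log10 h = (1/alpha) * log10 T, so a through-the-origin least-squares fit on (log T, log h) pairs estimates 1/alpha, and the fitted alpha extrapolates the h-index to the whole database. The sample observations and the database size S below are invented for illustration, not taken from the study.

```python
import math

# Hypothetical (T, h) observations: number of retrieved documents and
# the h-index of each retrieved document set.
samples = [(100, 10), (1_000, 22), (10_000, 46), (100_000, 100)]

# log10 h = (1/alpha) * log10 T  =>  fit the slope through the origin.
xs = [math.log10(t) for t, _ in samples]
ys = [math.log10(h) for _, h in samples]
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
alpha = 1 / slope

# Extrapolate to the entire database: h = S**(1/alpha).
S = 40_000_000  # hypothetical total number of documents in the database
h_db = S ** (1 / alpha)
print(round(alpha, 2), round(h_db))
```

A log-linear fit like this is only a sketch of the nonlinear regression mentioned in the abstract, but it conveys how a Lotka exponent estimated from small samples yields a database-level h-index estimate.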
For this type of application, the accuracy of the hit count estimates and the range of URLs in the full results are important. Here, we compare the applications programming interfaces of Google, Yahoo!, and Live Search for 1,587 single word searches. The hit count estimates were broadly consistent, but with Yahoo! and Google reporting 5-6 times more hits than Live Search. Yahoo! tended to return slightly more matching URLs than Google, with Live Search returning significantly fewer. Yahoo!'s result URLs included a significantly wider range of domains and sites than the other two, and there was little consistency between the three engines in the number of different domains. In contrast, the three engines were reasonably consistent in the number of different top-level domains represented in the result URLs, although Yahoo! tended to return the most. In conclusion, quantitative results from the three search engines are mostly consistent but with unexpected types of inconsistency that users should be aware of. Google is recommended for hit count estimates but Yahoo! is recommended for all other Webometric purposes. This study examines the differences between Scopus and Web of Science in the citation counting, citation ranking, and h-index of 22 top human-computer interaction (HCI) researchers from EQUATOR, a large British Interdisciplinary Research Collaboration project. Results indicate that Scopus provides significantly more coverage of HCI literature than Web of Science, primarily due to coverage of relevant ACM and IEEE peer-reviewed conference proceedings. No significant differences exist between the two databases if citations in journals only are compared. Although broader coverage of the literature does not significantly alter the relative citation ranking of individual researchers, Scopus helps distinguish between the researchers in a more nuanced fashion than Web of Science in both citation counting and h-index. 
Scopus also generates significantly different maps of citation networks of individual scholars than those generated by Web of Science. The study also presents a comparison of h-index scores based on Google Scholar with those based on the union of Scopus and Web of Science. The study concludes that Scopus can be used as a sole data source for citation-based research and evaluation in HCI, especially when citations in conference proceedings are sought, and that researchers should manually calculate h scores instead of relying on system calculations. In recent years, the Internet has experienced a boom as an information source. The use of search engines is the most common way of finding this information. This means that less visible contents (for search engines) are increasingly difficult or even almost impossible to find. Thus, Web users are forced to accept alternative services or contents only because they are visible and offered to users by search engines. If a company's Web site is not visible, that company is losing clients. Therefore, it is fundamental to assure that one's Web site will be indexed and, consequently, visible to as many Web users as possible. To quantitatively evaluate the visibility of a Web site, this article introduces a method that Web administrators may use. The method consists of four activities and several tasks. Most of the tasks are accompanied by a set of defined measures that can help the Web administrator determine where the Web design is failing (from the positioning point of view). Some tools that can be used for the determination of the measure values also are referenced in the description of the method. The method is furthermore accompanied by examples to help in understanding how to apply it. By 2000, the Internet became an information and communication medium that was integrated in our everyday lives. 
Following an interdisciplinary approach, the research reported in this article analyzes the wide variety of information that people seek on the Internet and investigates trends in Internet information activities between 2000 and 2004, using repeated cross-sectional data from the Pew Internet and American Life surveys to examine Internet activities that contribute to everyday life and their predictors. The objective is to deepen our understanding of Internet activities and everyday life and contribute to a growing body of research that utilizes large-scale empirical data on Internet use and everyday life. We ask: who is embedding the Internet into their everyday lives and what are the activities they pursue to facilitate everyday life? Findings demonstrate the differential returns for Internet use, particularly in key demographic categories. The study also contributes to emerging research on the digital divide, namely emphasis on the study of use rather than access to technology. Identifying trends in key Internet use dimensions enables policymakers to target populations who underutilize the potential of networked technologies. Increasingly, individuals use communication technologies such as e-mail, IMs, blogs, and cell phones to locate, learn about, and communicate with one another. Not much, however, is known about how individuals relate to various personal technologies, their preferences for each, or their extensional associations with them. Even less is known about the cultural differences in these preferences. The current study used the Galileo system of multidimensional scaling to systematically map the extensional associations with nine personal communication technologies across three cultures: U.S., Germany, and Singapore. Across the three cultures, the technologies closest to the self were similar, suggesting a universality of associations with certain technologies. 
In contrast, the technologies farther from the self were significantly different across cultures. Moreover, the magnitude of associations with each technology differed based on the extensional association or distance from the self. Also, and more importantly, the antecedents to these associations differed significantly across cultures, suggesting a stronger influence of cultural norms on personal-technology choice. While several authors have argued that conference proceedings are an important source of scientific knowledge, the extent of their importance has not been measured in a systematic manner. This article examines the scientific impact and aging of conference proceedings compared to those of scientific literature in general. It shows that the relative importance of proceedings is diminishing over time and currently represents only 1.7% of references made in the natural sciences and engineering, and 2.5% in the social sciences and humanities. Although the scientific impact of proceedings is losing ground to other types of scientific literature in nearly all fields, it has grown from 8% of the references in engineering papers in the early 1980s to its current 10%. Proceedings play a particularly important role in computer sciences, where they account for close to 20% of the references. This article also shows that not unexpectedly, proceedings age faster than cited scientific literature in general. The evidence thus shows that proceedings have a relatively limited scientific impact, on average representing only about 2% of total citations, that their relative importance is shrinking, and that they become obsolete faster than the scientific literature in general. This study compares the United States of America and Korea's cases of national information infrastructure (NII) development, focusing on the role of the governments in the development of their NIIs and on the realization of the next generation of information infrastructure vision. 
The important similarities and differences can be seen by comparison on sociotechnical dimensions: government function, histories, visions, policy design, implementation plans, and realities and prospects. Findings show different patterns of NII development, providing insights for the next generation of NIIs. This study offers a perspective on future information infrastructure needs in the context of dynamic sociotechnical changes. Ontology plays an essential role in recognizing the meaning of the information in Web documents. It has been shown that extracting concepts is easier than building relationships among them. For a defined set of concepts, many existing algorithms produce all possible relationships for that set. This makes the process of refining the relationships almost impossible. A new algorithm is needed to reduce the number of relationships among a defined set of concepts produced by existing algorithms. This article contributes such an algorithm, which enables a domain-knowledge expert to refine the relationships linking a set of concepts. In the research reported here, text-mining tools have been used to extract concepts in the domain of e-commerce laws. A new algorithm has been proposed to reduce the number of extracted relationships. It groups the concepts according to the number of relationships with other concepts and provides formalization. An experiment was conducted and software was built, demonstrating that reducing the number of relationships reduces the effort needed from a human expert. The dynamic analysis of structural change in the organization of the sciences requires, methodologically, the integration of multivariate and time-series analysis. Structural change (for instance, interdisciplinary development) is often an objective of government interventions. 
Recent developments in multidimensional scaling (MDS) enable us to distinguish the stress originating in each time-slice from the stress originating from the sequencing of time-slices, and thus to locally optimize the trade-offs between these two sources of variance in the animation. Furthermore, visualization programs like Pajek and Visone allow us to show not only the positions of the nodes, but also their relational attributes such as betweenness centrality. Betweenness centrality in the vector space can be considered as an indicator of interdisciplinarity. Using this indicator, the dynamics of the citation-impact environments of the journals Cognitive Science, Social Networks, and Nanotechnology are animated and assessed in terms of interdisciplinarity among the disciplines involved. The goal of research evaluation is to reveal the achievement and progress of research. Research output offers a basis for empirical evaluation. A fair and just research evaluation should take into account the diversity of research output across disciplines and include all major forms of research publications. This article reviews the literature on the nature of research output in social sciences and humanities in terms of the characteristics of research publications, and then discusses the implications for the research evaluation of social sciences and humanities researchers. From a user-centered perspective, an effective search engine needs to attract new users to try out its features, and retain those users so that they continue using the features. In this article, we investigate the relations between users' motivation for using (i.e., trying out and continuing to use) a search engine and the engine's functional features. Based on Herzberg's two-factor theory (F. Herzberg, 2003; F. Herzberg, M. Bernard, & B. Snyderman, 1959), the features can be categorized as hygiene factors and motivation factors. 
Hygiene factors support the query process and provide a basic "task context" for information seeking that allows users to access relevant information. Motivation factors, on the other hand, help users navigate (i.e., browse) and comprehend the retrieved information, related to the "task-content" aspect of information seeking. Given the consistent findings that hygiene factors induce work motivation for a shorter period of time, it is hypothesized that hygiene factors are more effective in attracting users, while motivation factors are more effective in retaining than in attracting users. A survey, with 758 valid participants, was conducted to test the hypotheses. The empirical results provide substantial support for the proposed hypotheses and suggest that the two-factor theory can account for the motivation for using a search engine. All journals that use peer review have to deal with the following question: Does the peer review system fulfill its declared objective to select the "best" scientific work? We investigated the journal peer-review process at Angewandte Chemie International Edition (AC-IE), one of the prime chemistry journals worldwide, and conducted a citation analysis for Communications that were accepted by the journal (n = 878) or rejected but published elsewhere (n = 959). The results of negative binomial regression models show that holding all other model variables constant, being accepted by AC-IE increases the expected number of citations by up to 50%. A comparison of average citation counts (with 95% confidence intervals) of accepted and rejected (but published elsewhere) Communications with international scientific reference standards was undertaken. As reference standards, (a) mean citation counts for the journal set provided by Thomson Reuters corresponding to the field "chemistry" and (b) specific reference standards that refer to the subject areas of Chemical Abstracts were used. 
When compared to reference standards, the mean impact on chemical research is for the most part far above average not only for accepted Communications but also for rejected (but published elsewhere) Communications. However, average and below-average scientific impact is to be expected significantly less frequently for accepted Communications than for rejected Communications. All in all, the results of this study confirm that peer review at AC-IE is able to select the "best" scientific work with the highest impact on chemical research. A time-dependent h-type indicator is proposed. This indicator depends on the size of the h-core, the number of citations received, and recent change in the value of the h-index. As such, it tries to combine in a dynamic way older information about the source (e.g., a scientist or research institute that is evaluated) with recent information. A new approach to the field normalization of the classical journal impact factor is introduced. This approach, called the audience factor, takes into consideration the citing propensity of journals for a given cited journal, specifically, the mean number of references of each citing journal, and fractionally weights the citations from those citing journals. Hence, the audience factor is a variant of a fractional citation-counting scheme, but computed on the citing journal rather than the citing article or disciplinary level, and, in contrast to other cited-side normalization strategies, is focused on the behavior of the citing entities. A comparison with standard journal impact factors from Thomson Reuters shows a more diverse representation of fields within various quintiles of impact, significant movement in rankings for a number of individual journals, but nevertheless a high overall correlation with standard impact factors. 
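The citing-side weighting behind the audience factor can be illustrated with a toy calculation. This is one plausible reading of the scheme, assumed here for illustration rather than taken from the published formula: each citing journal's citations are weighted by the ratio of the overall mean reference-list length to that journal's own mean reference-list length, so citations from reference-dense journals count less. All journal names and numbers below are hypothetical.

```python
# Hypothetical citing journals: mean number of references per article,
# and number of citations each sends to the journal being evaluated.
citing = {
    "J. Dense Refs":  {"mean_refs": 60, "cites": 30},
    "J. Sparse Refs": {"mean_refs": 15, "cites": 30},
    "J. Typical":     {"mean_refs": 30, "cites": 40},
}

# Citing-side normalization (assumed form): citations from journals with
# long reference lists are down-weighted, those from short-list journals
# up-weighted, relative to the overall mean reference-list length.
overall_mean = sum(j["mean_refs"] for j in citing.values()) / len(citing)
audience_weighted = sum(
    j["cites"] * (overall_mean / j["mean_refs"]) for j in citing.values()
)
raw = sum(j["cites"] for j in citing.values())
print(raw, round(audience_weighted, 1))
```

The point of the sketch is the mechanism, not the exact weights: normalization happens at the level of the citing journal's behavior, which is what distinguishes the audience factor from cited-side field normalization.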
The authors use a novel method, the Author Affiliation Index (AAI), to determine whether faculty at the top-10 North American library and information science (LIS) programs have a disproportionate presence in the premier journals of the field. The study finds that LIS may be both too small and too interdisciplinary a domain for the AAI to provide reliable results. Based on the simulation study of the publication delay control process [Yu et al., 2005], transfer function models of delay control processes that adjust the accepted contribution flux and the published contribution flux are identified using system identification. Following cybernetics, a feedback control system for publication delay is designed, and control processes are simulated and analyzed with the average publication delay as the controlled object. On the basis of the relation between the average publication delay and the deposited contribution quantity, another control method is proposed in which the deposited contribution quantity is the controlled object; the simulation results show that this method is effective and can help editors conveniently manage their journals and control publication delays. The use of indicators based on the analysis of the scientific literature cited in patent documents is proposed for the evaluation of biomedical research. A study carried out on several groups of researchers working in universities, public research centers, and hospitals has shown that an important percentage of Spanish scientists have authored publications that are cited in US patents in the field of Biotechnology. The study and analysis of those citations allows an evaluation of the flow of knowledge generated by the different groups of scientists towards the development of technologies, and sheds light on the relationship between the characteristics of the cited publications and the frequency with which they are cited in the patents. 
The results obtained support the use of new indicators based on citations in patents to perform a more complete evaluation of published research related to Biotechnology and Biomedicine, both at the level of research institutions and of individual scientists. This paper examines the common contentions that the collective aging of tenured academic staff has negative effects on the research performance of universities due to (a) negative effects of aging in itself, and (b) a lack of newcomers who could revitalise the research. Data on academic staff and research at Norwegian universities over two decades have been used to examine these contentions. While older staff published less than their younger colleagues two decades ago, no differences in productivity are found today. Furthermore, during this period, a large increase in the number of post-doctoral fellows and PhD students has taken place, compensating for the aging of tenured staff. The effects of team consolidation and social integration on individual scientists' activity and performance were investigated by analysing the relationships between these factors and scientists' productivity, impact, collaboration patterns, participation in funded research projects and programs, contribution to the training of junior researchers, and prestige. Data were obtained from a survey of researchers ascribed to the Biology and Biomedicine area of the Spanish Council for Scientific Research, and from their curricula vitae. The results show that high levels of team consolidation and of integration of the scientist within his or her team are factors which might help create the most favourable social climate for research performance and productivity. Researchers who carried out their activity in a social climate characterized by these factors participated in more domestic research projects and supervised more doctoral dissertations than the rest of their colleagues. 
They were also more productive, as shown by the higher number of papers published in journals included in the Journal Citation Reports and the higher number of patents granted. These metrics are the main indicators taken into account in the evaluation of the research activity of Spanish scientists, and are therefore the activities that scientists invest the most energy in with a view to obtaining professional recognition. The results corroborate the importance of research teamwork, and draw attention to the importance of teamwork understood not as two or more scientists working together to solve a problem, but as a complex process involving interactions and interpersonal relations within a particular contextual framework. This study examines the relative efficiency of the R&D process across a group of 22 developed and developing countries using Data Envelopment Analysis (DEA). The R&D technical efficiency is examined using a model with patents granted to residents as an output and gross domestic expenditure on R&D and the number of researchers as inputs. Under CRS (Constant Returns to Scale), Japan, the Republic of Korea and China are found to be efficient, whereas under the VRS (Variable Returns to Scale) framework, Japan, the Republic of Korea, China, India, Slovenia and Hungary are found to be efficient. The emergence of some of the developing nations on the efficiency frontier indicates that these nations can also serve as benchmarks for their efficient use of R&D resources. The inefficiency in the R&D resource usage highlighted by this study indicates the underlying potential that can be tapped for the development and growth of nations. 
In today's highly competitive world, there is a growing need for research and planning methodologies that can provide an advance assessment of technological opportunities and an early perception of the threats and possibilities of emerging technologies according to a nation's economic and social status. This research aims to provide indicators and visualization methods that measure the latest research trends underlying scientific and technological documents for researchers and policy planners, using "co-word analysis". The Information Security field has high prospective market value. In this paper, we present an analysis of the Information Security field. Co-word analysis was employed to reveal patterns and trends in the field by measuring the association strength of terms representative of relevant publications or other texts produced in the Information Security field. Data were collected from SCI, and the critical keywords were extracted from the author keywords and further standardized. In order to trace the dynamic changes in the Information Security field, we present a variety of technology maps. The results show that the Information Security field has some established research themes and also rapidly transforms to embrace new themes. In order to identify indicators with world-wide standards for the assessment of scientific performance, at the level of both individuals and institutions and normalized for disciplines, we have carried out a comparative analysis of the relative scientific and technological level of individual scientists and individual scientific institutions competing internationally in given fields, using alternative indicators all based on the number of publications and on their impact factors in international SCI journals, properly ranked and weighted for their position, number of coauthors, and discipline using deciles. 
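The association-strength step of the co-word analysis described above can be sketched with a standard measure, the equivalence index E_ij = c_ij^2 / (c_i * c_j), where c_i is the frequency of keyword i and c_ij the co-occurrence count of keywords i and j. The keyword lists below are invented for illustration; the abstract does not specify which association measure was used, so the equivalence index is an assumption here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per publication in an Information
# Security corpus.
papers = [
    ["cryptography", "authentication", "protocol"],
    ["cryptography", "authentication"],
    ["intrusion detection", "machine learning"],
    ["cryptography", "protocol"],
    ["intrusion detection", "machine learning", "protocol"],
]

freq = Counter(k for p in papers for k in p)
co = Counter()
for p in papers:
    for a, b in combinations(sorted(set(p)), 2):
        co[(a, b)] += 1

# Equivalence index: E_ij = c_ij**2 / (c_i * c_j), a common
# association-strength measure in co-word analysis.
strength = {
    pair: c * c / (freq[pair[0]] * freq[pair[1]]) for pair, c in co.items()
}
top = max(strength, key=strength.get)
print(top, round(strength[top], 2))
```

Pairs that always appear together score 1.0, so thresholding these scores is what lets co-word maps separate established themes from incidental co-occurrences.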
This study, contrary to some gloomy opinions, suggests that interesting conclusions can be drawn from the above indicators. The chosen indicators, tested world-wide in real situations, appear capable of effectively and objectively assessing institutions and individual university professors and researchers, prove to be quite significant, and could be used to provide computer-assisted evaluation criteria for maintaining or upgrading a given position, maintaining or closing public institutions, and filtering grant applications. This paper provides a detailed assessment of recent indexed journal publications by Turkish social scientists. We first present information on SCI, SSCI and AHCI indexed journal articles that were published by Turkish researchers over the past three decades. An inspection of publication statistics indicates a considerable improvement, especially during the last five years of the 1973-2005 period that we examine, in Turkey's publication record in terms of number of articles authored or co-authored by Turkish researchers. In the next step, we scrutinize institutional sources of this improvement, emphasizing regulatory and organizational changes that have both forced researchers to publish in indexed journals and remunerated those who did so. Finally, we provide a qualitative assessment of recent improvement in publication performance of Turkish researchers by focusing on a particular behavioral consequence of institutional changes and its implications for the impact that research from Turkey has on global research activity. 
Bibliometric analysis of articles published by Turkish researchers in SSCI-indexed journals during 2000-2005 shows that recent regulatory and organizational changes seem to have instituted a particular publication habit, publishing in journals with lower impact factor, which was earlier observed in other parts of the world where publication counts were used for performance evaluation, and that signs of improvement in our select indicators of impact are yet to be observed. In this article we analyse whether university-industry relations (UIR) are penalising research activity and inhibiting university researchers' scientific productivity and, if so, to what extent. The analysis is based on a case study of two Spanish universities. We find that UIR exercise a positive effect on university scientific productivity only when they are based on the development of R&D contracts, and when the funds obtained through these activities do not exceed 15% of the researcher's total budget. We also find that researchers who combine research and UIR activities obtain higher funding from competitive public sources than those who engage only in research. In addition, their average scientific productivity is higher and they achieve higher status within their institutions than those members of faculty who concentrate only on research. An individual's h-index corresponds to the number h of his/her papers that each have at least h citations. When the citation count of an article exceeds h, however, as is the case for the hundreds or even thousands of citations that accompany the most highly cited papers, no additional credit is given (these citations falling outside the so-called "Durfee square"). We propose a new bibliometric index, the "tapered h-index" (h(T)), that positively enumerates all citations, scoring them on an equitable basis with h. The career progressions of h(T) and h are compared for six eminent scientists in contrasting fields. 
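The tapered h-index just described can be sketched in code. One common formulation, assumed here, scores the j-th citation of the i-th ranked paper as 1 / (2 * max(i, j) - 1), so a completely filled h-by-h Durfee square contributes exactly h, while every citation outside the square still adds a positive amount. The citation counts below are hypothetical.

```python
def h_index(citations):
    # Classic h-index: the largest h such that h papers have >= h citations.
    cs = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cs, start=1) if c >= i)

def tapered_h(citations):
    # Tapered h-index: the j-th citation of the i-th ranked paper scores
    # 1 / (2 * max(i, j) - 1); a full h-by-h Durfee square sums to exactly h.
    cs = sorted(citations, reverse=True)
    return sum(
        1 / (2 * max(i, j) - 1)
        for i, c in enumerate(cs, start=1)
        for j in range(1, c + 1)
    )

# Hypothetical citation counts for one author's papers.
citations = [10, 8, 5, 4, 3, 2, 2, 1, 0]
print(h_index(citations), round(tapered_h(citations), 2))
```

Because every extra citation nudges h(T) upward, the tapered index grows smoothly from year to year instead of making the discrete jumps seen in h, which is the behaviour the abstract highlights.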
Calculated h(T) for year 2006 ranged between 44.32 and 72.03, with a corresponding range in h of 26 to 44. We argue that the h(T)-index is superior to h, both theoretically (it scores all citations), and because it shows smooth increases from year to year as compared with the irregular jumps seen in h. Conversely, the original h-index has the benefit of being conceptually easy to visualise. Qualitatively, the two indices show remarkable similarity (they are closely correlated), such that either can be applied with confidence. Many writers of structured abstracts spend a good deal of time revising and polishing their texts, but is it worth it? Do readers notice the difference? In this paper we report three studies of readers using rating scales to judge (electronically) the clarity of an original and a revised abstract, both as a whole and in its constituent parts. In Study 1, with approximately 250 academics and research workers, we found some significant differences in favor of the revised abstract, but in Study 2, with approximately 210 information scientists, we found no significant effects. Pooling the data from Studies 1 and 2, however, in Study 3, led to significant differences at a higher probability level between the perception of the original and revised abstract as a whole and between the same components as found in Study 1. These results thus indicate that the revised abstract as a whole, as well as certain specific components of it, were judged significantly clearer than the original one. In short, the results of these experiments show that readers can and do perceive differences between original and revised texts, at least sometimes, and that these revisions are therefore worth the time and effort. 
Although there have been a number of fairly recent studies in which researchers have explored the information-seeking and management behaviors of people interacting with musical retrieval systems, there have been very few published studies of the interaction and use behaviors of musicians interacting with their primary information object, the musical score. The ethnographic research reported here seeks to correct this deficiency in the literature. In addition to observing rehearsals and conducting 22 in-depth musician interviews, this research provides in-depth analysis of 25,000 annotations representing 250 parts from 13 complete musical works, made by musicians of all skill levels and performance modes. In addition to producing specific and practical recommendations for digital-library development, this research also provides an augmented annotation framework that will enable more specific study of human-information interaction, both with musical scores, and with more general notational/instructional information objects. As information becomes richer and more complex, alternative information-organization methods are needed to more effectively and efficiently retrieve information from various systems, including the Web. The objective of this study is to explore how a Topic Maps-based ontology approach affects users' searching performance. Forty participants participated in a task-based evaluation where two dependent variables, recall and search time, were measured. The results of this study indicate that a Topic Maps-based ontology information retrieval (TOIR) system has a significant and positive effect on both recall and search time, compared to a thesaurus-based information retrieval (TIR) system. These results suggest that the inclusion of a Topic Maps-based ontology is a beneficial approach to take when designing information retrieval systems. 
Network scaling algorithms such as the Pathfinder algorithm are used to prune many different kinds of networks, including citation networks, random networks, and social networks. However, this algorithm suffers from run-time problems for large networks and online processing due to its O(n^4) time complexity. In this article, we introduce a new alternative, the MST-Pathfinder algorithm, which allows us to prune the original network to obtain its PFNET(r = infinity, q = n - 1) in just O(n^2 log n) time. The underlying idea comes from the fact that the union (superposition) of all the Minimum Spanning Trees extracted from a given network is equivalent to the PFNET resulting from the Pathfinder algorithm parameterized by a specific set of values (r = infinity and q = n - 1), those usually considered in many different applications. Although this property is well known in the literature, it seems that no algorithm based on it has been proposed until now to decrease the high computational cost of the original Pathfinder algorithm. We also present a mathematical proof of the correctness of this new alternative and test its efficiency in two different case studies: one dedicated to the post-processing of large random graphs, and the other to a real-world case in which medium-sized networks obtained by a cocitation analysis of the scientific domains in different countries are pruned. Out-of-vocabulary words, mostly proper nouns and technical terms, are one main source of performance degradation in Cross-Language Information Retrieval (CLIR) systems; these are words not found in the dictionary. Bilingual dictionaries in general do not cover most proper nouns, which are usually primary keys in the query. As they are spelling variants of each other in most languages, using an approximate string matching technique against the target database index is the common approach taken to find the target language correspondents of the original query key. 
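The union-of-MSTs property that underlies MST-Pathfinder can be demonstrated directly: an edge belongs to some minimum spanning tree exactly when it joins two different components of the subgraph formed by all strictly lighter edges. A minimal sketch, with function and class names of our own choosing:

```python
from itertools import groupby

class DSU:
    """Union-find over n nodes."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def mst_union(n, edges):
    """Union of all MSTs of an undirected graph, i.e. PFNET(r = inf, q = n - 1).
    edges: (u, v, weight) triples. An edge is in some MST iff it connects two
    different components of the subgraph of strictly lighter edges."""
    dsu = DSU(n)
    kept = []
    for w, group in groupby(sorted(edges, key=lambda e: e[2]), key=lambda e: e[2]):
        group = list(group)
        # First test every edge of this weight against the lighter subgraph...
        for u, v, _ in group:
            if dsu.find(u) != dsu.find(v):
                kept.append((u, v, w))
        # ...then merge, so equal-weight ties are all retained.
        for u, v, _ in group:
            dsu.union(u, v)
    return kept
```

The edge sort dominates the running time, which is consistent with the O(n^2 log n) bound quoted above for dense networks.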
The n-gram technique has proved to be the most effective of these string-matching techniques. An issue arises when the languages dealt with have different alphabets. Transliteration is then applied based on phonetic similarities between the languages involved. In this study, both transliteration and the n-gram technique are combined to generate possible transliterations in an English-Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying part-of-speech disambiguation on the set of transliterations so that words with a similar spelling, but a different meaning, are excluded. Experimental results show that TNG gives promising results, and enhanced TNG further improves performance. A multidimensional-scaling approach is used to analyze frequently used medical-topic terms in queries submitted to a Web-based consumer health information system. Based on a year-long transaction log file, five medical focus keywords (stomach, hip, stroke, depression, and cholesterol) and their co-occurring query terms are analyzed. An overlap-coefficient similarity measure and a conversion measure are used to calculate the proximity of terms to one another based on their co-occurrences in queries. The impact of the dimensionality of the visual configuration, the cutoff point of term co-occurrence for inclusion in the analysis, and the Minkowski metric power k on the stress value are discussed. A visual clustering of groups of terms based on the proximity within each focus-keyword group is also conducted. Term distributions within each visual configuration are characterized and are compared with formal medical vocabulary. This investigation reveals that there are significant differences between consumer health query-term usage and the more formal medical terminology used by medical professionals when describing the same medical subject. Future directions are discussed. 
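Two of the similarity measures mentioned above are easy to make concrete. The sketch below uses a Dice coefficient over character bigrams as a stand-in for the approximate string matching behind TNG, and the overlap coefficient for term proximity; neither is claimed to be the exact formula used in either study.

```python
def bigrams(s):
    """Set of character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice_similarity(a, b):
    """Dice coefficient over character bigrams -- one common approximate
    string-matching score for comparing a generated transliteration with
    terms in the target-language index (illustrative, not TNG's formula)."""
    ga, gb = bigrams(a.lower()), bigrams(b.lower())
    if not ga or not gb:
        return 0.0
    return 2.0 * len(ga & gb) / (len(ga) + len(gb))

def overlap_coefficient(queries_a, queries_b):
    """Overlap coefficient between the sets of queries in which two terms
    occur: |A & B| / min(|A|, |B|)."""
    if not queries_a or not queries_b:
        return 0.0
    return len(queries_a & queries_b) / min(len(queries_a), len(queries_b))
```

Spelling variants such as "mohammed" and "mohamed" score high under the bigram measure, which is why it is effective for matching transliterations against an index.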
With the program HistCite it is possible to generate and visualize the most relevant papers in a set of documents retrieved from the Science Citation Index. Historical reconstructions of scientific developments can be represented chronologically as developments in networks of citation relations extracted from scientific literature. This study aims to go beyond the historical reconstruction of scientific knowledge, enriching the output of HistCite with algorithms from social-network analysis and information theory. Using main-path analysis, it is possible to highlight the structural backbone in the development of a scientific field. The expected information value of the message can be used to indicate whether change in the distribution (of citations) has occurred to such an extent that a path-dependency is generated. This provides us with a measure of evolutionary change between subsequent documents. The "forgetting and rewriting" of historically prior events at the research front can thus be indicated. These three methods (HistCite, main-path analysis, and path-dependent transitions) are applied to a set of documents related to fullerenes and the fullerene-like structures known as nanotubes. Four subjects (ecology, applied mathematics, sociology, and economics) were selected to assess whether there is a citation advantage for journal articles that have an open-access (OA) version on the internet compared to those articles that are exclusively toll access (TA). Citations were counted using the Web of Science, and the OA status of articles was determined by searching OAIster, OpenDOAR, Google, and Google Scholar. Of a sample of 4,633 articles examined, 2,280 (49%) were OA and had a mean citation count of 9.04 whereas the mean for TA articles was 5.76. There appears to be a clear citation advantage for those articles that are OA as opposed to those that are TA. 
This advantage, however, varies between disciplines, with sociology having the highest citation advantage, but the lowest number of OA articles, from the sample taken, and ecology having the highest individual citation count for OA articles, but the smallest citation advantage. Tests of correlation or association between OA status and a number of variables were generally found to be weak or inconsistent. The cause of this citation advantage has not been determined. Interdisciplinary collaboration is a major goal in research policy. This study uses citation analysis to examine diverse subjects in the Web of Science and Scopus to ascertain whether, in general, research published in journals classified in more than one subject is more highly cited than research published in journals classified in a single subject. For each subject, the study divides the journals into two disjoint sets called Multi and Mono. Multi consists of all journals in the subject and at least one other subject whereas Mono consists of all journals in the subject and in no other subject. The main findings are: (a) For social science subject categories in both the Web of Science and Scopus, the average citation levels of articles in Mono and Multi are very similar; and (b) for Scopus subject categories within life sciences, health sciences, and physical sciences, the average citation level of Mono articles is roughly twice that of Multi articles. Hence, one cannot assume that in general, multidisciplinary research will be more highly cited, and the converse is probably true for many areas of science. A policy implication is that, at least in the sciences, multidisciplinary researchers should not be evaluated by citations on the same basis as monodisciplinary researchers. This article provides evidence for the network perspective on organizational learning in the cases of two companies of different size, industry, and culture. 
It builds on an earlier article that introduced the network perspective on organizational learning, and proposes some common traits of learning networks and tests them with the help of the tools of social-network analysis. We find support for the network perspective on organizational learning. There are some traits of the learning network that are common to very different companies, such as the fact that learning occurs mainly in clusters. Some other traits depend much on the organizational culture. This article presents one part of a wider study, performed at the Department of Library and Information Science and Book Studies (LIS & BS) at the University of Ljubljana (UL). The study investigated the perceptions of user friendliness of information-retrieval systems (IRS) and the role of individual characteristics of users in these perceptions. Based on an expert study, a user study with 61 postgraduate students of the UL was performed. Three interfaces of e-journals were studied: Science Direct, Proquest Direct, and Ebsco Host. Questionnaires and observations were used for data collection. The users' perceptions of user friendliness and of the importance of auxiliary functions were investigated. Also, the connections between these perceptions and the users' individual characteristics were identified. Three sets of individual characteristics were included: approaches to studying, thinking styles, and hemisphere leanings. In connection with the dimensions of individual characteristics, very different perceptions of user friendliness were expressed. Some dimensions of individual characteristics were also found to be connected to the users' academic areas. It is shown that participants from different academic areas have different requirements and perceptions of user friendliness. The results of the study are relevant for the design of the user interfaces of disciplinary IR systems. They also have implications for other areas, for example, user education and training. 
We explore unsupervised learning techniques for extracting semantic information about biomedical concepts and topics, and introduce a passage retrieval model for using these semantics in context to improve genomics literature search. Our contributions include a new passage retrieval model based on an undirected graphical model (Markov Random Fields), and new methods for modeling passage-concepts, document-topics, and passage-terms as potential functions within the model. Each potential function includes distributional evidence to disambiguate topics, concepts, and terms in context. The joint distribution across potential functions in the graph represents the probability of a passage being relevant to a biologist's information need. Relevance ranking within each potential function simplifies normalization across potential functions and eliminates the need for tuning of passage retrieval model parameters. Our dimensional indexing model facilitates efficient aggregation of topic, concept, and term distributions. The proposed passage-retrieval model improves search results in the presence of varying levels of semantic evidence, outperforming models of query terms, concepts, or document topics alone. Our results exceed the state-of-the-art for automatic document retrieval by 14.46% (0.3554 vs. 0.3105) and passage retrieval by 15.57% (0.1128 vs. 0.0976) as assessed by the TREC 2007 Genomics Track, and automatic document retrieval by 18.56% (0.3424 vs. 0.2888) as assessed by the TREC 2005 Genomics Track. Automatic document retrieval results for TREC 2007 and TREC 2005 are statistically significant at the 95% confidence level (p = .0359 and .0253, respectively). Passage retrieval is significant at the 90% confidence level (p = 0.0893). While we now know the most important determinants of the digital divide, such as income, skills, and infrastructure, little has been written about how these variables relate to one another. 
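As a rough illustration of the structure of such a model (not the actual potentials or estimation used in the Genomics Track experiments above), passage relevance can be scored as a product of per-aspect potential functions; the 0.5 smoothing constants below are invented for the sketch.

```python
def passage_score(passage_terms, query_terms, concept_match, topic_match):
    """Toy relevance score for a passage as a product of potential functions,
    in the spirit of a Markov Random Field retrieval model. The three
    potentials and the 0.5 smoothing constants are illustrative assumptions,
    not the model evaluated in the study."""
    # Term potential: fraction of query terms appearing in the passage.
    phi_term = sum(t in passage_terms for t in query_terms) / max(len(query_terms), 1)
    # Concept and topic potentials: binary evidence with assumed smoothing.
    phi_concept = 1.0 if concept_match else 0.5
    phi_topic = 1.0 if topic_match else 0.5
    # Unnormalized joint: ranking by this product requires no normalization,
    # echoing the point above about relevance ranking within each potential.
    return phi_term * phi_concept * phi_topic
```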
Yet, it is on the basis of one's answer to this question that the difficulty of closing the divide ultimately depends. In this article, I have sought to challenge the (implicit) prevailing assumption in most of the digital-preparedness literature that variables can be perfectly substituted for one another and, hence, added together. In particular, and drawing on available evidence, I view the relationship between, say, computers and computer skills as being nearer the opposite extreme of totally limited substitutability. On this basis, I suggest that the components of digital-preparedness indexes be multiplied rather than added. Using multiplication rather than addition in most current indexes of digital preparedness reveals a substantial understatement of the real difficulty in closing the digital divide and a different set of policies to deal with this larger problem. Such policies should include sharing arrangements and the use of intermediaries. In this paper, we show a "Strategic Diagram" of robot technology, obtained by applying co-word analysis to the metadata of related Korean national R&D projects in 2001. The strategic diagram shows the evolutionary trends of the specific R&D domain and relational patterns between sub-domains. We may use this strategic diagram to support both strategic planning and the R&D program. This paper presents a quantitative study of the productivity, characteristics and various aspects of global publication in the field of library and information science (LIS). A total of 894 contributions published in 56 LIS journals indexed in SSCI during the years 2000-2004 were analyzed. A total of 1361 authors had contributed publications during the five years. The overwhelming majority (89.93%) of them wrote one paper. The average number of authors per paper is 1.52. All the studied papers were published in English. The combined research output of authors from the USA and UK reaches 70% of the total productivity. 
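The argument for multiplication can be made concrete with a toy index. Under additive aggregation a country with abundant computers but almost no skills still scores mid-range, whereas a multiplicative (geometric-mean) aggregation is dragged down by the scarce component. Both functions below are illustrative, not the published indexes discussed in the article.

```python
def additive_index(components):
    """Arithmetic mean: components substitute perfectly for one another."""
    return sum(components) / len(components)

def multiplicative_index(components):
    """Geometric mean: a component near zero (e.g. no computer skills)
    drags the whole index toward zero, modelling limited substitutability."""
    product = 1.0
    for c in components:
        product *= c
    return product ** (1.0 / len(components))

# A hypothetical country with plentiful computers (0.9) but scarce skills (0.1):
computers, skills = 0.9, 0.1
```

The additive index reports 0.5, a middling score, while the multiplicative index reports 0.3, exposing exactly the understatement of difficulty that the article describes.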
Most papers received few citations. Each article received on average 1.6 citations, and LIS researchers mostly cite recent articles. About 48% of citing authors showed a tendency toward self-citation. The productive authors, their contribution and authorship position are listed to indicate their productivity and degree of involvement in their research publications. High citation is associated with research quality, and consequently findings on highly cited articles are useful for understanding the factors that produce high-quality research. This study explores highly cited articles in six subjects, focusing on late citation and peak citation years. Longitudinal citation patterns were found to be highly varied and, on average, different from the remaining articles in each subject. For four of the six subjects, there is a correlation of over 0.42 between the percentage of early citations and total citation ranking, but more highly ranked articles had a lower percentage of early citations. Surprisingly, for highly cited articles in all six subjects the prediction of citation ranking from the sum of citations during their first six years was less accurate than prediction using the sum of the citations for only the fifth and sixth years. In most scientific disciplines, a number of divergent and often highly specialized research areas are examined, which is reflected in substantial differences among journal scopes. Using the accounting literature as an example, we argue that this diversity in scopes should be considered when assessing journal influence. Concretely, we examine a citation-based structural influence measure for a sample of 41 accounting journals. Next, we identify sub-areas in the accounting literature and we explore journal influence in these sub-areas. Our results clearly demonstrate the importance of distinguishing between overall and sub-area influence. In addition, we show that sub-areas should be identified using a fuzzy clustering procedure. 
In an Open Access (OA) environment where article-based or author-based evaluation is important, a new evaluation system is needed to accommodate the characteristics of Open Access Resources (OAR) and to overcome the limitations of pre-existing evaluation systems such as journal-based evaluation. Primary and secondary evaluation factors were selected. Primary factors include hits and citations, which constitute a composite index. Several secondary factors each for article and author evaluation were selected for normalization of the indexes. To validate the superiority of the newly developed normalized composite index systems over the monovariable index system, time-driven bias and power of discrimination were adopted as criteria. The results led to the conclusion that the composite index proved to be a more stable index, offsetting the negative effects of one element on another, and that normalization makes the composite index even more stable by controlling the bias from external elements. This study explored a bibliometric approach to quantitatively assessing current research trends on atmospheric aerosol, using the related literature in the Science Citation Index (SCI) database from 1991 to 2006. The analysis concentrated on scientific output; research performance by individuals, institutes and countries; and trends in the frequency of keywords used. Over the years, there has been a notable growth trend in research output, along with more participation and collaboration of institutes and countries. Research collaborative papers shifted from national inter-institutional to international collaboration. The decreasing share of the world total and of independent articles by the seven major industrialized countries (G7) was examined. Aerosol research in environmental and chemistry-related fields, rather than in medical fields, was the mainstream in recent years. 
Finally, author keywords, words in titles and KeyWords Plus were analyzed contrastively, with research trends and recent hotspots provided. The present study aimed to determine the possible impact of medical research on the Spanish health system. To this end, an analysis was conducted of Spanish researchers' scientific production, measured in terms of the publications indexed in MEDLINE, along with a series of economic, demographic and socio-sanitary data such as the R&D resources allocated to medical science, the actual population during the period studied, mortality, morbidity and drug spending. The results showed increases in all the variables studied, identified the areas most intensely researched and defined the relationship between this information and the chief causes of mortality, morbidity and drug spending. Using a database for publications established at CEST and covering the period from 1981 to 2002, the differences in national scores obtained by different counting methods have been measured. The results are supported by analysing data from the literature. Special attention has been paid to the comparison between the EU and the USA. There are big differences between scores obtained by different methods. In one instance the reduction in scores going from whole to complete-normalized (fractional) counting is 72 per cent. In the literature there is often not enough information given about the methods used, and no sign of a clear and consistent terminology or of agreement on the properties of and results from different methods. As a matter of fact, whole counting is favourable to certain countries, especially countries with a high level of international cooperation. The problems are increasing with time because of the ever-increasing national and international cooperation in research and the increasing average number of authors per publication. The need for a common understanding and a joint effort to rectify the situation is stressed. 
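The difference between whole and fractional counting can be shown in a few lines. Here each paper's fractional credit is split evenly across its participating countries, which is one simple normalization variant; note how whole counts are non-additive, summing to more than the number of papers.

```python
from collections import defaultdict

def national_scores(papers):
    """Whole vs. fractional counting of publications by country.
    papers: one list of author country codes per paper. Whole counting
    credits every participating country with a full paper; fractional
    counting here splits each paper's single credit evenly across its
    participating countries (one simple normalization variant)."""
    whole = defaultdict(float)
    fractional = defaultdict(float)
    for countries in papers:
        participants = set(countries)
        for c in participants:
            whole[c] += 1.0
            fractional[c] += 1.0 / len(participants)
    return dict(whole), dict(fractional)
```

For two papers, one a US-UK collaboration and one purely US, whole counts sum to 3 even though only 2 papers exist. That non-additivity is exactly what complicates the calculation of national shares, and it grows with international cooperation.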
A bibliometric analysis was performed to assess the quantitative trend of published pentachlorophenol (PCP) remediation studies, including both degradation and sorption. The documents studied were retrieved from the Science Citation Index (SCI) for the period from 1994 to 2005. The trends were analyzed from the retrieved results in terms of publication language, document type, page count, publication output, publication pattern, authorship, citation analysis and country of publication. The results indicated that degradation was the emphasis of PCP remediation. The average impact factor was higher for journals publishing degradation studies than for those publishing sorption studies, and there was a positive correlation between citations per paper (CPP) and impact factor (IF) for journals that published more than two papers. The publishing countries for both degradation and sorption studies showed that most of this research was done in the USA and Canada. Two to four authors per paper was the most common level of co-authorship. In this paper some new fields of application of Hirsch-related statistics are presented. Furthermore, hitherto unrevealed properties of the h-index are analysed in the context of rank-frequency and extreme-value statistics. The place of genealogy in present scientific research has been investigated by scientometric methods. The term "genealogy" and related words were searched for in the titles, keywords, and abstracts of science journals for the period 1975-2006. It was concluded that from 1991 onward the number of articles about "applied" genealogy has increased dramatically, whereas that of classical (or "pure") genealogy has grown only modestly. In contemporary science, the fields that profit most from human genealogy are medicine and genetics. More than forty percent of the medical articles containing the search terms were from neurology and oncology in the period investigated. The long-term influence and contribution of research can be evaluated relatively reliably by bibliometric citation analysis. 
Previously, the productivity of nations has been estimated by using either the number of published articles or journal impact factors and/or citation data. These studies show certain trends, but detailed analysis is not possible due to the assumption that all articles in a journal were equally cited. Here we describe the first comprehensive, long-term, nationwide analysis of scientific performance. We studied the lifetime research output of 748 Finnish principal investigators in biomedicine during the years 1966-2000, analysed national trends, and made a comparison with international research production. Our results indicate that analyses of the scientific contribution of persons, disciplines, or nations should be based on actual publication and citation counts rather than on derived information like impact factors. 51% of the principal investigators have published altogether 75% of the articles; however, the whole scientific community has contributed to the growth of biomedical research in Finland since the Second World War. Using an online survey, we asked researchers in the field of environmental and resource economics how they themselves would rank a representative list of journals in their field. The results of this ranking are then compared to the ordering based on the journals' impact factors as published by Thomson Scientific. The two sets of rankings seem to be positively correlated, but statistically the null hypothesis that the two rankings are uncorrelated cannot be rejected. This observation suggests that researchers judge the current quality of journals based on other factors in addition to impact factors. The proceedings of the ISSI conferences in Stockholm, 2005, and Madrid, 2007, contain 85 contributions based on publication counting. The methods used in these contributions have been analyzed. 
The counting methods used are stated explicitly in 26 contributions and can be derived implicitly from the discussion of methods in 10 contributions. In only five contributions, there is a justification for the choice of method. Only one contribution gives information about different results obtained by using different methods. The non-additive results from whole counting give problems in the calculation of shares in seven contributions, but these problems are not mentioned. Only 11 contributions give a term (terms) for the counting method(s) used. To illustrate the problems, 11 of the contributions are discussed in detail. The conclusion is that 40 years of publication counting have not resulted in general agreement on definitions of methods and terminology nor in any kind of standardization. This article focuses on how and why the publication outlets in which academic writers' work appears can impact on their citations, as part of a qualitative interview-based study of computer scientists' and sociologists' citing behaviour. Informants spoke of how they cited differently when writing in outlets aimed at a less knowledgeable audience, and for audiences from different disciplines and in different parts of the world. Citation behaviour can also be affected when writing for journals which favour different research paradigms, and the word limits journals impose led some informants to cite more selectively than they would have wished. The implications of the findings and the strengths and weaknesses of the interview-based method of investigation are also discussed. The ability of g-index and h-index to discriminate between different types of scientists (low producers, big producers, selective scientists and top scientists) is analysed in the area of Natural Resources at the Spanish CSIC (WoS, 1994-2004). Our results show that these indicators clearly differentiate low producers and top scientists, but do not discriminate between selective scientists and big producers. 
However, the g-index is more sensitive than the h-index in the assessment of selective scientists, since this type of scientist shows on average a higher g-index/h-index ratio and a better position in g-index rankings than in h-index rankings. Current research suggests that these indexes do not substitute for each other but are complementary. This study explored the evolution of nanotechnology based on a mapping of patent applications. Citations among patent applications designated to the European Patent Office were intensively analysed. Approximately 4300 nanotechnology patent applications linked through citations were mapped. Fifteen domains of nanotechnology patent applications were found in the map in 2003. The domains cover a wide range of application fields: measurement and manufacturing; electronics; optoelectronics; biotechnology; and nano materials. Maps in several reference years registered the evolution of nanotechnology, in which the breadth of application fields has been broadening over time. Direct and indirect knowledge flows among different domains of nanotechnology are seemingly small at present. Each domain of nanotechnology is likely pushing the technological frontier within its own domain. The exception is sensing and actuating technologies on the nanometre scale. Direct and indirect knowledge flows to/from this domain describe their vital role in nanotechnology. Countries' specialisation was also analysed. Patent applications from the United States and the European Union cover a wide range of nanotechnology. Inventive activities in Japan are, however, strongly focused on electronics. Intensive knowledge creation in specific technologies was found in Switzerland and Korea. The aim of this study is to contribute to the debate on the relationship between scientific mobility and international collaboration. 
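The contrasting behaviour of the h-index and g-index for selective scientists and big producers is easy to reproduce. In the sketch below the g-index allows g to exceed the number of papers by zero-padding, one common convention; the two citation records are invented profiles, not data from the CSIC study.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
    return h

def g_index(citations):
    """Largest g such that the g most-cited papers jointly have >= g^2
    citations; g may exceed the paper count via zero-padding (one common
    convention)."""
    cs = sorted(citations, reverse=True)
    total, g, i = 0, 0, 1
    while True:
        total += cs[i - 1] if i <= len(cs) else 0
        if total < i * i:
            return g
        g, i = i, i + 1

# A "selective scientist" vs. a "big producer" (invented citation records):
selective = [100, 50, 10]   # few papers, heavily cited
producer = [5] * 10         # many papers, moderately cited
```

Both profiles reach the h-values their output supports (3 and 5), but the selective profile's g-index jumps to 12 against the producer's 5, reproducing the higher g-index/h-index ratio reported above for selective scientists.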
This case study deals with leading Chinese researchers in the field of plant molecular life sciences who returned to their home country. A correlation analysis of their mobility history, publication output, and international co-publication data shows the relationship between scientific output, levels of international collaboration and various individual characteristics of returned researchers. The outcome of the analysis suggests that while host countries may lose human capital when Chinese scientists return home, the so-called "return brain drain", they may also gain in terms of scientific linkages within this rapidly emerging and globalizing research field. The SCI has been popular all over the world since it was published by Garfield in 1963. Research on evaluating a researcher's output with the SCI has been continuous. In recent years, a great breakthrough has been made since the h-index was put forward in 2005. In this paper, we advance a new method, the Paper Quality Index (PQI), to evaluate the output of a researcher. The main purpose of our method is to solve two problems with the h-index: one is that the h-index cannot compare the outputs of researchers in different fields; the other is that it is unsuitable for evaluating the outputs of young researchers. A simple mathematical expression is constructed to eliminate the differences in citation practices among fields and makes the evaluation of the short-term outputs of researchers possible. We have compared bibliometric data of Czech research papers generated from 1994 to 2005 with papers from six other EU countries: Austria, Hungary, Poland, Finland, Ireland and Greece. The Czech Republic ranked fifth in the number of papers per thousand inhabitants and sixth in the number of citations per paper. 
In relative terms, the most cited Czech papers were from the fields of Engineering and Mathematics, ranking third, and from Computer Science, Environment/Ecology and Molecular Biology, ranking fourth among the 7 EU countries. Our analysis indicates that Czech research is lagging behind the leading EU countries, but its output is proportional to its R&D expenses. A fair assessment of merit is needed for better resource allocation in the scientific community. We analyzed the performance of the institutional h-index in the case of Brazilian Psychiatry Post-graduation Programs. Traditional bibliometric indicators and the institutional h-index ranked the programs similarly, except for the Average Impact Factor. The institutional h-index correlated strongly with the majority of the traditional bibliometric indicators, which did not occur with the Average Impact Factor. The institutional h-index balances "quantity" and "quality", and can be used as part of a panel of bibliometric indicators to aid the peer-review process. A method for the calculation of a 'concatenated' h-index of jointly ranked combined bibliographies is presented for the case when only the size and h-index of the original publication sets are known. WorldCat, OCLC's bibliographic database, identifies books and the libraries that hold them. The holdings provide detailed information about the type and number of libraries that have acquired the material. Using this information, it is possible to infer the type of audience for which the material is intended. A quantitative measure, the audience level, is derived from the types of libraries that have selected the resource. The audience level can be used to refine discovery, analyze collections, advise readers, and enhance reference services. Our daily social interaction is anchored in interpersonal discourse; accordingly, the phenomenon of linguistic politeness is prevalent in daily social interaction. 
Such linguistic behavior underscores the fact that linguistic politeness is a critical component of human communication. Speech participants utilize linguistic politeness to avoid and reduce social friction and enhance each other's face, or public self-image, during social interaction. It is face-work that underlies the interpersonal function of language use and encompasses all verbal and nonverbal realizations that bring forth one's positive social value, namely, face. Face-work is founded in and built into dynamic social relations; these social and cultural relations and context directly affect the enactment of face-work. Analysis and a subsequent understanding of sociointerpersonal communication are critical to the fostering of successful interaction and collaboration. Linguistic politeness theory is well positioned to provide a framework for an analysis of social interaction and interpersonal variables among discourse participants inasmuch as it is applicable not only to face-to-face social interactions but also to those interactions undertaken through online communication. The impact of published academic research in the sciences and social sciences is commonly estimated by counting citations from journal articles. The Web has now introduced new potential sources of quantitative data online that could be used to measure aspects of research impact. In this article we assess the extent to which citations from online syllabuses could be a valuable source of evidence about the educational utility of research. An analysis of online syllabus citations to 70,700 articles published in 2003 in the journals of 12 subjects indicates that online syllabus citations were sufficiently numerous to be a useful impact indicator in some social sciences, including political science and information and library science, but not in others, nor in any of the sciences. 
This result was consistent with current social science research having, in general, more educational value than current science research. Moreover, articles frequently cited in online syllabuses were not necessarily highly cited by other articles. Hence it seems that online syllabus citations provide a valuable additional source of evidence about the impact of journals, scholars, and research articles in some social sciences. Author cocitation analysis (ACA) has frequently been applied over the last two decades for mapping the intellectual structure of a research field as represented by its authors. However, what is mapped in ACA is actually the structure of intellectual influences on a research field as perceived by its active authors. In this exploratory paper, by contrast, we introduce author bibliographic-coupling analysis (ABCA) as a method to map the research activities of active authors themselves for a more realistic picture of the current state of research in a field. We choose the information science (IS) field and study its intellectual structure both in terms of current research activities as seen from ABCA and in terms of intellectual influences on its research as shown from ACA. We examine how these two aspects of the intellectual structure of the IS field are related, and how they both developed during the "first decade of the Web," 1996-2005. We find that these two citation-based author-mapping methods complement each other, and that, in combination, they provide a more comprehensive view of the intellectual structure of the IS field than either of them can provide on its own. This study explored undergraduate students' mental models of the Web as an information retrieval system. Mental models play an important role in people's interaction with information systems. Better understanding of people's mental models could inspire better interface design and user instruction. 
Multiple data-collection methods, including questionnaire, semistructured interview, drawing, and participant observation, were used to elicit students' mental models of the Web from different perspectives, though only data from interviews and drawing descriptions are reported in this article. Content analysis of the transcripts showed that students had utilitarian rather than structural mental models of the Web. The majority of participants saw the Web as a huge information resource where everything can be found rather than an infrastructure consisting of hardware and computer applications. Students had different mental models of how information is organized on the Web, and the models varied in correctness and complexity. Students' mental models of search on the Web were illustrated from three points of view: avenues of getting information, understanding of search engines' working mechanisms, and search tactics. The research results suggest that there are mainly three sources contributing to the construction of mental models: personal observation, communication with others, and class instruction. In addition to structural and functional aspects, mental models have an emotional dimension. Effectively harnessing available data to support homeland-security-related applications is a major focus in the emerging science of intelligence and security informatics (ISI). Many studies have focused on criminal-network analysis as a major challenge within the ISI domain. Though various methodologies have been proposed, none have been tested for usefulness in creating link charts. This study compares manually created link charts to suggestions made by the proposed importance-flooding algorithm. Mirroring manual investigational processes, our iterative computation employs association-strength metrics, incorporates path-based node importance heuristics, allows for case-specific notions of importance, and adjusts based on the accuracy of previous suggestions. 
Interesting items are identified by leveraging both node attributes and network structure in a single computation. Our data set was systematically constructed from heterogeneous sources and omits many privacy-sensitive data elements such as case narratives and phone numbers. The flooding algorithm improved on both manual and link-weight-only computations, and our results suggest that the approach is robust across different interpretations of the user-provided heuristics. This study demonstrates an interesting methodology for including user-provided heuristics in network-based analysis, and can help guide the development of ISI-related analysis tools. Lotka's law was formulated to describe the number of authors with a certain number of publications. Empirical results (Morris & Goldstein, 2007) indicate that Lotka's law is also valid if one counts the number of publications of coauthor pairs. This article gives a simple model proving this to be true, with the same Lotka exponent, if the number of coauthored papers is proportional to the number of papers of the individual coauthors. Under the assumption that this number of coauthored papers is more than proportional to the number of papers of the individual authors (to be explained in the article), we can prove that the size-frequency function of coauthor pairs is Lotkaian with an exponent that is higher than that of the Lotka function of individual authors, a fact that is confirmed in experimental results. Pronominal anaphors are commonly observed in written texts. In this article, effective Chinese pronominal anaphora resolution is addressed by using lexical knowledge acquisition and salience measurement. The lexical knowledge acquisition is aimed at extracting additional semantic features, such as gender, number, and collocate compatibility, by employing multiple resources. The presented salience measurement is based on entropy-based weighting for selecting antecedent candidates. 
The resolution is justified with a real corpus and compared with a rule-based model. Experimental results by five-fold cross-validation show that our approach yields an 82.5% success rate on 1343 anaphoric instances. In comparison with a general rule-based approach, the performance is improved by 7%. To study the structure of Iranian chemistry research, we identified 43 Iranian and international chemists who were highly cited in 7,682 Iranian chemistry publications (defined as an article with at least one Iranian author address) indexed in Science Citation Index (SciSearch) between 1990 and 2006, inclusive. We collected cocitation data for these authors from the entire SciSearch file (Dialog, File 34) over the time period. A principal components analysis identified seven interrelated factors accounting for 78% of the variance in the cocitation matrix. Iranian and international authors tended to load on separate factors. Three factors (synthesis of carbonyl compounds, solvent-free synthesis of organic compounds, and oxidation of organic compounds) had an inter-correlation of |0.3| or higher. Physical organic chemistry and ionophores (a mixed factor of Iranian and international authors) connected at a lower value, while crown ethers and analytical chemistry were essentially uncorrelated. The PFNet structure maintained the topical factor groupings and Iranian and international authors tended to appear in separate subnetworks. Geographic and institutional influences, apparently relating in part to institutional affiliation and in part to restricted research topics, appear to underlie the primary structural features of Iranian chemistry in this time period. This article analyzes the collaboration trends, authorship and keywords of all research articles published in the Journal of the American Society for Information Science and Technology (JASIST). 
Comparing the articles between two 10-year periods, namely, 1988-1997 and 1998-2007, the three-fold objectives are to analyze the shifts in (a) authors' collaboration trends, (b) top authors, their affiliations as well as the pattern of coauthorship among them, and (c) top keywords and the subdisciplines from which they emerge. The findings reveal a distinct tendency towards collaboration among authors, with external collaborations becoming more prevalent. Top authors have grown in diversity from those being affiliated predominantly with library/information-related departments to include those from information systems management, information technology, business, and the humanities. Amid heterogeneous clusters of collaboration among top authors, strongly connected cross-disciplinary coauthor pairs have become more prevalent. Correspondingly, the distribution of top keywords' occurrences that leans heavily on core information science has shifted towards other subdisciplines such as information technology and sociobehavioral science. This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain adapting itself to the context, it is shown how to handle a tradeoff between the correct definition of an object, taken in the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended, incorporating database objects so that entities and documents can similarly be represented in the network. 
An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient variant measures in the crisp and fuzzy cases. Eigenfactor.org, a journal evaluation tool that uses an iterative algorithm to weight citations (similar to the PageRank algorithm used for Google), has been proposed as a more valid method for calculating the impact of journals. The purpose of this brief communication is to investigate whether the principle of repeated improvement provides different rankings of journals than does a simple unweighted citation count (the method used by the Institute for Scientific Information (ISI)). We introduce two new measures of the performance of a scientist. One measure, referred to as the h(alpha)-index, generalizes the well-known h-index or Hirsch index. The other measure, referred to as the g(alpha)-index, generalizes the closely related g-index. We analyze theoretically the relationship between the h(alpha)- and g(alpha)-indices on the one hand and some simple measures of scientific performance on the other hand. We also study the behavior of the h(alpha)- and g(alpha)-indices empirically. Some advantages of the h(alpha)- and g(alpha)-indices over the h- and g-indices are pointed out. (C) 2008 Elsevier Ltd. All rights reserved. The general aim of this paper is to show the results of a study in which we combined bibliometric mapping and citation network analysis to investigate the process of creation and transfer of knowledge through scientific publications. The novelty of this approach is the combination of both methods. 
In this case we analyzed the citations to a very influential paper published in 1990 that contains, for the first time, the term Absorptive Capacity. A bibliometric map identified the terms and the theories associated with the term, while two techniques from citation network analysis identified the main papers over a 15-year period. As a result we identified the articles that influenced the research for some time and linked them into a research tradition that can be considered the backbone of the "Absorptive Capacity Field". (C) 2008 Elsevier Ltd. All rights reserved. The universalism norm of the ethos of science requires that contributions to science are not excluded because of the contributors' gender, nationality, social status, or other irrelevant criteria. Here, a generalized latent variable modeling approach is presented that grant program managers at a funding organization can use to obtain indications of potential sources of bias in their peer review process (such as the applicants' gender). To implement the method, the data required are the number of approved and number of rejected applicants for grants among different groups (for example, women and men or natural and social scientists). Using the generalized latent variable modeling approach, indications of potential sources of bias can be examined not only for grant peer review but also for journal peer review. (C) 2008 Elsevier Ltd. All rights reserved. Three variations on the power law model proposed by Egghe are fitted to four groups of h-index time series: publication-citation data for authors, journals and universities; and patent citation data for firms. It is shown that none of the power law models yields an adequate description of total career h-index sequences. (C) 2008 Elsevier Ltd. All rights reserved. We provide a new axiomatic characterization of the Hirsch-index. 
This new characterization is based on a simple and appealing symmetry axiom which essentially imposes that the number of citations and the number of publications should be treated in the same way and should be measured on the same scale. (C) 2008 Elsevier Ltd. All rights reserved. This paper reviews a number of studies comparing Thomson Scientific's Web of Science (WoS) and Elsevier's Scopus. It collates their journal coverage in an important medical subfield: oncology. It is found that all WoS-covered oncological journals (n = 126) are indexed in Scopus, but that Scopus covers many more journals (an additional n = 106). However, the latter group tends to have much lower impact factors than WoS-covered journals. Among the top 25% of sources with the highest impact factors in Scopus, 94% are indexed in the WoS, and for the bottom 25% only 6%. In short, in oncology the WoS is a genuine subset of Scopus, and tends to cover the best journals from it in terms of citation impact per paper. Although Scopus covers 90% more oncological journals compared to WoS, the average Scopus-based impact factor for journals indexed by both databases is only 2.6% higher than that based on WoS data. Results reflect fundamental differences in coverage policies: the WoS based on Eugene Garfield's concepts of covering a selective set of most frequently used (cited) journals; Scopus with broad coverage, more similar to large disciplinary literature databases. The paper also found that 'classical', WoS-based impact factors strongly correlate with a new, Scopus-based metric, SCImago Journal Rank (SJR), one of a series of new indicators founded on earlier work by Pinski and Narin [Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing and Management, 12, 297-312] that weight citations according to the prestige of the citing journal (Spearman's rho = 0.93). 
Four lines of future research are proposed. (C) 2008 Elsevier Ltd. All rights reserved. International collaboration as measured by co-authorship relations on refereed papers grew linearly from 1990 to 2005 in terms of the number of papers, but exponentially in terms of the number of international addresses. This confirms Persson et al.'s [Persson, O., Glänzel, W., & Danell, R. (2004). Inflationary bibliometric values: The role of scientific collaboration and the need for relative indicators in evaluative studies. Scientometrics, 60(3), 421-432] hypothesis of an inflation in international collaboration. Patterns in international collaboration in science can be considered as network effects, since there is no political institution mediating relationships at that level except for the initiatives of the European Commission. Science at the international level shares features with other complex adaptive systems whose order arises from the interactions of hundreds of agents pursuing self-interested strategies. During the period 2000-2005, the network of global collaborations appears to have reinforced the formation of a core group of fourteen most cooperative countries. This core group can be expected to use knowledge from the global network with great efficiency, since these countries have strong national systems. Countries at the periphery may be disadvantaged by the increased strength of the core. (C) 2008 Elsevier Ltd. All rights reserved. An expert ranking of forestry journals was compared with Journal Impact Factors and h-indices computed from the ISI Web of Science and internet-based data. Citations reported by Google Scholar offer an efficient way to rank all journals objectively, in a manner consistent with other indicators. This h-index exhibited a high correlation with the Journal Impact Factor (r = 0.92), but is not confined to journals selected by any particular commercial provider. 
A ranking of 180 forestry journals is presented on the basis of this index. (C) 2008 Elsevier Ltd. All rights reserved. Recently Woeginger [Woeginger, G. H. (2008a). An axiomatic characterization for the Hirsch-index. Mathematical Social Sciences; Woeginger, G. H. (2008b). An axiomatic analysis of Egghe's g-index. Journal of Informetrics] introduced a set of axioms for scientific impact measures. These lead to a characterization of the h-index. In this note we consider a slight generalization and check which of Woeginger's axioms are satisfied by the g-index, the h(2)-index, and the R2-index. (C) 2008 Elsevier Ltd. All rights reserved. Using the example of microarrays, one of the constitutive technologies of post-genomic biomedicine, this paper introduces a method for analyzing publications, patents and research grants as proxies for "triple-helix interfaces" between university, industry and government activities. Our method creates bridges that allow one to move seamlessly between publication, patent and research project databases that use different fields and formats, and contain different information. These links do not require pre-defined categories in order to search for correspondences between sub-topics or research areas in the three databases. Finally, our results are not restricted to quantitative information but, rather, allow one to carry out qualitative investigations of the content of research activities. Our approach draws on a combination of text-mining and network analysis/mapping software packages. (C) 2008 Elsevier Ltd. All rights reserved. The relative performance of science and technology (S&T) in the USA and PRC was compared in terms of quantity and quality, as reflected in their technical literatures. Three databases (Science Citation Index (SCI), INSPEC, Ei Compendex) were selected for the quantity comparison, and citation analysis in the SCI was used for the quality comparison. 
Thirty technology and research areas were compared for quantity production, and are presented in this paper. These 30 areas were selected based on our previous assessment of PRC S&T output, and represented areas of emphasis by the PRC in physical, environmental, engineering, and life sciences. In almost all technical areas, the USA had the quantity (number of papers) lead (for the period 2002-2007) based on the SCI results, although the PRC has made dramatic strides to overtake the USA. In most of the technical areas, by 2007 the PRC had attained parity with, or exceeded, the S&T literature production of the USA in the INSPEC database. The major exceptions were the biomedical field and some aspects of environmental science, where the USA still had a large lead. For most technical areas, by 2007 the PRC had even higher relative S&T literature production, based on the Ei Compendex, compared to the INSPEC results. Moreover, the USA production appears to have peaked (in the Ei Compendex) in the 2005 time frame, despite increasing amounts of funding for S&T research. The PRC challenge in non-biomedical research and technology sectors becomes apparent in those databases that do not contain substantial biomedical research papers, and therefore remove a substantial intrinsic USA advantage. For quality computations, the publication and citation results were normalized to discrete slices of time, and are presented for nanotechnology only (for the period 1998-2003). While the USA held a commanding lead in quality over the PRC (and the other major nanotechnology producer nations as well) during the past decade, the PRC has increased the quality of its publications monotonically, and now appears to be competitive with France, Italy, Japan, and Australia, using the quality metric in this paper. (C) 2008 Elsevier Ltd. All rights reserved. 
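The quality computation described above normalizes publication and citation results to discrete slices of time. As an illustrative sketch only (the records, field names, and function below are invented, not the authors' actual pipeline), one common form of such normalization is average citations per paper within each publication-year slice:

```python
# Hypothetical paper records: (country, publication_year, citations_received).
# Grouping by (country, year) yields a time-sliced citations-per-paper measure.
from collections import defaultdict

papers = [
    ("USA", 1998, 12), ("USA", 1998, 4), ("PRC", 1998, 3),
    ("USA", 2001, 9),  ("PRC", 2001, 7), ("PRC", 2001, 5),
]

def citations_per_paper(records):
    """Average citations per paper for each (country, year) slice."""
    totals = defaultdict(lambda: [0, 0])  # (country, year) -> [citations, papers]
    for country, year, cites in records:
        slot = totals[(country, year)]
        slot[0] += cites
        slot[1] += 1
    return {key: cites / count for key, (cites, count) in totals.items()}

quality = citations_per_paper(papers)
print(quality[("USA", 1998)])  # 8.0
```

Normalizing within year slices avoids penalizing recent papers, which have had less time to accumulate citations than older ones.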
The g-index is a well-known index for measuring and comparing the output of scientific researchers, which was introduced by Leo Egghe in 2006 as an improvement of the Hirsch-index. This article gives an axiomatic characterization of the g-index in terms of three natural axioms. (C) 2008 Elsevier Ltd. All rights reserved. In a recent paper in the Journal of Informetrics, Habibzadeh and Yadollahie [Habibzadeh, F., & Yadollahie, M. (2008). Journal weighted impact factor: A proposal. Journal of Informetrics, 2(2), 164-172] propose a journal weighted impact factor (WIF). Unlike the ordinary impact factor, the WIF of a journal takes into account the prestige or the influence of citing journals. In this communication, we show that the way in which Habibzadeh and Yadollahie calculate the WIF of a journal has some serious problems. Due to these problems, a ranking of journals based on WIF can be misleading. We also indicate how the problems can be solved by changing the way in which the WIF of a journal is calculated. (C) 2008 Elsevier Ltd. All rights reserved. Analysis of sociointerpersonal communication patterns among discourse participants is essential to understand the manifestation of and the interpersonal-communication features realized in online social interaction. Linguistic politeness theory provides an effective framework for such an analysis of sociointerpersonal communication features employed by online language users to maintain and enhance their public self-image, or face. The qualitative data analysis of this study, drawn from the real-time, online discussions of K-12 students, makes evident that interpersonal-communication features appear in the form of politeness tactics. The results of the study show that there is decreased use of deferential linguistic forms; on the contrary, a variety of verbal and nonverbal devices that denote positive politeness and bald-on-record (i.e., direct speech acts) frequently occur. 
The commonality of positive politeness and bald-on-record lies in the fact that both tactics are grounded in the nature of the close interpersonal relationships between participants. Such a communication pattern in the real-time, online discourse of K-12 students signifies that cognitive assessment of sociointerpersonal and contextual variables undertaken by speech participants underlies the realization of linguistic politeness. Employment of such politeness tactics indicates that effective and fully realized interpersonal communication plays a vital role in the development of online social interaction. When studying how ordinary Web users interact with Web search engines, researchers tend to either treat the users as a homogeneous group or group them according to search experience. Neither approach is sufficient, we argue, to capture the variety in behavior that is known to exist among searchers. By applying an automatic clustering technique based on self-organizing maps to search engine log files from a corporate intranet, we show that users can be usefully separated into distinguishable segments based on their actual search behavior. Based on these segments, future tools for information seeking and retrieval can be targeted to specific segments rather than just made to fit "the average user." The exact number of clusters, and to some extent their characteristics, can be expected to vary between intranets, but our results indicate that some more generic groups may exist. In our study, a large group of users appeared to be "fact seekers" who would benefit from higher precision, a smaller group of users were more holistically oriented and would likely benefit from higher recall, and a third category of users seemed to constitute the knowledgeable users. These three groups may raise different design implications for search-tool developers. 
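The study above segments searchers by clustering feature vectors derived from log files with self-organizing maps. As a rough sketch of the general idea only, the toy example below uses plain k-means (a simpler, deterministic stand-in for a SOM) on invented per-user features; the feature choices and groupings are hypothetical, not taken from the study.

```python
# Hypothetical per-user search-log features:
# (queries per session, clicks per query, mean query length in words).
sessions = [
    (1, 1, 2), (1, 2, 2), (2, 1, 3),   # short, precise sessions ("fact seekers")
    (8, 6, 5), (9, 5, 6), (7, 7, 5),   # broad, exploratory, recall-oriented sessions
    (4, 3, 9), (5, 3, 8),              # long, specific queries
]

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data, seeds, rounds=10):
    """Plain k-means: assign each point to its nearest centroid, re-average."""
    centroids = [list(data[i]) for i in seeds]
    labels = []
    for _ in range(rounds):
        labels = [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
                  for p in data]
        for c in range(len(centroids)):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

labels = kmeans(sessions, seeds=[0, 3, 6])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

With well-separated toy data the segments stabilize after a couple of rounds; on real log files, feature scaling and the choice of cluster count matter far more than the specific algorithm.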
Information behavior (IB) research involves examining how people look for and use information, often with the sole purpose of gaining insights into the behavior displayed. However, it is also possible to examine IB with the purpose of using the insights gained to design new tools or improve the design of existing tools to support information seeking and use. This approach is advocated by David Ellis who, over two decades ago, presented a model of information seeking behaviors and made suggestions for how electronic tools might be designed to support these behaviors. Ellis also recognized that IBs might be used as the basis for evaluating as well as designing electronic resources. In this article, we present the IB evaluation methods. These two novel methods, based on an extension of Ellis's model, use the empirically observed IBs of lawyers as a framework for structuring user-centered evaluations of the functionality and usability of electronic resources. We illustrate the methods' use through the discussion of two examples, and we discuss their benefits and limitations, grounded in specific features of the methods. The related fields of ethnomethodology (EM), founded by Harold Garfinkel, and conversation analysis (CA), as epitomized by the work of Harvey Sacks, offer unique insights into the operation of virtual reference services (VRS). The tradition of phenomenology within library and information science (LIS) provides a context for this research, although EM/CA differs in important respects, providing a program for grounded empirical investigations. Relevant EM/CA research concerns include the documentary method of interpretation, trust, indexicality, instructed action, and sequential organization. 
Review of the LIS literature on reference interactions in both face-to-face and virtual settings reveals a tendency to impose analytic categories and classificatory schemes that obscure the extremely situated and collaborative nature of reference work; however, an EM/CA examination of transcripts from the first 4 months of a newly implemented VRS at a large university library suggests the need for a more nuanced approach. Close-order examination of two chat reference transcripts reveals the interactional complexities and nuances that characterize even the most succinct encounters. Analyzing the reference query as a service request demonstrates how librarians deploy their interactional skills to address "face" concerns and ameliorate potentially problematic aspects of the reference encounter. The principal objective of this research was to understand the incentive structure in a mixed economic and social market for information. Prior research suggests that tangible incentives will crowd out intangible incentives; however, information markets invite special examination of this finding. Data representing four years of activity by 523 researchers who gave about 52,000 answers on the Google Answers Web site were collected. Analysis revealed that the main predictor for researchers' participation was the anticipation of tip (gratuity). Analysis of two researcher subgroups showed that in the case of the frequent researchers, the tip was followed by social incentives: interaction (comments) and ratings. For occasional researchers, the tip was followed by the price paid for answers and then by comments. The results suggest that a pure economic incentive serves as enticement; however, social incentives induce persistent participation by researchers and eventually lead to higher average economic gains. The market is catalyzed by social activity, not cannibalized by it, as may have been predicted by theory. 
This finding provides empirical evidence for "social capital" since social incentives were connected to higher economic gains. The practical implication is that a mixed incentive design is likely to generate lively information-exchange environments. I propose an approach to classifying scientific networks in terms of aggregated journal-journal citation relations of the ISI Journal Citation Reports using the affinity propagation method. This algorithm is applied to obtain the classification of SCI and SSCI journals by minimizing intracategory journal-journal (J-J) distances in the database, where distance between journals is calculated from the similarity of their annual citation patterns with a cutoff parameter, t, to restrain the maximal J-J distance. As demonstrated in the classification of SCI journals, classification of scientific networks with different resolution is possible by choosing proper values of t. Twenty journal categories in SCI are found to be stable despite a difference of an order of magnitude in t. In our classifications, the level of specificity of a category can be found by looking at its value of D-bar(RJ) (the average distance of members of a category to its representative journal), and relatedness of category members is implied by the value of D-bar(J-J) (the average J-J distance within a category). Our results are consistent with the ISI classification scheme, and the level of relatedness for most categories in our classification is higher than their counterpart in the ISI classification scheme. Hirsch-type indices are studied with special attention to the AR(2)-index introduced by Jin. The article consists of two parts: a theoretical part and a practical illustration. In the theoretical part, we recall the definition of the AR(2)-index and show that an alternative definition, the so-called AR(1)(2), does not have the properties expected for this type of index. 
A practical example shows the existence of some of these mathematical properties and illustrates the difference between different h-type indices. Clearly the h-index itself is the most robust of all. It is shown that excluding so-called non-WoS source articles may have a significant influence on the R- and, especially, the g-index. Although a number of investigations have been conducted on the information behavior of family historians, we know little about the degree to which they systematically collect information on the causes of death and major illnesses of ancestors. Such information, if reliable and accessible, could be useful to family physicians, the families themselves, and to epidemiologists. This article presents findings from a two-stage study of amateur genealogists in the USA. An initial state-wide telephone survey of 901 households was followed by in-depth interviews with a national sample of 23 family historians. Over half of the responding households in the general survey reported that someone in their family collects ancestral medical data; this practice appears to be more common among respondents who are women, older persons, and those with higher incomes. In-depth interviews revealed that this information is commonly collected by family historians, and typically comes from death certificates, secondarily from obituaries, and thirdly from word-of-mouth or family records; most of these respondents collected health information for reasons of surveillance of their own health risks. Social-networking approaches to encourage gathering of family data could aid in increased awareness and surveillance of health risks. Implications for health information seeking and applicable theories are discussed. We are investigating the potential of 3D telepresence, or televideo, technology to support collaboration among geographically separated medical personnel in trauma emergency care situations. 
3D telepresence technology has the potential to provide richer visual information than current 2D videoconferencing techniques. This may be of benefit in diagnosing and treating patients in emergency situations where specialized medical expertise is not locally available. The 3D telepresence technology does not yet exist, and there is a need to understand its potential before resources are spent on its development and deployment. This poses a complex challenge. How can we evaluate the potential impact of a technology within complex, dynamic work contexts when the technology does not yet exist? To address this challenge, we conducted an experiment with a posttest, between-subjects design that takes the medical situation and context into account. In the experiment, we simulated an emergency medical situation involving practicing paramedics and physicians, collaborating remotely via two conditions: with today's 2D videoconferencing and a 3D telepresence proxy. In this article, we examine information sharing between the attending paramedic and collaborating physician. Postquestionnaire data illustrate that the information provided by the physician was perceived to be more useful by the paramedic in the 3D proxy condition than in the 2D condition; however, data pertaining to the quality of interaction and trust between the collaborating physician and paramedic show mixed results. Postinterview data help explain these results. We conducted an experiment with a posttest, between-subjects design to evaluate the potential of emerging 3D telepresence technology to support collaboration in emergency health care. 3D telepresence technology has the potential to provide richer visual information than do current 2D video conferencing techniques. This may be of benefit in diagnosing and treating patients in emergency situations where specialized medical expertise is not locally available. 
The experimental design and results concerning information behavior are presented in the article "Exploring the Potential of Video Technologies for Collaboration in Emergency Medical Care: Part I. Information Sharing" (Sonnenwald et al., this issue). In this article, we explore paramedics' task performance during the experiment as they diagnosed and treated a trauma victim while working alone or in collaboration with a physician via 2D videoconferencing or via a 3D proxy. Analysis of paramedics' task performance shows that paramedics working with a physician via a 3D proxy performed the fewest harmful interventions and showed the least variation in task performance time. Paramedics in the 3D proxy condition also reported the highest levels of self-efficacy. Interview data confirm these statistical results. Overall, the results indicate that 3D telepresence technology has the potential to improve paramedics' performance of complex medical tasks and improve emergency trauma health care if designed and implemented appropriately. From a list of papers of an author, ranked in decreasing order of the number of citations to these papers, one can calculate this author's Hirsch index (or h-index). If this is done for a group of authors (e.g., from the same institute) then we can again list these authors in decreasing order of their h-indices and from this, one can calculate the h-index of (part of) this institute. One can go even further by listing institutes in a country in decreasing order of their h-indices and calculate again the h-index as described above. Such h-indices are called by SCHUBERT [2007] "successive" h-indices. In this paper we present a model for such successive h-indices based on our existing theory on the distribution of the h-index in Lotkaian informetrics. We show that each step involves the multiplication of the exponent of the previous h-index by 1/alpha where alpha > 1 is a Lotka exponent. 
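The h-index and the successive h-index described above follow directly from ranked citation counts: an author has index h if h of their papers have at least h citations each, and the successive index applies the same rule to a list of individual h-indices. A minimal sketch, with invented citation data:

```python
def h_index(citation_counts):
    """Largest h such that at least h items have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def successive_h_index(groups):
    """Schubert-style successive h-index: compute each author's h-index,
    then take the h-index of that list of h-indices."""
    author_hs = [h_index(papers) for papers in groups]
    return h_index(author_hs)

# Hypothetical institute: each inner list is one author's citation counts
institute = [
    [10, 8, 5, 4, 3],   # author h-index = 4
    [25, 12, 2],        # author h-index = 2
    [6, 6, 6, 1],       # author h-index = 3
    [1, 0],             # author h-index = 1
]
print(successive_h_index(institute))
```

Each aggregation step discards detail (only h papers per author "count" toward the next level), which gives an intuition for why, as the paper explains, successive h-indices tend to decrease.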
We explain why, in general, successive h-indices are decreasing. We also introduce a global h-index for which tables of individuals (authors, institutes, ...) are merged. We calculate successive and global h-indices for the (still active) D. De Solla Price awardees. To delineate the intellectual structure of Antarctic science, the research outputs on Antarctic science have been analyzed for a period of 25 years (1980-2004) through a set of scientometric and network analysis techniques. The study is based on 10,942 records (research articles, letters, reviews, etc.), published in 961 journals/documents, and retrieved from the Science Citation Index (SCI) database. Over the years interest in Antarctic science has increased, as is evident from the growing number of ratified countries and research stations. During the period under study, productivity increased threefold and there was a 13-fold increase in collaborative articles. An attempt has been made to identify important players such as scientists, organizations and countries working in the field and to identify frontier areas of research being conducted on this continent. The largest share of scientific output (41%) is contributed by the USA and the UK, followed by Australia and Germany. The British Antarctic Survey (BAS), UK, and the Alfred Wegener Institute of Polar & Marine Research, Germany, are the most productive institutes in Antarctic science. The largest number of research articles on Antarctic science has been published in the journal Polar Biology, indicating substantial work being done on the biology of this continent. The journals Nature and Science are the most highly cited journals in Antarctic science. The paper written by J. C. Farman et al., published in Nature in 1985, reporting the depletion of the ozone layer, is the most-cited article. Semantic relationships between cited documents were measured through co-citation analysis. J. C. Farman and S. Solomon are co-cited most frequently. The case of Dr. 
Hwang Woo Suk, the South Korean stem-cell researcher, is arguably the highest profile case in the history of research misconduct. The discovery of Dr. Hwang's fraud led to fierce criticism of the peer review process (at Science). To find answers to the question of why the journal peer review system did not detect scientific misconduct (falsification or fabrication of data) not only in the Hwang case but also in many other cases, an overview is needed of the criteria that editors and referees normally consider when reviewing a manuscript. Do they at all look for signs of scientific misconduct when reviewing a manuscript? We conducted a quantitative content analysis of 46 research studies that examined editors' and referees' criteria for the assessment of manuscripts and their grounds for accepting or rejecting manuscripts. The total of 572 criteria and reasons from the 46 studies could be assigned to nine main areas: (1) 'relevance of contribution,' (2) 'writing/presentation,' (3) 'design/conception,' (4) 'method/statistics,' (5) 'discussion of results,' (6) 'reference to the literature and documentation,' (7) 'theory,' (8) 'author's reputation/institutional affiliation,' and (9) 'ethics.' None of the criteria or reasons that were assigned to the nine main areas refers to or is related to possible falsification or fabrication of data. In a second step, the study examined what main areas take on high and low significance for editors and referees in manuscript assessment. The main areas that are clearly related to the quality of the research underlying a manuscript emerged in the analysis frequently as important: 'theory,' 'design/conception' and 'discussion of results.' Building on a previous study that succeeded in mapping business competition positions at an industry level using Web co-link analysis, the current study attempted to improve Web co-link analysis by adding Web page content to obtain the mapping at a particular market segment level. 
This method combines the ideas of Web content mining with Web structure mining. The method was tested in the WiMAX sector of the telecommunication industry. Specifically, the keyword WiMAX was incorporated into queries that searched for co-links to pairs of company Websites. Two sets of data were collected: one with the proposed method and one with co-link search alone. The resulting two data matrices were analyzed using multidimensional scaling (MDS) to generate maps of business competition. The comparison between the two maps shows that the proposed method produced a map focusing on the WiMAX sector. The study also proposed the measure of reduction of co-link count that can be used to gauge the effectiveness of focusing the analysis on a particular sector. The reduction of co-link count could also be an easy and pragmatic measure for an analysis of a company's competitiveness in a particular market segment. This note attempts to approximate the distribution function for the number of innovation activities (NIA) in the manufacturing sector using the dataset of 2002 Korean Innovation Survey. The mixture model applied here can easily capture the bimodality feature of the NIA distribution and provide some useful information such as the mean of NIA and the effect of a firm's characteristic on whether the firm will undertake innovation activity. The psychology of tourism is a new, multidisciplinary research field. However, no systematic analyses of the scientific production in this field have been carried out to date. This study presents a bibliometric analysis of the area of psychology of tourism between 1990 and 2005. The evolution of scientific production during this period, Price's, Lotka's and Bradford's laws and citation patterns were studied. The results show a significant growth in the literature on the subject, as well as an increase in coauthorship and institutional collaboration. 
Bibliometric laws and empirical regularities observed in other disciplines are also present in this new research field. In this study we analyse gender equality in the preparation, supervision and defence of PhD theses in Spain in the period 1990-2004. The results indicate a tendency towards greater equality in the number of men and women successfully completing doctoral studies. However, the gender imbalance among thesis supervisors and on thesis assessment boards is more apparent, with a predominance of male academics. Moreover, the gender of the PhD student is clearly related to the gender of the supervisor, and both are related to the gender of the members of the assessment boards of PhD theses in Spain. The method of author cocitation analysis (ACA) was first presented by White and Griffith in 1981 as a "literature measure of intellectual structure" and its applicability for the mapping of areas of science has since then been tested in various bibliometric science mapping studies. In this study, an experimental method of calculating the first or single author cocitation frequency is presented and compared with the standard method. Applying Ward's method of clustering, the analysis revealed that the two approaches did not produce similar results, and a tentative interpretation of the deviations was that the experimental method provided a more detailed depiction of the specialty structure. It was also concluded that a number of additional research questions need to be resolved before a comprehensive understanding of the suggested method's merits and demerits is reached. The ISI journal impact factor (JIF) is based on a sample that may represent half the whole-of-life citations to some journals, but a small fraction (<10%) of the citations accruing to other journals. This disproportionate sampling means that the JIF provides a misleading indication of the true impact of journals, biased in favour of journals that have a rapid rather than a prolonged impact. 
Many journals exhibit a consistent pattern of citation accrual from year to year, so it may be possible to adjust the JIF to provide a more reliable indication of a journal's impact. This article introduces two sampling methods, Directly Random Sampling (DRS) and Redistributed Random Sampling (RRS), for the categorization of a large number of research articles retrieved from the metallurgy and polymer subfields of the Science Citation Index (SCI) database. The accuracy of the proposed sampling methods was assessed by comparison with reference results previously obtained by the Fully Retrieving Sampling (FRS) method, which involved analyzing the contents and categories of all articles from the database. The results suggest that the RRS and DRS methods are appropriate, efficient and reasonably accurate for the categorization of a relatively large volume of research articles. The RRS method is highly recommended, especially when the contents of the sample articles are unevenly distributed. With the DRS and RRS methods, only about 6.3% of the total articles were required to obtain results similar to those given by the FRS method. The percentage Expected Worst Errors (EWE) from the DRS and RRS methods were observed to range from 1.0 to 5.5%. The EWE value could be reduced by increasing the sample size. Based on the Science Citation Index-Expanded web version, the USA is still by far the strongest nation in terms of scientific performance. Its relative decline in percentage share of publications is largely due to the emergence of China and other Asian nations. In 2006, China became the second largest nation in terms of the number of publications within this database. In terms of citations, the competitive advantage of the American "domestic market" is diminished, while the European Union (EU) is profiting more from the enlargement of the database over time than the USA. 
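For reference, the two-year JIF that the preceding passages critique is defined as the citations a journal receives in a given year to its items published in the two preceding years, divided by the number of citable items it published in those two years. A minimal sketch with hypothetical counts:

```python
def journal_impact_factor(cites_by_pub_year, citable_items, year):
    """Standard two-year JIF for `year`: citations received in `year`
    to items published in the two preceding years, divided by the
    number of citable items published in those two years."""
    prior = (year - 1, year - 2)
    cites = sum(cites_by_pub_year.get(y, 0) for y in prior)
    items = sum(citable_items.get(y, 0) for y in prior)
    return cites / items if items else 0.0

# Hypothetical journal: citations received in 2007, keyed by the cited
# items' publication year, and citable item counts per year.
cites_2007 = {2006: 150, 2005: 210, 2004: 90}
items = {2006: 60, 2005: 60, 2004: 55}
print(journal_impact_factor(cites_2007, items, 2007))  # (150+210)/(60+60) = 3.0
```

The two-year window is exactly why the sampling is disproportionate: the 90 citations in this example to 2004 items are ignored entirely, so a journal whose citations accrue slowly is undervalued relative to one with a rapid but short-lived impact.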
However, the USA is still outperforming all other countries in terms of highly cited papers and citation/publication ratios, and it is more successful than the EU in coordinating its research efforts in strategic priority areas like nanotechnology. In this field, the People's Republic of China (PRC) has become the second largest nation in both the number of papers published and citations, behind the USA. International co-authorship is generally thought and often found to have positive effects on the citation rate of scientific publications. We study the effect quantitatively in the example of four major and four medium Hungarian universities. The conclusions may be generalized to other countries of similar international status. Highly cited articles are interesting because of the potential association between high citation counts and high quality research. This study investigates the 82 most highly cited 'Information Science and Library Science' (IS&LS) articles (the top 0.1%) in the Web of Science from the perspectives of disciplinarity, annual citation patterns, and first author citation profiles. First, the relative frequency of these 82 articles was much lower for articles solely in IS&LS than for those in IS&LS and at least one other subject, suggesting that the promotion of interdisciplinary research in IS&LS may be conducive to improving research quality. Second, two thirds of the first authors had an h-index in IS&LS of less than eight, showing that much significant research is produced by researchers without a high overall IS&LS research productivity. Third, there is a moderate correlation (0.46) between citation ranking and the number of years between peak year and year of publication. This indicates that high-quality ideas and methods in IS&LS are often deployed many years after being published. We elicit filing strategies for patent families in China and Japan in two prominent technology fields: telecommunications and audiovisual technology. 
For the two destination countries, we find substantial heterogeneity in filing strategies among applications from different countries. This heterogeneity cannot be explained by activities in technological subfields. C. V. Raman is acknowledged by the worldwide physics community for his classic works. The present study makes an effort to analyze the citation impact of his publications; tools for such a study were lacking until some years ago. The study is limited to the Science Citation Index database for the period 1982-2005. The noteworthy results are: one third of his research papers have been cited at least once; the research papers published during 1918-1940 made a remarkable impact; three of his papers have shown an upward growth in the number of citations received; the total citations to papers of age 46 and 54 as of 1982 accounted for more than 50 per cent of the total citations received; research works in the 'Acoustics' area have been cited more than any other area of his work; eponymal citations are yet to be explored and analysed to understand the real impact of his works. We have developed a way of describing the increase with time of the number of papers in a scientific field and apply it to a data base of about 2000 papers on symbolic logic published between 1666 and 1934. We find (a) a general exponential increase in the cumulative total number of papers, (b) oscillations around this due to the appearance of new ideas in the field and the time required for their full incorporation, and (c) exogenously caused fluctuations due to wars and other non-scientific events. This paper investigates the utility of the Inclusion Index, the Jaccard Index and the Cosine Index for calculating similarities of documents, as used for mapping science and technology. 
It is shown that, provided that the same content is searched across various documents, the Inclusion Index generally delivers more exact results, in particular when computing the degree of similarity based on citation data. In addition, various methodologies such as co-word analysis, Subject-Action-Object (SAO) structures, bibliographic coupling, co-citation analysis, and self-citation links are compared. We find that the former two tend to capture semantic similarities that differ from knowledge flows as expressed by the citation-based methodologies. Fluorine research has been identified as a priority area in South Africa and the South African Nuclear Energy Corporation (NECSA) is embarking on an effort to expand its hydrogen fluoride and aluminium trifluoride production capacity. On the eve of those efforts, this article reports the findings of an effort to map and assess fluorine research in South Africa in comparison to four other countries, i.e. Malaysia, Australia, Germany and Italy. The results of the assessment are aimed at guiding future directions for fluorine research in the country, at identifying centres of expertise nationally where new research chairs could be established, at identifying international centres of expertise to be utilised for collaboration and of course at inter-temporal benchmarking of fluorine research in South Africa. South Africa is found to produce a small number of fluorine research publications, both in comparison to other countries like Germany and Italy, which produce orders of magnitude more publications, and in comparison to the country's total research effort. Furthermore, the relevant research effort appears to be dispersed geographically and in disciplinary terms. Relevant recommendations are provided with particular emphasis on the pluralistic science policy approach followed in the country. 
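The three document-similarity indices compared above can be sketched for set-valued data such as cited-reference lists. This is an illustrative sketch: the Inclusion Index is taken here as the overlap coefficient |A ∩ B| / min(|A|, |B|), which is one common definition, and the reference sets are invented.

```python
import math

def jaccard(a, b):
    """Jaccard Index: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine (Salton) Index for binary sets: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def inclusion(a, b):
    """Inclusion Index (overlap coefficient): |A ∩ B| / min(|A|, |B|),
    i.e., how fully the smaller set is contained in the larger one."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

# Hypothetical cited-reference sets of two documents
doc1 = {"r1", "r2", "r3", "r4"}
doc2 = {"r3", "r4", "r5"}
print(jaccard(doc1, doc2), cosine(doc1, doc2), inclusion(doc1, doc2))
```

Because the Inclusion Index normalizes by the smaller set only, it stays high when one document's references are largely contained in another's, even if the documents differ greatly in size, which is one reason it can behave differently from Jaccard and Cosine on citation data.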
Increasingly, funding of academic research is carried out through the support of collaboration, rather than through single awards to a sole grant holder. The practice is well supported by evidence that larger, network-based research achieves high quality while leading to a number of capacity building benefits for the research system, although with significant transaction costs. However, the question of what kind of funding schemes should be made available to researchers is not a simple dichotomy between single grant-holder projects and networks. A key question is how to achieve a balance in each subject field between different forms of funding instrument employed while ensuring different forms of funding retain a reputation for generating research of high scientific quality. This paper reports the results of a systematic comparison of the scientific quality of 1010 scientific papers from the ISI database produced under two contrasting forms of funding instrument for a single year in the Austrian science system. Comparison of the arcsinh transformed citation counts of papers from the two main forms of funding for basic science at the level of main scientific field shows there is no statistically significant difference in the quality achieved by the two forms of funding. This may suggest that funders and research performers have succeeded in ensuring that different research instruments nevertheless achieve very similar levels of scientific excellence. A common problem in comparative bibliometric studies at the meso and micro level is the differentiation and specialisation of research profiles of the objects of analysis at lower levels of aggregation. Already the institutional level requires the application of more sophisticated techniques than customary in evaluation of national research performance. 
In this study institutional profile clusters are used to examine which level of the hierarchical subject-classification should preferably be used to build subject-normalised citation indicators. It is shown that a set of properly normalised indicators can serve as a basis of comparative assessment within and even among different clusters, provided that their profiles still overlap and such comparison is thus meaningful. On the basis of 24 selected European universities, a new version of relational charts is presented for the comparative assessment of citation impact. In this article, we analyze the citations to articles published in 11 biological and medical journals from 2003 to 2007 that employ author-choice open-access models. Controlling for known explanatory predictors of citations, only 2 of the 11 journals show positive and significant open-access effects. Analyzing all journals together, we report a small but significant increase in article citations of 17%. In addition, there is strong evidence to suggest that the open-access advantage is declining by about 7% per year, from 32% in 2004 to 11% in 2007. The bibliometric measure impact factor is a leading indicator of journal influence, and impact factors are routinely used in making decisions ranging from selecting journal subscriptions to allocating research funding to deciding tenure cases. Yet journal impact factors have increased gradually over time, and moreover impact factors vary widely across academic disciplines. Here we quantify inflation over time and differences across fields in impact factor scores and determine the sources of these differences. We find that the average number of citations in reference lists has increased gradually, and this is the predominant factor responsible for the inflation of impact factor scores over time. 
Field-specific variation in the fraction of citations to literature indexed by Thomson Scientific's Journal Citation Reports is the single greatest contributor to differences among the impact factors of journals in different fields. The growth rate of the scientific literature as a whole, and cross-field differences in net size and growth rate of individual fields, have had very little influence on impact factor inflation or on cross-field differences in impact factor. As an acceptable proxy for innovative activity, patents have become increasingly important in recent years. Patents and patent citations have been used for construction of technology indicators. This article presents an alternative to other citation-based indicators, i.e., the patent h-index, which is borrowed from bibliometrics. We conduct the analysis on a sample of the world's top 20 firms ranked by total patents granted in the period 1996-2005 from the Derwent Innovations Index in the semiconductor area. We also investigate the relationships between the patent h-index and other three indicators, i.e., patent counts, citation counts, and the mean family size (MFS). The findings show that the patent h-index is indeed an effective indicator for evaluating the technological importance and quality, or impact, for an assignee. In addition, the MFS indicator correlates negatively and not significantly with the patent h-index, which indicates that the "social value" of a patent is in disagreement with its "private value." The two indicators, patent h-index and MFS, both provide an overview of the value of patents, but from two different angles. We propose a new data source (Google Scholar) and metric (Hirsch's h-index) to assess journal impact in the field of economics and business. 
A systematic comparison between the Google Scholar h-index and the ISI Journal Impact Factor for a sample of 838 journals in economics and business shows that the former provides a more accurate and comprehensive measure of journal impact. Text classifiers automatically classify documents into appropriate concepts for different applications. Most classification approaches use flat classifiers that treat each concept as independent, even when the concept space is hierarchically structured. In contrast, hierarchical text classification exploits the structural relationships between the concepts. In this article, we explore the effectiveness of hierarchical classification for a large concept hierarchy. Since the quality of the classification is dependent on the quality and quantity of the training data, we evaluate the use of documents selected from subconcepts to address the sparseness of training data for the top-level classifiers and the use of document relationships to identify the most representative training documents. By selecting training documents using structural and similarity relationships, we achieve a statistically significant improvement of 39.8% (from 54.5% to 76.2%) in the accuracy of the hierarchical classifier over that of the flat classifier for a large, three-level concept hierarchy. As the Web is used increasingly to share and disseminate information, business analysts and managers are challenged to understand stakeholder relationships. Traditional stakeholder theories and frameworks employ a manual approach to analysis and do not scale up to accommodate the rapid growth of the Web. Unfortunately, existing business intelligence (BI) tools lack analysis capability, and research on BI systems is sparse. This research proposes a framework for designing BI systems to identify and to classify stakeholders on the Web, incorporating human knowledge and machine-learned information from Web pages. 
Based on the framework, we have developed a prototype called Business Stakeholder Analyzer (BSA) that helps managers and analysts to identify and to classify their stakeholders on the Web. Results from our experiment involving algorithm comparison, feature comparison, and a user study showed that the system achieved better within-class accuracies in widespread stakeholder types such as partner/sponsor/supplier and media/reviewer, and was more efficient than human classification. The student and practitioner subjects in our user study strongly agreed that such a system would save analysts' time and help to identify and classify stakeholders. This research contributes to a better understanding of how to integrate information technology with stakeholder theory, and enriches the knowledge base of BI system design. Weblogs are gaining momentum as one of the most versatile tools for online scholarly communication. Since academic weblogs tend to be used by scholars to position themselves in a disciplinary blogging community, links are essential to their construction. The aim of this article is to analyze the reasons for linking in academic weblogs and to determine how links are used for distribution of information, collaborative construction of knowledge, and construction of the blog's and the blogger's identity. For this purpose I analyzed types of links in 15 academic blogs, considering both sidebar links and in-post links. The results show that links are strategically used by academic bloggers for several purposes, among others to seek their place in a disciplinary community, to engage in hypertext conversations for collaborative construction of knowledge, to organize information in the blog, to publicize their research, to enhance the blog's visibility, and to optimize blog entries and the blog itself. 
Although information retrieval techniques used by Web search engines have improved substantially over the years, the results of Web searches have continued to be represented in simple list-based formats. Although the list-based representation makes it easy to evaluate a single document for relevance, it does not support the users in the broader tasks of manipulating or exploring the search results as they attempt to find a collection of relevant documents. HotMap is a meta-search system that provides a compact visual representation of Web search results at two levels of detail, and it supports interactive exploration via nested sorting of Web search results based on query term frequencies. An evaluation of the search results for a set of vague queries has shown that the re-sorted search results can provide a higher portion of relevant documents among the top search results. User studies show an increase in speed and effectiveness and a reduction in missed documents when comparing HotMap to the list-based representation used by Google. Subjective measures were positive, and users showed a preference for the HotMap interface. These results provide evidence for the utility of next-generation Web search results interfaces that promote interactive search results exploration. Past research in information systems (IS) user satisfaction primarily adopted a conventional "key-driver analysis" approach assuming that independent variables symmetrically and linearly affect user satisfaction. However, recent studies suggest that relationships in IS satisfaction models are more complex. Relying solely on symmetric and linear models runs the risk of systemically misestimating the impact of independent variables on user satisfaction. Building upon previous work, we empirically tested the asymmetric and nonlinear IS user satisfaction model in the context of Internet-based portals. 
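HotMap's nested sorting of search results by query term frequencies can be approximated with a stable multi-key sort: re-sorting by each term in reverse selection order leaves the first-selected term dominant while earlier orderings break ties. This is a sketch of the idea only, not HotMap's actual implementation; the result records and field names are hypothetical.

```python
def nested_sort(results, term_order):
    """Stable nested sort of search results: sort by the last-selected
    term first, then re-sort by each earlier term, so the first term in
    `term_order` dominates the final ordering and later terms break ties."""
    ordered = list(results)
    for term in reversed(term_order):
        # Python's sort is stable, so previous orderings survive as tie-breaks
        ordered.sort(key=lambda r: r["freq"].get(term, 0), reverse=True)
    return ordered

# Hypothetical search results with per-query-term frequencies
results = [
    {"url": "a", "freq": {"museum": 3, "visitor": 1}},
    {"url": "b", "freq": {"museum": 5, "visitor": 0}},
    {"url": "c", "freq": {"museum": 3, "visitor": 4}},
]
print([r["url"] for r in nested_sort(results, ["museum", "visitor"])])
```

Here "b" ranks first on the dominant term "museum", and the tie between "a" and "c" is broken by the secondary term "visitor", illustrating how nested sorting lets users explore a result set along several query terms at once.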
Results show that negative perceived performance on three of the four information-quality attributes has a greater impact on overall satisfaction than does positive perceived performance. In addition, user satisfaction appears to display diminishing sensitivity to information quality in the domain of negative perceived performance but not in positive perceived performance. We expect that this study will generate interest in this new but important area of research. One of the most significant recent advances in health information systems has been the shift from paper to electronic documents. While research on automatic text and image processing has taken separate paths, there is a growing need for joint efforts, particularly for electronic health records and biomedical literature databases. This work aims at comparing text-based versus image-based access to multimodal medical documents using state-of-the-art methods of processing text and image components. A collection of 180 medical documents containing an image accompanied by a short text describing it was divided into training and test sets. Content-based image analysis and natural language processing techniques are applied individually and combined for multimodal document analysis. The evaluation consists of an indexing task and a retrieval task based on the "gold standard" codes manually assigned to corpus documents. The performance of text-based and image-based access, as well as combined document features, is compared. Image analysis proves more adequate for both the indexing and retrieval of the images. In the indexing task, multimodal analysis outperforms both independent image and text analysis. This experiment shows that text describing images can be usefully analyzed in the framework of a hybrid text/image retrieval system. We describe the results of an experiment designed to study user preferences for different orderings of search results from three major search engines. 
In the experiment, 65 users were asked to choose the best ordering from two different orderings of the same set of search results: Each pair consisted of the search engine's original top-10 ordering and a synthetic ordering created from the same top-10 results retrieved by the search engine. This process was repeated for 12 queries and nine different synthetic orderings. The results show that there is a slight overall preference for the search engines' original orderings, but the preference is rarely significant. Users' choice of the "best" result from each of the different orderings indicates that placement on the page (i.e., whether the result appears near the top) is the most important factor used in determining the quality of the result, not the actual content displayed in the top-10 snippets. In addition to the placement bias, we detected a small bias due to the reputation of the sites appearing in the search results. Access to previous results is of paramount importance in the scientific process. Recent progress in information management focuses on building e-infrastructures for the optimization of the research workflow, through both policy-driven and user-pulled dynamics. For decades, High Energy Physics (HEP) has pioneered innovative solutions in the field of information management and dissemination. In light of a transforming information environment, it is important to assess the current usage of information resources by researchers and HEP provides a unique test bed for this assessment. A survey of about 10% of practitioners in the field reveals usage trends and information needs. Community-based services, such as the pioneering arXiv and SPIRES systems, largely answer the need of the scientists, with a limited but increasing fraction of younger users relying on Google. Commercial services offered by publishers or database vendors are essentially unused in the field. 
The survey offers an insight into the most important features that users require to optimize their research workflow. These results inform the future evolution of information management in HEP and, as these researchers are traditionally "early adopters" of innovation in scholarly communication, can inspire developments of disciplinary repositories serving other communities. The Journal Impact Factor (JIF) published by Thomson Reuters is often used to evaluate the significance and performance of scientific journals. Besides methodological problems with the JIF, the critical issue is whether a single measure is sufficient for characterizing the impact of journals, particularly the impact of multidisciplinary and wide-scope journals that publish articles in a broad range of research fields. Taking Angewandte Chemie International Edition and the Journal of the American Chemical Society as examples, we examined the two journals' publication and impact profiles across the sections of Chemical Abstracts and compared the results with the JIF. The analysis was based primarily on Communications published in Angewandte Chemie International Edition and the Journal of the American Chemical Society during 2001 to 2005. The findings show that the information available in the Science Citation Index is a rather unreliable indication of the document type and is therefore inappropriate for comparative analysis. The findings further suggest that the composition of the journal in terms of contribution types, the length of the citation window, and the thematic focus of the journal in terms of the sections of Chemical Abstracts has a significant influence on the overall journal citation impact. Therefore, a single measure of journal citation impact such as the JIF is insufficient for characterizing the significance and performance of wide-scope journals. 
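For reference, the two-year JIF discussed above divides the citations a journal receives in a given year by the number of citable items it published in the two preceding years. A minimal sketch, with invented citation and publication counts:

```python
# Sketch of the standard two-year Journal Impact Factor computation.
# The citation and publication counts below are invented for illustration.
def journal_impact_factor(citations_to_year, items_in_year, year):
    """JIF for `year`: citations received in `year` to items published in the
    two preceding years, divided by the number of those items."""
    cites = citations_to_year[year - 1] + citations_to_year[year - 2]
    items = items_in_year[year - 1] + items_in_year[year - 2]
    return cites / items

citations_to_year = {2003: 400, 2004: 600}   # citations received in 2005
items_in_year = {2003: 300, 2004: 200}       # citable items published
print(journal_impact_factor(citations_to_year, items_in_year, 2005))  # → 2.0
```

Collapsing a journal's citation distribution to this single ratio is precisely what the profile-based critique above objects to.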
For the comparison of journals, more sophisticated methods such as publication and impact profiles across subject headings of bibliographic databases (e.g., the sections of Chemical Abstracts) are valuable. Prior research has identified specific factors that hinder growth of teledensity in developing countries and specific strategies used to overcome such limitations both in Latin America and in Sub-Saharan Africa. Prior research also has reported on the perceptions that telecommunications stakeholders have on how various strategies can inform and assist in the enhancement of teledensity in each of the two continental regions. This study fills a gap in the literature by investigating similarities and differences in the telecommunication stakeholders' perspectives of specific strategies used to address teledensity limitations in Latin America as well as in Sub-Saharan Africa. Independent samples of survey participants (Latin America's and Sub-Saharan Africa's telecommunications stakeholders) analyzed the strategies. Using appropriate statistical procedures, we examined these stakeholders' perceptions to find areas of commonality and difference in their respective perspectives on the effectiveness of selected strategies. Qualitative comments to support the stakeholders' responses are reported, together with future research implications. Drawing on network theory, this study considers the content of U.S. presidential debates and how candidates' language differentiates them. Semantic network analyses of all U.S. presidential debates (1960-2004) were conducted. Results reveal that regardless of party affiliation, election winners were more central in their semantic networks than losers. Although the study does not argue causation between debating and electoral outcomes, results show a consistent pattern: Candidates who develop coherent, central, semantically structured messages in debates seem to be victorious on election day. 
An argument is made for employing semantic networks in studying debates and political discourse. In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world. Published by Elsevier Ltd. This paper explores the use of Library Catalog Analysis (LCA), defined as the application of bibliometric or informetric techniques to a set of library online catalogs, to describe quantitatively a scientific-scholarly field on the basis of published book titles. It focuses on its value as a tool in studies of Social Sciences and Humanities, especially its cognitive structures, main book publishers and the research performance of its actors. The paper proposes an analogy model between traditional citation analysis of journal articles and Library Catalog Analysis of book titles. It presents the outcomes of an exploratory study of book titles in Economics included in 42 academic library catalogs from 7 countries. It describes the process of data collection and cleaning, and applies a series of indicators and thematic mapping techniques. It illustrates how LCA can be fruitfully used to assess book production and research performance at the level of an individual researcher, a research department, an entire country and a book publisher. It discusses a number of issues that should be addressed in follow-up studies and concludes that LCA of published book titles can be developed into a powerful and useful tool in studies of Social Sciences and Humanities. (C) 2008 Elsevier Ltd. All rights reserved. 
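A centrality measure of the kind used in the presidential-debate study above can be approximated very simply: words that co-occur in the same utterance are linked, and a word's centrality is the share of other words it is linked to. The toy utterances below are invented and merely stand in for debate transcripts:

```python
from collections import defaultdict
from itertools import combinations

# Invented stand-in utterances; real input would be debate transcript text.
utterances = [
    "economy jobs growth",
    "economy taxes",
    "jobs healthcare",
    "economy healthcare",
]

# Link every pair of words that co-occur within one utterance.
edges = defaultdict(set)
for utt in utterances:
    for a, b in combinations(sorted(set(utt.split())), 2):
        edges[a].add(b)
        edges[b].add(a)

# Degree centrality: fraction of the other words a word is linked to.
n = len(edges)
centrality = {w: len(nb) / (n - 1) for w, nb in edges.items()}
most_central = max(centrality, key=centrality.get)
print(most_central)  # → economy
```

A candidate whose key terms sit at the centre of such a network, in the study's terms, has a more coherent semantic structure.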
Examining a comprehensive set of papers (n = 1837) that were accepted for publication by the journal Angewandte Chemie International Edition (one of the prime chemistry journals in the world) or rejected by the journal but then published elsewhere, this study tested the extent to which the use of the freely available database Google Scholar (GS) can be expected to yield valid citation counts in the field of chemistry. Analyses of citations for the set of papers returned by three fee-based databases - Science Citation Index, Scopus, and Chemical Abstracts - were compared to the analysis of citations found using GS data. Whereas the analyses using citations returned by the three fee-based databases show very similar results, the results of the analysis using GS citation data differed greatly from the findings using citations from the fee-based databases. Our study therefore supports, on the one hand, the convergent validity of citation analyses based on data from the fee-based databases and, on the other hand, the lack of convergent validity of the citation analysis based on the GS data. The aim of the paper is to investigate the use of online data and time series analysis, in order to study the dynamics of new types of research collaboration in a systematic way. Two international research teams were studied for more than 3 years, and quantitative data about their internet use together with observation of their collaboration patterns were gathered. Time series analysis (ARIMA modelling) was performed on their use of internet, and specific types of models related to specific ways of conducting research at a distance. The paper proposes the use of online data and ARIMA models to identify the stabilisation of a complex system, such as a research team, and investigate everyday research practices. 
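ARIMA modelling of the kind used in that collaboration study rests on fitting autoregressive structure to a time series. The simplest case, an AR(1) model fit by ordinary least squares, can be sketched in a few lines; the series below is synthetic, not the internet-use data from the study:

```python
# Sketch of fitting the simplest autoregressive model, AR(1):
# x[t] = c + phi * x[t-1] + noise, estimated by least squares on lagged pairs.
def fit_ar1(series):
    """Estimate x[t] = c + phi * x[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    return my - phi * mx, phi   # intercept c, coefficient phi

# A noise-free series obeying x[t] = 1.0 + 0.5 * x[t-1]; the fit should
# recover c = 1.0 and phi = 0.5 almost exactly.
series = [4.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])
c, phi = fit_ar1(series)
```

Full ARIMA adds differencing and moving-average terms on top of this autoregressive core, but the estimation idea is the same.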
This paper treats document-document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally. We compare text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches and bibliographic coupling. Forty-three articles, published in the journal Information Retrieval, are used as test documents. We investigate how well the approaches agree with a ground truth subject classification of the test documents, when the complete linkage method is used, and under two types of similarities, first-order and second-order. The results show that it is possible to achieve a very good approximation of the classification by means of automatic grouping of articles. One text-only method and one combination method, under second-order similarities in both cases, give rise to cluster solutions that to a large extent agree with the classification. The definitions of the rational and real-valued variants of the h-index and g-index are reviewed. It is shown how they can be obtained both graphically and by calculation. Formulae are derived expressing the exact relations between the h-variants and between the g-variants. Subsequently these relations are examined. In a citation context the real h-index is often, but not always, smaller than the rational h-index. It is also shown that the relation between the real and the rational g-index depends on the number of citations of the article ranked g + 1. Maximum differences between h, h_r and h_rat on the one hand and between g, g_r and g_rat on the other are determined. In this paper a machine learning approach for classifying Arabic text documents is presented. 
To handle the high dimensionality of text documents, embeddings are used to map each document (instance) into R (the set of real numbers) representing the tri-gram frequency statistics profiles for a document. Classification is achieved by computing a dissimilarity measure, called the Manhattan distance, between the profile of the instance to be classified and the profiles of all the instances in the training set. The class (category) to which an instance (document) belongs is the one with the least computed Manhattan measure. The Dice similarity measure is used to compare the performance of the method. Results show that tri-gram text classification using the Dice measure outperforms classification using the Manhattan measure. Ranking is a crucial task in web information retrieval systems. The dynamic nature of information resources, as well as continuous changes in the information demands of users, has made it very difficult to provide effective methods for data mining and document ranking. To address these challenges, this paper proposes an adaptive ranking algorithm named GPRank. This algorithm, a function discovery framework, utilizes the relatively simple features of web documents to provide suitable rankings using a multi-layer/multi-population genetic programming architecture. Experiments illustrate that GPRank performs better than both well-known ranking techniques and its full-mode edition. One way to achieve international patent protection is to file patents via the Patent Cooperation Treaty (PCT). The application process therein can be divided into two phases, those represented by chapters I and II of the PCT. According to the literature, patent applications filed via chapter II of the Treaty tend to be more valuable. 
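The tri-gram profile classification described above can be sketched as follows. The two-class toy corpus is invented (and in English, standing in for Arabic), so only the mechanics of the Manhattan-distance comparison are illustrated:

```python
from collections import Counter

# Sketch of nearest-profile classification with character tri-grams and the
# Manhattan distance. Training texts and classes are invented toy data.
def trigram_profile(text):
    """Character tri-gram frequency profile of a document."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def manhattan(p, q):
    """Sum of absolute frequency differences over all tri-grams seen."""
    return sum(abs(p[k] - q[k]) for k in set(p) | set(q))

training = {
    "sports": trigram_profile("the team won the match and the league"),
    "finance": trigram_profile("the bank raised rates and the markets fell"),
}

def classify(text):
    """Assign the class whose training profile is nearest to the document."""
    profile = trigram_profile(text)
    return min(training, key=lambda c: manhattan(profile, training[c]))

print(classify("the team lost the match"))  # → sports
```

The Dice comparison in the paper replaces the distance with a similarity and takes the maximum instead of the minimum; the surrounding machinery is unchanged.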
The results presented in this paper suggest that in general this assumption is not justified. The analyses further revealed that for practitioners seeking fast patent protection at the European Patent Office (EPO) via the PCT, the choice should be chapter II of the PCT, with the EPO as preliminary examination authority. This paper presents the results of an evaluation of the national research system in Morocco. The exercise focuses on the period 1997-2006 and includes a comparison with South Africa, Egypt, Nigeria, Tunisia, Algeria, Portugal and Greece. Ratings of highly ranked researchers are developed on the basis of their number of publications, number of citations and also their 'h-index' (or Hirsch index). Finally, we examine the empirical model set by Glanzel that related the h-index to the number of publications and the mean citation rate per paper for these 'upper-class' researchers. The use of this model confirms that the h-index is likely to reflect the importance and the quality of the scientific output of a given researcher. Citation analysis for evaluative purposes requires reference standards, as publication activity and citation habits differ considerably among fields. Reference standards based on journal classification schemes are fraught with problems in the case of multidisciplinary and general journals and are limited with respect to their resolution of fields. To overcome these shortcomings of journal classification schemes, we propose a new reference standard for chemistry and related fields that is based on the sections of the Chemical Abstracts database. We determined the values of the reference standard for research articles published in 2000 in the biochemistry sections of Chemical Abstracts as an example. The results show that citation habits vary extensively not only between fields but also within fields. 
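The h-index used to rate researchers in the Morocco evaluation above is straightforward to compute from a citation list: h is the largest number such that h of the researcher's papers have at least h citations each. A minimal sketch with invented citation counts:

```python
# Sketch of the Hirsch index computation; the citation counts are invented.
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank          # this paper still clears the threshold
        else:
            break             # further papers are cited even less
    return h

print(h_index([10, 8, 5, 4, 3]))  # → 4
```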
Overall, the sections of Chemical Abstracts seem to be a promising basis for reference standards in chemistry and related fields for four reasons: (1) the wider coverage of the pertinent literature, (2) the quality of indexing, (3) the assignment of papers published in multidisciplinary and general journals to their respective fields, and (4) the resolution of fields on a lower level (e.g. mammalian biochemistry) than in journal classification schemes (e.g. biochemistry & molecular biology). As the major concerns of the university are teaching and research, this paper describes the study of a nation-wide evaluation of research performance in management for 168 universities in Taiwan. In addition to the popular indicators of SCI/SSCI journal publications and citations, the number of projects funded by the National Science Council of Taiwan was used to account for the special characteristic of the field of management. The evaluation was based on individual professors rather than management programs, so that all types of universities, including those without management departments, could be compared. Performances of each university in those three indicators were aggregated by a set of a posteriori weights which were most favourable to all universities in calculating the aggregated score. The results show that public universities, in general, performed better than private ones. Universities with specific missions had comparable performance to general comprehensive ones. Analyses from a set of a priori weights solicited from experts showed that the results of this study are robust to the indicators selected and the weights used. By employing the Pearson correlation and Fisher- and t-tests, the present study analyzes and compares scientometric data including number of source items, number of citations, impact factor, immediacy index, citing half-life and cited half-life, for essential journals in physics, chemistry and engineering, from SCI JCR on the Web 2002. 
The results of the study reveal that for all the scientometric indicators, except the cited half-life, there is no significant mean difference between physics and chemistry subjects indicating similar citation behavior among the scientists. There is no significant mean difference in the citing half-life among the three subjects. Significant mean difference is generally observed for most of the scientometric indicators between engineering and physics (or chemistry) demonstrating the difference in citation behavior among engineering researchers and scientists in physics or chemistry. Significant correlations among number of source items, number of citations, impact factor, and immediacy index and between cited half-life and citing half-life generally prevail for each of the three subjects. On the contrary, in general, there is no significant correlation between the cited half-life and other scientometric indicators. The three subjects present the same strength of the correlations between number of source items and number of citations, between number of citations and impact factor, and between cited half-life and citing half-life. The paper reviews the present status of Indian physics research, in particular its nature of research system, nature of institutions involved, type of education offered and outturn at postgraduate and Ph.D level, the extent to which extra-mural funding support is available from various governmental R&D agencies, and the nature of professional organizations involved. The study is based on analysis of Indian physics output, as indexed in Expanded Science Citation Index (Web of Science) during 1993-2001. The study also discusses various features of Indian physics research such as its growth in terms of research papers, institutional publication productivity, nature of collaboration, and the quality and impact of its research output. 
In restructuring environmental research organisations, smaller sites generally disappear and larger sites are created. These decisions are based on the economic principle, 'economies of scale', whereby the average cost of each unit produced falls as output increases. We show that this principle does not apply to the scientific performance of environmental research institutes, as productivity per scientist decreased with increasing size of a research site. The results are best explained by the principle 'diseconomies of scale', whereby powerful social factors limit the productivity of larger groupings. These findings should be considered when restructuring environmental science organisations to maximise their quality. We analyse the co-authorship networks of researchers affiliated at universities in Turkey by using two databases: the international SSCI database and the Turkish ULAKBIM database. We find that co-authorship networks are composed largely of isolated groups and there is little intersection between the two databases, permitting little knowledge diffusion. There seems to be two disparate populations of researchers. While some scholars publish mostly in the international journals, others target the national audience, and there is very little intersection between the two populations. The same observation is valid for universities, among which there is very little collaboration. Our results point out that while Turkish social sciences and humanities publications have been growing impressively in the last decade, domestic networks to ensure the dissemination of knowledge and of research output are very weak and should be supported by domestic policies. The relative occurrence of the words "surprising" and "unexpected" in the titles of scientific papers was 11 times more common in 2001-2005 than in 1900-1955. However, papers which had titles containing one of these words did not receive enhanced numbers of citations. 
Both words (as well as "unusual" and "unfortunately") are used significantly more frequently in science than in social sciences and humanities. The distribution of the statements of surprise is not random in scientific literature (chemistry journals ranked highest in the number of papers claiming "surprising" or "unexpected" results) and may reflect the level of maturity of a discipline. In recent studies the issue of the relatedness between journal impact factors and other measures of journal impact have been raised and discussed from both merely empirical and theoretical perspectives. Models of the underlying citation processes suggest distributions with two or more free parameters. Proceeding from the relation between the journals' mean citation rate and uncitedness and the assumption of an underlying Generalised Waring Distribution (GWD) model, it is found that the journal impact factor alone does not sufficiently describe a journal's citation impact, while a two-parameter solution appropriately reflects its main characteristics. For the analysis of highly cited publications an additional model derived from the same GWD is suggested. This approach results in robust, comprehensible and interpretable solutions that can readily be applied in evaluative bibliometrics. Hirsch-type indices are devised for characterizing networks and network elements. Their actual use is demonstrated on scientometric examples, and the potential value of the concept on a practically unlimited range of networks is suggested. Social network sites like MySpace are increasingly important environments for expressing and maintaining interpersonal connections, but does online communication exacerbate or ameliorate the known tendency for offline friendships to form between similar people (homophily)? 
This article reports an exploratory study of the similarity between the reported attributes of pairs of active MySpace Friends based upon a systematic sample of 2,567 members joining on June 18, 2007 and Friends who commented on their profile. The results showed no evidence of gender homophily but significant evidence of homophily for ethnicity, religion, age, country, marital status, attitude towards children, sexual orientation, and reason for joining MySpace. There were also some imbalances: women and the young were disproportionately commenters, and commenters tended to have more Friends than commentees. Overall, it seems that although traditional sources of homophily are thriving in MySpace networks of active public connections, gender homophily has completely disappeared. Finally, the method used has wide potential for investigating and partially tracking homophily in society, providing early warning of socially divisive trends. The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations are not possible to prove, we study these measures on the "trajectories" of the form ||X|| = a ||Y||, where a > 0 is a constant and ||·|| denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, thereby explaining a curve experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality in the case a = 1. Hence, for equally normed vectors (e.g., for normalized vectors) we, essentially, only have Jaccard's measure and Salton's cosine measure since all the other measures are equal to the latter. 
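The relations above are easy to check numerically. Below is a small sketch (with invented vectors) of cosine, Dice, and the generalized (Tanimoto) Jaccard measure for real-valued vectors; for two equally normed vectors (the case a = 1), Dice coincides with the cosine while Jaccard stays below it:

```python
import math

# Vector similarity measures; the generalized Jaccard (Tanimoto) form is
# used for real-valued vectors. The example vectors are invented.
def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def dice(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return 2 * dot / (sum(a * a for a in x) + sum(b * b for b in y))

def jaccard(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

x, y = (1.0, 0.0), (0.6, 0.8)   # both have Euclidean norm 1
print(cosine(x, y), dice(x, y), jaccard(x, y))
```

Here cosine and Dice both evaluate to 0.6 while Jaccard gives 0.6/1.4 ≈ 0.43, illustrating the strict inequality proved in the article.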
We use a technique recently developed by V. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre (2008) to detect scientific specialties from author cocitation networks. This algorithm has distinct advantages over most previous methods used to obtain cocitation "clusters" since it avoids the use of similarity measures, relies entirely on the topology of the weighted network, and can be applied to relatively large networks. Most importantly, it requires no subjective interpretation of the cocitation data or of the communities found. Using two examples, we show that the resulting specialties are the smallest coherent "groups" of researchers (within a hierarchy of cluster sizes) and can thus be identified unambiguously. Furthermore, we confirm that these communities are indeed representative of what we know about the structure of a given scientific discipline and that as specialties, they can be accurately characterized by a few keywords (from the publication titles). We argue that this robust and efficient algorithm is particularly well-suited to cocitation networks and that the results generated can be of great use to researchers studying various facets of the structure and evolution of science. The impact of frame semantic enrichment of texts on the task of factoid question answering (QA) is studied in this paper. In particular, we consider different techniques for answer processing with frame semantics: the level of semantic class identification and role assignment to texts, and the fusion of frame semantic-based answer-processing approaches with other methods used in the Text REtrieval Conference (TREC). The impact of each of these aspects on the overall performance of a QA system is analyzed in this paper. The TREC 2004 and TREC 2006 factoid question sets were used for the experiments. 
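The community-detection algorithm used in the cocitation study above works by greedily maximizing the modularity Q of the weighted network. A minimal sketch of computing Q for a given partition; the toy cocitation weights and the two-community split are invented:

```python
from collections import defaultdict

# Invented toy cocitation network: edge weights stand in for cocitation
# counts, and `community` is a hypothetical two-community partition.
edges = {("a", "b"): 5.0, ("b", "c"): 4.0, ("c", "a"): 3.0,
         ("d", "e"): 6.0, ("c", "d"): 1.0}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

degree = defaultdict(float)          # weighted degree of each node
for (u, v), w in edges.items():
    degree[u] += w
    degree[v] += w
m = sum(edges.values())              # total edge weight

w_in = defaultdict(float)            # intra-community edge weight
for (u, v), w in edges.items():
    if community[u] == community[v]:
        w_in[community[u]] += w
d_tot = defaultdict(float)           # total degree per community
for node, com in community.items():
    d_tot[com] += degree[node]

# Newman's weighted modularity: Q = sum_c [ W_in_c / m - (D_c / 2m)^2 ]
Q = sum(w_in[c] / m - (d_tot[c] / (2 * m)) ** 2 for c in d_tot)
print(round(Q, 4))
```

The Blondel et al. method repeatedly moves nodes between communities and aggregates communities into super-nodes whenever a move increases this Q, which is why it needs only the weighted topology and no similarity measure.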
These demonstrate that the exploitation of encapsulated frame semantics in FrameNet in a shallow semantic parsing process can enhance answer-processing performance in factoid QA systems. This improvement is dependent on the level of semantic annotation, the frame semantic alignment method, and the method of fusing frame semantic-based answer-processing models with other existing models. A more comprehensively annotated environment with all different part-of-speech target predicates provides a higher chance of correct factoid answer retrieval where semantic alignment is based on both semantic classes and a relaxed set of semantic roles for answer span identification. Our experiments on fusion techniques of frame semantic-based and entity-based answer-processing models show that merging answer lists with respect to their scores and redundancy by exploiting a fusion function leads to a more effective overall factoid QA system compared to the use of individual models. A significant fraction of queries in PubMed are multiterm queries without parsing instructions. Generally, search engines interpret such queries as collections of terms, and handle them as a Boolean conjunction of these terms. However, analysis of queries in PubMed indicates that many such queries are meaningful phrases, rather than simple collections of terms. In this study, we examine whether or not it makes a difference, in terms of retrieval quality, if such queries are interpreted as a phrase or as a conjunction of query terms. If it does, what is the optimal way of searching with such queries? To address the question, we developed an automated retrieval evaluation method, based on machine learning techniques, that enables us to evaluate and compare various retrieval outcomes. We show that the class of records that contain all the search terms, but not the phrase, qualitatively differs from the class of records containing the phrase. 
We also show that the difference is systematic, depending on the proximity of query terms to each other within the record. Based on these results, one can establish the best retrieval order for the records. Our findings are consistent with studies in proximity searching. To provide a basis for making predictions of the characteristics of search task (ST), based on work task (WT), and to explore the nature of WT and ST, this study examines the relationships between WT and ST (interrelationships) and the relationships between the different facets of both WT and ST (intra-relationships), respectively. A faceted classification of task was used to conceptualize work task and search task. Twenty-four pairs of work tasks and their associated search tasks were collected, by semistructured interviews, and classified based on the classification. The results indicate that work task shapes different facets or sub-facets of its associated search tasks to different degrees. Several sub-facets of search task, such as Time (Length), Objective task complexity, and Subjective task complexity, are most strongly affected by work task. The results demonstrate that it is necessary to consider difficulty and complexity as different constructs when investigating their influence on information search behavior. The exploration of intra-relationships illustrates the difference of work task and search task in their nature. The findings provide empirical evidence to support the view that work task and search task are multi-faceted variables and their different effects on users' information search behavior should be examined. The purpose of this article is to analyze the acknowledgment (ACK) paratext of medical research articles written in English and Spanish in three geographical contexts: Venezuela, Spain, and the United States of America. We thus randomly selected 150 research articles from leading medical journals in each country. 
The frequency and length of ACKs, the number of named and unnamed acknowledgees, the reasons why they were acknowledged, the number of grants received, and the sources of funding were recorded. The motivations that underpinned each ACK were classified according to B. Cronin's (1995) and C.L. Giles & I.G. Councill's (2004) typology. Results were analyzed by means of chi-square tests. Our results show that ACKs from the English-language corpus are significantly more frequent and longer than those from both the Spanish and Venezuelan samples. The number of persons acknowledged and the number of grants received also were significantly greater in the U.S. sample than they were in the two Spanish-language corpora. Differences were found in the number and types of funding sources. Moreover, in the three corpora, technical/instrumental assistance was more frequently acknowledged than was peers' ideational input. A small-scale ethnographic research study was conducted with Spanish and Venezuelan researchers to get firsthand feedback on the motivations that could lie behind their ACK behavior. We conclude that "backstage solidarity" (E. Goffman, 1959, p. 177; also see B. Cronin & S. Franks, 2006, p. 1915) significantly differs from one context to another, and that the communicative and sociocultural conventions of academic contributorship are not only discipline-dependent but also language- and context-dependent. In most disciplines of scholarly endeavor, there are many efforts at ranking research journals. There are two common methods for such efforts. One of these is based on tabulations of opinions offered by persons having some kind of relationship with the discipline. The other is based on analyses of the extent to which a journal's articles have been cited by papers appearing in some selected set of publications. In either case, construction of a journal ranking for a discipline makes no effort to distinguish between private and public universities. 
That is, data are aggregated from both private and public faculty researchers. It is thus assumed that the resultant ranking is applicable for both kinds of institutions. But, is this assumption reasonable? The answer is very important because these rankings are applied in the evaluation of promotion, tenure, and merit cases of faculty members working in a discipline. Here, we examine this widespread bibliometric assumption through the use of a ranking methodology that is based on the actual publishing behaviors of tenured researchers in a discipline. The method is used to study the behaviors of researchers at leading private universities versus those at leading public universities. Illustrating this approach within the information systems discipline, we find that there are indeed different publication patterns for the private versus public institutions. This finding suggests that journal-ranking exercises should not ignore private-public distinctions and that care should be taken to avoid evaluation standards that confound private and public rankings of journals. The decomposition of scientific literature into disciplinary and subdisciplinary structures is one of the core goals of scientometrics. How can we achieve a good decomposition? The ISI subject categories classify journals included in the Science Citation Index (SCI). The aggregated journal-journal citation matrix contained in the Journal Citation Reports can be aggregated on the basis of these categories. This leads to an asymmetrical matrix (citing versus cited) that is much more densely populated than the underlying matrix at the journal level. Exploratory factor analysis of the matrix of subject categories suggests a 14-factor solution. This solution could be interpreted as the disciplinary structure of science. The nested maps of science (corresponding to 14 factors, 172 categories, and 6,164 journals) are online at http://www.leydesdortf.net/map06. 
Presumably, inaccuracies in the attribution of journals to the ISI subject categories average out so that the factor analysis reveals the main structures. The mapping of science could, therefore, be comprehensive and reliable on a large scale albeit imprecise in terms of the attribution of journals to the ISI subject categories. In this paper, a new method for fuzzy clustering is proposed that combines generative topographic mapping (GTM) and Fuzzy c-means (FCM) clustering. GTM is used to generate latent variables and their posterior probabilities. These two provide the distribution of the input data in the latent space. FCM determines the seeds of clusters, as well as the resultant clusters and the corresponding membership functions of the input data, based on the latent variables obtained from GTM. Experiments are conducted to compare the results obtained using FCM and the Gustafson-Kessel (GK) algorithm with the proposed method in terms of four cluster-validity indexes. Using simulated and benchmark data sets, it is observed that the hybrid method (GTMFCM) performs better than FCM and GK algorithms in terms of these indexes. It is also found that the superiority of GTMFCM over FCM and GK algorithms becomes more pronounced with the increase in the dimensionality of the input data set. The increasing popularity of e-learning has created a need for accurate student achievement prediction mechanisms, allowing instructors to improve the efficiency of their courses by addressing specific needs of their students at an early stage. In this paper, a student achievement prediction method applied to a 10-week introductory level e-learning course is presented. The proposed method uses multiple feed-forward neural networks to dynamically predict students' final achievement and to cluster them in two virtual groups, according to their performance. Multiple-choice test grades were used as the input data set of the networks. This form of test was preferred for its objectivity. 
Results showed that accurate prediction is possible at an early stage, more specifically at the third week of the 10-week course. In addition, when students were clustered, low misplacement rates demonstrated the adequacy of the approach. The results of the proposed method were compared against those of linear regression, and the neural-network approach was found to be more effective in all prediction stages. The proposed methodology is expected to support instructors in providing better educational services as well as customized assistance according to students' predicted level of performance. We contribute to the understanding of how technologies may be perceived to be part of technology clusters. The value added of the paper is both at a theoretical and an empirical level. We add to the theoretical understanding of technology clusters by distinguishing between clusters in perceptions and clusters in ownership, and by proposing a mechanism to explain the existence of clusters. Our empirical analysis combines qualitative and quantitative methods to investigate clusters of consumer electronics for a sample of Dutch consumers. We find that perceived clusters in consumer electronics are mostly determined by functional linkages, and that perceived technology clusters are good predictors of ownership clusters, but only for less widely diffused products. E-mail messages are unquestionably one of the most popular communication media these days. Not only are they fast and reliable, but they are also generally free. Unfortunately, a significant number of e-mail messages received by e-mail users on a daily basis are spam. This is annoying, since spam messages waste the user's time in reviewing and deleting them. In addition, spam messages consume resources such as storage, bandwidth, and computer-processing time. Many attempts have been made in the past to eradicate spam; however, none has proven highly effective. 
In this article, we propose a spam e-mail detection approach, called SpamED, which uses the similarity of phrases in messages to detect spam. The experiments conducted not only verify that SpamED using trigrams in e-mail messages is capable of minimizing false positives and false negatives in spam detection, but also show that it outperforms a number of existing e-mail filtering approaches with a 96% accuracy rate. Temporal growth of the h-index in a diachronous cumulative time series is predicted to be linear by Hirsch (2005), whereas other models predict a concave increase. Actual data generally yield linear or S-shaped growth. We study the h-index's growth in computer simulations of the publication-citation process. In most simulations the h-index grows linearly in time. Only occasionally does an S-shape occur, while in our simulations a concave increase is very rare. The latter is often signalled by the occurrence of plateaus, that is, periods of h-index stagnation. Several parameters and their influence on the h-index's growth are determined and discussed. Caution is urged over the adoption of dynamic h-type indexes as advocated by Rousseau and Ye (2008). It is shown that the dynamics are critically dependent upon model assumptions and that practical interpretation might therefore be problematic. However, interesting questions regarding the interrelations between various h-type indexes are raised. The Matthew effect holds that recognition is bestowed on researchers of already high repute. If recognition is measured by citations, this means that often-cited papers or authors are cited more often. I use the statistical theory of the growth of firms to test whether the fame of papers and authors indeed exhibits increasing returns to scale, and confirm this hypothesis for the 100 most prolific economists. We set out to analyse and quantify the papers published (for an international readership) by Spanish universities in the field of Legal and Forensic Medicine. 
For this, we used the MEDLINE database to analyse research articles in which a Spanish university teacher (whose sole employment was with a university, as registered by the Ministry of Education in July 2005; n = 67) appeared as author or co-author in this field. The years covered are 1952 (the first year that a Spanish author appears for an article on Legal and Forensic Medicine in this service) to July 2005. A total of 770 articles were counted; productivity in this area increased substantially from the 1980s onwards. English-language scientific journals were the preferred channel of communication. Slightly more than 85% of the works can be classified into four themes, of which Genetics is the most prolific. The number of papers published in English journals represented 84% of the total, and only 13% were published in Spanish journals. There was a close relationship between growth in the authority index and inter-institutional co-operation, which boosted the production of articles. When at least one author of a published paper was a Spanish university teacher, the research was led by a university in 62.4% of cases, and of these, 85.6% were Spanish universities. This paper surveys 56 internationally renowned OR journals published in 1996-2005 with regard to authorship. Our findings show that the USA was the country that contributed the largest amount, approximately one-third, of research results to OR journals. Authors tend to publish papers in their home-country journals. Journal of the Operations Research Society of Japan has the highest author concentration, with more than 85% of the authors from Japan, and European Journal of Operational Research, on the contrary, has the widest country spread of its authors. The entropy measure provides a whole picture of the share of all countries, based on which the editorial policy of a journal can be adjusted. 
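The entropy measure of country spread can be sketched in a few lines. A minimal sketch, assuming Shannon entropy over author-country shares as the spread measure (the journal profiles and author counts below are hypothetical):

```python
import math

def country_entropy(counts):
    """Shannon entropy (in bits) of a journal's author-country distribution.
    Higher values indicate a wider country spread; 0 means a single country."""
    total = sum(counts.values())
    shares = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in shares)

# Hypothetical author counts per country for two journals
concentrated = {"Japan": 85, "USA": 10, "Other": 5}           # high concentration
spread = {"USA": 30, "UK": 25, "Germany": 25, "France": 20}   # wide country spread

print(round(country_entropy(concentrated), 3))
print(round(country_entropy(spread), 3))
```

A journal dominated by one country scores near 0, while an even split across k countries scores log2(k), which is how the measure summarizes the share of all countries at once.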
Most studies of scholarly influence within disciplines using citation data do not investigate the extent of an individual's influence; does it extend over a number of years with a sequence of publications, or is it confined to a short period and a small number of publications? Using bibliographic data from a series of quadrennial reports into developments in UK geography, this paper finds that few authors are cited on more than one occasion. This study used the citation analysis method to identify the 40 classics published in the Journal of the American Society for Information Science and Technology from 1956 to 2007. The year and subject distributions of these classic references reflect the history and the current status of information science. In this paper we examine various aspects of the scientific collaboration between Europe and Israel, and show that the traditional collaboration pattern of Israel (a preference for collaboration with the US) is changing, and collaboration with the EU countries is growing. Median age difference (D) is obtained by subtracting the median value of the age distribution of references of a scientific paper from the citing half-life of the journal that published it. Such an indicator can be related to the state of knowledge of research groups and shows some interesting properties: 1) it should be related to the incorporation of information pieces in an informal way, say the rate of self-citations; 2) it should follow the natural tendency of groups towards a progressively updated state of knowledge; and 3) more productive groups will tend to use more recent information. These natural hypotheses have been investigated using a medium-sized Spanish institution devoted to Food Research as a case study. Scientific output comprised 439 papers published in SCI journals between 1999 and 2004 by 16 research teams. Their 14,617 references were analyzed. 
Variables studied were the number of published papers by each team, number of authors per paper, number of references per paper, type of documents cited, self-citation rate, and chronological range of reference lists. The number of authors per paper ranged between 1 and 15. The most frequent value (N = 128) was 3 authors. The average number of authors per paper is 4.03 (SD = 1.74). The mean number of references per paper (including review papers) is 33.3 (SD = 17.39), with slight differences between the groups. The mean self-citation rate was 13.72% (SD = 11.7). The greatest chronological range was 119 years; the median range was 30 years, and the general mean for this variable was 33.34 years (SD = 16.34). D values were associated with self-citation rate, and a negative relationship between D and the chronological range of references was also found. Nevertheless, correlation figures were too small to reach sound conclusions about the effect of these variables. Number of references per paper, number of contributing authors, and number of papers published by each team were not associated with D. D values can discriminate between groups managing updated information and research teams working with delayed information. Publication delay affects D figures. Discontinuity of research lines, heterogeneity of research fields, and the short time lapse studied could have some influence on the results of the study. It is suggested that greater coverage is needed to evaluate properly D figures as indicators of the information update of research groups. We have developed a method to obtain robust quantitative bibliometric indicators for several thousand scientists. This allows us to study the dependence of bibliometric indicators (such as number of publications, number of citations, Hirsch index...) on the age, position, etc. of CNRS scientists. Our data suggest that the normalized h-index (h divided by the career length) is not constant for scientists with the same productivity but different ages. 
We also compare the predictions of several bibliometric indicators on the promotions of about 600 CNRS researchers. Contrary to previous publications, our study encompasses most disciplines, and shows that no single indicator is the best predictor for all disciplines. Overall, however, the Hirsch index h provides the least bad correlations, followed by the number of papers published. It is important to realize, however, that even h is able to recover only half of the actual promotions. The number of citations or the mean number of citations per paper are definitely not good predictors of promotion. To discover the current situation and characteristics of web reference accessibility, the present study examined the accessibility of 1,637 web references in two key Chinese academic journals published from 1999 to 2003. The author develops linear regression models to demonstrate the decay of web reference accessibility. The study examines the influence of high use of web references in a paper; examines the associations between web reference accessibility and generic domain, country domain, protocol, and resource type, respectively; and classifies inaccessible web references according to Internet Explorer feedback. It compares the retrieval efficacy among three kinds of retrieval methods and reports on the limitations of the Internet Archive. The socio-economic networking (SEN) of publicly funded basic research (PFBR) in the Japan Atomic Energy Research Institute (JAERI) was studied by a bibliometric method combined with the international nuclear database INIS. As PFBR, the Material Science (MS) research of JAERI was chosen. The appropriateness of the present bibliometric method is discussed. The authors believe that this method is applicable to studying the socio-economic effect on PFBR. Its shortcoming, however, is the unavoidable use of biased EBRF (ranked keywords), which may be perceived as unfair. 
The authors confirm that the S-matrix has the potential to show the quantitative magnitude of co-operation among research institutions while avoiding significant bias. This study examines the relationship between citation frequency and the human capital of teams of authors. Analysis of a random sample of articles published in top natural science journals shows that articles co-authored by teams including frequently cited scholars and teams whose members have diverse disciplinary backgrounds have greater citation frequency. The institutional prestige, the percentage of team members at U.S. institutions, and the variety of disciplines represented by team member backgrounds do not influence citation frequency. The study introduces a method for evaluating the extent of multidisciplinarity that accounts for the relatedness of disciplines or authors. Optics is an important research domain both for its scientific interest and its industrial applications. In this paper, we constructed a citation network of papers and applied a topological clustering method to investigate the structure of research and to detect emerging research domains in optics. We found that optics consists of five main subclusters: optical communication, quantum optics, optical data processing, optical analysis, and lasers. We then further investigated the detailed subcluster structures within them. By doing so, we detected some emerging research domains such as nonlinearity in photonic crystal fiber, broadband parametric amplifiers, and in-vivo imaging techniques. We also discuss the distinction between the research front and the intellectual base in optics. It is shown that a Hirsch-type index can be used for assessing single highly cited publications by calculating the h-index of the set of papers citing the work in question. This index measures not only the direct impact of a publication but also its indirect influence through the citing papers. 
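The Hirsch-type index for a single publication can be sketched directly from its definition: collect the papers citing the work, then compute the h-index over their own citation counts. A minimal sketch (all citation counts below are hypothetical):

```python
def h_index(citation_counts):
    """Largest h such that at least h items each have >= h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def single_paper_h(citing_paper_citations):
    """Hirsch-type index of one publication: the h-index of the set of papers
    citing it, capturing indirect influence through those citing papers."""
    return h_index(citing_paper_citations)

# Hypothetical citation counts of the papers that cite the work in question
print(single_paper_h([12, 9, 7, 5, 3, 3, 1, 0]))  # → 4
```

A raw citation count would treat all eight citing papers equally; the index above rises only when the citing papers are themselves well cited, which is what makes it a measure of indirect influence.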
Collaboration is a major research policy objective, but does it deliver higher quality research? This study uses citation analysis to examine the Web of Science (WoS) Information Science & Library Science subject category (IS&LS) to ascertain whether, in general, more highly cited articles are more highly collaborative than other articles. It consists of two investigations. The first investigation is a longitudinal comparison of the degree and proportion of collaboration in five strata of citation; it found that collaboration in the highest four citation strata (all in the most highly cited 22%) increased in unison over time, whereas collaboration in the lowest citation strata (un-cited articles) remained low and stable. Given that over 40% of the articles were un-cited, it seems important to take into account the differences found between un-cited articles and relatively highly cited articles when investigating collaboration in IS&LS. The second investigation compares collaboration for 35 influential information scientists; it found that their more highly cited articles on average were not more highly collaborative than their less highly cited articles. In summary, although collaborative research is conducive to high citation in general, collaboration has apparently not tended to be essential to the success of current and former elite information scientists. This exploratory study examines how design engineers and technical professionals (hereafter referred to as engineers) in innovative high-tech firms in the United States and India use information in their daily work activities including research, development, and management. The researchers used naturalistic observation to conduct a series of daylong workplace observations with 103 engineers engaged in product design and testing in four U.S.- and two India-based firms. 
A key finding is that engineers spend about one fourth of their day engaged in some type of information event, which was somewhat lower than the percentage identified in previous research. The explanation may be rooted in the significant changes in the information environment and corporate expectations in the 15 years since the original study. Searching technology has improved, making searching less time consuming, and engineers are choosing the Internet as a primary source even though its information may not be as focused, as timely, or as authoritative. The study extends our understanding of the engineering workplace and its information environment, and provides information useful for improving methods for accessing and using information, which could ultimately lead to better job performance, facilitate innovation, and encourage economic growth. A consensus map of science is generated from an analysis of 20 existing maps of science. These 20 maps occur in three basic forms: hierarchical, centric, and noncentric (or circular). The consensus map, generated from consensus edges that occur in at least half of the input maps, emerges in a circular form. The ordering of areas is as follows: mathematics is (arbitrarily) placed at the top of the circle, and is followed clockwise by physics, physical chemistry, engineering, chemistry, earth sciences, biology, biochemistry, infectious diseases, medicine, health services, brain research, psychology, humanities, social sciences, and computer science. The link between computer science and mathematics completes the circle. If the lowest weighted edges are pruned from this consensus circular map, a hierarchical map stretching from mathematics to social sciences results. The circular map of science is found to have a high level of correspondence with the 20 existing maps, and has a variety of advantages over hierarchical and centric forms. 
A one-dimensional Riemannian version of the consensus map is also proposed. To date, there has been little empirical research investigating the specific types of help-seeking situations that arise when people interact with information in new searching environments such as digital libraries. This article reports the results of a project focusing on the identification of different types of help-seeking situations, along with the types of factors that precipitate them, among searchers of two different digital libraries. Participants (N=120) representing the general public in Milwaukee and New York City were selected for this study. Based on the analysis of multiple sources of data, the authors identify 15 types of help-seeking situations among this sample of novice digital library users. These situations are related to the searching activities involved in getting started, identifying relevant digital collections, browsing for information, constructing search statements, refining searches, monitoring searches, and evaluating results. Multiple factors that determine the occurrences of each type of help-seeking situation also are identified. The article concludes with a model that represents user, system, task, and interaction outcome as codeterminants in the formation of help-seeking situations, and presents the theoretical and practical implications of the study results. Navigating through hyperlinks within a Web site to look for information from one of its Web pages without the support of a site map can be inefficient and ineffective. Although the content of a Web site is usually organized with an inherent structure like a topic hierarchy, which is a directed tree rooted at a Web site's homepage whose vertices and edges correspond to Web pages and hyperlinks, such a topic hierarchy is not always available to the user. In this work, we studied the problem of automatic generation of Web sites' topic hierarchies. 
We modeled a Web site's link structure as a weighted directed graph and proposed methods for estimating edge weights based on eight types of features and three learning algorithms, namely decision trees, naive Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth-first search, shortest-path search, and directed minimum-spanning tree, were adapted to generate the topic hierarchy based on the graph model. We have tested the model and algorithms on real Web sites. It is found that the directed minimum-spanning tree algorithm with the decision tree as the weight learning algorithm achieves the highest performance, with an average accuracy of 91.9%. This article presents a bilingual video question answering (QA) system, namely BVideoQA, which allows users to retrieve Chinese videos through English or Chinese natural language questions. Our method first performs an optimal one-to-one string pattern matching according to the proposed dense and long N-gram match. On the basis of the matched string patterns, it assigns a passage score based on our term-weighting scheme. The main contributions of this approach to the multimedia information retrieval literature include: (a) development of a truly bilingual video QA system, (b) presentation of a robust bilingual passage retrieval algorithm able to handle languages without word boundaries, such as Chinese and Japanese, (c) development of a large-scale bilingual video QA corpus for system evaluation, and (d) comparisons of seven top-performing retrieval methods under fair conditions. The experimental studies indicate that our method is superior to other existing approaches in terms of precision and mean reciprocal rank rates. When ported to English, encouraging empirical results also are obtained. Our method is particularly relevant to Asian languages, since the development of a word tokenizer is optional. 
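A much-simplified sketch in the spirit of BVideoQA's N-gram passage retrieval: the actual dense-and-long matching and term-weighting scheme are more elaborate, and this toy version only counts shared character trigrams. It does, however, show why no word tokenizer is required for languages without word boundaries (the question and passages below are hypothetical):

```python
def char_ngrams(text, n=3):
    """Character n-grams; no word tokenizer needed, so the same code works
    for languages without word boundaries (e.g., Chinese, Japanese)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def passage_score(question, passage, n=3):
    """Fraction of the question's character trigrams found in the passage
    (a crude stand-in for the paper's dense-and-long N-gram matching)."""
    q = char_ngrams(question, n)
    if not q:
        return 0.0
    return len(q & char_ngrams(passage, n)) / len(q)

# Hypothetical candidate passages and question
passages = [
    "the solar system has eight planets orbiting the sun",
    "football is played between two teams of eleven players",
]
question = "how many planets orbit the sun"
best = max(passages, key=lambda p: passage_score(question, p))
print(best)
```

Because trigrams are taken over raw characters, the scorer treats segmented and unsegmented text identically, which is the property the abstract highlights.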
Knowledge Management Systems (KMS) have become increasingly popular as a knowledge-sharing tool in contemporary corporations. Enticing employees to seek knowledge from KMS remains an important concern for researchers and practitioners. Trust has been widely recognized in many studies as an important enabling factor for seeking knowledge; however, the role of trust in promoting knowledge-seeking behavior using KMS has not been adequately addressed. Drawing upon the extant literature on trust and information technology adoption, this article examines the relationships between the knowledge seekers' trust in the community of KMS users, their perceptions toward the system (perceived usefulness and perceived seeking efforts), and the intention to continually use the KMS. The results reveal that trust in the community of KMS users does not directly affect the employees' knowledge-seeking continuance intention; rather, it happens indirectly through a mediated effect of perceived usefulness of the KMS. Furthermore, we find that trust seems to be a stronger determinant of perceived usefulness than of perceived seeking efforts. Our study thus demonstrates the indirect, but still crucial, role of trust in knowledge-seeking behavior in the context of corporate KMS usage. Other findings and the implications of this study for both researchers and practitioners are correspondingly discussed. In this research, we aim to identify factors that significantly affect the clickthrough of Web searchers. Our underlying goal is to determine more efficient methods to optimize the clickthrough rate. We devise a clickthrough metric for measuring customer satisfaction with search engine results using the number of links visited, the number of queries a user submits, and the rank of clicked links. We use a neural network to detect the significant influence of searching characteristics on future user clickthrough. 
Our results show that high occurrences of query reformulation, lengthy searching duration, longer query length, and the higher ranking of prior clicked links correlate positively with future clickthrough. We provide recommendations for leveraging these findings for improving the performance of search engine retrieval and result ranking, along with implications for search engine marketing. In this article, we performed a comparative study to investigate the performance of methods for detecting emerging research fronts. Three types of citation network, co-citation, bibliographic coupling, and direct citation, were tested in three research domains: gallium nitride (GaN), complex network (CNW), and carbon nanotube (CNT). Three types of citation network were constructed for each research domain, and the papers in those domains were divided into clusters to detect the research front. We evaluated the performance of each type of citation network in detecting a research front by using the following measures of papers in the cluster: visibility, measured by normalized cluster size; speed, measured by average publication year; and topological relevance, measured by density. Direct citation, which could detect large and young emerging clusters earlier, shows the best performance in detecting a research front, and co-citation shows the worst. Additionally, in direct citation networks the clustering coefficient was the largest, which suggests that the content similarity of papers connected by direct citations is the greatest and that direct citation networks have the least risk of missing emerging research domains, because core papers are included in the largest component. 
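The three citation-network types compared above can all be derived from the same raw data: a mapping from each paper to the papers it cites. A minimal sketch of the three constructions (paper IDs and reference lists are hypothetical):

```python
from itertools import combinations

# Hypothetical corpus: paper -> set of papers it cites
refs = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"c"},
}

# Direct citation: a directed edge from each paper to each paper it cites
direct = {(p, r) for p, cited in refs.items() for r in cited}

# Bibliographic coupling: two citing papers linked if they share a reference
coupling = {(p, q) for p, q in combinations(refs, 2) if refs[p] & refs[q]}

# Co-citation: two cited papers linked if some paper cites both of them
cocitation = {tuple(sorted(pair))
              for cited in refs.values()
              for pair in combinations(sorted(cited), 2)}

print(sorted(coupling))    # p1-p2 share {a, b}; p1-p3 share {c}
print(sorted(cocitation))  # pairs among {a, b, c} from p1; {a, b} again from p2
```

Note the asymmetry the abstract exploits: direct citation links exist as soon as a paper is published, whereas co-citation links only accumulate once later papers cite two works together, which is one intuition for why direct citation detects emerging clusters earlier.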
Using 17 fully open-access journals published uninterruptedly during 2000 to 2004 in the field of library and information science, the present study investigates the impact of these open-access journals in terms of quantity of articles published, subject distribution of the articles, synchronous and diachronous impact factor, immediacy index, and journals' and authors' self-citation. The results indicate that during this 5-year publication period, there are as many as 1,636 articles published by these journals. At the same time, the articles have received a total of 8,591 Web citations during a 7-year citation period. Eight of 17 journals have received more than 100 citations. First Monday received the highest number of citations; however, the average number of citations per article was the highest in D-Lib Magazine. The value of the synchronous impact factor varies from 0.6989 to 1.0014 during 2002 to 2005, and the diachronous impact factor varies from 1.472 to 2.487 during 2000 to 2004. The range of the immediacy index varies between 0.0714 and 1.395. D-Lib Magazine has an immediacy index value above 0.5 in all the years whereas the immediacy index value varies from year to year for the other journals. When the citations of sample articles were analyzed according to source, it was found that 40.32% of the citations came from full-text articles, followed by 33.35% from journal articles. The percentage of journals' self-citation was only 6.04%. While the Web has grown significantly in recent years, some portions of the Web remain largely underdeveloped, as shown in a lack of high-quality content and functionality. An example is the Arabic Web, in which a lack of well-structured Web directories limits users' ability to browse for Arabic resources. 
In this research, we proposed an approach to building Web directories for the underdeveloped Web and developed a proof-of-concept prototype called the Arabic Medical Web Directory (AMed-Dir) that supports browsing of over 5,000 Arabic medical Web sites and pages organized in a hierarchical structure. We conducted an experiment involving Arab participants and found that the AMedDir significantly outperformed two benchmark Arabic Web directories in terms of browsing effectiveness, efficiency, information quality, and user satisfaction. Participants expressed a strong preference for the AMedDir and provided many positive comments. This research thus contributes to developing a useful Web directory for organizing the information in the Arabic medical domain and to a better understanding of how to support browsing on the underdeveloped Web. This article presents and validates a clustering-based method for creating cultural ontologies for community-oriented information systems. The introduced semi-automated approach merges distributed annotation techniques, or subjective assessments of similarities between cultural categories, with established clustering methods to produce "cognate" ontologies. This approach is validated against a locally authentic ethnographic method, involving direct work with communities for the design of "fluid" ontologies. The evaluation is conducted with a set of Native American communities located in San Diego County (CA, US). The principal aim of this research is to discover whether distributing the annotation process among isolated respondents would enable ontology hierarchies to be created that are similar to those that are crafted according to collaborative ethnographic processes, found to be effective in generating continuous usage across several studies. 
Our findings suggest that the proposed semi-automated solution best optimizes among issues of interoperability and scalability (both deemphasized in the fluid ontology approach) and sustainable usage. This article marries the study of serious leisure pursuits with library and information science's (LIS) interest in people's everyday use, need, seeking, and sharing of information. Using a qualitative approach, the role of information as a phenomenon was examined in relation to the leisure activity of hobbyist collecting. In the process, a model and a typology for these collectors were developed. We find that the information needs and information seeking of hobbyist collectors are best represented as an interrelationship between information and object needs, information sources, and interactions between collectors and their publics. Our model of the role of information in a particular domain of hobbyist collecting moves away from the idea of one individual seeking information from formal systems and shifts towards a model that takes seriously the social milieu of a community. This collecting community represents a layer of a social system with complex interactions and specialized information needs that vary across collector types. Only the serious collectors habitually engage in information seeking and, occasionally, in information dissemination in the traditional sense; yet information flows through the community and serves as a critical resource for sustaining individual and communal collecting activities. Governmental initiatives around scientific policy have progressively raised collaboration to priority status. In this context, a need has arisen to broaden the traditional approach to the analysis and study of research results by descending to the group or even the individual scale and supplementing the output-, productivity-, visibility- and impact-based focus with new measures that emphasize collaboration from the vantage of structural analysis. 
To this end, the present paper proposes two new hybrid indicators, popularity and prestige, for the analysis and evaluation of individual research results; these indicators combine bibliometric and structural aspects. A case study was conducted of the nine most productive departments at Carlos III University of Madrid. The findings showed hybridization to be a tool sensitive to traditional indicators, but also to the new demands of modern science as a self-organized system of interaction among individuals, furnishing information on researchers' environments and the behaviour and attitudes adopted within those environments. (C) 2008 Elsevier Ltd. All rights reserved. Digital libraries (DLs) are complex information systems that can change in structure, content, and services. These complexities and dynamics make system maintenance a non-trivial task, since it requires periodic evaluation of the different DL components. Generally, these evaluations are customized per system and are performed only when problems occur and administrator intervention is required. This work aims to change that situation. We present 5SQual, a tool which provides ways to perform automatic and configurable evaluations of some of the most important DL components, among them digital objects, metadata, and services. The tool implements diverse numeric indicators that are associated with eight quality dimensions described in the 5S quality model. Its generic architecture was developed to be applicable to various DLs and scenarios. In sum, the main contributions of this work include: (i) the design and implementation of 5SQual, a tool that validates a theoretical DL quality model; (ii) the demonstration of the applicability of the tool in several usage scenarios; and (iii) the evaluation (with usability specialists) of its graphical interface, specially designed to guide the configuration of 5SQual evaluations. 
We also present the results of interviews conducted with administrators of real DLs regarding their expectations and opinions about 5SQual. (C) 2008 Elsevier Ltd. All rights reserved. The use of scholarly publications that have not been formally published in, for example, journals is widespread in some fields. In the past they were disseminated through various channels of informal communication. However, the Internet has enabled dissemination of these unpublished and often unrefereed publications to a much wider audience. This is particularly interesting in relation to the highly disputed open access advantage, as the potential advantage for low-visibility publications has not been given much attention in the literature. The present study examines the role of working papers in economics during a 10-year period (1996-2005). It shows that working papers are becoming increasingly visible in the field-specific databases. The impact of working papers is relatively low; however, high-impact working paper series have citation rate levels similar to the low-impact journals in the field. There is no tendency toward an increase in impact during the 10 years, as is the case for the high-impact journals. Consequently, the results of this study do not provide evidence of an open access advantage for working papers in economics. (C) 2008 Elsevier Ltd. All rights reserved. In this communication we perform an analysis of European science, investigating the way countries are joined in clusters according to their similarity. An extremely clear pattern arises, suggesting that geographical and cultural factors strongly influence the scientific fabric of these countries. Although one of the major factors behind science in Europe is apparently geographical proximity, bilateral cooperation between countries cannot fully account for the respective similarity. Long-term policies, planning and investment are also visible in the results. (C) 2009 Elsevier Ltd. 
All rights reserved. Sentiment analysis is an important current research area. This paper combines rule-based classification, supervised learning and machine learning into a new combined method. This method is tested on movie reviews, product reviews and MySpace comments. The results show that a hybrid classification can improve the classification effectiveness in terms of micro- and macro-averaged F-1. F-1 is a measure that takes both the precision and recall of a classifier's effectiveness into account. In addition, we propose a semi-automatic, complementary approach in which each classifier can contribute to other classifiers to achieve a good level of effectiveness. (C) 2009 Elsevier Ltd. All rights reserved. The Hirsch index is a number that synthesizes a researcher's output. It is the maximum number h such that the researcher has h papers with at least h citations each. Woeginger [Woeginger, G.J. (2008a). An axiomatic characterization of the Hirsch-index. Mathematical Social Sciences, 56(2), 224-232; Woeginger, G.J. (2008b). A symmetry axiom for scientific impact indices. Journal of Informetrics, 2(3), 298-303] characterizes the Hirsch index when indices are assumed to be integer-valued. In this note, the Hirsch index is characterized, when indices are allowed to be real-valued, by adding to Woeginger's monotonicity two axioms in a way related to the concept of monotonicity. (C) 2009 Elsevier Ltd. All rights reserved. We report data fusion experiments carried out on the four best-performing retrieval models from TREC 5. Three were conceptually/algorithmically very different from one another; one was algorithmically similar to one of the former. The objective of the test was to observe the performance of the 11 logical data fusion combinations compared to the performance of the four individual models and their intermediate fusions when following the principle of polyrepresentation. 
This principle is based on the cognitive IR perspective (Ingwersen & Jarvelin, 2005) and implies that each retrieval model is regarded as a representation of a unique interpretation of information retrieval (IR). It predicts that only fusions of very different, but equally good, IR models may outperform each constituent as well as their intermediate fusions. Two kinds of experiments were carried out. One tested restricted fusions, which entails that only the inner disjoint overlap documents between fused models are ranked. The second set of experiments was based on traditional data fusion methods. The experiments involved the 30 TREC 5 topics that contain more than 44 relevant documents. In all tests, the Borda and CombSUM scoring methods were used. Performance was measured by precision and recall, with document cutoff values (DCVs) at 100 and 15 documents, respectively. Results show that restricted fusions made of two, three, or four cognitively/algorithmically very different retrieval models perform significantly better than do the individual models at DCV100. At DCV15, however, the results of polyrepresentative fusion were less predictable. The traditional fusion method based on polyrepresentation principles demonstrates a clear picture of performance at both DCV levels and verifies the polyrepresentation predictions for data fusion in IR. Data fusion improves retrieval performance over the constituent IR models only if the models are all quite conceptually/algorithmically dissimilar and equally well performing, in that order of importance. Social networks evolve over time with the addition and removal of nodes and links to survive and thrive in their environments. Previous studies have shown that the link-formation process in such networks is influenced by a set of facilitators. However, there have been few empirical evaluations to determine the important facilitators. 
In a research partnership with law enforcement agencies, we used dynamic social-network analysis methods to examine several plausible facilitators of co-offending relationships in a large-scale narcotics network consisting of individuals and vehicles. Multivariate Cox regression and a two-proportion z-test on cyclic and focal closures of the network showed that mutual acquaintance and vehicle affiliations were significant facilitators for the network under study. We also found that homophily with respect to age, race, and gender was not a good predictor of future link formation in these networks. Moreover, we examined the social causes and policy implications of the significance and insignificance of various facilitators, including common jails, for future co-offending. These findings provide important insights into the link-formation processes and the resilience of social networks. In addition, they can be used to aid in the prediction of future links. The methods described can also help in understanding the driving forces behind the formation and evolution of social networks facilitated by mobile and Web technologies. This article presents an exploratory study of "Blobgects," an experimental interface for an online museum catalog that enables social tagging and blogging activity around a set of cultural heritage objects held by a preeminent museum of anthropology and archaeology. This study attempts to understand not just whether social tagging and commenting about these objects is useful but rather whose tags and voices matter in presenting different "expert" perspectives around digital museum objects. Based on an empirical comparison between two different user groups (Canadian Inuit high-school students and museum studies students in the United States), we found that merely adding the ability to tag and comment to the museum's catalog does not sufficiently allow users to learn about or engage with the objects represented by catalog entries. 
Rather, the specialist language of the catalog provides too little contextualization for users to enter into the sort of dialog that proponents of Web 2.0 technologies promise. Overall, we propose a more nuanced application of Web 2.0 technologies within museums, one which provides a contextual basis that gives users a starting point for engagement and permits users to make sense of objects in relation to their own needs, uses, and understandings. This study focuses on the task as a fundamental factor in the context of information seeking. The purpose of the study is to characterize kinds of tasks and to examine how different kinds of tasks give rise to different kinds of information-seeking behavior on the Web. For this, a model for information-seeking behavior was used employing information-seeking strategies (ISS), which are based on several behavioral dimensions. The analysis of strategies was based on data collected through an experiment designed to observe users' behaviors. Three tasks were assigned to 30 graduate students, and data were collected using questionnaires, search logs, and interviews. The qualitative and quantitative analysis of the data identified 14 distinct information-seeking strategies. The analysis showed significant differences in the frequencies and patterns of ISS employed across the three tasks. The results of the study are intended to facilitate the development of task-based information-seeking models and to further suggest Web information system designs that support the user's diverse tasks. Expert finding is a key task in enterprise search and has recently attracted much attention from both research and industry communities. Given a search topic, a prominent existing approach is to apply some information retrieval (IR) system to retrieve top ranking documents, which will then be used to derive associations between experts and the search topic based on cooccurrences. 
However, we argue that expert finding is more sensitive to multiple levels of associations and document features that current expert finding systems insufficiently address, including (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates the above-mentioned three aspects as well as a query expansion technique in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. These experimental results show that query expansion can dramatically improve expert finding performance with statistical significance. For three well-known IR models with or without query expansion, document internal structures help improve a single window-based approach but without statistical significance, while our novel multiple window-based approach can significantly improve the performance of a single window-based approach both with and without document internal structures. This study examines the criteria questioners use to select the best answers in a social Q&A site (Yahoo! Answers) within the theoretical framework of relevance research. A social Q&A site is a novel environment where people voluntarily ask and answer questions. In Yahoo! Answers, the questioner selects the answer that best satisfies his or her question and leaves comments on it. Under the assumption that the comments reflect the reasons why questioners select particular answers as the best, this study analyzed 2,140 comments collected from Yahoo! Answers during December 2007. The content analysis identified 23 individual relevance criteria in six classes: Content, Cognitive, Utility, Information Sources, Extrinsic, and Socioemotional. 
A major finding is that the selection criteria used in a social Q&A site have considerable overlap with many relevance criteria uncovered in previous relevance studies, but that the scope of socioemotional criteria has been expanded to include the social aspect of this environment. Another significant finding is that the relative importance of individual criteria varies according to topic categories. Socioemotional criteria are popular in discussion-oriented categories, content-oriented criteria in topic-oriented categories, and utility criteria in self-help categories. This study generalizes previous relevance studies to a new environment by going beyond an academic setting. This article focuses on the process of scientific and scholarly communication. Data on open access publications on the Internet not only provide a supplement to the traditional citation indexes but also enable analysis of the microprocesses and daily practices that constitute scientific communication. This article focuses on a stage in the life cycle of scientific and scholarly information that precedes the publication of formal research articles in the scientific and scholarly literature. Binomial logistic regression models are used to analyse precise mechanisms at work in the transformation of a working paper (WP) into a journal article (JA) in the field of economics. The study unveils a fine-grained process of adapting WPs to their new context as JAs by deleting and adding literature references, which perhaps can be best captured by the term sculpting. The representation of science as a citation density landscape and the study of scaling rules with the field-specific citation density as a main topological property were previously analyzed at the level of research groups. Here, the focus is on the individual researcher. In this new analysis, the size dependence of several main bibliometric indicators for a large set of individual researchers is explored. 
Results similar to those previously observed for research groups are described for individual researchers. The total number of citations received by scientists increases in a cumulatively advantageous way as a function of size (in terms of number of publications) for researchers in three areas: Natural Resources, Biology & Biomedicine, and Materials Science. This effect is stronger for researchers in low citation density fields. Differences found among thematic areas with different citation densities are discussed. International co-authorship relations and university-industry-government (Triple Helix) relations have hitherto been studied separately. Using Japanese publication data for the 1981-2004 period, we were able to study both kinds of relations in a single design. In the Japanese file, 1,277,030 articles with at least one Japanese address were attributed to the three sectors, and we know additionally whether these papers were coauthored internationally. Using the mutual information in three and four dimensions, respectively, we show that the Japanese Triple-Helix system has been continuously eroded at the national level. However, since the mid-1990s, international coauthorship relations have contributed to a reduction of the uncertainty at the national level. In other words, the national publication system of Japan has developed a capacity to retain surplus value generated internationally. In a final section, we compare these results with an analysis based on similar data for Canada. A relative uncoupling of national university-industry-government relations because of international collaborations is indicated in both countries. From the knowledge-based perspective, knowledge is a unique and valuable resource linked to competitive advantage. Effective knowledge management is the major concern of contemporary business managers. The key determinant of effective knowledge management is the firm's competitive strategy. 
The link between business strategy and knowledge management, while often discussed, has been widely ignored in practice. Moreover, because knowledge management is complex in nature, it is difficult to directly translate a firm's competitive strategy into specific knowledge management activities. This requires first defining a knowledge strategy to guide further information technology (IT)-supported implementation approaches. Finally, the ultimate goal of knowledge management lies in the realization of firm performance. Previous studies have discussed only partial relationships among these relevant knowledge concepts rather than treating them in an integrative manner. Thus, this research proposes a complete process-based model with four components: competitive strategy, knowledge strategy, implementation approach, and firm performance. Empirical results show positive relationships between any two consecutive components and provide useful insight for knowledge implementation practice. Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate contextual backgrounds of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require any trials to manually set parameters, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. 
The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies. Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents. That is, to hide information, hidden text is injected into passages of documents. Rather than matching query terms against passages to determine their relevance, the passages are classified using text-mining techniques. Documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages. That is, in passage detection, passages are labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that KDP statistically significantly outperforms (99% confidence) the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks. 
Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy. Survivors of intimate partner violence face more than information gaps; many face powerful barriers in the form of information myths. Triangulating data from in-depth interviews and community bulletin board postings, this study incorporates insights from survivors, police, and shelter staff to begin mapping the information landscape through which survivors move. An unanticipated feature of that landscape is a set of 28 compelling information myths that prevent some survivors from making effective use of the social, legal, economic, and support resources available to them. This analysis of the sources, contexts, and consequences of these information myths is the first step in devising strategies to counter their ill effects. It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words of text generated by an intermittent silence process. We derive and explain how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size. This study assesses the current state of responsibilities and skill sets required of cataloging professionals. It identifies emerging roles and competencies focusing on the digital environment and relates these to the established knowledge of traditional cataloging standards and practices. 
We conducted a content analysis of 349 job descriptions advertised in AutoCAT in 2005-2006. Multivariate techniques of cluster and multidimensional-scaling analyses were applied to the data. Analysis of job titles, required and preferred qualifications/skills, and responsibilities lends perspective to the roles that cataloging professionals play in the digital environment. Technological advances increasingly demand knowledge and skills related to electronic resource management, metadata creation, and computer and Web applications. Emerging knowledge and skill sets are increasingly being integrated into the core technical aspects of cataloging such as bibliographic and authority control and integrated library-system management. Management of cataloging functions is also in high demand. The results of the study provide insight on current and future curriculum design of library and information-science programs. This article challenges recent research (Evans, 2008) reporting that the concentration of cited scientific literature increases with the online availability of articles and journals. Using Thomson Reuters' Web of Science, the present article analyses changes in the concentration of citations received (2- and 5-year citation windows) by papers published between 1900 and 2005. Three measures of concentration are used: the percentage of papers that received at least one citation (cited papers); the percentage of papers needed to account for 20%, 50%, and 80% of the citations; and the Herfindahl-Hirschman index (HHI). These measures are used for four broad disciplines: natural sciences and engineering, medical fields, social sciences, and the humanities. All these measures converge and show that, contrary to what was reported by Evans, the dispersion of citations is actually increasing. The World Wide Web is growing at an enormous speed, and has become an indispensable source for information and research. 
New pages are constantly added, but there are additional processes as well: pages are moved or removed and/or their content changes. We report here the results of an eight-year-long project started in 1998, when multiple search engines were used to identify a set of pages containing the term informetrics. Data collection was repeated once a year for the last eight years (with the exception of 2000 and 2001) using both search engines and revisiting previously identified pages. The results show that the number of pages grew from 866 in 1998 to 28,914 in 2006 - a 33-fold growth. Besides the obvious growth of the topic on the Web, we observed both decay (pages disappearing from the Web) and modification. Even though most of the pages from 1998 either disappeared or ceased to contain the term informetrics, 165 pages (19.1%) still existed in 2006 and contained the search term. We followed the "fate" of these 165 pages, characterizing the publishers, the contents, and the changes that occurred over the whole period. In recent years e-print servers and publishers' sites have become sources of a large number of pages related to informetrics. Longitudinal studies following the evolution of a topic on the Web are very important, since they provide insights about content and the underlying Web processes. Research on the effects of collaboration in scientific research has been increasing in recent years. A variety of studies have been done at the institution and country level, many with an eye toward policy implications. However, the question of how to identify the most fruitful targets for future collaboration in high-performing areas of science has not been addressed. This paper presents a method for identifying targets for future collaboration between two institutions. 
The utility of the method is shown in two different applications: identifying specific potential collaborations at the author level between two institutions, and generating an index that can be used for strategic planning purposes. Identification of these potential collaborations is based on finding authors that belong to the same small paper-level community (or cluster of papers), using a map of science and technology containing nearly 1 million papers organized into 117,435 communities. The map used here is also unique in that it is the first map to combine the ISI Proceedings database with the Science and Social Science Indexes at the paper level. How does our collective scholarly knowledge grow over time? What major areas of science exist, and how are they interlinked? Which areas are major knowledge producers; which ones are consumers? Computational scientometrics - the application of bibliometric/scientometric methods to large-scale scholarly datasets - and the communication of results via maps of science might help us answer these questions. This paper presents the results of a prototype study that aims to map the structure and evolution of chemistry research over a 30-year time frame. Information from the combined Science (SCIE) and Social Science (SSCI) Citation Indexes from 2002 was used to generate a disciplinary map of 7,227 journals and 671 journal clusters. Clusters relevant to the study of the structure and evolution of chemistry were identified using JCR categories and were further clustered into 14 disciplines. The changing scientific composition of these 14 disciplines and their knowledge exchange via citation linkages was computed. Major changes in the dominance, influence, and role of Chemistry, Biology, Biochemistry, and Bioengineering over these 30 years are discussed. The paper concludes with suggestions for future work. 
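The community-overlap idea described above (pairing authors from two institutions whose papers fall in the same paper-level community) can be sketched in a few lines. The data structures here (author-to-papers maps per institution and a paper-to-community map) are hypothetical stand-ins for the map of science, not the authors' actual implementation:

```python
from collections import defaultdict

def potential_collaborations(papers_a, papers_b, community_of):
    """Return author pairs (one from each institution) whose papers
    belong to the same paper-level community.

    papers_a, papers_b: dicts mapping author -> set of paper ids
    community_of: dict mapping paper id -> community id
    (all hypothetical illustration data)
    """
    # Invert institution A's data: community id -> authors active in it
    comm_a = defaultdict(set)
    for author, papers in papers_a.items():
        for p in papers:
            comm_a[community_of[p]].add(author)
    # Pair institution B's authors with A's authors sharing a community
    pairs = set()
    for author, papers in papers_b.items():
        for p in papers:
            for other in comm_a.get(community_of[p], ()):
                pairs.add((other, author))
    return pairs
```

On toy data, an author pair is emitted only when their papers land in the same community, which mirrors the paper's criterion for a promising collaboration target.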
In this study some novel indicators and publication data resources are explored to study the dynamics of genomics research at three different levels: worldwide, national, and at individual Research Centers. Our results indicate that the growth of genomics research worldwide seems to be stabilizing, whereas genomics research in the Netherlands aims at getting 'ready for the next step'. As we find differences in research dynamics at the level of individual Research Centers, governmental support in a 'next step' could take these differences into account. For this purpose, we introduce a general model of research dynamics and timing of research management, building on ideas of Price and Bonaccorsi. Based on this model a framework is presented to discuss steering options in relation to research dynamics. We apply this framework to Research Centers of the Netherlands Genomics Initiative (NGI) and discuss findings. In recent issues of the ISSI Newsletter, Egghe [2006a] proposed the g-index and Kosmulski [2006] the h(2)-index, both claimed to be improvements on the original h-index proposed by Hirsch [2005]. The aim of this paper is to investigate the inter-relationships between these measures and also their time dependence using the stochastic publication/citation model proposed by Burrell [1992, 2007a]. We also make some tentative suggestions regarding the relative merits of these three proposed measures. The purpose of this paper is to analyse the relationship between bureaucracy and research performance within Public Research Bodies. The research methodology is applied to a sample of 100 interviewees belonging to 11 institutes of the National Research Council of Italy. The main finding is that within the Italian Public Research Council there is an academic bureaucratization that reduces the performance and efficiency of institutes. In fact, institutes display two organizational behaviours: high bureaucracy - low performance and low bureaucracy - high performance. 
These bureaucratic tendencies are also present in other countries: public research labs show an academic bureaucratization arising from the administrative burden required to govern their structures, whereas universities show mainly an administrative bureaucratization generated by the growth of administrative staff relative to researchers and faculty. A novel subject-delineation strategy has been developed for the retrieval of the core literature in bioinformatics. The strategy combines textual components with bibliometric, citation-based techniques. This bibliometrics-aided search strategy is applied to the 1980-2004 annual volumes of the Web of Science. The retrieved literature has undergone both a structural and a quantitative analysis. Patterns of national publication activity, citation impact and international collaboration are analysed for the 1990s and the new millennium. The aim of this paper is to describe Spanish universities by means of structural, input and output indicators, to explore the relationship between those indicators and to analyse university behaviour in different dimensions. Seniority of the universities and environmental conditions are taken into account, together with input and output indicators, as well as others related to the networks and links established. Our results will contribute to the knowledge of the university research system in Spain, producing data that could be useful for research management at the institutional, regional and national level. Scientometric predictors of research performance need to be validated by showing that they have a high correlation with the external criterion they are trying to predict. 
The UK Research Assessment Exercise (RAE) - together with the growing movement toward making the full texts of research articles freely available on the web - offers a unique opportunity to test and validate a wealth of old and new scientometric predictors through multiple regression analysis: publications, journal impact factors, citations, co-citations, citation chronometrics (age, growth, latency to peak, decay rate), hub/authority scores, h-index, prior funding, student counts, co-authorship scores, endogamy/exogamy, textual proximity, downloads/co-downloads and their chronometrics, etc. can all be tested and validated jointly, discipline by discipline, against their RAE panel rankings in the forthcoming parallel panel-based and metric RAE in 2008. The weights of each predictor can be calibrated to maximize the joint correlation with the rankings. Open Access Scientometrics will provide powerful new means of navigating, evaluating, predicting and analyzing the growing Open Access database, as well as powerful incentives for making it grow faster. It has been shown that information collected from and about links between web pages and web sites can reflect real-world phenomena and relationships between the organizations they represent. Yet government linking has not been extensively studied from a webometric point of view. The aim of this study was to increase the knowledge of governmental interlinking and to shed some light on the possible real-world phenomena it may indicate. We show that interlinking between local government bodies in Finland follows a strong geographic, or rather geopolitical, pattern and that governmental interlinking is mostly motivated by official cooperation that geographic adjacency has made possible. 
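Calibrating predictor weights to maximize joint correlation with panel rankings, as proposed in the RAE validation passage above, is essentially a multiple regression problem. A minimal sketch with entirely fabricated data (the metric values and panel scores below are invented for illustration; the real exercise would be run discipline by discipline with many more predictors):

```python
import numpy as np

# Fabricated metrics for six hypothetical departments:
# columns = publications, citations, downloads.
X = np.array([
    [10, 200, 1500],
    [ 8, 150, 1100],
    [12, 300, 2100],
    [ 5,  60,  400],
    [ 9, 180, 1300],
    [ 3,  40,  300],
], dtype=float)
panel_score = np.array([3.2, 2.8, 3.9, 1.5, 3.0, 1.1])  # invented rankings

# Ordinary least squares finds the predictor weights (plus intercept)
# whose linear combination best reproduces the panel scores.
A = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(A, panel_score, rcond=None)
predicted = A @ weights
r = np.corrcoef(predicted, panel_score)[0, 1]
print(f"multiple correlation R = {r:.3f}")
```

In practice the predictors would be standardized and cross-validated before the weights are interpreted; this sketch only illustrates the calibration idea.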
The German Research Foundation's (DFG) Emmy Noether Programme aims to fund excellent young researchers in the postdoctoral phase and, in particular, to open up an alternative to the traditional route to professorial qualification via the Habilitation (venia legendi). This paper seeks to evaluate this funding programme with a combination of methods made up of questionnaires, interviews, appraisals of the reviews, and bibliometric analyses. The key success criteria in this respect are the frequency of professorial appointments plus excellent research performance demonstrated in the form of publications. Up to now, such postdoc programme evaluations have rarely been conducted. In professional terms, approved applicants are clearly better placed. The personal career satisfaction level is also higher among funding recipients. Concerning publications and citations, some minor performance differences could be identified between approved and rejected applicants. Nevertheless, we can confirm that, on average, the reviewers indeed selected the slightly better performers from a relatively homogeneous group of very high-performing applicants. However, a comparison between approved and rejected applicants did not show that participation in the programme had decisively influenced research performance in the examined fields of medicine and physics. Opinions in the literature on the possible relationship between co-authorship and number of citations vary. This paper contributes to the debate with a further analysis of the subject, taking account of the number and quality of citations found for multi-(author, institution, country) and single-authored papers. The study is based on the scientific production of ten Carlos III University of Madrid departmental areas between 1997 and 2003 as reflected in the ISI Web of Science, and the number of times the respective papers were cited between 1997 and 2004. 
Univariate multifactorial analysis of variance (ANOVA) was used to verify the relationship between multi-authorship and visibility. The correlation between multi-institutional and multi-national authorship and the quartile of the citing journals was analyzed with correspondence analysis. The results show that while multi-institutional and multi-national authorship raise the number of citations, co-authorship and number of citations are unrelated. Correspondence analysis failed to show any correlation between the quartile of the citing journal and multi-institutional or multi-national authorship, but did reveal a relationship between citing journal quartile and departmental area. The possibilities of the Response Surface Methodology (RSM) have been explored within the ambit of Scientific Activity Analysis. The case of the system "Departments of the Area of Health Sciences of the University of Navarre (Spain)" has been studied in relation to the system "Scientific Community in the Health Sciences", from the perspective of input/output models (factors/response). It is concluded that the RSM reveals the causal relationships between factors and responses through the construction of polynomial mathematical models. In addition, quasi-experimental designs are proposed that permit scientific activity to be analysed with minimal effort and cost and high accuracy. Some documents provoke emotions in people viewing them. Will it be possible to describe emotions consistently and use this information in retrieval systems? We tested collective (statistically aggregated) emotion indexing using images as examples. Considering psychological results, the basic emotions are anger, disgust, fear, happiness, and sadness. This study follows an approach developed by Lee and Neal (2007) for music emotion retrieval and applies scroll bars for tagging basic emotions and their intensities. 
A sample of 763 persons tagged the emotions evoked by images (retrieved from www.Flickr.com) using scroll bars and (linguistic) tags. Using SPSS, we performed descriptive statistics and correlation analysis. For more than half of the images, the test persons have clear emotion favorites. There are prototypical images for given emotions. The document-specific consistency of tagging using a scroll bar is, for some images, very high. Most of the (most commonly used) linguistic tags are on the basic level (in the sense of Rosch's basic level theory). The distributions of the linguistic tags in our examples follow an inverse power law. Hence, it seems possible to apply collective image emotion tagging to image information systems and to present a new search option for basic emotions. This article is one of the first steps in the research area of emotional information retrieval (EmIR). Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time, together with a reusable test collection built from Wikipedia ambiguous entries. 
Then, we make a cross-comparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines for both desktop and mobile searches, especially, but not only, for multitopic informational queries. Session characteristics taken from large transaction logs of three Web search environments (academic Web site, public search engine, consumer health information portal) were modeled using cluster analysis to determine whether coherent session groups emerged for each environment and whether the types of session groups are similar across the three environments. The analysis revealed three distinct clusters of session behaviors common to each environment: "hit and run" sessions on focused topics, relatively brief sessions on popular topics, and sustained sessions using obscure terms with greater query modification. The findings also revealed shifts in session characteristics over time for one of the datasets, away from "hit and run" sessions toward more popular search topics. A better understanding of session characteristics can help system designers to develop more responsive systems with search features that cater to identifiable groups of searchers based on their search behaviors. For example, a system may identify struggling searchers based on session behaviors that match those identified in the current study and provide context-sensitive help. In this article, we present the results of a study that systematically explored "externalized knowledge" in the context of office work. The aim of the study was to investigate what knowledge information workers externalize into their physical environment (e.g., when shuffling or relocating paper documents on their desks) and how this knowledge is "encoded" in visual cues. 
Based on field data, categories of visual cues and semantic judgments were derived, a way to aggregate these categories into clusters was proposed, and a model of the relationships among cues and judgments was developed. We conclude that visual cues of physical documents represent, besides their content, information about the features of content and about different aspects of task context and overview. Three clusters outline the space of visual cues: (a) content that resides in (b) a holder that is (c) located somewhere in space. The three identified clusters of semantic judgments express three aggregation levels of one's work: (a) the individual document, (b) the context of this document, and (c) an overview of one's activities. The nature of the relationships among these clusters is reviewed through three aspects: content-dependence, flexibility, and effort. We conclude with implications of the results and future research directions. A survey of the 113 academic libraries of the Association of Research Libraries (ARL) was administered to investigate whether Web usability Policies/Standards/Guidelines (PSGs) are in place, the levels of difficulty surrounding implementation, the impact of PSGs on actual usability practice (e.g., testing and resources), and the relationship between ARL ranking and usability practice or PSGs. The response rate was over 74%. Results show that 25 (30%) libraries have PSGs dedicated to Web usability. Seventy-one (85%) libraries have conducted usability testing on their main Web sites, online public access catalogs (OPACs), or lower-level pages. Nevertheless, only seven libraries performed iterative testing of these platforms at the pre-, during-, and post-design stages. Statistical analysis indicates that having PSGs does not affect the amount of usability testing performed or the resources available for library Web usability initiatives. In addition, ARL ranking has little or no impact on PSGs, testing, or resources. 
Academic librarians seeking to assess information literacy skills often focus on testing as a primary means of evaluation. Educators have long recognized the limitations of tests, and these limitations lead many educators to prefer rubric assessment to test-based approaches to evaluation. In contrast, many academic librarians are unfamiliar with the benefits of rubrics. Those librarians who have explored the use of information literacy rubrics have not taken a rigorous approach to methodology and interrater reliability. This article seeks to remedy these omissions by describing the benefits of a rubric-based approach to information literacy assessment, identifying a methodology for using rubrics to assess information literacy skills, and analyzing the interrater reliability of information literacy rubrics in the hands of university librarians, faculty, and students. Study results demonstrate that Cohen's kappa can be effectively employed to check interrater reliability. The study also indicates that rubric training sessions improve interrater reliability among librarians, faculty, and students. In this article, we look at the personality characteristic "global innovativeness" as a predictor of the adoption of consumer electronics, the latter being termed "actualized innovativeness." Global innovativeness is tested as a predictor for three levels of actualized innovativeness: the domain-specific, cluster-specific, and product-specific levels. Our theoretical propositions are tested using two different surveys, one consisting of adolescent bachelor students (n = 138) and the second consisting of a heterogeneous broad sample (n = 450). The results of these studies show that the higher the level of abstraction of actualized innovativeness, the stronger the effects of global innovativeness. The implications of these findings are discussed. 
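Cohen's kappa, used earlier to check the interrater reliability of the information literacy rubrics, corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch (the rating labels and ratings below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric levels: advanced / developing / beginning.
a = ["adv", "adv", "dev", "beg", "adv", "dev"]
b = ["adv", "dev", "dev", "beg", "adv", "dev"]
print(round(cohens_kappa(a, b), 2))  # 0.74
```

Values near 1 indicate agreement well above chance; the rubric-training effect reported above would show up as kappa rising between sessions.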
The Open Source Software (OSS) development model has emerged as an important competing paradigm to proprietary alternatives; however, insufficient research exists to understand the influence of some OSS project characteristics on the level of activity of the development process. A basic such characteristic is the selection of the project's software license. Drawing upon social movement theory, our study examined the relationship between OSS licenses and project activity. Some OSS licenses include a "copyleft" clause, which requires that if derivative products are to be released, it must be done under the license the original product had. We hypothesize that copylefted licenses, as opposed to non-copylefted licenses, are associated with higher developer membership and coding activity, faster development speed, and longer developer permanence in the project. To test the hypotheses, we used archival data sources of working OSS projects spanning several years of development time. We discuss practical and theoretical implications of the results as well as future research ideas. Given the prevalence of community-driven knowledge sites (CKSs), such as Naver Knowledge In and Yahoo! Answers, it has become important to understand the key drivers of user decision-making processes. In the online environment, building and maintaining user trust belief is a significant challenge to the continuing growth and long-term viability of information systems (IS), due to the open nature of the Internet. This study develops a theoretical framework to examine the role of trust belief in the CKS post-adoption phenomenon. This study also presents an investigation of the key antecedents of trust belief to understand the mechanism of building user trust belief in a CKS. Based on the multidimensional trust formation model, this study posits user satisfaction, perceived reputation, disposition to trust, and information quality as the trust belief antecedents. 
Data collected from 258 users with direct experience of Naver Knowledge In were used to test the research model using structural equation modeling. This study preliminarily confirms the salience of user trust belief in the CKS post-adoption phenomenon. The findings also indicate that the cognitive-based, affective-based, and personality-oriented trust antecedents play a significant role in enhancing trust belief in a CKS. Theoretical and practical implications of the findings are described. The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the ratio of the L1-norm to the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which together form a cloud of points, being the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians, for whom two matrices can be constructed based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm that provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space. With the increasing number of digital documents, the ability to automatically classify those documents both efficiently and accurately is becoming more critical and difficult. One of the major problems in text classification is the high dimensionality of the feature space. We present the ambiguity measure (AM) feature-selection algorithm, which selects the most unambiguous features from the feature set. Unambiguous features are those features whose presence in a document indicates a strong degree of confidence that the document belongs to only one specific category. 
We apply AM feature selection to a naive Bayes text classifier. We show the effectiveness of our approach in outperforming eight existing feature-selection methods, using five benchmark datasets with a statistical significance of at least 95% confidence. The support vector machine (SVM) text classifier is shown to perform consistently better than the naive Bayes text classifier. The drawback, however, is the time complexity of training a model. We further explore the effect of using the AM feature-selection method on an SVM text classifier. Our results indicate that the training time for the SVM algorithm can be reduced by more than 50%, while still improving the accuracy of the text classifier. We show the effectiveness of our approach by demonstrating that it statistically significantly (99% confidence) outperforms eight existing feature-selection methods using four standard benchmark datasets. Designing fair and unbiased metrics to measure the "level of excellence" of a scientist is a very significant task because they have recently also been taken into account when deciding faculty promotions, when allocating funds, and so on. Despite the criticism that such scientometric evaluators are confronted with, they do have their merits, and efforts should be spent to arm them with robustness and resistance to manipulation. This article aims at initiating the study of coterminal citations - their existence and implications - and presents them as a generalization of self-citations and co-citations; it also shows how they can be used to capture any manipulation attempts against scientometric indicators, and finally presents a new index, the f index, that takes into account the coterminal citations. The utility of the new index is validated using the academic production of a number of esteemed computer scientists. The results confirm that the new index can discriminate those individuals whose work penetrates many scientific communities. 
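The ambiguity measure described above rewards terms whose occurrences concentrate in a single category. One plausible formulation, scoring each term by the largest share of its occurrences falling in any one category, is sketched below (the paper's exact estimator may differ; the toy corpus and threshold are invented):

```python
from collections import Counter, defaultdict

def ambiguity_measure(docs):
    """For each term, max over categories of tf(term, cat) / tf(term).

    docs: list of (category, tokens) pairs. A score near 1.0 means the
    term's occurrences concentrate in one category (unambiguous).
    """
    term_cat = defaultdict(Counter)
    for cat, tokens in docs:
        for t in tokens:
            term_cat[t][cat] += 1
    return {t: max(c.values()) / sum(c.values()) for t, c in term_cat.items()}

def select_features(docs, threshold=0.9):
    """Keep only the least ambiguous terms as classifier features."""
    am = ambiguity_measure(docs)
    return {t for t, score in am.items() if score >= threshold}

# Invented two-category corpus; "score" appears in both categories,
# so it is ambiguous and gets filtered out.
docs = [
    ("sports", ["goal", "match", "score"]),
    ("sports", ["match", "team"]),
    ("finance", ["stock", "score", "market"]),
]
print(sorted(select_features(docs)))  # ['goal', 'market', 'match', 'stock', 'team']
```

The selected features would then be fed to the naive Bayes or SVM classifier in place of the full vocabulary.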
External information plays an important role in group decision-making processes, yet research on external information support for Group Support Systems (GSS) has been lacking. In this study, we propose an approach to building a concept space that provides external concept support for GSS users. Built on a Web mining algorithm, the approach can mine a concept space from the Web and retrieve related concepts from it in real time, based on users' comments. We conducted two experiments to evaluate the quality of the proposed approach and the effectiveness of the external concept support it provides. The experimental results indicate that the concept space mined from the Web contained concepts of sufficient quality to stimulate divergent thinking. The results also demonstrate that external concept support in GSS greatly enhanced group productivity for idea-generation tasks. This introductory article explores how the use of information affects the effectiveness of early warning systems. By effectiveness, we refer to the capacity of the system to detect and decide on the existence of a threat. There are two aspects to effectiveness: (a) being able to see the evidence that is indicative of a threat and (b) making the decision, based on the weight of the evidence, to warn that the threat exists. In early warning, information use is encumbered by cues that are fallible and equivocal. Cues that are true indicators of a threat are obscured in a cloud of events generated by chance. Moreover, policy makers face the difficult decision of whether to issue a warning based on the information received. Because the information is rarely complete or conclusive, such decisions have to consider the consequences of failing to warn or giving a false warning. We draw on sociocognitive theories of perception and judgment to analyze these two aspects of early warning: detection accuracy (How well does perception correspond to reality?) 
and decision sensitivity (How much evidence is needed to activate a warning?). Using cognitive continuum theory, we examine how detection accuracy depends on the fit between the information needs profile of the threat and the information use environment of the warning system. Applying signal detection theory, we investigate how decision sensitivity depends on the assessment and balancing of the risks of misses and false alarms inherent in all early warning decision making. The Scholarly Database aims to serve researchers and practitioners interested in the analysis, modelling, and visualization of large-scale data sets. A specific focus of this database is to support macro-evolutionary studies of science and to communicate findings via knowledge-domain visualizations. Currently, the database provides access to about 18 million publications, patents, and grants. About 90% of the publications are available in full text. Except for some datasets with restricted access conditions, the data can be retrieved in raw or pre-processed formats using either a web-based or a relational database client. This paper motivates the need for the database from the perspective of bibliometric/scientometric research. It explains the database design and setup, and reports the temporal, geographical, and topic coverage of the data sets currently served via the database. Planned work and the potential for this database to become a global testbed for information science research are discussed at the end of the paper. We present an application of the h-index in a context which does not involve publications or citations. Rankings of library classification categories using the h-, g- and R-indices are shown to be statistically equivalent. Moreover, these indices seem to have the same discriminating power, as measured by the Gini concentration index. We further present best-fitting Zipf-Mandelbrot functions for the h-distributions of classifications in different libraries. 
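The h- and g-indices applied to library classifications above are defined for any ranked list of counts: h is the largest number such that h items each have at least h occurrences, and g (Egghe's variant) is the largest number such that the top g items together have at least g squared occurrences. A minimal sketch with invented per-category counts:

```python
def h_index(counts):
    """Largest h such that h of the counts are each >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def g_index(counts):
    """Largest g such that the top g counts sum to at least g**2."""
    ranked = sorted(counts, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

# Hypothetical item counts per classification category in one library.
counts = [25, 8, 5, 3, 3, 2, 1, 0]
print(h_index(counts))  # 3  (three categories each hold at least 3 items)
print(g_index(counts))  # 6
```

Because both functions only need a ranked frequency list, they apply equally to citations per paper or, as above, items per classification category.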
Recently, the Russian government has ordered an evaluation and reform of the basic research system. As a consequence, the number of research staff at the Russian Academy of Sciences will be reduced by 20% by 2007. The basis for research evaluation and institute budgeting will be bibliometric indicators. In view of these changes, we look at the Russian publication output and argue that (1) publication output and citedness have to be considered in relation to the level of expenditure on R&D; (2) bibliometric indicators depend strongly on the database used (ISI's databases are biased) and their interpretation can be confusing - better coverage of Russian publications or a Russian Science Citation Index is needed; (3) research results are communicated in more ways than paper publications; and (4) policy makers have misused ISI statistics to demonstrate "a low level" of Russian R&D. Our paper is part of a project designed to trace R&D development in a transition economy and knowledge transfer from basic research to innovation. Results of our project shed light on science policy and the social issues arising from the indiscriminate introduction of quantitative indicators. Bibliometric maps have the potential to become useful tools for science policy issues. The complexity of the structures, however, often makes it very difficult to interpret the results. In this study, we present a case study in which we use bibliometric mapping results to address a high-level science policy issue of research efficiency. By presenting the results in an alternative way, we increased the utility of bibliometric mapping within the science policy context. Moreover, by including additional information on the entities in the landscape, we provide useful input for assessing research potential. This paper investigates, through an analysis of the published literature, the notion held by several people that HIV/AIDS in Africa is unique. 
Using co-word and multidimensional scaling (MDS) analyses of MEDLINE-extracted HIV/AIDS records, this study used five lists of terms to investigate the relatedness of various factors and diseases to HIV/AIDS. The lists consisted of risk factors, sexually transmitted diseases, tropical diseases, opportunistic diseases, and predisposing factors. Data (i.e. words.txt - consisting of keywords/phrases describing the aforementioned factors and diseases; and text.txt - containing HIV/AIDS papers' titles) were analyzed using the TI computer-aided application software developed by Leydesdorff. Results revealed that several factors and diseases that are predominant in Sub-Saharan Africa exhibited a strong and high pattern of co-occurrence with HIV/AIDS, implying close association with the epidemic in the region. Further areas of research, whose results will be used to make conclusive observations and arguments concerning the uniqueness of HIV/AIDS in Sub-Saharan Africa, are recommended. This article reports for the first time the state of science and technology in the African continent on the basis of two scientometric indicators - number of research publications and number of patents awarded. Our analysis shows that Africa produced 68,945 publications over the 2000-2004 period, or 1.8% of the world's publications. In comparison, India produced 2.4% and Latin America 3.5% of the world's research. More detailed analysis reveals that research in Africa is concentrated in just two countries - South Africa and Egypt. These two countries produce just above 50% of the continent's publications, and the top eight countries produce above 80% of the continent's research. Disciplinary analysis reveals that few African countries have the minimum number of scientists required for the functioning of a scientific discipline. Examination of the continent's inventive profile, as manifested in patents, indicates that Africa produces less than one thousandth of the world's inventions. 
Furthermore, 88% of the continent's inventive activity is concentrated in South Africa. The article recommends that African governments should pay particular attention to developing their national research systems. Among classical bibliometric indicators, direct and relative impact measures for countries or other players in science are appealing and standard. Yet, as shown in this article, they may exhibit undesirable statistical properties, or at least ones that pose questions of interpretation in evaluation and benchmarking contexts. In this article, we address two such properties, namely sensitivity to the Yule-Simpson effect and a problem related to convexity. The Yule-Simpson effect can occur for direct impacts and, in a variant form, for relative impacts, causing an apparent incoherence between field values and the aggregate (all-fields) value. For relative impacts, it may result in a severe form of 'out-range' of aggregate values, where a player's relative impact shifts from 'good' to 'bad', or conversely. Out-range and lack of convexity in general are typical of relative impact indicators. Using empirical data, we suggest that, for relative impact measures, 'out-range' due to lack of convexity is not exceptional. The Yule-Simpson effect is less frequent, and occurs especially for small players with particular specialisation profiles. Within the field of the organisation of science, studies of how academics generate patents tend to focus on a single set of either national or international patents. The main aim of this research is to study both national and international patenting in order to understand their differences. We have approached this issue from both a historical and an economic perspective, using data from the Spanish National Research Council (CSIC), the largest PRO in Spain. 
Three periods can be distinguished in the CSIC's history, according to the political context: the dictatorship (1939-1975), the transition to democracy (1976-1986) and democracy (1987 to date). The prevailing legal and institutional framework has marked the way in which patenting by the CSIC has evolved in each of these periods. The current situation is one of strong internationalisation of patenting activity, and in this most recent period we explore trends in some of the economic influences on patenting activity. We conclude that the political and normative context may shape the culture of international patenting at PROs like the CSIC and that increasing technological cooperation has supported this internationalisation. However, foreign partners are very often included in the application in order to extend protection abroad for commercial reasons, so their number may not be a good indicator of inventive activity. We analyze the relation between funding and output using bibliometric methods with field-normalized data. Our approach is to connect individual researcher data on funding from Swedish university databases to data on incoming grants using the specific personal ID number. Data on funding include the person responsible for the grant. All types of research income are considered in the analysis, yielding a project database with a high level of precision. Results show that productivity can be explained by background variables, but that quality of research is more or less unrelated to background variables. National shares of worldwide publications in the Science Citation Index (SCI) have shifted recently. The long-term decline in the U.S. share accelerated in the mid-1990s, and now the EU has joined this decline. Not coincidentally, the shares of some countries have increased sharply, particularly those of China, S. Korea, Taiwan, and Singapore. 
Since the SCI constantly adds new journals, one reason might be that newly added journals were more favorable to these countries. To test this, the database was partitioned into "old journals" (added before 1995) and "new journals" (added afterward). The analysis was done for eight of the 20 fields of science defined by the National Science Indicators CD. In some fields, new journals were indeed much more favorable to the Asian countries; in other fields, however, new journals were actually more favorable to the U.S. In aggregate over the eight fields analyzed, the size of this effect was too small to account for much of the sharp changes in national shares. Furthermore, tests between old and new journals find that differences in most fields are not statistically significant. The results provide evidence that the SCI can be used to accurately track national publication changes over time.

A case study of an emerging research area is presented, dealing with the creation of organic thin film transistors, a subtopic within the general area called "plastic electronics." The purpose of this case study is to determine the structural properties of the citation network that may be characteristic of the emergence, development, and application or demise of a research area. Research on organic thin film transistors is highly interdisciplinary, involving journals and research groups from physics, chemistry, materials science, and engineering. There is a clear path to industrial applications if certain technical problems can be overcome. Despite the applied nature and potential for patentable inventions, scholarly publications from both academia and industry have continued at a rapid pace through 2007. The question is whether the bibliometric indicators point to a decline in this area due to imminent commercialization or to insurmountable technical problems with these materials.
The present study is part of an ongoing project on clustering European research institutions according to their publication profiles. Using hierarchical clustering, eight clusters were found to be the optimum solution for the classification. The aim of the present study is a structural analysis for the evaluation of the research performance of specialised and multidisciplinary institutions. A breakdown by subject fields is used to characterise field-specific peculiarities of individual clusters by bibliometric indicators and to allow comparison within the same cluster and among different clusters. Finally, benchmarks can then be used to study national research performance on the basis of the institutional classification.

In this study we have focused on long-term developments of various types of scientific publishing, and on the field-normalized impact generated by these various types. The types of scientific output distinguished are output resulting from international cooperation, national cooperation, and single-address publications, in which no apparent cooperation is found. A fourth type is distinguished by focusing on first authorship within the international cooperation output. Changes in, especially, the share of a country's output from first-authored international cooperation and the share of single-address publications can be regarded as indicators of strength and/or weakness of a science system.

The Garfield (Impact) Factor characterizes the up-to-date specific contribution of scientific journals to the total impact of the journals in a given field. A new indicator, the Current Contribution Index (CCI), was introduced in order to characterize the relative contribution of journals to recent, relevant knowledge of the corresponding field. The CCI relates the number of citations received by a journal in a given year to the total number of citations obtained by all journals of the corresponding field in that year.
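A minimal sketch of the CCI as just defined, with hypothetical journal names and citation counts (the field total is simply the sum over its journals in the given year):

```python
def current_contribution_index(journal_citations, field_citations):
    """CCI for one journal in a given year: the journal's citations received
    that year divided by the total citations received by all journals of its
    field in the same year."""
    if field_citations <= 0:
        raise ValueError("field must have received at least one citation")
    return journal_citations / field_citations

# Hypothetical citation counts for a three-journal field in one year.
field = {"J1": 1200, "J2": 600, "J3": 200}
total = sum(field.values())
cci = {name: current_contribution_index(c, total) for name, c in field.items()}
```

By construction the CCI values of all journals in a field sum to 1 for that year, which is what makes the indicator a relative contribution measure.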
Mean Garfield Factors and mean Current Contribution Indexes were calculated for some fields and several journals. No significant correlation was found between the Garfield Factor (GF) and the Current Contribution Index (CCI) of journals. The ratios of GF to CCI for the top 10, 20, or 50 per cent of journals, ranked by decreasing GF and CCI, differ strongly by field.

The paper demonstrates visualization techniques that show the collaboration structure of institutions in a specialty and the researchers that function as weak ties among them. Institution names were extracted from the collection of papers and disambiguated using the Derwent Analytics (v1.2) software product. Institutions were clustered into collaboration groups based on their co-occurrence in papers. A crossmap of clustered institutions against research fronts, which were derived using bibliographic coupling analysis, shows the research fronts that specific institutions participate in, their collaborator institutions, and the research fronts in which those collaborations occurred. A crossmap of institutions to author teams, derived from co-authorship analysis, reveals research teams in the specialty and their general institutional affiliation, and further identifies the researchers that function as weak ties and the institutions that they link. The case study reveals that the techniques introduced in this paper can be used to extract a large amount of useful information about institutions participating in a research specialty.

Although many studies have analyzed the "synchronic" correlation of properties between authors and their co-authors, the "diachronic" correlation of properties, i.e., the correlation between their subsequent and precedent activity, has not yet been sufficiently studied using quantitative methods.
This study pays attention not only to productivity but also to importance in the collaboration network as a measure of a researcher's activity, and clarifies whether there is any connection between (i) the researcher's activity subsequent to a collaboration and (ii) the collaborator's precedent activity, aiming at deriving knowledge about the diachronic effect of collaborators.

Bibliometric measures for evaluating research units in the book-oriented humanities and social sciences are underdeveloped relative to those available for journal-oriented science and technology. We therefore present a new measure designed for book-oriented fields: the "libcitation count." This is a count of the libraries holding a given book, as reported in a national or international union catalog. As librarians decide what to acquire for the audiences they serve, they jointly constitute an instrument for gauging the cultural impact of books. Their decisions are informed by knowledge not only of audiences but also of the book world (e.g., the reputations of authors and the prestige of publishers). From libcitation counts, measures can be derived for comparing research units. Here, we imagine a match-up between the departments of history, philosophy, and political science at the University of New South Wales and the University of Sydney in Australia. We chose the 12 books from each department that had the highest libcitation counts in the Libraries Australia union catalog during 2000 to 2006. We present each book's raw libcitation count, its rank within its Library of Congress (LC) class, and its LC-class normalized libcitation score. The latter is patterned on the item-oriented field-normalized citation score used in evaluative bibliometrics. Summary statistics based on these measures allow the departments to be compared for cultural impact. Our work has implications for programs such as Excellence in Research for Australia and the Research Assessment Exercise in the United Kingdom.
It also has implications for data mining in OCLC's WorldCat.

This paper explores the ISI Journal Citation Reports (JCR) bibliographic and subject structures through Library of Congress (LC) and American research library cataloging and classification methodology. The 2006 Science Citation Index JCR Behavioral Sciences subject category journals are used as an example. From the library perspective, the main fault of the JCR bibliographic structure is that the JCR mistakenly identifies journal title segments as journal bibliographic entities, seriously affecting journal rankings by total cites and the impact factor. With respect to the JCR subject structure, the title segment, which constitutes the JCR bibliographic basis, is posited as the best bibliographic entity for the citation measurement of journal subject relationships. Through factor analysis and other methods, the JCR subject categorization of journals is tested against their LC subject headings and classification. The finding is that JCR and library journal subject analyses corroborate, clarify, and correct each other.

Participating in scholarly events (e.g., conferences, workshops, etc.) as an elite-group member such as an organizing committee chair or member, program committee chair or member, session chair, invited speaker, or award winner is beneficial to a researcher's career development. The objective of this study is to investigate whether elite-group membership for scholarly events is representative of scholars' prominence, and which elite group is the most prestigious. We collected data about 15 global (excluding regional) bioinformatics scholarly events held in 2007. We sampled (via stratified random sampling) participants from elite groups in each event.
Then, bibliometric indicators (total citations and h index) of seven elite groups and a non-elite group, consisting of authors who submitted at least one paper to an event but were not included in any elite group, were observed using the Scopus Citation Tracker. The Kruskal-Wallis test was performed to examine the differences among the eight groups. Multiple comparison tests (Dwass, Steel, Critchlow-Fligner) were conducted as follow-up procedures. The experimental results reveal that scholars in an elite group perform better on bibliometric indicators than do others. Among the elite groups, the invited-speaker group performs best with statistical significance, while the other elite-group types are not significantly distinguishable. From this analysis, we confirm that elite-group membership in scholarly events, at least in the field of bioinformatics, can be utilized as an alternative marker of a scholar's prominence, with invited speaker being the most important prominence indicator among the elite groups.

Scoring rules (or score-based rankings or summation-based rankings) form a family of bibliometric rankings of authors such that authors are ranked according to the sum, over all their publications, of some partial scores. Many of these rankings are widely used (e.g., number of publications, weighted or not by the impact factor, by the number of authors, or by the number of citations). We present an axiomatic analysis of the family of all scoring rules and of some particular cases within this family.

In this article, bibliometric indicators based on publications and citations are used for a direct research-performance comparison among different or interdisciplinary categories, the work of individual scientists, and their research teams and institutions.
For example, the basic research performance of some projects at the Korea Institute of Science and Technology (KIST) was assessed using bibliographic factors with the IPQ-Normalized impact factor, to allow comparison with an international level and with other research groups in different or interdisciplinary fields. Some research teams at KIST showed higher-quality publications in terms of the international measures.

In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX-CiM to extract from citations in any given format. Unlike previous methods that are based on models learned from user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, social sciences, etc.). These records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of our proposed method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles of different fields. Results of these experiments exhibit precision and recall levels above 94% for all fields, and perfect extraction for the large majority of citations tested. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase required by that method. Finally, we present a strategy for using the bibliographic data resulting from the extraction process with FLUX-CiM to automatically update and expand the knowledge base of a given domain.
We show that this strategy can be used to achieve good extraction results even if only a very small initial sample of bibliographic records is available for building the knowledge base.

This article analyzes the structure of the hyperlink network formed by the Web pages of Japanese public libraries and its relationship with the network formed by Inter-Library Loans (ILLs), the traditional system of cooperation among public libraries. Our results indicate that (a) the hyperlink network is effectively connected, in the sense that each library is reachable from other libraries by clicking a few links, and (b) the network has many groups of libraries that cooperate with each other. Most of the cliques consist of prefectural libraries or libraries in the same prefecture. The hyperlink network shows two of the same tendencies as the ILL network: (a) the connection among libraries that are geographically close to each other or in the same administrative unit is strong, and (b) prefectural libraries occupy the central position. There are differences between them, however, in terms of the number of ILLs and cliques. We conclude that Japanese public libraries have formed a network on the Web that is strongly affected by traditional cooperation, but that also incorporates some new types of cooperation from the perspective of cliques.

Although the role the Internet plays in globalization has been widely discussed, relatively little is known about the extent to which users in different countries visit the same Web sites. Surprisingly, no prior research in the literature has empirically addressed this topic in a systematic way. Based on the theory of life in the round and related concepts of information behavior, this article reports an attempt to fill the gap by looking at how cultural, geodemographical, and economic factors underpin the extent to which people from different countries visit the same Web sites.
A commonality index to measure the commonality of Web site visiting for the macrolevel, cross-country study is proposed for a large-scale empirical study using online panel data that cover 101 countries. Results from the analyses indicate that, while cyberspace is obviously fractured, Internet users in countries that share a common language, religion, and social norms, that have a similar level of economic development, and that are physically nearer to one another are more likely to visit the same Web sites. The relationship between individual-level information behavior and macrolevel Internet traffic metrics is established; the former helps explain the latter, whereas the latter enriches the former.

Instrument validation is a critical step that researchers should employ in order to ensure the generation of scientifically valid knowledge. Without it, the basis of research findings and the generalization of such findings are threatened. This is especially true in the social sciences, a discipline in which the majority of published articles utilize subjective instruments in the collection of data. Consequently, instrument validation has increasingly become common practice in the social sciences, yet implementation of this practice differs greatly among the social-science disciplines. The assessment of instrument validation undertaken in this study attempts to provide guidance for reviewers, editors, authors, and readers by offering various validation techniques and analyzing the quality of a set of psychometric journal articles. In this research, six attributes of instrument validation are identified as a validation protocol. The Journal of the American Society for Information Science and Technology (JASIST), which is widely recognized as a leading journal in the field of information science, was selected for examination to determine how well a set of research articles ranked in meeting the instrument-validation protocol.
Findings show that while researchers are becoming increasingly attentive to certain validation issues, standards on validation processes and reporting might prove helpful. This paper identifies areas for improvement in the reporting of validity measures and offers ways for researchers to implement them.

In this article, our interest is focused on the automatic learning of Boolean queries in information retrieval systems (IRSs) by means of multi-objective evolutionary algorithms, considering the classic performance criteria of precision and recall. We present a comparative study of four well-known, general-purpose, multi-objective evolutionary algorithms for learning Boolean queries in IRSs. These evolutionary algorithms are the Nondominated Sorting Genetic Algorithm (NSGA-II), the first version of the Strength Pareto Evolutionary Algorithm (SPEA), the second version of SPEA (SPEA2), and the Multi-Objective Genetic Algorithm (MOGA).

Project risks encompass both internal and external factors that are interrelated, influencing one another in a causal way. It is very important to identify those factors and their causal relationships in order to reduce project risk. In the past, most IT companies have evaluated project risk by roughly measuring the related factors while ignoring the important fact that there are complicated causal relationships among them. There is a strong need to develop more effective mechanisms to systematically judge all factors related to project risk and to identify the causal relationships among those factors. To accomplish this research objective, our study adopts a cognitive map (CM)-based mechanism called MACOM (Multi-Agents COgnitive Map), in which the CM is represented by a set of multi-agents, each embedded with basic intelligence to determine its causal relationships with other agents.
The CM has proven especially useful in solving unstructured problems with many variables and causal relationships; however, simply applying a CM to project risk management is not enough, because most causal relationships are hard to identify and measure exactly. To overcome this problem, we have borrowed a multi-agent metaphor in which the CM is represented by a set of multi-agents and project risk is explained through the interaction of those agents. Such an approach provides a new computational capability for resolving complicated decision problems. Using the MACOM framework, we have shown that IS project risk management tasks can be resolved systematically and intelligently, and that in this way IS project managers can be given robust decision support.

This study is designed to validate 10 Power Distance indicators identified from previous research on cultural dimensions, to establish a measurement for determining a country's national political freedom as represented in Web content and interface design. Two coders performed content analysis on 156 college/university Web sites selected from 39 countries. One-way analysis of variance was applied to each of the proposed 10 indicators to detect statistically significant differences among the means of the three freedom groups (free-country group, partly-free-country group, and not-free-country group). The results indicated that 6 of the 10 proposed indicators could be used to measure a country's national political freedom in Web interface design. The seventh indicator, symmetric layout, demonstrated a negative correlation between the freedom level and the Web representation of Power Distance. The last three proposed indicators failed to show any significant differences among the treatment means, and there are no clear trend patterns for the treatment means of the three freedom groups.
By examining national political freedom represented on Web pages, this study not only provides an insight into cultural dimensions and Web interface design but also advances our knowledge in sociological and cultural studies of the Web.

This study analyzes the degree to which the types of question addressed to a digital reference service run by a consortium of public libraries changed between the years 1999 and 2006. The data consist of representative samples of reference questions to a Finnish "Ask a Librarian" digital reference service in the years studied. Questions were classified based on a taxonomy refining earlier major taxonomies. The proportion of ready-reference questions had increased from 33 to 45%, whereas the proportion of subject-based research questions had decreased from 57 to 47%. Among the former, fact-finding questions had especially increased, and among the latter, topical search questions had decreased. These changes in the popularity of question types are likely related to the way people search on the Internet. It is concluded that the Internet has somewhat reduced the traditional role of public libraries in mediated topical searching.

This article describes research designed to assess the interaction between culture and classification. Mounting evidence in cross-cultural psychology has indicated that culture may affect classification, which is an important dimension of global information systems. Data were obtained through three classification tasks, two of which were adapted from recent studies in cross-cultural psychology. Data were collected from 36 participants, 19 from China and 17 from the United States. The results of this research indicate that Chinese participants appear to be more field-dependent, which may be related to a cultural preference for relationships instead of categories.

This paper introduces the generalized Egghe-indices as a new family of scientific impact measures for ranking the output of scientific researchers.
The definition of this family is strongly inspired by Egghe's well-known g-index. The main contribution of the paper is a family of axiomatic characterizations that characterize every generalized Egghe-index in terms of four axioms.

J.E. Hirsch (2005) introduced the h-index to quantify an individual's scientific research output by the largest number h of a scientist's papers that have each received at least h citations. This so-called Hirsch index can easily be modified to take multiple co-authorship into account by counting the papers fractionally according to (the inverse of) the number of authors. I have worked out 26 empirical cases of physicists to illustrate the effect of this modification. Although the correlation between the original and the modified Hirsch index is relatively strong, the arrangement of the datasets differs significantly depending on whether they are ordered according to the values of the original or the modified index.

We analyze the publication output of 119 Chilean ecologists and find strong evidence that self-citations significantly affect the increase of the h-index. Furthermore, we show that the relationship between the increase in the h-index and the proportion of self-citations differs between high- and low-h-index researchers. In particular, our results show that it is in the low-h-index group that self-citations have the greater impact.

In this study, we investigate whether there is a need for the h index and its variants in addition to standard bibliometric measures (SBMs). Results from our recent study (L. Bornmann, R. Mutz, & H.-D. Daniel, 2008) have indicated that there are two types of indices: one type (e.g., the h index) describes the most productive core of a scientist's output and informs about the number of papers in the core; the other type (e.g., the a index) depicts the impact of the papers in the core.
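The h-index discussed above, together with the coauthor-fractional modification described in the Hirsch abstract, can be sketched as follows. The fractional variant here uses effective ranks accumulated as 1/(number of authors), which is one common way to realize "counting the papers fractionally"; the publication record is hypothetical:

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

def fractional_h_index(papers):
    """Coauthor-adjusted variant: each paper contributes 1/(number of
    authors) to the rank; the index is the largest effective rank r such
    that the paper at that rank still has at least r citations."""
    papers = sorted(papers, key=lambda p: p[0], reverse=True)
    r_eff, h_m = 0.0, 0.0
    for citations, n_authors in papers:
        r_eff += 1.0 / n_authors
        if citations >= r_eff:
            h_m = r_eff
    return h_m

# Hypothetical record: (citations, number of authors) per paper.
record = [(10, 2), (8, 2), (5, 1), (4, 4), (3, 1)]
h = h_index([c for c, _ in record])   # 4
h_m = fractional_h_index(record)      # 2.25
```

As the example shows, the fractional index is systematically lower for multi-authored records, which is exactly the rearrangement effect the abstract describes.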
In evaluative bibliometric studies, the two dimensions quantity and quality of output are usually assessed using the SBMs "number of publications" (for the quantity dimension) and "total citation counts" (for the impact dimension). We additionally included the SBMs in the factor analysis. The results of the newly calculated analysis indicate that there is a high intercorrelation between "number of publications" and the indices that load substantially on the factor Quantity of the Productive Core, as well as between "total citation counts" and the indices that load substantially on the factor Impact of the Productive Core. The high-loading indices and SBMs within one performance dimension could be called redundant in empirical application, as high intercorrelations between different indicators are a sign that they measure something similar (or the same). Based on our findings, we propose the use of any pair of indicators (one relating to the number of papers in a researcher's productive core and one relating to the impact of these core papers) as a meaningful approach for comparing scientists.

This brief communication describes the results of a questionnaire examining certain aspects of the Web-based information-seeking practices of university students. The results are contrasted with past work showing that queries to Web search engines can be assigned to one of a series of categories: navigational, informational, and transactional. The survey results suggest that a large group of queries, which in the past would have been classified as informational, have become at least partially navigational. We contend that this change has occurred because of the rise of large Web sites holding particular types of information, such as Wikipedia and the Internet Movie Database.
Although retrieval systems based on probabilistic models will rank the objects (e.g., documents) being retrieved according to the probability of some matching criterion (e.g., relevance), they rarely yield an actual probability, and the scoring function is interpreted to be purely ordinal within a given retrieval task. In this brief communication, it is shown that some scoring functions possess the likelihood property, which means that the scoring function indicates the likelihood of matching when compared to other retrieval tasks. This is potentially more useful than pure ranking, although it cannot be interpreted as an actual probability. The property can be detected by using two modified effectiveness measures: entire precision and entire recall.

An author's citation image is the set of authors with whom the author is cocited. When mapped using standard author cocitation analysis methods based on cited-name co-occurrence counts across the entire citation database, the original context of cocitation (the focal author cocited with others) is lost. The citation image of Conrad Hal Waddington, a developmental biologist and evolutionary theorist, is mapped using both cocitation and tricitation approaches over three successive decades: 1975-1984, 1985-1994, and 1995-2004. All authors are tagged with a subject ID based on a principal components analysis of the cocitation data. The cocitation analyses place Waddington in a general subject context. The tricitation PFNets bring the major themes in articles citing Waddington into clearer focus. The changing scholarly landscape in which Waddington's work is used is demonstrated by changes in the citation-image author set. These changes are associated with a shift from a primary focus on the mechanisms of evolutionary change (Waddington's work on canalization/genetic assimilation) to a resurgence of interest in Waddington's early experimental embryological work.
The latter is linked to the emergence of evolutionary developmental biology, an interdisciplinary research area that examines the role of organismal development in evolutionary change.

For more than 40 years, the Institute for Scientific Information (ISI, now part of Thomson Reuters) produced the only available bibliographic databases from which bibliometricians could compile large-scale bibliometric indicators. ISI's citation indexes, now regrouped under the Web of Science (WoS), were the major sources of bibliometric data until 2004, when Scopus was launched by the publisher Reed Elsevier. For those who perform bibliometric analyses and comparisons of countries or institutions, the existence of these two major databases raises the important question of the comparability and stability of statistics obtained from different data sources. This paper uses macrolevel bibliometric indicators to compare results obtained from the WoS and Scopus. It shows that the correlations between the measures obtained with both databases for the number of papers and the number of citations received by countries, as well as for their ranks, are extremely high (R² ≈ .99). There is also a very high correlation when countries' papers are broken down by field. The paper thus provides evidence that indicators of scientific production and citations at the country level are stable and largely independent of the database.

The launching of Scopus and Google Scholar, and methodological developments in social-network analysis, have made many more indicators for evaluating journals available than the traditional impact factor, cited half-life, and immediacy index of the ISI. In this study, these new indicators are compared with one another and with the older ones. Do the various indicators measure new dimensions of the citation networks, or are they highly correlated among themselves? Are they robust and relatively stable over time?
Two main dimensions are distinguished, size and impact, which together shape influence. The h-index combines the two dimensions and can also be considered an indicator of reach (like Indegree). PageRank is mainly an indicator of size, but has important interactions with centrality measures. The SCImago Journal Rank (SJR) indicator provides an alternative to the journal impact factor, but its computation is less straightforward.

According to the bibliographical data included in the Web of Science, Scopus, Chemical Abstracts, and other specialized information services covering the period 1900-1950, the first publications in mainstream journals by Mexican researchers appeared only in the first decades of the 20th century. Contrary to expectations, we find that the academic community was not the protagonist in the early stages of Mexican scientific practices; rather, there was a strong contribution from researchers associated with the public-health sector and the chemical and mining industries. We were able to identify in this half century four different modes of scientific production: amateur, institutional, academic, and industrial, which in turn correspond to distinct stages in the evolution of Mexican scientific production. We characterize these modes of production with a variety of indicators: publication and citation patterns, author output, journal and subject categories, institutional collaborations, and geographical distribution.

Though practitioners have seen discussions and debates surrounding the "Web 2.0" concept for the last few years, we know little of Web users' heterogeneity in the usage of Web 2.0 applications, let alone the factors associated with such heterogeneity. In this article, we propose that a Web user's degree of Web 2.0-ness be measured by the weighted average of the degrees of Web 2.0-ness of the Web sites that he or she has visited.
A Web site's degree of Web 2.0-ness in turn is evaluated through a series of binary criteria as to whether the site accommodates popular Web 2.0 applications. Utilizing clickstream data from an online panel coupled with expert scoring for the empirical analysis, we find that a Web user's degree of Web 2.0-ness is positively associated with his or her behavioral volume (measured by the number of page views), behavioral speed (measured by the duration of each page view), and behavioral concentration (measured by the Gini coefficient of page views the user made across Web sites). Furthermore, Web users who are younger and male are found to have a higher degree of Web 2.0-ness. Query reformulation is a key user behavior during Web search. Our research goal is to develop predictive models of query reformulation during Web searching. This article reports results from a study in which we automatically classified the query-reformulation patterns for 964,780 Web searching sessions, composed of 1,523,072 queries, to predict the next query reformulation. We employed an n-gram modeling approach to describe the probability of users transitioning from one query-reformulation state to another to predict their next state. We developed first-, second-, third-, and fourth-order models and evaluated each model for accuracy of prediction, coverage of the dataset, and complexity of the possible pattern set. The results show that Reformulation and Assistance account for approximately 45% of all query reformulations; furthermore, the results demonstrate that the first- and second-order models provide the best predictability, between 28 and 40% overall and higher than 70% for some patterns. Implications are that the n-gram approach can be used for improving searching systems and searching assistance. 
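The n-gram modeling approach can be illustrated with a minimal first-order (bigram) sketch: count state-to-state transitions in logged sessions, then predict the most frequent successor. The state labels and sessions below are invented for illustration, not drawn from the study's data:

```python
# Minimal sketch of a first-order n-gram model over query-reformulation
# states: train transition counts, then predict the likeliest next state.
from collections import Counter, defaultdict

def train_bigram(sessions):
    counts = defaultdict(Counter)
    for states in sessions:
        for prev, nxt in zip(states, states[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, state):
    if not counts[state]:
        return None  # state never observed as a predecessor
    return counts[state].most_common(1)[0][0]

sessions = [  # hypothetical session logs
    ["New", "Reformulation", "Assistance"],
    ["New", "Reformulation", "Reformulation"],
    ["New", "Specialization", "Reformulation"],
]
model = train_bigram(sessions)
print(predict_next(model, "New"))  # "Reformulation" follows "New" most often
```

Higher-order models follow the same pattern, keying the counts on tuples of the last two, three, or four states instead of a single state.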
The goal of this study is to understand whether providing a search intermediary familiar with a problem domain and its topical structure would support a user's Web searching tasks, especially complicated tasks with multifaceted topics, and whether the order of searching tasks or system usage influences their successful completion. This study investigates the effect of two factors, the interaction mode and the display layout, on the three main measures of the user's Web searching behaviors: effectiveness, efficiency, and usability. Two interaction modes are compared, mediation via a domain-specific document collection versus nonmediated search, and two display layouts, a combination of browsing-supporting hierarchic display and ranked list of results versus the simple linear list of search results. The results are analyzed from the perspective of Flow theory, as well as by task order and system usage order. The findings of this study contribute to a better understanding of how the mediation system and/or the combined display support a Web information user. Libraries, private and public, offer valuable resources to library patrons. As of today, the only way to locate information archived exclusively in libraries is through their catalogs. Library patrons, however, often find it difficult to formulate a proper query, which requires using specific keywords assigned to different fields of desired library catalog records, to obtain relevant results. These improperly formulated queries often yield irrelevant results or no results at all. This negative experience in dealing with existing library systems turns library patrons away from directly querying library catalogs; instead, they rely on Web search engines to perform their searches first, and upon obtaining the initial information (e.g., titles, subject headings, or authors) on the desired library materials, they query library catalogs. 
This searching strategy is evidence of the failure of today's library systems. In solving this problem, we propose an enhanced library system, which allows partial similarity matching of (a) tags defined by ordinary users at a folksonomy site that describe the content of books and (b) unrestricted keywords specified by an ordinary library patron in a query to search for relevant library catalog records. The proposed library system allows patrons to post a query Q using commonly used words and ranks the retrieved results according to their degrees of resemblance with Q, while maintaining a query processing time comparable with that achieved by current library search engines. When users have poorly defined or complex goals, search interfaces that offer only keyword-searching facilities provide inadequate support to help them reach their information-seeking objectives. The emergence of interfaces with more advanced capabilities, such as faceted browsing and result clustering, can go some way toward addressing such problems. The evaluation of these interfaces, however, is challenging because they generally offer diverse and versatile search environments that introduce overwhelming numbers of independent variables to user studies; choosing the interface object as the only independent variable in a study would reveal very little about why one design outperforms another. Nonetheless, if we could effectively compare these interfaces, then we would have a way to determine which was best for a given scenario and begin to learn why. In this article, we present a formative inspection framework for the evaluation of advanced search interfaces through the quantification of the strengths and weaknesses of the interfaces in supporting user tactics and varying user conditions. This framework combines established models of users and their needs and behaviors to achieve this. 
The framework is applied to evaluate three search interfaces and demonstrates the potential value of this approach to interactive information retrieval evaluation. Many researchers in medical and life sciences commonly use the PubMed online search engine (http://www.pubmed.gov) to access the MEDLINE bibliographic database. The researchers' strategies were investigated as a function of their knowledge of the content area. Sixteen life science researchers with no experience in neuroscience and 16 neuroscience researchers of matched professional experience performed five bibliographic search tasks about neuroscience topics. Objective measures and concomitant verbal protocols were used to assess behavior and performance. Whatever their knowledge of PubMed, neuroscientists could find adequate references within the allotted time period. Despite their lack of knowledge in neuroscience, life scientists could select adequate references with the same efficiency. However, differences were observed in the way neuroscientists and life scientists proceeded. For instance, life scientists took more time to read the task instructions and opened more abstracts while selecting their answers. These data suggest that regular use of online databases combined with graduate-level expertise in a broad scientific field like biology can compensate for the absence of knowledge in the specific domain in which references are sought. The large inter-individual variability in performance within both groups implies that beyond domain knowledge, individual cognitive abilities are the main determinants of bibliographic search performance. In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. 
The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective in achieving high similarity scores for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called "araSearch". araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically. Biomedical research is critical to biodefense, which is drawing increasing attention from governments globally as well as from various research communities. The U.S. government has been closely monitoring and regulating biomedical research activities, particularly those studying or involving bioterrorism agents or diseases. Effective surveillance requires comprehensive understanding of extant biomedical research and timely detection of new developments or emerging trends. The rapid knowledge expansion, technical breakthroughs, and spiraling collaboration networks demand greater support for literature search and sharing, which cannot be effectively supported by conventional literature search mechanisms or systems. In this study, we propose an approach that integrates advanced techniques for content analysis, network analysis, and information visualization. 
We design and implement Arizona Literature Mapper, a Web-based portal that allows users to gain timely, comprehensive understanding of bioterrorism research, including leading scientists, research groups, and institutions, as well as insights about current mainstream interests or emerging trends. We conduct two user studies to evaluate Arizona Literature Mapper and include a well-known system for benchmarking purposes. According to our results, Arizona Literature Mapper is significantly more effective for supporting users' search of bioterrorism publications than PubMed. Users consider Arizona Literature Mapper more useful and easier to use than PubMed. Users are also more satisfied with Arizona Literature Mapper and show stronger intentions to use it in the future. Assessments of Arizona Literature Mapper's analysis functions are also positive, as our subjects consider them useful, easy to use, and satisfactory. Our results have important implications that are also discussed in the article. We present CopeOpi, an opinion-analysis system, which extracts from the Web opinions about specific targets, summarizes the polarity and strength of these opinions, and tracks opinion variations over time. Objects that yield similar opinion tendencies over a certain time period may be correlated due to the latent causal events. CopeOpi discovers relationships among objects based on their opinion-tracking plots and collocations. Event bursts are detected from the tracking plots, and the strength of opinion relationships is determined by the coverage of these plots. To evaluate opinion mining, we use the NTCIR corpus annotated with opinion information at sentence and document levels. CopeOpi achieves sentence- and document-level F-measures of 62% and 74%, respectively. For relationship discovery, we collected 1.3M economics-related documents from 93 Web sources over 22 months, and analyzed collocation-based, opinion-based, and hybrid models. 
We consider company pairs that demonstrate similar stock-price variations to be correlated, and selected these as the gold standard for evaluation. Results show that opinion-based and collocation-based models complement each other, and that integrated models perform the best. The top 25, 50, and 100 pairs discovered achieve precision rates of 1, 0.92, and 0.79, respectively. A network approach was used to determine the overall supportive communication patterns constructed within the PTT psychosis support group in Taiwan, the largest bulletin board system in the Chinese-speaking world. The full sequences of supportive interactions were observed over the period from February 2004 to July 2006. The results indicated that the most exchanged support types were information and network links. All types of supportive communication networks were relatively sparse, yet small groups of cliques with different provision of support types formed within the psychosis group. Most of the online supportive interactions were exchanged at dyadic and triadic levels. The overall supportive network was highly centralized. The overall findings and their implications for future studies are discussed. This study investigates Ted Nelson's works and the influence of his hypertext concept through citation analysis, including citation counting and the characteristics of citing articles in terms of language, document type, citing year, discipline, and citation content. The selection of Nelson's works was based on searching Library Literature & Information Science, Library and Information Science Abstracts, and the Google and Yahoo search engines. The citation data were compiled from the database of Web of Science. The results of the study reveal that hypertext has had a great direct impact on information retrieval and the World Wide Web; accordingly, the concept has had a profound influence on the information, library, and computer science disciplines. 
Moreover, the influence of Nelson's works extends variously to other disciplines, especially education, literature, business and economics, engineering, sociology, psychology, etc. The citation context analysis of citing articles in information and library science reveals that (1) definition, orientation, and general introduction of hypertext; (2) the relation of Vannevar Bush and Ted Nelson in terms of hypertext; (3) Nelson's Xanadu system and its hypertext component; and (4) the application of hypertext in information science and library science are the four most common citing purposes. This paper explores a methodology for delineating scientific subfields by combining the use of (specialist) journal categories from Thomson Scientific's Web of Science (WoS) and reference analysis. In a first step it selects all articles in journals included in a particular WoS journal category covering a subfield. These journals are labelled as a subfield's specialist journals. In a second step, this set of papers is expanded with papers published in other, additional journals and citing a subfield's specialist journals with a frequency exceeding a certain citation threshold. Data are presented for two medical subfields: Oncology and Cardiac & Cardiovascular System. A validation based on findings from earlier studies, from an analysis of MeSH descriptors from MEDLINE, and on expert opinion provides evidence that the proposed methodology has a high precision, and that expansion substantially enhanced the recall, not merely in terms of the number of retrieved papers, but also in terms of the number of research topics covered. The paper also examines how a bibliometric ranking of countries and universities based on the citation impact of their papers published in a subfield's specialist journals compares to a ranking based on the impact of their articles in additional journals. 
The rather weak correlations obtained, especially at the level of universities, underline the conclusion from earlier studies that an assessment of research groups or universities in a scientific subfield that takes into account solely papers published in a subfield's specialist journals is unsatisfactory. In Latin America, interactive science centres and museums are key institutions for science communication. In order to map their relationship over the Internet, a Web co-link analysis was applied to 18 websites of science centres and museums affiliated to the Network for the Popularization of Science and Technology in Latin America and the Caribbean - RedPOP. Clustering analysis, multidimensional scaling (MDS) and an analysis of all pages with links to at least two websites were performed. Results showed that language barriers played a prominent role in clustering, with external recognition by the target public representing a secondary issue. This study applies Prathap's approach to successive h-indices in order to measure the influence of researcher staff on institutional impact. The twelve most productive Cuban institutions related to the study of the human brain are studied. The Hirsch index was used to measure the impact of the institutional scientific output, using the g-index and R-index as complementary indicators. Prathap's approach to successive h-indices, based on the author-institution hierarchy, is used to determine the institutional impact through the performance of the researcher staff. The combination of different Hirsch-type indices for institutional evaluation is illustrated. The literature dedicated to the analysis of the difference in research productivity between the sexes tends to agree in indicating better performance for men. 
Through a bibliometric examination of the entire population of research personnel working in the scientific-technological disciplines of the Italian university system, this study confirms the presence of significant differences in productivity between men and women. The differences are, however, smaller than reported in a large part of the literature, confirming an ongoing tendency towards decline, and are more noticeable for quantitative performance indicators than for other indicators. The gap between the sexes shows significant sectorial differences. In spite of the generally better performance of men, there are scientific sectors in which the performance of women does not prove to be inferior. Impact factors for 20 journals ranked first by Journal Citation Reports (JCR) were compared with the same indicator calculated on the basis of citation data obtained from the Scopus database. A significant discrepancy was observed: although results differed from title to title, Scopus in general found more citations than are listed in JCR. This also affected the ranking of the journals. More thorough examination of two selected titles proved that the divergence resulted mainly from differences in the coverage of the two products, although other important factors also play their part. This paper briefly reviews the knowledge-generation process and explores to what degree technical and scientific knowledge from prior art anticipates the novelty or the inventive step of an invention. Inventions are novel if they have not been publicly described before, and they are inventive if the technical solution was non-obvious to a skilled person in the field. We employ a novel approach of patent citation analysis to investigate this phenomenon. Since in this context common approaches of such citation analysis are biased (usually, citations are neither exhaustive nor relevant in their entirety), we focus on examination reports of European patent applications and the references given therein. 
Our findings reveal that particularly technical knowledge comprised in patents serves as a source of novelty, while scientific knowledge frequently stems from multiple scientific papers and accounts for the inventive step. In addition, it is found that in many cases scientific knowledge is of commercial relevance and therefore constitutes more than general background information that aids the technical knowledge generation process. This paper sets out to explore the patterns of technological change and knowledge spillover in the field of flat panel display (FPD) technology, along with the catching-up behavior of latecomers, through the analysis of US patents and patent citations between 1976 and 2005. Our results show that: (i) the catching-up by FPD technology latecomers began at the transition stage (1987-1996) when the dominant design became established in areas with high 'revealed technology advantage' (RTA); (ii) there is no apparent localization of knowledge spillover amongst FPD technology latecomers; instead, higher citation frequencies of forerunners' patents were found in latecomers' FPD patents during the transition (1987-1996) and post-dominant design (1997-2005) stages; and (iii) a few extraordinary peaks were found in the citation frequency of forerunners' patents at long citation lags in latecomers' FPD patents, particularly during the transition stage (1987-1996), indicative of the knowledge threshold which latecomers need to cross in order to catch up with forerunners. At present China is challenging the leading sciento-economic powers and evolving into one of the world's largest potentials in science and technology. Jointly with other emerging economies, China has already changed the balance of power among the formerly leading nations as measured by scientific production. In the present paper, the evolution of China's publication activity and citation impact in the social sciences is studied for the period 1997-2006. 
Besides the comparative analysis of trends in publication and citation patterns and of national publication profiles, an attempt is made to interpret the results in both the regional and global context. The paper investigates three aspects of patent value - technological value, direct economic value, and indirect economic value. The paper suggests that we measure the technological value of a patent by looking at its number of citations, direct economic value by looking at its licensing and income from royalties, and indirect economic value by looking at its life (i.e., duration). For the research, the author's two previous studies are deeply explored. It is found that these three aspects of patent value are positively correlated with one another. In addition, their domains overlap and interrelate. Research collaboration is the one variable found to have a significant effect on all three aspects. The field effect of electronics positively affects technological and indirect economic value, whereas research team size negatively affects technological and indirect economic value. This paper examines the genesis of journal impact measures and how their evolution culminated in the journal impact factor (JIF) produced by the Institute for Scientific Information. The paper shows how the various building blocks of the dominant JIF (published in the Journal Citation Report - JCR) came into being. The paper argues that these building blocks were all constructed fairly arbitrarily or for different purposes than those that govern the contemporary use of the JIF. The result is a faulty method, widely open to manipulation by journal editors and to misuse by uncritical parties. The discussion examines some solutions offered to the bibliometric and scientific communities considering the wide use of this indicator at present. This paper analyzes the distribution of some characteristics of computer scientists in Brazil according to region and gender. 
A computer scientist is defined as a faculty member of a graduate-level computer science department. Under this definition, there were 886 computer scientists in Brazil in November 2006. The titles of scientific articles have a special significance. We examined nearly 20 million scientific articles and recorded the development of articles with a question mark at the end of their titles over the last 40 years. Our study was confined to the disciplines of physics, life sciences and medicine, where we found a significant increase, ranging from 50% to more than 200%, in the number of articles with question-mark titles. We looked at the principal functions and structure of the titles of scientific papers, and we assume that marketing aspects are one of the decisive factors behind the growing usage of question-mark titles in scientific articles. Clustering is applied to web co-outlink analysis to represent the heterogeneous nature of the World Wide Web in terms of the "triple helix" model (university-industry-government). An initial categorization is based on families of websites, which is then matched with Spanish institutions from diverse sectors represented on the Web, to uncover cognitive structures and related subgroups with common interests and confirm the junction of sectors of the "triple helix" model. We may conclude that the clustering method applied to web co-outlink analysis works when fully institutionalized organizations are studied, to make their interconnections manifest. While ISSI was founded in 1993, Scientometrics and Bibliometrics are now at least half a century old. Indeed, the field can be traced to quantitative studies in the early 20th century. In the 1930s, it evolved into the "science of science." The publication of J.D. Bernal's Social Function of Science in 1939 was a key transition point, but the field lay dormant until after World War II, when D.J.D. Price's books Science Since Babylon and Little Science, Big Science were published in 1961 and 1963. 
His role as the "Father of Scientometrics" is made clearly evident by using the HistCite software to visualize his impact as well as the subsequent impact of the journal Scientometrics on the growth of the field. Scientometrics owes its name to V.V. Nalimov, the author of Naukometriya, and to Tibor Braun, who adapted the neologism for the journal. The primordial paper on citation indexing by Garfield, which appeared in Science in 1955, became a bridge between Bernal and Price. The timeline for the evolution of Scientometrics is demonstrated by a HistCite tabulation of the ranked citation index of the 100,000 references cited in the 3000 papers citing Price. (C) 2009 Elsevier Ltd. All rights reserved. We propose an explanatory and computational theory of transformative discoveries in science. The theory is derived from a recurring theme found in a diverse range of scientific change, scientific discovery, and knowledge diffusion theories in philosophy of science, sociology of science, social network analysis, and information science. The theory extends the concept of structural holes from social networks to a broader range of associative networks found in science studies, especially including networks that reflect underlying intellectual structures such as co-citation networks and collaboration networks. The central premise is that connecting otherwise disparate patches of knowledge is a valuable mechanism of creative thinking in general and transformative scientific discovery in particular. In addition, the premise consistently explains the value of connecting people from different disciplinary specialties. The theory not only explains the nature of transformative discoveries in terms of the brokerage mechanism but also characterizes the subsequent diffusion process as optimal information foraging in a problem space. 
Complementary to epidemiological models of diffusion, foraging-based conceptualizations offer a unified framework for arriving at insightful discoveries and optimizing subsequent pathways of search in a problem space. Structural and temporal properties of potentially high-impact scientific discoveries are derived from the theory to characterize the emergence and evolution of intellectual networks of a field. Two Nobel Prize winning discoveries, the discovery of Helicobacter pylori and gene targeting techniques, and a discovery in string theory demonstrated such properties. Connections to and differences from existing approaches are discussed. The primary value of the theory is that it provides not only a computational model of intellectual growth, but also concrete and constructive explanations of where one may find insightful inspirations for transformative scientific discoveries. We analyze the advent and development of eight scientific fields from their inception to maturity and map the evolution of their networks of collaboration over time, measured in terms of co-authorship of scientific papers. We show that as a field develops it undergoes a topological transition in its collaboration structure from a small disconnected graph to a much larger network in which a giant connected component of collaboration appears. As a result, the number of edges and nodes in the largest component undergoes a transition from a small fraction of the total to a majority of all occurrences. These results relate to many qualitative observations of the evolution of technology and discussions of the "structure of scientific revolutions". We analyze this qualitative change in network topology in terms of several quantitative graph theoretical measures, such as density, diameter, and relative size of the network's largest component. 
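One of the measures just named, the relative size of the network's largest connected component, can be sketched with a small union-find routine over co-authorship edges. The toy networks below are illustrative only, not the authors' eight-field data:

```python
# Illustrative sketch: fraction of nodes in the largest connected
# component of a co-authorship graph, computed via union-find.
from collections import Counter

def largest_component_fraction(n, edges):
    """n: number of authors (nodes 0..n-1); edges: co-authorship pairs."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components
    sizes = Counter(find(i) for i in range(n))
    return max(sizes.values()) / n

# Sparse early field: isolated collaborating pairs, small largest component.
print(largest_component_fraction(6, [(0, 1), (2, 3)]))
# Maturing field: extra links merge the pairs into a giant component.
print(largest_component_fraction(6, [(0, 1), (2, 3), (1, 2), (3, 4)]))
```

Tracking this fraction as edges accumulate over time exhibits the kind of topological transition the abstract describes: it jumps from a small value to a majority of all nodes once the giant component forms.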
To analyze examples of scientific discovery we built databases of scientific publications based on keyword and citation searches, for eight fields, spanning experimental and theoretical science, across areas as diverse as physics, biomedical sciences, and materials science. Each of the databases was vetted by field experts and is the result of a bibliometric search constructed to maximize coverage, while minimizing the occurrence of spurious records. In this way we built databases of publications and authors for superstring theory, cosmic strings and other topological defects, cosmological inflation, carbon nanotubes, quantum computing and computation, prions and scrapie, and H5N1 influenza. We also built a database for a classical example of "pathological" science, namely cold fusion. All these fields also vary in size and in their temporal patterns of development, with some showing explosive growth from an original identifiable discovery (e.g. carbon nanotubes) while others are characterized by a slow process of development (e.g. quantum computers and computation). We show that regardless of the detailed nature of their developmental paths, the process of scientific discovery and the rearrangement of the collaboration structure of emergent fields is characterized by a number of universal features, suggesting that the process of discovery and initial formation of a scientific field, characterized by the moments of discovery, invention and subsequent transition into "normal science" may be understood in general terms, as a process of cognitive and social unification out of many initially separate efforts. Pathological fields, seemingly, never undergo this transition, despite hundreds of publications and the involvement of many authors. We propose a research program to analyse spatial aspects of the science system. 
First, we provide a review of scientometric studies that already explicitly take the spatial dimension into account. The review includes studies on (i) the spatial distribution of research and citations, (ii) the existence of spatial biases in collaboration, citations and mobility, and (iii) the citation impact of national versus international collaborations. Then, we address a number of methodological issues in dealing with space in scientometrics. Finally, to integrate spatial and non-spatial approaches, we propose an analytical framework based on the concept of proximity. A proximity approach allows for combining hypotheses from different theoretical perspectives into a single framework. Visual depiction of the structure and evolution of science has been proposed as a key strategy for dealing with the large, complex, and increasingly interdisciplinary records of scientific communication. While every such visualization assumes the existence of spatial structures within the system of science, new methods and tools are rarely linked to thorough reflection on the underlying spatial concepts. Meanwhile, geographic information science has adopted a view of geographic space as conceptualized through the duality of discrete objects and continuous fields. This paper argues that conceptualization of science has been dominated by a view of its constituent elements (e.g., authors, articles, journals, disciplines) as discrete objects. It is proposed that, like in geographic information science, alternative concepts could be used for the same phenomenon. For example, one could view an author as either a discrete object at a specific location or as a continuous field occupying all of a discipline. It is further proposed that this duality of spatial concepts can extend to the methods by which low-dimensional geometric models of high-dimensional scientific spaces are created and used. 
This can result in new methods revealing different kinds of insights. This is demonstrated by a juxtaposition of two visualizations of an author's intellectual evolution on the basis of either a discrete or continuous conceptualization. e-Research is a rapidly growing research area, both in terms of publications and in terms of funding. In this article we argue that it is necessary to reconceptualize the ways in which we seek to measure and understand e-Research by developing a sociology of knowledge based on our understanding of how science has been transformed historically and shifted into online forms. Next, we report data which allow the examination of e-Research through a variety of traces in order to begin to understand how knowledge in the realm of e-Research has been and is being constructed. These data indicate that e-Research has had a variable impact in different fields of research. We argue that only an overall account of the scale and scope of e-Research within and between different fields makes it possible to identify the organizational coherence and diffuseness of e-Research in terms of its socio-technical networks, and thus to identify the contributions of e-Research to various research fronts in the online production of knowledge. Discursive knowledge emerges as codification in flows of communication. The flows of communication are constrained and enabled by networks of communications as their historical manifestations at each moment of time. New publications modify the existing networks by changing the distributions of attributes and relations in document sets, while the networks are self-referentially updated along trajectories. Codification operates reflexively: the network structures are reconstructed from the perspective of hindsight. Codification along different axes differentiates discursive knowledge into specialties. 
These intellectual control structures are constructed bottom-up, but feed top-down back upon the production of new knowledge. However, the forward dynamics of diffusion in the development of the communication networks along trajectories differs from the feedback mechanisms of control. Analysis of the development of scientific communication in terms of evolving scientific literatures provides us with a model which makes these evolutionary processes amenable to measurement. In this study the number of "informal" citations (i.e. those mentioning only author names or their initials instead of the complete references) in comparison to the "formal" (full reference based) citations is analyzed using some pioneers of chemistry and physics as examples. The data reveal that the formal citations often measure only a small fraction of the overall impact of seminal publications. Furthermore, informal citations are mainly given instead of (and not in addition to) formal citations. As a major consequence, the overall impact of pioneering articles and researchers cannot be entirely determined by merely counting the full reference based citations. Nanotechnology has been intensively investigated by bibliometric methods due to its technological importance and expected impacts on economic activity. However, there is less focus on nanobiotechnology, which is an emerging research domain in nanotechnology. In this paper, we study the current status of the latter, with our primary focus being to reveal the structure and research domains in nanobiotechnology. We also examine country and institutional performance in nanobiotechnology. It emerged that nanostructures, drug delivery and biomedical applications, bio-imaging, and carbon nanotubes and biosensors are the major research domains, while the USA is the leading country, and China has also made a substantial contribution. 
Most institutions having a major impact in the area of nanobiotechnology are located in the USA. In this study, we aim to evaluate the global scientific production of stem cell research for the past 16 years, provide insights into the characteristics of stem cell research activities, and identify patterns, tendencies, or regularities that may exist in the papers. Data are based on the online version of SCI, Web of Science, from 1991 to 2006. Articles referring to stem cells were assessed in many aspects, including exponential fitting of the trend of publication outputs during 1991-2006, the distribution of source titles, and author keyword and KeyWords Plus analysis. Based on the exponential fitting of the yearly publications of the last decade, it can also be calculated that, in 2011, the number of scientific papers on the topic of stem cells will be twice the number of publications in 2006. Synthetically analyzing the three kinds of keywords, it can be concluded that the application of stem cell transplantation technology to human disease therapy, especially research related to "embryonic stem cells" and "mesenchymal stem cells", is the orientation of stem cell research in the 21st century. This new bibliometric method can help relevant researchers grasp the panorama of global stem cell research and establish further research directions. Two types of series of h-indices for journals published in the field of Horticulture during the period 1998-2007 are calculated. Type I h-indices are based on yearly data, while type II h-indices use cumulative data. These h-indices are also considered in a form normalised with respect to the number of published articles. It is observed that type I h-indices, normalised or not, decrease linearly over a period of ten years. The type II series, however, is not linear in nature: it exhibits a partly concave shape. 
This proves that the journals (in Horticulture) do not exhibit a linear increase in h-index, as argued by Hirsch in the case of life-time achievements of scientists. In the second part of the paper, an attempt is made to study the relative visibility of a journal and its change over time, based on h-indices of journals. It is shown that: (i) the h-index over the complete period 1998-2007 of the journal Theoretical & Applied Genetics (h = 62) is much higher than that of all other journals in the field; (ii) the relation between the number of publications and the type II h-index for the whole period is not an exact power law (as it would have to be if the Egghe-Rousseau model were applicable); and (iii) in order to study the dynamic aspects of journal visibility, a field-relative normalised h-ratio is defined to monitor systematic changes in the field of Horticulture. Except for two journals, the Pearson correlation coefficient for yearly values of this field-relative normalised h-ratio indicates that there is no systematic change of the performance of the journals with respect to the field as a whole. This study aims to provide archiving research trends from the perspective of the field of library and information science using a profiling analysis method. The LISA database has been selected as the representative database in the library and information science field, and articles have been searched via the keyword 'archiv*'. The analysis methods used in this study were the journal profiling method and the descriptor profiling method. The descriptor profiling method presents descriptors as a bag of words. That is, it represents descriptors according to the word sets which are included in the documents to which those descriptors are assigned. As a result of the journal analysis, six representative journals which are closely related to archiv* have been identified. 
The six journals were Archivaria, Advanced Technology Libraries, Journal of the Society of Archivists, American Archivist, Archifacts, and Records Management Bulletin. The results of the descriptor analysis show that the most comprehensive and core subject was digital libraries, and the most comprehensive and core object was electronic media. Another result of the detailed analysis shows that the outstanding objects were publications, special collections/sound, cultural heritage, television, image/photographs, internet/bibliographic data, and DB/newspapers. On the other hand, the outstanding subjects were archives, national libraries, universities, libraries and companies. This paper aimed to examine the reliability of co-citation clustering analysis in representing the research history of a subject by comparing the results from co-citation clustering analysis with a review written by authorities. Firstly, the treatment of traumatic spinal cord injury was chosen as the subject to be investigated; the source articles and their references were downloaded from the Science Citation Index CD-ROM for the period between 1992 and 2002. Then, the highly cited papers were arranged chronologically and clustered with the method of co-citation clustering. After mapping the time line visualization, the history and structure of the treatment of spinal cord injury were presented clearly. Finally, the results and the review were compared according to the time period, and then the recall and the precision were calculated. The recall was 37.5%, and the precision was 54.5%. The research history of traumatic spinal cord injury treatment analyzed by co-citation clustering was nearly consistent with the authoritative review, although some clusters had shorter periods than those summarized by professionals. 
This paper concluded that co-citation clustering analysis is a useful method for representing the research history of a subject, especially for information researchers who do not have enough professional knowledge. Its demerit of low recall could be offset by combining this method with other analytic techniques. The present article contributes to the current methodological debate concerning author co-citation analyses (ACA). The study compares two different units of analysis, i.e. first- versus inclusive all-author co-citation counting, as well as two different matrix generation approaches, i.e. a conventional multivariate and the so-called Drexel approach, in order to investigate their influence upon mapping results. The aim of the present study is therefore to provide more methodological awareness and empirical evidence concerning author co-citation studies. The study is based on structured XML documents extracted from the IEEE collection. These data allow the construction of ad-hoc citation indexes, which enables us to carry out the hitherto largest all-author co-citation study. Four ACA are made, combining the different units of analysis with the different matrix generation approaches. The results are evaluated quantitatively by means of multidimensional scaling, factor analysis, Procrustes and Mantel statistics. The results show that the inclusion of all cited authors can provide a better fit of data in two-dimensional mappings based on MDS, and that inclusive all-author co-citation counting may lead to stronger groupings in the maps. Further, the two matrix generation approaches produce maps that have some resemblances, but also many differences at the more detailed levels. The Drexel approach produces results that have noticeably lower stress values and are more concentrated into groupings. Finally, the study also demonstrates the importance of sparse matrices and their potential problems in connection with factor analysis. 
We can confirm that inclusive all-author co-citation counting produces more coherent groupings of authors, whereas the present study cannot clearly confirm previous findings that first-author co-citation analysis identifies more specialties, though some vague indication is given. Most crucially, strong evidence is given for the determining effect that matrix generation approaches have on the mapping of author co-citation data and thus on the interpretation of such maps. Evidence is provided for the seeming advantages of the Drexel approach. In this paper we carry out an empirical analysis to address some questions concerning the production and quality of technology in environmental sectors. The methodology involves patents as a measure of the generation of new knowledge, and patent citations as a proxy for the quality of a technological invention. The sample contains more than 12,000 environmental European patents from firms and government institutions from 1998 to 2004. From our econometric analysis, we found that environmental patents applied for by individual inventors are on average of lower quality than those applied for by institutional inventors. The size of the patent family is relevant in explaining forward patent citations. Furthermore, patents coming from abroad (outside Europe), in particular with US and Japan priority, are more cited on average than local patents (with European priority). Lastly, the specialization in environmental fields of a patent plays a negative role in determining the frequency of forward citation. Using a dataset of refereed conference papers, this work explores the determinants of academic production in the field of management. The estimation of a count data model shows that the countries' level of economic development and their economy size have a positive and highly significant effect on scholarly management knowledge production. 
The linguistic variable (English as official language), which has been cited in the literature as an important factor facilitating participation in the international scientific arena, also has a positive and statistically significant effect. Using 17 fully open access electronic journals published uninterruptedly during 2000-2004 in the field of Library and Information Science, the present study investigated the trend of LIS Open Access e-journals' literature by analysing articles, authors, institutions, countries, subjects, and references. Quantitative content analysis was carried out on the data; the data were analysed in order to project literature growth, authorship patterns, gender patterns, cited reference patterns and related bibliometric phenomena. The analysis indicates that there were as many as 1636 articles published during 2000-2004, with an average increment of 23.75 articles per year. The authorship pattern indicates that team research has not been very common in LIS OA publishing and that male authors were keener than female authors. Authors from academic institutions showed more interest in OA publishing, and most of them were from developed nations. The subject coverage of these OA e-journals was very vast, and almost all facets of information and library science were covered in these articles. 90.10% of the articles in these e-journals contained references, and on average an article contained 24 references. Of these, 38.53% of references were hyperlinked, and 87.35% of hyperlinked references were live during the investigation. The analysis of the data clearly indicates that OA e-journals in LIS are rapidly establishing themselves as a most viable medium for scholarly communication. A study on the network characteristics of two collaboration networks constructed from the ACM and DBLP digital libraries is presented. Different types of generic network models and several examples are reviewed and experimented with in re-generating the collaboration networks. 
The results reveal that while these models can generate the power-law degree distribution sufficiently well, they are not able to capture the other two important dynamic metrics: average distance and clustering coefficient. While all current models result in small average distances, none shows the same tendency as the real networks do. Furthermore, all models seem blind to generating large clustering coefficients. To remedy these shortcomings, we propose a new model with promising results. We obtain closer values for the dynamic measures, while keeping the degree distribution power-law, by having link addition probabilities change over time and by having link attachment happen either within a local neighborhood only or globally, as seen in the two collaboration networks. Traditional input indicators of research performance, such as research funding, number of active scientists, and international collaborations, have been widely used to assess countries' publication output. However, while publication in today's English-only research world requires sound research in readable English, English proficiency may be a problem for the productivity of non-native English-speaking (NNES) countries. Data provided by the Brazilian National Research Council (CNPq) containing the academic profiles of 51,223 Brazilian researchers show a correlation between English proficiency and publication output. According to our results, traditional input indicators may fall short of providing an accurate representation of the research performance of NNES developing countries. About a decade ago the German Science Council requested a strengthening of academic research at the German economic research institutes to improve the academic foundation of policy advice - the traditional task of the institutes. Based on publications in SSCI journals, research output has since then improved remarkably in scope and quality and has involved an ever rising number of scholars within the institutes. 
It can be considered a substantial success, which should be internationally recognized. The present study demonstrates that for a wide range of different methods the rankings of publication performance are fairly robust. The results are distorted, however, if they are based on a highly selective list of journals, as was the case in previous literature. Much research has been conducted on webometrics, especially on the impacts of websites on each other and on the web impact factor. However, there are few studies focusing on the websites of Iranian universities. This study analyzed the websites of Iranian universities of medical sciences according to webometric indicators. In a cross-sectional study, the number of web pages, inlinks and external inlinks, and also the overall and absolute web impact factors, for Iranian universities of medical sciences with active exclusive websites were calculated and compared using the AltaVista search engine. Finally, the websites were ranked based on these webometric indicators. The results showed that the website of Tehran University of Medical Sciences, with 49,300 web pages and 9860 inlinks, was ranked first for size and number of inlinks, while its impact factor was ranked 38th. Rafsanjan UMS, with 15 web pages and 211 links, had the highest rank for the web impact factor among Iranian universities of medical sciences. The study revealed that Iranian universities of medical sciences did not have much impact on the web and were not well known internationally. The major reason lies in linguistic barriers. Some of them also suffer from technical problems in their web design. The study discusses the need to analyze the influence of theoretical and empirical types of journal articles on the citation impact of Spanish psychology journals. Three of the most representative Spanish psychology journals were selected for the purposes of this study: Papeles del Psicólogo, Análisis y Modificación de Conducta and Psicothema. 
Twenty-three psychology journals in Spanish were used as source journals. Altogether, sixty-seven issues were reviewed for the references and ninety-three issues for the articles. The bibliometric analysis was conducted by six highly trained psychologists. The results demonstrated differences regarding the percentages of empirical and theoretical articles published in the three examined journals and the number of citations received by them based on the article type. When normalizing the results according to the number of theoretical and empirical articles that were published, it becomes evident that the theoretical articles receive on average twice as many citations as the empirical ones. We discuss the importance of this effect for the comparison of journals based on their citation impact and show evidence that it is only valid to compare journals which publish a similar percentage of theoretical and empirical articles. We apply social network analysis to display the characteristics of the networks resulting from bibliographic coupling of journals by the Chinese patent data of the United States Patent and Trademark Office (USPTO) between 1995 and 2002. The networks of journals in all fields, in the three strongly science-based fields (i.e. Biotechnology, Pharmaceuticals, and Organic Fine Chemistry), and in the three weakly science-based fields (i.e. Optics, Telecommunications, and Consumer Electronics), have been analyzed from the global and the ego views, respectively. We study a variety of statistical properties of our networks, including number of actors, number of edges, size of the giant component, density, mean degree, clustering coefficient and the centralization measures of the network. We also highlight some apparent differences in network structure between the subjects studied. Besides, we use the three centrality measures, i.e. 
degree, closeness, and betweenness, to identify the important journals in the network of all fields and in the strongly science-based networks. This study is a bibliometric analysis of ocean circulation-related research for the period 1991-2005. Selected documents included "ocean circulation, sea circulation, seas circulation, marine circulation, and circulation ocean" as a part of the title, abstract or keywords. Analyzed parameters included the document type, the article output, the article distribution in journals, the publication activity of countries and institutes, and the authorship. An indicator, citations per publication (CPP), was applied to evaluate the scientific impact of a publication. The relationship between cumulative articles and the year was modeled. Three dominant categories were picked out, and their output increase was modeled. The USA was found to be leading the research with a 47% share of total articles, with a CPP of up to 5.9. The Woods Hole Oceanographic Institution in the USA was the most productive institute, with a CPP of 6.8. In the citation analysis, a 5th-year citation mode was found. A paper life model was applied to compare the cumulative citation increase rates of different years. The h-index is a recent but already quite popular way of measuring research quality and quantity. However, it discounts highly-cited papers. The g-index corrects for this, but it is sensitive to the number of never-cited papers. Besides, h- or g-index-based rankings have a large number of ties. Therefore, this paper introduces two new indices, and tests their performance for the 100 most prolific economists. A researcher has a t-number (f-number) of t (f) if t (f) is the largest number for which it holds that she has t (f) publications for which the geometric (harmonic) average number of citations is at least t (f). The new indices overcome the shortcomings of the old indices. 
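The h-, g-, t- and f-indices discussed above are all simple functions of a researcher's sorted citation counts. A minimal sketch of the four definitions follows; the function names and the sample citation record are illustrative, not taken from the paper:

```python
import math

def h_index(cites):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(ranked, 1) if c >= i)

def g_index(cites):
    """Largest g such that the top g papers together have at least g^2 citations."""
    ranked, total, g = sorted(cites, reverse=True), 0, 0
    for i, c in enumerate(ranked, 1):
        total += c
        if total >= i * i:
            g = i
    return g

def _mean_index(cites, mean):
    """Largest k such that the chosen average of the top-k citation counts is >= k."""
    ranked, best = sorted(cites, reverse=True), 0
    for k in range(1, len(ranked) + 1):
        top = ranked[:k]
        if min(top) <= 0:  # geometric/harmonic means need positive counts
            break
        if mean(top) >= k:
            best = k
    return best

def t_number(cites):
    # geometric average of the top-t papers
    return _mean_index(cites, lambda xs: math.exp(sum(map(math.log, xs)) / len(xs)))

def f_number(cites):
    # harmonic average of the top-f papers
    return _mean_index(cites, lambda xs: len(xs) / sum(1 / x for x in xs))

cites = [10, 8, 5, 4, 3, 0]  # hypothetical citation record
print(h_index(cites), g_index(cites), t_number(cites), f_number(cites))  # 4 5 5 4
```

Because the geometric mean of a set of counts always dominates the harmonic mean, t is at least f on any citation record, which is visible in the example above.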
In the last few years, many new bibliometric rankings or indices have been proposed for comparing the output of scientific researchers. We propose a formal framework in which rankings can be axiomatically characterized. We then present a characterization of some popular rankings. We argue that such analyses can help the user of a ranking to choose one that is adequate in the context where she/he is working. For each of the years 2003, 2004, and 2005 the number of citations for individual papers published in Physics in Medicine and Biology was compared to the mean quality-score assigned to the manuscript by two independent experts as part of the normal peer review process. A low but statistically significant correlation was found between citations and quality score (1 best to 5 worst) for every year: 2003: -0.227 (p < 0.001); 2004: -0.238 (p < 0.001); 2005: -0.154 (p < 0.01). Papers in the highest quality category (approximately 10 per cent of those published) were cited about twice as often as the average for all papers. Data were also examined retrospectively by dividing the papers published in each year into five citation quintiles. A paper of the highest quality is about ten times more likely to be found in the most cited quintile than in the least cited quintile. By making the assumption that the mean number of citations per paper is a reasonable surrogate for the impact factor, it was also shown that the impact factor for Physics in Medicine and Biology could be increased substantially by rejecting more papers based on the reviewers' scores. To accomplish this, however, would require a reduction in the acceptance rate of manuscripts from about 50 per cent to near 10 per cent. I offer insight into the principles by which the salaries of Italian Renaissance professors were determined. There is a longstanding fascination with the fact that some professors during the Renaissance had extremely high salaries. 
It has been suggested that at the top of the salary scale were the superstars, professors who could attract many students and raise the prestige of the university. Through an analysis of data on the salaries of professors at Padua in 1422-1423, I argue that much of the differences in salaries can be explained in terms of the stage of career of professors. Those professors who have taught the longest tend to be paid the most. Hence, there is little evidence for the superstar thesis. During the period 1985-1995 Daniel Koshland was Editor-in-Chief of the journal Science. As such he exerted a huge influence on all aspects related to content and lay-out of the journal. This study compares Science's bibliometric characteristics between three periods: a pre-Koshland (1975-1984) period, the Koshland period (1985-1995) and the post-Koshland period (1996-2006). The distributions of document types, the country/territory and institutional distribution of authors, co-authorship data and disciplinary impact measured by subject categories of citations are studied. These bibliometric characteristics unveil some of the changes the journal went through under the leadership of Daniel Koshland. This paper studies cooperation patterns in Spain between science history researchers by analysing co-authorship in the scientific publications of the Social Science Citation Index (SSCI) and the Science Citation Index (SCI) databases. This paper will develop and demonstrate a novel method for analyzing scientific indexes called Latent Semantic Differentiation. Using two distinct datasets comprised of scientific abstracts, it will demonstrate the procedure's ability to identify the dominant themes, cluster the articles accordingly, visualize the results, and provide a qualitative description of each cluster. 
Combined, the analyses will highlight the utility of the procedure for scientific document indexing, structuring university departments, facilitating grant administration, and augmenting ongoing research on scientific citation. Because the procedure is extensible to any textual domain, there are numerous avenues for continued research both within the sciences and beyond. This study analyses the bibliometric characteristics of the presentations at the 5th, 8th and 10th Conferences of the International Society for Scientometrics and Informetrics, which were subsequently published in peer-reviewed journals covered by the Science Citation Index, Social Science Citation Index and LISA databases. 31.7% of all the papers presented at the three conferences were published. Scientometrics was the journal that published the highest proportion of them. A low rate of publication deprives researchers of potentially interesting results and points up the role of the ISSI Conference proceedings as a primary source of information. Using bibliographic records from the Science Citation Index, the paper examines the publications of South African scientists. The analysis shows that collaborative research in South Africa has been growing steadily and that the scientists are highly oriented towards collaborative rather than individualistic research. International collaboration is preferred to domestic collaboration, while publication seems to be a decisive factor in collaboration. The paper also looks at the collaboration dimensions of partnering countries, sectors and disciplines, and examines how collaboration can be predicted by certain publication variables. Characteristic features are evident in both the degree and nature of collaboration, which can be predicted by the number of countries involved, the number of partners and the fractional count of papers. 
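The "fractional count of papers" used as a predictor in the South African collaboration study divides each paper's credit equally among its co-authors, in contrast to whole counting, where every co-author receives full credit. A minimal sketch of the two counting schemes, with invented author lists for illustration:

```python
from collections import defaultdict

def whole_and_fractional_counts(papers):
    """papers: list of author lists. Whole counting gives each author 1 credit
    per paper; fractional counting gives each author 1/n for an n-author paper."""
    whole, frac = defaultdict(int), defaultdict(float)
    for authors in papers:
        share = 1.0 / len(authors)
        for a in authors:
            whole[a] += 1
            frac[a] += share
    return dict(whole), dict(frac)

# Hypothetical publication list: one two-author, one solo, one four-author paper.
papers = [["Smith", "Dlamini"], ["Smith"], ["Smith", "Dlamini", "Naidoo", "Botha"]]
whole, frac = whole_and_fractional_counts(papers)
print(whole["Smith"], frac["Smith"])  # 3 1.75
```

The gap between an author's whole and fractional counts is itself a crude collaboration indicator: the two coincide only for an author who publishes alone.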
Most studies of patent citations focus on national or international contexts, especially contexts of high absorptive capacity, and employ examiner citations. We argue that results can vary if we take the region as the context of analysis, especially if it is a region with low absorptive capacity, and if we study applicant citations and examiner-inserted citations separately. Using a sample from the Valencian Community (Spain), we conclude that (i) the use of examiner-inserted citations as a proxy for applicant citations, (ii) the interpretation of non-patent references as indicators of science-industry links, and (iii) the traditional results for geographical localization are not generalizable to all regions with low absorptive capacity. Social impacts and degrees of organization inherent to opinion formation for interacting agents on networks present interesting questions of general interest from physics to sociology. We present a quantitative analysis of a case involving an evolving small-size network, namely that inherent to the ongoing debate between modern creationists (most of whom are Intelligent Design proponents, IDP) and defenders of Darwin's theory of evolution (DED). This study is carried out by analyzing the structural properties of the citation network unfolded in recent decades by published works belonging to members of the two communities. With the aim of capturing the dynamical aspects of the interaction between the IDP and DED groups, we focus on two key quantities, namely, the degree of activity of each group and the corresponding degree of impact on the intellectual community at large. A representative measure of the former is provided by the rate of production of publications (RPP), whilst the latter can be assimilated to the rate of increase in citations (RIC). 
These quantities are determined, respectively, by the slope of the time series obtained for the number of publications accumulated per year and by the slope of a similar time series obtained for the corresponding citations. The results indicate that in this case the dynamics can be seen as geared by triggered or damped competition. The network is a specific example of marked heterogeneity in the exchange of information within and between the communities, particularly demonstrated through the nodes having a high connectivity degree, i.e. opinion leaders. This paper seeks to provide current indicators on Indian science and technology for measuring the country's progress in research. The study uses 11 years of publication data on India and the top 20 productive countries, drawn from the Scopus database for the period 1996 to 2006. The study examines country performance on several measures, including country publication share in the world research output, country publication share in various subjects in the national and global contexts, patterns of research communication in core Indian domestic and international journals, geographical distribution of publications, share of international collaborative papers at the national level as well as across subjects, and characteristics of high-productivity institutions, scientists and cited papers. The paper also compares the similarity of the Indian research profile with the top 20 productive countries. The findings of the study should be of special significance to planners and policy-makers, as they have implications for the long-term S&T planning of the country. The bibliometric indices of the scientific field of geostatistics were analyzed using statistical and spatial data analysis. The publications and their citation statistics were obtained from the Web of Science (4000 most relevant), Scopus (2000 most relevant) and Google Scholar (5389). The focus was on the analysis of the citation rate (CR), i.e. 
the number of citations an author or a library item receives on average per year. This was the main criterion used to analyze global trends in geostatistics and to extract the Top 25 most-cited lists of research articles and books in geostatistics. It was discovered that the average citation rate for geostatisticians has stabilized since 1999, while the authors' n-index seems to have declined ever since. One reason for this may be that there are more and more young authors with a lower n-index. We also found that the number of publications an author publishes explains only 60% of the variation in the citation statistics, and that this figure progressively declines for authors with fewer publications. Once the geographic location is attached to a selection of articles, an isotropic Gaussian kernel smoother weighted by the CR can be used to map scientific excellence around the world. This revealed clusters of scientific excellence around locations such as Wageningen, London, Utrecht, Hampshire (UK), Norwich, Paris, Louvain, Barcelona, and Zurich (Europe); Stanford, Ann Arbor, Tucson, Corvallis, Seattle, Boulder, Montreal, Baltimore, Durham, Santa Barbara and Los Angeles (North America); and Canberra, Melbourne, Sydney, Santiago (Chile), Taipei, and Beijing (other continents). Further correlation with socio-economic variables showed that the spatial distribution of CRs in geostatistics is independent of the night-light image (which represents economic activity) and population density. This study demonstrates that the commercial scientific indexing companies could enhance their services by assigning geographical locations to library items to allow spatial exploration and analysis of bibliometric indices. The aim of this study was to ascertain the possible effect of journal self-citations on the increase in the impact factors of journals in which this scientometric indicator rose by a factor of at least four in only a few years. 
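The CR-weighted isotropic Gaussian kernel smoother mentioned in the geostatistics study can be sketched in a few lines. This is a minimal Nadaraya-Watson-style sketch, not the study's implementation; the coordinates, citation rates, and bandwidth below are illustrative assumptions.

```python
import math

def kernel_smooth_cr(points, grid, bandwidth=2.0):
    """Isotropic Gaussian kernel smoother: estimate a citation-rate (CR)
    surface on `grid` from (x, y, cr) observations, weighting each
    observation by its distance-based kernel weight."""
    surface = []
    for gx, gy in grid:
        num = den = 0.0
        for x, y, cr in points:
            w = math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
            num += w * cr
            den += w
        surface.append(num / den if den else 0.0)
    return surface

# Illustrative (x, y) locations with citation rates (citations per year)
obs = [(0.0, 0.0, 12.0), (1.0, 0.5, 9.0), (8.0, 8.0, 2.0)]
grid = [(0.0, 0.0), (8.0, 8.0)]
print(kernel_smooth_cr(obs, grid))
```

Evaluated on a dense grid, such a surface highlights spatial clusters of high CR, which is the "mapping scientific excellence" step described above.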
Forty-three journals were selected from the Thomson Reuters (formerly ISI) Journal Citation Reports as meeting the above criterion. Eight journals in which the absolute number of citations was lower than 20 in at least two years were excluded, so the final sample consisted of 35 journals. We found no proof of widespread manipulation of the impact factor through the massive use of journal self-citations. Brazilian scientific production has increased significantly over the last decade, and mental health has been a leading research field in the country, with a growing number of articles published in high-quality international journals. This article analyses the scientific output of mental health research between 2004 and 2006 and estimates individual research performance based on four different strategies. A total of 106 mental health scientists were included in the analysis; together they published 1,209 articles indexed in Medline or ISI, with over 65% of the production in journals with an impact factor ≥ 1. The median impact factor of publications was 2. The Spearman correlation coefficient showed a large positive correlation between all four measures used to estimate individual research output. Ten investigators were together responsible for almost 30% of the articles published in the period, whereas 65% of the sample contributed fewer than 10 articles. In this article we analyse how research on complementary and alternative medicine (CAM) breaks through into one established scientific arena, namely academic journals. Using bibliometric methods, we analyse the publication of CAM articles in the Medline database during the period 1966-2007. We also analyse the general content of the articles and the journals in which they are published. We conclude that the publication activity of CAM articles has increased rapidly, especially in the late 1990s, and that the changing growth rate is not due to the general expansion of Medline. 
The character of CAM articles has changed towards more clinically oriented research, especially in subfields such as acupuncture and musculoskeletal manipulations. CAM articles are found both in core clinical journals and in specialized CAM journals. Even though a substantial part of the articles are published in CAM journals, we conclude that the increasing publication activity is not restricted to the expansion of these specialized journals. A new framework of international comparisons is proposed: each country is gauged against its bordering countries. This approach has several undeniable drawbacks but, by revealing some otherwise hidden patterns, advantageously supplements the customary comparison methods. Concept theory is an extremely broad, interdisciplinary and complex field of research, related to many deep fields with very long historical traditions and without much consensus. However, information science and knowledge organization cannot avoid relating to theories of concepts. Knowledge organizing systems (e.g., classification systems, thesauri, and ontologies) should be understood as systems basically organizing concepts and their semantic relations. The same is the case with information retrieval systems. Different theories of concepts have different implications for how to construe, evaluate, and use such systems. Based on "a post-Kuhnian view" of paradigms, this article puts forward arguments that theories of concepts are best understood and classified in accordance with epistemological theories (empiricism, rationalism, historicism, and pragmatism). It is also argued that the historicist and pragmatist understandings of concepts are the most fruitful views, and that this understanding may be part of a broader paradigm shift that is also beginning to take place in information science. The importance of historicist and pragmatic theories of concepts for information science is outlined. 
In both the social sciences and the humanities, books and monographs play significant roles in research communication. The absence of citations from most books and monographs from the Thomson Reuters/Institute for Scientific Information databases (ISI) has been criticized, but attempts to include citations from or to books in the research evaluation of the social sciences and humanities have not led to widespread adoption. This article assesses whether Google Book Search (GBS) can partially fill this gap by comparing citations from books with citations from journal articles to journal articles in 10 science, social science, and humanities disciplines. Book citations were 31% to 212% of ISI citations and, hence, numerous enough to supplement ISI citations in the social sciences and humanities covered, but not in the sciences (3%-5%), except for computing (46%), due to numerous published conference proceedings. A case study was also made of all 1,923 articles in the 51 information science and library science ISI-indexed journals published in 2003. Within this set, highly book-cited articles tended to receive many ISI citations, indicating a significant relationship between the two types of citation data, but with important exceptions that point to the additional information provided by book citations. In summary, GBS is clearly a valuable new source of citation data for the social sciences and humanities. One practical implication is that book-oriented scholars should consult it for additional citations to their work when applying for promotion and tenure. The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. 
Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without it. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas. In this research we investigate the effect of search engine brand on the evaluation of searching performance. Our research is motivated by the large amount of search traffic directed to a handful of Web search engines, even though many have similar interfaces and performance. We conducted a laboratory experiment with 32 participants using a 4 × 2 factorial design confounded in four blocks to measure the effect of four search engine brands (Google, MSN, Yahoo!, and a locally developed search engine) while controlling for the quality and presentation of search engine results. We found that brand indeed played a role in the searching process. Brand effect varied in different domains. Users seemed to place a high degree of trust in major search engine brands; however, they were more engaged in the searching process when using lesser-known search engines. 
It appears that branding affects overall Web search at four stages: (a) search engine selection, (b) search engine results page evaluation, (c) individual link evaluation, and (d) evaluation of the landing page. We discuss the implications for search engine marketing and the design of empirical studies measuring search engine performance. In this study we extracted speaker-specific functional expressions from political speeches using random forests to investigate speakers' political styles. Along with methodological development, stylistics has expanded its scope into new areas of application such as authorship profiling and sentiment analysis, in addition to conventional areas such as authorship attribution and genre-based text classification. Among these, computational sociolinguistics, which aims at providing a systematic and solid basis for sociolinguistic analysis using machine learning and linguistically motivated features, is a potentially important area. In this study we showed the effectiveness of the random forests classifier for such tasks by applying it to Japanese prime ministers' Diet speeches. The results demonstrated that our method successfully extracted the speaker-specific expressions of two Japanese prime ministers, and enabled us to investigate their political styles in a systematic manner. The method can be applied to sociolinguistic analysis of various other types of texts, and in this way, this study will contribute to developing the area of computational sociolinguistics. This work explores computational models of multi-party discourse, using transcripts from U.S. Supreme Court oral arguments. The turn-taking behavior of participants is treated as a supervised sequence-labeling problem and modeled using first- and second-order conditional random fields (CRFs). We specifically explore the hypothesis that discourse markers and personal references provide important features in such models. 
Results from a sequence prediction experiment demonstrate that incorporating these two types of features yields significant improvements in accuracy. Our experiments are couched in the broader context of developing tools to support legal scholarship, although we see other natural language processing applications as well. It is important in information retrieval (IR), information extraction, and classification tasks that morphologically related forms are conflated under the same stem (using a stemmer) or lemma (using a morphological analyzer). To achieve this for the English language, algorithmic stemming or various morphological analysis approaches have been suggested. Based on Cross-Language Evaluation Forum test collections containing 284 queries and various IR models, this article evaluates these word-normalization proposals. Stemming improves the mean average precision significantly, by around 7%, while performance differences are not significant when comparing various algorithmic stemmers, or algorithmic stemmers and morphological analysis. Accounting for thesaurus class numbers during indexing does not modify overall retrieval performance. Finally, we demonstrate that including a stop word list, even one containing only around 10 terms, might significantly improve retrieval performance, depending on the IR model. This article describes an information retrieval approach based on the two different search modes that exist: browsing an ontology (via categories) or defining a query in free language (via keywords). Various proposals offer approaches adapted to one of these two modes. We present a proposal leading to a system allowing the integration of both modes using the same search engine. This engine is adapted according to each possible search mode. In scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. 
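Four similarity measures commonly used to normalize co-occurrence counts (the association strength, cosine, inclusion index, and Jaccard index) have compact standard definitions, sketched below; the counts `c_ij` (co-occurrences of items i and j) and `s_i`, `s_j` (their occurrence totals) are illustrative.

```python
import math

def similarities(c_ij, s_i, s_j):
    """Standard similarity measures for co-occurrence data:
    c_ij = co-occurrences of items i and j; s_i, s_j = their totals."""
    return {
        "association_strength": c_ij / (s_i * s_j),  # probabilistic measure
        "cosine": c_ij / math.sqrt(s_i * s_j),       # set-theoretic
        "inclusion": c_ij / min(s_i, s_j),           # set-theoretic
        "jaccard": c_ij / (s_i + s_j - c_ij),        # set-theoretic
    }

# Illustrative counts: two keywords co-occurring 30 times
print(similarities(c_ij=30, s_i=100, s_j=60))
```

The split between the probabilistic and set-theoretic measures shown in the comments corresponds to the distinction drawn in the analysis of these measures.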
However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this article, we theoretically analyze the properties of similarity measures for co-occurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely, set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that co-occurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research. Name identification has been worked on quite intensively for the past few years, and has been incorporated into several products revolving around natural language processing tasks. Many researchers have attacked the name identification problem in a variety of languages, but only a few limited research efforts have focused on named entity recognition for Arabic script. This is due to the lack of resources for Arabic named entities and the limited amount of progress made in Arabic natural language processing in general. In this article, we present the results of our attempt at the recognition and extraction of the 10 most important categories of named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system Named Entity Recognition for Arabic (NERA) using a rule-based approach. 
The resources created are a Whitelist representing a dictionary of names, and a grammar, in the form of regular expressions, that is responsible for recognizing the named entities. A filtration mechanism is used that serves two different purposes: (a) revision of the results from a named entity extractor by using meta-data, in the form of a Blacklist or rejecter, about ill-formed named entities, and (b) disambiguation of identical or overlapping textual matches returned by different named entity extractors to get the correct choice. In NERA, we addressed major challenges posed by NER in the Arabic language arising from the complexity of the language, peculiarities of the Arabic orthographic system, non-standardization of the written text, ambiguity, and lack of resources. NERA has been effectively evaluated using our own tagged corpus; it achieved satisfactory results in terms of precision, recall, and F-measure. In a recently published PNAS paper, Radicchi, Fortunato, and Castellano (2008) propose the relative indicator c(f) as an unbiased indicator of citation performance across disciplines (fields, subject areas). To calculate c(f), the citation rate for a single paper is divided by the average number of citations for all papers in the discipline in which the single paper has been categorized. c(f) values are said to lead to a universality of discipline-specific citation distributions. Using a comprehensive dataset of an evaluation study on Angewandte Chemie International Edition (AC-IE), we tested the advantage of using this indicator in practical application at the micro level, as compared with (1) simple citation rates, and (2) z-scores, which have been used in psychological testing for many years for the normalization of test scores. To calculate z-scores, the mean number of citations of the papers within a discipline is subtracted from the citation rate of a single paper, and the difference is then divided by the standard deviation of the discipline's citations. 
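The two normalizations just described, the relative indicator c(f) and the z-score, can be sketched directly from their definitions; the field citation distribution below is invented for illustration.

```python
from statistics import mean, stdev

def cf_and_zscore(paper_citations, field_citations):
    """Normalize one paper's citation count against its discipline:
    c_f = citations / field mean   (relative indicator of Radicchi et al.)
    z   = (citations - field mean) / field standard deviation"""
    mu = mean(field_citations)
    sigma = stdev(field_citations)
    return paper_citations / mu, (paper_citations - mu) / sigma

# Illustrative field distribution and a single paper with 20 citations
field = [0, 1, 2, 4, 8, 15, 40]
cf, z = cf_and_zscore(20, field)
print(round(cf, 2), round(z, 2))
```

Note the design difference the study turns on: c(f) rescales only by the mean, whereas the z-score also accounts for the spread of the discipline's citation distribution.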
Our results indicate that z-scores are better suited than c(f) values to produce universality of discipline-specific citation distributions. Using an inquiry project-based learning (PBL) approach that involved the collaboration of three kinds of teachers (general studies, language, and information technology [IT]) and the librarian, primary 4 students from a Hong Kong school were guided through group projects. The effect of this approach was examined by comparing the project grades of students who took part in the inquiry PBL project with those of a control group. Surveys and interviews were conducted with students (N1 = 141), parents (N2 = 27), and teachers (N3 = 11). The comparison of group project grades revealed significantly higher scores for the inquiry PBL groups (p < 0.05). Surveys showed that students were perceived to have improved in eight dimensions of learning, with no significant differences (p > 0.05) between students, parents, and teachers. Students enjoyed the projects and perceived them to be relatively easy. Gender differences and academic abilities had no significant moderating effects on the learning dimensions. Examination of the approach showed the collaboration among the four teaching staff to be effective through the support of the school administration, an inquiry learning expert, and parents. On the other hand, the main limitation was the extra workload for the teachers. Nevertheless, the study participants and stakeholders all advocate the continued implementation of the approach. Worldwide losses due to the copyright infringement of intellectual property such as PC software, music recordings, and motion pictures continue at epidemic proportions in emerging countries. This article develops a research model for analyzing individual ethical decision making that is influenced simultaneously by two chief forces: regulative sanction and self sanction. In particular, we report on the differences between 241 U.S. 
and 277 China college students' self-reported copyright infringement behaviors and attitudes. The analysis shows that the China subjects exhibit less concern about being prosecuted and penalized, but are as responsive to social sanctions as the U.S. subjects, strongly suggesting that stricter enforcement of copyright law in China would reduce copyright violations. However, the results show that self-regulatory efficacy is the primary determinant of copyright adherence for the U.S. subjects. For the China subjects, while self-regulatory efficacy is shown to significantly predict copyright infringement behaviors, it exists at lower levels and plays a lesser role in ethical decision making when compared to the U.S. subjects. Overall, the results indicate that normative and cultural-cognitive changes in China that go beyond regulative enforcement may be required if significant reductions in copyright infringement are to be expected. In this paper we extend earlier work on the role of the personality trait of resistance to change (RTC) in the adoption of digital libraries. We present an integrative study, drawing on a number of research streams, including IT adoption, social psychology, and digital-library acceptance. Using structural equation modeling, we confirm RTC as a direct antecedent of effort expectancy. In addition, we also find that by affecting computer anxiety and result demonstrability, RTC acts as an indirect antecedent to both effort expectancy and performance expectancy, which in turn determine user intention to adopt digital library technology. Implications for research and practice are discussed. A journal set in an interdisciplinary or newly developing area can be determined by including the journals classified under the most relevant ISI Subject Categories into a journal-journal citation matrix. Despite the fuzzy character of borders, factor analysis of the citation patterns enables us to delineate the specific set by discarding the noise. 
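The matrix-plus-factor-analysis idea just described can be sketched minimally, assuming a toy four-journal citation matrix and using PCA on the correlation of citation profiles as a stand-in for the study's factor analysis.

```python
import numpy as np

# Toy journal-journal citation matrix: rows are citing journals, columns
# are cited journals. Journals 0-1 cite the same targets (one field),
# journals 2-3 cite another set of targets (a second field).
C = np.array([
    [50, 45,  2,  1],
    [48, 40,  1,  3],
    [ 2,  1, 55, 60],
    [ 1,  2, 50, 58],
], dtype=float)

# Correlate the citation profiles and take the leading eigenvector of the
# correlation matrix; journals loading together on it form one set, and
# weakly loading journals would be discarded as noise.
R = np.corrcoef(C)
eigvals, eigvecs = np.linalg.eigh(R)
first = eigvecs[:, -1]   # eigenvector of the largest eigenvalue
labels = first > 0       # sign of the loading separates the two sets
print(labels)
```

With real data, the loadings on several factors (not just the first) are inspected, but the grouping principle is the same.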
This methodology is illustrated using communication studies as a hybrid development between political science and social psychology. The development can be visualized using animations, which support the claim that a specific journal set in communication studies is increasingly developing, notably in the "being cited" patterns. The resulting set of 28 journals in communication studies is smaller and more focused than the 45 journals classified by the ISI Subject Categories as "Communication." The proposed method is tested for its robustness by extending the relevant environments to sets including many more journals. The present study analyses the research output and impact in Synthetic Organic Chemistry (SOC) during 1998-2004, applying standardized scientometric indicators. The volume of research publications and their citations, presented as a percentage of the world share, illustrates trends over time. Adopting the relative indicators Absolute Citation Impact (ACI) and Relative Citation Impact (RCI), a cross-national comparison is attempted at three levels of aggregation: global, Asian and Indian. Based on this analysis, it is concluded that the G7 nations, the leaders in volume of literature published and citations attracted, show a decreasing trend over the years, probably due to the shifting and diversification of their research efforts to other emerging research fronts. In contrast, smaller nations publishing low-volume but high-quality research are exemplified by the Netherlands: credited with only a 1.12% world share of publications, this country recorded the highest Absolute Citation Impact and a higher-than-world-average Relative Citation Impact. In the Asian region, between the two developing economies India and China, China outperformed India qualitatively by achieving a higher citation share, higher Absolute Citation Impact (ACI) and higher Relative Citation Impact (RCI). 
Collaboration is one of the remarkable characteristics of contemporary basic research. Using bibliometric methods, we quantitatively analyze international collaboration publication output between China and the G7 countries based on the Science Citation Index. The results indicate that international collaboration publication output between China and the G7 countries has shown exponential growth, driven by the growth of science in China. The USA is the most important collaborating country, and international collaboration between China and the G7 countries displays differences across research fields. Scientific journals play an important role in international academic information exchange. Their international performance can be evaluated by comparing the geographical distribution patterns of authors, citations and subscriptions. In this study we analyzed 3 journals, i.e., Chinese Chemical Letters (China), Chemical Communications (England) and Chemistry Letters (Japan), for the regional distribution patterns of their editorial board members, author databases, and citation regions, using bibliometric methods on the basis of the Web of Science. The results show that, compared with the international journals, Chinese Chemical Letters lags behind in all aspects. This study performs a webometric analysis to explore the communication characteristics of scientific knowledge in a national scholarly Web space comprising top-ranking universities and government-supported research institutions in South Korea. We found significant differences in scholarly communication activity as well as linking behavior among different subspaces, in addition to institutional differences. We also found the ADM approach useful for analyzing metric data containing extreme outliers, and identified the directory model as the most appropriate. 
Page counts were found to be significantly correlated with inlinks as well as with outlinks at the directory level in the whole scholarly Web space. There is a growing literature measuring research excellence in economics. The h-index is noteworthy in combining quantity and research quality in a single measure of researcher excellence, and in its ability to be extended to measure the quantity and quality of the researchers in a department. We extend the use of the first successive h-index further to measure the quality of graduate education, specifically excellence in research supervision, based on publication and citation data for individual researchers ascribed to their graduate supervisors. In order to prevent the formation of a gap between the quality and quantity of Iranian scientific publications, this study makes an effort to analyze Iranian scientific publications indexed in the ISI Web of Science database using quantitative and qualitative scientometric criteria over a ten-year period. As a first step, all Iranian institutes were divided into three categories: universities, research institutes and other organizations. They were then compared according to quantitative and qualitative criteria. Second, the correlation between the quality and quantity of the publications was measured. The research findings indicated that, according to the qualitative criteria (citation, citation impact and percentage of cited documents), there are no meaningful differences among the three groups, while regarding the quantitative criterion (number of papers), universities rank higher than the other two groups. The results also indicated that there is a positive and meaningful correlation between the qualitative and quantitative criteria in the scholarly scientific publications produced by Iranian organizations. In other words, in Iranian organizations the quality of publications increases as their quantity increases. 
Comparing the magnitude of this correlation across the three categories reveals that the correlation between the number of papers and the citation criterion is stronger in research institutes than in the other two groups. This study applies the artificial neural network technique to explore the influence of quantitative and qualitative patent indicators upon the market value of pharmaceutical companies in the US. The results show that the Herfindahl-Hirschman Index of patents negatively influences the market value of the pharmaceutical companies in the US, and their technological independence positively affects their market value. In addition, this study also finds that patent citations of the American pharmaceutical companies have an inverse U-shaped effect upon their market value. We utilize the bibliometric tool of co-word analysis to identify trends in the methods and subjects of ecology during the period 1970-2005. Few previous co-word analyses have attempted to analyze fields as large as ecology. We utilize a method of isolating the concepts and methods in large datasets that undergo the most significant upward and downward trends. Our analysis identifies policy-relevant trends in the field of ecology, a discipline that helps to identify and frame many contemporary policy problems. The results provide a new foundation for exploring the relations among public policies, technological change, and the evolution of science priorities. The gender gap in science and technology has received considerable attention from both researchers and policy makers. In an effort to better understand the quantity, quality, and underlying characteristics of female research efforts, I integrate three existing databases to uncover how female patenting activities differ from men's in the US biotechnology industry. Data on how much science the patents build upon, the author institutions of that science, and who funded the papers in which the science appears are all examined. 
In addition, using the NBER Patent Citation Data Files, I propose a possible gender-based life cycle model for patenting activity. The policy implications of my findings are also discussed. This study analyzed 2443 papers published in 2006 by European Union authors on pain-related research. Five EU countries (the UK, Germany, Italy, the Netherlands and France) each published > 200 papers, while three countries (Cyprus, Malta and Estonia) published none; socio-economic indicators were related to each country's productivity. The 2443 papers were published in 592 journals, and Cephalalgia, Pain and the European Journal of Pain were the most prolific. Publications were also analyzed for intra- versus inter-EU/non-EU collaborations, and subdiscipline profiles in Clinical Medicine and the Life Sciences for the World, USA, EU and the top-four EU countries were compared. A bibliometric analysis was performed on a set of 1718 documents relating to Web 2.0 to explore the dimensions and characteristics of this emerging field. It has been found that Web 2.0 has its roots deep in social networks, with medicine and sociology as the major contributing disciplines to the scholarly publications beyond its technology backbone - information and computer science. Terms germane to Web 2.0, extracted from the data collected in this study, were also visualized to reflect the very nature of this rising star on the Internet. Web 2.0, according to the current research, is of the user, by the user, and more importantly, for the user. An intellectual property (IP)-centric, communication-based Innovation Agenda is proposed and investigated. The agenda, which is aligned with IP legal prescription, is defined as follows: the firm's R&D expenditure is captured within products. The firm applies for a patent and files a trademark to protect its interests in the 'patentable' product, and issues a media communication, which may alter the perception of future cash flows, and thereby market price. 
Upon patent issuance and trademark registration, the firm will then seek another media communication. Spearman (partial) correlation analysis shows strong correlations among the various proxy metrics, suggesting that the model basis may exist. The model proposes a novel link among national socioeconomic metrics, corporate strategy, and the technology-based innovative firm. Finally, the model supports the inclusion of trademark and media communications data in socioeconomic modeling. The practice of publishing clinical trials in scientific journals is common, although not without its critics. This study aims to measure the effect of clinical trial citations on several bibliometric indicators: citations per document (CD), journal impact factor (JIF), relative h-index (RhI) and strike rate index (SRI). We selected all the citable documents published in the NEJM, Lancet, JAMA, AIM and BMJ for the period 2000-2004, and recorded the citations received by those papers from 2000 to 2005. Our results show that clinical trials have a CD significantly higher than that of conventional papers; the JIF is lower when clinical trials are excluded, especially for the NEJM, Lancet and JAMA. Finally, both RhI and SRI seem to be unaffected by clinical trial citations. In this work, we compare the difference in the number of citations compiled with Scopus as opposed to the Web of Science (WoS), with the aim of analysing the agreement among the citation rankings generated by these databases. For this, we analysed the area of Health Sciences of the University of Navarra (Spain), composed of a total of 50 departments and 864 researchers. The total number of published works reflected in the WoS during the period 1999-2005 was 2299. For each work, the number of citations in both databases was recorded. The results indicate that the works received 14.7% more citations in Scopus than in WoS. In the departments, the difference was greater in the clinical ones than in the basic ones. 
In the case of the rankings of citations, it was found that both databases generate similar results. The Spearman and Kendall-Tau coefficients were higher than 0.9. It was concluded that the difference in the number of citations found did not correspond to the difference in coverage of WoS and Scopus. This paper analyses the changing geographic balance in China's international co-publications in general and in three molecular life science subfields in particular. No support is found for the expectation that intensive, designated institutional support for research collaboration in the form of joint laboratories has a positive impact on the number of co-publications at the systemic level. The size of partner research systems and, since the turn of the century, the relative size of overseas Chinese scientific communities in various partner countries do help to explain the observed geographic variations in the share of China's international co-publications. The paper concludes by discussing some of the potential factors underlying the perceived change in the dynamics of international co-publication behavior of mainland Chinese scientists since the turn of the century. Development of research methods requires a systematic review of their status. This study focuses on the use of Hierarchical Linear Modeling methods in psychiatric research. The evaluation includes 207 documents published until 2007, included and indexed in the ISI Web of Knowledge databases; the analyses focus on the 194 articles in the sample. Bibliometric methods are used to describe the publication patterns. Results indicate a growing interest in applying the models and an establishment of the methods after 2000. Both Lotka's and Bradford's distributions are fitted to the data. It appears popular, particularly among science administrators, to use citations and various citation measures for ranking scientists, as if such exercises would reflect the scientific potential of the persons considered. 
In recent times the Hirsch index h in particular has obtained visibility in this respect in view of its simplicity. We consider a possible extension of the concept of selective citations, which in fact is innate to the h index, and propose a simple generalization, indices H and Q, which to a degree supplement the information accompanying the evaluation of h. The H index keeps a record of the "history" of citations, and the quotient Q = H/h is a measure for the quality of a scientist based on the history of his/her citations. This article introduces the generalized Kosmulski-indices as a new family of scientific impact measures for ranking the output of scientific researchers. As special cases, this family contains the well-known Hirsch-index h and the Kosmulski-index h(2). The main contribution is an axiomatic characterization of every generalized Kosmulski-index in terms of three axioms. Future political priorities for science and technology (S&T) policy formulation usually rest on a rather simplistic interpretation of past events. This can lead to serious errors and distortions and can negatively affect the innovation system. In this article we try to highlight the riskiness involved in policy making based on traditional R&D indicators and trends. We would emphasise that this approach does not take account of structural aspects crucial for the analysis of the innovation system. We examine the implications for science, technical and human resources policies of the political challenge of R&D convergence in a peripheral EU region. Three scenarios are developed based on application of the same criteria to the trends observed in traditional R&D input indicators. Increasingly, collaboration between firms as well as science-industry interactions are being considered as important for technology development. Yet, few attempts have been made to analyze the contribution of collaboration, taking into account different stages of the technology life cycle. 
Our analysis, based on a panel of 197 regions in the EU-15 and Switzerland (time period 1978-2001), provides evidence that, in the field of biotechnology, science-industry collaboration contributes to better technological performance of regions both during the emerging phases (1978-1990) and the growth stages (1991-1999) of the life cycle. Collaboration between industrial partners also contributes to the technological performance of regions during the first phase but is less pronounced during later phases of the technology life cycle. Moreover, the analysis reveals that, as technologies develop over time, the impact of local collaboration is mitigated in favor of collaboration that has an international dimension. This holds true both for science-industry interactions and for collaboration between firms. In consequence, our findings underscore the relevance of incorporating life cycle dynamics (of technologies) when studying the nature and impact of collaboration on the technological performance of regions. Experimental data [Mansilla, R., Koppen, E., Cocho, G., & Miramontes, P. (2007). On the behavior of journal impact factor rank-order distribution. Journal of Informetrics, 1(2), 155-160] reveal that, if one ranks a set of journals (e.g. in a field) in decreasing order of their impact factors, the rank distribution of the logarithm of these impact factors has a typical S-shape: first a convex decrease, followed by a concave decrease. In this paper we give a mathematical formula for this distribution and explain the S-shape. The experimentally found smaller convex part and larger concave part are also explained. If one studies the rank distribution of the impact factors themselves, we now prove that we have the same S-shape but with inflection point at mu, the average of the impact factors. These distributions are valid for any type of impact factor (any publication period and any citation period). They are even valid for any sample average rank distribution. 
(C) 2009 Elsevier Ltd. All rights reserved. The prevalence of uncited papers or of highly cited papers, with respect to the bulk of publications, provides important clues as to the dynamics of scientific research. Using 25 million papers and 600 million references from the Web of Science over the 1900-2006 period, this paper proposes a simple model based on a random selection process to explain the "uncitedness" phenomenon and its decline over the years. We show that the proportion of cited papers is a function of (1) the number of articles available (the competing papers), (2) the number of citing papers and (3) the number of references they contain. Using uncitedness as a departure point, we demonstrate the utility of the stretched-exponential function and a form of the Tsallis q-exponential function to fit complete citation distributions over the 20th century. As opposed to simple power-law fits, for instance, both these approaches are shown to be empirically well-grounded and robust enough to better understand citation dynamics at the aggregate level. On the basis of these models, we provide quantitative evidence and provisional explanations for an important shift in citation practices around 1960. We also propose a revision of the "citation classic" category as a set of articles which is clearly distinguishable from the rest of the field. (C) 2009 Elsevier Ltd. All rights reserved. This paper classifies common journal evaluation indicators into three categories, namely three first-level indicators. They are respectively the indicators on journal impact, on timeliness, and on journal characteristics. The data used here is drawn from the medical journals in CSTPCD, a citation database built by the Institute of Scientific and Technical Information of China. The three categories of indicators are correlated with one another, so a structural equation may be established. 
Then we calculate the value of the three first-level indicators and give subjective weights to the indicators. The comprehensive evaluation of the medical journals yields satisfactory results. By simulating the complex relationship among journal indicators, the structural equation can be used for the estimation of some implicit indicators and the screening of indicators. This approach provides a new perspective for scientific and technological evaluation in a general sense. It should be noted that the availability of basic data and the rationality of modeling bear much upon the evaluation results. (C) 2009 Elsevier Ltd. All rights reserved. This paper proposes a new method for indicator selection in panel data analysis and tests the method with relevant data on agricultural journals provided by the Institute of Scientific & Technical Information of China. An evaluation exercise by the TOPSIS method is conducted as a comparison. The result shows that panel data analysis is an effective method for indicator selection in scholarly journal evaluation; journals of different disciplines should not be evaluated with the same criteria; it is beneficial to publish all the evaluation indicators; unavailability of a few indicators has a limited influence on evaluation results; and simplifying indicators can reduce costs and increase the efficiency as well as the accuracy of journal evaluation. (C) 2009 Elsevier Ltd. All rights reserved. In this paper we use scale-independent indicators to explore the performance of the Chinese innovation system from an economic and from a science and technology point of view, and compare it with 21 other nations. Some important developments in the Chinese innovation system, hidden by rankings based on conventional performance indicators, were revealed. We find that gross domestic expenditure on R&D (GERD) & gross domestic product (GDP) and GDP & POP (population) all exhibit strong 'Matthew effects', measured by their scaling factors. 
This means that the Chinese R&D intensity (GERD/GDP) and national wealth (GDP per capita) are growing significantly with the increase of the GDP. Pairs such as citations & papers, papers & GDP, citations & GDP, and papers & GERD also exhibit these 'Matthew effects'. This observation points to the fact that in China scientific outputs and impacts are growing faster than economic growth and research investment. However, according to another scale-independent indicator, namely the adjusted relative citation impact (ARCI), China ranks at the bottom of the list, but the growth rate of the ARCI is the highest among these countries (comparing the periods 1995-1999 and 2001-2005). To sum up, we interpret these findings to mean that the scientific outputs and impacts of China show a real tendency to catch up with its economic growth. It is expected that with an increase of its GDP and R&D intensity China will show a sustained increase in indicators related to science and technology. Similarly, there are very strong 'Matthew effects' between the outputs of technology (patents) and economic growth and research investment. This means that the outputs of technology are expected to increase considerably with an increase of GDP and R&D expenditure. Furthermore, in the Chinese innovation system the government intramural expenditure on R&D (GOVERD) has a stronger non-linear impact on patent productivity than business enterprise expenditure on R&D (BERD). This shows that in China research institutions financed by the government play a more important role than enterprises. (C) 2009 Elsevier Ltd. All rights reserved. The behavior of co-citation clusters is studied over a wide range of similarity values, and we demonstrate the existence of critical or percolation transitions marked by a sudden expansion of cluster size with a small decrease in similarity, which, in most cases, reflects the emergence of a giant component on the overall graph for the dataset. 
The study was motivated by the question of how to set appropriate thresholds for delineating individual research areas that identify, as far as possible, natural boundaries, in view of the fact that a threshold or criterion appropriate for one area may not be appropriate for another. We explore the rate of change in cluster size as a possible boundary indicator. The relationship of this critical behavior to maps of science is discussed. (C) 2009 Elsevier Ltd. All rights reserved. The following seniority-independent Hirsch-type index has been defined. A scientist has index hpd if hpd of his/her papers have at least hpd citations per decade each, and his/her other papers have less than hpd + 1 citations per decade each. In contrast with the original h-index, which steadily increases in time, the hpd of a mature scientist is nearly constant over many years, and the hpd of an inactive scientist slowly declines. Therefore hpd is suitable for comparing the scientific output of scientists of different ages. (C) 2009 Elsevier Ltd. All rights reserved. Selection processes are never faultless. We investigated the predictive validity of the manuscript selection process at Angewandte Chemie International Edition (AC-IE), one of the prime chemistry journals worldwide, and conducted a citation analysis for manuscripts that were accepted by the journal or rejected but published elsewhere (n = 1817). With the bibliometric data, we were able to calculate the extent of type I and type II errors of the selection decisions. We found that the decisions regarding 15% of the manuscripts show a type I error (accepted manuscripts that performed equal to or worse than the average rejected manuscript). Moreover, the decisions regarding 15% of the manuscripts are affected by a type II error (rejected manuscripts that performed equal to or above the average accepted manuscript). (C) 2009 Elsevier Ltd. All rights reserved. 
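The h-type indices discussed in the abstracts above (the original h, the Kosmulski h(2), whose threshold is quadratic, and the seniority-independent hpd) all follow the same thresholding pattern and can be computed by one small routine. A minimal Python sketch; the citations-per-decade normalization used here (total citations scaled by 10 divided by the paper's age in years) is an assumption, since the abstract does not spell out the exact formula:

```python
def threshold_index(values, power=1):
    """Largest k such that k of the values are at least k**power."""
    ranked = sorted(values, reverse=True)
    k = 0
    for i, v in enumerate(ranked, start=1):
        if v >= i ** power:
            k = i
        else:
            break  # values only decrease, thresholds only increase
    return k

def h_index(citations):
    """Classic Hirsch index: k papers with at least k citations each."""
    return threshold_index(citations, power=1)

def kosmulski_h2(citations):
    """Kosmulski h(2): k papers with at least k**2 citations each."""
    return threshold_index(citations, power=2)

def hpd_index(papers):
    """hpd on a list of (citations, age_in_years) pairs.
    Per-decade rate = citations * 10 / age is an assumed normalization."""
    per_decade = [c * 10.0 / max(age, 1) for c, age in papers]
    return threshold_index(per_decade, power=1)
```

For example, a citation record of [10, 8, 5, 4, 3] gives h = 4 but h(2) = 2, illustrating how the quadratic threshold makes the Kosmulski index far more selective.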
In response to the call for a science of science policy, we discuss the contribution of indicators at the macro-level of nations from a scientometric perspective. In addition to global trends such as the rise of China, one can relate percentages of world share of publications to government expenditure on academic research. The marginal costs of improving one's share are increasing over time. Countries differ considerably in terms of the efficiency of turning (financial) input into bibliometrically measurable output. Both funding schemes and disciplinary portfolios differ among countries. A price per paper can nevertheless be estimated. The percentages of GDP spent on academic research in different nations are significantly correlated to historical contingencies such as the percentage of researchers in the population. The institutional dynamics make strategic objectives such as the Lisbon objective of the EU, that is, spending 3% of GDP on R&D in 2010, unrealistic. (C) 2009 Elsevier Ltd. All rights reserved. In a recent paper, Egghe [Egghe, L. (in press). Mathematical derivation of the impact factor distribution. Journal of Informetrics] presents a mathematical analysis of the rank-order distribution of journal impact factors. The analysis is based on the central limit theorem. We criticize the empirical relevance of Egghe's analysis. More specifically, we argue that Egghe's analysis relies on an unrealistic assumption, and we show that the analysis is not in agreement with empirical data. (C) 2009 Elsevier Ltd. All rights reserved. The universe of information has been enriched by the creation of the World Wide Web, which has become an indispensable source for research. Since this source is growing at an enormous speed, an in-depth look at its performance has become necessary in order to create a method for its evaluation; however, growth is not the only process that influences the evolution of the Web. 
During their lifetime, Web pages may change their content and links to/from other Web pages, be duplicated or moved to a different URL, be removed from the Web either temporarily or permanently, and be temporarily inaccessible due to server and/or communication failures. To obtain a better understanding of these processes, we developed a method for tracking topics on the Web for long periods of time, without the need to employ a crawler and relying only on publicly available resources. The multiple data-collection methods used allow us to discover new pages related to the topic, to identify changes to existing pages, and to detect previously existing pages that have been removed or whose content is no longer relevant to the specified topic. The method is demonstrated through monitoring Web pages that contain the term "informetrics" for a period of 8 years. The data-collection method also allowed us to analyze the dynamic changes in search engine coverage, illustrated here with Google, the search engine used for the longest period of time for data collection in this project. Understanding the growth models of news stories on disasters is a key issue for efficient disaster management. This article proposes a method to identify three growth models: the Damped Exponential Model, the Normal Model, and the Fluctuating Model. The method is shown to be valid using the 112 disasters occurring between 2003 and 2008. The factors that influence the likelihood of the growth models include disaster types, newsworthy material, disaster severity, and the economic development of the affected area. This article suggests that disaster decision-makers can identify the respective likelihood of the three growth models of news stories when a disaster happens, and thereby implement effective measures in response to the disaster situation. Searching for multimedia is an important activity for users of Web search engines. 
Studying users' interactions with Web search engine multimedia buttons, including image, audio, and video, is important for the development of multimedia Web search systems. This article provides results from a Weblog analysis study of multimedia Web searching by Dogpile users in 2006. The study analyzes the (a) duration, size, and structure of Web search queries and sessions; (b) user demographics; (c) most popular multimedia Web searching terms; and (d) use of advanced Web search techniques including Boolean and natural language. The current study findings are compared with results from previous multimedia Web searching studies. The key findings are: (a) since 1997, image search has consistently been the dominant media type searched, followed by audio and video; (b) multimedia search duration is still short (>50% of searching episodes are <1 min), using few search terms; (c) many multimedia searches are for information about people, especially in audio search; and (d) multimedia search has begun to shift from entertainment to other categories such as medical, sports, and technology (based on the most repeated terms). Implications for the design of Web multimedia search engines are discussed. This article aims at identifying the factors influencing the implementation of Web accessibility (WA) by European banks and its effects on their visibility through the Internet. We studied a database made up of 51 European banks whose shares are included in the Dow Jones EURO STOXX (R) TMI Banks [8300] Index. Regarding the factors for the implementation of WA, we considered two feasible reasons. First, WA adoption can be motivated by operational factors, as WA can contribute to increasing the efficiency of operations. Second, WA can also be understood as a part of the corporate social responsibility (CSR) strategy, so banks that are more committed to CSR should be more prone to implement WA. 
However, our results indicate that neither operational nor social factors seem to have exerted a significant influence on WA adoption. An implication of these findings is the advisability of orienting governmental policies toward making firms aware that WA should be part of banks' CSR activities. Regarding the effects of the implementation of WA, our results indicate that the effort pays dividends in terms of Internet visibility. This could eventually contribute to increasing future revenues and, therefore, the performance of WA-committed banks. This article describes a model for online consumer health information consisting of five quality-criteria constructs. These constructs are grounded in empirical data from the perspectives of the three main sources in the communication process: health information providers, consumers, and intermediaries, such as Web directory creators and librarians, who assist consumers in finding healthcare information. The article also defines five constructs of Web page structural markers that could be used in information quality evaluation and maps these markers to the quality criteria. Findings from correlation analysis and multinomial logistic tests indicate that use of the structural markers depended significantly on the type of Web page and the type of information provider. The findings suggest the need to define genre-specific templates for quality evaluation and the need to develop models for an automatic genre-based classification of health information Web pages. In addition, the study showed that consumers may lack the motivation or literacy skills to evaluate the information quality of health Web pages, which suggests the need to develop accessible automatic information quality evaluation tools and ontologies. Consumer empowerment and the role of the expert patient in their own healthcare, enabled through timely access to quality information, have emerged as significant factors in better health and lifestyle outcomes. 
Governments, medical researchers, healthcare providers in the public and private sectors, drug companies, health consumer groups, and individuals are increasingly looking to the Internet to access and distribute health information, communicate with each other, and form supportive or collaborative online communities. Evaluating the accuracy, provenance, authority, and reliability of Web-based health information is a major priority. The Breast Cancer Knowledge Online Portal project (BCKOnline) explored the individual and changing information and decision-support needs of women with breast cancer and the issues they face when searching for relevant and reliable health information on the Internet. Its user-sensitive research design integrated multidisciplinary methods including user information-needs analysis, knowledge-domain mapping, metadata modeling, and systems-development research techniques. The main outcomes were a personalized information portal driven by a metadata repository of user-sensitive resource descriptions; the BCKOnline Metadata Schema; richer understandings of the concepts of quality, relevance, and reliability; and a user-sensitive design methodology. This article focuses on the innovative, metadata-based quality reporting feature of the BCKOnline Portal, and concludes that it is timely to consider the inclusion of quality elements in resource discovery metadata schemas, especially in the health domain. This article explores the aesthetic design criteria that should be incorporated into the information visualization of a taxonomy intended for use by children. Seven elementary-school students were each asked to represent their ideas in drawings for visualizing a taxonomy. Their drawings were analyzed according to six criteria (balance, equilibrium, symmetry, unity, rhythm, and economy) identified as aesthetic measures in previous research. 
The drawings revealed the presence of all six measures, and three (unity, equilibrium, and rhythm) were found to play an especially important role. It is therefore concluded that an aesthetic design for an information visualization for young users should incorporate all six measures. The aggregated journal-journal citation matrix, based on the Journal Citation Reports (JCR) of the Science Citation Index, can be decomposed by indexers or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two content-based classifications of journals: the ISI Subject Categories and the field/subfield classification of Glanzel and Schubert (2003). The content-based schemes allow for the attribution of more than a single category to a journal, whereas the algorithms maximize the ratio of within-category citations over between-category citations in the aggregated category-category citation matrix. By adding categories, indexers generate between-category citations, which may enrich the database, for example, in the case of interdisciplinary developments. Algorithmic decompositions, on the other hand, are more heavily skewed towards a relatively small number of categories, whereas this is deliberately counteracted in the case of content-based classifications. Because of these indexer effects, science policy studies and the sociology of science should be careful when using content-based classifications, which are made for bibliographic disclosure, not for the purpose of analyzing latent structures in scientific communications. Despite the large differences among them, the four classification schemes enable us to generate surprisingly similar maps of science at the global level. Erroneous classifications are cancelled out as noise at the aggregate level, but may disturb the evaluation locally. Many algorithms have been implemented for the problem of text classification. 
Most of the work in this area was carried out for English text. Very little research has been carried out on Arabic text. The nature of Arabic text is different from that of English text, and preprocessing of Arabic text is more challenging. This paper presents an implementation of three automatic text-classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories has been automatically classified using the kNN, Rocchio, and naive Bayes algorithms. The research results reveal that naive Bayes was the best performer, followed by kNN and Rocchio. Nobel Prizes are an important indicator of research excellence for a country. Spain has not had a science Nobel Prize winner since 1906, although its gross domestic product (GDP) is high, its research and development (R&D) investments, in monetary terms, are high, and its conventional bibliometric parameters are fairly good. Spanish research produces many sound papers that are reasonably cited but does not produce top-cited publications. This absence of top-cited publications suggests that important achievements are scarce and, consequently, explains the absence of Nobel Prize awards. I argue that this negative research trend in Spain is caused by the extensive use of formal research evaluations based on the number of publications, impact factors, and journal rankings. These formal evaluations were introduced to establish a national salary bonus that mitigated the lack of research incentives in universities. When the process was started, the results were excellent, but it has now been kept in place too long and should be replaced by methods that assess the actual interest of the research. However, this replacement requires greater involvement of universities in stimulating research. 
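As an illustration of the best-performing technique in the Arabic text-classification study above, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing. The toy corpus, tokens, and category names are invented for the example and are not from the study's 1445-document corpus; a real Arabic pipeline would additionally need the preprocessing steps the abstract calls challenging (normalization, stemming, stop-word removal):

```python
# Minimal multinomial naive Bayes with add-one smoothing (pure Python).
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (token_list, label) pairs. Returns a model tuple."""
    class_counts = Counter()               # documents per class
    word_counts = defaultdict(Counter)     # token frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label maximizing log P(label) + sum log P(token|label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Illustrative two-category toy corpus (not the study's data).
docs = [(["goal", "match", "team"], "sport"),
        (["team", "league", "win"], "sport"),
        (["stock", "market", "fall"], "finance"),
        (["market", "trade", "rally"], "finance")]
model = train(docs)
```

The kNN and Rocchio baselines from the study differ only in the decision rule (nearest training vectors versus nearest class centroid over the same token counts); the training-time bookkeeping is essentially the same.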
The article analyzes how researchers evaluate the quality of information in Internet e-mailing lists and whether the embeddedness of academic e-mailing lists in offline networks of a scientific community is related to it. The idea is that embeddedness gives researchers more opportunities to build up a reputation through contributing high-quality information to the group. Survey and archive data are used to test hypotheses about the effects of offline networks in 47 academic e-mailing lists used by a random sample of university researchers. Results indicate that the information in embedded online groups is regarded as more valuable than that in purely virtual groups. The importance of information visualization as a means of transforming data into visual, understandable form is now embraced across university campuses and research institutes worldwide. Yet the role of designers in this field of activity is often overlooked by the dominant scientific and technological interests in data visualization, and by a corporate culture reliant on off-the-shelf visualization tools. This article is an attempt to describe the value of design thinking in information visualization with reference to Horst Rittel's (1988) definition of "disorderly reasoning," and to frame design as a critical act of translating between scientific, technical, and aesthetic interests. This article will explore information design as a form of contemporary artistic practice and how artistic and philosophical concepts such as the "performative utterance" operate at the edges of metadata and large-scale technology projects such as the Semantic Web. This article is focused on the changes needed in design to create positive solutions for all involved in design processes. 
It draws upon the rich discussion and discourse from a conference on positive design involving managers, designers, and IT specialists, all concerned with moving beyond problem-based thinking and decision paradigms in order to enhance all phases of the design process and develop sustainable solutions for real issues in a changing world. Therefore, all fields that use design, consciously or not, including management, Information Communication Technology (ICT), and design itself, need to redesign their processes and first rethink their design paradigms at a meta level. In most discussions about information and knowledge management, natural language is described as too fuzzy, ambiguous, and changeable to serve as a basis for the development of large-scale tools and systems. Instead, artificial formal languages are developed and used to represent, hopefully in an unambiguous and precise way, the information or knowledge to be managed. Intertextual semantics (IS) adopts an almost exactly opposite point of view: natural language is the foundation on which information management tools and systems should be developed, and the usefulness of artificial formalisms used in the process lies exclusively in our ability to derive natural language from them. In this article, we introduce IS, its origins, and its underlying hypotheses and principles, and argue that even if its basic principles seem remote from current trends in design, IS is actually compatible with, and complementary to, those trends, especially semiotic engineering (C.S. de Souza, 2005a). We also hint at further possible application areas, such as interface and interaction design, and the design of concrete objects. In 1996, with funding from the Henry Luce Foundation, Jack Lenor Larsen and an advisory committee composed of distinguished museum and design professionals developed "Objects Classified by Mediums" in response to the concern that existing systems do not provide the tools for comparing information on objects. 
A common understanding and definitions of terms are crucial to the success of a classification project meant to cross institutional and national boundaries. "Objects Classified by Mediums" seeks to organize areas of study in fiber, clay, metal, wood, and so on, to allow curators and scholars to compare information on similar methods used, build a conceptual framework for the greater understanding of whole categories of objects rather than of isolated works, and provide a finding tool for cross-cultural and cross-disciplinary investigation. Communicative clothing is emerging as the next generation of intelligent clothing, with communication being achieved between clothing and the wearer or between clothing and the external environment or people. In both cases, "communicative" clothing refers to any clothing or textile accessory that receives or emits information through the structure that composes it. The market for this clothing is foreseen in specific application areas, including professional, healthcare, everyday-life, and sports or leisure uses; however, large-volume production and application in everyday use are still a dream for manufacturers. One of the main roadblocks to the successful adoption of these technologies among fashion designers and retailers is that access to ready materials with which to experiment and develop commercially successful products is limited. Communicative clothing is the result of the integration of a number of different technical elements such as control interfaces, sensors, data-processing devices, output devices, energy sources, and connectors. It is reasonable to expect that these technologies/elements are known and available as standard tools and accessories, usable by fashion designers to add value to any garment ensemble. Incredible innovations are being made in the world of textiles due to collaborations across disciplines that allow the incorporation of technology into textiles. 
The author uses research she did to curate the Cooper-Hewitt National Design Museum's 2005 exhibit Extreme Textiles: Designing for High Performance to further explore developments in textile engineering in aerospace, the military, athletics, and architecture that benefit from technology transfer, or moving a technology developed for one organization or environment into another. Through these collaborations, ways of using "smart" or "electronic" textiles, which can sense and react to their environments, have made significant advancements, developments that have proven useful not only in the fields for which they were intended but across industries. Craft, particularly embroidery, is an important piece of this work, often providing the answer to questions such as how to keep circuitry closed, as with the Antennae Vest, or how to maintain the aesthetics of conductive fabrics, as with the Fuzzy Light Switch. The author demonstrates how textiles, as a craft, fit easily with contemporary technology. Successful professionals in technical disciplines require abilities beyond technical competence: the ability to interpret complex and ambiguous situations, interact with experts from other specialties and disciplines, and constructively evaluate their own work and the work of others. In this article, we argue that experiences and interactions with the arts should play an important role in the education of a specific group of technical workers, information professionals, and that such interactions provide a useful and necessary complement to the more familiar rational, scientific model that currently informs technical professional education. We discuss the principles inherent in an arts-based approach to learning and show how the work done by information professionals is similar to the work done by creative and performing artists as well as those in the design professions. Finally, we describe three examples of complementary learning opportunities built on arts-based practices. 
Over the last few decades, digital technologies have driven deep and profound changes in our relationships with our institutions, communications, and cultures. This process is not only ongoing but also accelerating. For the children who will inevitably grow up in this environment of change, we have done little to update the institution of education. The field of design has a great deal to offer children at this time. The thinking processes and multimodal approaches can, in part, provide the foundation for the skills that children will need for the necessary innovations of the future. The following article makes further recommendations for creativity as the next essential literacy for our children. Creativity and inspiration are essential elements of the fashion design process. Many historic costume collections were founded specifically to educate and inspire designers and students. While traditional research took a hands-on approach to using these collections, students and designers increasingly rely on the Internet and other digital resources for inspiration. Consequently, to remain relevant, costume collections need to adapt to this new way of conducting research. Several projects, such as the Digital Dress Project, the Drexel Digital Museum Project, and the recently launched searchable catalog of The Costume Institute of the Metropolitan Museum of Art, have advanced this process. We investigate different approaches based on correlation analysis to reduce the complexity of a space of quantitative indicators for the assessment of research performance. The proposed methods group bibliometric indicators into clusters of highly intercorrelated indicators. Each cluster is then associated with a representative indicator. The set of all representatives corresponds to a base of orthogonal metrics capturing independent aspects of research performance and can be exploited to design a composite performance indicator. 
We apply the devised methodology to isolate orthogonal performance metrics for scholars and journals in the field of computer science and to design a global performance indicator. The methodology is general and can be exploited to design composite indicators that are based on a set of possibly overlapping criteria. It is often necessary to compare data-rich charts, tables, diagrams, or drawings rather than the articles that contextualize that data. The objective of this research has been to create a database of non-textual components (here, maps) that are searchable independently of the articles from which they are taken, with the option to view the source articles. The method mines words from the articles that are near or associated with each component map, and these mined words become the basis of region, time, and subject indexing. The evaluation showed that automatic indexing of the component maps by these three facets works well, and indicates that a large-scale component database following this model is viable. Concerns about health issues cover a wide spectrum. Consumer health information, which has become more available on the Internet, plays an extremely important role in addressing these concerns. A subject directory as an information organization and browsing mechanism is widely used in consumer health-related Websites. In this study we employed the information visualization technique Self-Organizing Map (SOM) in combination with a new U-matrix algorithm to analyze health subject clusters through a Web transaction log. An experimental study was conducted to test the proposed methods. The findings show that the clusters identified from the same cells based on path-length-1 outperformed both the clusters from the adjacent cells based on path-length-1 and the clusters from the same cells based on path-length-2 in the visual SOM display. 
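The indicator-reduction idea described above, grouping bibliometric indicators into clusters of highly intercorrelated indicators and keeping one representative per cluster, can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the greedy single-link grouping rule, the |r| >= 0.8 threshold, and the indicator names are all assumptions.

```python
# Hedged sketch: cluster indicators by pairwise correlation, then pick a
# representative per cluster. Threshold and grouping rule are illustrative.
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def cluster_indicators(data, threshold=0.8):
    """Greedy single-link grouping: indicators whose pairwise |r| meets
    the threshold end up in the same cluster; returns (clusters, reps)."""
    names = list(data)
    clusters = [{n} for n in names]
    for a, b in combinations(names, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            ca = next(c for c in clusters if a in c)
            cb = next(c for c in clusters if b in c)
            if ca is not cb:
                ca |= cb
                clusters.remove(cb)
    # representative = indicator most correlated with the rest of its cluster
    reps = []
    for c in clusters:
        if len(c) == 1:
            reps.append(next(iter(c)))
        else:
            reps.append(max(c, key=lambda n: sum(
                abs(pearson(data[n], data[m])) for m in c if m != n)))
    return clusters, reps
```

The set of representatives then serves as the base of roughly orthogonal metrics from which a composite indicator could be built; in practice one would use a proper hierarchical clustering library rather than this greedy pass.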
The U-matrix method successfully distinguished the irrelevant subjects situated in the adjacent cells with different colors in the SOM display. The findings of this study lead to a better understanding of the health-related subject relationship from the users' traversal perspective. A longstanding area of study in indexing is the identification of factors affecting vocabulary usage and consistency. This topic has seen a recent resurgence with a focus on social tagging. Tagging data for scholarly articles made available by the social bookmarking Website CiteULike (www.citeulike.org) were used to test inter-indexer/tagger consistency density values, using a method developed by the authors that compares calculations for highly tagged documents representing three subject areas (Science, Social Science, Social Software). The analysis revealed that the developed method is viable for a large dataset. The findings also indicated that there were no significant differences in tagging consistency among the three topic areas, demonstrating that vocabulary usage in a relatively new subject area like social software is no more inconsistent than in the more established subject areas investigated. The implications of the method used and the findings are discussed. To establish whether the compromised need/the label effect is a frequently occurring phenomenon or not, available studies of the phenomenon are examined and claims are compared with evidence. Studies that reportedly have verified the phenomenon are shown to suffer from technical problems that put the claim of verification in doubt. Studies that have reported low percentages of questions changing from the initial query during large-scale studies of user-librarian negotiations could indicate that users are quite often asking for precisely what they want. 
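The consistency density values in the CiteULike study build on pairwise inter-indexer agreement. A classic pairwise measure of this kind is Rolling's inter-indexer consistency, twice the number of shared terms divided by the sum of the two term-set sizes; a minimal sketch, with invented tag sets (the study's own density calculation is more elaborate):

```python
def rolling_consistency(tags_a, tags_b):
    """Rolling's inter-indexer consistency: 2 * |A & B| / (|A| + |B|).
    Returns 1.0 for identical tag sets and 0.0 for disjoint ones."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Invented example: two taggers describing the same article
tagger1 = {"social-software", "tagging", "folksonomy"}
tagger2 = {"tagging", "folksonomy", "web2.0", "citeulike"}
score = rolling_consistency(tagger1, tagger2)  # 2 shared tags out of 3 + 4
```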
However, these studies are found not to be definite falsifications, as the librarians did not conduct in-depth interviews and therefore may have failed to discover the users' real information needs. Whether the compromised need/the label effect is a frequently occurring phenomenon or not cannot be conclusively confirmed or disconfirmed. However, the compromised need/the label effect is not the obvious truism or empirical fact that it has otherwise been claimed to be. Detailed checklists and questionnaires have been used in the past to assess the quality of structured abstracts in the medical sciences. The aim of this article is to report the findings when a simpler checklist was used to evaluate the quality of 100 traditional abstracts published in 53 different social science journals. Most of these abstracts contained information about the aims, methods, and results of the studies. However, many did not report details about the sample sizes, ages, or sexes of the participants, or where the research was carried out. The correlation between the lengths of the abstracts and the amount of information present was 0.37 (p < .001), suggesting that word limits for abstracts may restrict the presence of key information to some extent. We conclude that authors can improve the quality of information in traditional abstracts in the social sciences by using the simple checklist provided in this article. Information Science (IS) is commonly said to study collection, classification, storage, retrieval, and use of information. However, there is no consensus on what information is. This article examines some of the formal models of information and informational processes, namely, Situation Theory and Shannon's Information Theory, in terms of their suitability for providing a useful framework for studying information in IS. 
It is argued that formal models of information are concerned mainly with ontological aspects of information, whereas IS, because of its evaluative role with respect to semantic content, needs an epistemological conception of information. It is argued from this perspective that concepts of epistemological/aesthetic/ethical information are plausible, and that information science needs to rise to the challenge of studying many different conceptions of information embedded in different contexts. This goal requires exploration of a wide variety of tools from philosophy and logic. Information and knowledge are true assets in modern organizations. In order to cope with the need to manage these assets, corporations have invested in a set of practices that are conventionally called knowledge management. This article presents a case study on the development and the evaluation of ontologies that was conducted within the scope of a knowledge management project undertaken by the second largest Brazilian energy utility. Ontologies have different applications and can be used in knowledge management, in information retrieval, and in information systems, to mention but a few. Within the information systems realm, ontologies are generally used as system models, but their usage has not been restricted to software development. We advocate that, once assessed as to its content, an ontology may provide benefits to corporate communication and, therefore, provide support to knowledge management initiatives. We expect to further contribute by describing possibilities for the application of ontologies within organizational environments. In the information systems (IS) field, research interest in attitude has fluctuated over the past decades given the inconsistent and inconclusive findings on attitude's effects on behavioral intention (BI) to use information and communication technology (ICT). 
This study addresses the conceptual, operational, and temporal dynamics of attitude that may have caused the inconsistent and inconclusive results. A longitudinal study was conducted to validate our hypotheses. The results show that: (a) The attitude that significantly influences BI needs to be at a particular specificity with BI on two aspects, the same evaluation target and the same evaluation time, where the time specificity can supersede the target specificity; (b) the relationships among attitudes and intention remain the same if they are measured at the same time, regardless of use stages; (c) the two types of attitudes show different long-lasting effects over time; (d) omitting important mediating factors in a research model may generate misleading messages; and (e) attitudes alone can explain a large amount of variance in BI. The results can help explain the reasons behind inconsistent findings in the literature, inspire additional research efforts, and suggest bringing attitudes back to information systems research due to their theoretical and practical importance. A new method for mapping the semantic structure of science is described. We assume that different researchers, working on the same set of research problems, will use the same words for concepts central to their research problems. Therefore, different research fields and disciplines should be identifiable by different words and the pattern of co-occurring words. In natural language, however, there is considerable diversity because many words have multiple meanings. In addition, the same meaning can be expressed by using different words. We argue that traditional factor analytic and cluster analytic techniques are inadequate for mapping the semantic structure if such polysemous and synonymous words are present. Instead, an alternative model, the mixtures of factor analyzers (MFA) model, is utilized. 
This model extends the traditional factor analytic model by allowing multiple centroids of the dataset. We argue that this model is structurally better suited to map the semantic structure of science. The model is illustrated by a case study of the uncertainty literature sampled from data from the ISI Web of Science. The MFA model is applied with the goal of discovering multiple, potentially incommensurate, conceptualizations of uncertainty in the literature. In this way, the MFA model can help in creating understanding of the use of language in science, which can benefit multidisciplinary research and interdisciplinary understanding, and assist in the development of multidisciplinary taxonomies of science. This study applied Communication Privacy Management (CPM) theory to the context of blogging and developed a validated, theory-based measure of blogging privacy management. Across three studies, 823 college student bloggers completed an online survey. In study one (n=176), exploratory and confirmatory factor analysis techniques tested four potential models. Study two (n=291) cross-validated the final factor structure obtained in the fourth model with a separate sample. Study three (n=356) tested the discriminant and predictive validity of the measure by comparing it to the self-consciousness scale. The Blogging Privacy Management Measure (BPMM) is a multidimensional, valid, and reliable construct. Future research could explore the influence of family values about privacy on blogging privacy rule management. We are witnessing a rapid trend toward the adoption of exercises for evaluation of national research systems, generally based on peer review. They respond to two main needs: stimulating higher efficiency in research activities by public laboratories, and realizing better allocative efficiency in government funding of such institutions. 
However, the peer review approach is typified by several limitations that raise doubts about the achievement of the ultimate objectives. In particular, subjectivity of judgment, which occurs during the step of selecting research outputs to be submitted for the evaluations, risks heavily distorting both the final ratings of the organizations evaluated and the ultimate funding they receive. These distortions become ever more relevant if the evaluation is limited to small samples of the scientific production of the research institutions. The objective of the current study is to propose a quantitative methodology based on bibliometric data that would provide reliable support for the process of selecting the best products of a laboratory, and thus limit distortions. Benefits are twofold: single research institutions can maximize the probability of receiving a fair evaluation coherent with the real quality of their research. At the same time, broader adoption of this approach could also provide strong advantages at the macroeconomic level, since it guarantees financial allocations based on the real value of the institutions under evaluation. In this study the proposed methodology was applied to the hard science sectors of the Italian university research system for the period 2004-2006. Many studies on coauthorship networks focus on network topology and network statistical mechanics. This article takes a different approach by studying micro-level network properties with the aim of applying centrality measures to impact analysis. Using coauthorship data from 16 journals in the field of library and information science (LIS) with a time span of 20 years (1988-2007), we construct an evolving coauthorship network and calculate four centrality measures (closeness centrality, betweenness centrality, degree centrality, and PageRank) for authors in this network. We find that the four centrality measures are significantly correlated with citation counts. 
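Two of the four centrality measures named above, degree centrality and PageRank, can be sketched on a toy coauthorship graph. The author names and edges are invented for illustration; in practice a library such as networkx would be used, and closeness and betweenness computed the same way.

```python
# Hedged sketch: degree centrality and PageRank on a small undirected
# coauthorship graph, represented as an adjacency dict of neighbour sets.
def degree_centrality(graph):
    """Fraction of the other nodes each author is connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank; each node splits its rank equally
    among its neighbours (no dangling nodes in this toy graph)."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {}
        for v in graph:
            incoming = sum(rank[u] / len(graph[u])
                           for u in graph if v in graph[u])
            new[v] = (1 - damping) / n + damping * incoming
        rank = new
    return rank

# Invented coauthorship data: A has coauthored with everyone
coauthors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

On data like this, the study's finding amounts to checking whether such centrality scores correlate (e.g., via Spearman's rank correlation) with each author's citation counts.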
We also discuss the usability of centrality measures in author ranking and suggest that centrality measures can be useful indicators for impact analysis. Sentence ranking is the issue of most concern in document summarization today. While traditional feature-based approaches evaluate sentence significance and rank the sentences relying on the features that are particularly designed to characterize the different aspects of the individual sentences, the newly emerging graph-based ranking algorithms (such as the PageRank-like algorithms) recursively compute sentence significance using the global information in a text graph that links sentences together. In general, the existing PageRank-like algorithms can model well the phenomenon that a sentence is important if it is linked by many other important sentences. In other words, they are capable of modeling the mutual reinforcement among the sentences in the text graph. However, when dealing with multidocument summarization these algorithms often assemble a set of documents into one large file. The document dimension is totally ignored. In this article we present a framework to model the two-level mutual reinforcement among sentences as well as documents. Under this framework we design and develop a novel ranking algorithm such that the document reinforcement is taken into account in the process of sentence ranking. The convergence issue is examined. We also explore an interesting and important property of the proposed algorithm. When evaluated on the DUC 2005 and 2006 query-oriented multidocument summarization datasets, significant results are achieved. Peer-to-peer (P2P) file sharing is a leading Internet application. Millions of users use P2P file-sharing systems daily to search for and download files, accounting for a large portion of Internet traffic. Due to their scale, it is important to fully understand how these systems work. 
We analyze user queries and shared files collected on the Gnutella system, draw some conclusions on the nature of the application, and propose some research problems. We present a rationale for the Hirsch-index rank-order distribution and prove that it is a power law (hence a straight line in the log-log scale). This is confirmed by experimental data of Pyykko and by data produced in this article on 206 mathematics journals. This distribution is of a completely different nature than the impact factor (IF) rank-order distribution which (as proved in a previous article) is S-shaped. This is also confirmed by our example. Only in the log-log scale of the h-index distribution do we notice a concave deviation from the straight line for higher ranks. This phenomenon is discussed. L. Egghe (2008) studied the h-index (Hirsch index) and the g-index, counting the authorship of cited articles in a fractional way. But his definition of the g(F)-index for the case that the article count is fractionalized yielded values that were close to or even larger than the original g-index. Here I propose an alternative definition by which the g-index is modified in such a way that the resulting g(m)-index is always smaller than the original g-index. Based on the interpretation of the g-index as the highest number of articles of a scientist that received on average g or more citations, in the specification of the new g(m)-index the articles are counted fractionally not only for the rank but also for the average. The consistent finding that internationally coauthored papers are more heavily cited has led to a tacit agreement among politicians and scientists that international collaboration in scientific research should be particularly promoted. However, existing studies of research collaboration suffer from a major weakness in that the Thomson Reuters Web of Science until recently did not link author names with affiliation addresses. 
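The g-index that both fractional variants modify is defined as the highest rank g such that the g most-cited papers together received at least g² citations. A minimal sketch of the unmodified index (citation counts invented; the fractional g(F)- and g(m)-variants additionally divide counts by the number of coauthors):

```python
def g_index(citations):
    """Largest g such that the g most-cited papers have at least
    g**2 citations in total (Egghe's g-index, integer counting)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

# Invented example: five papers with 10, 5, 3, 2, and 1 citations;
# the top 4 papers have 20 >= 16 citations, the top 5 only 21 < 25.
```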
The general approach has been to hierarchically code papers into international paper, national paper, or local paper based on the address information. This hierarchical coding scheme severely understates the level and contribution of local or national collaboration on an internationally coauthored paper. In this research, I code collaboration variables by hand checking each paper in the sample, use two measures of a paper's impact, and try several regression models. I find that both international collaboration and local collaboration are positively and significantly associated with a paper's impact, but international collaboration does not have more epistemic authority than local collaboration. This result suggests that previous findings based on hierarchical coding might be misleading. By their widespread availability and dissemination through open access media, scholarly outputs witness an improved visibility supposed to cause a better citation performance. However, due to the existence of the Matthew effect in the science system, which affects users' perceptions of quality, the ultimate effects of the enhanced visibility on different entities are obscure. Moreover, different attitudes towards open access give rise to stronger quality dynamics in the open access world. Aiming to explore the consequence of the interaction between visibility and quality dynamics, this study investigates countries' positioning in open access journals. The results show that the world's countries embrace the open access model, whether by submitting to or publishing open access journals. A large proportion of the enduring, prestigious open access journals are published by scientifically proficient and developing nations, emphasizing their successful commitment to maintain the undertaken role. 
The results of the citation analysis highlight national inequalities regarding citation distributions among countries contributing to the journals within the system and within individual disciplines in the system. Well-performing countries mainly consist of advanced ones; however, some less-developed nations are found to perform well in the journal system. Author co-citation analysis (ACA) is an important method for discovering the intellectual structure of a given scientific field. Since traditional ACA was confined to ISI Web of Knowledge (WoK), the co-citation counts of pairs of authors mainly depended on the data indexed in WoK. Fortunately, Google Scholar has integrated different academic databases from different publishers, providing an opportunity to conduct ACA over a wider range. In this paper, we conduct ACA of information science in China with the Chinese Google Scholar. Firstly, a brief introduction of Chinese Google Scholar is made, including retrieval principles and data formats. Secondly, the methods used in our paper are given. Thirdly, the 31 most important authors of information science in China are selected as research objects. In the empirical study, factor analysis is used to find the main research directions of information science in China. Pajek, a powerful tool in social network analysis, is employed to visualize the author co-citation matrix as well. Finally, the resemblances and the differences between China and other countries in information science are pointed out. This paper reports on the practice of bioinformatics research in South Africa using bibliometric techniques. The search strategy was designed to cover the common concepts in biological data organisation, retrieval and analysis; the development and application of tools and methodologies in biological computation; and related subjects in genomics and structural bioinformatics. The South African literature in bioinformatics has grown by 66.5% between 2001 and 2006. 
However, its share of world production is not on par with comparator countries Brazil, India, and Australia. The objective is to identify the flow of information and get to know the socio-spatial and socio-institutional dimensions of knowledge in the process of innovation, and to be able to visualize the impact and cognitive relationships of the sources of information used in the production of patents, as well as the interactions and social cooperation that exist between the local innovative agents of The State University of Campinas. The research is of an exploratory nature with a case study design, in order to find out, by means of patentometric indicators, the flow and social relations characterized by cognitive and institutional aspects of local and regional knowledge based on the production of the Institution's patents. In scientometric trend analysis, parameter choices for observing trends have often been made ad hoc in past studies. For example, different year spans might be used to create the time sequence and different indices chosen for trend observation. However, the effectiveness of these choices was hardly known, quantitatively and comparatively. This work provides clues to better interpret the results when a certain choice was made. Specifically, by sorting research topics in decreasing order of interest predicted by a trend index and then by evaluating this ordering based on information retrieval measures, we compare a number of trend indices (percentage of increase vs. regression slope), trend formulations (simple trend vs. eigen-trend), and options (various year spans and durations for prediction) in different domains (safety agriculture and information retrieval) with different collection scales (72,500 papers vs. 853 papers) to know which one leads to better trend observation. Our results show that the slope of linear regression on the time series performs consistently better than the others. 
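The winning trend index, the slope of an ordinary least-squares line fitted to a topic's yearly time series, can be sketched as follows. The topic names and yearly counts are invented; the study's actual evaluation then scores such orderings with information retrieval measures.

```python
# Hedged sketch: rank research topics by the OLS slope of their yearly
# frequency series; a higher slope predicts rising interest.
def trend_slope(counts):
    """Slope of the least-squares line through (year_index, count)."""
    n = len(counts)
    xs = range(n)
    mx = (n - 1) / 2
    my = sum(counts) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, counts))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def rank_topics(topic_series):
    """Sort topic names by decreasing predicted interest (slope)."""
    return sorted(topic_series,
                  key=lambda t: trend_slope(topic_series[t]),
                  reverse=True)

# Invented yearly paper counts for three topics over five years
topics = {
    "rising": [1, 2, 3, 4, 5],
    "flat": [3, 3, 3, 3, 3],
    "falling": [5, 4, 3, 2, 1],
}
```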
More interestingly, this index is robust under different conditions and is hardly affected even when the collection is split into arbitrary (e.g., only two) periods. Implications of these results are discussed. Our work not only provides a method to evaluate trend prediction performance for scientometrics, but also provides insights and reflections for past and future trend observation studies. A frequently used indicator for assessing technological strengths of nations is patents registered in the triad region, i.e., in North America, Europe, and Asia. Currently these so-called triadic patents are defined as filed at the United States Patent and Trademark Office (USPTO), the European Patent Office (EPO), and the Japanese Patent Office (JPO). Recent developments suggested that this definition might lack adequacy regarding the offices in Europe and Asia. Our findings suggest that in particular Germany and China should be added to this triad definition since in some technology fields patents registered in these countries show the same citation impact as patents registered at the EPO or JPO. Our results also underline that the number of triadic patent families per country is a function of technological specialization and (national) patenting strategies. In this paper we investigate the position of a review network within a research specialty: the network of scholars who write reviews of their colleagues' work. This is one of the voluntary activities that researchers perform as a prerequisite for the functioning of the invisible college. We compare this network to other networks within the specialty, and this allows us to distinguish various roles: stars, influentials, members, reviewers and juniors. As scholars are characterized by different role-configurations, the invisible college becomes stratified. 
We discuss the implications for the development of a referee factor and review factor, norms for refereeing and reviewing, and the development of systems-based research evaluations. This paper surveys 32 renowned Industrial Engineering (IE) journals with regard to authorship for the period of 1996-2005. The findings show that the USA was the top contributing country, accounting for approximately one-third of the total number of articles. The 80/20 rule and the entropy measure consistently identify Issues in Science and Technology (IST), Industrial Engineer (IE), and R&D Magazine (RDM) as journals of high country concentration, or journals of low internationality. Conversely, Journal of Materials Processing Technology (JMPT), Production Planning & Control (PPC), and Technovation (TNV) have the highest degree of country diversity, or internationality. The quality of a journal (as expressed by impact factors), its internationality, and its number of articles published are found to be independent of each other. The state of the art on the issue of sex differences in research efficiency agrees in recognizing higher performance for males; however, there are divergences in explaining the possible causes. One of the causes advanced is that there are sex differences in the availability of aptitude at the "high end". By comparing sex differences in concentration and performance of Italian academic star scientists to the case in the population complement, this work aims to verify if star, or "high-end", scientists play a preponderant role in determining higher performance among males. The study reveals the existence of a greater relative concentration of males among star scientists, as well as a performance gap between male and female star scientists that is greater than for the rest of the population. In the latter subpopulation the performance gap between the two sexes is seen as truly marginal. 
This research analyzes a "who cites whom" matrix in terms of aggregated journal-journal citations to determine the location of communication studies on the academic spectrum. Using the Journal of Communication as the seed journal, the 2006 data in the Journal Citation Reports are used to map communication studies. The results show that social and experimental psychology journals are the most frequently used sources of information in this field. In addition, several journals devoted to the use and effects of media and advertising are weakly integrated into the larger communication research community, whereas communication studies are dominated by American journals. Bibliographic records are extensively used in the study of citations. Based on ISI data, this paper examines citation patterns of the publications of South African scientists in recent years. In particular, the focus of this paper is on citations in relation to the collaborative dimensions of South African scientists in their publications. The study reveals that the number of citations received by a publication varies not only according to the collaboration but also to the types of collaboration of the authors who are involved in its production. Furthermore, it emerges that the impact of citations on publications differs from discipline to discipline, and from affiliating sector to sector, regardless of collaboration. The state authorities in Germany used to fund public sector research without controlling the performance of the research units. This has changed during the past decade: the dominant mechanism by which formerly unconditional state funds are now allocated is indicator-based performance measurement. The indicator sets used to measure the research-related performance in the German public science sector are usually very narrow, often consisting exclusively of finished doctoral theses and third-party funds. 
Using a unique dataset of 473 German research units from astrophysics, nanotechnology, economics and biotechnology, this paper outlines principles for the construction of sensible indicator sets for the performance measurement of scientific research groups. It is argued that scientific production is multidimensional. Thus, one-sided indicator sets that fail to cover the relevant output dimensions give rise to incentives that will ultimately lower the performance of the science sector as a whole. Indicator sets should strive for sustainable incentives, which can be guaranteed if the sets are broad enough. As a starting point it is shown that the very common performance indicator 'acquired third-party funds' may affect research efficiency negatively, especially if the level of third-party funds is already very high. Therefore, we conclude that third-party funds should be used with great care, if at all. English is becoming the international language in numerous fields of human civilization. We sought to evaluate the extent of use of English in biomedical publications. We searched PubMed for the number of articles written in each of the 57 indexed languages during each of the past four 10-year periods. The share of English as the publication language of articles included in PubMed has gradually risen from 62.3% of indexed articles in 1967-1976 to 74.0% in 1977-1986 and 83.4% in 1987-1996, reaching 89.3% in 1997-2006. The percentage of articles written in each of the other languages was less than 1.6% for 1997-2006. Apart from English, only the percentage of articles written in Chinese rose between the 1967-1976 and 1997-2006 periods (from 0.05% to 1.49%). In conclusion, the dominance of English in biomedical publications archived by the most commonly used database is impressive and increasing. 
This fact may have several consequences, favourable or not, for various aspects of scientific production. Though opinions differ as to the significance of the order in which authors appear in research papers, the most widely accepted convention assigns the greatest responsibility to the first and last authors. In this work, we study author order in research papers with at least one author affiliated with the University of Extremadura (Spain), published in the period 1990-2005. The objective is to determine the difference in position between men and women, and the resulting responsibility and visibility of female as opposed to male authors. At the University of Extremadura these positions are principally occupied by men: throughout the period studied, no more than 20% of the papers have a woman in either the first or the last position, while the corresponding percentage for men is around 50%, the remainder being occupied by authors not at present belonging to the UEx. Nevertheless, the women of the University of Extremadura show both a higher than expected percentage and a positive evolution in the more relevant positions in recent years. The conventional view depicts scientific communities in the developing world as globally isolated and dependent. Recent studies suggest that individual scientists tend to favor either local or international ties. Yet there are good reasons to believe that both kinds of ties are beneficial for knowledge production. Since they allow for the more efficient management of social networks, Internet technologies are expected to resolve this inverse relationship. They are also expected to decentralize access to resources within developing regions that have traditionally reflected an urban male bias. 
Elaborating upon science, development and social network perspectives, we examine the impact of the Internet on the Chilean scientific community, addressing the questions 'to what extent are Internet use and experience associated with the size of foreign and domestic professional networks?' and 'are professional network resources equitably distributed across regional and demographic dimensions?' We offer results from a communication network survey of 337 Chilean researchers working in both academic departments and research institutes. We introduce a new measure, 'collaboration range', to indicate the extent to which scientists engage in work with geographically dispersed contacts. Results suggest that larger foreign networks are associated with higher email use and diversity, whereas local networks are smaller among those with longer use of the Internet. Diversity of email use is also associated with geographically diverse networks. Moreover, Internet use may be reducing the significance of international meetings for scientific collaboration and networking. Finally, results also show that in the Internet age professional network resources are distributed symmetrically throughout the Chilean scientific community. We analyze and evaluate the information provided on the web by Spanish public universities about their assessment and quality processes, with the aim of detecting aspects for improvement and identifying best practices in universities that could act as a benchmark for the rest of the sector. A tested model/template incorporating a set of criteria and indicators is used to determine the quality of this information. The strengths and weaknesses of institutional websites are analyzed both at the individual level and as a whole; the possible relation between website quality and the characteristics of the universities is also examined. 
We show that the invariant method [Pinski & Narin, 1976], recently axiomatised by Palacios-Huerta & Volij [2004] and used to quality-rank academic journals, is subject to manipulation: a journal can boost its performance by making additional citations to other journals. The aim of this paper is to model and study the age of the Web using a sample of about four million web pages from the 16 European Research Area countries, obtained during 2004 and 2005. Web page time-stamps (the date when a page was created or last changed), formats and sizes in bytes have been analysed. Several indicators are introduced to measure longitudinal aspects of the Web. Half-age is proposed as a measure of the age distribution, because this distribution is found to be exponential. A "Web Update Index" and a "Lifespan Index" are introduced to measure the rate of change of a small sample over time. Results show that the British Web space has the youngest Web pages, while the Greek and Belgian ones have the oldest. The study also compared Web page topics and found that Biology pages are more stable than Physics pages. This paper studies four different h-index sequences (differing in publication periods and/or citation periods). Lotkaian models for these h-index sequences are presented by mutual comparison of one sequence with another. We also give graphs of these h-sequences for this author, on which a discussion is presented. The same is done for the g-index and the R-index. Using Institute for Scientific Information (ISI) data, this paper calculated institutional self-citation rates (ISCRs) for 96 of the top research universities in the United States for 2005-2007. Exhibiting temporal patterns similar to those of author and journal self-citations, the ISCR was 29% in the first year post-publication and decreased significantly in the second year post-publication (19%). 
Modeling the data via power laws revealed that total publications and citations did not correlate with the ISCR, but did correlate highly with ISCs. The California Institute of Technology exhibited the highest ISCR, at 31%. Academic and cultural factors are discussed in relation to ISCRs. A number of proxy measures have been used as indicators of journal quality. The most recent and most commonly employed are journal impact factors. These measures are somewhat controversial, although they are frequently referred to in establishing the impact of published journal articles. Within psychology, little is known about the relationship between the 'objective' impact factors of journals and the 'subjective' ratings of prestige and perceived publishing difficulty among academics. In order to address this, a cross-sectional web-based survey was conducted in the UK to investigate research activity and academics' views of journals within three fields of psychology: cognitive, health and social. Impact factors for each journal were correlated with individual academics' perceptions of prestige and publishing difficulty for that journal. A number of variables pertaining to the individual academics and their place of work were assessed as predictors of these correlation values, including age, gender, institution type, and a measure of departmental research activity. The implications of these findings are discussed in relation to perceptions of journal prestige and publishing difficulty, higher education in general, and the assessment of research activity within academic institutions. A citation analysis was carried out on the most important research journals in the field of Catalan literature between 1974 and 2003. The indicators and qualitative parameters obtained show the value of performing citation analysis in cultural and linguistic areas that are poorly covered by the A&HCI. 
Catalan literature shows a pattern similar to that of the humanities in general, but it may still be in a stage of consolidation because too little work has as yet been published. Editorial delay, the time between submission and acceptance of scientific manuscripts, was investigated for a set of 4,540 papers published in 13 leading food research journals. Accelerated papers were defined as those falling in the lower quartile of the distribution of editorial delay for the journals investigated; delayed papers are those in the upper quartile. The editorial stage is tied to the peer-review process, and two variables were investigated in search of any bias in editorial review that could influence publication delay: the manuscript's country of origin and the authors' previous publishing experience in the same journal. A ranking of countries was established based on contributions to the leading food research journals in the period 1999-2004, and four categories comprising heavy, medium, light and occasional producer countries were defined. Chi-square tests show significant differences in the country provenance of manuscripts for only one journal. The results for the influence of cross-national research and international collaboration on editorial delay, obtained by means of the Fisher test, were similar. A two-tailed Student's t test shows significant differences (p < 0.05) in the distribution of experienced and novice authors across the delayed and accelerated groups of papers. Although these results are limited in time and discipline, it can be concluded that authors' publishing experience leads to faster review and acceptance of their papers, and that neither country of provenance nor cross-national research influences the time involved in editorial acceptance. A form of normalisation is presented for the evaluation of citation data on multidisciplinary research. 
This method is based on the existing classification by publishing journal rather than on the classification of output according to ISI subject categories. A publication profile is created for each institution to be investigated. This profile accounts for the weight of publications in each journal, represented by the number of publications as a proportion of the total output of the institution. In accordance with this weight, the citation rate of each journal is compared to a qualified relative indicator. The final result is a relative citation rate J, which reflects the performance of an institution while accounting for its publication and citation habits, and which makes transdisciplinary comparison possible. This article investigated the contributions of natural rubber (NR) research through research articles and patents in the Science Citation Index Expanded (SCI-Expanded) and SCOPUS databases, and related the results to production and export volumes during 2002-2006. 1,771 research papers and 5,686 patents on "natural rubber" were retrieved from the databases. The results revealed that the top five producers of raw NR, in order of production volume, were Thailand, Indonesia, Malaysia, Vietnam and China, whereas the top producers of synthetic rubber were the United States, China, Japan, Russia and Germany. Among the top three NR-producing countries, Malaysia produced NR mainly for its own use, whereas Thailand and Indonesia still had higher export volumes. Research articles and patents on natural rubber accounted for about 20.9% and 47.5% of all rubber publications, respectively. Patents on natural rubber were found to increase with time, while the number of research articles remained unchanged. The Journal of Applied Polymer Science was the preferred venue for publishing research papers on rubber. 
The top eight countries contributing research articles on natural rubber were the United States, India, Malaysia, France, Germany, Thailand, Japan and China; a similar country distribution was found for research articles on synthetic styrene-butadiene rubber, except for Thailand and Malaysia. No linear relationship between production-export volume and the number of research publications was observed, but the results implied that the growth rate for commercializing rubber was greater than that for research and development of natural rubber. Most NR research focused on neat NR, contributed mostly by the USA, while NR-blend and NR-composite papers were mainly published by Indian researchers. In the grant peer review process we can distinguish various evaluation stages in which assessors judge applications on a rating scale. Bornmann et al. [2008] show that latent Markov models offer a sound statistical framework for modeling peer review processes. The main objective of this short communication is to test the influence of applicants' gender on the modeling of a peer review process using latent Markov models. We found differences between applications for a doctoral fellowship submitted by male and female applicants in the transition probabilities from one stage to the next. The study examines aspects of both neo-colonial ties and neo-colonial science in research papers produced by Central African countries. The primary focus is on the extent and pattern of neo-colonial ties and other foreign participation in the co-authorship of Central African research papers. The analysis revealed that 80% of Central Africa's research papers are produced in collaboration with a partner from outside the region. Moreover, 46% of papers are produced in collaboration with European countries as the only partner, and 35% in collaboration with past colonial rulers. 
The top collaborating countries are France (32%), the USA (14%), and the UK and Germany (both 12%). Foreign powers also facilitate the production of regionally and continentally co-authored papers in Central Africa, with European countries participating in 77% of regionally co-authored papers. The practice of neo-colonial science, on the other hand, features in a survey of reprint authors of Cameroonian papers. The survey investigated the specific contributions made by Cameroonian co-authors to the research processes underlying a paper. Cameroonian researchers contribute intellectually and conceptually to the production of research papers, irrespective of whether the collaboration involves partners from past colonial or non-colonial countries. Their most frequent role in collaborative research with foreign researchers remains the conduct of fieldwork. The measurement of research activity remains a controversial question. The impact factor from the Institute for Scientific Information (ISI) is nowadays widely used to carry out evaluations of all kinds; however, the calculation formula employed by ISI to construct its impact factors biases the results in favour of knowledge fields that are better represented in the sample, cite more on average, and whose citations are concentrated in the early years of the articles. In the present work, we put forward a theoretical proposal for how aggregated normalization should deal with these biases, allowing scientific production to be compared between fields, institutions and/or authors in a neutral manner. The technical complexity of such work, together with data limitations, leads us to propose some adjustments to the impact factor proposed by ISI which, although they do not completely solve the problem, reduce it and point the way towards more neutral evaluations. 
The proposal is applied empirically at three levels of analysis: single journals, knowledge fields, and the full set of journals in the Journal Citation Reports. The quality and credibility of Internet resources have been a concern in scholarly communication. This paper reports a quantitative analysis of the use of Internet resources in journal articles and addresses concerns about the use of Internet resources in scholarly journal articles. We collected the references listed in 35,698 articles from 14 journals published from 1996 to 2005, which yielded 1,000,724 citations. The citation data were divided into two groups, traditional citations and Web citations, and examined by frequency of occurrence, domain, and type of Web citation source. The findings included: (1) the number of Web citations in the journals investigated had been increasing steadily, though the quantity was too small to support firm conclusions about their impact on scientific research; (2) a great disparity existed among disciplines in the use of information on the Web: applied disciplines and interdisciplinary sciences tended to cite more Web information, while classical and experimental disciplines cited little; (3) the frequency of citation was related to the reputation of the author or the institution issuing the information, not to the domain or type of webpage; and (4) researchers seemed to lack confidence in Internet resources, and Web information was not cited as frequently as some earlier publications reported. The paper also discusses the need to develop a guideline system for evaluating Web resources with regard to their authority and quality, which lie at the core of the credibility of Web information. In this study, a series of relative indicators is used to compare differences in research performance in biomedical fields between ten selected Western and Asian countries. 
Based on Thomson's Essential Science Indicators (ESI) 1996-2006, the output of papers and their citations in ten biomedical fields are compared at multiple levels using relative indicators. Chart diagrams and hierarchical clustering are applied to represent the data. The results confirm that there are many differences in intra- and interdisciplinary scientific activities between the West and the East. In most biomedical fields Asian countries perform below the world average. Based on two large data samples from ISI databases, the author evaluated the Hirsch model, the Egghe-Rousseau model, and the Glanzel-Schubert model of the h-index. The results support the Glanzel-Schubert model as the better estimation of the h-index at both the journal and the institution level. If h_c, h_p and h_pc stand for the Hirsch, Egghe-Rousseau and Glanzel-Schubert estimations, respectively, then the inequality h_p < h ≈ h_pc < h_c holds in most cases. Although there are many measures of the centrality of individuals in social networks, and those centrality measures can be applied to the analysis of authors' importance in co-authorship networks, the distribution of an author's collaborative relationships among different communities has not been considered. This distribution, or extensity, is an important aspect of authors' activity. In the present study, we propose a new measure, termed extensity centrality, that takes into account the distribution of an author's collaborative relationships. In computing the strength of collaborative ties, which is closely related to extensity centrality, we choose Salton's measure. We choose the ACM SIGKDD data as our test data set, and analyze the resulting author rankings from different points of view. 
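Salton's measure of collaborative tie strength, chosen above, is a cosine-style normalisation: the number of jointly authored papers divided by the geometric mean of the two authors' individual outputs. A minimal sketch with hypothetical counts (not the SIGKDD data):

```python
import math

def salton_strength(coauthored, papers_i, papers_j):
    """Salton's cosine measure: co-authored papers normalised by the
    geometric mean of the two authors' individual paper counts."""
    return coauthored / math.sqrt(papers_i * papers_j)

# Hypothetical example: authors with 20 and 5 papers share 4 co-authored papers.
print(salton_strength(4, 20, 5))  # 4 / sqrt(20 * 5) = 0.4
```

The normalisation keeps the tie strength in [0, 1] and prevents a prolific author's raw co-authorship counts from dominating the network.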
The paper pursues a rigorous mathematical study of the Hirsch index and shows that it has a power-law upper tail, determining the exponent, provided that the underlying publication and citation distributions have fat tails as well. The result is demonstrated on the distribution of the Hirsch index of journals. The paper concludes with some further remarks on the Hirsch index. The Internet has made it possible to move towards researcher and article impact instead of focusing solely on journal impact. To support citation measurement, several indexes have been proposed, including the h-index. The h-index, however, provides only a point estimate. To address this, a new index is proposed that takes the citation curve of a researcher into account. This article introduces the index, illustrates its use, and compares it to rankings based on the h-index as well as rankings based on publications. It is concluded that the new index provides added value, since it balances citations and publications through the citation curve. In this paper we present a study of scientific production in Computer Science in Brazil and several other countries, as measured by the number of articles in journals and conference proceedings indexed by ISI and by Scopus. We compare the Brazilian production from 2001 to 2005 with that of some Latin American, Latin European, BRIC (Brazil, Russia, India, China), and other relevant countries (South Korea, Australia and the USA). We also classify and compare these countries according to the ratio of publications in journals to those in conferences (among those indexed by the two services). The results show that Brazil has by far the largest production among Latin American countries, a production about one third of Spain's and one fourth of Italy's, and about the same as India and Russia. 
The growth in Brazilian publications during the period places the country in the mid-range group, and the distribution of Brazilian production according to impact factor is similar to that of most countries. I review and discuss instances in which 19 future Nobel Laureates encountered resistance on the part of the scientific community towards their discoveries, and instances in which 24 future Nobel Laureates encountered resistance on the part of scientific journal editors or referees to manuscripts that dealt with discoveries that would later earn them the Nobel Prize. In general information production processes (IPPs), we define productivity as the total number of sources, but we present a choice of seven possible definitions of performance: the mean or median number of items per source, the fraction of sources with a certain minimum number of items, and the h-, g-, R- and h_w-indices. We give an overview of the literature on different types of IPPs, in each case interpreting "performance" for the concrete case. Examples are found in informetrics (including webometrics and scientometrics), linguistics, econometrics and demography. In Lotkaian IPPs we study these interpretations of "performance" as a function of productivity. We show that the mean and median number of items per source, as well as the fraction of sources with a certain minimum number of items, are increasing functions of productivity if and only if the Lotkaian exponent is a decreasing function of productivity. We show that this property implies that the g-, R- and h_w-indices are increasing functions of productivity and, finally, that this property implies that the h-index is an increasing function of productivity. We conclude that the h-index is the indicator that best exhibits the increasing relation between productivity and performance. 
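The h-, g- and R-indices that recur throughout these studies can all be computed directly from a list of per-paper citation counts. A minimal sketch using the standard definitions (the citation list is made up for illustration):

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def r_index(citations):
    """Square root of the total citations received by the h-core."""
    ranked = sorted(citations, reverse=True)
    return math.sqrt(sum(ranked[:h_index(citations)]))

cites = [10, 8, 5, 4, 3, 2, 1, 0]
print(h_index(cites), g_index(cites))  # 4 5
```

Here the h-core is the top 4 papers (each with at least 4 citations), while the g-index rises to 5 because the top 5 papers accumulate 30 >= 25 citations; R rewards the citation weight of the h-core itself.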
For many years, the ISI Web of Knowledge from Thomson Reuters was the sole publication and citation database covering all areas of science, making it an invaluable tool in bibliometric analysis. In 2004, Elsevier introduced Scopus, which is rapidly becoming a good alternative. Several attempts have been made to compare these two instruments from the point of view of journal coverage, for research or for bibliometric assessment of research output. This paper attempts to answer the question all researchers ask: what is to be gained by searching both databases? Or, if you are forced to opt for one of them, which should you prefer? To answer this question, a detailed paper-by-paper study is presented of the coverage achieved by ISI Web of Science and by Scopus of the output of a typical university. After considering the set of Portuguese universities, the detailed analysis is carried out for two of them for 2006, chosen for a comprehensiveness typical of most European universities. The general conclusion is that about 2/3 of the documents referenced in either database may be found in both, while a fringe of 1/3 is referenced in only one or the other. The citation impact of the core documents present in both databases is higher, but the impact of the fringe present in only one database should not be disregarded, as some high-impact documents may be found among them. In this paper we report research results investigating microblogging as a form of electronic word-of-mouth for sharing consumer opinions concerning brands. We analyzed more than 150,000 microblog postings containing branding comments, sentiments, and opinions. We investigated the overall structure of these microblog postings, the types of expressions, and the movement in positive or negative sentiment. We compared automated methods of classifying sentiment in these microblogs with manual coding. 
Using a case study approach, we analyzed the range, frequency, timing, and content of tweets in a corporate account. Our research findings show that 19% of microblogs mention a brand. Of the branding microblogs, nearly 20% contained some expression of brand sentiment. Of these, more than 50% were positive and 33% were critical of the company or product. Our comparison of automated and manual coding showed no significant differences between the two approaches. In analyzing microblogs for structure and composition, we find that the linguistic structure of tweets approximates the linguistic patterns of natural language expressions. We find that microblogging is an online tool for customer word-of-mouth communication, and discuss the implications for corporations using microblogging as part of their overall marketing strategy. The purposes of this study were to explore college students' perceptions, uses of, and motivations for using Wikipedia, and to understand their information behavior concerning Wikipedia based on social cognitive theory (SCT). A Web survey was used to collect data in the spring of 2008. The study sample consisted of students from an introductory undergraduate course at a large public university in the midwestern United States. A total of 134 students participated in the study, resulting in a 32.8% response rate. The major findings of the study include the following: approximately one-third of the students reported using Wikipedia for academic purposes. The students tended to use Wikipedia for quickly checking facts and finding background information. They had positive past experiences with Wikipedia; however, interestingly, their perceptions of its information quality were not correspondingly high. The level of their confidence in evaluating Wikipedia's information quality was, at most, moderate. 
Respondents' past experience with Wikipedia, their positive emotional state, their disposition to believe information in Wikipedia, and information utility were positively related to their outcome expectations of Wikipedia. However, among the factors affecting outcome expectations, only information utility and respondents' positive emotions toward Wikipedia were related to their use of it. Further, when all of the independent variables, including the mediator, outcome expectations, were considered, only information utility was related to Wikipedia use, which may imply a limited applicability of SCT to understanding Wikipedia use. However, more empirical evidence is needed to determine the applicability of this theory to Wikipedia use. Finally, this study supports the knowledge value of Wikipedia (Fallis, 2008), despite students' cautious attitudes toward it. The study suggests that educators and librarians need to provide better guidelines for using Wikipedia, rather than prohibiting Wikipedia use altogether. arXiv.org mediates contact with the literature for entire scholarly communities, providing both archival access and daily email and web announcements of new materials. We confirm and extend a surprising correlation between article position in these initial announcements and later citation impact, due primarily to intentional "self-promotion" by authors. There is, however, also a pure "visibility" effect: the subset of articles accidentally placed in early positions fared measurably better in the long-term citation record. Articles in astrophysics (astro-ph) and two large subcommunities of theoretical high energy physics (hep-th and hep-ph) announced in position 1, for example, received median numbers of citations 83%, 50%, and 100% higher, respectively, than those lower down, while the subsets placed there accidentally had 44%, 38%, and 71% visibility boosts. We also consider the positional effects on early readership. 
The median numbers of early full-text downloads for astro-ph, hep-th, and hep-ph articles announced in position 1 were 82%, 61%, and 58% higher than for lower positions, respectively, and those placed there accidentally had medians visibility-boosted by 53%, 44%, and 46%. Finally, we correlate a variety of readership features with long-term citations, using machine learning methods, and conclude with some observations on impact metrics and the dangers of recommender mechanisms. Gaining a rapid overview of an emerging scientific topic, sometimes called a research front, is an increasingly common task due to the growing amount of interdisciplinary collaboration. Visual overviews that show temporal patterns of paper publication and citation links among papers can help researchers and analysts to see the rate of growth of topics, identify key papers, and understand influences across subdisciplines. This article applies a novel network-visualization tool, based on meaningful layouts of nodes, to present research fronts and show citation links that indicate influences across them. To demonstrate the value of two-dimensional layouts with multiple regions and user control of link visibility, we conducted a design-oriented, preliminary case study with 6 domain experts over a 4-month period. The main benefits were being able (a) to easily identify key papers and see the increasing number of papers within a research front, and (b) to quickly see the strength and direction of influence across related research fronts. This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors, and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form an author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. 
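A plain (unweighted) PageRank with a tunable damping factor, of the kind varied in the study above, can be sketched as a simple power iteration. The toy graph below is hypothetical, and for brevity the sketch assumes every node has at least one out-link (no dangling nodes), which the authors' weighted variants would handle differently:

```python
def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbours]}.
    d is the damping factor (the study varies it from 0.05 to 0.95).
    Assumes no dangling nodes, so scores always sum to 1."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        pr = {
            v: (1 - d) / n
               + d * sum(pr[u] / len(adj[u]) for u in nodes if v in adj[u])
            for v in nodes
        }
    return pr

# Hypothetical toy graph: node A is linked from both B and C.
toy = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
ranks = pagerank(toy, d=0.85)
```

Lowering `d` pulls every score towards the uniform 1/n teleport baseline, which is why author rankings can shift as the damping factor is swept across its range.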
In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and also with the different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not correlate significantly with centrality measures but does correlate significantly with the other measures. The key factor affecting an author's PageRank in the author co-citation network is being co-cited with important authors. This article elaborates the picture of information use from the perspective of interpreting informational cues about the attributes of entities. It is assumed that such activity draws on cognitive mechanisms that are employed as the constituents of diverse interpretation approaches to informational cues. The empirical data of the study were gathered by means of the think-aloud method from 16 prospective homebuyers in 2008. The participants interpreted informational cues available in announcements published in a printed housing listing issue and in a Web-based information system serving the needs of prospective homebuyers. The data were examined by means of qualitative content analysis. Drawing on the findings of Zhang and her associates, the study revealed 7 cognitive mechanisms: identification of key attributes, specification, evaluation, comparison by similarity, comparison by differentiation, explanation, and conclusion. Three major approaches employed in the interpretation of informational cues were identified. The descriptive-evaluative approach draws on the identification and evaluation of individual attributes of an entity.
The comparative approach is more sophisticated because it is based on the evaluation of the attributes by their perceived similarity or differentiation. Finally, the explanatory approach draws on the identification of attributes with causal potential. Reading is a common everyday activity for most of us. In this article, we examine the potential for using Wikipedia to fill in the gaps in one's own knowledge that may be encountered while reading. If gaps are encountered frequently while reading, then this may detract from the reader's final understanding of the given document. Our goal is to increase access to explanatory text for readers by retrieving a single Wikipedia article that is related to a text passage that has been highlighted. This approach differs from traditional search methods where the users formulate search queries and review lists of possibly relevant results. This explicit search activity can be disruptive to reading. Our approach is to minimize the user interaction involved in finding related information by removing explicit query formulation and providing a single relevant result. To evaluate the feasibility of this approach, we first examined the effectiveness of three contextual algorithms for retrieval. To evaluate the effectiveness for readers, we then developed a functional prototype that uses the text of the abstract being read as context and retrieves a single relevant Wikipedia article in response to a passage the user has highlighted. We conducted a small user study where participants were allowed to use the prototype while reading abstracts. The results from this initial study indicate that users found the prototype easy to use and that using the prototype significantly improved their stated understanding and confidence in that understanding of the academic abstracts they read. 
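The single-result contextual retrieval step described above can be sketched with TF-IDF cosine scoring over candidate articles. This is a sketch under stated assumptions: the toy "articles," the TF-IDF weighting, and the simple concatenation of highlighted passage and surrounding context are illustrative choices, not the paper's three contextual algorithms.

```python
# Sketch of contextual retrieval: build a query from the highlighted
# passage plus its surrounding text, score candidate articles by
# TF-IDF cosine similarity, and return only the single best match.

import math
from collections import Counter

def tfidf_vectors(docs):
    # document frequency per term, then tf * idf weights per document
    df = Counter(t for d in docs for t in set(d.split()))
    n = len(docs)
    return [{t: c * math.log((1 + n) / (1 + df[t]))
             for t, c in Counter(d.split()).items()} for d in docs]

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_article(highlight, context, articles):
    query = highlight + " " + context      # highlighted passage + context
    vecs = tfidf_vectors(articles + [query])
    qv, avs = vecs[-1], vecs[:-1]
    return max(range(len(articles)), key=lambda i: cosine(qv, avs[i]))

# toy candidate "Wikipedia articles"
articles = ["latent semantic analysis of text corpora",
            "glass production in antiquity",
            "citation networks and impact metrics"]
i = best_article("semantic analysis", "reading about text corpora", articles)
print(articles[i])
```

Returning a single top-ranked result rather than a ranked list is the key interface decision: it removes explicit query formulation and result review, which is what the prototype evaluates.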
In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first analyze statistically the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem. As an increasing volume of information becomes available across all disciplines, many approaches, such as document clustering and information visualization, have been proposed to help users manage information easily. However, most of these methods do not directly extract key concepts and their semantic relationships from document corpora, which could help better illuminate the conceptual structures within given information.
To address this issue, we propose an approach called "Clonto" to process a document corpus, identify the key concepts, and automatically generate ontologies based on these concepts for the purpose of conceptualization. For a given document corpus, Clonto applies latent semantic analysis to identify key concepts, allocates documents based on these concepts, and utilizes WordNet to automatically generate a corpus-related ontology. The documents are linked to the ontology through the key concepts. Based on two test collections, the experimental results show that Clonto is able to identify key concepts, and outperforms four other clustering algorithms. Moreover, the ontologies generated by Clonto show significant informative conceptual structures. In this paper we examine how information-particularly, its organization and presentation-and "space" (i.e., a physical location) can be combined to create a particular "place" (i.e., a location adapted to a particular purpose) for engaging and stabilizing homeless young people, aged 13-25. Over 10 months, we used a participatory-design research approach to investigate how an alliance of nine service agencies used information resources to support homeless young people. We collected 250 information resources and analyzed how these materials were organized and presented at four service agencies. In general, the agencies used ad hoc organizational schemes and presentations that were not in keeping with the key values of the alliance, which include human welfare, respect, trust, autonomy, and sustainability. To improve information delivery and the projection of common values, we followed a two-step design process. First, based on a card-sorting activity, we developed a new organizational scheme. Second, we developed four interrelated prototypes for presenting information resources: Rolling Case, InfoBike, Slat Wall, and Infold. 
To convey the use of these prototypes, three short video scenarios were created to demonstrate how the prototypes would be used by stakeholders, including homeless young people, staff, and volunteers. Feedback from stakeholders suggested that these prototypes, when sufficiently refined, could be useful and operationally viable. By investigating the concept of "place," reconstituted through organizational schemes and novel presentations of information resources, this work creates possibilities that may allow grassroots service agencies to give more efficient access to information while expressing their values. This paper describes ePaper, a research prototype system of a personalized newspaper on a mobile reading device. The ePaper aggregates content (i.e., news items) from various news providers, classifies the news items according to concepts from a news domain ontology, and delivers an electronic newspaper to each subscribed user (reader). The system personalizes the content of the newspaper according to the user's profiles and preferences by applying ontological content-based and collaborative filtering algorithms. The user's profile is updated implicitly and dynamically, based on the tracking of their reading. Beyond personalization, the ePaper can also provide the user with a "standard edition" of a selected newspaper, as well as browsing capabilities in a repository of news items. The layout of the newspaper is adapted to the specifications of the reading device and to the user's preferences. In this overview paper, we highlight the main research challenges involved in the development of ePaper and describe how we addressed them. This paper presents and compares three feature reduction techniques that were applied to Arabic text. The techniques include stemming, light stemming, and word clusters. The effects of the aforementioned techniques were studied and analyzed on the K-nearest-neighbor classifier. Stemming reduces words to their stems. 
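The stemming-style feature reduction mentioned above, in particular the "light" variant that strips common affixes without reducing a word to its root, can be sketched as follows. The affix lists here are a small illustrative subset in transliteration, not the actual Arabic affix inventory used in the study.

```python
# Sketch of light stemming: remove one common prefix and one common
# suffix, guarding against over-shortening. The affix lists are
# illustrative assumptions (transliterated), not the paper's inventory.

PREFIXES = ["al", "wa", "fa", "bi"]    # e.g., definite article, conjunctions
SUFFIXES = ["at", "an", "in", "un", "ha"]

def light_stem(word, min_len=3):
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word

print(light_stem("alkitabat"))  # -> kitab
```

Mapping many surface forms onto one light stem shrinks the document vectors (fewer distinct terms) while preserving more word identity than full root stemming, which is the trade-off the experiments measure.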
Light stemming, by comparison, removes common affixes from words without reducing them to their stems. Word clustering groups synonymous words into clusters, and each cluster is represented by a single word. The purpose of employing these methods is to reduce the size of document vectors without affecting the accuracy of the classifiers. The comparison metrics include the size of the document vectors, classification time, and accuracy (in terms of precision and recall). Several experiments were carried out using four different representations of the same corpus: the first version uses stem vectors, the second uses light-stem vectors, the third uses word clusters, and the fourth uses the original words (without any transformation) as representatives of documents. The corpus consists of 15,000 documents that fall into three categories: sports, economics, and politics. In terms of vector size and classification time, the stemmed vectors consumed the smallest space and the least time necessary to classify a testing dataset of 6,000 documents. The light-stemmed vectors surpassed the other three representations in classification accuracy. When applying for patents, companies should consider building patent portfolios as a means of integrating their patent strategy with their overall business strategy. This is an important issue for any company in pursuit of enhanced operational performance, because the whole raison d'être behind applying for patents is the anticipation of achieving maximum competitive advantage. A prerequisite for such a company is a decision-analysis model of patent portfolios, because this has the added advantage of being readily applicable to evaluating the quality of its competitors' portfolios; thus, by understanding both itself and its competitors, a company can attain a superior position.
To demonstrate this, we examine patent priority networks (PPNs) formed through patent family members and claimed priority patents, constructing a model for patent portfolio analysis and then deriving the corresponding algorithms. We suggest that information retrieved from this network can provide a useful reference tool for decision making by company CEOs, CTOs, R&D managers, and intellectual property managers. Using a power-law model, the two best-known topics in citation analysis, namely the impact factor and the Hirsch index, are unified into one relation (not a function). The validity of our model is, at least in a qualitative way, confirmed by real data. This study proposes a new visualization method and index for collection evaluation. Specifically, it develops a network-based mapping technique and a user-focused Hirsch index (user-side h-index), given the lack of previous studies on collection evaluation methods that have used the h-index. A user-side h-index is developed and compared with previous indices (use factor, difference of percentages, collection-side h-index) that represent the strengths of the subject classes of a library collection. The mapping procedure includes the subject-usage profiling of 63 subject classes and the generation of collection-usage maps through the pathfinder network algorithm. Cluster analyses are then conducted on the pathfinder network to generate 5 large and 14 small clusters. The nodes represent the strengths of the subject-class usages as reflected by the user-side h-index. The user-side h-index was found to have advantages (e.g., better demonstrating the real utility of each subject class) over the other indices. It can also more clearly distinguish the strengths of the subject classes than the collection-side h-index can. These results may help to identify the actual usage and strengths of subject classes in library collections through visualized maps.
This may be a useful rationale for establishing a collection-development plan. We show that between 1999 and 2008 the percentage of articles with more than one corresponding author, or with several authors that contributed equally, leading to so-called "equal first authors," steadily rose. Increasing numbers of corresponding authors and equally contributing authors may place increased stress on teamwork if not properly acknowledged in research evaluation exercises. The objective of this study is to conduct a bibliometric analysis of all biological invasions-related publications in the Science Citation Index (SCI) from 1991 to 2007. The indicator citations per publication (CPP) was used to evaluate the impact of articles, journals, and institutions. In the 3323 articles published in 521 journals, 7261 authors from 1905 institutions in 100 countries participated. As the most productive country in biological invasions research, the US will benefit from more collaboration between institutions, countries, and continents. In addition, an analysis of keywords was applied to reveal research trends. The mean number of citations per paper is increasingly used as a simple metric for indicating the impact of a journal or comparing journal rankings. While convenient, we suggest that it has limitations given the highly skewed distributions of citations per paper in a wide range of journals. Two broad classes of scientific impact indices are proposed and their properties, both theoretical and practical, are discussed. These new classes were obtained as a geometric generalization of well-known tools applied in scientometrics, such as Hirsch's h-index, Woeginger's w-index, and Kosmulski's Maxprod. It is shown how to apply the suggested indices to estimate the shape of the citation function or the total number of citations of an individual. Additionally, a new efficient and simple O(log n) algorithm for computing the h-index is given.
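The idea behind an O(log n) h-index computation can be sketched as a binary search over citation counts held in non-increasing order: find the largest h such that the h-th paper has at least h citations. This is a sketch of the idea, not necessarily the paper's exact algorithm, and it assumes the counts are already sorted (sorting itself costs O(n log n) once).

```python
# h-index by binary search: given citation counts sorted in
# non-increasing order, find the largest h with cites[h-1] >= h.
# The search is O(log n) once the list is sorted.

def h_index_sorted(cites_desc):
    lo, hi = 0, len(cites_desc)        # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2       # bias upward to avoid looping
        if cites_desc[mid - 1] >= mid:
            lo = mid                   # mid papers each have >= mid cites
        else:
            hi = mid - 1
    return lo

print(h_index_sorted([10, 8, 5, 4, 3]))  # -> 4
print(h_index_sorted([25, 8, 5, 3, 3]))  # -> 3
```

The loop invariant (the true h always stays within [lo, hi]) is what makes the logarithmic bound work: each comparison halves the candidate interval.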
We provide a comprehensive and critical review of the h-index and its most important modifications proposed in the literature, as well as of other similar indicators measuring research output and impact. Extensions of some of these indices are presented and illustrated. Bibliographic data on the ophthalmology, optometry, and visual science (OOVS) literature of China, drawn from the SCI-Expanded database and covering the period 2000-2007 (961 publications), were analyzed to create a comprehensive overview of research output. Of the 961 articles, 480 were published in 2006 and 2007. The majority of researchers worked in university hospitals (53%), and 21% of the publications included one or more international co-authors. For each article, the average number of authors was 4.96 +/- 2.73, which increased from 3.96 in 2000 to 5.36 in 2007. The most cited references came from Investigative Ophthalmology & Visual Science and Ophthalmology. The greatest number of studies focused on the retina. This study develops and tests an integrated conceptual model of basic research evaluation from a varying perspective. The main objective is to obtain a more complete understanding of the external factors affecting publicly funded basic research in a country. Structural Equation Modeling (SEM) with Partial Least Squares (PLS) is used to test the conceptual model with empirical data collected from the WCY (World Competitiveness Yearbook) and ESI (Essential Science Indicators) databases. Interrelationships among research output and outcome, together with three external factors (resources, impetus, and accumulative advantage), have been successfully explored, and the conceptual model of research evaluation has been examined. This paper aims to demonstrate briefly that major scientific achievements spread through the Internet according to an exponential expression until reaching a saturation point.
This paper presents a methodology for measuring the improvements in efficiency and adjustments in the scale of R&D (Research & Development) activities. For this purpose, this study decomposes academic productivity growth into components attributable to (1) world academic frontier change, (2) R&D efficiency change, (3) human capital accumulation, and (4) capital accumulation. The world academic frontier at each point in time is constructed using data envelopment analysis (DEA). This study calculates each of the above four components of academic productivity for 27 countries over 1990-2003, and finds that the components which contribute to academic productivity growth vary with the different countries' characteristics and development stages. Human capital has more weight in terms of the quantity of academic research, and capital accumulation plays a more important role in the citation impact of academic research. In the last two decades there have been studies claiming that science is becoming ever more interdisciplinary. However, the evidence has been anecdotal or partial. Here we investigate how the degree of interdisciplinarity has changed between 1975 and 2005 over six research domains. To do so, we compute well-established bibliometric indicators alongside a new index of interdisciplinarity (Integration score, aka Rao-Stirling diversity) and a science mapping visualization method. The results attest to notable changes in research practices over this 30 year period, namely major increases in number of cited disciplines and references per article (both show about 50% growth), and co-authors per article (about 75% growth). However, the new index of interdisciplinarity only shows a modest increase (mostly around 5% growth). Science maps hint that this is because the distribution of citations of an article remains mainly within neighboring disciplinary areas. 
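The Integration score (Rao-Stirling diversity) used as the index of interdisciplinarity above can be sketched directly from its definition: the sum over pairs of cited disciplines of p_i * p_j * d_ij, where p_i is the share of references going to discipline i and d_ij is a cognitive distance between disciplines. The proportions and distance matrix below are made-up illustrations, not the study's data.

```python
# Rao-Stirling diversity (Integration score):
#   D = sum over i != j of p_i * p_j * d_ij
# p_i: share of an article's references in discipline i
# d_ij: cognitive distance between disciplines i and j (toy values here)

def rao_stirling(p, d):
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)

p = [0.5, 0.3, 0.2]                 # reference shares across 3 disciplines
d = [[0.0, 0.2, 0.9],               # small distance = neighboring fields
     [0.2, 0.0, 0.8],
     [0.9, 0.8, 0.0]]

print(round(rao_stirling(p, d), 3))  # -> 0.336
```

The formula makes the abstract's finding concrete: if new citations go mostly to neighboring fields (small d_ij), the score barely moves even as the number of cited disciplines grows, which is why a 50% growth in cited disciplines can coexist with only about 5% growth in the Integration score.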
These findings suggest that science is indeed becoming more interdisciplinary, but in small steps, drawing mainly from neighboring fields and only modestly increasing the connections to distant cognitive areas. The combination of metrics and overlay science maps provides general benchmarks for future studies of interdisciplinary research characteristics. Scientific journals claim that correspondence sections are for post-publication peer review. We compared the conditions for submission and the bibliometrics of letters to the editor published in leading medical journals in 2002 and 2007, using journal-derived information and data from PubMed and Journal Citation Reports. The median time limit for letter submissions decreased from 6 to 3.5 weeks, and the median word limit from 400 to 350. The median number of letters per published article was near one in both years. Only about half of the letters were followed by an author reply in either year. Electronic response systems were available for four journals in 2007. I examine whether the professionalization of science, a process that unfolded between 1600 and 1899, afforded better opportunities for young scientists to make significant discoveries. My analysis suggests that the professionalization of the sciences did make it a little easier for scientists to make significant contributions at a younger age. But I also argue that it is easy to exaggerate the effects of professionalization: older and middle-aged scientists continued to play an important role in making significant discoveries throughout the history of modern science. We carry out a bibliometric study of the activity of astronomers in the field of Herbig-Haro (HH) objects. Through an appropriate choice of keywords, we recover the papers on HH objects from the ADS (Astrophysics Data Service) and ISI ("Web of Knowledge") databases. The numbers of papers and citations recovered from the two databases differ by roughly 10%.
We analyze an 11-year period, restricting ourselves to authors with at least 10 papers within the period. We analyze the number of papers and citations, as well as the H index, of this set of authors. Within this sample, we identify the authors belonging to Mexican institutions. We find that the Mexican researchers perform very well, having higher publication and citation rates than those of the full sample of authors active in the field of HH objects. The Mexicans have a degree of specialization (measured as the ratio between production in the chosen field and the total production of the individual authors) similar to that of the full sample. They collaborate in somewhat larger groups than the authors of the full sample. Finally, we have carried out a study of the impact in the chosen field of different astronomical journals. We find that the Revista Mexicana de Astronomia y Astrofisica is well placed in the "second tier" of astronomical publications. The h-index is becoming a reference tool for career assessment, and it is starting to be considered by some agencies and institutions in promotion, allocation, and funding decisions. In areas where h indices tend to be low, individuals with different research accomplishments may end up with the same h. This paper proposes a multidimensional extension of the h index in which the conventional h is only the first component. Additional components of the multidimensional index are obtained by computing the h-index for the subset of papers not considered in the immediately preceding component.
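The iterative definition just given, computing the ordinary h, removing the h-core, and repeating on the remaining papers, can be sketched directly. A minimal sketch of the definition, with toy citation counts:

```python
# Multidimensional h-index: the first component is the ordinary h;
# each further component is the h-index of the papers left over after
# removing the previous h-core.

def h_index(cites):
    cites = sorted(cites, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
    return h

def multidimensional_h(cites):
    cites = sorted(cites, reverse=True)
    components = []
    while cites:
        h = h_index(cites)
        if h == 0:
            break
        components.append(h)
        cites = cites[h:]          # drop the h-core, recompute on the rest
    return components

print(multidimensional_h([10, 9, 5, 4, 4, 3, 2, 1]))  # -> [4, 2, 1, 1]
```

Two researchers with the same first component (the conventional h) can differ in the later components, which is exactly how the extension breaks ties among low-h individuals.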
Computation of the multidimensional index for 204 faculty members in Departments of Methodology of the Behavioral Sciences in Spain shows that individuals with the same h can indeed be distinguished by their values in the remaining components, and that the strength of the correlation of the second and third components of the multidimensional index with alternative bibliometric indicators is similar to that of the first component (i.e., the original h). This paper examines how Korean technological development is linked with scientific activities and spreads to industrial fields through knowledge flows. It empirically assesses the linkages between scientific and technological knowledge flows and technological innovation by determining whether the quantity and quality of the scientific papers cited by, and the knowledge absorbed in, Korean patents filed in the USPTO varied over time and between technology fields. We conducted MANOVA and then canonical discriminant analysis. Our findings are that the patterns of both the absorption of scientific knowledge and the diffusion of technological knowledge differ by period and by field, and that the speed of knowledge diffusion differs by technology field. This implies that the time required for Korean investment in basic and applied research to impact its industrial innovation differs by technology field. This study aims at detecting the role of individual journals and uncovering structural patterns of information flow among scientific journals in a cross-citation network, using different bibliometric indicators and statistical methods of data analysis. Beyond measuring the individual journals' positions within the communication network, we shed light on their cognitive background as well. Language barriers and a lack of internationality proved to be among the main hindrances to integration into the communication network. Moreover, some document types hinder journals from establishing self-links.
Contrary to our expectations, we found a clear divergence between strongly interlinked and high-entropy journals. Furthermore, the analysis of strong links among different fields allows the detection of highly interdisciplinary journals. Based on data from the Science Citation Index Expanded (SCIE) and using scientometric methods, we conducted a systematic analysis of Chinese regional contributions and international collaboration in terms of scientific publications, publication activity, and citation impact. We found that regional contributions are highly skewed. The top positions, measured by number of publications or citations and by share of publications or citations, are taken by almost the same set of regions. However, this is not the case when indicators of relative citation impact are used. A comparison between regional scientific output and R&D expenditure shows that Spearman's rank correlation coefficient between the two indicators is rather low among the leading publication regions. We studied the influence of the number of citations, the number of citable items, and the number of journal self-citations on increases in the impact factor (IF) in 123 journals from the Journal Citation Reports database in which this scientometric indicator had decreased during the previous four years. In general, we did not find evidence that abuse of journal self-citations contributed to the increase in the impact factor after several years of decreases. Social tagging is one of the major phenomena transforming the World Wide Web from a static platform into an actively shared information space.
This paper addresses various aspects of social tagging, including different views on the nature of social tagging, how to make use of social tags, and how to bridge social tagging with other Web functionalities; it discusses the use of facets to facilitate browsing and searching of tagging data; and it presents an analogy between bibliometrics and tagometrics, arguing that established bibliometric methodologies can be applied to analyze tagging behavior on the Web. Based on the Upper Tag Ontology (UTO), a Web crawler was built to harvest tag data from Delicious, Flickr, and YouTube in September 2007. In total, 1.8 million objects, including bookmarks, photos, and videos, 3.1 million taggers, and 12.1 million tags were collected and analyzed. Some tagging patterns and variations are identified and discussed. This article reports on a longitudinal study of information seeking by undergraduate information management students. It describes how they found and used information, and explores their motivation and decision making. We employed a use-in-context approach where students were observed conducting, and were interviewed about, information-seeking tasks carried out during their academic work. We found that participants were reluctant to engage with a complex range of information sources, preferring to use the Internet. The main driver for progress in information seeking was the immediate demands of their work (e.g., assignments). Students used their growing expertise to justify a conservative information strategy, retaining established strategies as far as possible and completing tasks with minimum information-seeking effort. The time cost of using library material limited the uptake of such resources. New methods for discovering and selecting information were adopted only when immediately relevant to the task at hand, and tasks were generally chosen or interpreted in ways that minimized the need to develop new strategies. 
Students were driven by the demands of the task to use different types of information resources, but remained reluctant to move beyond keyword searches, even when they proved ineffective. They also lacked confidence in evaluating the relative usefulness of resources. Whereas existing literature on satisficing has focused on stopping conditions, this work has highlighted a richer repertoire of satisficing behaviors. With increasing sophistication in technology has emerged a growing interest in accessing images for personal and work purposes. In this research we investigated the use of images as data-for the information contained within the image, and as an object to illustrate. Thirty journalists and historians from academic and professional work settings were interviewed using a series of semistructured questions regarding how they use images (for information or for illustration) and the types of image attributes used to describe an appropriate image for their work. This was done within the context of a work task model used by this group to understand how images are used throughout the process of completing a typical written work task. Findings suggest that the stage of the work task process has a significant impact on how the image is used (information or illustration). Participants use as many descriptive as conceptual image attributes to locate an image, but, interestingly, there are no significant differences according to use for information or illustration purposes. This study increases our understanding of the function of images in the written work task process, and provides new knowledge about the conceptual and descriptive attributes that are most valued. For projects in knowledge-intensive domains, it is crucially important that knowledge management systems are able to track and infer workers' up-to-date information needs so that task-relevant information can be delivered in a timely manner. 
To put a worker's dynamic information needs into perspective, we propose a topic variation inspection model to facilitate the application of an implicit relevance feedback (IRF) algorithm and collaborative filtering in user modeling. The model analyzes variations in a worker's task-needs for a topic (i.e., personal topic needs) over time, monitors changes in the topics of collaborative actors, and then adjusts the worker's profile accordingly. We conducted a number of experiments to evaluate the efficacy of the model in terms of precision, recall, and F-measure. The results suggest that the proposed collaborative topic variation inspection approach can substantially improve the performance of a basic profiling method adapted from the classical RF algorithm. It can also improve the accuracy of other methods when a worker's information needs are vague or evolving, i.e., when there is a high degree of variation in the worker's topic-needs. Our findings have implications for the design of an effective collaborative information filtering and retrieval model, which is crucial for reusing an organization's knowledge assets effectively. We investigated the role of subjective factors in the information search process. Forty-eight participants each conducted six Web searches in a controlled setting. We examined relationships between subjective factors (happiness levels, satisfaction with and confidence in the search results, feeling lost during search, familiarity with and interest in the search topic, estimation of task difficulty) and objective factors (search behavior, search outcomes, and search-task characteristics). Data analysis was conducted using a multivariate statistical test (canonical correlations analysis). The findings confirmed existence of several relationships suggested by prior research, including relationships between objective search task difficulty and the perception of task difficulty, and between subjective states and search behaviors and outcomes. 
One of the original findings suggests that higher happiness levels before and during the search correlate with better feelings after the search, but also correlate with worse search outcomes and lower satisfaction, suggesting that, perhaps, it pays off to feel some "pain" during the search to "gain" quality outcomes. Prior research has shown that social interaction is important for continuation of question-and-answer (Q&A) activity online and that it also leads to monetary rewards. The present research focuses on the link between social interaction and the value of information. Expressions of self-presentation in the interaction between askers and answerers online are studied as antecedents for answer feedback which represents the value of the answer to the asker. This relationship is examined in a Q&A site, specifically, in Google Answers (GA). The results of content analysis performed on sets of questions and answers show that both explicit and implicit social cues are used by the site's participants; however, only implicit expressions of self-presentation are related to the provision of social and monetary feedback, ratings, and tips. This finding highlights the importance of implicit cues in textual communication and lends support to the notion of social capital where both monetary and social forms of feedback are the result of interaction online. User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. 
We propose a rule-based approach comprising two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages over learning-based approaches. When using scientific literature to model scholarly discourse, a research specialty can be operationalized as an evolving set of related documents. Each publication can be expected to contribute to the further development of the specialty at the research front. The specific combinations of title words and cited references in a paper can then be considered as a signature of the knowledge claim in the paper: New words and combinations of words can be expected to represent variation, while each paper is at the same time selectively positioned into the intellectual organization of a field using context-relevant references. Can the mutual information among these three dimensions (title words, cited references, and sequence numbers) be used as an indicator of the extent to which intellectual organization structures the uncertainty prevailing at a research front? The effect of the discovery of nanotubes (1991) on the previously existing field of fullerenes is used as a test case. Thereafter, this method is applied to science studies with a focus on scientometrics using various sample delineations. An emerging research front about citation analysis can be indicated. 
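The two-phase idea (sentence-level scoring, then document-level aggregation) can be illustrated with a minimal sketch; the tiny lexicon, the one-token negation rule, and the majority-sign aggregation here are crude stand-ins for the paper's actual dependency-based rules, assumed purely for illustration:

```python
POSITIVE = {"good", "excellent", "great"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATORS = {"not", "never"}

def sentence_sentiment(tokens):
    """Phase 1 (simplified): lexicon polarity per token, flipped when
    the immediately preceding token is a negator."""
    score, negate = 0, False
    for tok in tokens:
        t = tok.lower()
        if t in NEGATORS:
            negate = True
            continue
        polarity = (t in POSITIVE) - (t in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False
    return score

def document_sentiment(sentences):
    """Phase 2 (simplified): aggregate sentence scores by overall sign."""
    total = sum(sentence_sentiment(s) for s in sentences)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"
```

A document whose sentences score -1 ("not good") and -1 ("terrible service") aggregates to "negative".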
This study uses citations, from 1996 to 2007, to the work of 80 randomly selected full-time information studies (IS) faculty members from North America to examine differences between Scopus and Web of Science in assessing the scholarly impact of the field, focusing on the most frequently citing journals, conference proceedings, research domains and institutions, as well as all citing countries. Results show that when assessment is limited to smaller citing entities (e.g., journals, conference proceedings, institutions), the two databases produce considerably different results, whereas when assessment is limited to larger citing entities (e.g., research domains, countries), the two databases produce very similar pictures of scholarly impact. In the former case, the use of Scopus (for journals and institutions) and of both Scopus and Web of Science (for conference proceedings) is necessary to more accurately assess or visualize the scholarly impact of IS, whereas in the latter case, assessing or visualizing the scholarly impact of IS is independent of the database used. In the traditional model of information retrieval, searchers and indexers choose query and index terms, respectively, and these term choices are ultimately compared in a matching process. One of the main challenges in information science and information retrieval is that searchers and indexers often do not choose the same term even though the item is relevant to the need, whereas at other times they do choose the same term even though it is not relevant. But if both searchers and indexers have the opportunity to review feedback data showing the success or failure of their previous term choices, then there exists an evolutionary force that, all else being equal, will lead to helpful convergence in searchers' and indexers' term usage when the information is relevant, and helpful divergence of term usage when it is not. 
Based on learning theory, and on new theory presented here, it is possible to predict which terms will emerge as the terminological conventions that are used by groups of searchers and the indexers of relevant and nonrelevant information items. Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to determine whether they are complementary to one another. This paper describes and evaluates various stemming and indexing strategies for the Russian language. 
We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant. This article uses an evidence-based approach to assess the difficulties faced by developing country scientists in accessing scientific literature. I compare the backward citation patterns of Swiss and Indian scientists in a database of 43,150 scientific papers published by scientists from either country in 2007. Controlling for fields and quality with citing journal fixed effects, I find that Indian scientists have shorter reference lists (-6%) and are more likely to cite articles from open access journals (+50%). Moreover, the difference in the length of the reference list is more pronounced in biology and medicine, where circulation of (free) preprints and conference proceedings is non-existent. Informal file-sharing practices among scientists mitigate the effects of access restrictions. 
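The language-independent n-gram approach mentioned above indexes overlapping character substrings rather than stems, sidestepping morphology entirely; a minimal sketch (the underscore padding convention and n = 4 are assumptions for illustration, not the paper's exact settings):

```python
def char_ngrams(word, n=4):
    """Split a word into overlapping character n-grams for
    language-independent indexing. Boundary markers ('_') let
    prefix and suffix grams be distinguished from inner ones."""
    padded = f"_{word}_"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

For example, `char_ngrams("books")` yields `["_boo", "book", "ooks", "oks_"]`; morphological variants of a word then share most of their grams, which is why the technique works for highly inflected languages such as Russian.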
Digital artifacts have novel properties that largely derive from the processes that mediate their creation, and that can be best understood by a close examination of such processes. This paper introduces the concept of "quasi-object" to characterize these objects and elucidate the activities that comprise their mediations. A case study of "bugs" is analyzed to illustrate exemplary activities of justification, qualification, and binding in the process of bug fixing in Free/Open Source Software development. The findings of the case study lead to broader reflections on the character of digital artifacts in general. The relationship of "quasi-object" to other similar concepts is explored. Memes are small units of culture, analogous to genes, which flow from person to person by copying or imitation. More than any previous medium, the Internet has the technical capabilities for global meme diffusion. Yet, to spread globally, memes need to negotiate their way through cultural and linguistic borders. This article introduces a new broad method, Web memetics, comprising extensive Web searches and combined quantitative and qualitative analyses, to identify and assess: (a) the different versions of a meme, (b) its evolution online, and (c) its Web presence and translation into common Internet languages. This method is demonstrated through one extensively circulated joke about men, women, and computers. The results show that the joke has mutated into several different versions and is widely translated, and that translations incorporate small, local adaptations while retaining the English versions' fundamental components. In conclusion, Web memetics has demonstrated its ability to identify and track the evolution and spread of memes online, with interesting results, albeit for only one case study. 
Simple bibliometric indicators, such as the average number of citations per publication per researcher, or the recently proposed Hirsch index (h-index), are nowadays tracked by online repositories, including Web of Science (WOS), and often affect critical decision making. This work proposes appropriate scaling of the h-index based on its probability distribution, which is calculated for any underlying citation distribution. The proposed approach outperforms existing index estimation models that have focused on the expected value only (i.e., the first moment). Furthermore, it is shown that the average number of citations per publication per scientific field, the total number of publications per researcher, as well as the researcher's measured h-index value, expected value, and standard deviation constitute the minimum information required for meaningful h-index ranking campaigns; otherwise, contradictory ranking results emerge. This work may potentially shed light on (current or future) large-scale, h-index-based bibliometric evaluations. Citation counts are increasingly used to assess the impact on the scientific community of publications produced by a researcher, an institution or a country. There are many institutions that use bibliometric indicators to steer research policy and for hiring or promotion decisions. Given the importance that counting citations has today, the aim of the work presented here is to show how citations are distributed within a scientific area and to determine the dependence of the citation count on the article features. All articles referenced in the Web of Science in 2004 for Biology & Biochemistry, Chemistry, Mathematics and Physics were considered. We show that the distribution of citations is well represented by a double exponential-Poisson law. There is a dependence of the mean citation rate on the number of co-authors, the number of addresses and the number of references, although this dependence deviates somewhat from linear behaviour. 
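For reference, the h-index at the center of these ranking discussions is the largest h such that a researcher has h papers with at least h citations each; a straightforward computation from a list of citation counts:

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations.

    Sort descending; the condition c >= rank then holds exactly
    on a prefix of the ranked list, so we can stop at the first failure.
    """
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h
```

For example, a researcher with papers cited 10, 8, 5, 4, and 3 times has h = 4 (four papers with at least four citations each).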
The dependence of the mean impact on the number of pages was found to be very low. For Biology & Biochemistry and Chemistry we found a linear behaviour between the mean citation per article and the impact factor, and for Mathematics and Physics the results obtained are close to linear behaviour. (C) 2009 Elsevier Ltd. All rights reserved. Based on the rank-order citation distribution of, e.g., a researcher, one can define certain points on this distribution, hereby summarizing the citation performance of this researcher. Previous work of Glanzel and Schubert defined these so-called "characteristic scores and scales" (CSS), based on average citation data of samples of this ranked publication-citation list. In this paper we define another version of CSS, based on diverse h-type indices such as the h-index, the g-index, Kosmulski's h(2)-index and its g-variant, the g(2)-index. Mathematical properties of these new CSS are proved in a Lotkaian framework. These CSS also provide an improvement of the single h-type indices in the sense that they give h-type index values for different parts of the ranked publication-citation list. (C) 2009 Elsevier Ltd. All rights reserved. Bibliometric studies at the micro level are increasingly requested by science managers and policy makers to support research decisions. Different measures and indices have been developed at this level of analysis. One type of index, such as the h-index and g-index, describes the most productive core of the output of a researcher and informs about the number of papers in the core. Other indices, such as the a-index and m-index, depict the impact of the papers in the core. In this paper, we present a new index which relates two different dimensions of a researcher's productive core: a quantitative one (number of papers) and a qualitative one (impact of papers). In this way, we can obtain a more balanced and global view of the scientific production of researchers. 
This new index, called the q(2)-index, is based on the geometric mean of the h-index and the median number of citations received by papers in the h-core, i.e., the m-index, which allows us to combine the advantages of both kinds of indices. (C) 2009 Elsevier Ltd. All rights reserved. Many, if not most, network analysis algorithms have been designed specifically for single-relational networks; that is, networks in which all edges are of the same type. For example, edges may either represent "friendship," "kinship," or "collaboration," but not all of them together. In contrast, a multi-relational network is a network with a heterogeneous set of edge labels which can represent relationships of various types in a single data structure. While multi-relational networks are more expressive in terms of the variety of relationships they can capture, there is a need for a general framework for transferring the many single-relational network analysis algorithms to the multi-relational domain. It is not sufficient to execute a single-relational network analysis algorithm on a multi-relational network by simply ignoring edge labels. This article presents an algebra for mapping multi-relational networks to single-relational networks, thereby exposing them to single-relational network analysis algorithms. Published by Elsevier Ltd. A recently suggested modification of the g-index is analysed in order to take multiple coauthorship appropriately into account. By fractionalised counting of the papers one can obtain an appropriate measure which I call the g(m)-index. Two fictitious examples for model cases and two empirical cases are analysed. The results are compared with two other variants of the g-index which have also recently been proposed. Only the g(m)-index shows the correct behaviour when datasets are aggregated. The interpolated and continuous versions of the g-index and its variants are also discussed. 
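The q(2)-index described above combines the h-index with the m-index (the median number of citations among papers in the h-core) via a geometric mean; a minimal sketch of that definition:

```python
import math
import statistics

def q2_index(citations):
    """q2 = sqrt(h * m): geometric mean of the h-index and the
    m-index (median citations of the papers in the h-core)."""
    cites = sorted(citations, reverse=True)
    # h-index: with a descending sort, c >= rank holds on a prefix,
    # so counting the ranks where it holds gives h.
    h = sum(1 for rank, c in enumerate(cites, 1) if c >= rank)
    if h == 0:
        return 0.0
    m = statistics.median(cites[:h])  # m-index over the h-core
    return math.sqrt(h * m)
```

For citation counts [10, 8, 5, 4, 3], h = 4, the h-core is [10, 8, 5, 4] with median 6.5, so q2 = sqrt(26) ≈ 5.10, balancing core size against core impact as the abstract describes.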
For an intuitive comparison of how the investigated variants of the h-index and the g-index are determined, a visualization of the citation records is utilized. (C) 2009 Elsevier Ltd. All rights reserved. The status of a journal is commonly determined by two factors: popularity and prestige. While the former counts citations, the latter recursively weights them with the prestige of the citing journals. We make a thorough comparison of the bibliometric concepts of popularity and prestige for journals in the sciences and in the social sciences. We find that the two notions diverge more for the hard sciences, including physics, engineering, material sciences, and computer sciences, than they do for the geosciences, for biology-medical disciplines, and for the social sciences. Moreover, we identify the science and social science journals with the highest diverging ranks in popularity and prestige compilations. (C) 2009 Elsevier Ltd. All rights reserved. This paper introduces the Hirsch spectrum (h-spectrum) for analyzing the academic reputation of a scientific journal. The h-spectrum is a novel tool based on the Hirsch (h) index. It is easy to construct: considering a specific journal in a specific interval of time, the h-spectrum is defined as the distribution representing the h-indexes associated with the authors of the journal articles. This tool allows one to define a reference profile of the typical author of a journal, compare different journals within the same scientific field, and obtain a rough indication of the prestige/reputation of a journal in the scientific community. An h-spectrum can be associated with every journal. Ten specific journals in the Quality Engineering/Quality Management field are analyzed so as to preliminarily investigate the h-spectrum characteristics. (C) 2009 Elsevier Ltd. All rights reserved. This paper introduces a new approach to describe the spread of research topics across disciplines using epidemic models. 
The approach is based on applying individual-based models from mathematical epidemiology to the diffusion of a research topic over a contact network that represents knowledge flows over the map of science, as obtained from citations between ISI Subject Categories. Using research publications on the protein class kinesin as a case study, we report a better fit between model and empirical data when using the citation-based contact network. Incubation periods on the order of 4-15.5 years support the view that, whilst research topics may grow very quickly, they face difficulties in overcoming disciplinary boundaries. (C) 2009 Elsevier Ltd. All rights reserved. The scientific quality of a publication can be determined not only based on the number of times it is cited but also based on the speed with which its content is disseminated in the scientific community. In this study we tested whether manuscripts that were accepted by Angewandte Chemie International Edition (one of the prime chemistry journals worldwide) received their first citation after publication faster than manuscripts that were rejected by the journal but published elsewhere. The results of a Cox regression model show that accepted manuscripts have a 49% higher hazard rate of citation than rejected manuscripts. (C) 2009 Elsevier Ltd. All rights reserved. To sustain economic growth, countries have to manage systems in order to create technological innovation. To meet this goal, they are developing policies that organically connect companies, national laboratories, and universities into innovation networks. However, the overall structure of these connections has been little investigated because of the difficulty of obtaining such data. We use Japanese patent data to create a network of jointly applying organizations. 
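An individual-based epidemic model of topic diffusion over a contact network can be sketched as a discrete-time susceptible-infected (SI) process; this toy version, with an assumed per-contact transmission probability `beta`, only illustrates the mechanism, not the paper's calibrated model:

```python
import random

def si_spread(neighbors, seed, beta, steps, rng=random.Random(42)):
    """Discrete-time SI diffusion of a topic over a contact network.

    Each step, every 'infected' node (a discipline that has adopted
    the topic) transmits to each susceptible neighbor independently
    with probability beta. Returns the set of adopters after `steps`.
    """
    infected = {seed}
    for _ in range(steps):
        newly = set()
        for node in infected:
            for nb in neighbors.get(node, []):
                if nb not in infected and rng.random() < beta:
                    newly.add(nb)
        infected |= newly
    return infected
```

On a chain A → B → C with beta = 1 the topic needs two steps to reach C, which is the discrete analogue of the incubation delays the abstract reports for crossing disciplinary boundaries.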
This network can be considered one representation of an innovation network because patents are seeds of innovation and joint applications are strong evidence of connections between organizations. We investigated the structure of the network, especially whether or not the degree distribution follows a power law. We then propose a model that generates the actual network, reproducing not only its degree distribution but also its link distance distribution. (C) 2009 Elsevier Ltd. All rights reserved. This paper investigates the impact of small world properties and the size of the largest component on innovation performance at the national level. Our study adds new evidence to the limited literature on this topic with an empirical investigation of the patent collaboration networks of 16 main innovative countries during 1975-2006. We combine small world network theory with statistical models to systematically explore the relationship between network structure and patent productivity. Results fail to support the claim that the size of the largest component enhances innovative productivity significantly, which is not consistent with recent concerns regarding positive effects of the largest component on patent output. We do find that small-world structure benefits innovation, but only within a certain range, after which the effect reverses, and shorter path length always correlates with increased innovation output. Our findings extend the current literature and have implications for policy makers and relevant managers when making decisions about technology, industry and firm location. (C) 2009 Elsevier Ltd. All rights reserved. This paper provides a ranking of 69 marketing journals using a new Hirsch-type index, the hg-index, which is the geometric mean of the h-index and the g-index. The applicability of this index is tested on data retrieved from Google Scholar on marketing journal articles published between 2003 and 2007. 
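The hg-index used for the journal ranking is the geometric mean of the h-index and the g-index, where g is the largest rank such that the g most-cited papers jointly received at least g² citations; a minimal sketch of both definitions:

```python
import math

def g_index(citations):
    """Largest g such that the g most-cited papers together
    have at least g^2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def hg_index(citations):
    """hg = sqrt(h * g), the geometric mean of the h- and g-indices."""
    cites = sorted(citations, reverse=True)
    # h-index: c >= rank holds on a prefix of the descending list.
    h = sum(1 for rank, c in enumerate(cites, 1) if c >= rank)
    return math.sqrt(h * g_index(citations))
```

For citation counts [10, 8, 5, 4, 3] the cumulative sums are 10, 18, 23, 27, 30 against squares 1, 4, 9, 16, 25, so g = 5; with h = 4 this gives hg = sqrt(20) ≈ 4.47, sitting between the two indices as a geometric mean must.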
The authors investigate the relationship between the hg-ranking, the ranking implied by Thomson Reuters' Journal Impact Factor for 2008, and rankings in previous citation-based studies of marketing journals. They also test two models of consumption of marketing journals that take into account measures of citing (based on the hg-index), prestige, and reading preference. (C) 2009 Elsevier Ltd. All rights reserved. In this paper a generalisation of the h-index and g-index is given on the basis of nonnegative real-valued functionals defined on subspaces of the vector space generated by the ordered samples. Several Hirsch-type measures are defined and their basic properties are analysed. Empirical properties are illustrated using examples from the micro- and meso-level. Among these measures, the h-index proved the most robust, and the arithmetic and geometric g-indices the least robust. The mu-index and the harmonic g-index provide more balanced results and are still robust enough. (C) 2009 Elsevier Ltd. All rights reserved. Previous research has shown that citation data from different types of Web sources can potentially be used for research evaluation. Here we introduce a new combined Integrated Online Impact (IOI) indicator. For a case study, we selected research articles published in the Journal of the American Society for Information Science & Technology (JASIST) and Scientometrics in 2003. We compared the citation counts from Web of Science (WoS) and Scopus with five online sources of citation data including Google Scholar, Google Books, Google Blogs, PowerPoint presentations and course reading lists. The mean and median IOI was nearly twice as high as both WoS and Scopus, confirming that online citations are sufficiently numerous to be useful for the impact assessment of research. We also found significant correlations between conventional and online impact indicators, confirming that both assess something similar in scholarly communication. 
Further analysis showed that the overall percentages of unique Google Scholar citations outside the WoS were 73% and 60% for the articles published in JASIST and Scientometrics, respectively. An important conclusion is that in subject areas where wider types of intellectual impact indicators outside the WoS and Scopus databases are needed for research evaluation, IOI can be used to help monitor research performance. (C) 2009 Elsevier Ltd. All rights reserved. Consumer health information has proliferated on the Web. However, because virtually anyone can publish this type of information on the Web, consumers cannot always rely on traditional credibility cues such as the reputation of a journal. Instead, they must rely on a variety of cues, including visual presentation, to determine the veracity of information. This study is an examination of the relationship of people's visual design preferences to judgments of credibility of information on consumer health information sites. Subjects were asked to rate their preferences for the visual designs of 31 health information sites after a very brief viewing. The sites were then reordered and subjects rated them according to the extent to which they thought the information on the sites was credible. Visual design judgments bore a statistically significant similarity to credibility ratings. Sites with known brands were also highly rated for both credibility and visual design. Theoretical implications are discussed. This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. 
A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic. Facilitating engaging user experiences is essential in the design of interactive systems. To accomplish this, it is necessary to understand the composition of this construct and how to evaluate it. Building on previous work that posited a theory of engagement and identified a core set of attributes that operationalized this construct, we constructed and evaluated a multidimensional scale to measure user engagement. In this paper we describe the development of the scale, as well as two large-scale studies (N = 440 and N = 802) that were undertaken to assess its reliability and validity in online shopping environments. In the first we used Reliability Analysis and Exploratory Factor Analysis to identify six attributes of engagement: Perceived Usability, Aesthetics, Focused Attention, Felt Involvement, Novelty, and Endurability. In the second we tested the validity of and relationships among those attributes using Structural Equation Modeling. The result of this research is a multidimensional scale that may be used to test the engagement of software applications. In addition, findings indicate that attributes of engagement are highly intertwined, a complex interplay of user-system interaction variables. 
Notably, Perceived Usability played a mediating role in the relationship between Endurability and Novelty, Aesthetics, Felt Involvement, and Focused Attention. As technology, particularly information and communication technology (ICT), continues to improve, people must make an effort to keep up with it. The importance of ICT in higher education continues to grow year by year. The purpose of this study was to analyze factors influencing the application of ICT by agricultural graduate students. The statistical population included agricultural graduate students in colleges of agriculture at the University of Tehran. A sample of 110 students was selected using a random sampling method. A questionnaire was used for data collection. Reliability and validity of the instrument were determined through the opinions of faculty members and the application of Cronbach's alpha. The data were analyzed using descriptive and inferential statistics. The findings revealed that skill, support, and facilities were the three factors that influenced the application of ICT by agricultural students. In order to predict the application of ICT by agricultural students, the Technology Acceptance Model (TAM) was used. The results showed that skill had direct and indirect effects on the application of ICT, while support and facilities affected the application of ICT indirectly. Given the direct effect on application of ICT, we infer that when students' skills improve, they are more likely to use ICT. This article reports the author's recent research in developing a holistic model for various levels of digital library (DL) evaluation in which perceived important criteria from heterogeneous stakeholder groups are organized and presented. To develop such a model, the author applied a three-stage research approach: exploration, confirmation, and verification. 
During the exploration stage, a literature review was conducted, followed by an interview, along with a card sorting technique, to collect important criteria perceived by DL experts. Then the criteria identified were used for developing an online survey during the confirmation stage. Survey respondents (431 in total) from 22 countries rated the importance of the criteria. A holistic DL evaluation model was constructed using statistical techniques. Eventually, the verification stage was devised to test the reliability of the model in the context of searching and evaluating an operational DL. The proposed model fills two lacunae in the DL domain: (a) the lack of a comprehensive and flexible framework to guide and benchmark evaluations, and (b) the uncertainty about what divergence exists among heterogeneous DL stakeholders, including general users. This article addresses a two-step approach for term extraction. In the first step on term candidate extraction, a new delimiter-based approach is proposed to identify features of the delimiters of term candidates rather than those of the term candidates themselves. This delimiter-based method is much more stable and domain independent than previous approaches. In the second step on term verification, an algorithm using link analysis is applied to calculate the relevance between term candidates and the sentences from which the terms are extracted. All information is obtained from the working domain corpus without the need for prior domain knowledge. The approach is not targeted at any specific domain and there is no need for extensive training when applying it to new domains. In other words, the method is not domain dependent and it is especially useful for resource-limited domains. Evaluations of Chinese text in two different domains show quite significant improvements over existing techniques and also verify the method's efficiency and relatively domain-independent nature. 
The proposed method is also very effective for extracting new terms, so that it can serve as an efficient tool for updating domain knowledge, especially for expanding lexicons. This article analyzes the effect of interdisciplinarity on the scientific impact of individual articles. Using all the articles published in Web of Science in 2000, we define the degree of interdisciplinarity of a given article as the percentage of its cited references made to journals of other disciplines. We show that although for all disciplines combined there is no clear correlation between the level of interdisciplinarity of articles and their citation rates, there are nonetheless some disciplines in which a higher level of interdisciplinarity is related to higher citation rates. For other disciplines, citations decline as interdisciplinarity grows. One characteristic is visible in all disciplines: Highly disciplinary and highly interdisciplinary articles have a low scientific impact. This suggests that there might be an optimal level of interdisciplinarity beyond which the research is too dispersed to find its niche and below which it is too mainstream to have high impact. Finally, the relationship between interdisciplinarity and scientific impact is highly determined by the citation characteristics of the disciplines involved: Articles citing citation-intensive disciplines are more likely to be cited by those disciplines and, hence, obtain higher citation scores than would articles citing non-citation-intensive disciplines. In recent years there has been an increasingly pressing need for the evaluation of results from public-sector research activity, particularly to permit the efficient allocation of ever scarcer resources. Many of the studies and evaluation exercises that have been conducted at the national and international levels emphasize the quality dimension of research output, while neglecting that of productivity. 
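The degree of interdisciplinarity defined above, the share of an article's cited references pointing to journals outside its own discipline, reduces to a simple proportion; a minimal sketch (the discipline labels are illustrative):

```python
def interdisciplinarity(citing_discipline, cited_disciplines):
    """Fraction of an article's cited references made to journals
    of disciplines other than the citing article's own."""
    if not cited_disciplines:
        return 0.0
    outside = sum(1 for d in cited_disciplines if d != citing_discipline)
    return outside / len(cited_disciplines)
```

For instance, a physics article whose four references go to physics, chemistry, physics, and biology journals has an interdisciplinarity degree of 0.5.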
This work is intended to test for the possible existence of correlation between quantity and quality of scientific production and determine whether the most productive researchers are also those that achieve results that are qualitatively better than those of their colleagues. The analysis proposed refers to the entire Italian university system and is based on the observation of production in the hard sciences by more than 26,000 researchers in the period 2001-2005. The results show that the output of more-productive researchers is superior in quality to that of less-productive researchers. The relation between productivity and quality is largely insensitive to the types of indicators or the test methods applied and also seems to differ little among the various disciplines examined. Many tasks in library and information science (e.g., indexing, abstracting, classification, and text analysis techniques such as discourse and content analysis) require text meaning interpretation, and, therefore, any individual differences in interpretation are relevant and should be considered, especially for applications in which these tasks are done automatically. This article investigates individual differences in the interpretation of one aspect of text meaning that is commonly used in such automatic applications: lexical cohesion and lexical semantic relations. Experiments with 26 participants indicate an approximately 40% difference in interpretation. In total, 79, 83, and 89 lexical chains (groups of semantically related words) were analyzed in 3 texts, respectively. A major implication of this result is the possibility of modeling individual differences for individual users. Further research is suggested for different types of texts and readers than those used here, as well as similar research for different aspects of text meaning. Domain ontologies play an important role in supporting knowledge-based applications in the Semantic Web. 
To facilitate the building of ontologies, text mining techniques have been used to perform ontology learning from texts. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. In this paper we present a system, known as Concept-Relation-Concept Tuple-based Ontology Learning (CRCTOL), for mining ontologies automatically from domain-specific documents. Specifically, CRCTOL adopts a full text parsing technique and employs a combination of statistical and lexico-syntactic methods, including a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that disambiguates words in the key concepts, a rule-based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that prunes unimportant relations for ontology learning. As a result, the ontologies learned by CRCTOL are more concise and contain richer semantics in terms of the range and number of semantic relations compared with alternative systems. We present two case studies where CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology. At the component level, quantitative evaluation by comparison with Text-To-Onto and its successor Text2Onto has shown that CRCTOL is able to extract concepts and semantic relations with a significantly higher level of accuracy. At the ontology level, the quality of the learned ontologies is evaluated either by employing a set of quantitative and qualitative methods, including analysis of the graph structural properties, comparison to WordNet, and expert rating, or by directly comparing with a human-edited benchmark ontology, demonstrating the high quality of the ontologies learned. 
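The rule-based relation extraction step above can be sketched with Hearst-style surface patterns. These two regular-expression patterns and the sample sentences are invented for illustration; CRCTOL's actual rule base is richer and operates on full parse trees rather than surface strings.

```python
import re

# Illustrative lexico-syntactic patterns (not CRCTOL's actual rules):
# each maps a surface template to a relation label.
PATTERNS = [
    (re.compile(r"(\w+) is a (\w+)"), "is-a"),
    (re.compile(r"(\w+) such as (\w+)"), "includes"),
]

def extract_relations(sentence):
    """Return (concept, relation, concept) tuples matched by the patterns."""
    triples = []
    for pattern, label in PATTERNS:
        for a, b in pattern.findall(sentence.lower()):
            triples.append((a, label, b))
    return triples

print(extract_relations("Terrorism is a crime. Weapons such as bombs were found."))
# → [('terrorism', 'is-a', 'crime'), ('weapons', 'includes', 'bombs')]
```

In the full system, tuples extracted this way are subsequently pruned by the modified generalized association rule mining algorithm, so only relations with sufficient corpus support survive into the ontology.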
The g-index is discussed in terms of the average number of citations of the publications in the g-core, showing that it combines features of the h-index and the A-index in one number. For a visualization, data of 8 famous physicists are presented and analyzed. In comparison with the h-index, the g-index increases between 67% and 144%, on average by a factor of 2. The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable because although the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising. 
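The g-index discussed at the start of this passage is the largest rank g such that the g most cited papers together have at least g² citations; it always dominates the h-index, which explains the reported increases. A minimal computation of both indices, with an invented citation list:

```python
def g_index(citations):
    """Largest g such that the g most cited papers jointly have >= g**2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return max([i for i, c in enumerate(cites, 1) if c >= i], default=0)

# Hypothetical citation record of one author:
papers = [10, 8, 5, 4, 3, 0]
print(h_index(papers), g_index(papers))
# → 4 5
```

Because the g-core pools the surplus citations of the most cited papers, g responds to a few highly cited works in a way that h cannot, which is the feature the abstract highlights.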
Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect. In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that through implementing the original Okapi form or certain ones derived from the Divergence from Randomness (DFR) paradigm, significantly lower performance levels may result when using short or no stopword lists. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists or no list at all are usually not statistically significant. Similar conclusions can be drawn when using other natural languages such as French, Hindi, or Persian. Carbon nanotube field emission display (CNT-FED) represents both an emerging application of nanotechnology and a revolutionary display invention. It is therefore important to monitor the state and trends of CNT-FED technology before the next stage of development. 
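The interaction between stopword removal and Okapi-style weighting described above can be sketched as follows. The scoring function is the textbook BM25 formulation, not the exact original or revised Okapi implementations compared in the study, and the three-document corpus is invented:

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75, stopwords=frozenset()):
    """Textbook Okapi BM25 score of one tokenized doc for a tokenized query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if term in stopwords:
            continue  # stopword removal simply drops the term's contribution
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "and", "dog"]]
q = ["the", "cat"]
print(bm25_score(q, docs[0], docs))                            # no stopword list
print(bm25_score(q, docs[0], docs, stopwords={"the", "a"}))    # long-list behavior
```

With this smoothed idf, frequent function words still contribute a small positive weight, so scores differ depending on whether a stopword list is applied; which variant is statistically better is exactly the empirical question the study addresses.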
The present paper uses patent bibliometric analysis and patent network analysis to monitor the technological trends in the field of CNT-FED. These results first reveal the different aspects of patenting activities in the field of CNT-FED. Then, patent network analysis indicates the developing tendency of worldwide FED production based on the synthesis of CNT materials. Furthermore, key technologies of three clusters can be identified as depositing CNT on the substrate, coating phosphor on the screen, and the assembling process for the whole device. Finally, emitter material is identified as the key factor in R&D work to improve the efficacy of CNT-FED technology. Patents contain much significant technical information which can serve as an indicator of technological and economical development. This study attempts to forecast the development of the biped robot walking technique in Japan by use of the patent data obtained from the Japan Patent Office. The study applies linear regression to the patent data using three S-curve models (Loglet Lab, Pearl, and Gompertz) individually. Various parameters inherent to each model, including the least sum of modulus error and the least mean square error, are analyzed. The most appropriate model for measuring the inflection point, the growth and the saturation time of the technique is described. Based on the Gompertz model analysis, this study finds that the biped robot walking technique will continue to develop for several decades in Japan and the saturation period is estimated to be around the year 2079-2082. This finding can help related researchers and managers in the robot field to foresee the development trend of the biped robot walking technique in this century. Although the topic of technological diversification has been a major source of research, only a few studies have explored the determinant variables of technological scope decisions. 
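The Gompertz S-curve used in the forecasting study above has a closed-form inflection point, which is what makes it convenient for estimating growth and saturation times. A minimal sketch; the parameter values are invented, whereas in the study they are fitted to annual patent counts by regression:

```python
import math

def gompertz(t, K, b, c):
    """Gompertz growth curve: y(t) = K * exp(-b * exp(-c * t))."""
    return K * math.exp(-b * math.exp(-c * t))

# Saturation level K, displacement b, and growth rate c are invented here.
K, b, c = 1000.0, 5.0, 0.08

# The inflection point lies at t* = ln(b) / c, where growth is fastest
# and the curve has reached exactly K / e of its saturation level.
t_star = math.log(b) / c
print(t_star, gompertz(t_star, K, b, c))
```

Fitting amounts to choosing K, b, and c to minimize an error measure (the sum of modulus error or mean square error mentioned in the abstract) against the cumulative patent counts; the saturation period is then read off as the time when y(t) approaches K.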
The present study enhances our understanding of the determinants of a firm's technological scope strategy. After reviewing the related literature, we proposed and empirically tested a conceptual model from the perspective of the firm's environment, strategic orientation, and resources. The results suggest that the coherence between technological scope decisions and the proposed model is significantly related to performance. The relationship between R&D and market value has attracted the interest of many scholars within different fields, but scant attention has been paid to countries with weak protection of intellectual property rights (IPR). This is unfortunate, since this problem is potentially highly relevant for IPR policy in developing countries. In particular, several questions arise when the problem of R&D market value is analyzed in a country where IPR protection is weak. First, there are concerns regarding incentives (i.e., private returns) for firms to invest in R&D when IPR is only weakly protected. Second, significant differences could emerge in the market valuation of R&D investments of domestic and foreign firms, above all in those industries where spillovers are more likely. To examine these issues, this paper empirically investigates the market valuation of R&D investments for a panel of 219 R&D-reporting domestic and foreign firms publicly traded in India. First, the market valuation of the R&D capital for the whole sample is positive and higher than those obtained in U.S. or European countries from similar analyses. Second, in the sub-samples of the domestic and foreign firms, the market value of R&D investments of foreign firms is not significantly different from zero, while the valuation coefficient of domestic firms is four times higher than that obtained on the whole sample. Third, in science-based industries the difference between domestic and foreign firms is smaller than in the other industries. 
The policy implications of these findings are discussed. This study utilized an artificial neural network (ANN) to explore the nonlinear influences of firm size, profitability, and employee productivity upon patent citations of the US pharmaceutical companies. The results showed that firm size, profitability, and employee productivity of the US pharmaceutical companies had nonlinear and monotonically positive influences upon their patent citations. Therefore, if US pharmaceutical companies want to enhance their innovation performance, they should pay attention to their firm size, profitability, and employee productivity. Because R&D conducted in electronics and chemistry has made significant contributions to South Korean economic development, past strategies in technology developments in these fields are addressed. The possibility of capturing national technology strategy and policy characteristics from patent analyses is explored. For the analysis, data were analyzed from 557 US patents in electronics and 108 US patents in chemistry, registered by Korean inventors, between 1989 and 1992. Descriptive statistics of aggregated patent information were equivalently mapped to each strategy in the two fields. Industry-specific features and past technology strategies in electronics and chemistry are identified. Electronics was driven by the private sector, while chemistry was driven by the public sector. Inventors in both fields are seeking clustered innovation on which subsequent innovation can be accumulated and/or applied to numerous heterogeneous fields. Contrary to the stated assumption, many Korean electronic innovations were based on scientific outputs such as papers. Of the knowledge strategy variables, size of invention and number of heterogeneous classifications are considered to be important factors that affect patent citation counts in both fields. 
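The ANN used in the pharmaceutical study above maps firm attributes to predicted patent citations. A minimal one-hidden-layer forward pass is sketched below; the weights, inputs, and network size are entirely invented for illustration (the real network is trained on firm data), but they show how all-positive weights yield the monotonically increasing response the study reports:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ann_forward(inputs, w_hidden, w_out):
    """One-hidden-layer feedforward pass (weights here are invented, not trained)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))

# Three inputs (firm size, profitability, employee productivity), two hidden units.
w_hidden = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.6]]
w_out = [1.2, 0.8]

small_firm = ann_forward([0.1, 0.1, 0.1], w_hidden, w_out)
large_firm = ann_forward([0.9, 0.9, 0.9], w_hidden, w_out)
print(small_firm < large_firm)
# → True
```

Because the sigmoid hidden layer is nonlinear, the fitted response can be monotonically positive without being linear, which is exactly the shape of influence the abstract describes.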
This paper adds to the growing empirical evidence on the relationship between patenting and publishing among university employees. Data from all Norwegian universities and a broad set of disciplines is used, consisting of confirmed patent inventors and a group of peers without patents matched to the inventors by controlling for gender, age, affiliation and position. In general, the findings support earlier investigations concluding that there is a positive relationship between patenting and publishing. There are, however, important differences among fields, universities and possibly types of academic entrepreneurs, underscoring the need to look at nuanced and contextual factors when investigating the effects of patenting. In this study, the author tried to demonstrate the linkage between science research and technology development through non-patent citation analysis to reveal that the important knowledge resources from science research had significant impact on technology development. Genetic engineering technology was the field examined in this study. From the references listed in the patents, it was observed that the technology development in genetic engineering was influenced heavily by the research done by the public sector. Over 90% of the citations were to non-patent literature, and the majority of non-patent citations were journal articles. Citing preferences, such as country preference and institute preference, were observed from the data included in this study. Although strategic research has been done in recent years to study how network topology shapes the evolution of competition in various industries, previous research has not investigated the importance of high-betweenness points for the connectivity of patent citation networks. The goal of this report is to examine and characterize the small world phenomenon in the patent citation network by analyzing the data of RFID patents. 
The results suggest that the patent citation network can indeed be characterized as "small world". Additionally, the patent citation network resembles the power-law connectivity distribution and exhibits preferential connectivity behavior. In other words, a few key patents have a great many more connections than the majority of patents with few connections. Furthermore, the patents with high betweenness centrality were identified. It is found that 81% of the patent citation activities involve patents with high betweenness centrality. The result of this analysis will provide a specific way for managers to identify key patents, to map their own patent deployment and to derive insight into the best ways to navigate within such networks. In recent years, firms have increased their use of internal and external knowledge through intermediaries. Knowledge brokers match buyers and sellers in the technology marketplace as well as connect and combine existing knowledge. We discuss how financial incentives in the technology marketplace can address challenges to open innovation, and how the marketplace could make individual inventors essential contributors. We then identify the key determinants of intellectual-property auction bids and different characteristics of auctioned and non-auctioned patents. Relevance, the scope of patents, and other factors suggested in the literature impact patent auctions, as mediated by knowledge brokers. The objective of this research is to develop a new patent bibliometric performance measure by using modified citation rate analyses with dynamic backward citation windows. Cited half-life employed in bibliometrics was adopted in order to establish a model of annual patent backward citation windows. Based on the dynamic behavior of backward citation windows, the annual backward patent citation rates for each technology domain can be calculated to measure its bibliometric performance. 
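The betweenness centrality used above to identify key patents counts how often a node lies on shortest paths between other node pairs. A minimal sketch using Brandes' algorithm on an invented four-patent graph (for an undirected graph, this unnormalized version counts each unordered pair twice):

```python
from collections import deque

def betweenness(graph):
    """Unnormalized shortest-path betweenness centrality (Brandes' algorithm)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in graph}; dist[s] = 0
        queue = deque([s])
        while queue:                                   # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                                   # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# A tiny hypothetical citation graph: patent "b" bridges "a" and the pair "c", "d".
graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
print(betweenness(graph))
# → {'a': 0.0, 'b': 4.0, 'c': 0.0, 'd': 0.0}
```

Patents scoring high on this measure are the bridging nodes whose removal would most disconnect the citation network, which is why the study singles them out as key patents.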
It was found that the dynamic backward citation window represents more accurately the citation cycle time, which is a key factor in technology assessment. Because different technology domains may have disparate attributes, a normalized backward citation rate was developed to measure the corresponding rank for each domain with respect to the entire industry. Three technology domains were then chosen for demonstrative case studies, representing the semiconductor, LCD, and drug industries. This paper proposes a novel methodological framework for effectively measuring the production frontier performance (PFP) of macro-scale (regional or national) R&D activities themselves, associated with two improved models: a non-radial data envelopment analysis (DEA) model and a non-radial Malmquist index. In particular, the framework can provide multidimensional information to benchmark various R&D efficiency indexes (i.e., technical efficiency, pure technical efficiency and scale efficiency) as well as the total factor R&D productivity change (determined by three components: "catch-up" of R&D efficiency, "frontier shift" of R&D technology as well as "exploitation" of R&D scale economics effect) at a comparable production frontier. It can be used to not only investigate the potential and sustainable capacity of innovation but also screen and finance R&D projects at the regional or national level. We have applied the framework to a province-level panel dataset on R&D activities of 30 selected Chinese provinces. Investment in research and development (R&D) in the semiconductor industry is never small, as the technology cycle time (TCT) is relatively short compared to other industries; a semiconductor company therefore requires continual technological innovation and capital offerings to stay in business. The semiconductor industry constitutes a primary part of the microelectronics industry. Advancing technology and patent application are the centre of attention within the semiconductor sector. 
This research examines the relationship between patent quality and the profits a patent creates for a company in this selected field. This study distinguishes itself from prior research by collecting and analyzing cross-sectional and time-series data simultaneously. The study result shows that some indicators of patent quality are statistically significant to return on assets. Patent statistics are a frequently used innovation indicator for the description and analysis of technological strengths and weaknesses, both on the macro and the micro level. Patent data have broad coverage and high reliability, allow a very differentiated perspective, and have become easier to obtain and access. Especially when cross country comparisons and comparative assessments are intended, a deep knowledge and understanding of patent systems is required. In the 1990s, Triadic patents were introduced, which were able to balance the home advantage of domestic applicants/inventors. An increasing internationalisation and globalisation makes it also necessary to adapt the patent analyses to this new world order. In this paper the so-called Transnational patents are suggested, which allow us to grasp the new relations and relative positions between the industrialised, industrialising and emerging countries. The existing concepts are presented and discussed and contrasted against the concept of Transnational Patents. This study utilizes a neural network to explore the nonlinear relationships between corporate performance and the patent traits measured from the Herfindahl-Hirschman Index of patents (HHI of patents), patent citations, and relative patent position in the most important technological field (RPP(MIT)) in the US pharmaceutical industry. The results show that HHI of patents and RPP(MIT) have nonlinear and monotonically positive influences upon corporate performance, while the influence of patent citations is nonlinearly U-shaped. 
Therefore, pharmaceutical companies should raise the degrees of the leading position in their most important technological fields and the centralization of their technological capabilities to enhance corporate performance. In 1975 Tefko Saracevic declared "the subject knowledge view" to be the most fundamental perspective of relevance. This paper examines the assumptions in different views of relevance, including "the system's view" and "the user's view" and offers a reinterpretation of these views. The paper finds that what was regarded as the most fundamental view by Saracevic in 1975 has not since been considered (with very few exceptions). Other views, which are based on less fruitful assumptions, have dominated the discourse on relevance in information retrieval and information science. Many authors have reexamined the concept of relevance in information science but have neglected the subject knowledge view; hence, basic theoretical assumptions seem not to have been properly addressed. It is as urgent now as it was in 1975 to consider seriously "the subject knowledge view" of relevance (which may also be termed "the epistemological view"). The concept of relevance, like other basic concepts, is influenced by overall approaches to information science, such as the cognitive view and the domain-analytic view. There is today a trend toward a social paradigm for information science. This paper offers an understanding of relevance from such a social point of view. This study investigates interactive video retrieval. The basis for this study is that user- and search task-centric research in video information retrieval can assist efforts for developing effective user interfaces and help complement the existing corpus of video retrieval research by providing evidence for the benefits of evaluating systems using such an approach. Accordingly, the results were collected and analyzed from the perspective of certain users and search tasks (i.e., information needs). 
The methodology of this study employed specially designed interactive search experiments to examine a number of different factors in a video retrieval context, including those that correspond to search tasks of a particular domain, interface features and functions, system effectiveness, and user interactions. The results indicated that the use and effectiveness of certain interface features and functions were dependent on the type of search task, while others were more consistent across the full experiment. Also included is a review of prior research pertaining to visual search tasks, systems development, and user interaction. ViewFinder, the prototype system used to carry out the interactive search experiments of this study, is fully described. The landscape metaphor was one of the first methods used by the information visualization community to reorganize and depict document archives that are not inherently spatial. The motivation for the use of the landscape metaphor is that everyone intuitively understands landscapes. We critically compare the information visualization designers' ontologies for implementing spatialized landscapes with ontologies of the geographic domain held by lay people. In the second half of the article, we report on a qualitative study where we empirically assessed whether the landscape metaphor has explanatory power for users trying to make sense of spatialized views, and if so, in what ways. Specifically, we are interested in uncovering how lay people interpret hills and valleys in an information landscape, and whether their interpretation is congruent with the current scientific understanding of geomorphologic processes. Our empirical results suggest that neither developers' nor lay users' understanding of terrain visualizations is based on universal understanding of the true process that has shaped a natural landscape into hills and valleys, mountains, and canyons. 
Our findings also suggest that the information landscape metaphor for sense making of a document collection is not self-evident to lay users, as claimed by information landscape designers. While a deep understanding of geomorphology will probably not be required to successfully use an information landscape, we do suggest that a coherent theory on how people use space will be necessary to produce cognitively useful information visualizations. People taking part in argumentative debates through collective annotations face a cognitively demanding task when trying to estimate the group's global opinion. In order to reduce this effort, we propose in this paper to model such debates prior to evaluating their "social validation." Computing the degree of global confirmation (or refutation) enables the identification of consensual (or controversial) debates. Readers as well as prominent information systems may thus benefit from this information. The accuracy of the social validation measure was tested through an online study conducted with 121 participants. We compared their human perception of consensus in argumentative debates with the results of the three proposed social validation algorithms. Their efficiency in synthesizing opinions was demonstrated by the fact that they achieved an accuracy of up to 84%. It is important for education in computer science and information systems to keep up to date with the latest development in technology. With the rapid development of the Internet and the Web, many schools have included Internet-related technologies, such as Web search engines and e-commerce, as part of their curricula. Previous research has shown that it is effective to use search engine development tools to facilitate students' learning. However, the effectiveness of these tools in the classroom has not been evaluated. 
In this article, we review the design of three search engine development tools, SpidersRUs, Greenstone, and Alkaline, followed by an evaluation study that compared the three tools in the classroom. In the study, 33 students were divided into 13 groups and each group used the three tools to develop three independent search engines in a class project. Our evaluation results showed that SpidersRUs performed better than the other two tools in overall satisfaction and in the level of knowledge gained when using the tools for a class project on Internet applications development. Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, while the semantics heavily depend on the context (neighboring terms) of t in d. Therefore, we present a technique CTFA (Context-based Term Frequency Assessment) that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess term frequencies of terms, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances performance of several kinds of text classifiers on different experimental data. This article studies citation practices in the arts and humanities from a theoretical and conceptual viewpoint, drawing on studies from fields like linguistics, history, library & information science, and the sociology of science. The use of references in the humanities is discussed in connection with the growing interest in the possibilities of applying citation analysis to humanistic disciplines. 
The study shows how the use of references within the humanities is connected to concepts of originality, to intellectual organization, and to searching and writing. Finally, it is acknowledged that the use of references is connected to stylistic, epistemological, and organizational differences, and these differences must be taken into account when applying citation analysis to humanistic disciplines. The specific impact index, or s-index, is introduced as a measure of a scientist's projected impact per paper. The index is complementary to other indices that measure overall impact as it can distinguish between authors maximizing the quantity of their output and authors maximizing the quality of their output. It also can be used to monitor career progress. The main advantage of the new index is that it reduces age bias from older papers that have more time to accumulate citations than do more recent papers. The index was tested on 24 scientists in different fields and of different statures. The overall projected impact estimated from the index correlates well with Hirsch's h-index squared (r² = 0.99). The impact of different aging models was evaluated as well. The study of the citation histories and ageing of documents are topics that have been addressed from several perspectives, especially in the analysis of documents with "delayed recognition" or "sleeping beauties." However, there is no general methodology that can be extensively applied for different time periods or research fields. In this article, a new methodology for the general analysis of the ageing and "durability" of scientific papers is presented. This methodology classifies documents into three general types: delayed documents, which receive the main part of their citations later than normal documents; flashes in the pan, which receive citations immediately after their publication but are not cited in the long term; and normal documents, documents with a typical distribution of citations over time. 
These three types of durability have been analyzed considering the whole population of documents in the Web of Science with at least 5 external citations (i.e., not considering self-citations). Several patterns related to the three types of durability have been found and the potential for further research of the developed methodology is discussed. Two forms of diffusion are studied: diffusion by publications, originating from the fact that a group publishes in different fields; and diffusion by citations, originating from the fact that the group's publications are cited in different fields. The first form of diffusion originates from an internal mechanism by which the group itself expands its own borders. The second form is partly driven by an external mechanism, in the sense that other fields use or become interested in the original group's expertise, and partly by the group's internal dynamism, in the sense that their articles, being published in more and more fields, have the potential to be applied in these other fields. In this contribution, we focus on basic counting measures as measures of diffusion. We introduce the notions of field diffusion breadth, defined as the number of Essential Science Indicators (ESI) fields in which a set of articles is cited, and field diffusion intensity, defined as the number of citing articles in one particular ESI field. Combined effects of publications and citations can be measured by the Gini evenness measure. Our approach is illustrated by a study of mathematics at Tongji University (Shanghai, China). Using the Scopus dataset (1996-2007) a grand matrix of aggregated journal-journal citations was constructed. This matrix can be compared in terms of the network structures with the matrix contained in the Journal Citation Reports (JCR) of the Institute of Scientific Information (ISI). Because the Scopus database contains a larger number of journals and covers the humanities, one would expect richer maps. 
However, the matrix is in this case sparser than in the case of the ISI data. This is because of (a) the larger number of journals covered by Scopus and (b) the historical record of citations older than 10 years contained in the ISI database. When the data is highly structured, as in the case of large journals, the maps are comparable, although one may have to vary a threshold (because of the differences in densities). In the case of interdisciplinary journals and journals in the social sciences and humanities, the new database does not add a lot to what is possible with the ISI databases. Adding or deleting items such as self-citations has an influence on the h-index of an author. This influence is proved mathematically in this article. We hereby prove the experimental finding in E. Gianoli and M.A. Molina-Montenegro (2009) that the influence of adding or deleting self-citations on the h-index is greater for low values of the h-index. Why this is logical is also shown by a simple theoretical example. Adding or deleting sources, such as minor contributions of an author, also has an influence on the h-index of this author; this influence is modeled in this article. This model explains some practical examples found in X. Hu, R. Rousseau, and J. Chen (in press). Finding worthwhile podcasts can be difficult for listeners since podcasts are published in large numbers and vary widely with respect to quality and repute. Independently of their informational content, certain podcasts provide satisfying listening material while other podcasts have little or no appeal. In this paper we present PodCred, a framework for analyzing listener appeal, and we demonstrate its application to the task of automatically predicting the listening preferences of users. First, we describe the PodCred framework, which consists of an inventory of factors contributing to user perceptions of the credibility and quality of podcasts. 
The framework is designed to support automatic prediction of whether or not a particular podcast will enjoy listener preference. It consists of four categories of indicators related to the Podcast Content, the Podcaster, the Podcast Context, and the Technical Execution of the podcast. Three studies contributed to the development of the PodCred framework: a review of the literature on credibility for other media, a survey of prescriptive guidelines for podcasting, and a detailed data analysis. Next, we report on a validation exercise in which the PodCred framework is applied to a real-world podcast preference prediction task. Our validation focuses on select framework indicators that show promise of being both discriminative and readily accessible. We translate these indicators into a set of easily extractable "surface" features and use them to implement a basic classification system. The experiments carried out to evaluate the system use popularity levels in iTunes as ground truth and demonstrate that simple surface features derived from the PodCred framework are indeed useful for classifying podcasts. Social network sites (SNSs) such as MySpace and Facebook are important venues for interpersonal communication, especially among youth. One way in which members can communicate is to write public messages on each other's profiles, but how is this unusual means of communication used in practice? An analysis of 2,293 public comment exchanges extracted from large samples of U.S. and U.K. MySpace members found them to be relatively rapid, but rarely used for prolonged exchanges. They seem to fulfill two purposes: making initial contact and keeping in touch occasionally, such as at birthdays and other important dates. Although about half of the dialogs seem to exchange some gossip, the dialogs seem typically too short to play the role of gossip-based "social grooming" for typical pairs of Friends, but close Friends may still communicate extensively in SNSs with other methods. 
Terrorist groups use the Web as their infrastructure for various purposes. One example is the forming of new local cells that may later become active and perform acts of terror. The Advanced Terrorist Detection System (ATDS) is aimed at tracking down online access to abnormal content, which may include terrorist-generated sites, by analyzing the content of information accessed by the Web users. ATDS operates in two modes: the training mode and the detection mode. In the training mode, ATDS determines the typical interests of a prespecified group of users by processing the Web pages accessed by these users over time. In the detection mode, ATDS performs real-time monitoring of the Web traffic generated by the monitored group, analyzes the content of the accessed Web pages, and issues an alarm if the accessed information is not within the typical interests of that group and is similar to terrorist interests. An experimental version of ATDS was implemented and evaluated in a local network environment. The results suggest that when optimally tuned, the system can reach high detection rates of up to 100% in the case of continuous access to a series of terrorist Web pages. I studied the distribution of articles and citations in journals between 1998 and 2007 according to an empirical function with two exponents. These variables showed good fit to a beta function with two exponents. Since the publication of Robert K. Merton's theory of cumulative advantage in science (Matthew Effect), several empirical studies have tried to measure its presence at the level of papers, individual researchers, institutions, or countries. However, these studies seldom control for the intrinsic "quality" of papers or of researchers: "better" (however defined) papers or researchers could receive higher citation rates because they are indeed of better quality. 
Using an original method for controlling the intrinsic value of papers (identical duplicate papers published in different journals with different impact factors), this paper shows that the journal in which a paper is published has a strong influence on its citation rate, as duplicate papers published in high-impact journals obtain, on average, twice as many citations as their identical counterparts published in journals with lower impact factors. The intrinsic value of a paper is thus not the only reason a given paper gets cited or not; there is a specific Matthew Effect attached to journals, and this gives papers published there an added value over and above their intrinsic quality. Ranking of universities has lately received considerable attention. However, ranking of departments would give a higher resolution picture of the distribution of quality within each university. In this work the Hirsch (h) index of each faculty member in Greek Chemistry, Chemical Engineering, Materials Science, and Physics departments was calculated using the Web of Science, and the mean value was used to rank the departments. This ranking refers to the research performance of each department and thus is most relevant to its doctoral program. The results seem highly meaningful. If performed on a pan-European basis, such rankings could spur healthy competition and could provide a strong motive for meritocratic hiring practices. Technical difficulties and possible extension of this approach to social science and humanities departments are discussed. Patents represent the technological or inventive activity and output across different fields, regions, and time. The analysis of information from patents could be used to help focus efforts in research and the economy; however, the roles of the factors that can be extracted from patent records are still not entirely understood. 
To better understand the impact of these factors on patent value, machine learning techniques such as feature selection and classification are used to analyze patents in a sample industry, nanotechnology. Each nanotechnology patent was represented by a comprehensive set of numerical features that describe inventors, assignees, patent classification, and outgoing references. After a careful design process that included selecting the most relevant features and selecting and optimizing classification models aimed at finding the most valuable (top-performing) patents, we used the generated models to analyze which factors differentiate the top-performing from the remaining nanotechnology patents. A few factors surface as important, such as the past performance of inventors and assignees and the count of referenced patents. This paper proves two regularities that were found in Larivière et al. (2007; Long-term patterns in the aging of the scientific literature, 1900-2004, in Proceedings of ISSI 2007, CSIC, Madrid, Spain, pp. 449-456). The first is that the mean as well as the median reference age increases in time. The second is that the Price Index decreases in time. Using an exponential literature growth model, we prove both regularities. Hence we show that the two results do not have a special informetric reason but are just a mathematical consequence of a widely accepted simple literature growth model. In this article, we first analyze the referencing process and the citation process of a scientific journal in theory, and find that the observed referencing or citation process includes the diffusing process and the aging process of cited literature and the publishing process of citing literature, which illuminates why the identified average publication delay (T̄ = T_s + τ) is longer than the observed value. 
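The Price Index that appears in the second regularity is conventionally defined as the share of a paper's references that are at most five years old at the time of citing; a minimal computation:

```python
def price_index(reference_years, citing_year, window=5):
    """Price Index: share of references no older than `window` years
    (conventionally 5) at the time of citing."""
    if not reference_years:
        return 0.0
    recent = sum(1 for y in reference_years if citing_year - y <= window)
    return recent / len(reference_years)

# A 2006 paper citing works from 2005, 2001, 1998, 2004, and 1990:
print(price_index([2005, 2001, 1998, 2004, 1990], 2006))  # 0.6
```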
Second, we compare the transfer function model of the observed citing process with other classical citation distribution models and find that the model is superior to the others. Finally, using the model, we identify parameters of actual referencing and citation processes from data on the age distributions of references and citations of 38 journals of neurology and applied mathematics in the JCR, respectively; we then compare the identified parameters and draw some interesting conclusions. The multidimensional character of interdisciplinarity and its inherent conflict with categorisation make its mapping and evaluation a challenging task. We propose a conceptual framework that aims to capture interdisciplinarity in the wider sense of knowledge integration, by exploring the concepts of diversity and coherence. Disciplinary diversity indicators are developed to describe the heterogeneity of a bibliometric set viewed from predefined categories, i.e. using a top-down approach that locates the set on the global map of science. Network coherence indicators are constructed to measure the intensity of similarity relations within a bibliometric set, i.e. using a bottom-up approach, which reveals the structural consistency of the publications network. We carry out case studies on individual articles in bionanoscience to illustrate how these two perspectives identify different aspects of interdisciplinarity: disciplinary diversity indicates the large-scale breadth of the knowledge base of a publication; network coherence reflects the novelty of its knowledge integration. We suggest that the combination of these two approaches may be useful for comparative studies of emergent scientific and technological fields, where new and controversial categorisations are accompanied by equally contested claims of novelty and interdisciplinarity. 
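Disciplinary diversity of the kind just described is commonly operationalized with a Rao-Stirling-type measure that combines variety, balance, and disparity; the formula below is that standard form, offered as an illustration rather than as the authors' exact indicator, and the category names and disparity value in the example are invented:

```python
def rao_stirling(proportions, disparity):
    """Rao-Stirling diversity: sum over pairs of categories of p_i * p_j * d_ij,
    where p_i is the share of a publication set in category i and d_ij is the
    distance (disparity) between categories i and j."""
    cats = list(proportions)
    return sum(proportions[i] * proportions[j] * disparity[frozenset((i, j))]
               for i in cats for j in cats if i != j)

# Invented example: a reference set split evenly between two distant fields.
p = {"biology": 0.5, "nanoscience": 0.5}
d = {frozenset(("biology", "nanoscience")): 0.8}
print(rao_stirling(p, d))  # 0.4
```

A set concentrated in one category scores near zero; spreading references across many mutually distant categories raises the score.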
In this study, we examine and validate the use of existing text mining techniques (based on the vector space model and latent semantic indexing) to detect similarities between patent documents and scientific publications. Clearly, experts involved in domain studies would benefit from techniques that allow similarity to be detected, and hence facilitate mapping, categorization, and classification efforts. In addition, given current debates on the relevance and appropriateness of academic patenting, the ability to assess content-relatedness between sets of documents (in this case, patents and publications) might become relevant and useful. We list several options available to arrive at content-based similarity measures. Different options of a vector space model and latent semantic indexing approach have been selected and applied to the publications and patents of a sample of academic inventors (n = 6). We also validated the outcomes by using independently obtained validation scores of human raters. While we conclude that text mining techniques can be valuable for detecting similarities between patents and publications, our findings also indicate that the various options available to arrive at similarity measures vary considerably in terms of accuracy: some generally accepted text mining options, like dimensionality reduction and LSA, do not yield the best results when working with smaller document sets. Implications and directions for further research are discussed. The present paper proposes a method for detecting, identifying and visualizing research groups. The data used refer to nine Carlos III University of Madrid departments, while the findings for the Communication Technologies Department illustrate the method. Structural analysis was used to generate co-authorship networks. Research groups were identified on the basis of factorial analysis of the raw data matrix and similarities in the choice of co-authors. 
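In a vector space model, content-relatedness between a patent and a publication is typically measured as the cosine of the angle between their term vectors; a minimal sketch over raw term counts, with no weighting or LSI dimensionality reduction, and with invented example texts:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between simple bag-of-words term-count vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("carbon nanotube sensor", "nanotube sensor array"), 2))  # 0.67
```

Real implementations would add tf-idf weighting and, for LSI, a truncated singular value decomposition of the term-document matrix before comparing vectors.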
The resulting networks distinguished the researchers participating in the intra-departmental network from those not involved and identified the existing research groups. Fields of research were characterized by the Journal Citation Reports subject category assigned to the bibliographic references cited in the papers written by the author-factors. The results, i.e., the graphic displays of the structures of the socio-centric and co-authorship networks and the strategies underlying collaboration among researchers, were later discussed with the members of the departments analyzed. The paper constitutes a starting point for understanding and characterizing networking within research institutions. In order to measure the degree to which Google Scholar can compete with bibliographical databases, search results from this database are compared with Thomson's ISI WoS (Institute for Scientific Information, Web of Science). For earth science literature, 85% of documents indexed by ISI WoS were recalled by Google Scholar. The rank of records displayed in Google Scholar and ISI WoS is compared by means of Spearman's footrule. For impact measures, the h-index is investigated. Similarities in measures were significant for the two sources. A large part of the Social Sciences and the Humanities does not adapt to the international procedures, conducted in English, for recording scientific output in databases such as the Web of Science and Scopus. The aim of this paper is to show the different results obtained in scientific work by comparing Social Sciences researchers with those of other sciences in four Spanish universities. The first finding is that some Social Sciences researchers are somewhat internationalised. However, the majority of individuals who are prestigious in their local academic-scientific community do not even appear on the information sources mentioned above. 
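Spearman's footrule, used above to compare the rank of records in Google Scholar and WoS, is simply the sum of absolute rank differences over the items both sources return; a minimal sketch with invented document identifiers:

```python
def spearman_footrule(ranking_a, ranking_b):
    """Sum of absolute rank differences over items present in both rankings.
    0 means identical orderings; larger values mean greater disagreement."""
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}
    common = set(pos_a) & set(pos_b)
    return sum(abs(pos_a[d] - pos_b[d]) for d in common)

# Two sources that swap the top two records:
print(spearman_footrule(["p1", "p2", "p3"], ["p2", "p1", "p3"]))  # 2
```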
One of the more important measures of a scholar's research impact is the number of times that the scholar's work is cited by other researchers as a source of knowledge. This paper conducts a first-of-its-kind examination of Israel's academic economists and economics departments, ranking them according to the number of citations of their work. It also provides a vista into one of the primary reasons given by junior Israeli economists for an unparalleled brain drain from the country: discrepancies between research impact and promotion. The type of examination carried out in this paper can now be easily replicated in other fields and in other countries utilizing freely available citation data and compilation software that have been made readily accessible in recent years. The exploratory analysis developed in this paper relies on the hypothesis that each editor possesses some power in the definition of the editorial policy of her journal. Consequently, if the same scholar sits on the board of editors of two journals, those journals could have some common elements in their editorial policies. The proximity of the editorial policies of two scientific journals can be assessed by the number of common editors sitting on their boards. A database of all editors of ECONLIT journals is used. The structure of the network generated by interlocking editorship is explored by applying the instruments of network analysis. Evidence has been found of a compact network containing different components. This is interpreted as the result of a plurality of perspectives about the appropriate methods for the investigation of problems and the construction of theories within the domain of economics. To be able to measure the scientific output of researchers is an increasingly important task to support research assessment decisions. To do so, we can find several different measures and indices in the literature. 
Recently, the h-index, introduced by Hirsch in 2005, has received a lot of attention from the scientific community for its good properties for measuring the scientific production of researchers. Additionally, several different indicators, for example, the g-index, have been developed to try to overcome the possible drawbacks of the h-index. In this paper we present a new index, called the hg-index, to characterize the scientific output of researchers; it is based on both the h-index and the g-index and tries to keep the advantages of both measures while minimizing their disadvantages. We propose a comprehensive bibliometric study of the profile of Nobel Prize winners in chemistry and physics from 1901 to 2007, based on citation data available over the same period. The data allow us to observe the evolution of the profiles of winners in the years leading up to, and following, nomination for and awarding of the Nobel Prize. The degree centrality and citation rankings in these fields confirm that the Prize is awarded at the peak of the winners' citation history, despite a brief Halo Effect observable in the years following the attribution of the Prize. Changes in the size and organization of the two fields result in a rapid decline of the predictive power of bibliometric data over the century. This can be explained not only by the growing size and fragmentation of the two disciplines, but also, at least in the case of physics, by an implicit hierarchy in the most legitimate topics within the discipline, as well as among the scientists selected for the Nobel Prize. Furthermore, the lack of readily identifiable dominant contemporary physicists suggests that there are few new paradigm shifts within the field, as perceived by the scientific community as a whole. The Hirsch index is a number that synthesizes a researcher's output. It is defined as the maximum number h such that the researcher has h papers with at least h citations each. 
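These indices can all be computed directly from a ranked citation list. The sketch below takes the hg-index to be the geometric mean of h and g, which is its usual definition, though the abstract above does not spell the formula out; the citation counts are invented:

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cites, start=1) if c >= i)

def g_index(citations):
    """Largest g such that the g most cited papers together have >= g^2 citations."""
    cites = sorted(citations, reverse=True)
    g, cumulative = 0, 0
    for i, c in enumerate(cites, start=1):
        cumulative += c
        if cumulative >= i * i:
            g = i
    return g

citations = [10, 8, 5, 4, 3]
h, g = h_index(citations), g_index(citations)
print(h, g, round(math.sqrt(h * g), 2))  # h = 4, g = 5, hg ≈ 4.47
```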
Woeginger (Math Soc Sci 56: 224-232, 2008a; J Informetr 2: 298-303, 2008b) suggests two axiomatic characterizations of the Hirsch index using monotonicity as one of the axioms. This note suggests three characterizations without adopting the monotonicity axiom. This paper examines the effects of a scholar's position and gender on publishing productivity in several types of scientific publications: monographs, articles in journals, articles in edited books, and articles in conference proceedings. The data consist of 1,367 scholars who worked at the University of Helsinki, Finland, during the period 2002-2004. The analysis shows that professors are the most productive, PhDs publish more than non-PhDs, and men perform better than women, even when other scholarly characteristics are controlled for. These differences are greater for monographs and articles in edited books than for articles in journals. In terms of conference proceedings, no remarkable productivity differences were found. This paper brings together recent statistical evidence on international (co-)publications and (foreign) PhD students and scholars to document shifts in geographic sources of scientific production and the impact this has on flows of scientific talent and partnering for scientific collaboration. The evidence demonstrates that despite the continued dominance of the US and the increasing importance of the EU, the TRIAD is in relative decline. Other geographic sources of science outside the TRIAD are rising, not only in quantity but also, though still to a lesser extent, in quality. China especially drives this non-TRIAD growth. This catching-up of non-TRIAD countries drives a slow but real process of global convergence. It nevertheless leaves a less equal non-TRIAD science community, as the growth of China is not matched by other non-TRIAD countries. 
Despite the rise of China's own scientific production, and the increasing return flows of overseas students and scholars, the outward flows of Asian talent have not diminished over time. The data suggest a high correlation between the patterns of international mobility of scientists and the patterns of international collaborations. The large and stable flow of Chinese human capital into the US forms the basis on which stable international US-Chinese scientific networks are built. With the EU lacking this Chinese human capital circulation, it is more difficult to build up similarly strong and stable networks. Literature examining information judgments and Internet search behaviors notes a number of major research gaps, including how users actually make these judgments outside of experiments or researcher-defined tasks, and how search behavior is impacted by a user's judgment of online information. Using the medical setting, where doctors face real consequences in applying the information found, we examine how the information judgments employed by doctors to mitigate risk impact their cognitive search. Diaries encompassing 444 real clinical information search incidents, combined with semistructured interviews across 35 doctors, were analyzed via thematic analysis. Results show that doctors, though aware of the need for information quality and cognitive authority, rarely make evaluative judgments. This is explained by navigational bias in information searches and via predictive judgments that favor known sites where doctors perceive levels of information quality and cognitive authority. Doctors' mental models of the Internet sites and Web experience relevant to the task type enable these predictive judgments. These results suggest a model connecting the online cognitive search and information judgment literatures. 
Moreover, this implies a need to understand cognitive search through longitudinal or learning-based views for repeated search tasks, and adaptations to medical practitioner training and tools for online search. Granularity is a novel concept for presenting information in search result interfaces of hierarchical query-driven information retrieval systems in a manner that can support understanding and exploration of the context of the retrieved information (e.g., by highlighting its position in the granular hierarchy and exposing its relationship with relatives in the hierarchy). Little research, however, has been conducted on the effects of granularity of search results on the relevance judgment behavior of engineers. Engineers are highly motivated information users who are particularly interested in understanding the context of the retrieved information. Therefore, it is hypothesized that the design of systems with careful regard for granularity would improve engineers' relevance judgment behavior. To test this hypothesis, a prototype system was developed and evaluated in terms of the time needed for users to find relevant information, the accuracy of their relevance judgment, and their subjective satisfaction. To evaluate the prototype, a user study was conducted where participants were asked to complete tasks, complete a satisfaction questionnaire, and be interviewed. The findings showed that participants performed better and were more satisfied when the prototype system presented only relevant information in context. Although this study presents some novel findings about the effects of granularity and context on user relevance judgment behavior, the results should be interpreted with caution. For example, participants in this research were recruited by convenience and performed a set of simulated tasks as opposed to real ones. However, suggestions for further research are presented. 
In this article, a set of requirements for the design of a personal document management system is presented, based on the results of three research studies (Bondarenko, 2006; Bondarenko & Janssen, 2005; Bondarenko & Janssen, 2009). We propose a framework, based on layers of task decomposition, that helps to understand the needs of information workers with regard to personal document and task management. Relevant user processes are described, and requirements for a document-management system are derived for each layer. The derived requirements are compared to related studies, and implications for system design are discussed. Knowledge transfer among employees is a critical enabler of organizational learning. In this article, the direct and moderating effects of the multilevel (i.e., dyadic and individual levels) antecedents of knowledge transfer are examined based on social network and knowledge management research. By analyzing the survey responses from eight R&D groups of five firms using hierarchical linear modeling, we find that structural equivalence significantly influences interpersonal knowledge transfer at the dyadic level, even when strength of ties is controlled for. At the individual level, the knowledge recipient's motivational factors, such as group identification and the perceived expertise of colleagues, show significant effects on knowledge transfer. Finally, strength of ties at the dyadic level is more influential when the recipient's group identification is low. This paper studies the role and effects of user-producer interaction (UPI) in the production of Web sites. The results of two empirical studies, one from a producer perspective and one from a user perspective, are presented. It is concluded that using UPI in the development of informative Web sites for a large public is a risky process, because it is hard to identify and define the most targeted users and their particular needs. The implications of these results are discussed. 
Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies have capitalized on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites, as well as the alignment of tagging ontologies. UTO is described, and its utility in modeling, harvesting, integrating, searching, and analyzing data is demonstrated with metadata harvested from three major social tagging systems (Delicious, Flickr, and YouTube). This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. 
The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) it reduces or eliminates the manual labor required to compile dictionaries and prepare source documents; (b) it improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) it learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) it is insensitive to collection size; and (e) it runs in linear time with respect to the number of clauses to be annotated. New Web 2.0 technologies such as wikis permit any organizational member of a virtual community of practice (CoP) to dynamically edit, integrate, and rewrite content (what we call knowledge shaping) as well as contribute personal knowledge. Previous research on factors that motivate contribution in virtual CoPs has focused exclusively on factors explaining why people contribute their personal knowledge, with no research focused on why people make the knowledge-shaping contributions (rewriting, integrating, and restructuring pages) that are possible with wikis. We hypothesize that the factors that explain frequency of contribution will be different for those who shape from those who contribute only their personal knowledge. The results support our hypotheses. In addition, we find that shapers are not more likely to be managers or members of a community's core group who might typically serve in an administrator role, contrary to prior expectations. The implications of using Web 2.0 tools to encourage this shaping behavior are discussed. In recent years we have witnessed significant growth of social-computing communities: online services in which users share information in various forms. 
As content contributions from participants are critical to the viability of these communities, it is important to understand what drives users to participate and share information with others in such settings. We extend previous literature on user contribution by studying the factors that are associated with various forms of participation in a large online photo-sharing community. Using survey and system data, we examine four different forms of participation and consider the differences between these forms. We build on theories of motivation to examine the relationship between users' participation and their motivations with respect to their tenure in the community. Amongst our findings, we identify individual motivations (both extrinsic and intrinsic) that underpin user participation, and their effects on different forms of information sharing; we show that tenure in the community does affect participation, but that this effect depends on the type of participation activity. Finally, we demonstrate that tenure in the community has a weak moderating effect on a number of motivations with regard to their effect on participation. Directions for future research, as well as implications for theory and practice, are discussed. In the process of scientific research, many information objects are generated, all of which may remain valuable indefinitely. However, artifacts such as instrument data and associated calibration information may have little value in isolation; their meaning is derived from their relationships to each other. Individual artifacts are best represented as components of a life cycle that is specific to a scientific research domain or project. Current cataloging practices do not describe objects at a sufficient level of granularity nor do they offer the globally persistent identifiers necessary to discover and manage scholarly products with World Wide Web standards. 
The Open Archives Initiative's Object Reuse and Exchange data model (OAI-ORE) meets these requirements. We demonstrate a conceptual implementation of OAI-ORE to represent the scientific life cycles of embedded networked sensor applications in seismology and environmental sciences. By establishing relationships between publications, data, and contextual research information, we illustrate how to obtain a richer and more realistic view of scientific practices. That view can facilitate new forms of scientific research and learning. Our analysis is framed by studies of scientific practices in a large, multidisciplinary, multi-university science and engineering research center, the Center for Embedded Networked Sensing. Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed-length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used: a dissimilarity measure called the "Manhattan distance," and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qur'an contains most of the ancient Arabic words, while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure. 
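Dice's measure over character bigrams, which the study found to work best, scores two strings by the overlap of their bigram sets; a minimal sketch, using simple set overlap and transliterated example words for illustration (a real implementation would operate on Arabic script):

```python
def bigrams(word):
    """Set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(word_a, word_b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|) over character bigram sets."""
    a, b = bigrams(word_a), bigrams(word_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

# Transliterated toy pair: "kataba" (he wrote) vs. "kitab" (book).
print(round(dice("kataba", "kitab"), 2))  # 0.44
```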
The method of latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can be very different for various datasets, and the questions of which characteristics of a dataset contribute to this difference, and why, have not been fully understood. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin (1974) may then be applied to effect an analysis of the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal that there is a correlation between the effectiveness of LSI and the topological structure of the dataset. By using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM for datasets over which LSI is not effective. Based on the principles of the h-index, I propose a new measure, the w-index, as a particularly simple and more useful way to assess the substantial impact of a researcher's work, especially regarding excellent papers. The w-index can be defined as follows: if w of a researcher's papers have at least 10w citations each and the other papers have fewer than 10(w + 1) citations, that researcher's w-index is w. The results demonstrate that there are noticeable differences between the w-index and the h-index, because the w-index pays close attention to the more widely cited papers. These discrepancies can be seen by comparing the ranks of 20 astrophysicists, a few famous physical scientists, and 16 Price medalists. Furthermore, I put forward the w(q)-index to improve the discriminatory power of the w-index and to rank scientists with the same w. 
The factor q is the least number of citations needed for a researcher with w-index w to reach w + 1. In terms of both simplicity and accuracy, the w-index or w(q)-index can be widely used for the evaluation of scientists, journals, conferences, scientific topics, research institutions, and so on. This research applies statistical time series analysis to examine the changing pattern of scholars' attitudes toward open-access (OA) journal publishing from the early 1990s. By synthesizing survey results in existing studies, this research focuses on representative aspects of the attitudes and behaviors recorded through the years. It finds that although an increase in the publishing and awareness rates of scholars with regard to OA journals has been observed, scholars have been consistently concerned with the low prestige of such journals and their lack of peer review, which is not the case in practice. It is hoped that the findings will provide useful information for the improvement of OA advocacy. Of the h-type indices available now, the g-index is an important one in that it not only keeps some advantages of the h-index but also counts citations from highly cited articles. However, the g-index has the drawback that one has to add fictitious articles with zero citations to calculate this index in some important cases. Based on an alternative definition that does not introduce fictitious articles, an analytical method has been proposed to calculate the g-index approximately from the h-index and the e-index. If the citations for a scientist are ranked by a power law, it is shown that the g-index can be calculated accurately from the h-index, the e-index, and the power parameter. The relationship of the h-, g-, and e-indices presented here shows that the g-index contains the citation information from the h-index, the e-index, and some papers beyond the h-core. 
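The w-index rule quoted above (w papers with at least 10w citations each) can be computed directly from a citation list. The sketch below is a minimal reading of the definition, not code from the paper; the h-index is included for comparison, since the abstract contrasts the two.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

def w_index(citations):
    """Largest w such that w papers have at least 10*w citations each."""
    cites = sorted(citations, reverse=True)
    w = 0
    while w < len(cites) and cites[w] >= 10 * (w + 1):
        w += 1
    return w
```

For example, the citation record [250, 120, 95, 40, 8, 8, 5] gives h = 6 but w = 4, illustrating how the w-index attends to the widely cited papers rather than the long tail.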
For determining the eminence of scientific journals, a new indicator stressing the importance of papers in the "elite set" (i.e., highly cited papers) is suggested. The number of papers in the elite set, P(pi v), is calculated with the equation P(pi v) = (10 log P) - 10, where P is the total number of papers in the set. One-hundredth of the citations (C) obtained by the P(pi v) papers is regarded as the pi(v)-index, which is field- and time-dependent. The pi(v)-index is closely correlated with the citedness (C/P) of the P(pi v) papers, and it is also correlated with the Hirsch-index. Three types of Hirsch-sets are distinguished, depending on the relation between the number of citations received by the Hirsch-paper (ranked as h) and the paper next in rank (h + 1) by citation. The h-index of an Anomalous Hirsch-set (AH) may be increased by a single citation to a paper outside the Hirsch-core. (A set of papers may be regarded as AH where the number of citations to the Hirsch-paper is higher than the h-index and the next paper in rank shows as many citations as the value of the h-index.) The Ranking Web of World Repositories (http://repositories.webometrics.info) is introduced. The objective is to promote Open Access initiatives (OAI) by supporting the use of repositories for scientific evaluation purposes. A set of metrics based on web presence, impact and usage is discussed. The Ranking is built on indicators obtained from web search engines, following a model close to that of the Impact Factor. The activity accounts for 50% of the index, including the number of pages, pdf files and items in the Google Scholar database, while the visibility takes into account the external inlinks received by the repository (the other 50%). The Ranking provides the Top 300 repositories from a total of 592 worldwide, with a strong presence of US, German and British institutional repositories and the leadership of the large subject repositories. 
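The elite-set arithmetic behind the pi(v)-index is simple enough to sketch. In this illustrative Python fragment, rounding P(pi v) to an integer and the floor of one paper for very small sets are my assumptions, not part of the published definition.

```python
import math

def pi_v_index(citations):
    """pi(v)-index: one-hundredth of the citations received by the elite set
    of the P(pi v) = 10*log10(P) - 10 most-cited papers, where P is the
    total number of papers."""
    P = len(citations)
    # Assumption: round to the nearest integer, and keep at least one paper.
    elite_size = max(1, round(10 * math.log10(P) - 10))
    elite = sorted(citations, reverse=True)[:elite_size]
    return sum(elite) / 100.0
```

As a worked example: for a set of 100 papers, P(pi v) = 10·log10(100) − 10 = 10 elite papers; if those ten papers hold 500 citations between them, the pi(v)-index is 5.0.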
Results suggest the need to take into consideration other file formats and the usage information, an option that is not feasible today. This study surveyed citations to Open Access (OA) journals in English-language papers of Iranian university journals during the year 2007. The main purposes of this paper were: to examine the state of English-language papers in Iranian journals in the Thomson Scientific Master Journal List (TSMJL); and to analyze their visibility through citations to OA journals in the DOAJ database. The researcher applied the citation analysis technique of bibliometrics and large-scale sociometric analyses to about 16,219 citations. The method followed in the first part of this study is to obtain data from e-journal articles indexed in TSMJL, conduct descriptive analyses, and report the findings in tables and figures. In the second part of the study, the DOAJ database is used to conduct cited-reference searches and other citation analyses. It was found that there are 960 Iranian print-based journals and that only 37 Iranian journals were indexed in TSMJL; sixteen of these are English-language journals from eight Iranian universities. Of the sixteen journals, only one did not publish during 2007, and there were 704 articles across the fifteen remaining journals. Using large-scale sociometric analyses on about 16,219 citations across the 15 journals, it is notable that the number of journals without citations to DOAJ was 3,101 (99.7%) and the number of journals with citations to DOAJ was 9 (0.3%). This shows that there was a huge difference between the journals citing DOAJ and those not citing it. Google Scholar and Scopus are recent rivals to Web of Science. In this paper we examined these three citation databases through the citations of the book "Introduction to informetrics" by Leo Egghe and Ronald Rousseau. 
Scopus citations are comparable to Web of Science citations when limiting the citation period to 1996 and onwards (the citation coverage of Scopus): each database covered about 90% of the citations located by the other. Google Scholar missed about 30% of the citations covered by Scopus and Web of Science (90 citations), but another 108 citations located by Google Scholar were not covered by either Scopus or Web of Science. Google Scholar performed considerably better than reported in previous studies; however, Google Scholar is not very "user-friendly" as a bibliometric data collection tool at this point in time. Such "microscopic" analysis of the citing documents retrieved by each of the citation databases allows us a deeper understanding of the similarities and the differences between the databases. In this paper, we examine the question of whether it is meaningful to talk about the scientific productivity of nations based on indexes like the Science Citation Index or Scopus, when the journal set covered by them keeps changing with time. We hypothesize, from the illustrative case of India's declining productivity in the 1980s, which correlated with a fall in its journals indexed, that an apparent increase/decrease in productivity for any country, based on an observed change in its share of papers, could in fact be an effect resulting from the inclusion of more/fewer journals from the country. To verify our hypothesis we have used SCIMAGO data. We found that for a set of 90 countries, the share of journals regressed on the share of papers gave a linear relationship that explained 80% of the variance. However, we also show that in the case of China's unusual rise in world scientific productivity (to second rank, crossing several other countries), there is yet another factor that needs to be taken into account. We define a new indicator, the journal packing density (JPD), or the average number of papers in journals from a given country. 
We show that the packing density of Chinese journals has steadily increased over the last few years. Currently, Chinese journals have the highest 'packing density' in the world, almost twice the world average, which is about 100 papers per journal per annum. The deviation of the JPD from the world average is another indicator that will affect so-called 'national productivities' in addition to the number of national journals indexed. We conclude that in the context of a fivefold increase in the number of journals indexed over 20 years, the simplistic notion of 'scientific productivity' as equivalent to papers indexed needs to be re-examined. This paper focuses on the study of self-citations at the meso and micro (individual) levels, on the basis of an analysis of the production (1994-2004) of individual researchers working at the Spanish CSIC in the areas of Biology and Biomedicine and Material Sciences. Two different types of self-citations are described: author self-citations (citations received from the author him/herself) and co-author self-citations (citations received from the researcher's co-authors but without his/her participation). Self-citations do not play a decisive role in the high citation scores of documents either at the individual or at the meso level, which are mainly due to external citations. At the micro level, the percentage of self-citations does not change by professional rank or age, but differences in the relative weight of author and co-author self-citations have been found. The percentage of co-author self-citations tends to decrease with age and professional rank, while the percentage of author self-citations shows the opposite trend. Suppressing author self-citations from citation counts to prevent overblown self-citation practices may result in a greater reduction of the citation numbers of older scientists and, particularly, of those in the highest categories. 
Author and co-author self-citations provide valuable information on the scientific communication process, but external citations are the most relevant for evaluative purposes. As a final recommendation, studies considering self-citations at the individual level should make clear whether author or total self-citations are used, as these can affect researchers differently. We compare a new method for measuring research leadership with the traditional method. Both methods are objective and reliable, utilize standard citation databases, and are easily replicated. The traditional method uses partitions of science based on journal categories, and has been extensively used to measure national leadership patterns in science, including those appearing in the NSF Science & Engineering Indicators Reports and in prominent journals such as Science and Nature. Our new method is based on co-citation techniques at the paper level. It was developed with the specific intent of measuring research leadership at a university, and was then extended to examine national patterns of research leadership. A comparison of these two methods provides compelling evidence that the traditional method grossly underestimates research leadership in most countries. The new method more accurately portrays the actual patterns of research leadership at the national level. Relationships between the journal download immediacy index (DII) and some citation indicators are studied. The Chinese full-text database CNKI is used for data collection. Results suggest that the DII can be considered an independent indicator, but that it also has predictive value for other indicators, such as a journal's h-index. In case a journal cannot yet have an impact factor, because its citation history within the database is too short, the DII can be used for a preliminary evaluation. 
The article provides results related to the CNKI database as a whole and, additionally, some detailed information about agricultural and forestry journals. It is the objective of this article to examine in which aspects journal usage data differ from citation data. This comparison is conducted both at the journal level and on a paper-by-paper basis. At the journal level, we define a so-called usage impact factor and a usage half-life in analogy to the corresponding Thomson citation indicators. The usage data were provided by ScienceDirect, subject category "oncology". Citation indicators were obtained from the JCR; article citations were retrieved from SCI and Scopus. Our study shows that downloads and citations have different obsolescence patterns. While the average cited half-life was 5.6 years, we computed a mean usage half-life of 1.7 years for the year 2006. We identified a strong correlation between the citation frequencies and the number of downloads for our journal sample. The relationship was weaker when performing the analysis on a paper-by-paper basis because of existing variances in the citation-download ratio among articles. Also, the correlation between the usage impact factor and Thomson's journal impact factor was "only" moderate because of the different obsolescence patterns of downloads and citations. A term map is a map that visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. 
The quality of the map is assessed by a number of operations research experts. It turns out that in general the proposed methodology performs quite well. It has been about 30 years since China adopted an opening-up and reform policy for global competition and collaboration. This opening-up policy has been accompanied by a spectacular growth of the country's economy as well as of its visibility in the world's scientific literature. China's competitiveness in scientific research has also grown, mirroring the development of the country's economy. On the other hand, the international collaboration of most countries increased dramatically during the last two decades and accompanied the growth of science in emerging economies. Thus the question arises of whether the growth of competitiveness in research is accompanied by an intensification of collaboration in China as well. In the present study we analyse the dynamics and the national characteristics of China's co-operation in a global context. We also study the research profile and citation impact of international collaboration with respect to the corresponding domestic 'standards'. This article introduces the Impact Factor squared, or IF2-index, an h-like indicator of research performance. This indicator reflects the degree to which large entities such as countries and/or their states participate in top-level research in a field or subfield. The IF2-index uses the Journal Impact Factor (JIF) of research publications instead of the number of citations. This concept is applied to other h-type indexes and their results compared to the IF2-index. These JIF-based indexes are then used to assess the overall performance of cancer research in Australia and its states over 8 years from 1999 to 2006. 
The IF2-index has three advantages when evaluating larger research units: firstly, it provides a stable value that does not change over time, reflecting the degree to which a research unit participated in top-level research in a given year; secondly, it can be calculated close to the publication date of yearly datasets; and finally, it provides an additional dimension when a full article-based citation analysis is not feasible. As the index reflects the degree of participation in top-level research, it may favor larger units when units of different sizes are compared. In this study we investigate the scientific output of Yugoslavia and its successor republics, viz. Serbia, Croatia, Slovenia, Bosnia & Herzegovina, Macedonia and Montenegro. Additionally, Kosovo was included as a separate entity, since it recently declared its independence. The publications and cooperation between the republics are analyzed for the years from 1970 until 2007. In contrast to similar studies, we examine a larger time window and take into consideration not only the three big republics (Serbia, Croatia, and Slovenia) but also the smaller ones, namely Bosnia & Herzegovina, Macedonia and Montenegro. For our analysis we introduce two new indicators: the normalized cooperation score (R(i)(cs)) and the dominance factor (D(i)(c)), a measure of dominance within a weighted network. Furthermore, we develop and assess the reliability of various techniques for visualizing our findings. We found that the civil wars had a severe impact on the inner-Yugoslav cooperation network. Additionally, it seems as if a process of recovery started with the ending of the conflicts. In advanced methods of delineation and mapping of scientific fields, hybrid methods open a promising path to capitalising on the advantages of approaches based on words and citations. One way to validate the hybrid approaches is to work in cooperation with experts in the fields under scrutiny. 
We report here an experiment in the field of genomics, where a corpus of documents was built by a hybrid citation-lexical method and then clustered into research themes. Experts of the field were associated with the various stages of the process: lexical queries for building the initial set of documents, the seed; a citation-based extension aiming at reducing silence; and a final clustering to identify noise and allow discussion of border areas. The analysis of the experts' advice shows a high level of validation of the process, which combines a high-precision, low-recall seed, obtained by journal and lexical queries, and a citation-based extension enhancing recall. These findings in the genomics field suggest that hybrid methods can efficiently retrieve a corpus of relevant literature, even in complex and emerging fields. Following up on the European project PromTech, the aim of which was to detect emerging technologies by studying the scientific literature, we chose one field, Molecular Biology, to identify and characterize emerging topics within that domain. We combined two analytical approaches: the first introduces a model of the terminological evolution of the field based on bibliometric indicators, and the second performs a diachronic clustering analysis. Our objective is to answer questions such as: Which technological aspects can be detected? Which of them are already established and which of them are new? How are the topics linked to each other? Co-authored publications across sectors have been used as indicators of the triple helix model and, more generally, for the study of science-technology relations. However, how to measure the relationships among three or more sectors is a technically difficult issue. Using mutual information as an indicator has proved to be effective, but it is not widely used. 
In this paper, we introduce phi coefficients and partial correlation as conventional indicators to measure the relationships among sectors on the basis of Japanese publication data in the ISI databases. We also propose a new approach of graphical modeling based on partial correlation for studying university-industry-government relationships and relationships with other sectors. The conventional indicators give results that are consistent with mutual information, which shows that collaborations among the three national sectors (U, I, G) are getting weaker and that members of these sectors tend to collaborate much more with foreign researchers. It is also shown that universities used to play the central role in the national publication system and acted as a bridge between national sectors and foreign researchers. However, since 2000, the situation has been changing. The center of the Japanese research network is becoming more "foreign" oriented. The objective of this study is to use a clustering algorithm based on journal cross-citations to validate and to improve journal-based subject classification schemes. The cognitive structure based on the clustering is visualized by the journal cross-citation network, and three kinds of representative journals in each cluster within the communication network have been detected and analyzed. As an existing reference system, the 15-field subject classification by Glanzel and Schubert (Scientometrics 56:55-73, 2003) has been compared with the clustering structure. Although there are many studies quantifying the academic performance of researchers, such as measuring scientific performance based on the number of publications, there are no studies quantifying the collaboration activities of researchers. This study addresses this shortcoming. 
Based on three measures, namely the collaboration network structure of researchers, the number of collaborations with other researchers, and the productivity index of co-authors, two new indices, the RC-Index and CC-Index, are proposed for quantifying the collaboration activities of researchers and scientific communities. After applying these indices to a data set generated from the publication lists of five schools of information systems, this study concludes with a discussion of the shortcomings and advantages of these indices. Research fronts represent the most dynamic areas of science and technology and the areas that attract the most scientific interest. We construct a methodology to identify these fronts, and we use quantitative and qualitative methodology to analyze and describe them. Our methodology is able to identify these fronts as they form, with potential use by firms, venture capitalists, researchers, and governments looking to identify emerging high-impact technologies. We also examine how science and technology absorb the knowledge developed in these fronts and find that fronts which maximize impact have very different characteristics from fronts which maximize growth, with consequences for the way science develops over time. Witnessing a substantial growth rate in its scientific production, Iran is considered one of the recently rising stars on the scientific contribution scene. However, its impact on the progress of science is largely unknown, especially at the global level. Studying Iran's scholarly publications and recognition in the SCI, the present communication tries to clarify the country's science system performance using regression analyses and then to compare its performance to that of the world, using the Relative Citation Rate (RCR) and Relative Subfield Citedness (RW). The results of the regression analyses reveal that although Iran displays considerable weaknesses in its performance, it is increasingly recognized as its outputs grow. 
According to the RCR values, Iran performed at or above the global level in 21 subfields. However, the RW values show that the country's performance is above the global level in only two subfields. Although Iran is very far from an ideal situation, this evidence can be considered a herald of a successful movement towards a prosperous scientific future. Scientific and other non-patent references (NPRs) in patents are important tools for analyzing interactions between science and technology. This paper organizes a database of 514,894 USPTO patents granted globally in 1974, 1982, 1990, 1998 and 2006. There are 165,762 patents with at least one reference to the science and engineering (S&E) literature, from a total of 1,375,503 references. Through a lexical analysis, 71.1% of this S&E literature is classified by S&E field. These data serve as the basis for the elaboration of global and national 3-dimensional matrices (technological domains, S&E fields and number of references). Three indicators are proposed to analyze these matrices, allowing us to identify patterns of structured growth that differentiate developed and non-developed countries. This differentiation informs suggestions for public policies for development, emphasizing the need for an articulation between the industrial and technological dimension and the scientific side. The intertwinement of these two dimensions is a key component of developmental policies for the twenty-first century. This study examines network topologies of interdisciplinary research relationships in science and technology (S&T) and investigates the relational linkages between the interdisciplinary relations and the quality of research performance. A network analysis was performed to evaluate the General Research Grant (GRG) program, an interdisciplinary research funding program of the Korea Science and Engineering Foundation (KOSEF); the dataset covered the 2002-2004 period. 
The analytical results reveal the hidden network structure of interdisciplinary research relationships and demonstrate that the quality of research performance might be enhanced not only by interdependent pressures placed on various research fields but also by accumulated research capabilities that are relatively difficult for other research fields to access and reproduce. This paper analyses the scientific output and impact of 731 Ph.D. holders who were awarded their doctorate at Spanish universities between 1990 and 2002. The aim was to identify any differences in the amount of scientific output and the impact of publications, in terms of citations, according to gender. The analysis revealed no significant differences in the amount of scientific output between males and females. However, the proportion of female Ph.D. holders with no postdoctoral output was significantly higher than that of their male counterparts, and the median number of papers published after Ph.D. completion was also lower among women. As regards pre- and postdoctoral research, the data showed that early scientific output may be a good predictor of subsequent productivity in both gender groups. The results also indicated that articles by female Ph.D. holders were cited significantly more often, even when self-citations were excluded. Our study evaluates the results and impacts of the Framework Programs (FP) 5 and 6 in the Czech Republic. Publications resulting from the FP projects had a 42% higher mean citation rate and 77% more EU-25 collaborations than the Czech standards. Teams participating in the FP are better than average, because the citation rate of all their papers is 21% higher than the Czech standards. The most striking finding is the marked influence of the FP on research direction. After the project start, the participating teams published papers in ten new fields in which they had not published before the project. In 45 other fields, an increase of more than 200% in papers was observed. 
This paper aims to identify the collaboration pattern and network structure of the coauthorship network of library and information science (LIS) in China. Using data from 18 core source LIS journals in China covering 6 years, we construct the LIS coauthorship network. We analyze the network from both macro and micro perspectives and identify some key features of this network: it is a small-world network and exhibits a scale-free character. At the micro level, we calculate each author's centrality values and compare them with citation counts. We find that centrality rankings are highly correlated with citation rankings. We also discuss the limitations of current centrality measures for coauthorship network analysis. Assessing the quality of scientific conferences is an important and useful service that can be provided by digital libraries and similar systems. This is especially true for fields such as Computer Science and Electrical Engineering, where conference publications are crucial. However, the majority of the existing quality metrics, particularly those relying on bibliographic citations, have been proposed for measuring the quality of journals. In this article we conduct a study of the relative performance of existing journal metrics in assessing the quality of scientific conferences. More importantly, departing from a deep analysis of the deficiencies of these metrics, we propose a new set of quality metrics especially designed to capture intrinsic and important aspects related to conferences, such as longevity, popularity, prestige, and periodicity. To demonstrate the effectiveness of the proposed metrics, we have conducted two sets of experiments that contrast their results against a "gold standard" produced by a large group of specialists. Our metrics obtained gains of more than 12% when compared to the most consistent journal quality metric and up to 58% when compared to standard metrics such as Thomson's Impact Factor. 
The co-link method was proposed in 1996 and has since been applied in many webometric studies. Its definition refers to "page co-link analysis," as links are provided by URLs or pages. This paper presents a new methodological approach, a "site co-link analysis," to investigate relations in small networks. The Oswaldo Cruz Foundation institutes were used as a case study. The results indicate that the number of co-links provided by sites led to an increase of 133% in the sample analyzed. In a cluster analysis, three clusters were formed, mainly for thematic reasons, and four institutes remained isolated. This article explores the concentration in the global plant molecular life science research output. In the past 15 years, the share of articles referring to the model organism A. thaliana in particular has increased rapidly. Citation analyses show an even greater rise in the importance of this organism. Attempts to arrive at a scientometric definition of model organisms are discussed. For this purpose a comparison is made with applied microbiology. However, few shared scientometric characteristics were found that could help characterise model organisms. A distinction between major economic organisms and model organisms will therefore continue to rely on qualitative data. As scientific collaboration is a phenomenon that is becoming increasingly important, studies on scientific collaboration are numerous. Despite the proliferation of studies on various dimensions of collaboration, there is still a dearth of analyses of the effects, motives and modes of collaboration in the context of developing countries. Adopting Wallerstein's world-system theory, this paper makes use of bibliometric data in an attempt to understand the pattern of collaboration that emerges between South Africa and Germany. The key argument is that we can expect the collaborative relationship between South Africa and Germany to be one that is shaped by a centre-periphery pattern. 
The analyses show that a theory of scientific collaboration built on the notions of marginality and centre-periphery can explain many facets of South African-German collaboration, where South Africa is a semi-peripheral region, a centre for the periphery, and a periphery for the centre. In this work we have studied the research activity of countries in Europe, Latin America and Africa across all sciences between 1945 and November 2008. All the data were captured from the Web of Science database for this period. The analysis of the experimental data shows that, within a nonextensive thermostatistical formalism, the Tsallis q-exponential distribution N(c) satisfactorily describes Institute of Scientific Information citations. The data examined in the present survey can, as a first approach, be fitted successfully by a single curve, namely N(c) ∝ 1/[1 + (q - 1)c/T]^(1/(q-1)) with q ≈ 4/3 for all the available citations c, T being an "effective temperature". The present analysis ultimately suggests that the phenomenon might essentially be one and the same along the entire range of the citation number. Finally, this manuscript provides a new ranking index, via the "effective temperature" T, for the impact level of the research activity in these countries, taking into account the number of publications and their citations. In this paper the relationship between knowledge production and the structure of research networks in two scientific fields is assessed. We investigate whether knowledge production correlates positively or negatively with different types of social network structure. We show that academic fields generate knowledge in different ways and that within the fields, different types of networks act as a stimulant for knowledge generation.
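The q-exponential citation distribution described in the abstract above can be written down directly. The sketch below uses q = 4/3 as reported; the "effective temperature" T and the normalization constant are illustrative values, not figures from the study.

```python
# The Tsallis q-exponential, N(c) ∝ 1 / [1 + (q - 1) c / T]^(1/(q - 1)),
# with q = 4/3 as in the abstract. T and n0 are assumed example values.

def q_exponential(c, q=4/3, T=50.0, n0=1.0):
    """Relative frequency of papers with c citations (unnormalized)."""
    return n0 / (1 + (q - 1) * c / T) ** (1 / (q - 1))

# For q = 4/3 the exponent 1/(q - 1) equals 3, so the tail decays roughly
# as a power law ~ c**-3 for large c.
values = [q_exponential(c) for c in (0, 10, 100, 1000)]
```

The curve is monotonically decreasing in c, behaving exponentially for small c and as a power law in the tail, which is what lets a single function cover the whole citation range.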
Given the current availability of different bibliometric indicators and of production and citation data sources, two questions immediately arise: do the indicators' scores differ when computed on different data sources? More importantly, do the indicator-based rankings change significantly when computed on different data sources? We provide a case study of computer science scholars and journals evaluated on the Web of Science and Google Scholar databases. The study concludes that Google Scholar yields significantly higher indicator scores than the Web of Science. Nevertheless, citation-based rankings of both scholars and journals do not change significantly when compiled on the two data sources, while rankings based on the h index show a moderate degree of variation. According to the definition of the reliability-based citation impact factor (R-impact factor) proposed by Kuo & Rupe and the cumulative citation age distribution model, a mathematical expression of the relationship between the R-impact factor and the impact factor is established in this paper. By simulating how the R-impact factor and the impact factor change when the impact factor is manipulated, it is found that the effect of manipulation can be partly corrected by the R-impact factor in some cases. Based on the Journal Citation Reports database, the impact factors of 4 normal journals and 4 manipulated journals were collected. The journals' R-impact factors and self-cited rates in the previous two years were calculated for each year during the period 2000 to 2007, and various characteristics influenced by the manipulation were analyzed. We find that the R-impact factor is fairer than the impact factor for journals with relatively short cited half-lives. Finally, some issues about using the R-impact factor as a measure for evaluating scientific journals are discussed.
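The cross-database comparison in the first abstract above turns on the standard h-index definition: the largest h such that at least h of a scholar's papers have at least h citations each. A minimal sketch follows; the two citation records stand for the same papers as counted in Web of Science and Google Scholar, and all numbers are invented for illustration.

```python
# Minimal h-index computation, sketching the kind of WoS-vs-Google-Scholar
# comparison described above. Citation counts are hypothetical.

def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

wos = [25, 18, 12, 9, 6, 4, 2, 1]        # hypothetical WoS counts
scholar = [40, 30, 21, 15, 11, 8, 5, 3]  # same papers in Google Scholar

h_wos, h_gs = h_index(wos), h_index(scholar)
```

Here the uniformly higher Google Scholar counts raise the h index from 5 to 6 while leaving the paper ordering unchanged, mirroring the abstract's finding that scores shift more than rankings do.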
Integrating data from three independent sources, namely USPTO patenting data, Shanghai Jiao Tong University's Academic Ranking of World Universities (ARWU), and the Times Higher Education Supplement's World University Ranking (WUR), we examine the possible link between patenting output and the quantity and quality of scientific publications among 281 leading universities world-wide. We found that patenting by these universities, as measured by patents granted by the USPTO, grew consistently faster than overall US patenting over 1977-2000, although it has grown more slowly over the last 5 years (2000-2005). Moreover, since the mid-1990s, patenting growth has been faster among universities outside North America than among those within North America. We also found that the patenting output of the universities over 2003-2005 is significantly correlated with the quantity and quality of their scientific publications. However, significant regional variations are found: for universities in North America, both the quantity and quality of scientific publications matter; for European and Australian/NZ universities, only the quantity of publications matters; while for other universities outside North America and Europe/Australia/NZ, only the quality of publications matters. We obtained similar findings when using EPO patenting data instead of USPTO data. Additionally, for USPTO data only, the degree of internationalization of faculty members is found to reduce patenting performance among North American universities, but to increase that of universities outside North America. Plausible explanations for these empirical observations and implications for future research are discussed. Motivated by the merging of four Swedish counties into a larger administrative and political unit with increased responsibilities, a comprehensive study of regional-foreign research collaboration was carried out.
Various multivariate methods were applied to depict collaborative networks of various compositions and at various levels of aggregation. Other aspects investigated concerned the influence of institutions and countries on regional-foreign collaboration and the relation between collaboration and research fields. Findings showed that foreign research collaboration was concentrated in three major regional institutions, each with a characteristic collaborative context. The influence of domestic collaboration was notable with regard to medical research, while collaboration within the field of physics and astronomy was characteristic of purely regional-foreign collaboration, which was the dominant type of research collaboration throughout the period of observation (1998-2006). We present an application of a clustering technique to a large original dataset of SCI publications which is capable of disentangling the different research lines followed by a scientist, their duration over time and the intensity of effort devoted to each of them. Information is obtained by means of software-assisted content analysis, based on the co-occurrence of words in the full abstracts and titles of a set of SCI publications authored by 650 American star physicists across 17 years. We estimate that scientists in our dataset contributed on average to 16 different research lines over the time span, each lasting on average 3.5 years, and published nearly 5 publications in each line of research. The technique is potentially useful for scholars studying science and the research community; for research agencies, to evaluate whether a scientist is new to a topic; and for librarians, to collect timely biographic information. In this paper we compute journal rankings in the Information and Library Science JCR category according to the JIF and according to several h-type indices.
Even though the correlations between all the ranked lists are very high, visual inspection reveals considerable individual differences between the rankings, showing that the correlation measure is not sensitive enough. Thus we also compute other measures, Spearman's footrule and the M-measure, which are more sensitive to the differences between the rankings in the sense that their range of values is larger than the range of correlation values when comparing the JIF ranking to the rankings induced by the h-type indices. (C) 2009 Elsevier Ltd. All rights reserved. The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate that the USA's National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in a multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives. This paper reports two interrelated citation-based studies of the intellectual structure of Evolutionary Developmental Biology (Evo-Devo).
The core journals of Evo-Devo (Evolution & Development; Development Genes and Evolution; and Journal of Experimental Zoology Part B) and its supporting/parental disciplines are identified and their strong citation links mapped based on data from Journal Citation Reports, 2005-2007. Evo-Devo cites into Developmental Biology in all three years and exchanges citations with Paleontology in 2007. There are no strong connections with either general or molecular Evolution journals. Persistent, visible research themes are visualized as citing-cited networks and subnetworks of articles extracted from the Web of Science for the core Evo-Devo journals and a larger set of articles citing one or more Evo-Devo journals. Most research themes in the core set are specific to a single journal. Few highly cited core journal articles are also visible in the broader set of networks and subnetworks, although some themes (e.g., arthropod body plans, chordate genes/gene expression) are visible in both data sets. Latent semantic analysis (LSA) is a relatively new research tool with a wide range of applications in different fields, ranging from discourse analysis to cognitive science and from information retrieval to machine learning. In this paper, we chart the development and diffusion of LSA as a research tool using a social network analysis (SNA) approach that reveals the social structure of a discipline in terms of collaboration among scientists. Using Thomson Reuters' Web of Science (WoS), we identified 65 papers with "latent semantic analysis" in their titles and 250 papers with it in their topics (but not in their titles) between 1990 and 2008. We then analyzed those papers using bibliometric and SNA techniques such as co-authorship and cluster analysis. It appears that as the emphasis moves from the research tool (LSA) itself to its applications in different fields, citations to papers with LSA in their titles tend to decrease.
The productivity of authors fits Lotka's Law, while the network of authors is quite loose. Networks of journals cited in papers with LSA in their titles and topics are well connected. Hirsch-type indices are used not only for the evaluation of individual scientists but also for institutional evaluation. In particular, Prathap's suggestion of using the pair of h-indices (h(1), h(2)) seems a promising approach. This paper discusses these h-indices, moreover incorporating a Molinari correction for size. We provide a theoretical framework and practical examples in the field of HIV infection and therapy. It is shown that the National Cancer Institute (NCI, USA) and Harvard University are the leading institutes in this field worldwide. If, however, size is controlled for according to the Molinari approach, the National Institute of Allergy and Infectious Diseases (NIAID, USA) becomes the leader. In addition, we provide a new structural index: the ratio h(2)/h(1). It is suggested that this ratio indicator is related to the stability of the research performed at an institute or university. The term stability is used here in the sense of not depending on a small group of scientists who could easily move to another university or institute. The study focuses on the analysis of the information flow among the ISI subject categories and aims at finding an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. The clusterings of journals and of ISI subject categories provide two subject classification schemes from different perspectives and at different levels. The two clustering results have been compared and their accordance and divergence have been analyzed. Several indicators have been used to compare the communication characteristics among different ISI subject categories.
The neighbour map of each category clearly reflects the affinities between the "core" category and its surrounding satellites. Field delimitation for citation analysis, the process of collecting a set of bibliographic records with cited-reference information of research articles that represent a research field, is the first step in any citation analysis study of a research field. Due to a number of limitations, the commercial citation indexes have long made it difficult to obtain a comprehensive dataset in this step. This paper discusses some of the limitations imposed by these databases, and reports on a method that overcomes some of them and that was used with great success to delimit an emerging and highly interdisciplinary biomedical research field, stem cell research. The resulting field delimitation and the citation network it induces are both excellent. This multi-database method relies on using PubMed for the actual field delimitation, and on mapping between Scopus and PubMed records to obtain comprehensive information about the cited references contained in the resulting literature. The method provides high-quality field delimitations for citation studies that can be used as benchmarks for studies of the impact of data collection biases on citation metrics, and may help improve confidence in the results of scientometric studies, for an increased impact of scientometrics on research policy. Differences among journal reference characteristics in various fields cause field-based differences in citation counts. For the purpose of improving indicators used in cross-field evaluations it is necessary to explore further the characteristics of journal references. Such an exploration would offer new clues for solving the problem of cross-field journal evaluation.
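One standard way to correct for the field-based differences in citation counts discussed above is to divide a journal's citations per paper by the average for its field. This is a generic sketch of that idea, not an indicator proposed by the authors; all journal names, field averages, and counts below are invented for illustration.

```python
# Generic field-normalized citation rate: citations per paper divided by
# the field's average citations per paper. All figures are hypothetical.

journals = [
    # (name, field, citations, papers)
    ("J Mol Bio X", "molecular biology", 5200, 400),
    ("Math Annals Y", "mathematics", 650, 250),
]
field_mean_cpp = {"molecular biology": 14.0, "mathematics": 2.5}  # assumed

def normalized_impact(citations, papers, field):
    cpp = citations / papers               # citations per paper
    return cpp / field_mean_cpp[field]     # 1.0 = exactly field average

scores = {name: normalized_impact(c, p, f) for name, f, c, p in journals}
```

In this toy example the mathematics journal, despite far fewer raw citations, scores slightly above its field average (1.04) while the biology journal falls slightly below (about 0.93), which is exactly the kind of reversal cross-field normalization is meant to expose.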
During the past years, studies of the rhythm of science have made several advances: constructing various types of publication-citation matrices (in short: p-c matrices), creating a series of rhythm indicators, studying the fundamental mathematical properties of rhythm sequences, and exploring the rhythm sequences of some journals. Rhythm indicators can be applied in many settings: if the system is a source-item system with two time dimensions, so that a p-c-like matrix can be constructed, then such a study is theoretically feasible. In this article we create a journal's publication-reference matrix (p-r matrix). Based on the p-r matrix the rR' indicator is defined, which is used to measure the so-called input rhythm of a journal. As two case studies, the input rhythms of the Journal of the American Society for Information Science and Technology and of the Journal of Documentation are presented and analyzed. The importance of the acquisition and provision of information within knowledge work such as engineering is widely acknowledged. This article reports an extensive empirical study of such information behaviors in engineers, using a novel and effective work sampling method. Seventy-eight design engineers each carried a portable handheld computer (PDA) for 20 working days. Once every hour, they were prompted to enter data concerning the task they were currently performing, including the information behaviors in which they were engaging. The resultant data represent a comprehensive picture of engineers' information behaviors and the percentage of their working time for which each of these behaviors accounts (55.75% in total). Specific hypotheses concerning the time spent engaged in these behaviors were also tested.
It was found that participants spent substantially more time receiving information they had not requested than information they had, and this pattern was also reflected when they provided others with information. Furthermore, although no difference was found between the time participants spent searching for information from other people and from nonhuman sources, in the former case they spent relatively less time locating the information source and the information within that source, and relatively more time engaged in problem solving and decision making. The results are discussed in terms of their implications for theory and organizational practice. Despite the development of huge healthcare Web sites and powerful search engines, many searchers end their searches prematurely with incomplete information. Recent studies suggest that users often retrieve incomplete information because of the complex scatter of relevant facts about a topic across Web pages. However, little is understood about the regularities underlying such information scatter. To probe these regularities, this article presents the results of two analyses: (a) a cluster analysis of Web pages that reveals the existence of three page clusters that vary in information density, and (b) a content analysis that suggests the role each of these page clusters plays in providing comprehensive information. These results have implications for the design of Web sites, search tools, and training to help users find comprehensive information about a topic, and for a hypothesis describing the underlying mechanisms causing the scatter. We conclude by briefly discussing how the analysis of information scatter, at the granularity of facts, complements existing theories of information-seeking behavior.
With the current information environment characterized by the proliferation of digital resources, including collaboratively created and shared resources, Library of Congress Subject Headings (LCSH) is facing the challenge of effective and efficient subject-based organization and retrieval of digital resources. To explore the feasibility of utilizing LCSH in a digital environment, we may need to revisit its basic characteristics. The objectives of our study were to analyze LCSH in terms of both its syntactic and relational structures, to discover the structural characteristics of LCSH, and to identify problems and issues bearing on the feasibility of LCSH as an effective subject access tool. This study reports and discusses issues raised by the syntactic and hierarchical structures of LCSH that present challenges to its use in a networked environment. Given the results of this study, we recommend a number of provisional future directions for the development of LCSH toward becoming a viable system for digital and networked resources. The purpose of this study is to examine whether an understanding of the subject-indexing processes conducted by human indexers has a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject-indexing approaches, or conceptions, in conjunction with semantic sources, were explored in the context of a typical scientific journal article dataset. Based on the premise that subject-indexing approaches or conceptions together with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. Two research questions were explored: To what extent are semantic sources effective? To what extent are indexing conceptions effective? The experiments were conducted using a Support Vector Machine implementation in WEKA (I.H. Witten & E. Frank, 2000).
Measured by the F-measure, the experimental results showed that cited works, source title, and title were as effective as the full text, while keywords were found to be more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. In particular, the content-oriented and document-oriented indexing approaches were found to be more effective than the full text. Among the three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article dataset, the objective contents and authors' intentions were more useful for automatic subject term assignment via TC than the possible users' needs. The findings of this study support the view that incorporating human indexers' indexing approaches or conceptions in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment. Building on the major premises of transactive memory (TM) theory as well as the recent multilevel extension to the original theory, this study examined the influence of the perceived social accessibility of expertise providers, technological accessibility, and awareness of expertise distribution on expertise retrieval. Using social network data collected from a large global sales team, the study found that all three variables had a significant impact on expertise retrieval at both the dyadic and individual levels. Our study confirmed the conceptual and theoretical value of approaching TM from a multilevel network perspective. This study evaluates how well the authors of Wikipedia history articles adhere to the site's policy of assuring verifiability through citations. It does so by examining the references and citations of a subset of country histories. The findings paint a dismal picture.
Not only are many claims not verified through citations, but those that are verified suffer from the choice of references used. Many of these references are to only a few US government Websites or news media, and few are to academic journal material. Given these results, one response would be to declare Wikipedia unsuitable for serious reference work. But another option emerges when we jettison technological determinism and look at Wikipedia as a product of a wider social context. Key to this context is a world in which information is bottled up in commodities requiring payment for access. Equally important is the problematic assumption that texts are undifferentiated bearers of knowledge. Those involved in instructional programs can draw attention to the social nature of texts to counter these assumptions, and by so doing create an awareness among a new generation of Wikipedians and Wikipedia users of the need to evaluate texts (and hence citations) in light of the social context of their production and use. Informal communications media pose new challenges for information-systems design, but the nature of informal interaction offers new opportunities as well. This paper describes NetLens-Email, a system designed to support exploration of the content-actor network in large e-mail collections. Unique features of NetLens-Email include the close coupling of orientation, specification, restriction, and expansion, and the introduction and incorporation of a novel capability for iterative projection between content and actor networks within the same collection. Scenarios are presented to illustrate the intended employment of NetLens-Email, and design walkthroughs with two domain experts provide an initial basis for assessing the suitability of the design for scholars and analysts. Font-rendering technologies play a critical role in presenting clear and aesthetic fonts to enhance the experience of reading from computer screens.
This article presents three studies investigating visual and psychological correlates of people's preferences for different onscreen text enhancements, such as ClearType developed by Microsoft. Findings suggested that (a) acuity and hue sensitivity were two major factors affecting people's preferences regarding ClearType's color filtering of subpixels on fonts, and (b) specific personality traits such as disagreeableness could also correlate with people's impressions of different onscreen text enhancements. These empirical data should inform digital typographers and human-computer interaction scientists who aim to develop better systems for onscreen reading. Understanding how groups end and how group members depart helps us understand how these ending and departure processes affect group outcomes, individuals' willingness and ability to work in subsequent groups together or with others, and the maintenance of group-generated knowledge over time. In this article, a distributed, grant-funded research project group provides the setting for an analysis of the process that members went through as they disengaged from and dismantled their group when the funding period ended and the project was winding down. Qualitative interviews with group members were analyzed using a model of disengaging that was developed in an earlier study. The model comprises 12 interwoven parts of a disengaging process that begins well before the group ends and extends beyond its official termination. The model frames the analysis and is revised in light of the research findings. The objective of this article is to introduce a socially oriented approach to the analysis and representation of the social and intellectual structure of scientific fields. A sociological perspective is introduced as the theoretical basis for analyzing scientific fields, and the social network framework is adopted to develop a multidisciplinary approach to analyzing the organization of science.
This approach is then applied to study the Spanish Library and Information Science community from 1999 to 2007. The underlying notion is that science is organized work, in which the pursuit of impact shapes the specific scientific organization. This generates mutual dependence and control among researchers, which may restrict access when formally communicating with other scientific communities. On the other hand, scholarly journals facilitate the coordination of new knowledge and serve as platforms for interaction among scientists. Consequently, the interaction of well-defined groups of homogeneous researchers, concentrated around particular sets of journals, leads to the formation of cohesive (sub)groups tied together by the degree of similarity of the researchers' competence. An empirical test suggests that this approach can accurately reveal a segment of the structure of the scientific field. This study therefore introduces a new approach for mapping the structure of scientific fields that differs from most existing methods, which are based on (co)citation. The possibilities of using the Arts & Humanities Citation Index (A&HCI) for journal mapping have not been sufficiently recognized because of the absence of a Journal Citation Reports (JCR) edition for this database. A quasi-JCR for the A&HCI (2008) was constructed from the data contained in the Web of Science and is used for the evaluation of two journals as examples: Leonardo and Art Journal. The maps based on the aggregated journal-journal citations within this domain can be compared with maps including references to journals in the Science Citation Index and Social Science Citation Index. Art journals are cited by (social) science journals more than by other art journals, but these journals draw upon one another in terms of their own references. This cultural impact in terms of being cited is not found when documents on a topic such as "digital humanities" are analyzed.
This community of practice functions more as an intellectual organizer than as a journal. Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to the events they report. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus on finding the first stories of new events and identifying all subsequent stories on a certain topic defined by a small number of sample stories. In this work, we introduce the first large-scale TDT test collection for Turkish, and investigate the NED and TT problems in this language. We present our test-collection-construction approach, which is inspired by the TDT research initiative. We show that in TDT for Turkish, with some similarity measures, a simple word truncation stemming method can compete with a lemmatizer-based stemming approach. Our findings show that, contrary to our earlier observations on Turkish information retrieval, in NED word stopping has an impact on effectiveness. We demonstrate that the confidence scores of two different similarity measures can be combined in a straightforward manner for higher effectiveness. The influence of several similarity measures on effectiveness is also investigated. We show that it is possible to deploy TDT applications in Turkish that can be used in operational settings. The United States Supreme Court case of 1991, Feist Publications, Inc. v. Rural Tel. Service Co., continues to be highly significant for property in data and databases, but remains poorly understood. The approach taken in this article contrasts with previous studies. It focuses upon the "not original" rather than the original. The delineation of the absence of a modicum of creativity in the selection, coordination, and arrangement of data, as a component of the not original, forms a pivotal point in the Supreme Court decision.
The author also aims at elucidation rather than critique, using close textual exegesis of the Supreme Court decision. The results of the exegesis are translated into a more formal logical form to enhance clarity and rigor. The insufficiently creative is initially characterized as "so mechanical or routine." Mechanical and routine are understood in their ordinary discourse senses, as a conjunction or as connected by AND, and as the central clause. Subsequent clauses amplify the senses of mechanical and routine without disturbing their conjunction. The delineation of the absence of a modicum of creativity can be correlated with classic conceptions of computability. The insufficiently creative can then be understood as a routine selection, coordination, or arrangement produced by an automatic mechanical procedure or algorithm. An understanding of a modicum of creativity and of copyright law is also indicated. The value of the exegesis and interpretation is identified as its final simplicity, clarity, comprehensiveness, and potential practical utility. The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title-derivative automatic indexing, its development and implementation, and its current status. In this study an attempt is made to establish new bibliometric indicators for the assessment of research in the Humanities. Data from a Dutch Faculty of Humanities was used to provide the investigation with a sound empirical basis. For several reasons (particularly related to coverage) the standard citation indicators, developed for the sciences, are unsatisfactory.
Target expanded citation analysis and the use of oeuvre (lifetime) citation data, as well as the addition of library holdings and productivity indicators enable a more representative and fair assessment. Given the skew distribution of population data, individual rankings can best be determined based on log transformed data. For group rankings this is less urgent because of the central limit theorem. Lifetime citation data is corrected for professional age by means of exponential regression. University patenting has been heralded as a symbol of changing relations between universities and their social environments. The Bayh-Dole Act of 1980 in the USA was eagerly promoted by the OECD as a recipe for the commercialization of university research, and the law was imitated by a number of national governments. However, since the 2000s university patenting in the most advanced economies has been on the decline both as a percentage and in absolute terms. In addition to possible saturation effects and institutional learning, we suggest that the institutional incentives for university patenting have disappeared with the new regime of university ranking. Patents and spin-offs are not counted in university rankings. In the new arrangements of university-industry-government relations, universities have become very responsive to changes in their relevant environments. In this study we show that it is possible to identify top-cited publications other than Web of Science (WoS) publications, particularly non-journal publications, within fields in the social and behavioral sciences. We analyzed references in publications that were themselves highly cited, with at least one European address. Books represent between 62 (psychology) and 81% (political science) of the non-WoS references, journal articles 15-24%. Books (economics, political science) and manuals (psychology) account for the most highly cited publications. 
Between 50 (psychology, political science) and 71% (economics) of the top-ranked most cited publications originated from the US versus between 18 (economics) and 38% (psychology) from Europe. Finally, it is discussed how the methods and procedures of the study can be optimized. The purpose of the research was to establish and report on the features of productivity across all scholarly fields measured through journals indexed in WoS in which Croatian authors working in Croatian institutions published from independence (1991) to 2005. A total of 19,929 papers in 2,946 journals were found. Of these, 17,875 papers were published in 2,690 science, technology and medicine (STM) journals, 1,869 papers were published in 178 social science journals, and 185 were published in 78 A&H journals according to the custom classification used in the research. Special attention was given to publishing features of specific scholarly fields. The number of different journals in which the papers were published per year has doubled in the period (from 404 in 1991 to 894 in 2005). To support additional insight, a distinction between national and international journals was made and the top 10% journals according to JCR 2005 categories were identified by IF. National journals accounted for 15.9% of STM papers, 77.6% of social science papers and 25.9% of A&H papers. The top 10% journals comprised 368 journals and 2,336 papers, with significant variations across the subfields. Several bibliometric studies have shown that international or multicountry papers are generally more cited than domestic or single country papers. Does this also hold for the most cited papers? In this study, the citation impact of domestic versus international papers is analyzed by comparing the share of international papers among the hundred most cited papers in four research specialities, from three universities, four cities and two countries. 
It is concluded that international papers are not well represented among high impact papers in research specialities, but dominate highly cited papers from small countries, and from cities and institutions within them. The share of international papers among highly cited papers is considerably higher during 2001-2008 compared to earlier years for institutions, cities and countries, but somewhat less for two of the research fields and slightly higher for the other two. Above all, domestic papers from the USA comprise about half of the highly cited papers in the research specialities. This paper empirically examines the relationship between research commercialization, entrepreneurial commitment, and knowledge production and diffusion in academia. Through a dataset of 229 academic patent inventors, this paper reveals that the effects of research commercialization on publication quantity, application-oriented research, and disclosure delay are moderated by the entrepreneurial commitment of faculty members. This paper concludes that encouraging entrepreneurial commitment of faculty members may possibly drive academics away from their traditional approaches in producing and diffusing knowledge. The h-index is a recent metric that captures a scholar's influence. In the current work, it is used to: (1) obtain the h-index scores of the most productive scholars in the Journal of Consumer Research (JCR), and compare these to other elite scholars (including those of the other three premier marketing journals); (2) demonstrate the relationship between the h-indices and total number of citations of the top JCR producers; (3) examine the h-indices of Ferber winners (best interdisciplinary paper based on a doctoral dissertation published in JCR in a given year) and those having received honorable mentions; (4) explore the relationship between a marketing journal's prestige and the corresponding h-index score of its editor. 
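The h-index used in the analyses above has a simple operational definition: a scholar has index h if h of their papers have each received at least h citations. A minimal sketch in Python (the citation counts below are invented for illustration):

```python
def h_index(citations):
    """Return the largest h such that h papers have >= h citations each."""
    # Rank papers by citation count in descending order, then find the
    # last rank i (1-based) at which the i-th paper still has >= i citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Hypothetical citation counts for one scholar's papers:
print(h_index([25, 8, 5, 4, 3, 1]))  # -> 4
```

The same computation underlies h-index-based similarity and ranking measures discussed elsewhere in these studies.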
These varied analyses demonstrate the multitudinous ways in which the h-index can be used when investigating bibliometric phenomena within a given discipline. Intensified technology convergence, that is, increasing relatedness between technological fields, is a mega-trend in 21st century science and technology. However, scientometrics has been unsuccessful in identifying this techno-economic paradigm change. To address the limitations and validity problems of conventional measures of technology convergence, we introduce a multi-dimensional contingency table representation of technological field co-occurrence and a relatedness measure based on the Mantel-Haenszel common log odds ratio. We used Korean patent data to compare previous and proposed methods. Results show that the proposed method can increase understanding of the techno-economic paradigm change because it reveals significant changes in technological relatedness over time. The characteristic scores and scales (CSS), introduced by Glanzel and Schubert (J Inform Sci 14: 123-127, 1988) and further studied in subsequent papers by Glanzel, can be calculated exactly in a Lotkaian framework. We prove that these CSS are simple exponents of the average number of items per source in general IPPs. The proofs are given using size-frequency functions as well as using rank-frequency functions. We note that CSS do not necessarily have to be defined as averages but that medians can be used as well. Also for these CSS we present exact formulae in the Lotkaian framework and both types of CSS are compared. We also link these formulae with the h-index. This paper evaluates the importance of jointly conducted versus purely national research when neighbouring countries are trying to study a topic of mutual interest. The chosen topic was the shared ocean or lake basin. 
The number of non-mutual and mutual articles in the period 1999-2008 for seven pairs of neighbouring countries was analysed by extracting published articles and citations from the Web of Science database. It was found that mutual articles have generally better visibility than the non-mutual articles, a finding valid even for large and developed countries. Also, the percentage of self-citations in the mutual articles is much lower than in the non-mutual ones. However, the citations of the non-mutual articles are influenced by the development of the country or, in some cases, by the development of the countries in which researchers from a certain country are presently working (this applies strongly to the Eastern European countries). I aim to advance our understanding of the size of scientific specialties. Derek Price's groundbreaking work has provided us with valuable conceptual tools and data for making progress on this issue. But I argue that his estimate of 100 scientists per specialty is flawed. He fails to take into account the fact that the average publishing scientist publishes only 3.5 articles throughout her career. Hence, I suggest that, rather than consisting of 100 scientists, specialties are probably somewhat larger, perhaps somewhere between 250 and 600 scientists. This study explores a bibliometric approach to quantitatively assessing current research trends on volatile organic compounds, by using the related literature in the Science Citation Index (SCI) database from 1992 to 2007. The analysis of the acquired articles concentrated on general scientific output; research performance by countries, institutes, and collaborations; and research trends as reflected in the frequency of author keywords, title words, abstract words, and keywords plus. Over the period, there was a notable growth trend in publication outputs, along with more participation and collaboration by countries and institutes. 
Collaborative research papers had shifted from national inter-institutional collaboration to international collaboration. Benzene, toluene, and formaldehyde were the three VOCs of greatest concern. Detection and removal of VOCs, especially by adsorption and oxidation, appeared to be the main directions of VOC research for the next few years. Using the data of a comprehensive evaluation study on the peer review process of Angewandte Chemie International Edition (AC-IE), we examined in this study the way in which referees' comments differ on manuscripts rejected at AC-IE and later published in either a low-impact journal (Tetrahedron Letters, n = 54) or a high-impact journal (Journal of the American Chemical Society, n = 42). For this purpose, a content analysis was performed of comments which led to the rejection of the manuscripts at AC-IE. For the content analysis, a classification scheme with thematic areas developed by Bornmann et al. (2008) was used. As the results of the analysis demonstrate, a large number of negative comments from referees in the areas "Relevance of contribution" and "Design/Conception" are clear signs that a manuscript rejected at AC-IE will not be published later in a high-impact journal. The number of negative statements in the areas "Writing/Presentation," "Discussion of results," "Method/Statistics," and "Reference to the literature and documentation," on the other hand, had no statistically significant influence on the probability that a rejected manuscript would later be published in a low- or high-impact journal. The results of this study have various implications for authors, journal editors and referees. With reference to social constructivist approaches on citing behavior in the sciences, the hypothesis of acceleration of citing behavior after the millennium was empirically tested for a stratified random sample of exemplary psychology journal articles. 
The sample consists of 45 English and 45 German articles published in the years 1985, 1995, and 2005 in high impact journals on developmental psychology, psychological diagnosis and assessment, and social psychology. Content analyses of the reference lists refer to the total number of references cited in the articles and the publication years of all references. In addition, the number of self-references, the number of pages, and the number of authors were determined for each article. Results show that there is no acceleration of citing behavior; on the contrary, a significant trend is revealed toward authors citing somewhat older references in the newer journal articles. Significant main effects also point to more citations of somewhat older references in the English (vs. German) journal articles as well as in articles on social psychology and psychological diagnosis (vs. on developmental psychology). Complementary analyses show that multiple authorships and the number of pages as well as the total number of references and the number of self-references increase significantly with time. However, the percentage of self-references remains quite stable at about 10%. Some methodological and statistical traps in bibliometrically testing the starting hypothesis are considered. Thus, the talk that has been circulating among psychology colleagues and students on the potential millennium effects on citing behavior in the sciences (which can, however, become a self-fulfilling prophecy) is not confirmed, at least for psychology journals. In this paper, we examine whether the quality of academic research can be accurately captured by a single aggregated measure such as a ranking. With Shanghai University's Academic Ranking of World Universities as the basis for our study, we use robust principal component analysis to uncover the underlying factors measured by this ranking. 
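The idea of uncovering latent factors behind a composite ranking can be illustrated with an ordinary (non-robust) principal component analysis computed via the SVD; the study above uses a robust variant, and the indicator matrix below is invented purely for illustration:

```python
import numpy as np

# Hypothetical indicator scores for 6 universities on 4 ranking criteria
# (rows: universities; columns: e.g. awards, highly cited staff,
# publications, per-capita performance). All values are invented.
X = np.array([
    [90.0, 85.0, 70.0, 60.0],
    [80.0, 88.0, 65.0, 55.0],
    [30.0, 25.0, 75.0, 70.0],
    [35.0, 20.0, 80.0, 72.0],
    [60.0, 55.0, 50.0, 45.0],
    [55.0, 60.0, 48.0, 50.0],
])

# Classical PCA: center the columns, then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Proportion of variance explained by each principal component; two large
# leading values would suggest two distinct underlying factors.
explained = s**2 / np.sum(s**2)
print(np.round(explained, 3))
```

If the first two components together explain most of the variance, and their loadings (rows of `Vt`) split the indicators into distinct groups, that is the kind of two-factor structure the study reports.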
Based on a sample containing the top 150 ranked universities, we find evidence that, for the majority of these institutions, the Shanghai rankings reflect not one but in fact two different and uncorrelated aspects of academic research: overall research output and top-notch researchers. Consequently, the relative weight placed upon these two factors determines to a large extent the final ranking. A useful level of analysis for the study of innovation may be what we call "knowledge communities": intellectually cohesive, organic inter-organizational forms. Formal organizations like firms are excellent at promoting cooperation, but knowledge communities are superior at fostering collaboration, the most important process in innovation. Rather than focusing on what encourages performance in formal organizations, we study what characteristics encourage aggregate superior performance in informal knowledge communities in computer science. Specifically, we explore the way knowledge communities both draw on past knowledge, as seen in citations, and use rhetoric, as found in writing, to seek a basis for differential success. We find that, when using knowledge, successful knowledge communities draw from a broad range of sources and are extremely flexible in changing and adapting. In marked contrast, when using rhetoric, successful knowledge communities tend to use very similar vocabularies and language that does not move or adapt over time and is not unique or esoteric compared to the vocabulary of other communities. A better understanding of how inter-organizational collaborative network structures encourage innovation is important to understanding what drives innovation and how to promote it. Cohesive intellectual communities called "schools of thought" can provide powerful benefits to those developing new knowledge, but can also constrain them. 
We examine how developers of new knowledge position themselves within and between schools of thought, and how this affects their impact. Looking at the micro and macro fields of management publications from 1956 to 2002 with an extensive dataset of 113,000+ articles from 41 top journals, we explore the dynamics of knowledge positioning for management scholars. We find that it is significantly beneficial for new knowledge to be a part of a school of thought, and that within a school of thought new knowledge has more impact if it is in the intellectual semi-periphery of the school. In 538 randomly selected Swedish biomedical PhDs from 2008, 50% of the external examiners came from abroad, most commonly USA and UK. The sex distribution between candidates was equal, while 17% of the external examiners were women. Twice as many women candidates as men had women examiners. Swedish PhDs are based on work published in international peer-reviewed journals; the median number of works per thesis was 4. The Swedish thesis examination system offers a model for international cross-fertilisation. Journals covered by the 2006 Science Citation Index Journal Citation Reports database have been subjected to a clustering procedure utilizing h-similarity as the underlying similarity measure. Clustering complemented with a prototyping routine provided well-conceivable results that are both compatible with and further refine existing taxonomies of science. The authors investigate factors influencing user satisfaction in information retrieval. It is evident from this study that user satisfaction is a subjective variable, which can be influenced by several factors such as system effectiveness, user effectiveness, user effort, and user characteristics and expectations. Therefore, information retrieval evaluators should consider all these factors in obtaining user satisfaction and in using it as a criterion of system effectiveness. 
Previous studies have reached conflicting conclusions on the relationship between user satisfaction and system effectiveness; this study has substantiated these findings and supports using user satisfaction as a criterion of system effectiveness. As new technologies and information delivery systems emerge, the way in which individuals search for information to support research, teaching, and creative activities is changing. To understand different aspects of researchers' information-seeking behavior, this article surveyed 2,063 academic researchers in natural science, engineering, and medical science from five research universities in the United States. A Web-based, in-depth questionnaire was designed to quantify researchers' information searching, information use, and information storage behaviors. Descriptive statistics are reported. Additionally, analysis of results is broken out by institutions to compare differences among universities. Significant findings are reported; the biggest changes stem from increased use of electronic methods for searching, sharing, and storing scholarly content, as well as for using library services. Generally speaking, researchers in the five universities had similar information-seeking behavior, with small differences due to varying academic unit structures and the myriad library services provided at the individual institutions. With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with other online users in Web 2.0 communities have become daily routines for online users. User-generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be utilized to obtain insight about cyber communities and their social networks. Many studies have focused on leveraging user-generated information to analyze blogs and forums, but few studies have applied this approach to video-sharing Web sites. 
In this study, we propose a text-based framework for video content classification of online video-sharing Web sites. Different types of user-generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content-specific features) were extracted. Three feature-based classification techniques (C4.5, Naive Bayes, and Support Vector Machine) were used to classify videos. To evaluate the proposed framework, user-generated data from candidate videos, which were identified by searching user-given keywords on YouTube, were first collected. Then, a subset of the collected data was randomly selected and manually tagged by users as our experimental data. The experimental results showed that the proposed approach was able to classify online videos based on users' interests with accuracy rates up to 87.2%, and all three types of text features contributed to discriminating videos. Support Vector Machine outperformed C4.5 and Naive Bayes techniques in our experiments. In addition, our case study further demonstrated that accurate video-classification results are very useful for identifying implicit cyber communities on video-sharing Web sites. The authors describe a flexible model and a system for content-based image retrieval of objects' shapes. Flexibility is intended as the possibility of customizing the system behavior to the user's needs and perceptions. This is achieved by allowing users to modify the retrieval function. The system implementing this model uses multiple representations to characterize some macroscopic characteristics of the objects' shapes. Specifically, the shape indexes describe the global features of the object's contour (represented by the Fourier coefficients), the contour's irregularities (represented by the multifractal spectrum), and the presence of concavities and convexities (represented by the contour scale space distribution). 
During a query formulation, the user can specify both the preference for the macroscopic shape aspects that he or she considers meaningful for the retrieval, and the desired level of accuracy of the matching, which means that the visual query shape must be considered with a given tolerance in representing the desired shapes. The evaluation experiments showed that this system can be suited to different retrieval behaviors, and that, generally, the combination of the multiple shape representations increases both recall and precision with respect to the application of any single representation. Two key problems in developing a storyboard are (a) the extraction of video key frames and (b) the display of the storyboard. On the basis of our findings from a preliminary study as well as the results of previous studies on the computerized extraction of key frames and human recognition of images and videos, we propose an algorithm for the extraction of key frames and the structural display of a storyboard. In order to evaluate the proposed algorithm, we conducted an experiment, the results of which suggest that participants produce better summaries of the given videos when they view storyboards that are composed of key frames extracted using the proposed algorithmic method. This finding held, regardless of whether the display pattern used was sequential or structural. In contrast, the experimental results suggest that in the case of employing a mechanical method, the use of a structural display pattern yields greater performance in terms of participants' ability to summarize the given videos. Elaborating on our results, we discuss the practical implications of our findings for video summarization and retrieval. In this article, we describe the results of an experiment designed to understand the effects of background information and social interaction on image tagging. The participants in the experiment were asked to tag 12 pre-selected images of Jewish cultural heritage. 
The users were partitioned into three groups: the first group saw only the images with no additional information whatsoever, the second group saw the images plus a short, descriptive title, and the third group saw the images, the titles, and the URL of the page in which the image appeared. In the first stage of the experiment, each user tagged the images without seeing the tags provided by the other users. In the second stage, the users saw the tags assigned by others and were encouraged to interact. Results show that after the social interaction phase, the tag sets converged and the popular tags became even more popular. Although in all cases the total number of assigned tags increased after the social interaction phase, the number of distinct tags decreased in most cases. When viewing the image only, in some cases the users were not able to correctly identify what they saw in some of the pictures, but they overcame the initial difficulties after interaction. We conclude from this experiment that social interaction may lead to convergence in tagging and that the "wisdom of the crowds" helps overcome the difficulties due to the lack of information. In this article, we explore the dynamics of prosocial and self-interested behavior among musicians on MySpace Music. MySpace Music is an important platform for social interactions and at the same time provides musicians with the opportunity for significant profit. We argue that these forces can be in tension with each other, encouraging musicians to make strategic choices about using MySpace to promote their own or others' rewards. We look for evidence of self-interested and prosocial "friending" strategies in the social network created by Top Friends links. We find strong evidence that individual preferences for prosocial and self-interested behavior influence friending strategies. Furthermore, our data illustrate a robust relationship between increased prominence and increased attention to others' rewards. 
These results shed light on how musicians manage their interactions in complex online environments and extend research on social values by demonstrating consistent preferences for prosocial or self-interested behavior in a multifaceted online setting. The rapid advancement of nanotechnology research and development during the past decade presents an excellent opportunity for a scientometric study because it can provide insights into the dynamic growth of the fast-evolving social networks associated with this field. In this article, we describe a case study conducted on nanotechnology to discover the dynamics that govern the growth process of rapidly advancing scientific-collaboration networks. This article starts with the definition of temporal social networks and demonstrates that the nanotechnology collaboration network, similar to other real-world social networks, exhibits a set of intriguing static and dynamic topological properties. Inspired by the observations that in collaboration networks new connections tend to be augmented between nodes in proximity, we explore the locality elements and the attachedness factor in growing networks. In particular, we develop two distance-based computational network growth schemes, namely the distance-based growth model (DG) and the hybrid degree and distance-based growth model (DDG). The DG model considers only the locality element, while the DDG is a hybrid model that factors in both locality and attachedness elements. The simulation results from these models indicate that both clustering coefficient rates and the average shortest distance are closely related to the edge densification rates. In addition, the hybrid DDG model exhibits higher clustering coefficient values and decreasing average shortest distance when the edge densification rate is fixed, which implies that combining locality and attachedness can better characterize the growing process of the nanotechnology community. 
Based on the simulation results, we conclude that social network evolution is related to both attachedness and locality factors. University libraries invest a massive amount of resources in digitizing information for the Web, yet there is growing concern that much of this information is being underutilized. The present study uses the technology acceptance model (TAM) to investigate university library website resources (ULWR) usage. We categorize users based on academic roles and then analyze them as subgroups in order to observe different adoption patterns across groups. A total of 299 usable responses were collected from four different universities and across three populations: undergraduate, master, and doctoral student/faculty groups. The findings show that different library users indeed access ULWR for different reasons, resulting in a need for tailored managerial efforts. Overall, the extended TAM explains undergraduate students' usage best; the explanatory power of the model is significantly lower for the doctoral student/faculty group. Some of the findings challenge results reported in TAM research in other fields. The unexpected findings may result from the application of the model to a different context. Detailed theoretical implications and managerial guidance are offered. Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. Important outcomes are models that identify factors influencing expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise-seeking models, are rarely taken into account. 
In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling the identified factors and integrate them with content-based retrieval models. Our experiments show that models combining content-based and contextual factors can significantly outperform existing content-based models. Arabic has a complex structure, which makes it difficult to apply natural language processing (NLP). Much research on Arabic NLP (ANLP) does exist; however, it is not as mature as that of other languages. Finding Arabic roots is an important step toward conducting effective research on most ANLP applications. The authors have studied and compared six root-finding algorithms with success rates of over 90%. Not all algorithms in this study used the same testing corpus and/or benchmarking measures. The authors unified the testing process by implementing the algorithms from their descriptions and building a corpus out of 3823 triliteral roots, applying 73 triliteral patterns, and with 18 affixes, producing around 27.6 million words. They tested the algorithms with the generated corpus and have obtained interesting results; they offer to share the corpus freely for benchmarking and ANLP research. 
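As a toy illustration of the kind of affix stripping that root-finding algorithms build on, the sketch below removes at most one known prefix and one known suffix from a word. The affix lists are a small illustrative subset, not those of the six algorithms compared above, and real root extractors additionally use pattern matching to recover infixed root letters:

```python
# Illustrative (not exhaustive) affix lists; real Arabic stemmers use much
# larger inventories plus morphological patterns.
PREFIXES = ["ال", "و", "ب", "ل"]
SUFFIXES = ["ون", "ات", "ان", "ين", "ة"]

def strip_affixes(word):
    """Strip at most one prefix and one suffix, keeping >= 3 letters."""
    # Try longer affixes first so "ال" wins over a bare "ل".
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word

# "الكاتبون" ("the writers") reduces to the stem "كاتب"; recovering the
# triliteral root k-t-b would require pattern matching on top of this.
print(strip_affixes("الكاتبون"))
```

A greedy stripper like this over- or under-strips on many words, which is precisely why a shared corpus and unified benchmarking, as the study provides, matter for comparing real algorithms.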
Our limited understanding of real-life queries is an obstacle in developing music information retrieval (MIR) systems that meet the needs of real users. This study aimed, by an empirical investigation of real-life queries, to contribute to developing a theorized understanding of how users seek music information. This is crucial for informing the design of future MIR systems, especially the selection of potential access points, as well as establishing a set of test queries that reflect real-life music information-seeking behavior. Natural language music queries were collected from an online reference Website and coded using content analysis. A taxonomy of user needs expressed and information features used in queries were established by an iterative coding process. This study found that most of the queries analyzed were known-item searches, and most contained a wide variety of kinds of information, although a few features were used much more heavily than the others. In addition to advancing our understanding of real-life user queries by establishing an improved taxonomy of needs and features, three recommendations were made for improving the evaluation of MIR systems: (i) incorporating user context in test queries, (ii) employing terms familiar to users in evaluation tasks, and (iii) combining multiple task results. Discovering relationships among concepts and categories is crucial in various information systems. The authors' objective was to discover such relationships among document categories. Traditionally, such relationships are represented in the form of a concept hierarchy, grouping some categories under the same parent category. Although the nature of hierarchy supports the identification of categories that may share the same parent, not all of these categories are related to each other beyond sharing the same parent. However, some "non-sibling" relationships exist between categories that, although related to each other, are not identified as such. 
The authors identify and build a relationship network (relationship-net) with categories as the vertices and relationships as the edges of this network. They demonstrate that using a relationship-net, some nonobvious category relationships are detected. Their approach capitalizes on the misclassification information generated during the process of text classification to identify potential relationships among categories and automatically generate relationship-nets. Their results demonstrate a statistically significant improvement over the current approach by up to 73% on the 20 Newsgroups (20NG) data set, up to 68% on 17 categories in the Open Directory Project (ODP17), and by more than a factor of two on the ODP46 and Special Interest Group on Information Retrieval (SIGIR) data sets. Their results also indicate that using misclassification information stemming from passage classification, as opposed to document classification, statistically significantly improves the results on 20NG (8%), ODP17 (5%), ODP46 (73%), and SIGIR (117%) with respect to the F1 measure. By assigning weights to relationships and by performing feature selection, results are further optimized. In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. 
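The misclassification-driven relationship-net idea described above can be sketched compactly: treat categories as vertices and add an edge whenever the classifier confuses two categories often enough. This is an illustrative reconstruction, not the authors' implementation; the category labels and the threshold are hypothetical.

```python
from collections import Counter

def build_relationship_net(true_labels, predicted_labels, threshold=2):
    """Add an undirected edge between two categories whenever documents of one
    are misclassified as the other at least `threshold` times."""
    confusion = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            confusion[frozenset((t, p))] += 1
    vertices = set(true_labels) | set(predicted_labels)
    edges = {tuple(sorted(pair)): n
             for pair, n in confusion.items() if n >= threshold}
    return vertices, edges

# Hypothetical classifier output over toy category labels.
true_cats = ["autos", "autos", "motorcycles", "space", "autos", "space"]
pred_cats = ["motorcycles", "motorcycles", "autos", "space", "autos", "med"]
verts, edge_weights = build_relationship_net(true_cats, pred_cats)
```

In this toy run, "autos" and "motorcycles" are confused three times and so become linked, while the single "space"/"med" confusion falls below the threshold; edge weighting and feature selection, as in the paper, would refine such a network further.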
To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, they develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best-performing DUC participating systems on the same data set. The involvement of male and female scientists in the technological activity developed in Spain is analysed through the study of patent applications in the Spanish OEPM database during the period 1990-2005. Comparative analyses based on participation, contribution and inventors by gender are presented and discussed. The study reveals a low female involvement in technology, which tends to concentrate in specific institutional sectors (public research institutions) and technological sections (A/Human Necessities and C/Chemistry). Over the 16-year period analysed, the involvement of female scientists rose at a higher rate than that of men in most of the institutional sectors and technological fields. The highest relative increase corresponds to the University and the Spanish National Research Council, and our data suggest that it is enhanced by collaboration. To make the production of sex-disaggregated technology indicators easier, it would be desirable to include the sex of inventors as an additional field in patent databases, along with greater normalisation of inventor names, applicant names (full names) and institutional affiliations. 
The investigators studied author research impact using the number of citers per publication an author's research has been able to attract, as opposed to the more traditional measure of citations. A focus on citers provides a complementary measure of an author's reach or influence in a field, whereas citations, although possibly numerous, may not reflect this reach, particularly if many citations are received from a small number of citers. In this exploratory study, Web of Science was used to tally citer- and citation-based counts for 25 highly cited researchers in information studies in the United States and 26 highly cited researchers from the United Kingdom. Outcomes of the tallies based on several measures, including an introduced ch-index, were used to determine whether differences arise in author rankings when using citer-based versus citation-based counts. The findings indicate a strong correlation between some citation- and citer-based measures, but not with others. The findings of the study have implications for the way authors' research impact may be assessed. This research investigates patent activity on water pollution and treatment in China (1985-2007) and then compares the results with patent data on Triadic patents, South Korea, Brazil and India over the same period. Patent data were collected from the Derwent World Patents Index between 1985 and May 2008. For this study, 169,312 patents were chosen and examined. The total volume of patents, technology focus, assignee sector, priority date and the comparison with other countries are analyzed. It is found that patents on water pollution and treatment filed in China have experienced a remarkable increase, and that the growth rate of patents filed in China changes in step with the percentage of domestic applications. However, the number of high-quality Triadic patents with China as the priority country remains small. 
Furthermore, in addition to individual patent assignees, both Chinese universities and enterprises also play important roles in the patent activity on water pollution and treatment. In addition, the pattern of South Korea's development can provide short-term implications for China, and the regularity in Triadic patents' development can provide some guidance for China's long-term development. In contrast, the development pattern of Brazil and India is less instructive for China's development. Furthermore, China's technology focus in water pollution and treatment seems to parallel global and Triadic patent trends. This research provides a comprehensive picture of China's innovation capability in the area of water pollution and treatment. It will help China's local governments to improve their regional S&T capability and will support the National Water Pollution Control and Treatment Project in China. A central idea in Dan Sperber and Deirdre Wilson's relevance theory is that an individual's sense of the relevance of an input varies directly with the cognitive effects, and inversely with the processing effort, of the input in a context. I argue that this idea has an objective analog in information science: the tf*idf (term frequency, inverse document frequency) formula used to weight indexing terms in document retrieval. Here, tf*idf is used to weight terms from five bibliometric distributions in the context of the seed terms that generated them. The distributions include the descriptors co-assigned with a descriptor, the descriptors and identifiers assigned to an author, two examples of cited authors and their co-citees, and the books and journals cited with a famous book, The Structure of Scientific Revolutions. In each case, the highest-ranked terms are contrasted with the lowest-ranked terms. In two cases, pennant diagrams, a new way of displaying bibliometric data, augment the tabular results. 
Clear qualitative differences between the sets of terms are intuitively well explained by relevance theory. We introduce a new visual analytic approach to the study of scientific discoveries and knowledge diffusion. Our approach enhances contemporary co-citation network analysis by enabling analysts to identify co-citation clusters of cited references intuitively, synthesize thematic contexts in which these clusters are cited, and trace how research focus evolves over time. The new approach integrates and streamlines a few previously isolated techniques such as spectral clustering and feature selection algorithms. The integrative procedure is expected to empower and strengthen the analytical and sense-making capabilities of scientists, learners, and researchers to understand the dynamics of the evolution of scientific domains in a wide range of scientific fields, science studies, and science policy evaluation and planning. We demonstrate the potential of our approach through a visual analysis of the evolution of astronomical research associated with the Sloan Digital Sky Survey (SDSS) using bibliographic data between 1994 and 2008. In addition, we demonstrate that the approach can be consistently applied to a set of heterogeneous data sources such as e-prints on arXiv, publications on ADS, and NSF awards related to the same topic of SDSS. The uncitedness factor of a journal is its fraction of uncited articles. Given a set of journals (e.g. in a field) we can determine the rank-order distribution of these uncitedness factors. Here we use the Central Limit Theorem, which is valid for uncitedness factors since they are fractions, hence averages. A similar result was proved earlier for the impact factors of a set of journals. Here we combine the two rank-order distributions, thereby eliminating the rank, yielding the functional relation between the impact factor and the uncitedness factor. 
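The tf*idf weighting discussed in the relevance-theory abstract above has a standard form, tf × log(N/df): a term is weighted up by its frequency in the context of the seed term and down by its overall document frequency. A minimal sketch with hypothetical descriptor counts (the actual bibliometric distributions in the study differ):

```python
import math

def tf_idf(term_freqs, doc_freqs, n_docs):
    """Weight each term by tf * log(N / df)."""
    return {t: tf * math.log(n_docs / doc_freqs[t])
            for t, tf in term_freqs.items()}

# Hypothetical counts: how often each descriptor co-occurs with a seed
# descriptor (tf), and in how many records it appears overall (df).
tf = {"information retrieval": 40, "relevance": 25, "library": 60}
df = {"information retrieval": 100, "relevance": 50, "library": 5000}
weights = tf_idf(tf, df, n_docs=10000)
ranked = sorted(weights, key=weights.get, reverse=True)
```

Note how "library", despite its high raw frequency, sinks to the bottom because it is common everywhere: high cognitive effect (tf) traded off against high processing effort (df), mirroring the relevance-theory analogy.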
It is proved that this decreasing relation has an S-shape: first convex, then concave, and that the inflection point is at the point (mu', mu), where mu is the average of the impact factors and mu' is the average of the uncitedness factors. The tail properties of scientometric distributions are studied in the light of the h-index and the characteristic scores and scales. A statistical test for the h-core is presented and illustrated using the example of four selected authors. Finally, the mathematical relationship between the h-index and characteristic scores and scales is analysed. The results give new insights into important properties of rank-frequency and extreme-value statistics derived from scientometric and informetric processes. Applications of non-parametric frontier production methods such as Data Envelopment Analysis (DEA) have gained popularity and recognition in scientometrics. DEA seems to be a useful method to assess the efficiency of research units in different fields and disciplines. However, DEA results give only a synthetic measurement that does not expose the multiple relationships between scientific production variables by discipline. Although some papers mention the need for studies by discipline, they do not show how to take those differences into account in the analysis. Some studies tend to homogenize the behaviour of different practice communities. In this paper we propose a framework to make inferences about DEA efficiencies, recognizing the underlying relationships between production variables and efficiency by discipline, using Bayesian Network (BN) analysis. Two different DEA extensions are applied to calculate the efficiency of research groups: one called CCRO and the other Cross Efficiency (CE). A BN model is proposed as a method to analyze the results obtained from DEA. BNs allow us to recognize peculiarities of each discipline in terms of scientific production and the efficiency frontier. 
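The h-index and h-core mentioned above are simple to compute from a ranked citation list: h is the largest number such that at least h papers have at least h citations each, and the h-core is those h papers. A minimal sketch (the paper's statistical test for the h-core is not reproduced here; the citation counts are hypothetical):

```python
def h_index(citations):
    """h = largest h such that at least h papers have >= h citations each."""
    cits = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cits, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def h_core(citations):
    """The h-core: the h most highly cited papers (the tail studied above)."""
    cits = sorted(citations, reverse=True)
    return cits[:h_index(citations)]

# Hypothetical citation counts for one author's papers.
counts = [10, 8, 5, 4, 3, 2, 1]
h = h_index(counts)
core = h_core(counts)
```

Here four papers have at least four citations but only three have at least five, so h = 4 and the h-core is the four most-cited papers.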
In addition, BNs allow a manager to propose what-if scenarios based on the relationships found. Cuban scientific output at the macro level has not been frequently studied in the scientometrics literature. The current paper explores the different metric approaches to Cuban scientific activity carried out by national and international authors. The article also develops a scientometric study of the Cuban scientific production as included in Scopus during the period 1996-2007, using socio-economic indicators combined with bibliometric indicators supported by the SCImago Journal & Country Rank. Web of Science and Scopus are compared as information sources. Results confirm the possibility of using Scopus to obtain an objective picture of Cuban science during the end of the 1990s and the beginning of the 21st century. The SCImago Journal & Country Rank, in this case, offers an important set of indicators. The combination of these indicators with those related to socio-economic aspects of activities in Science and Technology allows the authors to show a perspective of the Cuban science system's evolution during the period analyzed. The inclusion in Scopus of less-cited journals published in Spanish, and its impact on productivity and citation-based indicators, is also discussed. Our investigation found increasing growth of the Cuban scientific production during the whole period, in line with the country's efforts and expenditures in Research and Development activities. We consider the "Matthew effect" in the citation process, which leads to reallocation (or misallocation) of the citations received by scientific papers within the same journals. The case when such reallocation correlates with the country where an author works is investigated. Russian papers in chemistry and physics published abroad were examined. We found that in both disciplines, in about 60% of journals, Russian papers are cited less than average ones. 
However, if we consider each discipline as a whole, the citedness of Russian physics papers is at the average level, while chemistry publications receive about 16% fewer citations than one would expect from the citedness of the journals in which they appear. Moreover, Russian chemistry papers mostly become undercited in the leading journals of the field. Characteristics of a "Matthew index" indicator and its significance for scientometric studies are also discussed. This study uses author co-citation analysis to trace prospectively the development of the cognitive neuroscience of attention between 1980 and 2005 from its precursor disciplines: cognitive psychology, single cell neurophysiology, neuropsychology, and evoked potential research. The author set consists of 28 authors highly active in attentional research in the mid-1980s. PFNETs are used to present the co-citation networks. Authors are clustered via the single-link clustering intrinsic to the PFNET algorithm. By 1990 a distinct cognitive neuroscience specialty cluster emerges, dominated by authors engaged in brain imaging research. Scientific progress in technology-oriented research fields is made by incremental or fundamental inventions concerning natural science effects, materials, methods, tools and applications. Our approach therefore focuses on research activities around such technological elements on the basis of keywords in published articles. In this paper we show how emerging topics in the field of optoelectronic devices can be identified on the basis of scientific literature data from the PASCAL database. We use results from the PROMTECH project, whose principal objective was to produce a methodology allowing the identification of promising emerging technologies. 
In this project, the study of the intersection of the Applied Sciences, the Life (Biological & Medical) Sciences, and Physics with bibliometric methods produced 45 candidate technological fields, and validation by expert panels led to a final selection of the 10 most promising ones. These 45 technologies were used as reference fields. In order to detect emerging research, we combine two methodological approaches. The first introduces a new model of field terminology evolution based on bibliometric indicators, the diffusion model; the second is a diachronic cluster analysis. With the diffusion model we identified single keywords that represent highly dynamic technological elements. The cluster analysis was used to recombine articles, with the identified keywords used to delineate technological topics in the field of optoelectronic devices. This methodology allows us to answer the following questions: Which technological aspects within the considered field can be detected? Which of them are already established, and which are new? How are the topics linked to each other? Citation network analysis is an effective tool for analyzing the structure of scientific research. Clustering is often used to visualize a scientific domain and to detect emerging research fronts within it. Although the clustering threshold is often set arbitrarily, there is little guidance on choosing an appropriate one. This study analyzed the basic process by which clustering of citation networks proceeds, by tracking changes in cluster size and modularity during clustering. We found that there are three stages in the clustering of citation networks, and that this pattern is universal across our case studies. In the first stage, core clusters in the domain are formed. In the second stage, peripheral clusters are formed, while core clusters continue to grow. In the third stage, core clusters grow again. We found that a minimum corpus size of around one hundred documents is needed to assure reliable clustering. 
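Tracking modularity change during clustering, as in the citation-network study above, rests on Newman's modularity Q: for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it. A minimal pure-Python sketch for scoring a given partition (the edge list and communities are toy data, not the study's citation networks):

```python
from collections import defaultdict

def modularity(edges, communities):
    """Newman modularity: sum over communities of
    (intra-community edge fraction) - (community degree fraction)^2."""
    comm = {}
    for ci, nodes in enumerate(communities):
        for n in nodes:
            comm[n] = ci
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    q = 0.0
    for ci, nodes in enumerate(communities):
        # Edges with both endpoints inside this community.
        e_in = sum(1 for u, v in edges if comm[u] == ci and comm[v] == ci)
        # Total degree attached to this community.
        k = sum(deg[n] for n in nodes)
        q += e_in / m - (k / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: a clearly modular toy network.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q = modularity(edges, [[0, 1, 2], [3, 4, 5]])
```

Recomputing Q after each merge step, as the study does while tracking cluster sizes, exposes the stages at which core and peripheral clusters form.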
When the corpus size is less than one hundred, the clustered network structure tends to be more random. Moreover, even for corpora larger than this, the clustering quality of some clusters formed in the later stages is low. These results give fundamental guidance to users of citation network analysis. This paper tests the validity of Urquhart's Law ("the inter-library loan demand for a periodical is as a rule a measure of its total use"). It compares the use of print journals at the Turkish Academic Network and Information Center (ULAKBIM) with the consortial use of the same journals in their electronic form by the individual libraries making up the Consortium of Turkish University Libraries (ANKOS). It also compares the on-site use of electronic journals at ULAKBIM with their consortial use at ANKOS. About 700,000 document delivery, in-house, and on-site use records, and close to 28 million consortial use records representing seven years' worth of downloads of full-text journal articles, were used. Findings validate Urquhart's Law in that a positive correlation was observed between the use of print journals at ULAKBIM and the consortial use of their electronic copies at ANKOS. The on-site and consortial use of electronic journals was also highly correlated. Both print and electronic journals that were used most often at ULAKBIM tend to get used heavily by the member libraries of the ANKOS consortium, too. Findings can be used in developing consortial collection management policies and in negotiating better consortial licence agreements. In September 2008 Thomson Reuters added to the ISI Web of Science (WOS) the Conference Proceedings Citation Indexes for Science and for the Social Sciences and Humanities. This paper examines how this change affects the publication and citation counts of highly cited computer scientists. Computer science is a field where proceedings are a major publication venue. 
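The Urquhart's Law validation above rests on correlating per-journal print use with consortial electronic use of the same journals. A minimal sketch of that correlation test; the per-journal counts below are hypothetical, not the ULAKBIM/ANKOS figures:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-journal counts: print use at a central library vs.
# consortial electronic downloads of the same journals.
print_use = [120, 85, 60, 40, 10]
consortial_use = [54000, 30000, 22000, 15000, 2000]
r = pearson_r(print_use, consortial_use)
```

A strongly positive r, as in this toy data, is the pattern the paper reports: journals used heavily in print centrally are also downloaded heavily across the consortium.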
The results show that most of the highly cited publications of the sampled researchers are journal publications, but these highly cited items receive more than 40% of their citations from proceedings papers. The paper also discusses issues related to double-counting, i.e., when a given work is published both in a proceedings and later on as a journal paper. We proposed an original research design, based on applied scientometrics and frame analysis, to assess how citations were used to sustain arguments in documents on public health policies subjected to online public consultation from 2003 to 2008 in Brazil. We built on citation studies to create a new scale for estimating why a scientific work was mentioned in our sample of 278 citations. We found that government branches make citations mainly to value their arguments, not to explain them, and that contributors mainly make citations in such a way that could discourage others from engaging in digital democracy. Interdisciplinarity can be manifest in many forms: through collaboration or communication between scientists working in different fields or through the work of individual scientists who employ concepts or methods across disciplines. This latter form of interdisciplinarity is addressed here with the goal of understanding how ideas in different fields come together to create new opportunities for discovery. Maps of science are used to suggest possible interdisciplinary links which are then analyzed by co-citation context analysis. Interdisciplinary links are identified by juxtaposing a clustering and mapping of documents against a journal-based categorization of the same document clusters. Links between clusters are characterized as interdisciplinary based on the dissonance of their category assignments. To verify and probe more deeply into the meaning of interdisciplinary links, co-citation contexts for selected links from five separate cases are analyzed in terms of prominent cue words. 
This analysis reveals that interdisciplinary connections are often based on authors' perceptions of analogous problems across scientific domains. Cue words drawn from the citation contexts also suggest that these connections are viewed as important and ripe with both opportunity and risk. It is known that there are significant correlations between linking and geographical patterns. Although interlinking patterns have been studied in various contexts, co-inlinking patterns on the Web have only been studied as indicators of business competitive positions. This research studies the use of co-inlinks to local government Web sites, assesses whether co-inlinking follows geographic patterns, and investigates reasons for creating the co-inlinks. Strong evidence was found that co-inlinking is more frequent to municipalities in the same functional region than to municipalities in different functional regions, indicating that this geographic aspect influences co-inlinking, even though geographic co-inlinking was not a strong trend overall. Because the functional regions are created based on cooperation between the municipalities, we have indirectly been able to map cooperation from co-inlinking patterns on the Web. The main reason to create co-inlinking links to municipalities was that the source of the links wanted to show a connection to its region. The enormous increase in digital scholarly data and computing power, combined with recent advances in text mining, linguistics, network science, and scientometrics, makes it possible to scientifically study the structure and evolution of science on a large scale. This paper discusses the challenges of this 'BIG science of science', also called 'computational scientometrics', in terms of data access, algorithm scalability, repeatability, as well as result communication and interpretation. 
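Co-inlink counting, as used in the municipal Web study above, reduces to intersecting inlink sets: two sites are co-inlinked by every page that links to both. An illustrative sketch with hypothetical site names and a toy link graph:

```python
def co_inlinks(link_graph, a, b):
    """Source pages that link to both target a and target b."""
    inlinks_a = {src for src, targets in link_graph.items() if a in targets}
    inlinks_b = {src for src, targets in link_graph.items() if b in targets}
    return inlinks_a & inlinks_b

# Hypothetical link graph: source page -> set of municipal sites it links to.
link_graph = {
    "regionportal.example": {"muniA.example", "muniB.example"},
    "blog.example": {"muniA.example"},
    "news.example": {"muniA.example", "muniB.example", "muniC.example"},
}
shared = co_inlinks(link_graph, "muniA.example", "muniB.example")
```

Aggregating such counts over pairs of municipalities, within and across functional regions, yields the comparison on which the study's geographic finding rests.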
It then introduces two infrastructures: (1) the Scholarly Database (SDB) ( http://sdb.slis.indiana.edu ), which provides free online access to 22 million scholarly records (papers, patents, and funding awards) that can be cross-searched and downloaded as dumps, and (2) scientometrics-relevant plug-ins of the open-source Network Workbench (NWB) Tool ( http://nwb.slis.indiana.edu ). The utility of these infrastructures is then demonstrated in three exemplary studies: a comparison of the funding portfolios and co-investigator networks of different universities, an examination of paper-citation and co-author networks of major network science researchers, and an analysis of topic bursts in streams of text. The article concludes with a discussion of related work that aims to provide practically useful and theoretically grounded cyberinfrastructure in support of computational scientometrics research, education and practice. Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations, which we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments. 
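The polyrepresentation-based system ranking described above can be sketched as: skim the top k documents one system retrieves for each query aspect, treat their union as pseudorelevant, then order systems by how many of these documents they return. An illustrative reconstruction with toy document identifiers (not the article's runs or evaluation setup):

```python
def pseudo_qrels(aspect_runs, k=5):
    """Union of the top-k documents retrieved for each query aspect."""
    rel = set()
    for run in aspect_runs:  # one ranked document list per query aspect
        rel.update(run[:k])
    return rel

def rank_systems(system_runs, qrels, depth=10):
    """Order systems by pseudo-relevant documents found in their top `depth`."""
    score = {s: len(set(run[:depth]) & qrels)
             for s, run in system_runs.items()}
    return sorted(score, key=score.get, reverse=True)

# Toy data: the skimming system returns one ranked run per query aspect.
qrels = pseudo_qrels([["d1", "d2", "d3"], ["d2", "d4", "d5"]], k=2)
ranking = rank_systems(
    {"sysA": ["d1", "d2", "d4", "d9"], "sysB": ["d9", "d8", "d1", "d7"]},
    qrels, depth=4)
```

The experiments then compare such a ranking against one derived from human relevance judgments, e.g. with a rank correlation coefficient.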
Document clustering is an important tool, but it is not yet widely used in practice, probably because of its high computational complexity. This article explores techniques for high-speed rough clustering of documents, assuming that it is sometimes necessary to obtain a clustering result in a shorter time, even if the result is just an approximate outline of the document clusters. A promising approach for such clustering is to reduce the number of documents to be checked when generating cluster vectors in the leader-follower clustering algorithm. Based on this idea, the present article proposes a modified Crouch algorithm and an incomplete single-pass leader-follower algorithm. Also, a two-stage grouping technique, in which the first stage attempts to decrease the number of documents to be processed in the second stage by applying a quick merging technique, is developed. An experiment using a part of the Reuters corpus RCV1 showed empirically that both the modified Crouch and the incomplete single-pass leader-follower algorithms achieve clustering results more efficiently than the original methods, and also improve the effectiveness of the clustering results. On the other hand, the two-stage grouping technique did not reduce the processing time in this experiment. We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and employed on a large journal set retrieved from the Web of Science (WoS) database. 
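The single-pass leader-follower clustering mentioned above admits a compact sketch: each document joins the first cluster whose leader it resembles closely enough, otherwise it founds a new cluster. This simplified version keeps the first document of each cluster as its leader rather than updating cluster vectors, so it only approximates the algorithms the article modifies; the similarity threshold and term weights are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(u.get(t, 0) * w for t, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def leader_follower(docs, threshold=0.4):
    """Single pass: each document joins the first sufficiently similar
    leader's cluster, or founds a new cluster as its own leader."""
    leaders, clusters = [], []
    for doc in docs:
        for i, leader in enumerate(leaders):
            if cosine(doc, leader) >= threshold:
                clusters[i].append(doc)
                break
        else:
            leaders.append(doc)
            clusters.append([doc])
    return clusters

# Hypothetical documents as term-weight dictionaries.
docs = [{"oil": 1, "energy": 1}, {"oil": 1, "price": 1},
        {"music": 1, "query": 1}]
clusters = leader_follower(docs)
```

The single pass is what makes this family of algorithms fast: each document is compared only against current leaders, not against every other document.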
The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods and cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields. The study examined the possibility of constructing business profiles (specifically, product profiles) based on keyword patterns on various types of Web sites, including a company's own Web site, blog sites, and Web sites that have particular keywords and also hyperlinks pointing to company Web sites. To test the proposed methods, we selected China's four major oil companies and two other companies that have related products. We collected three rounds of data over a 7-month period from these three Web sources and analyzed the numbers of retrieved pages to construct business profiles. The business profiles constructed were checked against business information collected from other sources, such as company annual reports and company newsletters, to determine the correctness of the profiles and thus the usefulness of the proposed methods. We found that we can construct fairly accurate profiles by examining the frequency distribution of product keywords on company Web sites. Analyzing the frequency distribution of blogs on various topics was very useful in following major business events and developments during particular time periods. We also conducted a qualitative content analysis of a sample of 454 Web pages retrieved from the three sources. Findings from the content analysis confirmed the conclusions from the quantitative analysis. 
In this work, a novel method of cocitation analysis, coined "contextual cocitation analysis," is introduced and described in comparison to traditional methods of cocitation analysis. Equations for quantifying contextual cocitation strength are introduced and their implications explored using theoretical examples alongside the application of contextual cocitation to a series of BioMed Central publications and their cited resources. Based on this work, the implications of contextual cocitation for understanding the granularity of the relationships created between cited published research and methods for its analysis are discussed. Future applications and improvements of this work, including its extended application to the published research of multiple disciplines, are then presented with rationales for their inclusion. Specimen identification keys are still the most commonly created tools used by systematic biologists to access biodiversity information. Creating identification keys requires analyzing and synthesizing large amounts of information from specimens and their descriptions and is a very labor-intensive and time-consuming activity. Automating the generation of identification keys from text descriptions becomes a highly attractive text mining application in the biodiversity domain. Fine-grained semantic annotation of morphological descriptions of organisms is a necessary first step in generating keys from text. Machine-readable ontologies are needed in this process because most biological characters are only implied (i.e., not stated) in descriptions. 
The immediate question to ask is "How well do existing ontologies support semantic annotation and automated key generation?" With the intention to either select an existing ontology or develop a unified ontology based on existing ones, this paper evaluates the coverage, semantic consistency, and inter-ontology agreement of a biodiversity character ontology and three plant glossaries that may be turned into ontologies. The coverage and semantic consistency of the ontology/glossaries are checked against the authoritative domain literature, namely, Flora of North America and Flora of China. The evaluation results suggest that more work is needed to improve the coverage and interoperability of the ontology/glossaries. More concepts need to be added to the ontology/glossaries, and careful work is needed to improve the semantic consistency. The method used in this paper to evaluate the ontology/glossaries can be used to propose new candidate concepts from the domain literature and suggest appropriate definitions. Wikis are designed to support collaborative editing, without focusing on individual contribution, such that it is not straightforward to determine who contributed to a specific page. However, as wikis are increasingly adopted in settings such as business, government, and education, where editors are largely driven by career goals, there is a perceived need to modify wikis so that each editor's contributions are clearly presented. In this paper we introduce an approach for assessing the contributions of wiki editors along several authorship categories, as well as a variety of information glyphs for visualizing this information. We report on three types of analysis: (a) assessing the accuracy of the algorithms, (b) estimating the understandability of the visualizations, and (c) exploring wiki editors' perceptions regarding the extent to which such an approach is likely to change their behavior. 
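Estimating wiki editors' contributions across revisions, as in the study above, can be approximated with a word-level diff between consecutive revisions, crediting each editor with the words they inserted. An illustrative sketch (the paper's authorship categories are richer than simple insertions, and the revisions below are invented):

```python
import difflib

def attribute_insertions(revisions):
    """Credit each editor with the number of words they inserted,
    using a word-level diff between consecutive revisions."""
    credit = {}
    prev_words = []
    for editor, text in revisions:
        words = text.split()
        matcher = difflib.SequenceMatcher(a=prev_words, b=words)
        inserted = sum(j2 - j1
                       for op, i1, i2, j1, j2 in matcher.get_opcodes()
                       if op in ("insert", "replace"))
        credit[editor] = credit.get(editor, 0) + inserted
        prev_words = words
    return credit

# Toy revision history: (editor, full page text after the edit).
revisions = [
    ("alice", "wikis support collaborative editing"),
    ("bob",   "wikis support collaborative editing by many authors"),
]
credit = attribute_insertions(revisions)
```

A production system would also track deletions, moves, and survival of text across later revisions, which is roughly what distinguishes the several authorship categories the paper assesses.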
Our findings demonstrate that our proposed automated techniques can estimate fairly accurately the quantity of editors' contributions across various authorship categories, and that the visualizations we introduced can clearly convey this information to users. Moreover, our user study suggests that such tools are likely to change wiki editors' behavior. We discuss both the potential benefits and risks associated with solutions for estimating and visualizing wiki contributions. Finding audiovisual material for reuse in new programs is an important activity for news producers, documentary makers, and other media professionals. Such professionals are typically served by an audiovisual broadcast archive. We report on a study of the transaction logs of one such archive. The analysis includes an investigation of commercial orders made by the media professionals and a characterization of sessions, queries, and the content of terms recorded in the logs. One of our key findings is that there is a strong demand for short pieces of audiovisual material in the archive. In addition, while searchers are generally able to quickly navigate to a usable audiovisual broadcast, it takes them longer to place an order when purchasing a subsection of a broadcast than when purchasing an entire broadcast. Another key finding is that queries predominantly consist of (parts of) broadcast titles and of proper names. Our observations imply that it may be beneficial to increase support for fine-grained access to audiovisual material, for example, through manual segmentation or content-based analysis. This study explores how and why people participate in collaborative knowledge-building practices in the context of Wikipedia. Based on a survey of 223 Wikipedians, this study examines the relationship between motivations, internal cognitive beliefs, social-relational factors, and knowledge-sharing intentions. 
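The abstract above does not give the contribution-assessment algorithms, but the general idea of attributing text to editors can be sketched by diffing successive revisions. The revision history below is hypothetical, and word-insertion counting is only one crude proxy for the paper's several authorship categories:

```python
import difflib
from collections import Counter

def contribution_by_editor(revisions):
    """Estimate words added by each editor across successive revisions.

    revisions: list of (editor, text) pairs in chronological order.
    A rough proxy only: words inserted relative to the previous revision.
    """
    added = Counter()
    prev_words = []
    for editor, text in revisions:
        words = text.split()
        sm = difflib.SequenceMatcher(a=prev_words, b=words)
        for op, i1, i2, j1, j2 in sm.get_opcodes():
            if op in ("insert", "replace"):
                added[editor] += j2 - j1  # words present now but not before
        prev_words = words
    return added

history = [
    ("alice", "wikis support collaborative editing"),
    ("bob", "wikis support collaborative editing by many users"),
]
added = contribution_by_editor(history)
# alice contributed 4 words; bob appended 3
```

A production system would also need to handle deletions, reverts, and moved text, which is presumably where the paper's multiple authorship categories come in.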
Results from structural equation modeling (SEM) analysis reveal that attitudes, knowledge self-efficacy, and a basic norm of generalized reciprocity have significant and direct relationships with knowledge-sharing intentions. Altruism (an intrinsic motivator) is positively related to attitudes toward knowledge sharing, whereas reputation (an extrinsic motivator) is not a significant predictor of attitude. The study also reveals that a social-relational factor, namely, a sense of belonging, is related to knowledge-sharing intentions indirectly through different motivational and social factors such as altruism, subjective norms, knowledge self-efficacy, and generalized reciprocity. Implications for future research and practice are discussed. The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. 
Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities. The authors address the problem of unsupervised ensemble ranking. Traditional approaches either combine multiple ranking criteria into a unified representation to obtain an overall ranking score or utilize certain rank fusion or aggregation techniques to combine the ranking results. Beyond the aforementioned "combine-then-rank" and "rank-then-combine" approaches, the authors propose a novel "rank-learn-combine" ranking framework, called Interactive Ranking (iRANK), which allows two base rankers to "teach" each other before combination during the ranking process by providing their own ranking results as feedback to the other to boost the ranking performance. This mutual ranking refinement process continues until the two base rankers cannot learn from each other any more. The overall performance is improved by the enhancement of the base rankers through the mutual learning mechanism. The authors further design two ranking refinement strategies to efficiently and effectively use the feedback based on reasonable assumptions and rational analysis. Although iRANK is applicable to many applications, as a case study, they apply this framework to the sentence ranking problem in query-focused summarization and evaluate its effectiveness on the DUC 2005 and 2006 data sets. The results are encouraging with consistent and promising improvements. The confluence of mobile content sharing and pervasive gaming yields new opportunities for developing novel applications on mobile devices. Yet, studies on users' attitudes and behaviors related to mobile gaming, content-sharing, and retrieval activities (referred to simply as content sharing and gaming) have been lacking. 
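The abstract leaves iRANK's two feedback strategies unspecified. Purely to make the "rank-learn-combine" idea concrete, the toy sketch below lets two base rankers repeatedly refine their scores with each other's output before combination; the convex-combination feedback rule and the sentence scores are illustrative assumptions, not the paper's method:

```python
def irank(scores_a, scores_b, alpha=0.7, iterations=5):
    """Toy 'rank-learn-combine' loop: two base rankers repeatedly refine
    their scores using each other's output as feedback, then combine.

    The feedback rule (convex combination with weight alpha) is an
    illustrative assumption, not one of the paper's two strategies.
    """
    a, b = dict(scores_a), dict(scores_b)
    for _ in range(iterations):
        # each ranker "learns" from the other's current scores
        new_a = {k: alpha * a[k] + (1 - alpha) * b[k] for k in a}
        new_b = {k: alpha * b[k] + (1 - alpha) * a[k] for k in b}
        a, b = new_a, new_b
    # combine the two refined rankers into a single ranking
    combined = {k: (a[k] + b[k]) / 2 for k in a}
    return sorted(combined, key=combined.get, reverse=True)

ranker1 = {"s1": 0.9, "s2": 0.4, "s3": 0.1}  # hypothetical sentence scores
ranker2 = {"s1": 0.5, "s2": 0.8, "s3": 0.2}
ranking = irank(ranker1, ranker2)
# sentences ordered s1, s2, s3 after mutual refinement
```

With this particular rule the two score vectors converge toward their mean, so the loop mainly illustrates the control flow of mutual refinement rather than the performance gains reported in the paper.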
For this reason, the objectives of this article are three-fold. One, it introduces Indagator, an application that incorporates multiplayer, pervasive gaming elements into mobile content-sharing activities. Two, it seeks to uncover the motivations for content sharing within a game-based environment. Three, it aims to identify types of users who are motivated to use Indagator for content sharing. Informed by the uses and gratifications paradigm, a survey was designed and administered to 203 undergraduate and graduate students from two large universities. The findings revealed that perceived gratification factors, such as information discovery, entertainment, information quality, socialization, and relationship maintenance, demographic variables, such as basic familiarity with features of mobile communication devices, and IT-related backgrounds were significant in predicting intention to use mobile sharing and gaming applications such as Indagator. However, age, gender, and the personal status gratification factor were nonsignificant predictors. This article concludes by presenting the implications, limitations, and future research directions. Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. 
In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques. The use of Hirsch's h-index as a joint proxy of the impact and productivity of a scientist's research work continues to gain ground, accompanied by the efforts of bibliometrists to resolve some of its critical issues through the application of a number of more or less sophisticated variants. However, the literature does not reveal any appreciable attempt to overcome the objective problems of measuring h-indexes on a large scale for purposes of comparative evaluation. Scientists may succeed in calculating their own h-indexes but, being unable to compare them to those of their peers, they are unable to obtain truly useful indications of their individual research performance. This study proposes to overcome this gap, measuring the h- and Egghe's g-indexes of all Italian university researchers in the hard sciences over a 5-year window. Descriptive statistics are provided concerning all of the 165 subject fields examined, offering robust benchmarks for those who wish to compare their individual performance to those of their colleagues in the same subject field. This paper investigates the controversy surrounding the systems approach in medicine, contributing to the body of literature on systems and information technology in civilian contexts. Specifically, the paper follows the design and implementation of a hospital information system at El Camino Hospital in Mountain View, California, in the 1960s and 1970s. The case study suggests that while many considered "people problems" like healthcare too complex for the systems approach, in fact it could have positive results if system engineers could translate social concerns about medicine into business and organizational strategies. 
This paper identifies the ways systems designers approached an organization characterized by autonomy rather than collaboration, craft rather than science, and charity rather than business, and helped to redefine that organization as one that emphasized rationality, efficiency, and the coexistence of man and machine. In this era of rapid change in the way people find and use information resources, and although scholarly communication and usage patterns in the traditional print environment have been studied for many years, the Internet presents a new and relatively unexplored area for such study. In this article, we explored the distribution and utilization of web resources in the humanities and social sciences based on web citations. We collected 1,421,731 citations listed in 148,172 articles from 493 journals published during the period of 2006-2007 in the CSSCI, which resulted in 44,973 web citations. We counted the number and types of web resources used in various disciplines, analyzed URL frequency at the host level, fitted the frequency distribution to regression models with SPSS, and performed a discipline-coupling analysis based on the web citations. We found that: (a) the distributions of web citations by year and by website and webpage type are selective and regular; (b) great disparity exists among disciplines in their use of web information and in their high-frequency websites; (c) the frequency distribution of web citations is similar to Garfield's citation distribution curve; and (d) some relationships between disciplines can be detected based on the utilization of web information. Research astronomers and the telescopes they use each have typical life spans of about 40 years. Most of their journals live a good deal longer, though the second most important one today is only 40 years old. 
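Host-level URL frequency analysis of the kind described in the web-citation study can be done with standard tooling; the citation URLs below are hypothetical stand-ins for entries in such a dataset:

```python
from urllib.parse import urlparse
from collections import Counter

def host_frequencies(web_citations):
    """Count web citations at the host level, as in host-level URL analysis."""
    return Counter(urlparse(url).netloc for url in web_citations)

# hypothetical web citations extracted from article reference lists
citations = [
    "http://www.stats.gov.cn/tjsj/ndsj/",
    "http://www.stats.gov.cn/english/",
    "http://www.un.org/esa/population/",
]
freq = host_frequencies(citations)
# www.stats.gov.cn is cited twice, www.un.org once
```

Fitting the resulting rank-frequency distribution to a regression model (as the study does with SPSS) would then operate on `freq.most_common()`.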
This paper looks at numbers for productivity and impact of specific astronomical facilities, changes in equality of opportunities and achievements in observational astronomy, and some aspects of national contributions. The focus is on optical astronomy, though something is also said about radio telescopes and astronomy from space. In summary, nothing stays "best of class" for very long; the fraction of the community with access to the most valuable facilities has increased with time (more equality of opportunity); but the fraction of citations earned by the few super-star papers has also increased (less equality of achievement); and the USA remains the host of the most-cited journals and the most productive telescopes, though Europe (meaning in this context the member nations of the European Southern Observatory, the European Space Agency, and the supporters of the journal Astronomy & Astrophysics) is fast closing the gap, with the UK retaining its own journal and some observing facilities not shared with either the USA or other European countries. Detailed examination of specific facilities indicates that sizes (of telescope, community, and budget) are all of great importance, but that the most significant "focal plane instrument" is still the astronomer at the virtual eyepiece. The changes have happened against a background of enormous increases in numbers of astronomers, sizes of available facilities (but not total number), numbers of papers (but not of journals), and numbers of citations per paper. A significant subset of the conclusions (on turnover of people and facilities accompanying major growth, opportunity versus achievement, Europe versus the USA, and the trade-off between community size and the influence of individual scientists) undoubtedly applies in many other fields. I studied the distribution of changes in journal impact factors (JIF) between 1998 and 2007 according to an empirical beta law with two exponents. 
Changes in JIFs (CJIF) were calculated as the quotient obtained by dividing the JIF for a given year by the JIF for the preceding year. The CJIFs showed good fit to a beta function with two exponents. In addition, I studied the distribution of the changes in segments of the CJIF rank order. The distributions, which were similar from year to year, could be fitted to a Lorentzian function. The methods used here can be useful for understanding the changes in JIFs using relatively simple functions. We introduce the dominance dimension principle and the parameterized family of criteria for the assessment of publication/citation profiles it generates. We show that by a suitable choice of parameters dominance dimension may specialize to the most widely known and used of those impact scores for the scientific output of authors which disallow endogenous reputation effects, including the Durfee- or h-number, the publication number and the citation number. The present paper describes the application of growth models as suggested by Egghe and Ravichandra Rao (Scientometrics 25:5-46, 1992). The scope of the paper is limited to studying the growth and dynamics of Indian and Chinese publications in the field of liquid crystals research (1997-2006). Patenting and licensing is not only a significant method of university knowledge transfer, but also an important indicator for measuring academic R&D strength and knowledge utilization. The methodologies of quantitative and qualitative analysis, including a special patent h-index indicator to assess patenting quality, were used to examine university patenting worldwide. Analysis of university patenting from 1998 to 2008 showed a significant overall global increase in which Chinese academia stands out: most of the top 20 universities in patenting in 2008 were in China. 
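The CJIF quotient defined above is straightforward to compute from a year-indexed series of impact factors; the JIF values below are invented for illustration:

```python
def cjif(jif_by_year):
    """Change in JIF: the JIF for a given year divided by the JIF for
    the preceding year, computed for every consecutive pair of years."""
    years = sorted(jif_by_year)
    return {year: jif_by_year[year] / jif_by_year[prev]
            for prev, year in zip(years, years[1:])}

# hypothetical impact factors for one journal
jifs = {1998: 1.20, 1999: 1.50, 2000: 1.35}
changes = cjif(jifs)
# 1999: 1.50/1.20 = 1.25 (rising); 2000: 1.35/1.50 = 0.90 (falling)
```

Collecting these quotients across all journals in a given year yields the empirical distribution that the study fits to a two-exponent beta function.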
However, a low rate of utilization of Chinese academic patents may have roots in: (1) a university research evaluation system that encourages patent production rather than utilization; (2) problems in the formal mechanisms for university technology transfer and licensing; (3) industry's limited expectations and receptive capabilities; and/or (4) a mismatch between the interests of the two institutional spheres. The next action to be taken by government, university and industry in China will be to explore strategies for improving academic patent quality and industry take-up. Using strictly the same parameters (the same two publication years (2004-2005) and the same one-year citation window (2006)), IF 2006 was compared with h-index 2006 for two samples of "Pharmacology and Pharmacy" and "Psychiatry" journals computed from the ISI Web of Science. For the two samples, the IF and the h-index rankings of the journals are very different. The correlation coefficient between the IF and the h-index is high for Psychiatry but lower for Pharmacology. The linearity test performed between the h-index and IF^(alpha/(alpha+1)) * n^(1/(alpha+1)) showed the great sensitivity of the model with respect to alpha. The IF and h-index can be completely complementary when evaluating journals of the same scientific discipline. This study investigates the knowledge diffusion patterns of Nanoscience & Nanotechnology (N&N) by analyzing the overall research interactions between N&N and nano-related subjects through citation analysis. Three perspectives were investigated to achieve this purpose. Firstly, the overall research interactions were analyzed to identify the dominant driving forces in advancing the development of N&N. Secondly, the knowledge diffusion intensity between N&N and nano-related subjects was investigated to determine the areas most closely related to N&N. 
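Since the h-index (and Egghe's g-index) recur throughout these abstracts, it may help to show their standard definitions in code; the citation counts below are invented:

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each
    (Hirsch's definition)."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g publications together have at least
    g^2 citations (Egghe's g-index)."""
    total, g = 0, 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

cites = [10, 8, 5, 4, 3]  # hypothetical per-paper citation counts
# h_index(cites) == 4 (four papers with >= 4 citations each)
# g_index(cites) == 5 (top five papers have 30 >= 25 citations in total)
```

The same functions apply whether the "publications" belong to an author, a journal (as in the IF versus h-index comparison above), or an institution.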
Thirdly, the diffusion speed was identified to detect the time distance of knowledge diffusion between N&N and nano-related subjects. The analysis reveals that driving forces from the outside environment rather than within N&N itself make the foremost contributions to the development of N&N. From 1998 to 2007, Material Science, Physics, Chemistry, N&N, Electrical & Electronic and Metallurgy & Metallurgical Engineering are the key contributory and reference subjects for N&N. Knowledge transfer within N&N itself is the quickest, and the speed of knowledge diffusion from other subjects to N&N is slower than that from N&N to other subjects, demonstrating asymmetry of knowledge diffusion in the development of N&N. The results indicate that N&N has matured into a relatively open, diffuse and dynamic system of interactive subjects. Countries often spend billions on university research. There is growing interest in how to assess whether that money is well spent. Is there an objective way to assess the quality of a nation's world-leading science? I suggest a method, and illustrate it with data on economics. Of 450 genuinely world-leading journal articles, the UK produced 10% and the rest of Europe slightly more. Interestingly, more than a quarter of these UK articles came from outside the best-known university departments. The proposed methodology could be applied to almost any academic discipline or nation. A relation between a paper's h-index and its total number of received citations, established by Andras Schubert (Scientometrics 78(3): 559-565, 2009), is explained. The relation is a concavely increasing power law and is explained based on the Lotkaian model for the h-index, proved by Egghe and Rousseau. The acquisition of new technologies represents a vitally important and fundamental goal of many corporate managers, particularly those within the medical device industry. 
We collect data on ten medical device companies as our sample in this study, covering the period from 1990 to 2006; this sample is drawn from the top 20 companies in the US, on the basis of international sales performance. We also collect details on all of the acquisitions undertaken by these companies, along with their patenting performance. The empirical results of this study suggest that technological acquisitions are only likely to be of help to the acquiring firms, in terms of improving their innovative performance, if they set out to acquire those companies that are in similar proximity, in terms of their technological field. There is also a clear need for such acquiring firms to ensure their continuing commitment to internal R&D investment in order to maintain their own versatility. Hirsch's concept of h-index was used to define a similarity measure for journals. The h-similarity is easy to calculate from the publicly available data of the Journal Citation Reports, and allows for plausible interpretation. On the basis of h-similarity, a relative eminence indicator of journals was determined: the ratio of the JCR impact factor to the weighted average of that of similar journals. This standardization allows journals from disciplines with lower average citation level (mathematics, engineering, etc.) to get into the top lists. The h-index is now used almost as a canonical tool for research assessment of individuals, research faculties and institutions and even for comparing performance of journals and countries. However, its limitations have also been noticed and many Hirsch-type variants have been proposed. In this paper, a "mock h-index" which was recently proposed is compared with the "tapered h-index". The h-index has captured the imagination of scientometricians and bibliometricians to such an extent that one can now divide the history of the subject virtually into a pre-Hirsch and a post-Hirsch period. 
Beyond its academic value, it is now used as a tool for research assessment of individuals, research faculties and institutions and even for comparing performance of journals and countries. Since its introduction, many Hirsch-type variants have been proposed to overcome perceived limitations of the original index. In this paper, using ideas from mathematical modeling, another mock h-index is proposed which may complement the h-index and give it better resolving power. In this paper, a new indicator called the performance index (p-index) is used to rank the 100 most prolific economists. The p-index strikes the best balance between activity (total citations C) and excellence (mean citation rate C/P). The surprise is that the h-index, which is now universally accepted almost as a canonical tool for research assessment of individuals, research faculties and institutions and even for comparing performance of journals and countries, is actually a poor indicator of performance. It is widely recognized that collaboration between the public and private research sectors should be stimulated and supported, as a means of favoring innovation and regional development. This work takes a bibliometric approach, based on co-authorship of scientific publications, to propose a model for comparative measurement of the performance of public research institutions in collaboration with domestic industry. The model relies on an identification and disambiguation algorithm developed by the authors to link each publication to its real authors. An example of application of the model is given, for the case of the academic system and private enterprises in Italy. The study demonstrates that for each scientific discipline and each national administrative region, it is possible to measure the performance of individual universities in both intra-regional and extra-regional collaboration, normalized with respect to advantages of location. 
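The abstract describes the p-index only qualitatively, as a balance of activity (C) and excellence (C/P). In Prathap's formulation it is usually given as the cube root of C^2/P; that closed form is an assumption here, not stated in the text:

```python
def p_index(total_citations, papers):
    """Performance index balancing activity (C) and excellence (C/P).

    The cube-root form p = (C^2 / P)^(1/3) is the commonly cited
    definition; it is an assumption here, since the abstract describes
    the indicator only qualitatively.
    """
    return (total_citations ** 2 / papers) ** (1 / 3)

# hypothetical author: 1000 citations over 100 papers
p = p_index(total_citations=1000, papers=100)
# (1000**2 / 100) ** (1/3) = 10000 ** (1/3), roughly 21.5
```

Note that p is the geometric mean of C and C/P weighted toward citations: p^3 = C * (C/P), which is one way to read "balance between activity and excellence."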
Such results may be useful in informing regional policies and merit-based public funding of research organizations. This paper presents an analysis of the structure of computer science research articles published in the Lecture Notes of Computer Science series. While it is clear that most articles start with an Introduction and end with a Conclusion, the structure of text between these two sections is rather diverse. We studied the positions of different section types, and analysed dependencies between them. As a result, we present a number of common patterns used by writers, and make suggestions on how to improve the presentation of research in computer science. This paper revisits an aspect of citation theory (i.e., citer motivation) with respect to the Mathematical Review system and the reviewer's role in mathematics. We focus on a set of journal articles (369) published in Singularity Theory (1974-2003), the mathematicians who wrote editorial reviews for these articles, and the number of citations each reviewed article received within a 5 year period. Our research hypothesis is that the cognitive authority of a high status reviewer plays a positive role in how well a new article is received and cited by others. Bibliometric evidence points to the contrary: Singularity Theorists of lower status (junior researchers) have reviewed slightly more well-cited articles (2-5 citations, excluding author self-citations) than their higher status counterparts (senior researchers). One explanation for this result is that lower status researchers may have been asked to review 'trendy' or more accessible parts of mathematics, which are easier to use and cite. We offer further explanations and discuss a number of implications for a theory of citation in mathematics. This research opens the door for comparisons to other editorial review systems, such as book reviews written in the social sciences or humanities. 
This paper proposes a critical analysis of the "Academic Ranking of World Universities", published every year by the Institute of Higher Education of the Jiao Tong University in Shanghai and more commonly known as the Shanghai ranking. After having recalled how the ranking is built, we first discuss the relevance of the criteria and then analyze the proposed aggregation method. Our analysis uses tools and concepts from Multiple Criteria Decision Making (MCDM). Our main conclusions are that the criteria that are used are not relevant, that the aggregation methodology is plagued by a number of major problems and that the whole exercise suffers from insufficient attention paid to fundamental structuring issues. Hence, our view is that the Shanghai ranking, in spite of the media coverage it receives, does not qualify as a useful and pertinent tool to discuss the "quality" of academic institutions, let alone to guide the choices of students and families or to promote reforms of higher education systems. We outline the type of work that should be undertaken to offer sound alternatives to the Shanghai ranking. Although technological diversification is an important strategic decision for both large and small firms alike, the conventional method of measuring such diversification may well introduce significant scale bias against small- and medium-sized firms. We examine this issue in this study using a sample of 73 Taiwanese integrated-circuit (IC) design firms covering the period from 1995 to 2007 and conclude that the conventional measure of technological diversification reflects the spread or distribution amongst technology classes of a company's current technology portfolio, and does not capture the incremental expansion in technological scope, or the 'dynamic act of diversification', as reflected in our alternative scope measure. Our results suggest clear constraints on the applications made under the conventional index, particularly for firms with small patent scale. 
Combining different data sets with information on grant and fellowship applications submitted to two renowned funding agencies, we are able to compare their funding decisions (award and rejection) with scientometric performance indicators across two fields of science (life sciences and social sciences). The data sets involve 671 applications in social sciences and 668 applications in life sciences. In both fields, awarded applicants perform on average better than all rejected applicants. If only the most preeminent rejected applicants are considered in both fields, they score better than the awardees on citation impact. With regard to productivity we find differences between the fields. While the awardees in life sciences outperform on average the most preeminent rejected applicants, the situation is reversed in social sciences. (C) 2009 Elsevier Ltd. All rights reserved. Research was undertaken that examined what, if any, correlation there was between the h-index and rankings by peer assessment, and what correlation there was between the 2008 UK RAE rankings and the collective h-index of submitting departments. About 100 international scholars in Library and Information Science were ranked by their peers on the quality of their work. These rankings were correlated with the h and g scores the scholars had achieved. The results showed that there was a correlation between their median rankings and the indexes. The 2008 RAE grade point averages (GPA) achieved by departments from three UoAs - Anthropology, Library and Information Management and Pharmacy were compared with each of their collective h and g index scores. Results were mixed, with a strong correlation between pharmacy departments and index scores, followed by library and information management to anthropology where negative and non-significant results were found. 
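Correlations between peer rankings and h- or g-scores of the kind reported above are typically measured with Spearman's rho. A minimal tie-free implementation, with invented peer-assessment and h-index scores, looks like this:

```python
def spearman_rho(x, y):
    """Spearman rank correlation (assumes no ties, for simplicity)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

peer_scores = [90, 80, 85, 60, 50]  # hypothetical peer-assessment scores
h_scores = [35, 30, 28, 22, 15]     # hypothetical h-indexes, same scholars
rho = spearman_rho(peer_scores, h_scores)
# rho = 0.9: one swapped pair away from perfect rank agreement
```

Real analyses (and libraries such as SciPy) use midranks to handle tied scores, which matter for small samples like single departments.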
Taken together, the findings indicate that the correlation between individual rankings by peer assessment and the h-index or its variants was generally good. Results for the RAE 2008 gave correlations between GPA and successive versions of the h-index which varied in strength, except for anthropology where, it is suggested, detailed cited-reference searches must be undertaken to maximise citation counts. (C) 2009 Charles Oppenheim. Published by Elsevier Ltd. All rights reserved. Growing cooperation between Chinese journals and international publishers invites an investigation of the effect of this cooperation, based on an analysis of journal IF changes. Data from 23 Chinese academic journals were chosen from about 50 English-language academic journals indexed by SCI or SCIE and with a long history of cooperation. The data do not suggest that cooperation has improved the journals' IF thus far. It appears that cooperation is generally limited to international distribution, and this has a weak influence on the quality of the journal and its IF, even though the papers can be accessed by users worldwide through publishers' international distribution networks. Cooperation with international publishers is one step, but actively working on the quality of the journals is a more important step. (C) 2009 Elsevier Ltd. All rights reserved. We performed a thorough comparison of four main indicators of journal influence, namely 2-year impact factor, 5-year impact factor, eigenfactor and article influence. These indicators have been recently added by Thomson Reuters to the Journal Citation Reports, in both science and social science editions, and are thus available for study and comparison over a sample of significant size. We find that the distribution associated with the eigenfactor largely differs from the distribution of the other surveyed measures in terms of deviation from the mean, concentration, entropy, and skewness. 
Moreover, it is the one that best fits the lognormal theoretical model. Surprisingly, the eigenfactor is also the most variable indicator when computed across different fields of science and social science, while article influence is the most stable in this respect, and hence the most suitable metric to be used interdisciplinarily. Finally, the journal rankings provided by impact factors and article influence are relatively similar and diverge from the one produced by eigenfactor, which is closer to that given by the total number of received citations. (C) 2009 Elsevier Ltd. All rights reserved. In the present study we have tried to trace the growth of malaria research at the global level and the distribution of articles in various journals for the period 1955-2005. The data have been extracted from a database, which has been developed in-house from MEDLINE, SCI, TDB, Ovid Health Information and Indian Science Abstracts. The study indicates that the exponential model fits the data on journals, articles and authors. The R^2 values for the trends for journals, articles, and authors are 0.9502, 0.9475, and 0.9651, respectively. The growth rates for journals, articles and authors are 5.31%, 7.38%, and 10.06%, respectively. The linear multiple regression equation Articles = -39.2771 + 3.61719 * Journals + 0.085882 * Authors (R^2 = 99.16%) is most meaningful and may be used to estimate the number of articles for given numbers of journals and authors. (C) 2009 Elsevier Ltd. All rights reserved. Scientific papers are usually assessed by a number of direct citations. The number of citations received by direct citations (2nd generation citations) has been considered as an alternative criterion of evaluation. Such an approach overrates papers that received citations in one or a few very highly cited papers. A Hirsch-type approach to the 2nd generation citations, suggested by Schubert, was used to combine the impact and quantity of 1st generation citations into one number. 
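The fitted regression reported in the malaria study above can be applied directly as a predictor. The coefficients are those quoted in the text; the input counts below are invented:

```python
def predict_articles(journals, authors):
    """Article estimate from the fitted regression reported in the study:
    Articles = -39.2771 + 3.61719 * Journals + 0.085882 * Authors."""
    return -39.2771 + 3.61719 * journals + 0.085882 * authors

# hypothetical field size: 100 journals, 2000 authors
est = predict_articles(journals=100, authors=2000)
# -39.2771 + 361.719 + 171.764 = 494.2059 articles (approximately)
```

As with any fitted model, the estimate is only meaningful within the range of journal and author counts observed in the 1955-2005 data.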
(C) 2010 Elsevier Ltd. All rights reserved. This paper explores a new indicator of journal citation impact, denoted source normalized impact per paper (SNIP). It measures a journal's contextual citation impact, taking into account characteristics of its properly defined subject field, especially the frequency at which authors cite other papers in their reference lists, the rapidity of maturing of citation impact, and the extent to which the database used for the assessment covers the field's literature. It further develops Eugene Garfield's notions of a field's 'citation potential', defined as the average length of reference lists in a field and determining the probability of being cited, and of the need in fair performance assessments to correct for differences between subject fields. A journal's subject field is defined as the set of papers citing that journal. SNIP is defined as the ratio of the journal's citation count per paper to the citation potential in its subject field. It aims to allow direct comparison of sources in different subject fields. Citation potential is shown to vary not only between journal subject categories (groupings of journals sharing a research field) or disciplines (e.g., journals in mathematics, engineering and the social sciences tend to have lower values than titles in the life sciences), but also between journals within the same subject category. For instance, basic journals tend to show higher citation potentials than applied or clinical journals, and journals covering emerging topics higher than periodicals in classical subjects or more general journals. SNIP corrects for such differences. Its strengths and limitations are critically discussed, and suggestions are made for further research. All empirical results are derived from Elsevier's Scopus. (C) 2010 Elsevier Ltd. All rights reserved.
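SNIP, as defined above, is a simple ratio once the subject field has been delineated. The sketch below approximates citation potential as the mean reference-list length of the journal's citing papers; the field delineation and database-coverage corrections of the actual indicator are omitted, and all numbers are hypothetical:

```python
def snip(journal_citations_per_paper, citing_paper_reference_lists):
    """SNIP = (citations per paper for the journal) / (citation potential
    of its subject field). Citation potential is approximated here as the
    mean reference-list length of the papers citing the journal."""
    citation_potential = (sum(citing_paper_reference_lists)
                          / len(citing_paper_reference_lists))
    return journal_citations_per_paper / citation_potential

# Hypothetical journal receiving 6 citations per paper, whose citing
# papers carry reference lists of the lengths shown.
print(round(snip(6.0, [30, 40, 50]), 2))  # 0.15
```

The division is what makes the measure "contextual": a journal in a field with long reference lists (high citation potential) needs proportionally more citations per paper to reach the same SNIP as a journal in a sparsely citing field.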
We investigate the community structure of physics subfields in the citation network of all Physical Review publications between 1893 and August 2007. We focus on well-cited publications (those receiving more than 100 citations), and apply modularity maximization to uncover major communities that correspond to clearly identifiable subfields of physics. While most of the links between communities connect those with obvious intellectual overlap, there sometimes exist unexpected connections between disparate fields due to the development of a widely applicable theoretical technique or to cross-fertilization between theory and experiment. We also examine communities decade by decade and uncover a small number of significant links between communities that are widely separated in time. (C) 2010 Elsevier Ltd. All rights reserved. The creation of representations depicting the current state of science (scientograms) has been an established practice for many years now. However, if we are concerned with the automatic comparison, analysis and understanding of a set of scientograms, showing for instance the evolution of a scientific domain or a face-to-face comparison of several countries, the task is enormously complex, as the amount of data to analyze becomes huge. In this paper, we aim to show that graph-based data mining tools are useful for scientogram analysis. Subdue, the first algorithm proposed in the graph mining area, has been chosen for this purpose. This algorithm has been customized to deal with three different scientogram analysis tasks regarding the evolution of a scientific domain over time, the extraction of the common research category substructures in the world, and the comparison of scientific domains between different countries. The outcomes obtained in the experiments clearly demonstrate the potential of graph mining tools in scientogram analysis. (C) 2010 Elsevier Ltd. All rights reserved.
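Modularity maximization, as used in the Physical Review community study above, scores a candidate partition of a network by comparing internal link density against a random-graph expectation. A self-contained sketch of Newman's modularity Q on a toy undirected graph (node ids are hypothetical, not the Physical Review data):

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected graph.
    edges: list of (u, v) pairs; communities: list of sets of nodes.
    Q = sum over communities of (internal-edge fraction
        minus squared fraction of edge endpoints in the community)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in communities:
        internal = sum(1 for u, v in edges if u in c and v in c)
        degsum = sum(degree[n] for n in c)
        q += internal / m - (degsum / (2 * m)) ** 2
    return q

# Toy "citation" graph: two triangles joined by a single cross-field link.
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "b1")]
two = modularity(edges, [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}])
one = modularity(edges, [{"a1", "a2", "a3", "b1", "b2", "b3"}])
print(round(two, 3), round(one, 3))  # 0.357 0.0
```

A maximization procedure searches over partitions for the one with the highest Q; here the two-triangle split clearly beats the single-block partition, mirroring how the method separates subfields in the citation network.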
In the present paper we give an overview of the opportunities offered by probabilistic models in scientometrics. Four examples from different topics are used to shed light on some important aspects of the reliability and robustness of indicators based on stochastic models. Limitations and future tasks are discussed as well. (C) 2010 Elsevier Ltd. All rights reserved. The well-known discrete theory of conjugate partitions, Ferrers graphs and Durfee squares is interpreted in informetrics. It is shown that partitions and their conjugates have the same h-index, a fact that is not true for the g- and R-index. A modification of the Ferrers graph is presented, yielding the g-index. We then present a formula for the Lorenz curve of the conjugate partition as a function of the Lorenz curve of the original partition in the discrete setting. Ferrers graphs, Durfee squares and conjugate partitions are then defined in the continuous setting, where variables range over intervals. Conjugate partitions are nothing other than the inverses of rank-frequency functions in informetrics. Here too they have the same h-index, and we can again give a formula for the Lorenz curve of the conjugate partition as a function of the Lorenz curve of the original partition. Computational examples are given where these Lorenz curves are equal and where one Lorenz curve dominates the other. We also prove that the Lorenz curve of a partition and that of its conjugate can intersect on the open interval ]0, 1[. (C) 2010 Elsevier Ltd. All rights reserved. The aim of this paper is to characterize the distribution of the number of hits and the time spent per web session. It also examines whether there are significant differences in the length and duration of a session with regard to the point of access (search engine, link, or root). Web usage mining was used to analyse 17,174 web sessions that were identified from the webometrics.info web site.
Results show that the distributions of both length and duration follow an exponential decay. Significant differences between the different origins of the visits were also found, with search-engine users spending the most time and making the most clicks in their sessions. We conclude that a good SEO policy would be justified, because search engines are the principal intermediaries to this web site. (C) 2010 Elsevier Ltd. All rights reserved. Scientists from universities are becoming more proactive in their efforts to commercialize research results. Patenting, as an important channel of university knowledge transfer, has initiated a controversy about its potential effects on the future of scientific research. This paper contributes to the growing literature on the relationship between patenting and publishing among faculty members with evidence from China in the field of nanotechnology. Data from the 32 universities most prolific in patenting are used to examine the relationship, covering 6321 confirmed academic inventors who both publish and patent over the period 1991-2008. Controlling for heterogeneity of patenting activities, patenting experience, institutional affiliation and collaboration with foreign researchers, the findings in China's nanotechnology generally support earlier investigations concluding that patenting activity does not adversely affect research output. Patenting, however, has negative impacts on both the quantity and quality of university researchers' publication output when the assignee lists include corporations or the scientists themselves. (C) 2010 Elsevier Ltd. All rights reserved. The country-wise distribution of papers citing a given scientist is the sum of a typical distribution for his/her branch of science and excess citations from one or a few countries.
A new Hirsch-type index h_int is defined as the number of countries h_int from which at least h_int papers cite a given scientist, while fewer than h_int+1 papers cite that scientist from the country ranked h_int+1. The h_int index reflects broad international recognition of a scientist, and prevents overrating of a citation record earned chiefly through self-citations or citations received from a narrow circle of co-workers. (C) 2010 Elsevier Ltd. All rights reserved. Slovenia's Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8359 researchers working across the whole plethora of social and natural sciences from 1970 till the present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law P(x) ~ x^(-alpha), with alpha between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe's g-index and Hirsch's h-index the log-normal form P(x) ~ exp[-a ln x - b (ln x)^2] applies best, with a and b depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator of their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output. (C) 2010 Elsevier Ltd. All rights reserved.
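The h_int index defined above is the h-rule applied to a country-ranked list of citing papers. A sketch with hypothetical country names and counts:

```python
def h_int(citing_papers_by_country):
    """h_int: at least h_int countries each contribute >= h_int citing
    papers, and the country ranked h_int+1 contributes fewer than
    h_int+1 of them."""
    counts = sorted(citing_papers_by_country.values(), reverse=True)
    h = 0
    for rank, n in enumerate(counts, start=1):
        if n >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation record: papers citing the scientist, by country.
record = {"US": 12, "DE": 5, "JP": 3, "FR": 3, "IN": 1}
print(h_int(record))  # 3
```

Note how a record dominated by a single country (e.g., heavy self-citation from one institution's country) yields h_int = 1 no matter how many citations that country contributes, which is the overrating protection the abstract describes.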
Rankings of journals and rankings of scientists are usually discussed separately. We argue that a consistent approach to both rankings is desirable, because both the quality of a journal and the quality of a scientist depend on the papers they publish. We present a pair of consistent rankings (impact factor for the journals and total number of citations for the authors) and provide an axiomatic characterization thereof. (C) 2010 Elsevier Ltd. All rights reserved. A size-independent indicator of journals' scientific prestige, the SCImago Journal Rank (SJR) indicator, is proposed that ranks scholarly journals based on citation weighting schemes and eigenvector centrality. It is designed for use with complex and heterogeneous citation networks such as Scopus. Its computation method is described, and the results of its implementation on the Scopus 2007 dataset are compared with those of an ad hoc Journal Impact Factor, JIF(3y), both generally and within specific scientific areas. Both the SJR indicator and the JIF distributions were found to fit a logarithmic law well. While the two metrics were strongly correlated, there were also major changes in rank. In addition, two general characteristics were observed. On the one hand, journals' scientific influence or prestige as computed by the SJR indicator tended to be concentrated in fewer journals than the quantity of citation measured by JIF(3y). On the other, the distance between the top-ranked journals and the rest tended to be greater in the SJR ranking than in that of the JIF(3y), while the separation between the middle- and lower-ranked journals tended to be smaller. (C) 2010 Elsevier Ltd. All rights reserved. The principle of a new type of impact measure was introduced recently, called the "Audience Factor" (AF). It is a variant of the journal impact factor in which emitted citations are weighted inversely to the propensity to cite of the source.
In the initial design, propensity was calculated using the average length of bibliographies at the source level, with two options: a journal-level average or a field-level average. This citing-side normalization controls for propensity to cite, the main determinant of impact factor variability across fields. The AF maintains the variability due to exports-imports of citations across fields and to growth differences. It does not account for influence chains, the powerful approach taken in the wake of Pinski-Narin's influence weights. Here we introduce a robust variant of the audience factor, trying to combine the respective advantages of the two options for calculating bibliography lengths: the classification-free scheme when the bibliography length is calculated at the individual journal level, and the robustness and avoidance of ad hoc settings when the bibliography length is averaged at the field level. The variant proposed relies on the relative neighborhood of a citing journal, regarded as its micro-field and assumed to reflect the citation behavior in this area of science. The methodology adopted allows a large range of variation of the neighborhood, reflecting the local citation network, and partly alleviates the "cross-scale" normalization issue. Citing-side normalization is a general principle which may be extended to other citation counts. (C) 2010 Elsevier Ltd. All rights reserved. The h index is a widely used indicator to quantify an individual's scientific research output. But it has been criticized for its insufficient accuracy, i.e., the ability to discriminate reliably between meaningful amounts of research output. As a single measure it cannot capture the complete information on the citation distribution over a scientist's publication list.
An extensive data set with bibliometric data on scientists working in the field of molecular biology is taken as an example to introduce two approaches providing additional information to the h index: (1) h^2 lower, h^2 center, and h^2 upper are proposed, which allow quantification of three areas within a scientist's citation distribution: the low-impact area (h^2 lower), the area captured by the h index (h^2 center), and the area of publications with the highest visibility (h^2 upper). (2) Given the existence of different areas in the citation distribution, the segmented regression model (sRM) is proposed as a method to statistically estimate the number of papers in a scientist's publication list with the highest visibility. However, such sRM values should be compared across individuals with great care. (C) 2010 Elsevier Ltd. All rights reserved. In this study direct citations are weighted with shared references and co-citations in an attempt to decompose a citation network of articles on the subject of library and information science. The resulting maps have much in common with author co-citation maps that have been presented previously. However, using direct citations yields somewhat more detail in terms of detecting sub-domains. Reducing the network down to the strongest links of each article yielded the best results in terms of a high number of clusters, each with a substantial number of articles similar in content. (C) 2010 Elsevier Ltd. All rights reserved. The Center for Science and Technology Studies at Leiden University advocates the use of specific normalizations for assessing research performance with reference to a world average. The Journal Citation Score (JCS) and Field Citation Score (FCS) are averaged for the research group or individual researcher under study, and then these values are used as denominators of the (mean) Citations per publication (CPP). Thus, this normalization is based on dividing two averages.
This procedure only generates a legitimate indicator in the case of underlying normal distributions. Given the skewed distributions under study, one should instead divide the observed by the expected values for each publication first, and then average these ratios. We show the effects of the Leiden normalization for a recent evaluation where we happened to have access to the underlying data. (C) 2010 Elsevier Ltd. All rights reserved. We reply to the criticism of Opthof and Leydesdorff on the way in which our institute applies journal and field normalizations to citation counts. We point out why we believe most of the criticism is unjustified, but we also indicate where we think Opthof and Leydesdorff raise a valid point. (C) 2010 Elsevier Ltd. All rights reserved. The scientific impact of a publication can be determined not only from the number of times it is cited but also from the citation speed with which its content is noted by the scientific community. Here we present the citation speed index as a meaningful complement to the h index: whereas for the calculation of the h index the impact of publications is based on the number of citations, for the calculation of the speed index it is the number of months that have elapsed since the first citation, the citation speed with which the results of publications find reception in the scientific community. The speed index is defined as follows: a group of papers has the index s if for s of its N_p papers the first citation was at least s months ago, and for the other (N_p - s) papers the first citation was <= s months ago. (C) 2010 Elsevier Ltd. All rights reserved. This paper reports on a bibliometric study of the characteristics and impact of research in the library and information science (LIS) field which was funded through research grant programs, and compares it with research that received no extra funding.
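The citation speed index just defined follows the same ranking rule as the h index, applied to months elapsed since each paper's first citation rather than to citation counts. A sketch with hypothetical data:

```python
def speed_index(months_since_first_citation):
    """s of the papers had their first citation at least s months ago;
    the remaining papers had theirs at most s months ago (0 = uncited)."""
    ages = sorted(months_since_first_citation, reverse=True)
    s = 0
    for rank, m in enumerate(ages, start=1):
        if m >= rank:
            s = rank
        else:
            break
    return s

# Hypothetical group of 5 papers; values are months elapsed since each
# paper's first citation.
print(speed_index([24, 10, 7, 3, 0]))  # 3
```

Papers that have never been cited simply contribute an age of 0 and can never enter the core, so the index rewards groups whose results found reception early and broadly.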
Seven core LIS journals were examined to identify articles published in 1998 that acknowledge research grant funding. The distribution of these articles by various criteria (e.g., topic, affiliation, funding agency) was determined. Their impact, as indicated by citation counts during 1998-2008, was evaluated against that of articles published in the same journals in the same year that did not acknowledge extra funding, using citation data collected from Scopus' Citation Tracker. The impact of grant-funded research as measured by citation counts was substantially higher than that of other research, both overall and in each journal individually. Scholars from outside LIS core institutions contributed heavily to grant-funded research. The two highest-impact publications by far reported non-grant-based research, and grant-based funding of research reported in core LIS journals was biased towards the information retrieval (IR) area, particularly towards research on IR systems. The percentage of articles reporting grant-funded research was substantially higher in information-oriented journals than in library-focused ones. The reliability of citation searches is a cornerstone of bibliometric research. The authors compare simultaneous search returns at two sites to demonstrate discrepancies that can occur as a result of differences in institutional subscriptions to the Web of Science and Web of Knowledge. Such discrepancies may have significant implications for the reliability of bibliometric research in general, but also for the calculation of individual and group indices used for promotion and funding decisions. The authors urge care when describing the methods used in bibliometric analysis and when evaluating researchers from different institutions. In both situations a description of the specific databases used would enable greater reliability.
The objective of this work is to describe the distribution of different types of participating organizations in the health thematic area of the 6th Framework Programme. A total of 2132 different organizations were classified according to four types and then grouped by country. A Principal Component Analysis (PCA) was carried out on the percentage of funding obtained by each type of organization. The results show a map of countries plotted around the "private" and "public" principal components. It is observed that in some countries research is performed mainly by government research centres, while in others it rests on university activity. We conclude that PCA is a suitable method for plotting the distribution of research organizations by country, and the results could be used as a tool for theoretical studies of the scientific activity in a country. This paper explores the relationship between patenting and publishing in the field of nanotechnology for Chinese universities. With their growing patent portfolios, Chinese universities are becoming a main technological source for nanotechnology development, which is extremely important in China. Matching the names of patentees to the names of research paper authors in Chinese universities, we find 6,321 authors with patents, i.e. inventor-authors, and 65,001 without any patent. Research performance is measured using three indicators: publication counts, total citations and the h-index of each researcher. It is found that the research performance of authors who are also inventors holding patents is better than that of authors who do not have a patent, and that most high-quality research is performed by inventor-authors. Our findings indicate that patent-oriented research may produce better results. Contemporary scholarly discourse follows many alternative routes in addition to the three-century-old tradition of publication in peer-reviewed journals.
The field of High-Energy Physics (HEP) has explored alternative communication strategies for decades, initially via the mass mailing of paper copies of preliminary manuscripts, then via the inception of the first online repositories and digital libraries. This field is uniquely placed to answer recurrent questions raised by the current trends in scholarly communication: is there an advantage for scientists to make their work available through repositories, often in preliminary form? Is there an advantage to publishing in Open Access journals? Do scientists still read journals or do they use digital repositories? The analysis of citation data demonstrates that free and immediate online dissemination of preprints creates an immense citation advantage in HEP, whereas publication in Open Access journals presents no discernible advantage. In addition, the analysis of clickstreams in the leading digital library of the field shows that HEP scientists seldom read journals, preferring preprints instead. The quantity and quality of the scientific output of the top 50 countries in the four basic sciences (agricultural & biological sciences, chemistry, mathematics, and physics & astronomy) are studied over the recent 12-year period (1996-2007). In order to rank the countries, a novel two-dimensional method is proposed, which is inspired by the h-index and other methods based on quality and quantity measures. The countries' data are represented in a "quantity-quality diagram" and partitioned by a conventional statistical algorithm into three clusters, whose members are largely the same in all of the basic sciences. The results offer a new perspective on the global positions of countries with regard to their scientific output. The collaborative coefficient (CC) is a measure of collaboration in research that reflects both the mean number of authors per paper and the proportion of multi-authored papers.
Although it lies between 0 and 1, and is 0 for a collection of purely single-authored papers, it is not 1 when all papers are maximally authored, i.e., every publication in the collection has all authors in the collection as co-authors. We propose a simple modification of CC, which we call the modified collaboration coefficient (or MCC, for short), which improves its performance in this respect. Scientific collaboration is growing in importance, even more so in Asian and African countries. This paper examines the scenario of science and scientific collaboration in South Africa, which passed through colonial and apartheid regimes before it became a democracy in 1994. South African science moved through some difficult periods under its distinct political regimes, but these did not seriously affect the progress and direction of South African science. Science and scientific collaboration continued to grow under its major political phases amidst serious challenges. Despite internal conflict and a boycott by the international scientific community, South Africa managed to move onto a stable and steady path of growth in science and collaboration under apartheid, which is being carried on in the new South Africa. Collaborative research is encouraged at various levels of knowledge production and in science. The importance science and scientific development are gaining in today's South Africa is remarkable. A change in scientific developments in recent decades is widely proclaimed, one that may be associated with terms like postmodern science or steady-state science. This change is usually discussed from a more epistemological viewpoint. In order to enhance the understanding of the underlying key factors, bibliometric, demographic and Nobel Prize recipient data spanning the last hundred years are considered and analyzed. It is found that, in general, the considered data point to a quasi-steady state in the bibliometric development of highly developed countries.
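To illustrate the CC/MCC discussion above: a common form of the collaborative coefficient is CC = 1 - (sum over papers of 1/authors)/N, and the modification rescales by A/(A-1), where A is the number of authors in the collection. Both formulas are assumptions here, not quoted from the paper, but they reproduce the behavior the abstract describes, with a maximally co-authored collection scoring exactly 1 under MCC but not under CC:

```python
def cc(author_counts):
    """Collaborative coefficient (assumed form):
    1 minus the mean of 1/(number of authors) over the papers."""
    n = len(author_counts)
    return 1 - sum(1 / j for j in author_counts) / n

def mcc(author_counts, total_authors):
    """Modified collaboration coefficient (assumed form): rescales CC by
    A/(A-1) so a maximally co-authored collection scores exactly 1."""
    a = total_authors
    return (a / (a - 1)) * cc(author_counts)

# Hypothetical collection: 4 papers, each co-authored by all A = 5
# authors in the collection (the "maximally authored" case).
papers = [5, 5, 5, 5]
print(round(cc(papers), 3))      # 0.8  (falls short of 1)
print(round(mcc(papers, 5), 3))  # 1.0
```

For purely single-authored collections both measures return 0, so the rescaling only stretches the top of the scale.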
For emerging countries, such a steady state has not yet been attained; therefore, the research output in scientific journal articles is still expected to rise considerably. Consequences and interpretations of an ever-growing research output in relation to the increasing age of Nobel Prize recipients are discussed, and conclusions are drawn from the considered data. As part of a research program to analyse research in Bangladesh, we provide a comparison of research indicators for India, Bangladesh, Pakistan and Sri Lanka. In this investigation we make use of Web of Science (WoS) data as well as Scopus data (using the SCImago website). Special attention is given to collaboration data and to the evolution of country h-indices. A comparison based on relative quality indicators shows that Sri Lanka is the best performer among these four countries. This result agrees with the ranking of these countries according to the United Nations' Human Development Index (HDI). An article assessment system based on the academic disciplinary benchmarks of both Tianjin University and nine key Chinese universities was established to evaluate researchers' published papers. With this scientific benchmarking system, the quality of a researcher's papers can be easily located on a percentile scale in the corresponding field within certain groups. Several factors, including total number of papers, order of authors, impact of journals, citation count, h-index, e-index, a-index, m-quotient, etc., were also utilized for both quantity and quality analysis. Furthermore, the newly proposed weighted citation analysis was introduced to judge a researcher's contribution to his/her research outcomes. The convenient application and comprehensive evaluation properties of this assessment system are thoroughly discussed via a given example. The set of citations received by a set of publications consists of citations received by articles in the h-core and citations received by articles in the h-tail.
Denoting the cardinalities of these four sets as C, P, C_H and C_T, we introduce the tail-core ratio (C_T/C_H) and show that in practical cases this ratio tends to increase. Introducing further the k-index, defined as k = (C/P)/(C_T/C_H), we show that this index decreases in most practical cases. A power law model is in accordance with these practical observations. Up to the 1960s the prevalent view of science was that it was a step-by-step undertaking in slow, piecemeal progression towards truth. Thomas Kuhn argued against this view and claimed that science always follows this pattern: after a phase of "normal" science, a scientific "revolution" occurs. Taking as a case study the transition from the static view of the universe to the Big Bang theory in cosmology, we appraised Kuhn's theoretical approach by conducting a historical reconstruction and a citation analysis. As the results show, the transition in cosmology can be linked to many different persons, publications, and points in time. The findings indicate that there was not one (short-term) scientific revolution in cosmology but instead a paradigm shift that progressed as a slow, piecemeal process. Over the last years the h-index has gained popularity as a measure for comparing the impact of scientists. We investigate whether ranking according to the h-index is stable with respect to (i) different choices of citation databases, (ii) normalizing citation counts by the number of authors or by removing self-citations, (iii) small amounts of noise created by randomly removing citations or publications, and (iv) small changes in the definition of the index. In experiments on 5,283 computer scientists and 1,354 physicists we show that although the ranking of the h-index is stable under most of these changes, it is unstable when different databases are used. Therefore, comparisons based on the h-index should only be trusted when the rankings of multiple citation databases agree.
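The tail-core ratio and k-index described earlier reduce to simple arithmetic on the four counts. A sketch with hypothetical numbers:

```python
def k_index(citations, papers, core_citations, tail_citations):
    """k = (C/P) / (C_T/C_H): the mean citation rate divided by the
    tail-to-core citation ratio."""
    return (citations / papers) / (tail_citations / core_citations)

# Hypothetical record: P = 50 papers with C = 400 citations in total,
# of which C_H = 300 go to h-core papers and C_T = 100 to the h-tail.
print(round(k_index(400, 50, 300, 100), 2))  # 24.0
```

As the abstract notes, when the tail accumulates citations faster than the core (C_T/C_H rising), k falls even if the mean citation rate C/P stays constant.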
This study investigates South-South collaboration in research, and specifically collaboration among the 15 countries of the Southern African Development Community (SADC) as well as between the SADC and the rest of Africa. It was found that only 3% of SADC papers during 2005-2008 were jointly authored by researchers from two or more SADC countries (intra-regional collaboration), and only 5% of SADC papers were jointly authored with researchers from African countries outside the SADC (continental collaboration). In contrast, 47% of SADC papers were co-authored with scientists from high-income countries. The few instances of intra-regional and continental collaboration in the SADC are largely the product of North-South collaboration. Authors from high-income countries are included in 60% of intra-regional co-authored papers and in 59% of continental co-authored papers. Moreover, between 2005 and 2008, South Africa produced 81% of all SADC papers and 78% of all intra-regional co-authored papers. This implies a highly unbalanced and unequal partnership that can best be described as a variant of North-South collaboration, with the scientific giant of the South taking on the role of the 'political North'. As a consequence, guidelines for successful North-South collaborations should be extended to include South-South collaborations that comprise highly unequal partners, as is the case between South Africa and the other SADC countries. This article reports findings from a study of the relationship between citation measures (impact factor and its quartile) and both the international composition of editorial boards and foreign authorship in 17 Korean SCI journals for the three 5-year periods, 1995, 2000, and 2005. With few exceptions, the relationship between international editorial board composition, foreign authorship and citation measures was non-existent (p > 0.05).
However, the number of international members on editorial boards and the foreign authorship of papers in Korean journals have increased greatly over the three 5-year periods, and there has been some growth in the visibility and performance of Korean SCI journals in terms of impact factors, but not their quartiles. We present VOSviewer, a freely available computer program that we have developed for constructing and viewing bibliometric maps. Unlike most computer programs that are used for bibliometric mapping, VOSviewer pays special attention to the graphical representation of bibliometric maps. The functionality of VOSviewer is especially useful for displaying large bibliometric maps in an easy-to-interpret way. The paper consists of three parts. In the first part, an overview of VOSviewer's functionality for displaying bibliometric maps is provided. In the second part, the technical implementation of specific parts of the program is discussed. Finally, in the third part, VOSviewer's ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals. Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms do. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data.
Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models. The category-tree document-classification structure is widely used by enterprises and information providers to organize, archive, and access documents for effective knowledge management. However, category trees from various sources use different hierarchical structures, which usually make mappings between categories in different category trees difficult. In this work, we propose a category-tree integration technique. We develop a method to learn the relationships between any two categories and develop operations such as mapping, splitting, and insertion for this integration. According to the parent-child relationship of the integrating categories, the developed decision rules use integration operations to integrate categories from the source category tree with those from the master category tree. A unified category tree can accumulate knowledge from multiple sources without forfeiting the knowledge in individual category trees. Experiments have been conducted to measure the performance of the integration operations and the accuracy of the integrated category trees. The proposed category-tree integration technique achieves greater than 80% integration accuracy, and the insert operation is the most frequently utilized, followed by map and split. The insert operation achieves an F1 score of 77%, while the map and split operations achieve F1 scores of 86% and 29%, respectively. In a world of increasing information and communications possibilities, the difficulty for users of information systems and services may not lie in finding information but in filtering and integrating it into a cohesive whole. To do this, information seekers must know when and how to effectively use cognitive strategies to regulate their own thinking, motivation, and actions. 
Sometimes this is difficult when the topic is interesting and one is driven to explore it in great depth. This article reports on a qualitative study that, in the course of exploring the thinking and emotions of 10 adolescents during the information search process, uncovered patterns of behavior that are related to curiosity and interest. The larger purpose of the study was to investigate the metacognitive knowledge of adolescents, ages 16-18, as they searched for, selected, and used information to complete a school-based information task. The study found that the curiosity experienced by adolescents during the search process was accompanied by feelings of both pleasure and pain and that both feelings needed to be managed in order to navigate a pathway through the search process. The self-regulation of curiosity and interest was a clear and distinct metacognitive strategy fueled by metacognitive knowledge related to understanding one's own curiosity and the emotions attached to it. Emotional meaning is critical for users to retrieve relevant images. However, because emotional meanings are subject to the individual viewer's interpretation, they are considered difficult to implement when designing image retrieval systems. With the intent of making an image's emotional messages more readily accessible, this study aims to test a new approach designed to enhance the accessibility of emotional meanings during the image search process. This approach utilizes image searchers' emotional reactions, which are quantitatively measured. Two broadly used quantitative measurements of emotional reactions, the Semantic Differential (SD) and the Self-Assessment Manikin (SAM), were selected as tools for gathering users' reactions. Emotional representations obtained from these two tools were compared with three image perception tasks: searching, describing, and sorting. A survey questionnaire with a set of 12 images tagged with basic emotions was administered to 58 participants. 
Results demonstrated that the SAM represents basic emotions on 2-dimensional plots (pleasure and arousal dimensions), and this representation consistently corresponded to the three image perception tasks. This study provided experimental evidence that quantitative users' reactions can be a useful complementary element of current image retrieval/indexing systems. Integrating users' reactions obtained from the SAM into image browsing systems would reduce the efforts of human indexers as well as improve the effectiveness of image retrieval systems. We argue that the communication structures in the Chinese social sciences have not yet been sufficiently reformed. Citation patterns among Chinese domestic journals in three subject areas (political science and Marxism, library and information science, and economics) are compared with their counterparts internationally. Like their colleagues in the natural and life sciences, Chinese scholars in the social sciences provide fewer references to journal publications than their international counterparts; like their international colleagues, social scientists provide fewer references than natural scientists. The resulting citation networks, therefore, are sparse. Nevertheless, the citation structures clearly suggest that the Chinese social sciences are far less specialized in terms of disciplinary delineations than their international counterparts. Marxism studies are more established than political science in China. In terms of the impact of the Chinese political system on academic fields, disciplines closely related to the political system are less specialized than those weakly related to it. In the discussion section, we explore reasons that may cause the current stagnation and provide policy recommendations. A central issue in evaluative bibliometrics is the characterization of the citation distribution of papers in the scientific literature. 
Here, we perform a large-scale empirical analysis of journals from every field in Thomson Reuters' Web of Science database. We find that only 30 of the 2,184 journals have citation distributions that are inconsistent with a discrete lognormal distribution at the rejection threshold that controls the false discovery rate at 0.05. We find that large, multidisciplinary journals are over-represented in this set of 30 journals, leading us to conclude that, within a discipline, citation distributions are lognormal. Our results strongly suggest that the discrete lognormal distribution is a globally accurate model for the distribution of "eventual impact" of scientific papers published in single-discipline journals in a single year that is removed sufficiently from the present date. A multiple-perspective cocitation analysis method is introduced for characterizing and interpreting the structure and dynamics of cocitation clusters. The method facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Cocitation networks are decomposed into cocitation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a cocitation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of information science as defined by 12 journals published between 1996 and 2008: (a) a comparative author cocitation analysis (ACA), (b) a progressive ACA of a time series of cocitation networks, and (c) a progressive document cocitation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks. 
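The discrete-lognormal check for citation distributions described earlier can be sketched roughly as follows (the data are synthetic; the moment-based symmetry check and the +1 shift for uncited papers are simplifications, not the authors' actual goodness-of-fit procedure):

```python
import math
import random

def log_moments(citations):
    """Log-transform shifted citation counts and return the mean,
    standard deviation, and skewness of the logs; for lognormally
    distributed counts, the skewness of the logs should be near zero."""
    logs = [math.log(c + 1) for c in citations]  # +1 handles uncited papers
    n = len(logs)
    mu = sum(logs) / n
    sd = (sum((x - mu) ** 2 for x in logs) / n) ** 0.5
    skew = sum((x - mu) ** 3 for x in logs) / (n * sd ** 3)
    return mu, sd, skew

random.seed(0)
# Synthetic "journal": per-paper citation counts drawn lognormally.
counts = [int(random.lognormvariate(2.0, 1.0)) for _ in range(5000)]
mu, sd, skew = log_moments(counts)
assert abs(skew) < 0.5  # logs roughly symmetric, as lognormality implies
```

A heavy-tailed alternative (e.g., a power law) would leave the log-transformed counts visibly right-skewed, which is what such a check would flag.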
The goal of the study was to determine the underlying processes leading to the observed collaborator distribution in modern scientific fields, with special attention to nonpower-law behavior. Nanoscience is used as a case study of a modern interdisciplinary field, and its coauthorship network for the 2000-2004 period is constructed from the Nano Bank database. We find three collaboration modes that correspond to three distinct ranges in the distribution of collaborators: (1) for authors with fewer than 20 collaborators (the majority) preferential attachment does not hold and they form a log-normal "hook" instead of a power law; (2) authors with more than 20 collaborators benefit from preferential attachment and form a power law tail; and (3) authors with between 250 and 800 collaborators are more frequent than expected because of the hyperauthorship practices in certain subfields. The Discounted Cumulated Impact (DCI) index has recently been proposed for research evaluation. In the present work an earlier dataset by Cronin and Meho (2007) is reanalyzed, with the aim of exemplifying the salient features of the DCI index. We apply the index to this dataset and compare our results with the outcomes of the Cronin-Meho (2007) study. Both authors and their top publications are used as units of analysis, which suggests that, by adjusting the parameters of evaluation according to the needs of research evaluation, the DCI index delivers data on an author's (or publication's) "lifetime" impact or current impact at the time of evaluation, on an author's (or publication's) capability of inviting citations from highly cited later publications as an indication of impact, and on the relative impact across a set of authors (or publications) over their "lifetime" or currently. This paper analyzes the applicability of the article mean citation rate measures in the Science Citation Index Journal Citation Reports (SCI JCR) to the five JCR mathematical subject categories. 
These measures are the traditional 2-year impact factor as well as the recently added 5-year impact factor and 5-year article influence score. Utilizing the 2008 SCI JCR, the paper compares the probability distributions of the measures in the mathematical categories to the probability distribution of a scientific model of impact factor distribution. The scientific model distribution is highly skewed, conforming to the negative binomial type, with much of the variance due to the important role of review articles in science. In contrast, the three article mean citation rate measures' distributions in the mathematical categories conformed to either the binomial or Poisson, indicating a high degree of randomness. Seeking reasons for this, the paper analyzes the bibliometric structure of Mathematics, finding it a disjointed discipline of isolated subfields with a weak central core of journals, reduced review function, and long cited half-life placing most citations beyond the measures' time limits. These combine to reduce the measures' variance to one commensurate with random error. However, the measures were found capable of identifying important journals. Using data from surveys of the Louisiana State University (LSU) faculty, the paper finds a higher level of consensus among mathematicians and others on which are the important mathematics journals than the measures indicate, positing that much of the apparent randomness may be due to the measures' inapplicability to mathematical disciplines. Moreover, tests of the stability of impact factor ranks across a 5-year time span suggested that the proper model for Mathematics is the negative binomial. Despite over 10 years of research there is no agreement on the most suitable roles for Webometric indicators in support of research policy and almost no field-based Webometrics. 
This article partly fills these gaps by analyzing the potential of policy-relevant Webometrics for individual scientific fields with the help of 4 case studies. Although Webometrics cannot provide robust indicators of knowledge flows or research impact, it can provide some evidence of networking and mutual awareness. The scope of Webometrics is also relatively wide, including not only research organizations and firms but also intermediary groups like professional associations, Web portals, and government agencies. Webometrics can, therefore, provide evidence about the research process to complement peer review, bibliometric, and patent indicators: tracking the early, mainly prepublication development of new fields and research funding initiatives, assessing the role and impact of intermediary organizations and the need for new ones, and monitoring the extent of mutual awareness in particular research areas. We present a theoretical and empirical analysis of a number of bibliometric indicators of journal performance. We focus on three indicators in particular: the Eigenfactor indicator, the audience factor, and the influence weight indicator. Our main finding is that the last two indicators can be regarded as a kind of special case of the first indicator. We also find that the three indicators can be nicely characterized in terms of two properties. We refer to these properties as the property of insensitivity to field differences and the property of insensitivity to insignificant journals. The empirical results that we present illustrate our theoretical findings. We also show empirically that the differences between various indicators of journal performance are quite substantial. 
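The recursive idea shared by indicators of this family, that a citation counts for more when it comes from a journal that is itself highly weighted, can be sketched with a simple power iteration (toy citation matrix; this is a generic eigenvector-style computation, not the exact definition of the Eigenfactor, audience factor, or influence weight):

```python
def influence_weights(cites, iterations=100):
    """cites[i][j] = citations from journal i to journal j. Each journal
    distributes one unit of influence across the journals it cites,
    weighted by its own current influence; iterate to a fixed point."""
    n = len(cites)
    out = [sum(row) for row in cites]      # total references given
    w = [1.0 / n] * n
    for _ in range(iterations):
        new = [0.0] * n
        for i in range(n):
            if out[i] == 0:
                continue                   # journal with no references
            for j in range(n):
                new[j] += w[i] * cites[i][j] / out[i]
        total = sum(new)
        w = [x / total for x in new]
    return w

# Journal 0 is cited heavily by both others, so it should end up
# with the largest weight.
cites = [
    [0, 1, 1],   # journal 0 cites journals 1 and 2 a little
    [8, 0, 1],   # journal 1 cites journal 0 heavily
    [8, 1, 0],   # journal 2 cites journal 0 heavily
]
w = influence_weights(cites)
assert w[0] == max(w)
```

Field-normalization and the treatment of insignificant journals, the two properties the analysis turns on, would enter as modifications of how each row is normalized.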
In this article, the authors develop hypotheses about three key correlates of attitudes about discretionary online behaviors and control over one's own online information: frequency of engaging in risky online behaviors, experience of an online adverse event, and the disposition to be more or less trusting and cautious of others. Through an analysis of survey results, they find that online adverse events do not necessarily relate to greater overall Web discretion, but they do significantly associate with users' perceptions of Web information control. However, the frequencies with which individuals engage in risky online activities and behaviors significantly associate with both online discretion and information control. In addition, general dispositions to trust and be cautious are strongly related to prudent Internet behavior and attitudes about managing personal online information. The results of this study have clear consequences for our understanding of behaviors and attitudes that might lead to greater online social intelligence, or the ability to make prudent decisions in the presence of Internet uncertainties and risks. Implications for theory and practice are discussed. A graph in van Eck and Waltman [JASIST, 60(8), 2009, p. 1644], representing the relation between the association strength and the cosine, is partially explained as a sheaf of parabolas, each parabola being the functional relation between these similarity measures on the trajectories where the scalar product of the occurrence vectors X and Y is constant (X · Y = a). Based on relations obtained earlier between the cosine and other similarity measures (e.g., the Jaccard index), we can prove new relations between the association strength and these other measures. In this article, we identify, compare, and contrast theoretical constructs for the fields of information searching and information retrieval to emphasize the uniqueness of and synergy between the fields. 
Theoretical constructs are the foundational elements that underpin a field's core theories, models, assumptions, methodologies, and evaluation metrics. We provide a framework to compare and contrast the theoretical constructs in the fields of information searching and information retrieval using intellectual perspective and theoretical orientation. The intellectual perspectives are information searching, information retrieval, and cross-cutting; and the theoretical orientations are information, people, and technology. Using this framework, we identify 17 significant constructs in these fields contrasting the differences and comparing the similarities. We discuss the impact of the interplay among these constructs for moving research forward within both fields. Although there is tension between the fields due to contradictory constructs, an examination shows a trend toward convergence. We discuss the implications for future research within the information searching and information retrieval fields. Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems. 
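A character n-gram document-similarity score of the kind evaluated in the study above can be sketched as follows (the Dice coefficient over trigram sets is one common choice; the study does not specify this exact formula, so treat it as illustrative):

```python
def ngrams(text, n=3):
    """Character n-grams of a whitespace-normalized, lowercased text."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets: a simple automatic
    text-analysis (ATA) style document-similarity score in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

doc1 = "Citation analysis of scientific journals"
doc2 = "Citation analysis of scholarly journals"
doc3 = "Museum exhibit design and curation"
# Near-duplicates score much higher than unrelated texts.
assert ngram_similarity(doc1, doc2) > ngram_similarity(doc1, doc3)
```

Varying `n` reproduces the study's observation that the best string length depends on the nature of the text being compared.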
We report on a study that investigated the efficacy of four different interactive information retrieval (IIR) systems, each designed to support a specific information-seeking strategy (ISS). These systems were constructed using different combinations of IR techniques (i.e., combinations of different methods of representation, comparison, presentation and navigation), each of which was hypothesized to be well suited to support a specific ISS. We compared the performance of searchers in each such system, designated "experimental," to an appropriate "baseline" system, which implemented the standard specified query and results list model of current state-of-the-art experimental and operational IR systems. Four within-subjects experiments were conducted for the purpose of this comparison. Results showed that each of the experimental systems was superior to its baseline system in supporting user performance for the specific ISS (that is, the information problem leading to that ISS) for which the system was designed. These results indicate that an IIR system, which intends to support more than one kind of ISS, should be designed within a framework which allows the use and combination of different IR support techniques for different ISSs. The authors set forth a general methodology for conducting bibliometric analyses at the micro level. It combines several indicators grouped into three factors or dimensions, which characterize different aspects of scientific performance. Different profiles or "classes" of scientists are described according to their research performance in each dimension. A series of results based on the findings from the application of this methodology to the study of Spanish National Research Council scientists in Spain in three thematic areas are presented. Special emphasis is made on the identification and description of top scientists from structural and bibliometric perspectives. 
The effects of age on the productivity and impact of the different classes of scientists are analyzed. The classificatory approach proposed herein may prove a useful tool in support of research assessment at the individual level and for exploring potential determinants of research success. Thomson Reuters' Web of Science (WoS) is undoubtedly a great tool for scientiometrics purposes. It allows one to retrieve and compute different measures such as the total number of papers that satisfy a particular condition; however, it also is well known that this tool imposes several different restrictions that make obtaining certain results difficult. One of those constraints is that the tool does not offer the total count of documents in a dataset if it is larger than 100,000 items. In this article, we propose and analyze different approaches that involve partitioning the search space (using the Source field) to retrieve item counts for very large datasets from the WoS. The proposed techniques improve previous approaches: They do not need any extra information about the retrieved dataset (thus allowing completely automatic procedures to retrieve the results), they are designed to avoid many of the restrictions imposed by the WoS, and they can be easily applied to almost any query. Finally, a description of WoS Query Partitioner, a freely available and online interactive tool that implements those techniques, is presented. Politicians' Web sites have been considered a medium for organizing, mobilizing, and agenda-setting, but extant literature lacks a systematic approach to interpreting the Web sites of senators as a new medium for political communication. This study classifies the role of political Web sites into relational (hyperlinking) and topical (shared-issues) aspects. The two aspects may be viewed from a social embeddedness perspective and three facets, as K. Foot and S. Schneider (2002) suggested. 
This study employed network analysis, a set of research procedures for identifying structures in social systems, as the basis of the relations among the system's components rather than the attributes of individuals. Hyperlink and issue data were gathered from the United States Senate Web site and Yahoo. Major findings include: (a) The hyperlinks are more targeted at Democratic senators than at Republicans and are a means of communication for senators and users; (b) the issue network found from the Web is used for discussing public agendas and is more highly utilized by Republican senators; (c) the hyperlink and issue networks are correlated; and (d) social relationships and issue ecologies can be effectively detected by these two networks. The need for further research is addressed. Twitter is a microblogging and social networking service with millions of members and growing at a tremendous rate. With the buzz surrounding the service have come claims of its ability to transform the way people interact and share information and calls for public figures to start using the service. In this study, we are interested in the type of content that legislators are posting to the service, particularly by members of the United States Congress. We read and analyzed the content of over 6,000 posts from all members of Congress using the site. Our analysis shows that Congresspeople are primarily using Twitter to disperse information, particularly links to news articles about themselves and to their blog posts, and to report on their daily activities. These tend not to provide new insights into government or the legislative process or to improve transparency; rather, they are vehicles for self-promotion. However, Twitter is also facilitating direct communication between Congresspeople and citizens, though this is a less popular activity. We report on our findings and analysis and discuss other uses of Twitter for legislators. 
Using Google Earth, Google Maps, and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications onto the geographic map. The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and with Scopus. Pajek enables a combination of visualization and statistical analysis, whereas Google Maps and its derivatives provide superior tools on the Internet. The authors propose using the technique of weighted citation to measure an article's prestige. The technique allocates a different weight to each reference by taking into account the impact of citing journals and citation time intervals. Weighted citation captures prestige, whereas citation counts capture popularity. They compare the value variances for popularity and prestige for articles published in the Journal of the American Society for Information Science and Technology from 1998 to 2007, and find that the majority have comparable status. Document classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigated the use of six stylistic feature sets (including 42 features) and/or six name-based feature sets (including 234 features) for various combinations of the following classification tasks: ethnic groups of the authors and/or periods of time when the documents were written and/or places where the documents were written. 
The investigated corpus contains Jewish Law articles written in Hebrew-Aramaic, which present interesting problems for classification. Our system CUISINE (Classification Using Stylistic feature sets and/or NamE-based feature sets) achieves accuracy results between 90.71% and 98.99% for the seven classification experiments (ethnicity, time, place, ethnicity&time, ethnicity&place, time&place, ethnicity&time&place). For the first six tasks, the stylistic feature sets in general and the quantitative feature set in particular are enough for excellent classification results. In contrast, the name-based feature sets are rather poor for these tasks. However, for the most complex task (ethnicity&time&place), a hill-climbing model using all feature sets succeeds in significantly improving the classification results. Most of the stylistic features (34 of 42) are language-independent and domain-independent. These features might be useful to the community at large, at least for rather simple tasks. Social tagging or collaborative tagging has become a new trend in the organization, management, and discovery of digital information. The rapid growth of shared information mostly controlled by social tags poses a new challenge for social tag-based information organization and retrieval. A plausible approach for this challenge is linking social tags to a controlled vocabulary. As an introductory step for this approach, this study investigates ways of predicting relevant subject headings for resources from social tags assigned to the resources. The prediction of subject headings was measured by five different similarity measures: tf-idf, cosine-based similarity (CoS), Jaccard similarity (or Jaccard coefficient; JS), mutual information (MI), and information radius (IRad). Their results were compared to those by professionals. The results show that a CoS measure based on the top five social tags was most effective. Inclusion of more social tags only aggravates the performance. 
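The cosine and Jaccard comparisons between a resource's top social tags and candidate subject headings can be sketched as follows (the tag and heading profiles are invented for illustration; the study's actual term weighting may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency dicts."""
    shared = set(u) & set(v)
    num = sum(u[t] * v[t] for t in shared)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def jaccard(u, v):
    """Jaccard coefficient between the term sets of two profiles."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical resource tags (top five) and two candidate heading
# profiles; all names here are illustrative, not from the study's data.
tags = {"photography": 5, "history": 3, "war": 2, "army": 1, "europe": 1}
heading_war = {"war": 4, "history": 3, "military": 2, "europe": 1}
heading_cooking = {"cooking": 5, "recipes": 4, "food": 2}

best = max([heading_war, heading_cooking], key=lambda h: cosine(tags, h))
assert best is heading_war
assert jaccard(tags, heading_war) > jaccard(tags, heading_cooking)
```

Restricting the comparison to the top five tags, as the study found most effective, simply means truncating the `tags` profile before scoring.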
The performance of JS is comparable to that of CoS, while tf-idf performs up to 70% below the best performance. MI and IRad have inferior performance compared to the other methods. This study demonstrates the application of the similarity measuring techniques to the prediction of correct Library of Congress subject headings. One of the key objectives of knowledge management is to transfer knowledge quickly and efficiently from experts to novices, who are different in terms of the structural properties of domain knowledge or knowledge structure. This study applies experts' semantic networks to hypertext navigation design and examines the potential of the resulting design, i.e., semantic hypertext, in facilitating knowledge structure acquisition and problem solving. Moreover, we argue that the level of sophistication of the knowledge structure acquired by learners is an important mediator influencing the learning outcomes (in this case, problem solving). The research model was empirically tested with a situated experiment involving 80 business professionals. The results of the empirical study provided strong support for the effectiveness of semantic hypertext in transferring knowledge structure and reported a significant full mediating effect of knowledge structure sophistication. Both theoretical and practical implications of this research are discussed. The Arabic storytelling methodology provides solutions to the problem of information reliability. The reliability of a story depends on the credibility of its narrators. To ensure reliability verification, the narrators' names are explicitly cited at the head of the story, constituting its chain of narrators. Stories were reported from one generation to another to ensure the reliable transmission of historical knowledge. We present a set of tools based on the Arabic storytelling methodology. 
We start by presenting this methodology as a set of principles for information-reliability assessment. Then, we detail an architecture designed to support the study of the reliability of Arabic stories. Indeed, we developed grammars for parsing Arabic full names and chains of narrators of Arabic stories. After that, an intelligent identity recognizer links names found in chains of narrators to the biographies of the corresponding persons. We model this step as a possibilistic information retrieval task. Finally, chains are analyzed through metadata available in biographies to help the user identify sources of unreliability. We propose to identify the class of reliability of a story with a possibilistic classifier. The achieved results in named entity and identity recognition were satisfactory and conform to the targets set for the precision, recall, and F-measure metrics. The developed tools also are reusable components that can be used to study the reliability of other types of Arabic texts. Context is a determining factor in language and plays a decisive role in polysemic words. Several psycholinguistically motivated algorithms have been proposed to emulate human management of context, under the assumption that the value of a word is evanescent and takes on meaning only in interaction with other structures. The predication algorithm (Kintsch, 2001), for example, uses a vector representation of the words produced by LSA (Latent Semantic Analysis) to dynamically simulate the comprehension of predications and even of predicative metaphors. The objective of this study was to predict some unwanted effects that could be present in vector-space models when extracting different meanings of a polysemic word (predominant meaning inundation, lack of precision, and low-level definition), and propose ideas based on the predication algorithm for avoiding them. Our first step was to visualize such unwanted phenomena and also the effect of solutions. 
We use different methods to extract the meanings for a polysemic word (without context, vector sum, and predication algorithm). Our second step was to conduct an analysis of variance to compare such methods and measure the impact of potential solutions. Results support the idea that a human-based computational algorithm like the predication algorithm can take into account features that ensure more accurate representations of the structures we seek to extract. Theoretical assumptions and their repercussions are discussed. Intermediaries in a technological knowledge network have recently been highlighted as crucial innovation drivers that accelerate technological knowledge flows. Although patent network analysis has been frequently used to monitor technological knowledge structures, it has examined only sources or recipients of the technological knowledge by mainly estimating technological knowledge inflows or outflows of a network node. This study, therefore, aims to identify technological knowledge intermediaries when a technology-level knowledge network is composed of several industries. First, types of technological knowledge flows are deductively classified into four types by highlighting industry affiliations of source technologies and recipient technologies. Second, a directed technological knowledge network is generated at the technology class level, using patent co-classification analysis. Third, for each class, mediating scores are measured according to the four types. The empirical analysis illustrates Korea's technological knowledge network between 2000 and 2008. As a result, the four types of mediating scores are compared between industries, and industry-wise technological knowledge intermediaries are identified. The proposed approach is practical to explore converging processes in technology development where technology classes act as technological knowledge intermediaries among diverse industries. 
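The industry-aware mediation idea can be sketched by counting the two-step knowledge flows a technology class sits between, split by whether source and recipient belong to the same industry (the edges and industry labels are invented; the study's four flow types and patent-based network construction are richer than this):

```python
def mediation_scores(edges, industry):
    """For each node, count two-step paths s -> node -> r that the node
    mediates, broken down by whether source s and recipient r belong to
    the same industry or to different ones (a crude brokerage measure)."""
    out, inc = {}, {}
    for s, r in edges:                 # directed knowledge-flow edges
        out.setdefault(s, set()).add(r)
        inc.setdefault(r, set()).add(s)
    scores = {}
    for node in industry:
        same = diff = 0
        for s in inc.get(node, ()):
            for r in out.get(node, ()):
                if node in (s, r) or s == r:
                    continue
                if industry[s] == industry[r]:
                    same += 1
                else:
                    diff += 1
        scores[node] = {"same_industry": same, "cross_industry": diff}
    return scores

# Hypothetical technology classes: B bridges IT-industry and
# chemistry-industry classes (labels illustrative, not from the study).
edges = [("A", "B"), ("B", "C"), ("D", "B"), ("B", "E")]
industry = {"A": "IT", "B": "IT", "C": "chem", "D": "chem", "E": "IT"}
s = mediation_scores(edges, industry)
assert s["B"]["cross_industry"] > 0   # B mediates cross-industry flows
```

A class with a high cross-industry count relative to its same-industry count would be a candidate intermediary in a converging-technology process.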
An outcome of nuclear safety research (NSR) done by JAERI (Japan Atomic Energy Research Institute) was case-studied by the bibliometric method. (1) For LOCA (loss-of-coolant accident), JAERI's domestic share of research papers was 63% in the past period (1978-1982) but decreased to 40% in the present period (1998-2002). For co-authored papers, the domestic share between JAERI and PS (public sectors) was zero in the past period but increased to 4% in the present period. Research cooperation is active between Tokyo University and JAERI and between JAERI and Nagoya University. (2) It is revealed that the LOCA outputs produced by NSR-JAERI were partly reflected in the Safety Licensing Guidelines; however, the share of NSR-JAERI could not be determined due to the lack of necessary information in the Guidelines. The growth rate of scientific publication has been studied from 1907 to 2007 using available data from a number of literature databases, including Science Citation Index (SCI) and Social Sciences Citation Index (SSCI). Traditional scientific publishing, that is, publication in peer-reviewed journals, is still increasing, although there are big differences between fields. There are no indications that the growth rate has decreased in the last 50 years. At the same time, publication using new channels, for example conference proceedings, open archives and home pages, is growing fast. The growth rate for SCI up to 2007 is smaller than for comparable databases. This means that SCI has been covering a decreasing part of the traditional scientific literature. There are also clear indications that the coverage by SCI is especially low in some of the scientific areas with the highest growth rates, including computer science and engineering sciences. The role of conference proceedings, open access archives and publications published on the net is increasing, especially in scientific fields with high growth rates, but this has only partially been reflected in the databases. 
The new publication channels challenge the use of the big databases in measurements of scientific productivity or output and of the growth rate of science. Because of this declining coverage, it is problematic that SCI has been, and still is, used as the dominant source for science indicators based on publication and citation numbers. The limited data available for the social sciences show that the growth rate in SSCI was remarkably low and indicate that the coverage by SSCI was declining over time. National Science Indicators from Thomson Reuters is based solely on SCI, SSCI and the Arts and Humanities Citation Index (AHCI). The declining coverage of the citation databases therefore calls the use of this source into question. Measuring the efficiency of scientific research activity presents critical methodological aspects, many of which have not been sufficiently studied. Although many studies have assessed the relation between research productivity, quality, and academic rank, not much is known about the extent of distortion in national university performance rankings when academic rank and other labor factors are not considered as factors of normalization. This work presents a comparative analysis that aims to quantify the sensitivity of bibliometric rankings to the choice of input, with input considered as only the number of researchers on staff, or alternatively where their cost is also considered. The field of observation consists of all 69 Italian universities active in the hard sciences. Performance measures are based on the 81,000 publications produced during the 2004-2006 triennium by all 34,000 research staff, with analysis carried out at the level of individual disciplines, 187 in total. The effect of the switch from labor to cost seems to be minimal, except for a few outliers. 
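The sensitivity of a bibliometric ranking to the choice of input (labor versus cost) can be illustrated with a toy computation; the data and unit names below are invented for illustration, not the paper's.

```python
def rankings(units):
    """Rank research units two ways: by publications per researcher
    (labor input) and by publications per unit cost (cost input).
    units maps name -> (publications, staff, cost)."""
    by_labor = sorted(units, key=lambda u: units[u][0] / units[u][1],
                      reverse=True)
    by_cost = sorted(units, key=lambda u: units[u][0] / units[u][2],
                     reverse=True)
    return by_labor, by_cost

# Hypothetical outlier: U1 has a large but cheap staff, so it ranks low
# per researcher yet high per unit cost.
units = {"U1": (100, 50, 10.0), "U2": (80, 20, 20.0)}
```

Comparing the two orderings over all units (e.g. with a rank correlation) is one way to quantify how much the switch from labor to cost matters.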
This paper analyzes the early research performance of PhD graduates in labor economics, addressing the following questions: Are there major productivity differences between graduates from American and European institutions? If so, how relevant is the quality of the training received (i.e. ranking of institution and supervisor) and the research environment in the subsequent job placement institution? The population under study consists of labor economics PhD graduates who received their degree in the years 2000-2005 in Europe or the USA. Research productivity is evaluated alternatively as the number of publications or the quality-adjusted number of publications of an individual. When restricting the analysis to the number of publications, results suggest a higher productivity by graduates from European universities than from USA universities, but this difference vanishes when accounting for the quality of the publications. The results also indicate that graduates placed at American institutions, in particular top ones, are likely to publish more quality-adjusted articles than their European counterparts. This may be because, when hired, they already have several good acceptances, or because of more focused research efforts and clearer career incentives. The h-index and Eigenfactor™ values of top and specialized scientific/engineering journals are tabulated and combined to provide a simple graphical representation of the journals. The information may be tailored to specific uses by respective stakeholders to aid decision-making processes with regard to scholarly research and scientific journal publications. Studies of university-industry collaboration remain subject to important limitations due to the shortage of empirical data and a lack of consistency in that obtained to date. This article puts into practice a set of university Third Mission indicators in a regional innovation system. 
Selected indicators previously compiled from the literature were reorganized and pre-tested. We have undertaken two face-to-face surveys of 737 firms and 765 heads of research teams, respectively. The results validate the indicators and provide a complex map of university-industry linkages, as well as some observations on the flexibility needed to address this issue. Through theoretical analysis and empirical demonstration, this paper attempts to model the behavior of science and technology by investigating the self-propagating behavior of their diffusion for South Korea, Malaysia and Japan. The dynamics of the self-propagating behavior were examined using the logistic growth function with a dynamic carrying capacity, while allowing for different effectiveness of the potential influence of science and technology producers on potential adopters. Evidence suggests that the self-propagating growth function is particularly relevant for countries with advanced science and technology, like Japan. While self-propagating growth was also found for South Korea, the diffusion process remained fairly static for Malaysia. Many investigations of scientific collaboration are based on statistical analyses of large networks constructed from bibliographic repositories. These investigations often rely on a wealth of bibliographic data, but very little or no other information about the individuals in the network, and thus fail to illustrate the broader social and academic landscape in which collaboration takes place. In this article, we perform an in-depth longitudinal analysis of a relatively small network of scientific collaboration (N = 291) constructed from the bibliographic record of a research center specializing in the development and application of wireless and sensor network technologies. We perform a preliminary analysis of selected structural properties of the network, computing its range, configuration and topology. 
We then support our preliminary statistical analysis with an in-depth temporal investigation of the assortative mixing of selected node characteristics, unveiling the researchers' propensity to collaborate preferentially with others with a similar academic profile. Our qualitative analysis of mixing patterns offers clues as to the nature of the scientific community being modeled in relation to its organizational, disciplinary, institutional, and international arrangements of collaboration. The study seeks to identify the influence of local and regional publications in the production of public health research papers in the Latin American region. A citation analysis of the papers published in the following three leading journals in the field of public health was conducted: Revista Médica de Chile (Chile) (RMCh); Archivos Latinoamericanos de Nutrición (Venezuela) (ALAN); and Salud Pública de México (México) (SPM). Papers were analyzed for the period 2003-2007. SciELO (Scientific Electronic Library Online) and the printed versions of the journals were used in the analysis. Overall, 1,273 papers from 122 journal issues were analyzed, with a total of 38,459 references. Over 90% of the production was published through the collaboration of two or more authors. Author affiliation corresponded in most cases to the country of origin of the journal. References to Portuguese-language papers accounted for nearly 5% in ALAN and less than 1% each in SPM and RMCh. Citations among the three journals were not significant: only ALAN cited RMCh and SPM at more than 3% each of total citations, while SPM and RMCh cited each other in less than 1% of total citations. With the exception of ALAN, most public health papers published in RMCh and SPM derived from the national collaboration of researchers in the field. A small amount of public health knowledge was being transferred from Brazil to the region through RMCh and SPM. 
A vertical and individual (per journal/country) model of knowledge communication in public health was identified. Many quantitative measures exist to assess the publishing outputs of research units such as university departments or institutes. In addition to well-known issues with such measures, further shortcomings include inadequate adjustment for relative entity sizes and researcher intensity, for the extent to which research is concentrated among a few rather than all researchers, and for lags between staffing and publication. This article presents a further array of possible measurement indices, based on operational research and economic ratios, which are capable of adjusting for each of these shortcomings, and which analysts can combine with relatively little effort into existing measures. Patents are important intellectual assets for companies to defend or to claim their technological rights. To control R&D costs, companies should carefully examine their patents by patent quality. Existing approaches to evaluating patent quality mostly make a posteriori use of factual information about patent quality. This paper examined whether patent quality can be predicted a priori, i.e., during the early years after a patent is granted, by analyzing information embedded in a network of patent citations. Social network analysis was applied to analyze two network positions occupied by a patent, brokerage and closure, to determine whether either position is a good predictor of patent quality. Patent renewal decisions and forward citations were adopted as surrogates of patent quality. The analytical results showed that forward citations can be positively predicted by the brokerage position and negatively predicted by the closure position in both the early and mature stages. Renewal decisions can be negatively predicted by the brokerage position in the early stage, and the closure position influences the renewal decision in different ways in the early and mature stages. 
These analytical results imply that a company should focus on developing patents that bridge different technologies as its technological developments reach maturity. This study compares the citations characteristics of researchers in engineering disciplines with other major scientific disciplines, and investigates variations in citing patterns within subdisciplines in the field of engineering. Utilizing citations statistics including Hirsch's (Proc Natl Acad Sci USA 102(46):16569-16572, 2005) h-index value, we find that significant differences in citing characteristics exist between engineering disciplines and other scientific fields. Our findings also reveal statistical differences in citing characteristics between subdisciplines found within the same engineering discipline. Authorship identity has long been an Achilles' heel in bibliometric analyses at the individual level. This problem appears in studies of scientists' productivity, inventor mobility and scientific collaboration. Using the concepts of cognitive maps from psychology and approximate structural equivalence from network analysis, we develop a novel algorithm for name disambiguation based on knowledge homogeneity scores. We test it on two cases, and the results show that this approach outperforms other common authorship identification methods with the ASE method providing a relatively simple algorithm that yields higher levels of accuracy with reasonable time demands. Bibliometric counting methods need to be validated against perceived notions of authorship credit allocation, and standardized by rejecting methods with poor fit or questionable ethical implications. Harmonic counting meets these concerns by exhibiting a robust fit to previously published empirical data from medicine, psychology and chemistry, and by complying with three basic ethical criteria for the equitable sharing of authorship credit. 
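Harmonic counting has a simple closed form: the i-th listed of N authors receives (1/i)/(1 + 1/2 + ... + 1/N), so credits decline with byline position and sum to 1. A minimal sketch, with fractional counting shown for contrast:

```python
def harmonic_credit(n_authors):
    """Harmonic authorship credit: the i-th of N authors receives
    (1/i) / H_N, where H_N is the N-th harmonic number."""
    h_n = sum(1.0 / k for k in range(1, n_authors + 1))
    return [(1.0 / i) / h_n for i in range(1, n_authors + 1)]

def fractional_credit(n_authors):
    """Fractional counting for contrast: each author receives 1/N."""
    return [1.0 / n_authors] * n_authors
```

For three authors, harmonic credits are 6/11, 3/11 and 2/11. Variants that incorporate byline information (equal contribution, an elevated corresponding last author) would redistribute these weights and are not implemented here.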
Harmonic counting can also incorporate additional byline information about equal contribution, or the elevated status of a corresponding last author. By contrast, several previously proposed counting schemes from the bibliometric literature, including arithmetic, geometric and fractional counting, do not fit the empirical data as well and do not consistently meet the ethical criteria. In conclusion, harmonic counting would seem to provide unrivalled accuracy, fairness and flexibility to the long-overdue task of standardizing the bibliometric allocation of publication and citation credit. This paper analyses the nationalities of the editorial board members of the top 20 journals (according to their impact factor in the ISI Journal Citation Report, Science Edition 2005) serving 15 scientific disciplines. A total of 281 journals were analysed (some journals crossed disciplinary boundaries) and 10,055 of their editorial board members were identified. Some 53% of board members were from the United States. Europe provided 32%, with the United Kingdom making the greatest contribution (9.8%). The analysis of scientific output by nationality in these journals showed a significant correlation, in all disciplines, with the representation of the corresponding nations on the editorial boards. The composition of editorial boards may therefore provide a useful indicator for measuring a country's international scientific visibility. The present results should be taken into account in the design of national policies aiming to enhance the presence of a country's most prestigious scientists on the editorial boards of the main international journals. This paper presents a detailed chronological survey of papers published in the journal Water Research, which began publication in 1967. The current investigation reviews publication patterns between 1967 and 2008. 
Research performance was analyzed according to publication output and the distribution of words in article titles, author keywords, and KeyWords Plus. The performance of countries, institutes, and authors, including total, single, collaborative, first-author, and corresponding-author publications, was analyzed. The most frequently cited articles of each year and the articles of the highest impact in 2008 were also reported. Results showed that "activated sludge" was the most frequently used author keyword, followed by "adsorption" and "drinking water." Authors from 114 different countries/territories published in the journal, with the most articles submitted by authors from the USA. In recent years bibliometricians have paid increasing attention to research evaluation methodological problems, among these being the choice of the most appropriate indicators for evaluating the quality of scientific publications, and thus for evaluating the work of single scientists, research groups and entire organizations. Much literature has been devoted to analyzing the robustness of various indicators, and many works warn against the risks of using easily available and relatively simple proxies, such as the journal impact factor. The present work continues this line of research, examining whether it is valid that the use of the impact factor should always be avoided in favour of citations, or whether the use of the impact factor could be acceptable, even preferable, in certain circumstances. The evaluation was conducted by observing all scientific publications in the hard sciences by Italian universities for the period 2004-2007. Performance sensitivity analyses were conducted with changing indicators of quality and years of observation. This paper aims to explore the role of each country in the health thematic area of the 6th Framework Programme (6FP) of the EU. 
We try to explain how collaborative research processes are generated in a research programme using social network analysis (SNA) tools. We have modelled a one-mode network set up by 2,132 organizations which participate in 601 research projects. This network was shrunk to the country level, obtaining a network of 31 countries. Results show that there is a strong relationship between R&D indicators and the structural position of each country in the network. The paper concludes that SNA techniques are a suitable tool to assess country performance in EU research programmes. This study conducts a bibliometric analysis based on the keywords of conference proceedings. Scientometric investigation of conference proceedings is a novel and still uncommon approach. The studies and papers presented may be interpreted as early indicators of scientific development. The Academy of International Business (AIB) was chosen for being the leading organization for studies in international business, with contributions covering a 3-year period (2006-2008). The study presents the general structure of current scholarly interest in international business studies, clusters the keywords and reflects details on the focused research areas of the papers analyzed. The bibliometric analysis indicates three clusters: the core, the semi-periphery and the periphery. The five most frequently occurring keywords were found to be multinational enterprise, emerging markets, foreign direct investment, internationalization and knowledge management, in descending order. The analyses focus on the concepts building the core (in total ten keywords), the semi-periphery, which centers on performance and related topics (60 keywords), and the periphery of the studies, with governance and specific facets of it (199 keywords). Being a scientifically active country in Africa, South Africa has made significant strides in the production of scientific publications. 
Medicine is one branch of science that has achieved a remarkable position in this regard. Extracting and analyzing medical publications over three decades and at regular intervals (1975-2005) from the SCI database, this paper pioneers an attempt to find out whether the reported pace of growth in the production of scientific papers in medicine is an effect of the partnerships that scholars have with their counterparts within the organization, within the country, or with those in other countries. This paper also presents the unique patterns of scientific research in medicine, taking into account factors such as the count and fractional count of papers, citations, trends of growth, sectoral participation, partners, and publication outlets, and seeks to provide new insights into the directions medical science is taking in South Africa today. Year-on-year trends in research outputs show increases in research activity as the date of the research assessment exercise (in New Zealand, the Performance-Based Research Fund, or PBRF) looms. Moreover, changes with time in the number and types of conference presentations indicate that the vehicle of publication is also being influenced by the PBRF. Within New Zealand business schools, relating the published journal articles to the Australian Business Deans Council rankings list shows a trend towards more publications of lower rank, raising doubts about whether the rhetoric about the PBRF raising the quality of research is really justified. This 'drive' towards increasing numbers of research outputs is also fostered by an increasing trend towards co-authorship in publishing across all disciplines. We discuss each of the recommendations made by Hochberg et al. (Ecol Lett 12:2-4, 2009) to prevent the "tragedy of the reviewer commons". Having scientific journals share a common database of reviewers would be to recreate a bureaucratic organization in which extra-scientific considerations prevailed. 
Pre-reviewing of papers by colleagues is a widespread practice but raises problems of coordination. Revising manuscripts in line with all reviewers' recommendations presupposes that the recommendations converge, which is acrobatic. Signing an undertaking that authors have taken into account all reviewers' comments is both authoritarian and sterilizing. Sending previous comments with subsequent submissions to other journals amounts to creating a cartel and a single all-encompassing journal, which again is sterilizing. Using young scientists as reviewers is highly risky: they might prove very severe; and if they are not yet published authors themselves, the recommendation violates the principle of peer review. Asking reviewers to be more severe would only create a crisis in the publishing houses and actually increase reviewers' workloads. The criticisms of the behavior of authors looking to publish in the best journals are unfair: it is natural for scholars to try to publish in the best journals and not to resign themselves to being second-rate. Punishing lazy reviewers would only lower the quality of reports: instead, we favor the idea of paying reviewers "in kind" with, say, complimentary books or papers. Citation frequency has been considered a biased surrogate of publication merit. However, previous studies on this subject were based on small sample sizes and relied entirely on null-hypothesis significance testing. Here we evaluated the relative effects of different predictors on the citation frequency of ecological articles using an information-theoretic framework designed to evaluate multiple competing hypotheses. Supposed predictors of citation frequency (e.g., number of authors, length of articles) accounted for a low fraction of the total variation. 
We argue that biases concerning citation are minor in ecology, and that further studies attempting to quantify the scientific relevance of an article, aiming to relate it to citation frequency, are needed to advance our understanding of why an article is cited. Over the past decade there have been many investigations aimed at defining the role of scientists and research groups in their coauthorship networks. Starting from the assumptions of network analysis, in this work we propose an analytical definition of a collaboration potential between authors of scientific papers based on both coauthorships and content sharing. The collaboration potential can also be considered a useful tool to investigate the relationships between a single scientist and research groups, thus allowing for the identification of characteristic "types" of scientists (integrated, independent, etc.). We computed the collaboration potential for a set of authors belonging to research groups of an institute specialized in the field of Medical Genetics. The methods presented in the paper are rather general, as they can be applied to compute a collaboration potential for a network of cooperating actors in every situation in which one can qualify the content of some activities and determine which of them are in common among the actors of the network. Science and innovation policy (SIP) is typically justified in terms of public values, while SIP program assessments are typically limited to economic terms that imperfectly take these values into account. The study of public values through public value mapping (PVM) lacks widely accepted methods for systematically identifying value structures within SIP and its public policy processes, especially when there are multiple stakeholder groups. 
This paper advances the study of public values in SIP using nanoscale science and engineering (NSE) policy by demonstrating that quantitative analysis of value statements can provide a credible and robust basis for policy analysis. We use content analysis of over 1,000 documents with over 100,000 pages from major contributors to the NSE policy discourse to identify and analyze a wide range of public value statements. Data analysis and reduction methods reveal a multifactor structure of public values that has been consistently cited by a range of actors in an NSE research policy network. Collaboration between researchers and between research organizations is generally considered a desirable course of action, in particular by some funding bodies. However, collaboration within a multidisciplinary community, such as the Computer-Human Interaction (CHI) community, can be challenging. We performed a bibliometric analysis of the CHI conference proceedings to determine if papers that have authors from different organizations or countries receive more citations than papers that are authored by members of the same organization. There was no significant difference between these three groups, indicating that there is no advantage to collaboration in terms of citation frequency. Furthermore, we tested whether papers written by authors from different organizations or countries receive more best paper awards or at least award nominations. Papers from only one organization received significantly fewer nominations than collaborative papers. The use of different institutional addresses in the scientific production of the same university has long led to underestimation of the scientific production of Iranian universities and consequently lowered their position in international academic rankings. 
The present study evaluated the scientific production of Iranian medical universities according to the institutional addresses registered in papers indexed by the Science Citation Index Expanded (SCIE). In a descriptive study, we retrieved the total SCIE-indexed papers of top Iranian medical universities and their respective hospitals and research centers from the beginning of 1986 to the end of 2007. The different variations of the institutional address of each university in the author affiliations of papers were then assessed. Finally, the universities were ranked according to how uniformly their registered addresses appeared in SCIE. The findings showed unexpected diversity in the institutional affiliations of each university in their SCIE-indexed papers. Although "Tehran University of Medical Sciences" showed the most variation in registered institutional addresses, it ranked first in address uniformity compared with the other universities. The problem of different institutional affiliations appearing in the scientific production of the same university deserves serious attention from the whole scientific community. Observing a uniform format in registering the institutional addresses of Iranian medical universities would improve their scientific credibility and international ranks by representing their real scientific productivity. This study proposes an approach for visualizing a knowledge structure. The proposed approach creates a three-dimensional "Research focused parallelship network", a "Keyword Co-occurrence Network", and a two-dimensional knowledge map to facilitate visualization of the knowledge structure created by journal papers from different perspectives. The networks and knowledge maps can be depicted differently by choosing different information as the network actor, e.g. 
author keyword, institute, or country, to reflect knowledge structures at the micro-, meso-, and macro-levels, respectively. Technology Foresight is selected as an example to illustrate the method proposed in this study. A total of 556 author keywords contained in 181 Technology Foresight related papers have been analyzed. European countries, China, India and Brazil are located at the core of Technology Foresight research. Quantitative ways of mapping journal papers are investigated in this study to unveil emerging elements as well as to demonstrate the dynamics and visualization of knowledge. The quantitative method provided in this paper shows a possible way of visualizing and evaluating knowledge structures; a computerized calculation is thus possible for potential quantitative applications, e.g. R&D resource allocation, research performance evaluation, science maps, etc. To determine the degree of correlation among journal citation indices that reflect the average number of citations per article, the most recent journal ratings were downloaded from the websites publishing four journal citation indices: the Institute for Scientific Information's journal impact factor index, Eigenfactor's article influence index, SCImago's journal rank index and Scopus' trend line index. Correlations were determined for each pair of indices, using ratings from all journals that could be identified as having been rated on both indices. Correlations between the six possible pairings of the four indices were tested with Spearman's rho. Within each of the six possible pairings, the prevalence of identifiable errors was examined in a random selection of 10 journals and among the 10 most discordantly ranked journals on the two indices. The number of journals that could be matched within each pair of indices ranged from 1,857 to 6,508. Paired ratings for all journals showed strong to very strong correlations, with Spearman's rho values ranging from 0.61 to 0.89, all p < 0.001. 
Identifiable errors were more common among scores for journals that had very discordant ranks on a pair of indices. These four journal citation indices were significantly correlated, providing evidence of convergent validity (i.e. they reflect the same underlying construct of average citability per article in a journal). Discordance in the ranking of a journal on two indices was in some cases due to an error in one index. The measurement of textual patent similarities is crucial for important tasks in patent management, be it prior art analysis, infringement analysis, or patent mapping. In this paper the common theory of similarity measurement is applied to the field of patents, using solitary concepts as basic textual elements of patents. After unfolding the term 'similarity' on a content-oriented and a formal level and presenting a basic model of understanding, a segmented approach to the measurement of underlying variables, similarity coefficients, and the criteria-related profiles of their combinations is laid out. This leads to a guided way to the application of textual patent similarities, interesting both for theory and practice. We report here a simple method to identify 'emerging topics' in the life sciences. First, keywords were selected from MeSH terms in PubMed by filtering the terms based on their rate of increase in appearance; they were then sorted into groups dealing with the same topics by 'co-word' analysis. These topics were defined as 'emerging topics'. The survey of the emerging keywords with high rates of increase in appearance between 1972 and 2006 showed that emerging topics changed dramatically year by year, and that a major shift in topics occurred in the late 90s, from topics covering technical and conceptual aspects of molecular biology to more systematic '-omics'-related and nanoscience-related topics. We further investigated trends in emerging topics within various sub-fields of the life sciences. 
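The increment-rate filtering step for emerging keywords can be approximated as follows; the year-over-year ratio and the threshold used here are illustrative assumptions, not the authors' exact criterion.

```python
def emerging_keywords(counts_by_year, min_rate=2.0):
    """Flag keywords whose yearly appearance count grows by at least
    min_rate between consecutive years. counts_by_year maps
    keyword -> {year: count}. A simplified stand-in for the
    increment-rate filter; the threshold is illustrative."""
    emerging = set()
    for kw, series in counts_by_year.items():
        years = sorted(series)
        for a, b in zip(years, years[1:]):
            if series[a] > 0 and series[b] / series[a] >= min_rate:
                emerging.add(kw)
                break
    return emerging
```

The flagged keywords would then be grouped into topics by co-word analysis, i.e. by clustering keywords that co-occur in the same records; that second step is not shown here.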
In science, a relatively small pool of researchers garners a disproportionately large number of citations. Still, very little is known about the social characteristics of highly cited scientists. This is unfortunate, as these researchers wield a disproportionate impact on their fields, and the study of highly cited scientists can enhance our understanding of the conditions which foster highly cited work, the systematic social inequalities which exist in science, and scientific careers more generally. This study provides information on this understudied subject by examining the social characteristics and opinions of the 0.1% most cited environmental scientists and ecologists. Overall, the social characteristics of these researchers tend to reflect broader patterns of inequality in the global scientific community. However, while the social characteristics of these researchers mirror those of other scientific elites in important ways, they differ in others, revealing findings which are both novel and surprising, perhaps indicating multiple pathways to becoming highly cited. This article reports the results of a scientometric assessment of the Southern Africa Development Community countries. The National Science Indicators database of Thomson-Reuters and the online ISI Web of Knowledge are utilized in order to identify the number of publications of the 15 countries over a period of 15 years; the activity and relative impact indicators of 22 scientific disciplines for each country and their collaborative patterns. South Africa, with 19% of the region's population, is responsible for 60% of the regional GDP and 79% of the region's publications. All countries tend to have the same focus in their disciplinary priorities and underemphasize disciplines such as engineering, materials science and molecular biology. 
Concern is expressed that the current research infrastructures are inadequate to assist in reaching the objectives developed in the Regional Indicative Strategic Development Plan of the Community. Nanotechnology is an emerging field of science with the potential to generate new and enhance existing products and to transform the production process. US patent data is used to track the emergence of nanotechnologies since 1978. The nanotechnologies that have undergone the most development are identified using patent citation data, and cocitation patterns of patents are examined to define clusters of related nanotechnologies. The potential for economic impact of the emerging nanotechnologies is assessed using a generality index. Many studies have found that collaborative research is, in general, more highly cited than non-collaborative research. This paper describes an investigation into the extent to which the association between high citation and collaboration for Economics articles published in 2000 varies from region to region and depends on the choice of indicator of citation level. Using data from the Social Science Citation Index (SSCI) for 18 countries, 17 American states and four indicators of citation level, the citation levels of the collaborative articles are compared with those of the non-collaborative articles. The main findings are that: (a) for every country and every indicator the mean citation level of the collaborative articles was at least as high as that of the non-collaborative articles, but for five US states and at least one indicator the citation level of collaborative articles was lower than that of non-collaborative articles, and (b) the extent to which collaborative articles were more highly cited varied considerably from country to country, from state to state, and from indicator to indicator. 
This indicates the importance of using multiple indicators when investigating citation advantage, since the choice of indicator can change the results. An indicator called the performance index (p-index), which can effectively combine the size and quality of scientific output, mimicking what the h-index does, emerges from an energy-like term E = iC, where i is a measure of quality, expressed as the ratio of citations C to papers published P. In this paper, we demonstrate how this energy paradigm can be used for bibliometric research assessment. The energy assessment technique is demonstrated by applying it to the research assessment of all the countries listed in Essential Science Indicators. Partitioning is easily done by using contour lines on the two-dimensional iCE (impact-Citations-Energy) map. The file-drawer problem is the tendency of journals to preferentially publish studies with statistically significant results. The problem is an old one and has been documented in various fields, but to the best of my knowledge no attention has been paid to how the issue has been developing quantitatively over time. In the abstracts of various major scholarly databases (Science and Social Science Citation Index (1991-2008), CAB Abstracts and Medline (1970s-2008)), the file-drawer problem is gradually getting worse, in spite of an increase in (1) the total number of publications and (2) the proportion of publications reporting both the presence and the absence of significant differences. The trend is confirmed for particular natural science topics such as biology, energy and environment, but not for papers retrieved with the keywords biodiversity, chemistry, computer, engineering, genetics, psychology and quantum (physics). A worsening file-drawer problem can be detected in various medical fields (infection, immunology, malaria, obesity, oncology and pharmacology), but not for papers indexed with strings such as AIDS/HIV, epidemiology, health and neurology. 
An increase in the selective publication of some results over others is worrying because it can lead to enhanced bias in meta-analysis and hence to a distorted picture of the evidence for or against a certain hypothesis. Long-term monitoring of the file-drawer problem is needed to ensure a sustainable and reliable production of (peer-reviewed) scientific knowledge. This paper focuses attention on the ch-index, a recent bibliometric indicator similar to the Hirsch (h) index, for evaluating the published research output of a scientist (Ajiferuke and Wolfram, Proceedings of the 12th international conference of the international society for scientometrics and informetrics, Rio de Janeiro, pp. 798-808, 2009). The ch-index is defined as the number such that, for a general group of scientific publications, ch publications are cited by at least ch different citers while the other publications are cited by no more than ch different citers. The basic difference from the classical h is that, according to ch, the diffusion of an author's publications is evaluated on the basis of the number of different citing authors (or citers), rather than the number of received citations. The goal of this work is to discuss the pros and cons of ch and identify its connection with h. A large sample of scientists in the Quality Engineering/Management field is analyzed so as to investigate the novel indicator's characteristics. Then, the analysis is preliminarily extended to other scientific disciplines. The most important result is that ch is almost insensitive to self-citations and/or citations made by recurrent citers, and it can profitably be used to complement h. This paper focuses on methods to study patterns of collaboration in co-authorship networks at the mesoscopic level. 
We combine qualitative methods (participant interviews) with quantitative methods (network analysis) and demonstrate the application and value of our approach in a case study comparing three research fields in chemistry. A mesoscopic level of analysis means that, in addition to the basic analytic unit of the individual researcher as node in a co-author network, we base our analysis on the observed modular structure of co-author networks. We interpret the clustering of authors into groups as bibliometric footprints of the basic collective units of knowledge production in a research specialty. We find two types of coauthor-linking patterns between author clusters that we interpret as representing two different forms of cooperative behavior: transfer-type connections due to career migrations or one-off services rendered, and stronger, dedicated inter-group collaboration. Hence the generic coauthor network of a research specialty can be understood as the overlay of two distinct types of cooperative networks between groups of authors publishing in a research specialty. We show how our analytic approach exposes field-specific differences in the social organization of research. Recently there has been increasing interest in university rankings. Annual rankings of world universities are published by QS for the Times Higher Education Supplement, the Shanghai Jiao Tong University, the Higher Education and Accreditation Council of Taiwan, and, based on Web visibility, by the Cybermetrics Lab at CSIC. In this paper we compare the rankings using a set of similarity measures. For the rankings that have been published for a number of years we also examine longitudinal patterns. The rankings limited to European universities are compared to the ranking of the Centre for Science and Technology Studies at Leiden University. The findings show that there are reasonable similarities between the rankings, even though each applies a different methodology. 
The biggest differences are between the rankings provided by the QS-Times Higher Education Supplement and the Ranking Web of the CSIC Cybermetrics Lab. The highest similarities were observed between the Taiwanese and the Leiden rankings of European universities. Overall, the similarities increase when the comparison is limited to European universities. The most popular method for judging the impact of biomedical articles is citation count, the number of citations received. The most significant limitation of citation count is that it cannot evaluate articles at the time of publication, since citations accumulate over time. This work presents computer models that accurately predict the citation counts of biomedical publications within a deep horizon of 10 years using only predictive information available at publication time. Our experiments show that it is indeed feasible to accurately predict future citation counts with a mixture of content-based and bibliometric features using machine learning methods. The models pave the way for practical prediction of the long-term impact of publications, and their statistical analysis provides greater insight into citation behavior. Scientific production has been evaluated from very different perspectives, the best known of which are essentially based on the impact factors of the journals included in the Journal Citation Reports (JCR). This has been no impediment to the simultaneous issuing of warnings regarding the dangers of their indiscriminate use when making comparisons. This is because the biases incorporated in the elaboration of these impact factors produce significant distortions, which may invalidate the results obtained. 
Notable among such biases are those generated by the differences in the propensity to cite of the different areas, journals and/or authors, by variations in the period of materialisation of the impact, and by the varying presence of knowledge areas in the sample of reviews contained in the JCR. While the traditional evaluation method consists of standardisation by subject categories, recent studies have criticised this approach and offered new possibilities for making inter-area comparisons. In view of such developments, the present study proposes a novel approach to the measurement of scientific activity, in an attempt to lessen the aforementioned biases. This approach consists of combining the employment of a new impact factor, calculated for each journal, with the grouping of the institutions under evaluation into homogeneous groups. An empirical application is undertaken to evaluate the scientific production of Spanish public universities in the year 2000. This application considers both the articles published in the multidisciplinary databases of the Web of Science (WoS) and the data concerning the journals contained in the Sciences and Social Sciences Editions of the Journal Citation Reports (JCR). All this information is provided by the Institute for Scientific Information (ISI), via its Web of Knowledge (WoK). Several individual indicators from the Times Higher Education Survey (THES) database (the overall score, the reported staff-to-student ratio, and the peer ratings) demonstrate unacceptably high fluctuation from year to year. The inappropriateness of the summary tabulations for assessing the majority of the "top 200" universities would be apparent purely on the grounds of this obvious statistical instability, regardless of other grounds of criticism. There are far too many anomalies in the change scores of the various indices for them to be of use in the course of university management. 
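The year-to-year (in)stability of ranking indicators of the kind just discussed, and the index-pair comparisons reported earlier, are commonly quantified with a rank correlation. A minimal pure-Python Spearman's rho (assuming no tied values) on invented indicator scores might look like:

```python
# Minimal Spearman's rho between two years of indicator scores for the same
# set of institutions. Assumes no tied values; the scores are invented.

def spearman_rho(xs, ys):
    n = len(xs)
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

year1 = [92.0, 88.5, 75.2, 60.1, 55.3]  # overall scores, year t (hypothetical)
year2 = [85.0, 90.1, 62.0, 70.4, 58.8]  # same universities, year t+1
print(spearman_rho(year1, year2))  # prints 0.8: well below 1.0, i.e. rank churn
```

A rho near 1.0 between consecutive years would indicate a stable indicator; values well below 1.0 are the kind of fluctuation the THES critique points to. Production analyses would use a tie-aware implementation such as `scipy.stats.spearmanr`.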
A scheme for evaluating the impact of a given scientific paper based on the importance of the papers citing it is investigated. By introducing a weight for a given citation, dependent on the previous scientific achievements of the author of the citing paper, we define the weighting factor of a given scientist. Technically, the weighting factors are defined by the components of the normalized leading eigenvector of the matrix describing the citation graph. The weighting factor of a given scientist, reflecting the scientific output of other researchers quoting his work, allows us to define the weighted number of citations of a given paper, the weighted impact factor of a journal and the weighted Hirsch index of an individual scientist or of an entire scientific institution. Bibliometric measurements, though controversial, are useful in providing measures of research performance in a climate of research competition and marketisation. Numerous bibliometric studies have been performed which rely on traditional indices (such as the journal impact factor and citation index) and provide little descriptive data regarding the actual characteristics of research. The purpose of this study was two-fold: to develop three novel bibliometric indices designed to describe the characteristics of research (relating to evidence base, quantitation and collaboration), and to apply them in a cross-sectional audit of original research articles published in Australian professional association journals across medicine, nursing and allied health in 2007. Results revealed considerable variation in bibliometric indices across these journals. There were emerging clusters of journals that published collaborative research using higher levels of evidence and reporting quantitative data, with others featuring articles using lower levels of evidence, fewer quantitative data and less collaboration among authors. 
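The weighting factors in the eigenvector-based scheme above are the components of the normalized leading eigenvector of the citation matrix; a minimal power-iteration sketch on a tiny invented citation graph (the 3-author matrix and normalization choice are illustrative assumptions, not the paper's data):

```python
# Power iteration for the normalized leading eigenvector of a small
# citation matrix. A[i][j] = 1 means author j cites author i; the
# 3-author graph is invented for illustration.

def leading_eigenvector(A, iters=200):
    n = len(A)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        if s == 0:
            return v
        v = [x / s for x in w]  # normalize so the weights sum to 1
    return v

A = [[0, 1, 1],
     [1, 0, 1],
     [0, 1, 0]]
weights = leading_eigenvector(A)
print(weights)  # authors 0 and 1, cited by well-weighted peers, weigh most
```

As in PageRank-style measures, an author's weight is high when the authors citing them are themselves highly weighted, which is exactly the recursion the leading eigenvector solves.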
In this paper, scientific performance is identified with the impact that journal articles have through the citations they receive. In 15 disciplines, as well as in all sciences as a whole, the EU share of total publications is greater than that of the U.S. However, as soon as the citations received by these publications are taken into account the picture is completely reversed. Firstly, the EU share of total citations is still greater than that of the U.S. in only seven fields. Secondly, the mean citation rate in the U.S. is greater than in the EU in every one of the 22 fields studied. Thirdly, since standard indicators-such as normalized mean citation ratios-are silent about what takes place in different parts of the citation distribution, this paper compares the publication shares of the U.S. and the EU at every percentile of the world citation distribution in each field. It is found that in seven fields the initial gap between the U.S. and the EU widens as we advance towards the more cited articles, while in the remaining 15 fields-except for Agricultural Sciences-the U.S. always surpasses the EU when it counts, namely at the upper tail of the citation distributions. Finally, for all sciences as a whole the U.S. publication share becomes greater than that of the EU for the top 50% of the most highly cited articles. The data used refer to 3.6 million articles published in 1998-2002, and the more than 47 million citations they received in 1998-2007. New Scientist is a British weekly magazine that is half-way between a newspaper and a scientific journal. It has many news items, and also longer feature articles, both of which cite biomedical research papers, and thus serve to make them better known to the public and to the scientific community, mainly in the UK but about half overseas. An analysis of these research papers shows (in relation to their presence in the biomedical research literature) a strong bias towards the UK, and also towards the USA, Scandinavia and Ireland. 
There is a reasonable spread of subject areas, although neuroscience is favoured, and coverage of many journals, not just the leading weeklies. Most of the feature articles (but not the news items) in New Scientist include comments by other researchers, who can put the new results in context. Their opinions appear to be more discriminating than those of commentators on research in the mass media, who usually enthuse over the results while counselling patience before a cure for the disease is widely available. The study examines India's performance based on its publication output in dental sciences during 1999-2008, based on several parameters, including the country's annual average growth rate, global publication share & rank among the 25 most productive countries of the world, national publication output and impact in terms of average citations per paper, international collaboration output and share and the contribution of major collaborative partners, the contribution and impact of the top 25 Indian institutions and the top 15 most productive authors, patterns of communication in national and international journals, and the characteristics of its 45 highly cited papers. The study uses 10 years (1999-2008) of publication data in dental sciences for India and other countries drawn from the Scopus international multidisciplinary bibliographical database. This paper aims to analyse the collaboration network of the 6th Framework Programme of the EU, specifically the "Life sciences, genomics and biotechnology for health" thematic area. A collaboration network of 2,132 participant organizations was built and several variables were added to improve the visualization, such as type of organization and nationality. Several statistical tests and structural indicators were used to uncover the main characteristics of this collaboration network. 
Results show that the network is constituted by a dense core of government research organizations and universities which act as large hubs that attract new partners to the network, mainly companies and non-profit organizations. The web contains a huge number of digital pictures. For scholars publishing such images it is important to know how well used their images are, but no method seems to have been developed for monitoring the value of academic images. In particular, can the impact of scientific or artistic images be assessed by identifying images copied or reused on the Internet? This article explores a case study of 260 NASA images to investigate whether the TinEye search engine could theoretically help to provide this information. The results show that the selected pictures had a median of 11 online copies each. However, a classification of 210 of these copies reveals that only 1.4% were explicitly used in academic publications, reflecting research impact, and the majority of the NASA pictures were used for informal scholarly (or educational) communication (37%). Additional analyses of world-famous paintings and of scientific images about pathology and molecular structures suggest that image contents are important for the type and extent of image use. Although it is reasonable to use statistics derived from TinEye for assessing image reuse value, the extent of its image indexing is not known. Architectural design projects are heavily reliant on electronic information seeking. However, there have been few studies on how architects look for and use information on the Web. We examined the electronic information behavior of 9 postgraduate architectural design and urban design students. 
We observed them undertake a self-chosen, naturalistic information task related to one of their design projects and found that although the architectural students performed many interactive information behaviors similar to those of academics and practitioners in other disciplines, they also performed behaviors reflective of the nature of their domain. These included exploring and encountering information (in addition to searching and browsing for it) and visualizing/appropriating information. The observations also highlighted the importance of information use behaviors (such as editing and recording) and communication behaviors (such as sharing and distributing), as well as the importance of multimedia materials, particularly images, for architectural design projects. A key overarching theme was that inspiration was found to be both an important driver for and a potential outcome of information work in the architecture domain, suggesting the need to design electronic information tools for architects that encourage and foster creativity. We make suggestions for the design of such tools based on our findings. This study explores the relationships between work task and interactive information search behavior. Work task was conceptualized based on a faceted classification of task. An experiment was conducted with six work-task types and simulated work-task situations assigned to 24 participants. The results indicate that users present different behavior patterns when approaching useful information for different work tasks: they select which information systems to search based on the work tasks at hand, different work tasks motivate different types of search tasks, and the different facets controlled in the study play different roles in shaping users' interactive information search behavior. 
The results provide empirical evidence to support the view that work tasks and search tasks play different roles in a user's interaction with information systems and that work task should be considered as a multifaceted variable. The findings make it possible to predict a user's information search behavior from his or her work task, and vice versa. Thus, this study sheds light on task-based information seeking and search, and has implications for adaptive information retrieval (IR) and the personalization of IR. In the interest of standardization and quality assurance, it is desirable for authors and staff of access services to follow the American National Standards Institute (ANSI) guidelines in preparing abstracts. Using a statistical approach, an extraction system (the Abstraction Assistant) was developed to generate informative abstracts meeting the ANSI guidelines for structural content elements. The system's performance is evaluated by comparing the system-generated abstracts with the authors' original abstracts and with the manually enhanced system abstracts on three criteria: balance (satisfaction of the ANSI standards), fluency (text coherence), and understandability (clarity). The results suggest that it is possible to use the system output directly without manual modification, but there are issues that need to be addressed in further studies to make the system a better tool. The Eigenfactor (TM) Metrics provide an alternative way of evaluating scholarly journals, based on an iterative ranking procedure analogous to Google's PageRank algorithm. These metrics have recently been adopted by Thomson Reuters and are listed alongside the Impact Factor in the Journal Citation Reports. But do these metrics differ sufficiently to be a useful addition to the bibliometric toolbox? 
Davis (2008) has argued that they do not, based on his finding of a 0.95 correlation coefficient between Eigenfactor score and Total Citations for a sample of journals in the field of medicine. This conclusion is mistaken; in this article, we illustrate the basic statistical fallacy to which Davis succumbed. We provide a complete analysis of the 2006 Journal Citation Reports and demonstrate that there are statistically and economically significant differences between the information provided by the Eigenfactor Metrics and that provided by Impact Factor and Total Citations. New video technologies are emerging to facilitate collaboration in emergency healthcare. One such technology, 3D telepresence technology for medical consultation (3DMC), may provide richer visual information to support collaboration between medical professionals and, ideally, enhance patient care in real time. Today only an early prototype of 3DMC exists. To better understand 3DMC's potential for adoption and use in emergency healthcare before large amounts of development resources are invested, we conducted a visioning study. That is, we shared our vision of 3DMC with emergency room physicians, nurses, administrators, and information technology (IT) professionals working at large and small medical centers, and asked them to share their perspectives regarding 3DMC's potential benefits and disadvantages in emergency healthcare and its compatibility, or lack thereof, with their and their organization's current ways of working. We found that social and technical challenges can be identified regarding new innovations even before working prototypes are available. The compatibility of 3DMC with current ways of working was conceptualized by participants in terms of processes, relationships, and resources. 
Both common and unique perceptions regarding 3DMC emerged, illustrating the need for 3DMC, and other collaboration technologies, to support interwoven situational awareness across different technological frames. Network structure analysis plays an important role in characterizing complex systems. Unlike previous network centrality measures, this article proposes a topological centrality measure reflecting the topological positions of nodes and edges, as well as the influence between nodes and edges, in a general network. Experiments on different networks show the distinguishing features of topological centrality by comparison with degree centrality, closeness centrality, betweenness centrality, information centrality, and PageRank. The topological centrality measure is then applied to discover communities and to construct the backbone network. Its characteristics and significance are further shown in e-Science applications. This paper introduces a comprehensive system for classifying scholarly journals according to their degree of 'application orientation.' The method extends earlier models and journal classification systems that were designed to tackle the crude duality between 'basic research' and 'applied research.' This metrics-based system rests on a 'Knowledge Utilization Triangle' typology, which distinguishes three types of coexisting knowledge application domains: 'clinical,' 'industrial,' and 'civic.' The empirical data relate to the institutional origin of the authors who publish their research papers in the scientific journal literature. The case study applies indicators of 'clinical relevance' and 'industrial relevance' to 11,000 journals indexed by the Web of Science (WoS) database. The resulting multidimensional classification system of journals comprises six Journal Application Domain (JAD) categories. 
Macro-level trend analysis of the WoS-indexed research publication output by JAD category reveals redistributions within global science during the years 1999-2008, with a slight increase in output published in 'industrially relevant' journals. Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts of the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments on two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case to a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters. We present a novel approach to visually locate bodies of research within the sciences, both at each moment in time and dynamically. 
This article describes how this approach fits with other efforts to map scientific outputs locally and globally. We then show how these science overlay maps help with benchmarking, exploring collaborations, and tracking temporal changes, using examples of universities, corporations, funding agencies, and research topics. We address their conditions of application and discuss advantages, downsides, and limitations. Overlay maps especially help investigate the increasing number of scientific developments and organizations that do not fit within traditional disciplinary categories. We make these tools available online to enable researchers to explore the ongoing sociocognitive transformations of science and technology systems. In this work we present an automatic method for the extraction of time periods related to ontological concepts from the Web. The method consists of two parts: an Information Extraction phase and a Semantic Representation phase. In the Information Extraction phase, temporal information about events that are associated with the target instance is extracted from Web documents. The resulting distribution is normalized and a model is fit to it. This distribution is then converted into a Semantic Representation in the second phase. We present the method and describe experiments where time periods for four different types of concepts are extracted and converted to a time representation vocabulary based on the TIMEX2 annotation standard. This study investigated factors that motivate or impede faculty participation in self-archiving practices-the placement of research work in various open access (OA) venues, ranging from personal Web pages to OA archives. The author's research design involves triangulation of survey and interview data from 17 Carnegie doctorate universities with DSpace institutional repositories. 
The analysis of survey responses from 684 professors and 41 telephone interviews identified seven significant factors: (a) altruism-the idea of providing OA benefits for users; (b) perceived self-archiving culture; (c) copyright concerns; (d) technical skills; (e) age; (f) perception of no harmful impact of self-archiving on tenure and promotion; and (g) concerns about additional time and effort. The factors are listed in descending order of their effect size. Age, copyright concerns, and additional time and effort are negatively associated with self-archiving, whereas the remaining factors are positively related to it. Faculty are motivated by OA advantages to users, disciplinary norms, and no negative influence on academic reward. However, barriers to self-archiving (concerns about copyright, extra time and effort, technical ability, and age) imply that the provision of services to assist faculty with copyright management, and with technical and logistical issues, could encourage higher rates of self-archiving. In many languages abbreviations are very common and are widely used in both written and spoken language. However, they are not always explicitly defined and in many cases they are ambiguous. This research presents a process that attempts to solve the problem of abbreviation ambiguity using modern machine learning (ML) techniques. Various baseline features are explored, including context-related methods and statistical methods. The application domain is Jewish Law documents written in Hebrew and Aramaic, which are known to be rich in ambiguous abbreviations. Two research approaches were implemented and tested: general and individual. Our system applied four common ML methods to find a successful integration of the various baseline features. The best result was achieved by the SVM ML method with the individual approach, with 98.07% accuracy. 
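The context-feature idea behind the disambiguation system above can be illustrated with a much simpler stand-in: a bag-of-words overlap score instead of an SVM, on an invented English example. The abbreviation "dr", its two senses, and all training snippets below are hypothetical and do not come from the Hebrew/Aramaic corpus used in the study:

```python
# Toy context-based disambiguation: score each candidate expansion of an
# ambiguous abbreviation by context-word overlap with labeled examples.
# The real system used SVMs over richer statistical and contextual features.
from collections import Counter

training = {
    "doctor": ["the dr examined the patient in the clinic",
               "a dr prescribed the medicine"],
    "drive":  ["turn onto elm dr after the bridge",
               "the house on oak dr was sold"],
}

def profile(snippets):
    """Bag-of-words profile of the contexts seen with one sense."""
    c = Counter()
    for s in snippets:
        c.update(w for w in s.split() if w != "dr")
    return c

profiles = {sense: profile(s) for sense, s in training.items()}

def disambiguate(context):
    words = [w for w in context.split() if w != "dr"]
    return max(profiles, key=lambda sense: sum(profiles[sense][w] for w in words))

print(disambiguate("the dr saw a patient"))    # 'doctor'
print(disambiguate("parked on elm dr today"))  # 'drive'
```

An SVM would replace the overlap score with a learned decision boundary over such context features, which is what lets the reported system reach high accuracy.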
This article describes an approach based on structuration theory (Giddens, 1979, 1984; Orlikowski, 1992, 2000) and communities of practice (Wenger, 1998) that can be used to guide investigation into the dynamics of online question and answer (Q&A) communities. This approach is useful because most research on Q&A sites has focused attention on information retrieval, information-seeking behavior, and information intermediation and has assumed uncritically that the online Q&A community plays an important role in these domains of study. Assuming instead that research on online communities should take into account social, technical, and contextual factors (Kling, Rosenbaum, & Sawyer, 2005), the utility of this approach is demonstrated with an analysis of three online Q&A communities seen as communities of practice. This article makes a theoretical contribution to the study of online Q&A communities and, more generally, to the domain of social reference. This study presents a ranking of 182 academic journals in the field of artificial intelligence. For this, the revealed preference approach, also referred to as a citation impact method, was utilized to collect data from Google Scholar. This list was developed based on three relatively novel indices: h-index, g-index, and hc-index. These indices correlated almost perfectly with one another (ranging from 0.97 to 0.99), and they correlated strongly with Thomson's Journal Impact Factors (ranging from 0.64 to 0.69). It was concluded that journal longevity (years in print) is an important but not the only factor affecting an outlet's ranking position. Inclusion in Thomson's Journal Citation Reports is a must for a journal to be identified as a leading A+ or A level outlet. However, coverage by Thomson does not guarantee a high citation impact of an outlet. 
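The h- and g-indices used in the journal-ranking study above have simple operational definitions: h is the largest number of items with at least that many citations each, and g is the largest rank whose cumulative citations reach the square of the rank. A minimal sketch (the hc-index additionally down-weights citations by article age and is omitted here):

```python
def h_index(citations):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

def g_index(citations):
    """Largest g such that the top g items together have at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g
```

For citation counts [10, 8, 5, 4, 3] this yields h = 4 and g = 5; by construction g is never smaller than h, which is one reason the two indices correlate so strongly in rankings such as the one above.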
The presented list may be utilized by scholars who want to demonstrate their research output, by various academic committees, and by librarians and administrators who are not familiar with the AI research domain. (C) 2010 Elsevier Ltd. All rights reserved. This study attempts to model the growth behavior of the number of publications and patents of South Korea, Taiwan, Japan and Malaysia. Three competing growth functions, namely the simple logistic growth function, the bi-logistic growth function and the logistic function within a dynamic carrying capacity, were considered. The findings provide insight into the diffusion process of science and technology, often measured by the number of publications and patents, respectively. The function that provides the best fit to the observed data was opted for explaining the diffusion process. The function with the best fit is the bi-logistic growth function for the number of publications as well as the number of patents of South Korea and Taiwan, the logistic growth function within a dynamic carrying capacity (LGDCC) for the number of publications and the bi-logistic growth function for the number of patents of Japan, and the LGDCC for the number of publications and the simple logistic growth function for the number of patents of Malaysia. The results suggest a dynamic self-propagating growth for science and technology, and thereby a transition from science and technology-push to market-pull growth for South Korea and Taiwan. While a similar transition was observed for the technology of Japan, the growth in science had entered a maturity stage. On the other hand, the growth potential in science is dynamic for Malaysia, but its technological advancement is relatively lower and static compared to the other economies. (C) 2010 Elsevier Ltd. All rights reserved. We study the evolution of Slovenia's scientific collaboration network from 1960 to the present with a yearly resolution. 
For each year the network was constructed from publication records of Slovene scientists, whereby two scientists were connected if, up to and including the given year, they had coauthored at least one paper together. Starting with no more than 30 scientists with an average of 1.5 collaborators in the year 1960, the network to date consists of 7380 individuals that, on average, have 10.7 collaborators. We show that, in spite of the broad range of research fields covered, the networks form "small worlds" and that indeed the average path between any pair of scientists scales logarithmically with size after the largest component becomes large enough. Moreover, we show that the network growth is governed by near-linear preferential attachment, giving rise to a log-normal distribution of collaborators per author, and that the average starting year is roughly inversely proportional to the number of collaborators eventually acquired. Understandably, not all who became active early have by now gathered many collaborators. We also give results for the clustering coefficient and the diameter of the network over time, and compare our conclusions with those reported previously. (C) 2010 Elsevier Ltd. All rights reserved. Web hyperlink analysis has been a key topic of Webometric research. However, inlink data collection from commercial search engines has been limited to only one source in recent years, which is not a promising prospect for the future development of the field. We need to tap into other Web data sources and to develop new methods. Toward this end, we propose a new Webometrics concept that is based on words rather than inlinks on Webpages. We propose that word co-occurrences on Webpages can be a measure of the relatedness of organizations. Word co-occurrence data can be collected from both general search engines and blog search engines, which expands data sources greatly. 
The proposed concept is tested in a group of companies in the LTE and WiMax sectors of the telecommunications industry. Data on the co-occurrences of company names on Webpages were collected from Google and Google Blog. The co-occurrence matrices were analyzed using MDS. The resulting MDS maps were compared with industry reality and with the MDS maps from co-link analysis. Results show that Web co-word analysis could potentially be as useful as Web co-link analysis. Google Blog seems to be a better source than Google for co-word data collection. (C) 2010 Elsevier Ltd. All rights reserved. The term 'information ethics' (IE) is rapidly diversifying as new technologies enter the milieu and add to already existing 'entanglements'. Unsurprisingly, the term lacks a universally accepted definition, although there is some common ground as to its constitution. This paper explores the term using the most commonly co-occurring terms in IE literature as indexed in nine databases, namely the EBSCO-hosted Academic Search Premier (ASP); Communication and Mass Media Complete; ERIC; Library, Information Science and Technology Abstracts (LISTA); Newspaper Complete; Business Premier; and Master File Premier, and Wilson's Library Literature and Information Science (LLIS) Full Text. Core/periphery analysis, the co-occurrence of words as subject terms, and social network techniques were applied using UCINET for Windows, TextSTAT and Bibexcel software to analyze the data. The paper identifies the most common terms used to describe IE and the core terms with which IE can be defined. Other than informing LIS research and education, the results could potentially assist with the development of IE taxonomy and definitions (e.g., in understanding IE content and development) that may apply to the intercultural and global understanding of IE. (C) 2010 Elsevier Ltd. All rights reserved. 
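The co-word pipeline described earlier in this section (co-occurrence counts of company names, converted to a proximity matrix and embedded with MDS) can be sketched on toy data. The labels and counts below are invented for illustration, and classical (Torgerson) MDS stands in for whatever MDS software the cited studies used:

```python
import numpy as np

# Hypothetical co-occurrence counts of four company names on Webpages.
labels = ["A", "B", "C", "D"]
cooc = np.array([
    [0, 80, 10,  5],
    [80,  0, 12,  6],
    [10, 12,  0, 70],
    [ 5,  6, 70,  0],
], dtype=float)

# Turn counts into dissimilarities: frequent co-occurrence -> small distance.
dis = 1.0 / (1.0 + cooc)
np.fill_diagonal(dis, 0.0)

# Classical MDS: double-center the squared distance matrix, then eigendecompose.
n = len(dis)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (dis ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]          # two largest eigenvalues
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

for label, (x, y) in zip(labels, coords):
    print(f"{label}: ({x:+.3f}, {y:+.3f})")
```

On this toy matrix the frequently co-occurring pairs (A, B) and (C, D) land close together on the resulting map, which is the qualitative pattern the co-word studies look for.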
In a recent work by Anderson, Hankin, and Killworth (2008), Ferrers diagrams and Durfee squares are used to represent the scientific output of a scientist and construct a new h-based bibliometric indicator, the tapered h-index (h(T)). In the first part of this paper we examine h(T), identifying its main drawbacks and weaknesses: an arbitrary scoring system and an illusory increase in discrimination power compared to h. Subsequently, we propose a new bibliometric tool, the citation triad (CT), that better exploits the information contained in a Ferrers diagram, giving a synthetic overview of a scientist's publication output. The advantages of this new approach are discussed in detail. The argument is supported by several examples based on empirical data. (C) 2010 Elsevier Ltd. All rights reserved. The paper presents comparative analyses of two publication point systems, the Norwegian and the in-house system from the interdisciplinary Danish Institute of International Studies (DIIS), used as a case in the study for publications published in 2006, and compares central citation-based indicators with novel publication point indicators (PPIs) that are formalized and exemplified. Two diachronic citation windows are applied: 2006-07 and 2006-08. Web of Science (WoS) as well as Google Scholar (GS) are applied to observe the cite delay and citedness for the different document types published by DIIS: journal articles, book chapters/conference papers and monographs. Journal Crown Indicator (JCI) calculations were based on WoS. Three PPIs are proposed: the Publication Point Ratio (PPR), which measures the sum of obtained publication points over the sum of the ideal points for the same set of documents; the Cumulated Publication Point Indicator (CPPI), which graphically illustrates the cumulated gain of obtained vs. 
ideal points, both seen as vectors; and the normalized Cumulated Publication Point Index (nCPPI) that represents the cumulated gain of publication success as index values, either graphically or as one overall score for the institution under evaluation. The case study indicates that for smaller interdisciplinary research institutions the cite delay is substantial (2-3 years to obtain a citedness of 50%) when applying WoS for articles. Applying GS implies a shorter delay and much higher citedness for all document types. Statistically significant correlations were found only between WoS and GS, and between the two publication point systems, respectively. The study demonstrates how the nCPPI can be applied to institutions as an evaluation tool supplementary to JCI in various combinations, in particular when institutions include humanistic and social science disciplines. (C) 2010 Elsevier Ltd. All rights reserved. Let L0 be an initial Lorenz curve. In this paper we propose a general methodology for obtaining new classes of parametric Lorenz or Leimkuhler curves that contain the original curve as a limiting or special case. The new classes introduce additional parameters in the original family, providing more flexibility for the new families. The new classes are built from an ordered sequence of power Lorenz curves, assuming that the powers are distributed according to some convenient discrete random variable. Using this method we obtain many of the families proposed in the literature, including the classical proposals of Bradford (1934), Kakwani and Podder (1973) and others. We obtain some inequality measures and population functions for the proposed families. (C) 2010 Elsevier Ltd. All rights reserved. We study how scholar collaboration varies across disciplines in science, social science, arts and humanities and the effects of author collaboration on impact and quality of coauthored papers. 
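The power-mixture construction for Lorenz curves described above can be made concrete. Taking L0(p) = p^a and letting the power N follow a geometric distribution on {1, 2, ...} gives the mixture E[L0(p)^N] the closed form (1 - q)p^a / (1 - q p^a); the parameter values below (a = 2, q = 0.5) are illustrative only, not taken from the paper:

```python
# Sketch of the power-mixture construction: L(p) = E[L0(p)^N] with
# L0(p) = p**a and N geometric, P(N = k) = (1 - q) * q**(k - 1), k >= 1.
# Summing the geometric series yields the closed form used here.
def lorenz_mixture(p, a=2.0, q=0.5):
    base = p ** a
    return (1.0 - q) * base / (1.0 - q * base)

# Basic Lorenz-curve properties: L(0) = 0, L(1) = 1, and L lies on or
# below the diagonal on [0, 1].
xs = [i / 100 for i in range(101)]
ys = [lorenz_mixture(x) for x in xs]
assert ys[0] == 0.0 and abs(ys[-1] - 1.0) < 1e-12
assert all(y <= x + 1e-12 for x, y in zip(xs, ys))
```

The extra parameter q illustrates the added flexibility the paper describes: q = 0 collapses the family back to the original power curve L0, the limiting case.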
Impact is measured with the aid of citations collected by papers, while quality is determined by the judgements expressed by peer reviewers. To this end, we take advantage of the dataset provided by the first-ever national research assessment exercise of Italian universities, which involved 20 disciplinary areas, 102 research structures, 18,500 research products, and 6661 peer reviewers. Collaboration intensity varies markedly across disciplines: it is inescapable in most sciences and negligible in most humanities. We measured a general positive association between cardinality of the author set of a paper and citation count as well as peer quality of the contribution. The correlation is stronger when the affiliations of authors are heterogeneous. There exist, however, notable and interesting counter-examples. (C) 2010 Elsevier Ltd. All rights reserved. This paper studies the correlations between peer review and citation indicators when evaluating research quality in library and information science (LIS). Forty-two LIS experts provided judgments on a 5-point scale of the quality of research published by 101 scholars; the median rankings resulting from these judgments were then correlated with h-, g- and H-index values computed using three different sources of citation data: Web of Science (WoS), Scopus and Google Scholar (GS). 
The two variants of the basic h-index correlated more strongly with peer judgment than did the h-index itself; citation data from Scopus was more strongly correlated with the expert judgments than was data from GS, which in turn was more strongly correlated than data from WoS; correlations from a carefully cleaned version of GS data were little different from those obtained using swiftly gathered GS data; the indices from the citation databases resulted in broadly similar rankings of the LIS academics; GS disadvantaged researchers in bibliometrics compared to the other two citation databases, while WoS disadvantaged researchers in the more technical aspects of information retrieval; and experts from the UK and other European countries rated UK academics with higher scores than did experts from the USA. (C) 2010 Elsevier Ltd. All rights reserved. In this article a relational database schema for a bibliometric database is developed. After the introduction explaining the motivation to use relational databases in bibliometrics, an overview of the related literature is given. A review of typical bibliometric questions serves as an informal requirement analysis. The database schema is developed as an entity-relationship diagram using the structural information typically found in scientific articles. Several SQL queries for the tasks presented in the requirement analysis show the usefulness of the developed database schema. (C) 2010 Elsevier Ltd. All rights reserved. This paper introduces a new impact indicator for the research effort of a university, (n)h(3). The number of documents or the number of citations obtained by an institution are used frequently in international rankings of institutions. However, these are very dependent on size, and this is inducing mergers with the apparent sole goal of improving the research ranking. 
The alternative is to use the ratio of the two measures, the mean citation rate, which is size-independent but has been shown to fluctuate over time as a consequence of its dependence on a very small number of documents with extremely good citation performance. In the last few years, the popularity of the Hirsch index as an indicator of the research performance of individual researchers led to its application to journals and institutions. However, the original aim of this h index of giving a mixed measure of the number of documents published and their impact as measured by the citations collected over time is totally undesirable for institutions, as the overall size may be considered irrelevant for the impact evaluation of research. Furthermore, the h index when applied to institutions tends to retain a very small number of documents, making all other research production irrelevant for this indicator. The (n)h(3) index proposed here is designed to measure solely the impact of research in a way that is independent of the size of the institution and is made relatively stable by making a 20-year estimate of the citations of the documents produced in a single year. (C) 2010 Elsevier Ltd. All rights reserved. Recent studies have concluded that American contributions to science literature have been in relative decline, whereas contributions from other parts of the world such as the European Union and Asia have increased. Is the same true for the areas of bibliometrics, informetrics and scientometrics? This study investigates the growth and geographic distribution of metrics research for the period 1987-2008. Similar to studies of other disciplines or science in general, the findings reveal that the United States continues to dominate, but there has been a recent relative decline in North American contributions overall. European and Asian contributions have grown substantially. 
National and institutional collaborations that contribute to this growth do not necessarily follow close geographic proximity, although European nations have been more active with international collaborations overall, both within Europe and elsewhere. (C) 2010 Elsevier Ltd. All rights reserved. In this paper, a simple relation between the Leimkuhler curve and the mean residual life is established. The result is illustrated with several models commonly used in informetrics, such as the exponential, Pareto and lognormal. Finally, relationships with some other reliability concepts are also presented. (C) 2010 Elsevier Ltd. All rights reserved. The paper reviews the literature on disciplinary credit assignment practices, and presents the results of a longitudinal study of credit assignment practices in the fields of economics, high energy physics, and information science. The practice of alphabetization of authorship is demonstrated to vary significantly between the fields. A slight increase is found to have taken place in economics during the last 30 years (1978-2007). A substantial decrease is found to have taken place in information science during the same period. High energy physics is found to be characterised by a high and stable share of alphabetized multi-authorships during the investigated period (1990-2007). It is important to be aware of such disciplinary differences when conducting bibliometric analyses. (C) 2010 Elsevier Ltd. All rights reserved. This paper uses scale-independent indicators to explore the Chinese national and regional innovation systems during economic transition. Our perception of an innovation system is frequently informed by conventional indicators based on linear assumptions, while innovation systems may actually behave differently. Scale-independent indicators characterize non-linear properties of an innovation system. 
They can give decision makers deeper insight into the dynamics of innovation systems, and they may lead to more practical public policies [Katz, J. S. (2006). Indicators for complex innovation systems. Research Policy, 35, 893-909]. As reported for the European and Canadian innovation systems, the Chinese systems exhibited scaling correlations between GERD (Gross Domestic Expenditure on R&D) and GDP (Gross Domestic Product) over time and at points in time. The scaling factors of the correlations tell us that between 1995 and 2005 the Chinese GERD exhibited a strong nonlinear tendency to increase with GDP. Furthermore, they show that the GERD of the Western region is growing much slower than its GDP as compared with the Eastern and Central regions. This observation has policy implications, suggesting that further improvements need to be made to the research infrastructure and funding of the Western region. The GDP-POP (Population) scaling factor shows that the 'wealth intensity' or GDP per capita is increasing much faster than the exponential growth of the Chinese population. In contrast, the systemic GDP-POP scaling factor shows that regional development is non-linear. Finally, the paper-GDP and patent-GDP scaling factors tell us that the outputs of science and technology for China are growing faster than economic growth. The systemic paper-GDP and patent-GDP scaling factors show that the growth rates are uneven across the provinces. (C) 2010 Elsevier Ltd. All rights reserved. In the analysis of bibliometric networks, researchers often use mapping and clustering techniques in a combined fashion. Typically, however, mapping and clustering techniques that are used together rely on very different ideas and assumptions. We propose a unified approach to mapping and clustering of bibliometric networks. We show that the VOS mapping technique and a weighted and parameterized variant of modularity-based clustering can both be derived from the same underlying principle. 
We illustrate our proposed approach by producing a combined mapping and clustering of the most frequently cited publications that appeared in the field of information science in the period 1999-2008. (C) 2010 Elsevier Ltd. All rights reserved. The citation records of 26 physicists are analyzed in order to determine the modified g index g(m), which takes multiple coauthorship into account through fractionalized counting of the publications. The results are compared with the original g index as well as with the h index and the respective modified h index h(m). Although the correlations between these indices are relatively strong, the arrangement of the datasets is significantly different in detail depending on whether they are put into order according to the values of either the original or the modified indices. (C) 2010 Elsevier Ltd. All rights reserved. The Hirsch index h and the g index proposed by Egghe, as well as the f index and the t index proposed by Tol, are shown to be special cases of a family of Hirsch index variants, based on the generalized mean with exponent p. Inequalities between the different indices are derived from the generalized mean inequality. The graphical determination of the indices is shown for one example. (C) 2010 Elsevier Ltd. All rights reserved. The qualitative label 'international journal' is used widely, including in national research quality assessments. We determined the practicability of analysing internationality quantitatively using 39 conservation biology journals, providing a single numeric index (IIJ) based on 10 variables covering the countries represented in the journals' editorial boards, authors and authors citing the journals' papers. A numerical taxonomic analysis refined the interpretation, revealing six categories of journals reflecting distinct international emphases not apparent from simple inspection of the IIJs alone. 
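The unification of h, g, f and t noted above can be sketched directly: define the index with exponent p as the largest k such that the generalized (power) mean of the top k citation counts is at least k. Then p tending to minus infinity recovers h (minimum), p = -1 gives Tol's f (harmonic mean), p = 0 gives Tol's t (geometric mean), and p = 1 gives Egghe's g (arithmetic mean). A minimal sketch:

```python
import math

def generalized_mean(xs, p):
    """Power mean with exponent p; p -> -inf is the minimum, p = 0 the geometric mean."""
    if p == float("-inf"):
        return min(xs)
    if p == 0:
        if any(x <= 0 for x in xs):
            return 0.0
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    if p < 0 and any(x <= 0 for x in xs):
        return 0.0  # uncited papers pull negative-exponent means to zero
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

def index_p(citations, p):
    """Largest k such that the p-mean of the top k citation counts is >= k."""
    ranked = sorted(citations, reverse=True)
    best = 0
    for k in range(1, len(ranked) + 1):
        if generalized_mean(ranked[:k], p) >= k:
            best = k
    return best
```

For citation counts [10, 8, 5, 4, 3], p = -inf gives 4 (the h-index) and p = 1 gives 5 (the g-index); the generalized mean inequality implies the index is non-decreasing in p, which is exactly the source of the inequalities between the variants mentioned above.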
Categories correlated significantly with journals' citation impact (measured by the Hirsch index), with their rankings under the Australian Commonwealth's 'Excellence in Research for Australia' and with some countries of publication, but not with listing by ISI Web of Science. The assessments do not reflect on quality, but may aid editors planning distinctive journal profiles, or authors seeking appropriate outlets. This longitudinal survey of Swedish biomedical PhDs from 1991 to 2009 found a 2.5-fold increase in biomedical PhD graduates, especially women, and mainly non-MDs, while the number of MDs remained fairly constant. The proportion obtaining a biomedical PhD in Sweden in 2006 was two and a half times that in the USA relative to population and three and a half times relative to GDP, but similar to that of the Netherlands. Female non-MD but not female MD candidates were more likely than men to be examined by female examiners. Fewer of the non-MD than MD women continued to publish in English after their PhD. The median number of authors per paper in a thesis had increased by 1 (from 4 to 5) compared with 15-20 years ago. Swedish biomedical research was already well internationalized in 1991, when 38% of the external examiners came from abroad. This rose to 53% in 2003 but in 2009 had returned to 42%. The USA and UK were the most common countries, but Australia accounted for 2%. When assessed by connection with foreign research teams, Swedish researchers were also internationally well connected. Studies in other countries are needed to assess how generally applicable these findings are. Our findings suggest that the policy and management of Swedish scientific research systems need revision to harmonize with the national economic capacity. In this paper a new author productivity index is introduced, namely the golden productivity index. The proposed index measures the productivity of an individual researcher, evaluating the number of papers as well as the rank of co-authorship. 
It provides an efficient method to measure an author's contribution to article writing, compared to other common methods. It emphasizes the first author's contribution, reflecting the tradition that the rank of each author indicates the magnitude of his or her contribution to the article. A study is described of the rank/JIF (Journal Impact Factor) distributions in the high-coverage Scopus database, using recent data and a three-year citation window. It includes a comparison with an older study of the Journal Citation Report categories and indicators, and a determination of the factors most influencing the distributions. While all the specific subject areas fit a negative logarithmic law fairly well, those with a greater External JIF have distributions with a more sharply defined peak and a longer tail, something like an iceberg. No S-shaped distributions, such as predicted by Egghe, were found. A strong correlation was observed between the knowledge export and import ratios. Finally, data from both Scopus and ISI were used to characterize the rank/JIF distributions by subject area. The issue of primary interest to this study is the collaboration that has taken place in science and technology (S&T) research in China. According to our empirical evidence, the regions with higher relationship (network) capital enjoy higher knowledge productivity in terms of published articles. Our purpose in this paper is to investigate, using Stata, the relationship between regional publications and co-authorship in China over the period from 1998 to 2007. The main finding is that the greater the number of co-authored articles a region has, the greater its success in terms of the number of articles published. Indeed, both domestic and international co-authorship have had positive effects on published article levels in China. 
Citation analyses were performed for Australian social science journals to determine the differences between data drawn from Web of Science and Scopus. These data were compared with the tier rankings assigned by disciplinary groups to the journals for the purposes of a new research assessment model, Excellence in Research for Australia (ERA), due to be implemented in 2010. In addition, citation-based indicators, including an extended journal impact factor, the h-index, and a modified journal diffusion factor, were calculated to assess whether subsequent analyses influence the ranking of journals. The findings suggest that the Scopus database provides a higher number of citations for more of the journals. However, there appears to be very little association between the assigned tier ranking of journals and their rank derived from citations data. The implications for Australian social science researchers are discussed in relation to the use of citation analysis in the ERA. In this paper we deal with the fixed capital nature of the means of production and labour employed in research and development, which generate scientific and technological knowledge. We argue that these current R&D expenditures typically have the nature of fixed investments. We then present an empirical analysis which shows that expenditures on industrial R&D are more strongly linked to the formation of fixed capital than to the formation of capital in general. Applying this conclusion to the economics of research and innovation would make it possible to analyse investments in the production of scientific and technological knowledge with a higher degree of clarity and precision. In recent years a number of studies have focused on Argentina's 2001 economic crisis and its political, social, and institutional repercussions. 
To date, however, no studies have analyzed its effects upon the country's scientific system from a scientometric perspective, in terms of resources dedicated to scientific activity and the final output and impact. The present study does so by means of a set of scientometric indicators that reflect economic effort, human resources dedicated to research, publications, collaborative relations, and the international visibility of scientific contributions. A bibliometric analysis of Spanish cardiovascular research is presented. The study focuses on the productivity, visibility and citation impact in an international, notably European context. Special attention is given to international collaboration. The underlying bibliographic data are collected from Thomson Reuters's Web of Science on the basis of a 'hybrid' search strategy combining core journals, lexical terms and citation links especially developed for the field of cardiology. There is a rich literature on how science and technology are related to each other. Patent citation analysis is amongst the most frequently used tools to track the strength of these links. In this paper we explore the relationship between patent citations and citation impact in nanoscience. Our observations indicate that patent-cited papers perform better in terms of standard bibliometric indicators than comparable publications that are not linked to technology in this way. More specifically, we found that articles cited in patents are more likely to be cited also by other papers. The share of highly cited papers is the most striking result. Instead of the average of 4% of all papers, 13.8% of the papers cited once or twice in patents fall into this category and even 23.5% of the papers more frequently cited in patents receive citation rates far above the standard. Our analyses further demonstrate the presence and the relevance of bandwagon effects driving the development of science and technology. 
The purpose of this paper was to analyze the intellectual structure of biomedical informatics reflected in scholarly events such as conferences, workshops, symposia, and seminars. As analysis variables, 'call for paper topics', 'session titles' and author keywords from biomedical informatics-related scholarly events, and the MeSH descriptors were combined. As analysis cases, the titles and abstracts of 12,536 papers presented at five medical informatics (MI) and six bioinformatics (BI) global scale scholarly event series during the years 1999-2008 were collected. Then, n-gram terms (MI = 6,958; BI = 5,436) were extracted from the paper corpus and the term co-occurrence network was analyzed. One hundred important topics each for medical informatics and bioinformatics were identified through the hub-authority metric, and their usage contexts were compared with the k-nearest neighbor measure. To observe research trends, newly popular topics were identified in 2-year periods. In the past 10 years the most important topic in MI has been "decision support", while in BI it has been "gene expression". Though the two communities share several methodologies, according to our analysis, they do not use them in the same context. This evidence suggests that MI uses technologies for the improvement of productivity in clinical settings, while BI uses algorithms as its tools for scientific biological discovery. Though MI and BI are arguably separate research fields, their topics are increasingly intertwined, and the gap between the fields has blurred, forming a broad informatics, namely biomedical informatics. Using scholarly events as data sources for domain analysis is the closest way to approximate the forefront of biomedical informatics. Academic papers, like genes, code for ideas or technological innovations that structure and transform the scientific organism and consequently the society at large. 
Genes are subject to the process of natural selection which ensures that only the fittest survive and contribute to the phenotype of the organism. The process of selection of academic papers, however, is far from natural. Commercial for-profit publishing houses have taken control over the evaluation and access to scientific information with serious consequences for the dissemination and advancement of knowledge. Academic authors and librarians are reacting by developing an alternative publishing system based on free-access journals and self-archiving in institutional repositories and global disciplinary libraries. Despite the emergence of such trends, the journal monopoly, rather than the scientific community, is still in control of selecting papers and setting academic standards. Here we propose a dynamical and transparent peer review process, which we believe will accelerate the transition to a fully open and free-for-all science that will allow the natural selection of the fittest ideas. Recent research has shown that simple graphical representations of research performance can be obtained using two-dimensional maps based on impact (i) and citations (C). The product of impact and citations leads to an energy term (E). Indeed, using E as the third coordinate, three-dimensional landscape maps can be prepared. In this paper, instead of using the traditional impact factor and total citations received for journal evaluation, Article Influence(TM) and Eigenfactor(TM) are used as substitutes. Article Influence becomes a measure of quality (i.e. a proxy for impact factor) and Eigenfactor is a proxy for size/quantity (like citations) and taken together, the product is an energy-like term. This can be used to measure the influence/prestige of a journal. It is also possible to propose a p-factor (where p = E^(1/3)) as an alternative measure of the prestige or prominence of a journal which plays the equivalent role of the h-index. 
A collection of coauthored papers is the new norm for doctoral dissertations in the natural and biomedical sciences, yet there is no consensus on how to partition authorship credit between PhD candidates and their coauthors. Guidelines for PhD programs vary but tend to specify only a suggested range for the number of papers to be submitted for evaluation, sometimes supplemented with a requirement for the PhD candidate to be the principal author on the majority of submitted papers. Here I use harmonic counting to quantify the actual amount of authorship credit attributable to individual PhD graduates from two Scandinavian universities in 2008. Harmonic counting corrects for the inherent inflationary and equalizing biases of routine counting methods, thereby allowing the bibliometrically identifiable amount of authorship credit in approved dissertations to be analyzed with unprecedented accuracy. Unbiased partitioning of authorship credit between graduates and their coauthors provides a post hoc bibliometric measure of current PhD requirements, and sets a de facto baseline for the requisite scientific productivity of these contemporary PhDs at a median value of approximately 1.6 undivided papers per dissertation. Comparison with previous census data suggests that the baseline has shifted over the past two decades as a result of a decrease in the number of submitted papers per candidate and an increase in the number of coauthors per paper. A simple solution to this shifting baseline syndrome would be to benchmark the amount of unbiased authorship credit deemed necessary for successful completion of a specific PhD program, and then monitor for departures from this level over time. Harmonic partitioning of authorship credit also facilitates cross-disciplinary and inter-institutional analysis of the scientific output from different PhD programs. 
Juxtaposing bibliometric benchmarks with current baselines may thus assist the development of harmonized guidelines and transparent transnational quality assurance procedures for doctoral programs by providing a robust and meaningful standard for further exploration of the causes of intra- and inter-institutional variation in the amount of unbiased authorship credit per dissertation. This study developed a multilevel model of academic publishing and tested the effects of several predictors on faculty publishing. In particular, the analysis paid special attention to faculty preference, time on research, research collaboration, and faculty discipline. The data used for this study come from the Changing Academic Professions (CAP) survey, the follow-up to the 1992 Carnegie Foundation study. The study found that faculty preference for research affects research publishing. In addition, faculty collaboration with international peers is a critical factor in academic publishing. While time spent on research is related to publishing, time spent on teaching does not have a conflicting effect on faculty research. In the institution-level analysis, institutional goal-orientation and institutional mission were found to have effects on academic publishing. However, the principal determinants of academic publishing were found to lie at the individual faculty member level. For each of these findings, there are subtle differences by academic discipline. This article examines the development of social science literature focused on the emerging area of nanotechnology. It is guided by the exploratory proposition that early social science work on emerging technologies will draw on science and engineering literature on the technology in question to frame its investigative activities, but as the technologies and societal investments in them progress, social scientists will increasingly develop and draw on their own body of literature. 
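The harmonic counting discussed above can be made concrete. A minimal sketch, assuming the standard harmonic formula (the i-th of N authors receives (1/i) normalized by the sum 1 + 1/2 + ... + 1/N); the example dissertation data are fabricated.

```python
# Harmonic partitioning of authorship credit (standard harmonic formula
# assumed; the cited study may use a variant). Fabricated example data.

def harmonic_credit(rank: int, n_authors: int) -> float:
    """Harmonic share of the author at byline position `rank` on an n-author paper."""
    norm = sum(1.0 / k for k in range(1, n_authors + 1))
    return (1.0 / rank) / norm

def candidate_credit(author_ranks_per_paper):
    """Total credit across papers, given (byline rank, author count) per paper."""
    return sum(harmonic_credit(r, n) for r, n in author_ranks_per_paper)

# Hypothetical dissertation of three papers: first of 3 authors,
# first of 2 authors, second of 4 authors.
credit = candidate_credit([(1, 3), (1, 2), (2, 4)])
```

By construction the shares of all authors on a single paper sum to one, which is what makes the count "unbiased" in the sense used above.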
To address this proposition the authors create a database of nanotechnology-social science literature by merging articles from the Web of Science's Social Science Citation Index and Arts and Humanities Citation Index with articles from Scopus. The resulting database comprises 308 records. The findings suggest that there are multiple dimensions of cited literature and that social science citations of other social scientists' works have increased since 2005. Assessing the quality of the knowledge produced by business and management academics is increasingly being metricated. Moreover, emphasis is being placed on the impact of the research rather than simply where it is published. The main metric for impact is the number of citations a paper receives. Traditionally this data has come from the ISI Web of Science, but research has shown that this has poor coverage in the social sciences. A newer and different source for citations is Google Scholar. In this paper we compare the two on a dataset of over 4,600 publications from three UK Business Schools. The results show that Web of Science is indeed poor in the area of management and that Google Scholar, whilst somewhat unreliable, has a much better coverage. The conclusion is that Web of Science should not be used for measuring research impact in management. Concept-based information retrieval and knowledge representation are in need of a theory of concepts and semantic relations. Guidelines for the construction and maintenance of knowledge organization systems (KOS) (such as ANSI/NISO Z39.19-2005 in the U.S.A. or DIN 2331:1980 in Germany) do not consider results of concept theory and theory of relations to the full extent. They are not able to unify the currently different worlds of traditional controlled vocabularies, of the social web (tagging and folksonomies) and of the semantic web (ontologies). 
Concept definitions as well as semantic relations are based on epistemological theories (empiricism, rationalism, hermeneutics, pragmatism, and critical theory). A concept is determined via its intension and extension as well as by definition. We will meet the problem of vagueness by introducing prototypes. Some important definitions are concept explanations (after Aristotle) and the definition of family resemblances (in the sense of Wittgenstein). We will model concepts as frames (according to Barsalou). The most important paradigmatic relation in KOS is hierarchy, which must be arranged into different classes: Hyponymy consists of taxonomy and simple hyponymy, meronymy consists of many different part-whole-relations. For practical application purposes, the transitivity of the given relation is very important. Unspecific associative relations are of little help to our focused applications and should be replaced by generalizable and domain-specific relations. We will discuss the reflexivity, symmetry, and transitivity of paradigmatic relations as well as the appearance of specific semantic relations in the different kinds of KOS (folksonomies, nomenclatures, classification systems, thesauri, and ontologies). Finally, we will pick out KOS as a central theme of the Semantic Web. Support for explicit collaboration in information-seeking activities is increasingly recognized as a desideratum for search systems. Several tools have emerged recently that help groups of people with the same information-seeking goals to work together. Many issues for these collaborative information-seeking (CIS) environments remain understudied. The authors identified awareness as one of these issues in CIS, and they presented a user study that involved 42 pairs of participants, who worked in collaboration over 2 sessions with 3 instances of the authors' CIS system for exploratory search. 
They showed that while having awareness of personal actions and history is important for exploratory search tasks spanning multiple sessions, support for group awareness is even more significant for effective collaboration. In addition, they showed that support for such group awareness can be provided without compromising usability or introducing additional load on the users. Many studies have demonstrated that people engage in a variety of different information behaviors when engaging in information seeking. However, standard information retrieval systems such as Web search engines continue to be designed to support mainly one such behavior, specified searching. This situation has led to suggestions that people would be better served by information retrieval systems which support different kinds of information-seeking strategies. This article reports on an experiment comparing the retrieval effectiveness of an integrated interactive information retrieval (IIR) system which adapts to support different information-seeking strategies with that of a standard baseline IIR system. The experiment, with 32 participants each searching on eight different topics, indicates that using the integrated IIR system resulted in significantly better user satisfaction with search results, significantly more effective interaction, and significantly better usability than using the baseline system. This research examined college students' image searching processes on the Web. The study's objective was to collect empirical data on students' search needs and identify what contextual factors had a significant influence on their image searching tactics. While confirming common search behaviors such as Google-dominant use, short queries, rare use of advanced search options, and checking few search result pages, the findings also revealed a significantly different effect of contextual factors on the tactics of querying and navigating, performance, and relevance judgment. 
In particular, interaction activities were differentiated by task goals, level of searching expertise, and work task stages. The results suggested that context-sensitive services and interface features would better suit Web users' actual needs and enhance their searching experience. Science Data Repositories (SDRs) have been recognized as both critical to science, and undergoing a fundamental change. A websample study was conducted of 100 SDRs. Information on the websites and from administrators of the SDRs was reviewed to determine salient characteristics of the SDRs, which were used to classify SDRs into groups using a combination of cluster analysis and logistic regression. Characteristics of the SDRs were explored for their role in determining groupings and for their relationship to the success of SDRs. Four of these characteristics were identified as important for further investigation: whether the SDR was supported with grants and contracts, whether support comes from multiple sponsors, what the holding size of the SDR is and whether a preservation policy exists for the SDR. An inferential framework for understanding SDR composition, guided by observations, characteristic collection and refinement and subsequent analysis on elements of group membership, is discussed. The development of SDRs is further examined from a business standpoint, and in comparison to its most similar form, institutional repositories. Because this work identifies important characteristics of SDRs and which characteristics potentially impact the sustainability and success of SDRs, it is expected to be helpful to SDRs. The aim of this study was to measure the efficiency of the system by which scientists worldwide communicate results to each other, providing one measure of the degree to which the system, including all media, functions well. A randomly selected and representative sample of 246 active research scientists worldwide was surveyed. 
The main measure was the reported rate of "late finds": scientific literature that would have been useful to scientists' projects if it had been found at the beginning of these projects. The main result was that 46% of the sample reported late finds (+/-6.25%, p < 0.05). Among respondents from European Union countries or other countries classified as "high income" by the World Bank, 42% reported late finds. Among respondents from low-and middle-income countries, 56% reported late finds. The 42% rate in high-income countries in 2009 can be compared with results of earlier surveys by Martyn (1964a,b, 1987). These earlier surveys found a rate of 22% late finds in 1963-1964 and a rate of 27% in 1985-1986. Respondents were also queried about search habits, but this study failed to support any explanations for this increase in the rate of late finds. This study also permits a crude estimate of the cost in time and money of the increase in late finds. In this study, reference standards and reference multipliers are suggested as a means to compare the citation impact of earlier research publications in physics (from the period of "Little Science" in the early 20th century) with that of contemporary papers (from the period of "Big Science," beginning around 1960). For the development of time-specific reference standards, the authors determined (a) the mean citation rates of papers in selected physics journals as well as (b) the mean citation rates of all papers in physics published in 1900 (Little Science) and in 2000 (Big Science); this was accomplished by relying on the processes of field-specific standardization in bibliometry. For the sake of developing reference multipliers with which the citation impact of earlier papers can be adjusted to the citation impact of contemporary papers, they combined the reference standards calculated for 1900 and 2000 into their ratio. 
The use of reference multipliers is demonstrated by means of two examples involving the time-adjusted h index values for Max Planck and Albert Einstein. Hirsch's h index is becoming the standard measure of an individual's research accomplishments. The aggregation of individuals' measures is also the basis for global measures at institutional or national levels. To investigate whether the h index can be reliably computed through alternative sources of citation records, the Web of Science (WoS), PsycINFO and Google Scholar (GS) were used to collect citation records for known publications of four Spanish psychologists. Compared with WoS, PsycINFO included a larger percentage of publication records, whereas GS outperformed WoS and PsycINFO in this respect. Compared with WoS, PsycINFO retrieved a larger number of citations in unique areas of psychology, but it retrieved a smaller number of citations in areas that are close to statistics or the neurosciences, whereas GS retrieved the largest numbers of citations in all cases. Incorrect citations were scarce in WoS (0.3%), more prevalent in PsycINFO (1.1%), and overwhelming in GS (16.5%). All platforms retrieved unique citations, the largest set coming from GS. WoS and PsycINFO cover distinct areas of psychology unevenly, thus applying different penalties on the h index of researchers working in different fields. Obtaining fair and accurate h indices required the union of citations retrieved by all three platforms. Author research impact was examined based on citer analysis (the number of citers as opposed to the number of citations) for 90 highly cited authors grouped into three broad subject areas. Citer-based outcome measures were also compared with more traditional citation-based measures for levels of association. 
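Hirsch's h index, central to the study above, is computed directly from a list of per-paper citation counts: the largest h such that h papers each have at least h citations. A minimal sketch with fabricated counts:

```python
# Hirsch's h index from per-paper citation counts. The counts below are
# fabricated; in the study, counts would come from WoS, PsycINFO, or GS
# (or their union).

def h_index(citations):
    """Largest h such that h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```

Merging per-paper counts from several platforms before calling `h_index` models the "union of citations" the authors found necessary for a fair index.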
The authors found that there are significant differences in citer-based outcomes among the three broad subject areas examined and that there is a high degree of correlation between citer and citation-based measures for all measures compared, except for two outcomes calculated for the social sciences. Citer-based measures do produce slightly different rankings of authors based on citer counts when compared to more traditional citation counts. Examples are provided. Citation measures may not adequately address the influence, or reach, of an author because citations usually do not address the origin of the citation beyond self-citations. This article reports a cross-cultural analysis of four Wikipedias in different languages and demonstrates their roles as communities of practice (CoPs). Prior research on CoPs and on the Wikipedia community often lacks cross-cultural analysis. Despite the fact that over 75% of Wikipedia is written in languages other than English, research on Wikipedia primarily focuses on the English Wikipedia and tends to overlook Wikipedias in other languages. This article first argues that Wikipedia communities can be analyzed and understood as CoPs. Second, norms of behaviors are examined in four Wikipedia languages (English, Hebrew, Japanese, and Malay), and the similarities and differences across these four languages are reported. Specifically, typical behaviors on three types of discussion spaces (talk, user talk, and Wikipedia talk) are identified and examined across languages. Hofstede's dimensions of cultural diversity as well as the size of the community and the function of each discussion area provide lenses for understanding the similarities and differences. As such, this article expands the research on online CoPs through an examination of cultural variations across multiple CoPs and increases our understanding of Wikipedia communities in various languages. 
This study explored the feasibility of using Web hyperlink data to study European political Web sites. Ninety-six European Union (EU) political parties belonging to a wide range of ideological, historical, and linguistic backgrounds were included in the study. Various types of data on Web links to party Web sites were collected. The Web colink data were visualized using multidimensional scaling (MDS), while the inlink data were analyzed with a 2-way analysis of variance test. The results showed that Web hyperlink data did reflect some political patterns in the EU. The MDS maps showed clusters of political parties along ideological, historical, linguistic, and social lines. Statistical analysis based on inlink counts further confirmed that there was a significant difference along the line of the political history of a country, such that left-wing parties in the former communist countries received considerably fewer inlinks to their Web sites than left-wing parties in countries without a history of communism did. The study demonstrated the possibility of using Web hyperlink data to gain insights into political situations in the EU. This suggests the richness of Web hyperlink data and its potential in studying social-political phenomena. This research empirically evaluated the use of mobile information and communication technology in a large-sized undergraduate class, where the effectiveness of multilearner participation and prompt learner-instructor interaction is often challenged. The authors analyzed the effectiveness of a so-called "lean" communication medium using hand-held mobile devices, whose brief text-based messages considerably limit the speed of information exchange. Adopting a social construction perspective of media richness theory and a reinforced approach to learning and practice, the authors conjectured that an interactive learning system built with wireless PDA devices can enhance individual practices and reinforce peer influences. 
Consequently, they expected better understanding and higher satisfaction among learners. A field experiment with 118 participants in the treatment and 114 participants in the control group supported their hypotheses. Their results suggested that richness of a "lean" medium could be increased in certain socially constructed conditions, thus extending existing notions of computer-aided instruction towards a techno-social learning model. The communication of meaning as distinct from (Shannon-type) information is central to Luhmann's social systems theory and Giddens' structuration theory of action. These theories share an emphasis on reflexivity, but focus on meaning along a divide between interhuman communication and intentful action as two different systems of reference. By recombining these two theories into a theory about the structuration of expectations, the interactions, organization, and self-organization of intentional communications can be simulated based on algorithms from the computation of anticipatory systems. The self-organizing and organizing layers remain rooted in the double contingency of the human encounter, which provides the variation. Organization and self-organization of communication are reflexive upon and therefore reconstructive of each other. Using mutual information in three dimensions, the imprint of meaning processing in the modeling system on the historical organization of uncertainty in the modeled system can be measured. This is shown empirically in the case of intellectual organization as "structurating" structure in the textual domain of scientific articles. Similarity measures, such as the ones of Jaccard, Dice, or Cosine, measure the similarity between two vectors. A good property for similarity measures would be that, if we add a constant vector to both vectors, then the similarity must increase. We show that Dice and Jaccard satisfy this property while Cosine and both overlap measures do not. 
Adding a constant vector is called, in Lorenz concentration theory, "nominal increase" and we show that the stronger "transfer principle" is not a required good property for similarity measures. Another good property is that, given two vectors, if we add one of these vectors to both, then the similarity must increase. Now Dice, Jaccard, Cosine, and one of the overlap measures satisfy this property, while the other overlap measure does not. Also a variant of this latter property is studied. Although there is considerable consensus that Finance, Management and Marketing are 'science', some debate remains with regard to whether these three areas comprise autonomous, organized and settled scientific fields of research. In this paper we aim to explore this issue by analyzing the occurrence of citations in the top-ranked journals in the areas of Finance, Management, and Marketing. We put forward a modified version of the model of science as a network, proposed by Klamer and Van Dalen (J Econ Methodol 9(2):289-315, 2002), and conclude that Finance is a 'relatively autonomous, organized and settled field of research', whereas Management and (to a larger extent) Marketing are 'relatively non-autonomous and hybrid fields of research'. Complementary analysis based on sub-discipline rankings using the recursive methodology of Liebowitz and Palmer (J Econ Lit 22:77-88, 1984) confirms the results. In the conclusions we briefly discuss the pertinence of Whitley's (The intellectual and social organization of the sciences, 1984) theory for explaining cultural differences across these sub-disciplines based on its dimensions of scholarly practices, 'mutual dependency' and 'task uncertainty'. Citations in five leading environmental science journals were examined for accuracy. Of the 2,650 citations checked, 24.41% were found to contain errors. The largest category of errors was in the author field. 
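The "nominal increase" property of the similarity measures discussed above can be checked numerically. A minimal sketch using the standard real-valued (inner-product) forms of Jaccard, Dice, and Cosine; the vectors and the constant are fabricated for illustration.

```python
import math

# Real-valued Jaccard (Tanimoto), Dice, and Cosine similarities, used to
# check that adding a constant vector to both inputs does not decrease
# Jaccard or Dice similarity (the property discussed above).

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def jaccard(x, y):
    return dot(x, y) / (dot(x, x) + dot(y, y) - dot(x, y))

def dice(x, y):
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

x, y = [3.0, 0.0, 1.0], [0.0, 2.0, 1.0]
c = 1.0  # add the constant vector (c, c, c) to both
xc = [v + c for v in x]
yc = [v + c for v in y]
assert jaccard(xc, yc) >= jaccard(x, y)
assert dice(xc, yc) >= dice(x, y)
```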
Of the five journals Conservation Biology had the lowest percentage of citations with errors and Climatic Change had the highest. Of the citations with errors that could be checked in Web of Science, 18.18% of the errors caused a search for the cited article to fail. Citations containing electronic links had fewer errors than those without. This paper presents a methodology to aggregate multidimensional research output. Using a tailored version of the non-parametric Data Envelopment Analysis model, we account for the large heterogeneity in research output and the individual researcher preferences by endogenously weighting the various output dimensions. The approach offers three important advantages compared to the traditional approaches: (1) flexibility in the aggregation of different research outputs into an overall evaluation score; (2) a reduction of the impact of measurement errors and atypical observations; and (3) a correction for the influences of a wide variety of factors outside the evaluated researcher's control. As a result, research evaluations are more effective representations of actual research performance. The methodology is illustrated on a data set of all faculty members at a large polytechnic university in Belgium. The sample includes questionnaire items on the motivation and perception of the researcher. This allows us to explore whether motivation and background characteristics (such as age, gender, retention, etc.) of the researchers explain variations in measured research performance. This paper investigates the extent to which staff editors' evaluations of submitted manuscripts-that is, internal evaluations carried out before external peer reviewing-are valid. To answer this question we utilized data on the manuscript reviewing process at the journal Angewandte Chemie International Edition. The results of this study indicate that the initial internal evaluations are valid. 
Further, it appears that external review is indispensable for the decision on the publication worthiness of manuscripts: (1) For the majority of submitted manuscripts, staff editors are uncertain about publication worthiness; (2) there is a statistically significant proportional difference in "Rejection" between the editors' initial evaluation and the final editorial decision (after peer review); (3) three-quarters of the manuscripts that were rated negatively at the initial internal evaluation but accepted for publication after the peer review had far above-average citation counts. This study proposes an empirical way of determining the probability of network tie formation between network actors. In social network analysis, a common problem is that the information needed to determine whether or not a network tie should be formed is missing for some network actors, and thus the network can only be partially constructed. The methodology proposed in this study calculates network actors' similarities with the Vector-Space Model to estimate how likely network ties are to form. A threshold value of similarity for deciding whether or not a network tie should be generated is also suggested. Four previously constructed ontology-based knowledge networks, with journal papers or research projects as network actors, are selected as the targets of this empirical study: (1) Technology Foresight Paper Network: 181 papers and 547 keywords, (2) Regional Innovation System Paper Network: 431 papers and 1165 keywords, (3) Global Sci-Tech Policy Paper Network: 548 papers and 1705 keywords, (4) Taiwan's Sci-Tech Policy Project Network: 143 research projects and 213 keywords. The four empirical investigations allow a cut-off threshold value, calculated with the Vector-Space Model, to be suggested for deciding the formation of network ties when network linkage information is unavailable. 
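The tie-formation idea above can be sketched concretely: represent each actor (paper or project) as a keyword vector, compute Vector-Space-Model (cosine) similarity, and form a tie when similarity reaches a cut-off threshold. The keyword lists and the 0.2 threshold below are illustrative assumptions, not values from the study.

```python
import math

# Vector-Space-Model similarity between keyword profiles, with a cut-off
# threshold for tie formation. Keywords and threshold are fabricated.

def keyword_vector(keywords, vocabulary):
    """Term-frequency vector of a keyword list over a shared vocabulary."""
    return [keywords.count(term) for term in vocabulary]

def cosine(x, y):
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0

def tie_exists(kw_a, kw_b, threshold=0.2):
    """Form a network tie when cosine similarity reaches the threshold."""
    vocab = sorted(set(kw_a) | set(kw_b))
    return cosine(keyword_vector(kw_a, vocab), keyword_vector(kw_b, vocab)) >= threshold

paper_a = ["foresight", "policy", "innovation"]
paper_b = ["policy", "innovation", "region"]
print(tie_exists(paper_a, paper_b))  # similarity 2/3 >= 0.2 -> True
```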
In national research assessment exercises that take the peer review approach, research organizations are evaluated on the basis of a subset of their scientific production. The dimension of the subset varies from nation to nation but is typically set as a proportional function of the number of researchers employed at each research organization. However, scientific fertility varies from discipline to discipline, meaning that the representativeness of such a subset also varies according to discipline. The rankings resulting from the assessments could be quite sensitive to the size of the share of articles selected for evaluation. The current work examines this issue, developing empirical evidence of variations in ranking due to changes in the dimension of the subset of products evaluated. The field of observation is represented by the scientific production from the hard sciences of the entire Italian university system, from 2001 to 2003. This paper quantitatively explores the social and socio-semantic patterns of constitution of academic collaboration teams. To this end, we broadly underline two critical features of social networks of knowledge-based collaboration: first, they essentially consist of group-level interactions which call for team-centered approaches. Formally, this induces the use of hypergraphs and n-adic interactions, rather than traditional dyadic frameworks of interaction such as graphs, binding only pairs of agents. Second, we advocate the joint consideration of structural and semantic features, as collaborations are allegedly constrained by both of them. Considering these provisions, we propose a framework which principally enables us to empirically test a series of hypotheses related to academic team formation patterns. In particular, we exhibit and characterize the influence of an implicit group structure driving recurrent team formation processes. 
On the whole, innovative production does not appear to be correlated with more original teams, while a polarization appears between groups composed of experts only or non-experts only, altogether corresponding to collectives with a high rate of repeated interactions. I propose the index h̄ ("hbar"), defined as the number of papers of an individual that have a citation count larger than or equal to the h̄ of all coauthors of each paper, as a useful index to characterize the scientific output of a researcher that takes into account the effect of multiple authorship. The bar is higher for h̄. Over the past 30 years, the research behavior of Chinese scholars has continually evolved. This paper studied the citing behavior of Chinese scholars by employing three indicators of citation concentration from the perspective of citation breadth analysis. All the citations from 2,338,033 papers from the Chinese Citation Database (1979-2008) covering four disciplines (Chemistry; Clinical Medicine; Library, Information and Archival Science; and Chinese Literature and World Literature) were analyzed. Empirical results show a general weakening tendency towards citation concentration: (1) a decreasing percentage of uncited published papers within a given year; (2) a higher percentage of papers required to account for the same proportion of citations than before; and (3) the steady decline in the Herfindahl-Hirschman index (HHI) of citation distribution. All three measures indicate a decline in citing concentration or an increase in citation breadth. This phenomenon may be the result of increased access to materials, perhaps because of the ease with which scholarly materials can be accessed through the Internet. Supporting and advancing women's science careers continues to be of interest to researchers, scientists, science funders, and universities. 
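The Herfindahl-Hirschman index used above as a concentration indicator is the sum of squared shares of citations received by each paper: lower HHI means citations are spread more evenly (greater breadth). A minimal sketch with fabricated citation counts:

```python
# Herfindahl-Hirschman index of a citation distribution. The citation
# counts below are fabricated for illustration.

def hhi(citation_counts):
    """Sum of squared citation shares; ranges from 1/n (even) to 1 (all in one paper)."""
    total = sum(citation_counts)
    if total == 0:
        return 0.0
    return sum((c / total) ** 2 for c in citation_counts)

concentrated = [90, 5, 3, 2]   # most citations go to one paper
even = [25, 25, 25, 25]        # citations spread evenly
assert hhi(concentrated) > hhi(even)
print(round(hhi(even), 2))  # -> 0.25
```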
Similarly, professional advice and support networks are important to understanding the advancement of scientific careers. This research aims to marry these two lines of research to investigate and compare the ways in which men and women scientists seek advice and support from women in their networks. Using a sample of academic scientists in nonmedical biology, chemistry, computer science, earth and atmospheric sciences, electrical engineering, and physics we assess the extent to which women and men scientists seek advice and support from women in their networks. We find that field of science is the primary predictor for the presence of women in scientists' advice and support networks. We also find that citizenship, rank, age, and friendship are significantly related to the proportion of women in women's networks, but are not consistently significantly related to the proportion of women in men's networks. We conclude with a discussion of the findings and the distinctions between men and women scientists' advice and support networks. A detailed analysis of the research carried out in Mexico in the physics specialty of particles and fields (MPPF) reveals the way the current production and citation patterns evolved over a period of 60 years. The basis for the analysis were the publications and citations registered in the Stanford Public Information REtrieval System-High Energy Physics (SPIRES) from 1970 to 2007. The historical coverage afforded by the Science Citation Index provided supplementary data from 1948 to 1979. Papers were classified into five research types: theoretical, phenomenological, experimental, cosmological, and other, while citations were identified as coming from: published or unpublished sources. 
Results show that the development of MPPF emerged from traditional theoretical and phenomenological research and that the most notable changes taking place in production and impact are associated with the community's involvement in more productive and more internationally visible research practices, characteristic of large international collaborations, leaders in experimental physics and in the authorship of review papers. With the growing recognition of the importance of knowledge creation, knowledge maps are being regarded as a critical tool for successful knowledge management. However, the various methods of developing knowledge maps mostly depend on unsystematic processes and the judgment of domain experts with a wide range of untapped information. Thus, this research aims to propose a new approach to generate knowledge maps by mining document databases that have hardly been examined, thereby enabling an automatic development process and the extraction of significant implications from the maps. To this end, the accepted research proposal database of the Korea Research Foundation (KRF), which includes a huge knowledge repository of research, is investigated for inducing a keyword-based knowledge map. During the developmental process, text mining plays an important role in extracting meaningful information from documents, and network analysis is applied to visualize the relations between research categories and measure the value of network indices. Five types of knowledge maps (core R&D map, R&D trend map, R&D concentration map, R&D relation map, and R&D cluster map) are developed to explore the main research themes, monitor research trends, discover relations between R&D areas, regions, and universities, and derive clusters of research categories. The results can be used to establish a policy to support promising R&D areas and devise a long-term research plan. 
The paper presents the dynamics of the strategic management scientific community network during knowledge creation and dissemination through the Strategic Management Journal from 1980 to 2009. The paper describes the evolution of the participant countries' position within the network structure. We present the different stages that the network goes through, the vertices' transformation into nodes and hubs, and the statistical significance level of cooperation between the country in the core position and the countries in the semi-periphery and periphery positions during their evolution and growth. This paper studies the structure of collaboration in the Journal of Finance for the period 1980-2009 using publication data from the Social Sciences Citation Index (SSCI). There are 3,840 publications within this period, out of which 58% are collaborations. These collaborations form 405 components, with the giant component capturing approximately 54% of total coauthors (it is estimated that the upper limit of distinct JF coauthors is 2,536, obtained from the total number of distinct author keywords found within the study period). In comparison, the second largest component has only 13 members. The giant component has mean degree 3 and average distance 8.2. It exhibits power-law scaling with exponent alpha = 3.5 for vertices with degree ≥ 5. Based on the giant component, the degree, closeness, and betweenness centralization scores, as well as the hubs/authorities scores, are determined. The findings indicate that the most important vertex on the giant component coincides with Sheridan Titman based on his top ten ranking on all four scores. The search task and the system both affect the demand on cognitive resources during information search. In some situations the demands may become too high for a person. This article has a three-fold goal. First, it presents and critiques methods to measure cognitive load. 
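Statistics of the kind reported for the Journal of Finance coauthorship network, such as the giant component and its mean degree, can be computed from a plain edge list. A pure-Python sketch (the tiny graph and function names are illustrative, not the study's data):

```python
from collections import defaultdict

def giant_component(edges):
    """Return the largest connected component (as a set of vertices)
    of an undirected coauthorship graph given as (u, v) edge pairs,
    using a simple traversal over an adjacency map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def mean_degree(edges, vertices):
    """Mean number of coauthors per vertex, restricted to `vertices`."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return sum(len(adj[v]) for v in vertices) / len(vertices)
```

On a toy edge list such as `[("A", "B"), ("B", "C"), ("D", "E")]`, the giant component is `{"A", "B", "C"}` with mean degree 4/3; centrality and centralization scores would be layered on top of the same adjacency map.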
Second, it explores the distribution of load across search task stages. Finally, it seeks to improve our understanding of factors affecting cognitive load levels in information search. To this end, a controlled Web search experiment with 48 participants was conducted. Interaction logs were used to segment search tasks semiautomatically into task stages. Cognitive load was assessed using a new variant of the dual-task method. Average cognitive load was found to vary by search task stages. It was significantly higher during query formulation and user description of a relevant document as compared to examining search results and viewing individual documents. Semantic information shown next to the search results lists in one of the studied interfaces was found to decrease mental demands during query formulation and examination of the search results list. These findings demonstrate that changes in dynamic cognitive load can be detected within search tasks. Dynamic assessment of cognitive load is of core interest to information science because it enriches our understanding of cognitive demands imposed on people engaged in the search process by a task and the interactive information retrieval system employed. Although many studies have identified search tactics, few studies have explored tactic transitions. This study investigated the transitions of search tactics during the Web-based search process. Bringing their own 60 search tasks, 31 participants, representing the general public with different demographic characteristics, participated in the study. Data collected from search logs and verbal protocols were analyzed by applying both qualitative and quantitative methods. The findings of this study show that participants exhibited some unique Web search tactics. 
They overwhelmingly employed accessing and evaluating tactics; they used fewer tactics related to modifying search statements, monitoring the search process, organizing search results, and learning system features. The contributing factors behind applying the most and least frequently employed search tactics relate to users' efforts, trust in information retrieval (IR) systems, preference, experience, and knowledge, as well as limitations of the system design. A matrix of search-tactic transitions was created to show the probabilities of transitions from one tactic to another. By applying a fifth-order Markov chain, the results also presented the most common search strategies representing patterns of tactic transition occurring at the beginning, middle, and ending phases within one search session. The results of this study generated detailed and useful guidance for IR system design to support the most frequently applied tactics and transitions, to reduce unnecessary transitions, and to support transitions at different phases. Although considered proxies for people to interact with a system, mental models have produced limited practical implications for system design. This might be due to the lack of exploration of the elements of mental models resulting from the methodological challenge of measuring mental models. This study employed a new method, concept listing, to elicit people's mental models of an information-rich space, MedlinePlus, after they interacted with the system for 5 minutes. Thirty-eight undergraduate students participated in the study. The results showed that, in this short period of time, participants perceived MedlinePlus from many different aspects in relation to four components: the system as a whole, its content, information organization, and interface. Meanwhile, participants expressed evaluations of or emotions about the four components. 
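The tactic-transition matrix described above can be estimated by counting consecutive tactic pairs across sessions. A first-order sketch (the study used a fifth-order chain, which generalizes this by keying on tuples of the previous five tactics; the tactic labels are illustrative):

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate first-order transition probabilities between search
    tactics from per-session tactic sequences: P(next | previous) is
    the observed count of the pair divided by the row total."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        tactic: {nxt: n / sum(following.values())
                 for nxt, n in following.items()}
        for tactic, following in counts.items()
    }

# Illustrative sessions: two users alternating among three tactics.
sessions = [
    ["access", "evaluate", "access", "evaluate"],
    ["access", "modify", "access", "evaluate"],
]
probs = transition_probabilities(sessions)
```

Here "access" is followed by "evaluate" three times out of four, so `probs["access"]["evaluate"]` is 0.75; a fifth-order version would replace `prev` with the tuple of the five preceding tactics.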
In terms of the procedural knowledge, an integral part of people's mental models, only one participant identified a strategy more aligned to the capabilities of MedlinePlus to solve a hypothetical task; the rest planned to use general search and browse strategies. The composition of participants' mental models of MedlinePlus was consistent with that of their models of information-rich Web spaces in general. The work of corporate finance professionals is information-intensive, yet the practices and motivations behind their information preferences have been researched very little. The present study investigates perceived success and how it is related to corporate finance professionals' information source use behavior based on a Web survey of 92 Finnish corporate finance professionals. The principal finding of the statistical analysis of the data is that perceptions of work success and specific types of information sources contributing to the success are related. The correlations are complex, and very different types of information sources contribute to individual types of success, and vice versa. The findings indicate that information sources function as measures of success and serve an instrumental purpose. Besides functional relations, the correlations among the variables suggest more comprehensive dependencies and preferences related to success and information source use. The findings imply that by studying existing perceptions of success, it is possible to make inferences about preferred information sources. The study also suggests that both personal and organizational perceptions of success should be taken into account when planning information services and information literacy education for corporate finance professionals to increase their effectiveness and relevance for the professionals. Web 2.0 and social/collaborative tagging have altered the traditional roles of indexer and user. 
Traditional indexing tools and systems assume the top-down approach to indexing in which a trained professional is responsible for assigning index terms to information sources with a potential user in mind. However, in today's Web, end users create, organize, index, and search for images and other information sources through social tagging and other collaborative activities. One of the impediments to user-centered indexing had been the cost of soliciting user-generated index terms or tags. Social tagging of images such as those on Flickr, an online photo management and sharing application, presents an opportunity that can be seized by designers of indexing tools and systems to bridge the semantic gap between indexer terms and user vocabularies. Empirical research on the differences and similarities between user-generated tags and index terms based on controlled vocabularies has the potential to inform future design of image indexing tools and systems. Toward this end, a random sample of Flickr images and the tags assigned to them were content analyzed and compared with another sample of index terms from a general image collection using established frameworks for image attributes and contents. The results show that there is a fundamental difference between the types of tags and types of index terms used. In light of this, implications for research into and design of user-centered image indexing tools and systems are discussed. While advances in highly targeted therapies and increased use of mammogram services have contributed to the overall decline of breast cancer deaths in the United States, these benefits have not been distributed equitably. Less educated, poor, rural, non-Hispanic African American women have poorer access to cancer services and are less likely to have had a mammogram than are urban women. 
Lack of physician recommendations and perceived barriers in accessing diagnostic services are major factors that hinder the uptake of regular mammograms in rural communities. This article reports results of formative research conducted as part of a larger study focused on the participatory development of an electronic reminder system for breast cancer screening. The article discusses insights gained from focus groups with rural patients and clinicians about their information needs, breast cancer screening behaviors, barriers to care, and mammography referral practices. Hierarchical text classification (HTC) approaches have recently attracted a lot of interest on the part of researchers in human language technology and machine learning, since they have been shown to bring about equal, if not better, classification accuracy with respect to their "flat" counterparts while allowing exponential time savings at both learning and classification time. A typical component of HTC methods is a "local" policy for selecting negative examples: Given a category c, its negative training examples are by default identified with the training examples that are negative for c and positive for the categories which are siblings of c in the hierarchy. However, this policy has always been taken for granted and never been subjected to careful scrutiny since first proposed 15 years ago. This article proposes a thorough experimental comparison between this policy and three other policies for the selection of negative examples in HTC contexts, one of which (BESTLOCAL(k)) is being proposed for the first time in this article. We compare these policies on the hierarchical versions of three supervised learning algorithms (boosting, support vector machines, and naive Bayes) by performing experiments on two standard TC datasets, REUTERS-21578 and RCV1-v2. 
This article describes and evaluates various information retrieval models used to search document collections written in English through submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained. In this article, we propose to apply the topic model and topic-level eigenfactor (TEF) algorithm to assess the relative importance of academic entities including articles, authors, journals, and conferences. Scientific impact is measured by a PageRank score biased toward topics created by the latent topic model. The TEF metric considers the impact of an academic entity in multiple granular views as well as in a global view. Experiments on a computational linguistics corpus show that the method is a useful and promising measure to assess scientific impact. With the rapid development of Web 2.0, online reviews have become extremely valuable sources for mining customers' opinions. Fine-grained opinion mining has attracted increasing attention in both applied and theoretical research. In this article, the authors study how to automatically mine product features and opinions from multiple review sources. Specifically, they propose an integration strategy to solve the issue. 
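A topic-biased PageRank of the kind underlying the TEF metric replaces the uniform teleport step with a jump distribution proportional to each node's weight under a topic. A minimal power-iteration sketch (the tiny citation graph, weights, and names are illustrative assumptions, not the authors' implementation):

```python
def topic_pagerank(links, topic_weight, d=0.85, iters=100):
    """PageRank with a topic-biased teleport step: with probability
    (1 - d) the walk jumps to a node in proportion to its topic weight
    (e.g., from a latent topic model) instead of uniformly at random."""
    nodes = list(topic_weight)
    total = sum(topic_weight.values())
    pref = {n: topic_weight[n] / total for n in nodes}
    rank = dict(pref)
    for _ in range(iters):
        new = {n: (1 - d) * pref[n] for n in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = d * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: redistribute its mass by preference.
                for n in nodes:
                    new[n] += d * rank[src] * pref[n]
        rank = new
    return rank
```

On a toy graph where "A" and "B" cite each other and "C" cites "A", the scores sum to 1 and "A" outranks "C"; biasing `topic_weight` toward one topic's papers shifts the ranking accordingly.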
Within the integration strategy, the authors mine domain knowledge from semistructured reviews and then exploit the domain knowledge to assist product feature extraction and sentiment orientation identification from unstructured reviews. Finally, feature-opinion tuples are generated. Experimental results on real-world datasets show that the proposed approach is effective. This study investigated a relatively unexamined query type, queries composed of URLs. The extent, variation, and user click-through behavior were examined to determine the intent behind URL queries. The study made use of a search log from which URL queries were identified and selected for both qualitative and quantitative analyses. It was found that URL queries accounted for approximately 17% of the sample. There were statistically significant differences between URL queries and non-URL queries in the following attributes: mean query length; mean number of tokens per query; and mean number of clicks per query. Users issuing such queries clicked on fewer result list items higher up the ranking compared to non-URL queries. Classification indicated that nearly 86% of queries were navigational in intent, with informational and transactional queries each representing about 7% of URL queries. This is in contrast to past research that suggested that URL queries were 100% navigational. The conclusions of this study are that URL queries are relatively common and that simply returning the page that matches a user's URL is not an optimal strategy. The uptake of social network sites (SNSs) has been highly trend-driven, with Friendster, MySpace, and Facebook being successively the most popular. Given that teens are often early adopters of communication technologies, it seems reasonable to assume that the typical user of any particular SNS would change over time, probably becoming older and covering different segments of the population. 
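Identifying URL queries in a search log, as in the study above, can start from a simple pattern test for bare domains or full URLs. A heuristic sketch (the regular expression is an illustrative assumption, not the study's classifier):

```python
import re

# Matches an optional scheme, an optional "www." prefix, a dotted
# domain, and an optional path -- e.g. "www.example.com" or
# "http://example.com/page", but not keyword queries with spaces.
URL_QUERY = re.compile(
    r"^(https?://)?(www\.)?[\w-]+(\.[\w-]+)+(/\S*)?$", re.IGNORECASE)

def is_url_query(query):
    """True if the whole query looks like a URL rather than keywords."""
    return bool(URL_QUERY.match(query.strip()))
```

A real classifier would also need to separate navigational from informational and transactional intent, which the pattern alone cannot do.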
This article analyzes changes in MySpace self-reported member demographics and behavior from 2007 to 2010 using four large samples of members and focusing on the United States. The results indicate that despite its take-up rate declining, with only about 1 in 10 members being active a year after joining, the dominant (modal) age for active U.S. members remains midadolescence, but has shifted by about 2 years from 15 to 17, and the U.S. dominance of MySpace is shrinking. There also has been a dramatic increase in the median number of Friends for new U.S. members, from 12 to 96 probably due to MySpace's automated Friend Finder. Some factors show little change, however, including the female majority, the 5% minority gay membership, and the approximately 50% private profiles. In addition, there has been an increase in the proportion of Latino/Hispanic U.S. members, suggesting a shifting ethnic profile. Overall, MySpace has surprisingly stable membership demographics and is apparently maintaining its primary youth appeal, perhaps because of its music orientation. The decision of the U.S. Supreme Court in 1991 in Feist Publications, Inc. v. Rural Tel. Service Co. affirmed originality as a constitutional requirement for copyright. Originality has a specific sense and is constituted by a minimal degree of creativity and independent creation. The not original is the more developed concept within the decision. It includes the absence of a minimal degree of creativity as a major constituent. Different levels of absence of creativity also are distinguished, from the extreme absence of creativity to insufficient creativity. There is a gestalt effect of analogy between the delineation of the not original and the concept of computability. More specific correlations can be found within the extreme absence of creativity. 
"[S]o mechanical" in the decision can be correlated with an automatic mechanical procedure, and with clauses that have a historical resonance with understandings of computability as what would naturally be regarded as computable. The routine within the extreme absence of creativity can be regarded as the product of a computational process. The concern of this article is with rigorously establishing an understanding of the extreme absence of creativity, primarily through the correlations with aspects of computability. The understanding established is consistent with the other elements of the not original. It is also revealed as testable under real-world conditions. The possibilities for understanding insufficient creativity, a minimal degree of creativity, and originality, from the understanding developed of the extreme absence of creativity, are indicated. The preface to a 16th-century Hebrew book entitled Devek Tov, a supercommentary on the Pentateuch, includes an apology by the author for not citing all his sources. In his defense, he cites a passage in the Jerusalem Talmud that discusses the obliteration phenomenon. Following the trail of Jewish sayings on the importance of citation leads to a discussion of stealing ideas, i.e., plagiarism. Details of the search process, cataloging issues, incomplete indexes, and descriptions of complex locator systems found in Hebrew texts, concordances, and full-text databases are included. This detective work led to the discovery that Devek Tov was itself obliterated by incorporation into a later commentary on the Pentateuch. Impact factors (and similar measures such as the Scimago Journal Rankings) suffer from two problems: (a) citation behavior varies among fields of science and, therefore, leads to systematic differences, and (b) there are no statistics to inform us whether differences are significant. 
The recently introduced "source normalized impact per paper" indicator of Scopus tries to remedy the first of these two problems, but a number of normalization decisions are involved, which makes it impossible to test for significance. Using fractional counting of citations, based on the assumption that impact is proportionate to the number of references in the citing documents, citations can be contextualized at the paper level and the aggregated impacts of sets can be tested for their significance. It can be shown that the weighted impact of Annals of Mathematics (0.247) is not so much lower than that of Molecular Cell (0.386) despite a five-fold difference between their impact factors (2.793 and 13.156, respectively). We continue our investigation of the effect of position in announcements of newly received articles, a single-day artifact, with citations received over the course of ensuing years. Earlier work focused on the "visibility" effect for positions near the beginnings of announcements, and on the "self-promotion" effect associated with authors intentionally aiming for these positions, with both found correlated to a later enhanced citation rate. Here we consider a "reverse-visibility" effect for positions near the ends of announcements, and a "procrastination" effect associated with submissions made within the 20-minute period just before the daily deadline. For two large subcommunities of theoretical high-energy physics, we find a clear "reverse-visibility" effect, in which articles near the ends of the lists receive a boost in both short-term readership and long-term citations, almost comparable in size to the "visibility" effect documented earlier. For one of those subcommunities, we find an additional "procrastination" effect, in which last-position articles submitted shortly before the deadline have an even higher citation rate than those that land more accidentally in that position. 
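Fractional counting as described above weights each citation by the reciprocal of the citing document's reference-list length, so a citation from a short reference list counts for more than one from a long list. A minimal sketch (names and counts are illustrative):

```python
def fractional_impact(cited_by, reference_counts):
    """Fractionally counted impact of one paper: each citing document
    contributes 1 / (number of references it contains), rather than a
    full count, normalizing for field-dependent citation density."""
    return sum(1 / reference_counts[citer] for citer in cited_by)

# A paper cited by two documents, one with 10 references and one
# with 40, receives 0.1 + 0.025 = 0.125 fractional citations:
impact = fractional_impact(["d1", "d2"], {"d1": 10, "d2": 40})
```

Aggregating such paper-level fractions over a journal's papers gives the weighted impact values compared in the abstract.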
We consider and eliminate geographic effects as responsible for the above, and speculate on other possible causes, including "oblivious" and "nightowl" effects. In the past several years studies have started to appear comparing the accuracies of various science mapping approaches. These studies primarily compare the cluster solutions resulting from different similarity approaches, and give varying results. In this study we compare the accuracies of cluster solutions of a large corpus of 2,153,769 recent articles from the biomedical literature (2004-2008) using four similarity approaches: co-citation analysis, bibliographic coupling, direct citation, and a bibliographic coupling-based citation-text hybrid approach. Each of the four approaches can be considered a way to represent the research front in biomedicine, and each is able to successfully cluster over 92% of the corpus. Accuracies are compared using two metrics: within-cluster textual coherence as defined by the Jensen-Shannon divergence, and a concentration measure based on the grant-to-article linkages indexed in MEDLINE. Of the three pure citation-based approaches, bibliographic coupling slightly outperforms co-citation analysis using both accuracy measures; direct citation is the least accurate mapping approach by far. The hybrid approach improves upon the bibliographic coupling results in all respects. We consider the results of this study to be robust given the very large size of the corpus, and the specificity of the accuracy measures used. VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling (MDS). We present an extensive comparison between the use of MDS and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our empirical analysis, we use the techniques for constructing maps of authors, journals, and keywords. 
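The Jensen-Shannon divergence used above as a coherence measure can be computed directly from two term distributions. A sketch with base-2 logarithms, so values fall in [0, 1], with lower within-cluster divergence indicating higher textual coherence:

```python
from math import log2

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two term distributions given
    as aligned probability lists: the mean KL divergence of each
    distribution from their average. Symmetric and bounded in [0, 1]
    with base-2 logs; 0 means identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero mass contribute 0.
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give 0, while fully disjoint ones (e.g. `[1, 0]` vs. `[0, 1]`) give the maximum value 1.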
Two commonly used approaches to bibliometric mapping, both based on MDS, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that in general maps constructed using VOS provide a more satisfactory representation of a dataset than maps constructed using well-known MDS approaches. We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least square method is in some cases warranted in addition to methods such as the maximum likelihood. We also show why often-used cumulative distributions can make it difficult to distinguish noise from genuine features and to obtain an accurate power law exponent of the underlying distribution. The treatment is nontechnical, aimed at IS researchers with little or no background in mathematics. Folder navigation is the main way that personal computer users retrieve their own files. People dedicate considerable time to creating systematic structures to facilitate such retrieval. Despite the prevalence of both manual organization and navigation, there is very little systematic data about how people actually carry out navigation, or about the relation between organization structure and retrieval parameters. The aims of our research were therefore to study users' folder structure, personal file navigation, and the relations between them. We asked 296 participants to retrieve 1,131 of their active files and analyzed each of the 5,035 navigation steps in these retrievals. Folder structures were found to be shallow (files were retrieved from mean depth of 2.86 folders), with small folders (a mean of 11.82 files per folder) containing many subfolders (M = 10.64). 
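Partial logarithmic binning and a least-squares exponent fit, as recommended above for noisy power-law tails, can be sketched as follows (bin parameters and function names are illustrative assumptions; a real analysis would handle floating-point bin edges and sparse tails more carefully):

```python
from math import log10

def log_binned_density(values, bins_per_decade=4):
    """Bin positive values into logarithmically spaced bins and divide
    each count by its bin width, which evens out the noisy tail of a
    power-law sample; empty bins are dropped."""
    lo, hi = min(values), max(values)
    step = 1 / bins_per_decade
    edges, e = [], log10(lo)
    while e <= log10(hi) + step:
        edges.append(10 ** e)
        e += step
    centers, density = [], []
    for a, b in zip(edges, edges[1:]):
        n = sum(1 for v in values if a <= v < b)
        if n:
            centers.append((a * b) ** 0.5)   # geometric bin center
            density.append(n / (b - a))      # count per unit width
    return centers, density

def lsq_slope(xs, ys):
    """Least-squares slope of log(y) vs. log(x): an estimate of the
    power-law exponent of the binned data."""
    lx, ly = [log10(x) for x in xs], [log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))
```

Feeding `lsq_slope` the binned centers and densities of an empirical distribution recovers the exponent; the article argues this simple fit is warranted in some cases alongside maximum-likelihood estimation.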
Navigation was largely successful and efficient, with participants successfully accessing 94% of their files and taking 14.76 seconds on average to do so. Retrieval time and success depended on folder size and depth. We therefore found the users' decision to avoid both deep structure and large folders to be adaptive. Finally, we used a predictive model to formulate the effect of folder depth and folder size on retrieval time, and suggested an optimization point in this trade-off. This article describes the research stimulated by a fundamental shift that is occurring in the manufacture and marketing of aero engines for commercial and defense purposes, away from the selling of products to the provision of services. This research was undertaken in an aerospace company, which designs and manufactures aero engines and also offers contracts, under which it remains responsible for the maintenance of engines. These contracts allow the company to collect far more data about the in-service performance of their engines than was previously available. This article aims at identifying what parts of this in-service information are required when components or systems of existing engines need to be redesigned because they have not performed as expected in service. In addition, this article aims at understanding how designers use this in-service information in a redesign task. In an attempt to address these aims, we analyzed five case studies involving redesign of components or systems of an existing engine. The findings show that the in-service information accessed by the designers mainly contains the undesired physical actions (e.g., deterioration mechanisms, deterioration effects, etc.) and the causal chains of these undesired physical actions. We identified a pattern in the designers' actions regarding the use of these causal chains. The designers generated several solutions that utilize the causal chains seen in the in-service information. 
The findings provide a sound basis for developing tools and methods to support designers in effectively satisfying their in-service information requirements in a redesign task. Although technology can often correct spelling errors, the complex tasks of information searching and retrieval in an online public access catalog (OPAC) are made more difficult by these errors in users' input and bibliographic records. This study examines the search behaviors of 38 university students, divided into groups with either easy-to-spell or difficult-to-spell search terms, who were asked to find items in the OPAC with these search terms. Search behaviors and strategy use in the OPAC and on the World Wide Web (WWW) were examined. In general, students used familiar Web resources to check their spelling or discover more about the assigned topic. Students with difficult-to-spell search terms checked spelling more often, changed search strategies to look for the general topic and had fewer successful searches. Students unable to find the correct spelling of a search term were unable to complete their search. Students tended to search the OPAC as they would search a search engine, with few search terms or complex search strategies. The results of this study have implications for spell checking, user-focused OPAC design, and cataloging. Students' search behaviors are discussed by expanding Thatcher's (2006) Information-Seeking Process and Tactics for the WWW model to include OPACs. To enable and guide effective metadata creation it is essential to understand the structure and patterns of the activities of the community around the photographs, resources used, and scale and quality of the socially created metadata relative to the metadata and knowledge already encoded in existing knowledge organization systems. This article presents an analysis of Flickr member discussions around the photographs of the Library of Congress photostream in Flickr. 
The article also reports on an analysis of the intrinsic and relational quality of the photostream tags relative to two knowledge organization systems: the Thesaurus for Graphic Materials (TGM) and the Library of Congress Subject Headings (LCSH). Thirty-seven percent of the original tag set and 15.3% of the preprocessed set (after the removal of tags with fewer than three characters and URLs) were invalid or misspelled terms. Nouns, named entity terms, and complex terms constituted approximately 77% of the preprocessed set. More than half of the photostream tags were not found in the TGM and LCSH, and more than a quarter of those terms were regular nouns and noun phrases. This suggests that these terms could be complementary to more traditional methods of indexing using controlled vocabularies. User-centered analysis can benefit the development of interactive video digital libraries. Findings from this study support the idea that having additional understanding about the intended users of video digital libraries can help researchers match system designs with the envisioned use of prototype systems. This study examines one user-centered factor specifically, familiarity with visual search topics, to explore if and how this may be associated with other factors within an interactive video retrieval context. Twenty-eight users from the field of science education were recruited to complete six visual search topics using a prototype system to retrieve video clips from a collection of NASA Science Education Programs. Analysis revealed that topic familiarity was associated with other factors that were examined throughout this study, including user-assessed and experimenter-assessed topic completion ratios, opinions of the prototype system, and interaction behaviors. Such results can have a variety of implications for developing video digital libraries, especially those designed to support queries and interactions of knowledgeable users from a defined domain. 
In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two other baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results indicate that our block-weighting ranking method is superior to all baselines across all the collections we used, generating average gains in precision of 5 to 20%. We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike the traditional term frequency-inverse document frequency (TF-IDF), the new weighting of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure which utilizes partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights compared with the context-independent BM25 weights in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) measures obtained using B&D term weights and those obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, using the new term weights yields improvement over the baseline for all collections tested. 
The MAP obtained with the new weights shows relative improvements over the baseline of 3.3 to 15.2%, with statistical significance at the 95% confidence level across all four collections. The explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot test in which two information specialists use the adapted application for a realistic information-seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design. A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions are frequently important in these texts for expressing friendship, showing social support, or as part of online arguments. 
Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1-5. The former, but not the latter, is better than the baseline and a wide range of general machine learning approaches. We investigated agency satisfaction with an electronic record management system (ERMS) that supports the electronic creation, archival, processing, transmittal, and sharing of records (documents) among autonomous government agencies. A factor model explaining agency satisfaction with ERMS functionalities offers hypotheses, which we tested empirically with a large-scale survey involving more than 1,600 government agencies in Taiwan. The data showed a good fit to our model and supported all the hypotheses. Overall, agency satisfaction with ERMS functionalities appears jointly determined by regulatory compliance, job relevance, and satisfaction with support services. Among the determinants we studied, agency satisfaction with support services seems the strongest predictor of agency satisfaction with ERMS functionalities. 
Regulatory compliance also has important influences on agency satisfaction with ERMS, through its influence on job relevance and satisfaction with support services. Further analyses showed that satisfaction with support services partially mediated the impact of regulatory compliance on satisfaction with ERMS functionalities, and job relevance partially mediated the influence of regulatory compliance on satisfaction with ERMS functionalities. Our findings have important implications for research and practice, which we also discuss. The variables investment, benefit, and yield were defined to study the influence of journal self-citations on the impact factor. Investment represents the share of journal self-citations that contribute to the impact factor. Benefit is defined as the ratio of journal impact factor including self-citations to journal impact factor without self-citations. Yield is the relationship between benefit and investment. I selected all journals included in 2008 in the Science Citation Index version of Journal Citation Reports. After deleting 482 records for reasons to be explained, I used a final set of 6,138 journals to study the distribution of the variables defined above. The distribution of benefit differed from the distribution of investment and yield. The top 20-ranked journals were not the same for all three variables. The yield of self-citations on the journal impact factor was, in general, very modest. Naranan's important theorem, published in Nature in 1970, states that if the number of journals grows exponentially and if the number of articles in each journal grows exponentially (at the same rate for each journal), then the system satisfies Lotka's law, and a formula for Lotka's exponent is given as a function of the growth rates of the journals and the articles. This brief communication re-proves this result by showing that the system satisfies Zipf's law, which is equivalent to Lotka's law. 
The proof is short and algebraic and does not use infinitesimal arguments. Large-scale scientific projects have become a major impetus of scientific advances. But few studies have specifically analyzed how those projects bolster scientific research. We address this question from a scientometrics perspective. By analyzing the bibliographic records of papers relevant to the Sloan Digital Sky Survey (SDSS), we found that the SDSS helped scientists from many countries further develop their own research; investigators initially formed large research groups to tackle key problems, while later papers involved fewer authors; and the number of research topics increased but the diversity of topics remained stable. Furthermore, the entropy analysis method has proven valuable in terms of analyzing patterns of research topics at a macroscopic level. For monitoring articles and patents in nanotechnology, there is little agreement on a universal lexical query, or even on an explicit definition of nanotechnology. Here, in light of a proposed definition, a set of case studies has been conducted to remove keywords which are not exclusive to nanotechnology. This resulted in a collective and abridged lexical query (CALQ) for nanotechnology delineation. Through bibliometric quantification of already-proposed as well as novel keywords, it was shown that all keywords included in CALQ have considerable exclusive retrieval and precision, while the removed keywords do not satisfy either of these numerical thresholds. This approach may also be applied for the future updating of CALQ. The aim of this study was to investigate the existence of a "gender gap" in the authorship of the four most important peer-reviewed psychiatric journals in Brazil and to quantify its magnitude. In addition, we examined the patterns of change in this gap during the period extending from 2001 to 2008 and variations according to the total number of authors, the type of article (original vs. 
non-original studies), and the journals themselves. A total of 1,036 articles were analyzed. We found that the proportion of overall female participation increased from 2001 to 2008. Nevertheless, the increase was accounted for mostly by the growth of participation in non-original articles. While the average annual increment for original articles was virtually null (0.01%), for the non-original articles the corresponding figure was 3.7%. We also found that the chance of a woman being first author was about three times greater in original papers as compared to non-original ones at the beginning of the study period; this differential declined by 11% per year during this period. A different pattern emerged from the analysis of female last authorship. Year of publication and type of study were still associated with the chance of a woman being the last author, but without interaction. Further, the journals themselves were found to be related to female last authorship: the chance of a woman being the last author in an article published in the Revista Brasileira de Psiquiatria was significantly smaller than in the other three journals. Our findings indicate clearly that some progress is being achieved in eliminating the gender gap in the field of psychiatry as well, and highlight the need for further research in this area. Data on patent families are used in economic and statistical studies for many purposes, including the analysis of patenting strategies of applicants, the monitoring of the globalization of inventions, and the comparison of the inventive performance and stock of technological knowledge of different countries. Most of these studies take family data as given, as a sort of black box, without going into the details of their underlying methodologies and patent linkages. However, different definitions of patent families may lead to different results. 
One of the purposes of this paper is to compare the most commonly used definitions of patent families and identify factors causing differences in family outcomes. Another objective is to shed light on the internal structure of patent families and see how it affects patent family outcomes based on different definitions. An automated characterization of the internal structures of all extended families with earliest priorities in the 1990s, as recorded in PATSTAT, found that family counts are not affected by the choice of patent family definition in 75% of families. However, different definitions may really matter for the 25% of families with complex structures and lead to different family compositions, which might have an impact, for instance, on econometric studies using family size as a proxy of patent value. This paper sets out to examine the temporal pattern of innovative activities: what might have affected a firm's patenting from one period to the next. Based upon data on 'information technology' (IT) manufacturing firms in Taiwan covering the years 1990-2001, we develop a survival model to analyze the underlying drivers of patenting duration. Our results indicate that the level of the patent stock at the onset of the patent spell, defined as the number of successive years during which a firm produced at least one patent per year, has a non-linear effect on spell duration. Other factors, such as industrial growth, firm size and firm profitability, have a positive effect on patenting duration, while firm age and spell sequence negatively affect spell duration. We conclude that state dependence is demonstrated by innovative behavior, yet the advantages gained from such creative accumulation can easily be dissipated, thereby illustrating the transient nature of dynamic capabilities. We take a new look at the Shanghai Jiao Tong Academic Ranking of World Universities to evaluate the performance of whole university systems. 
We deal with system aggregates by averaging scores taken over a number of institutions from each higher education system according to the Gross Domestic Product of its country. We treat the set of indicators (measures) at the country level as a scale, and investigate its reliability and dimensionality using appropriate statistical tools. After a Principal Component Analysis is performed, a clear picture emerges: at the aggregate level ARWU seems to be a very reliable one-dimensional scale, with a first component that explains more than 72% of the variance of the sample under analysis. The percentages of variance of the indicators explained by the first component indicate that ARWU is in fact measuring the research quality (both at the individual and collective levels) of a university system. When the second principal component is taken into account, the two principal components together explain more than 90% of the variance. The rotated solution facilitates the interpretation of the components and provides clear and interesting clustering information about the 32 higher education systems under analysis. Taking the interactive open access journal Atmospheric Chemistry and Physics as an example, this study examines whether Thomson Reuters, for the Journal Citation Reports, correctly calculates the Journal Impact Factor (JIF) of a journal that publishes several versions of a manuscript within a two-stage publication process. The results of this study show that the JIF of the journal is not overestimated through the two-stage publication process. Bibliometric mapping of scientific articles based on keywords and technical terms in abstracts is now frequently used to chart scientific fields. In contrast, no significant mapping has been applied to the full texts of non-specialist documents. 
Editorials in Nature and Science are such non-specialist documents, reflecting the views of the two most read scientific journals on science, technology and policy issues. We use the VOSviewer mapping software to chart the topics of these editorials. A term map and a document map are constructed and clusters are distinguished in both of them. The validity of the document clustering is verified by a manual analysis of a sample of the editorials. This analysis confirms the homogeneity of the clusters obtained by mapping and augments the latter with further detail. As a result, the analysis provides reliable information on the distribution of the editorials over topics, and on differences between the journals. The most striking difference is that Nature devotes more attention to internal science policy issues and Science more to the political influence of scientists. This paper presents and discusses a new bibliometric indicator of research performance, designed with the fundamental concern of enabling cross-disciplinary comparisons. The indicator, called x-index, compares a researcher's output to a reference set of research output from top researchers, identified in the journals where the researcher has published. It reflects publication quantity and quality, uses a moderately sized data set, and works with a more refined definition of scientific fields. x-index was developed to rank researchers in a scientific excellence award in the Faculty of Engineering of the University of Porto. The data set collected for the 2009 edition of the award is used to study the indicator's features and design choices, and provides the basis for a discussion of its advantages and limitations. Scientometric indicators, or science metrics, both conventional and derived, are used in the ex-post evaluation of a government policy with an impact on the research system. Publications, citations, the h-index, the Glanzel model, and patents are applied at both the micro and meso levels. 
This provides useful insight into the impact of the voluntary early retirement policy on the research and technological outputs of the faculties of science in Morocco and, consequently, on Morocco's overall research system. The use of these metrics showed that the effect of the initiative was quite limited, affecting an average of 8% of the professorial staff of these institutions. Furthermore, each professor benefiting from this initiative had produced an average of 3.7 publications indexed in the SCI over his or her whole career. The small number of publications attributed to these professors had been gradually decreasing even 6 years before the initiative. No specific scientific field was particularly affected. The findings also support that these professors were in general more 'author' than 'inventor'. Institutions with inventor-professors were likely more affected by the initiative. These metrics suggest that even if the initiative did not help rejuvenate the professorial staff of the faculties of science in Morocco, it may nevertheless have stimulated their research system with respect to these scientometric indicators. The paper has the general aim of assessing the worldwide research activity in agricultural and food science and technology as it is reflected by the mainstream journal literature. The specific research questions were as follows: (1) What is the position of the European Research Area (ERA), represented by 33 countries in this study, on the world map of agrifood science publications? (2) Which countries are influential and what is their position? (3) Are there any specific European strengths and weaknesses by subfields of agrifood science? Overall, assessed by the total number of publications, the European Research Area (ERA), represented by 33 countries in this study, is in a dominant position on the world map of agrifood science. 
However, agrifood publications from the United States are more influential (judged by the average citation rates per paper). A correlation has been found between economic power and agrifood science publications: this is true not only for the total number of papers, but also for influence (measured, again, by citation rates). Within Europe, the UK, Germany, France, Spain and the Netherlands dominate the agrifood research fields also in terms of citations. The Scandinavian countries, the Benelux states and Switzerland manage to produce influential papers across several fields of agrifood science. The EU's New Member States, a populous area, together have less than a 10% share in Europe's agrifood publications, and in citations they account for only a 3-4% portion. It seems that a deepening of the integration of the national research systems in the European Research Area is desirable to increase the global impact of European agrifood research. This paper presents a proposal of a CERIF data model extension for the evaluation of scientific research results. The data model extension is based on the CERIF semantic layer, which enables classification of entities and relations between entities according to some classification scheme. The proposed data model was created using the PowerDesigner CASE tool. The model is represented using a physical data model in the conceptual notation that is adopted in the literature for representing the CERIF data model. This model is verified using the rule book for evaluation and quantitative expression of scientific research results of researchers employed at the University of Novi Sad. Since bibliometric indicators have obtained general acceptance in science policy and attained applied relevance in research evaluation, feedback effects on scientists' behaviour resulting from the use of these indicators for science funding decisions have been reported. These adaptation strategies could be called mimicry in science. 
Scientists apply strategies that should enable them to comply with bibliometric accountability and to secure funds for their own research. This paper presents a bibliometric analysis of the literature published in the field of mathematics from 1868 to date. The data originate from the Zentralblatt MATH database. The rate of increase of publications per year reflects the growth of the mathematics community, and both can be well represented by exponential or linear functions, the latter especially after the Second World War. The distribution of publications follows Bradford's law, but in contrast to many other disciplines there is no strong domination of a small number of journals. The productivity of authors follows two inverse power laws of the Lotka form with different parameters, one in the range of low productivity and the other in the range of high productivity. The average productivity has changed only slightly since the year 1870. As far as multiple authorship is concerned, the distribution of the number of authors per publication can be described quite well by a gamma distribution. The average number of authors per publication has been increasing steadily; while it was close to 1 up to the first quarter of the last century, it has now reached a value of 2 in the last few years. This means that the percentage of single-authored papers has fallen from over 95% in the years before 1930 to about 30% today. Text mining was used to extract technical intelligence from the open source global SARS research literature. A SARS-focused query was applied to the Science Citation Index (SCI) (SCI 2008) database for the period 1998-early 2008. The SARS research literature infrastructure (prolific authors, key journals/institutions/countries, most cited authors/journals/documents) was obtained using bibliometrics, and the SARS research literature technical structure (hierarchical taxonomy) was obtained using computational linguistics/document clustering. 
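The document-clustering step described for the SARS literature can be illustrated with a minimal, self-contained sketch: TF-IDF weighting plus greedy single-link grouping by cosine similarity. This is a toy stand-in under simplifying assumptions (whitespace tokenization, an arbitrary similarity threshold), not the actual computational-linguistics toolchain used in the study:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # TF (relative frequency) times IDF (log of inverse document frequency)
        vec = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.1):
    """Greedy single-link clustering: a document joins the first cluster
    containing any member whose similarity exceeds the threshold."""
    vecs = tfidf_vectors(docs)
    clusters = []
    for i, v in enumerate(vecs):
        for c in clusters:
            if any(cosine(v, vecs[j]) >= threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Real pipelines would add stemming, phrase extraction, and a proper hierarchical algorithm, but the grouping principle is the same.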
In the production of scientific knowledge, as revealed by publication output, South Africa is ahead of many other countries in the developing world and on the African continent. This study examines for the first time the publication trends of South African engineering researchers over a period of 30 years since 1975. Drawing data from the ISI Web of Knowledge, this paper specifically looks at the publication patterns of engineering researchers in South Africa. To assess the publication practices of editors in their own journals, we analysed the number of articles that Croatian editors published in the journals they edit. From 2005 to 2008, 256 decision-making editors of 180 journals published a total of 887 publications in their own journals. Out of these, 332 were relevant for their academic promotion. Only 18 editors published 5 or more articles in their own journals. A single journal had regulations for self-publishing in the instructions for authors. Although the majority of editors did not misuse their own journals for scientific publishing and academic promotion, there is a need for greater transparency in the declaration and management of editorial conflicts of interest in academic and scholarly journals. We describe two general approaches to creating document-level maps of science. To create a local map, one defines and directly maps a sample of data, such as all literature published in a set of information science journals. To create a global map of a research field, one maps "all of science" and then locates a literature sample within that full context. We provide a deductive argument that global mapping should create more accurate partitions of a research field than does local mapping, followed by practical reasons why this may not be so. The field of information science is then mapped at the document level using both local and global methods to provide a case illustration of the differences between the methods. 
Textual coherence is used to assess the accuracies of both maps. We find that document clusters in the global map have significantly higher coherence than do those in the local map, and that the global map provides unique insights into the field of information science that cannot be discerned from the local map. Specifically, we show that information science and computer science have a large interface and that computer science is the more progressive discipline at that interface. We also show that research communities in temporally linked threads have a much higher coherence than do isolated communities, and that this feature can be used to predict which threads will persist into a subsequent year. Methods that could increase the accuracy of both local and global maps in the future also are discussed. The mapping of scientific fields, based on principles established in the seventies, has recently shown remarkable development, and applications are now booming with progress in computing efficiency. We examine here the convergence of two thematic mapping approaches, citation-based and word-based, which rely on quite different sociological backgrounds. A corpus in the nanoscience field was broken down into research themes, using the same clustering technique on the two networks separately. The tool for comparison is the table of intersections of the M clusters (here M = 50) built on either side. A classical visual exploitation of such contingency tables is based on correspondence analysis. We investigate a rearrangement of the intersection table (block modeling), resulting in a pseudo-map. The interest of this representation for confronting the two breakdowns is discussed. The amount of convergence found is, in our view, a strong argument in favor of the reliability of bibliometric mapping. However, the outcomes are not convergent to the degree that they could be substituted for each other. 
Differences highlight the complementarity between approaches based on different networks. In contrast with the strong informetric posture found in recent literature, where lexical and citation markers are considered as miscible tokens, the framework proposed here does not mix the two elements at an early stage, in compliance with their contrasting logic. This article studies massive evidence about references made and citations received after a 5-year citation window by 3.7 million articles published in 1998 to 2002 in 22 scientific fields. We find that the distributions of references made and citations received share a number of basic features across sciences. Reference distributions are rather skewed to the right, while citation distributions are even more highly skewed: the mean is about 20 percentage points to the right of the median, and articles with a remarkable or an outstanding number of citations represent about 9% of the total. Moreover, the existence of a power law representing the upper tail of citation distributions cannot be rejected in 17 fields, whose articles represent 74.7% of the total. Contrary to the evidence in other contexts, the value of the scale parameter is above 3.5 in 13 of the 17 cases. Finally, power laws are typically small, but capture a considerable proportion of the total citations received. Many countries are moving towards research policies that emphasize excellence; consequently, they develop evaluation systems to identify universities, research groups, and researchers that can be said to be "excellent." Such active research policy strategies, in which evaluations are used to concentrate resources, are based on an unsubstantiated assumption that researchers' track records are indicative of their future research performance. In this study, information on authors' track records (previous publication volume and previous citation rate) is used to predict the impact of their articles. 
The study concludes that, to a certain degree, the impact of scientific work can be predicted using information on how often an author's previous publications have been cited. The relationship between past performance and the citation rate of articles is strongest at the high end of the citation distribution. The implications of these results are discussed in the context of a cumulative advantage process. In a comprehensive research project, we investigated the predictive validity of selection decisions and reviewers' ratings at the open access journal Atmospheric Chemistry and Physics (ACP). ACP is a high-impact journal publishing papers on the Earth's atmosphere and the underlying chemical and physical processes. Scientific journals have to deal with the following question concerning predictive validity: Are the "best" scientific works in fact selected from the manuscripts submitted? In this study we examined whether selecting the "best" manuscripts means selecting papers that after publication show top citation performance as compared to other papers in this research area. First, we appraised the citation impact of later published manuscripts based on the percentile citedness rank classes of the population distribution (scaling in a specific subfield). Second, we analyzed the association between the decisions (n = 677 manuscripts that were accepted, or rejected but published elsewhere) or ratings (reviewers' ratings for n = 315 manuscripts), respectively, and the citation impact classes of the manuscripts. The results confirm the predictive validity of the ACP peer review system. A recent study in information science (IS), Lykke and Eslau (2010; hereafter L&E), raises important issues concerning the value of human indexing and basic theories of indexing and information retrieval, as well as the use of quantitative and qualitative approaches in IS and the underlying theories of knowledge informing the field. 
The present article uses L&E as the point of departure for demonstrating in what way more social and interpretative understandings may provide fruitful improvements for research in indexing, knowledge organization, and information retrieval. The article is motivated by the observation that philosophical contributions tend to be ignored in IS if they are not directly formed as criticisms or invitations to dialog. It is part of the author's ongoing publication of articles about philosophical issues in IS, and it is intended to be followed by analyses of other examples of contributions to core issues in IS. Although it is formulated as a criticism of a specific paper, it should be seen as part of a general discussion of the philosophical foundation of IS and as support for the emerging social paradigm in this field. The article explains why the concept of the user in Library and Information Science (LIS) user studies and information seeking behavior is theoretically inadequate, and it proposes a reconceptualization of subjects, objects, and their relations according to a model of 'double mediation.' Formal causation (affordances) is suggested as a substitute for mechanistic causation. The notion of 'affective causation' is introduced. The works of several psychoanalysts and continental and Anglo-American philosophers are used as tools to develop the model. The notion of information quality (IQ) has been investigated extensively in recent years. Much of this research has been aimed at conceptualizing IQ and its underlying dimensions (e.g., accuracy, completeness) and at developing instruments for measuring these quality dimensions. However, less attention has been given to the measurability of IQ. The objective of this study is to explore the extent to which a set of IQ dimensions (accuracy, completeness, objectivity, and representation) lend themselves to reliable measurement. 
By reliable measurement, we refer to the degree to which independent assessors are able to agree when rating objects on these various dimensions. Our study reveals that multiple assessors tend to agree more on certain dimensions (e.g., accuracy) while finding it more difficult to agree on others (e.g., completeness). We argue that differences in measurability stem from properties inherent to the quality dimension (i.e., the availability of heuristics that make the assessment more tangible) as well as from assessors' reliance on these cues. Implications for theory and practice are discussed. The aim of this research is to advance both the theoretical conceptualization and the empirical validation of trustworthiness in mHealth (mobile health) information services research. Conceptually, it extends this line of research by reframing trustworthiness as a hierarchical, reflective construct, incorporating ability, benevolence, integrity, and predictability. Empirically, it confirms that partial least squares path modeling can be used to estimate the parameters of a hierarchical, reflective model with moderating and mediating effects in a nomological network. The model shows that trustworthiness is a second-order, reflective construct that has a significant direct and indirect impact on continuance intentions in the context of mHealth information services. It also confirms that consumer trust plays the key, mediating role between trustworthiness and continuance intentions, while trustworthiness does not have any moderating influence in the relationship between consumer trust and continuance intentions. Overall, the authors conclude by discussing conceptual contributions, methodological implications, limitations, and future research directions of the study. This paper aims to review the fiercely discussed question of whether the ranking of Wikipedia articles in search engines is justified by the quality of the articles. 
After an overview of current research on information quality in Wikipedia, a summary of the extended discussion on the quality of encyclopedic entries in general is given. On this basis, a heuristic method for evaluating Wikipedia entries is developed and applied to Wikipedia articles that scored highly in a search engine retrieval effectiveness test and compared with the relevance judgment of jurors. In all search engines tested, Wikipedia results are unanimously judged better by the jurors than other results on the corresponding results position. Relevance judgments often roughly correspond with the results from the heuristic evaluation. Cases in which high relevance judgments are not in accordance with the comparatively low score from the heuristic evaluation are interpreted as an indicator of a high degree of trust in Wikipedia. One of the systemic shortcomings of Wikipedia lies in its necessarily incoherent user model. A further tuning of the suggested criteria catalog, for instance, the different weighting of the supplied criteria, could serve as a starting point for a user model differentiated evaluation of Wikipedia articles. Approved methods of quality evaluation of reference works are applied to Wikipedia articles and integrated with the question of search engine evaluation. Previous studies have found that both (a) the characteristics (e.g., quality and accessibility) (e.g., Fidel & Green, 2004) and (b) the types of sources (e.g., relational and nonrelational sources) (e.g., Zimmer, Henry, & Butler, 2007) influence information source selection. Different from earlier studies that have prioritized one source attribute over the other, this research uses information need as a contingency factor to examine information seekers' simultaneous consideration of different attributes. 
An empirical test from 149 employees' evaluations of eight information sources revealed that (a) low- and high-information-need individuals favored information source quality over accessibility while medium-information-need individuals favored accessibility over quality; and (b) individuals are more likely to choose relational over nonrelational sources as information need increases. Despite improvements in their capabilities, search engines still fail to provide users with only relevant results. One reason is that most search engines implement a "one size fits all" approach that ignores personal preferences when retrieving the results of a user's query. Recent studies (Smyth, 2010) have elaborated the importance of personalizing search results and have proposed integrating recommender system methods for enhancing results using contextual and extrinsic information that might indicate the user's actual needs. In this article, we review recommender system methods used for personalizing and improving search results and examine the effect of two such methods that are merged for this purpose. One method is based on collaborative users' knowledge; the second integrates information from the user's social network. We propose new methods for collaborative- and social-based search and demonstrate that each of these methods, when separately applied, produces more accurate search results than does a purely keyword-based search engine (referred to as "standard search engine"), where the social search engine is more accurate than is the collaborative one. However, separately applied, these methods do not produce a sufficient number of results (low coverage). Nevertheless, merging these methods with those implemented by standard search engines overcomes the low-coverage problem and produces personalized results that are significantly more accurate than those of standard search engines while also providing sufficient coverage. 
The improvement, however, is significant only for topics for which the diversity of terms used for queries among users is low. Following the transition from print journals to electronic (hybrid) journals in the past decade, usage metrics have become an interesting complement to citation metrics. In this article we investigate the similarities of and differences between usage and citation indicators for pharmacy and pharmacology journals and relate the results to a previous study on oncology journals. For the comparison at journal level we use the classical citation indicators as defined in the Journal Citation Reports and compute the corresponding usage indicators. At the article level we not only relate download and citation counts to each other but also try to identify the possible effect of citations upon subsequent downloads. Usage data were provided by ScienceDirect both at the journal level and, for a few selected journals, on a paper-by-paper basis. The corresponding citation data were retrieved from the Web of Science and Journal Citation Reports. Our analyses show that electronic journals have become generally accepted over the last decade. While the supply of ScienceDirect pharma journals rose by 50% between 2001 and 2006, the total number of article downloads (full-text articles [FTAs]) multiplied more than 5-fold in the same period. This also impacted the pattern of scholarly communication (strong increase in the immediacy index) in the past few years. Our results further reveal a close relation between citation and download frequencies. We computed a high correlation at the journal level when using absolute values and a moderate to high correlation when relating usage and citation impact factors. At the article level the rank correlation between downloads and citations was only medium-sized. Differences between downloads and citations exist in terms of obsolescence characteristics. 
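The article-level rank correlation between downloads and citations reported above is a Spearman-type statistic (Pearson correlation of the ranks, with ties given average ranks). A self-contained sketch, with hypothetical full-text-download and citation counts:

```python
def rankdata(values):
    """Average ranks (1-based), with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

downloads = [120, 300, 45, 80, 500]   # hypothetical FTA counts per article
citations = [3, 10, 1, 2, 25]         # hypothetical citation counts
print(round(spearman(downloads, citations), 3))  # -> 1.0 (same ordering)
```

A "medium-sized" rank correlation, as found at the article level in the study, would correspond to values well below this perfectly monotone toy example.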
While more than half of the articles are downloaded in the publication year or 1 year later, the median cited half-life was nearly 6 years for our journal sample. Our attempt to reveal a direct influence of citations upon downloads proved not to be feasible. The Preserving Virtual Worlds project has been investigating the preservation of computer games and interactive fiction. The preservation of games benefits from simultaneous application of the data models from the Functional Requirements for Bibliographic Records report, and the Open Archival Information System reference model. The article describes efforts to integrate these two data models within a single Web ontology language for application with multiple XML-based packaging formats. This work identifies changes in dominant topics in library and information science (LIS) over time, by analyzing the 3,121 doctoral dissertations completed between 1930 and 2009 at North American Library and Information Science programs. The authors utilize latent Dirichlet allocation (LDA) to identify latent topics diachronically and to identify representative dissertations of those topics. The findings indicate that the main topics in LIS have changed substantially from those in the initial period (1930-1969) to the present (2000-2009). However, some themes occurred in multiple periods, representing core areas of the field: library history occurred in the first two periods; citation analysis in the second and third periods; and information-seeking behavior in the fourth and last period. Two topics occurred in three of the five periods: information retrieval and information use. One of the notable changes in the topics was the diminishing use of the word library (and related terms). This has implications for the provision of doctoral education in LIS. This work is compared to other earlier analyses and provides validation for the use of LDA in topic analysis of a discipline. 
Among existing theoretical models for the h-index, Hirsch's original approach, the Egghe-Rousseau model, and the Glanzel-Schubert model are the three main representatives. Assuming a power-law relation or Heaps' law between publications and citations, a unified theoretical explanation for these three models is provided. It is shown that on the level of universities, the Glanzel-Schubert model fits best. This paper studies how missing data in the PageRank algorithm influences the ranking of papers and proposes the PrestigeRank algorithm on that basis. We make use of PrestigeRank to give the ranking of all papers in physics in the Chinese Scientific and Technology Papers and Citation Database (CSTPCD) published between 2004 and 2006. We compared the PrestigeRank results with PageRank and citation ranking. We found that PrestigeRank is significantly correlated with PageRank and citation counts. We also used paper citation networks to rank journals, and compared the result with that of journal citation networks. We proposed PR(sum) and PR(ave), and compared both of them with citation counts and the impact factor. The results indicate that PR(sum) and PR(ave) can reflect a journal's authority favorably. We also discuss the advantages and disadvantages, application scope, and application prospects of PrestigeRank in the evaluation of papers and journals. (C) 2010 Elsevier Ltd. All rights reserved. A publication exerts direct and indirect influence on other articles (by citing articles and by articles that cite citing articles) and is itself influenced directly as well as indirectly (by references and references of references). This citation network leads to generations of citing and cited publications. In this contribution we show that these generations can be defined in different ways. We also propose methods to calculate indicators derived from these citation generations. 
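As background for the three h-index models compared above, the empirical h-index and the Glanzel-Schubert estimate h ≈ c · P^(1/3) · (C/P)^(2/3) can be sketched as follows. The constant c is field-dependent; the value used here, like the citation record, is purely illustrative:

```python
def h_index(citations):
    """Empirical h-index: largest h such that h papers have >= h citations."""
    cs = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cs, start=1):
        if c >= i:
            h = i
    return h

def glanzel_schubert(P, C, c=0.9):
    """Glanzel-Schubert model estimate: h ~ c * P**(1/3) * (C/P)**(2/3).

    P = number of publications, C = total citations; the constant c = 0.9
    is an illustrative, field-dependent assumption.
    """
    return c * P ** (1.0 / 3) * (C / P) ** (2.0 / 3)

cits = [25, 18, 12, 9, 7, 5, 3, 2, 1, 0]  # hypothetical citation record
P, C = len(cits), sum(cits)
print(h_index(cits))                       # empirical value
print(round(glanzel_schubert(P, C), 1))    # model-based estimate
```

Fitting such models at the level of universities amounts to comparing the empirical h with each model's estimate over many (P, C) pairs.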
We claim that, when studying a publication's contribution to the evolution of its field or to science in general, taking only direct citations into account tells only part of the story. (C) 2010 Elsevier Ltd. All rights reserved. The crown indicator is a well-known bibliometric indicator of research performance developed by our institute. The indicator aims to normalize citation counts for differences among fields. We critically examine the theoretical basis of the normalization mechanism applied in the crown indicator. We also make a comparison with an alternative normalization mechanism. The alternative mechanism turns out to have more satisfactory properties than the mechanism applied in the crown indicator. In particular, the alternative mechanism has a so-called consistency property. The mechanism applied in the crown indicator lacks this important property. As a consequence of our findings, we are currently moving towards a new crown indicator, which relies on the alternative normalization mechanism. (C) 2010 Elsevier Ltd. All rights reserved. This paper introduces a novel methodology for comparing the citation distributions of research units of a certain size working in the same homogeneous field. Given a critical citation level (CCL), we suggest using two real valued indicators to describe the shape of any distribution: a high-impact and a low-impact measure defined over the set of articles with citations above or below the CCL. The key to this methodology is the identification of a citation distribution with an income distribution. Once this step is taken, it is easy to realize that the measurement of low-impact coincides with the measurement of economic poverty. In turn, it is equally natural to identify the measurement of high-impact with the measurement of a certain notion of economic affluence. 
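The contrast between the crown indicator's normalization and the alternative mechanism is often described as a ratio of averages versus an average of ratios. A minimal sketch under that reading, where the baselines are hypothetical expected citation rates for each paper's field:

```python
def crown_ratio_of_averages(citations, field_baselines):
    """Original crown-style mechanism: total citations over total expected
    citations (a ratio of sums, equivalently of averages)."""
    return sum(citations) / sum(field_baselines)

def crown_average_of_ratios(citations, field_baselines):
    """Alternative mechanism: average of per-paper normalized citation
    scores; this is the form with the consistency property."""
    return sum(c / e for c, e in zip(citations, field_baselines)) / len(citations)

# Hypothetical unit: two papers in a low-citation field (baseline 2)
# and one in a high-citation field (baseline 10).
cits = [4, 2, 10]
base = [2, 2, 10]
print(round(crown_ratio_of_averages(cits, base), 3))
print(round(crown_average_of_ratios(cits, base), 3))
```

The two mechanisms disagree whenever baselines vary across a unit's papers, which is exactly the situation field normalization is meant to handle.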
On the other hand, it is seen that the ranking of citation distributions according to a family of low-impact measures is essentially characterized by a number of desirable axioms. Appropriately redefined, these same axioms lead to the selection of an equally convenient class of decomposable high-impact measures. These two families are shown to satisfy other interesting properties that make them potentially useful in empirical applications, including the comparison of research units working in different fields. (C) 2010 Elsevier Ltd. All rights reserved. Evaluating the scientific output of researchers, research institutions, academic departments and even universities is a challenging issue. To do this, bibliometric indicators are helpful tools, more and more familiar to research and governmental institutions. This paper proposes a structured method to compare academic research groups within the same discipline, by means of some Hirsch (h) based bibliometric indicators. Precisely, five different typologies of indicators are used so as to depict groups' bibliometric positioning within the scientific community. A specific analysis concerning the Italian researchers in the scientific sector of Production Technology and Manufacturing Systems is developed. The analysis is supported by empirical data and can be extended to research groups associated with other scientific sectors. (C) 2010 Elsevier Ltd. All rights reserved. This paper proposes an axiomatic analysis of Impact Factors when used as tools for ranking journals. This analysis draws on the similarities between the problem of comparing distribution of citations among papers and that of comparing probability distributions on consequences as commonly done in decision theory. Our analysis singles out a number of characteristic properties of the ranking based on Impact Factors. We also suggest alternative ways of using distributions of citations to rank order journals. (C) 2010 Elsevier Ltd. All rights reserved. 
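The high- and low-impact measures defined earlier relative to a critical citation level (CCL) are explicitly identified with poverty and affluence measurement. One standard family in that literature is the Foster-Greer-Thorbecke form, used here as an illustrative stand-in; the paper's exact functional forms may differ:

```python
def low_impact(citations, ccl, alpha=1):
    """Poverty-style low-impact measure (FGT-like sketch): average normalized
    citation gap below the CCL, taken over all papers.
    alpha = 0 gives the incidence (headcount ratio); alpha = 1 adds intensity.
    """
    n = len(citations)
    return sum(((ccl - c) / ccl) ** alpha for c in citations if c < ccl) / n

def high_impact(citations, ccl, alpha=1):
    """Symmetric affluence-style high-impact measure above the CCL."""
    n = len(citations)
    return sum(((c - ccl) / ccl) ** alpha for c in citations if c > ccl) / n

cits = [0, 1, 2, 4, 10, 40]  # hypothetical citation distribution
ccl = 4  # in practice the CCL would be, e.g., a world percentile
print(low_impact(cits, ccl, alpha=0))   # share of papers below the CCL
print(high_impact(cits, ccl, alpha=0))  # share of papers above the CCL
```

With alpha = 0 the measures capture incidence only; raising alpha weights the size of the gap, mirroring the move from incidence to intensity described in the empirical application.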
A citation-based indicator for interdisciplinarity has been missing hitherto among the set of available journal indicators. In this study, we investigate network indicators (betweenness centrality), unevenness indicators (Shannon entropy, the Gini coefficient), and more recently proposed Rao-Stirling measures for "interdisciplinarity." The latter index combines the statistics of both citation distributions of journals (vector-based) and distances in citation networks among journals (matrix-based). The effects of various normalizations are specified and measured using the matrix of 8207 journals contained in the Journal Citation Reports of the (Social) Science Citation Index 2008. Betweenness centrality in symmetrical (1-mode) cosine-normalized networks provides an indicator outperforming betweenness in the asymmetrical (2-mode) citation network. Among the vector-based indicators, Shannon entropy performs better than the Gini coefficient, but is sensitive to size. Science and Nature, for example, are indicated at the top of the list. The new diversity measure provides reasonable results when (1 - cosine) is assumed as a measure for the distance, but results using Euclidean distances were difficult to interpret. (C) 2010 Elsevier Ltd. All rights reserved. In this paper we study the effects of field normalization baseline on relative performance of 20 natural science departments in terms of citation impact. Impact is studied under three baselines: journal, ISI/Thomson Reuters subject category, and Essential Science Indicators field. For the measurement of citation impact, the indicators item-oriented mean normalized citation rate and Top-5% are employed. The results, which we analyze with respect to stability, show that the choice of normalization baseline matters. We observe that normalization against the publishing journal stands apart. 
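The vector-based indicators (Shannon entropy, Gini coefficient) and the Rao-Stirling measure discussed above can be sketched for a single journal's citation-share vector. The shares and the distance matrix below are hypothetical; in the study, distances are derived from (1 - cosine) similarities between journals' citation profiles:

```python
import math

def shannon_entropy(p):
    """Shannon entropy of a citation-share vector (higher = more spread)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gini(p):
    """Gini coefficient of the shares (higher = more uneven)."""
    xs = sorted(p)
    n = len(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * sum(xs)) - (n + 1) / n

def rao_stirling(p, dist):
    """Rao-Stirling diversity: sum over pairs of p_i * p_j * d_ij."""
    return sum(p[i] * p[j] * dist[i][j]
               for i in range(len(p)) for j in range(len(p)) if i != j)

# Hypothetical journal citing three subject categories.
shares = [0.5, 0.3, 0.2]
# Symmetric pairwise distances, e.g. (1 - cosine similarity).
d = [[0.0, 0.4, 0.9],
     [0.4, 0.0, 0.7],
     [0.9, 0.7, 0.0]]
print(round(shannon_entropy(shares), 3))
print(round(gini(shares), 3))
print(round(rao_stirling(shares, d), 3))
```

Entropy and Gini use only the share vector, while Rao-Stirling additionally rewards citing categories that are far apart, which is what makes it a candidate interdisciplinarity measure.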
The rankings of the departments obtained when journal is used as baseline, irrespective of indicator, differ considerably from the rankings obtained when ISI/Thomson Reuters subject category or Essential Science Indicators field is used. Since no substantial differences are observed when the baselines Essential Science Indicators field and ISI/Thomson Reuters subject category are contrasted, one might suggest that people without access to subject category data can perform reasonable normalized citation impact studies by combining normalization against journal with normalization against Essential Science Indicators field. (C) 2010 Elsevier Ltd. All rights reserved. We investigate how the benefits of the TeraGrid supercomputing infrastructure are distributed across the scientific community. Do mostly high-impact scientists benefit from the TeraGrid? Are some scientific domains more strongly represented than others in TeraGrid-supported work? To answer these questions, we examine the relation between TeraGrid usage and scientific impact for a set of scientists whose projects relied to varying degrees on the TeraGrid infrastructure. For each scientist we measure TeraGrid usage expressed in terms of allocated Service Units (SU) vs. various indicators of their scientific impact such as the h-index, total citations, and citations per article. Our results show a significant correlation between scientific impact and TeraGrid usage. We furthermore examine the distribution of TeraGrid-related publications across various scientific journals. A superposition of these journals over an existing large-scale map of science shows how TeraGrid-supported work is mostly concentrated in Physics and Chemistry, with a lesser focus on biology. (C) 2010 Elsevier Ltd. All rights reserved. This paper contains the first empirical applications of a novel methodology for comparing the citation distributions of research units working in the same homogeneous field. 
The paper considers a situation in which the world citation distribution in 22 scientific fields is partitioned into three geographical areas: the U.S., the European Union (EU), and the rest of the world (RW). Given a critical citation level (CCL), we suggest using two real valued indicators to describe the shape of each area's distribution: a high- and a low-impact measure defined over the set of articles with citations above or below the CCL. It is found that, when the CCL is fixed at the 80th percentile of the world citation distribution, the U.S. performs dramatically better than the EU and the RW according to both indicators in all scientific fields. This superiority generally increases as we move from the incidence to the intensity and the citation inequality aspects of the phenomena in question. Surprisingly, changes observed when the CCL is increased from the 80th to the 95th percentile are of a relatively small order of magnitude. Finally, it is found that international co-authorship increases the high-impact and reduces the low-impact level in the three geographical areas. This is especially the case for the EU and the RW when they cooperate with the U.S. (C) 2010 Elsevier Ltd. All rights reserved. This paper presents an approach to analyze the thematic evolution of a given research field. This approach combines performance analysis and science mapping for detecting and visualizing conceptual subdomains (particular themes or general thematic areas). It allows us to quantify and visualize the thematic evolution of a given research field. To do this, co-word analysis is used in a longitudinal framework in order to detect the different themes treated by the research field across the given time period. The performance analysis uses different bibliometric measures, including the h-index, with the purpose of measuring the impact of both the detected themes and thematic areas. 
The presented approach includes a visualization method for showing the thematic evolution of the studied field. Then, as an example, the thematic evolution of the Fuzzy Sets Theory field is analyzed using the two most important journals in the topic: Fuzzy Sets and Systems and IEEE Transactions on Fuzzy Systems. (C) 2010 Elsevier Ltd. All rights reserved. Peer review serves a gatekeeper role, the final arbiter of what is valued in academia, but is widely criticized in terms of potential biases-particularly in relation to gender. In this substantive-methodological synergy, we demonstrate methodological and multilevel statistical approaches to testing a null hypothesis model in relation to the effect of researcher gender on peer reviews of grant proposals, based on 10,023 reviews by 6233 external assessors of 2331 proposals from social science, humanities, and science disciplines. Utilizing multilevel cross-classified models, we show support for the null hypothesis model positing that researcher gender has no significant effect on proposal outcomes. Furthermore, these non-effects of gender generalize over assessor gender (contrary to a matching hypothesis), discipline, assessors chosen by the researchers themselves compared to those chosen by the funding agency, and country of the assessor. Given the large, diverse sample, the powerful statistical analyses, and support for generalizability, these results, coupled with findings from previous research, offer strong support for the null hypothesis model of no gender differences in peer reviews of grant proposals. (C) 2010 Elsevier Ltd. All rights reserved. In this paper, we define a First-Citation-Speed-Index (FCSI) for a set of papers, based on their times of publication and of first citation. The index is based on the definition of a h-index for increasing sequences. 
We show that the index has several good properties in the sense that the shorter the times between publication and first citation are (in a global manner), the higher the FCSI is. We present two case studies: a first-citation speed comparison of three journals in the field of psychology and a first-citation speed comparison of manuscripts accepted by the journal Angewandte Chemie International Edition with those rejected but published elsewhere. Both case studies indicate that our FCSI satisfies the intuitive feeling of what values an FCSI should have in these cases. (C) 2010 Elsevier Ltd. All rights reserved. Scientific collaboration and endorsement are well-established research topics which utilize three kinds of methods: survey/questionnaire, bibliometrics, and complex network analysis. This paper combines topic modeling and path-finding algorithms to determine whether productive authors tend to collaborate with or cite researchers with the same or different interests, and whether highly cited authors tend to collaborate with or cite each other. Taking information retrieval as a test field, the results show that productive authors tend to directly coauthor with and closely cite colleagues sharing the same research interests; they do not generally collaborate directly with colleagues having different research topics, but instead directly or indirectly cite them; and highly cited authors do not generally coauthor with each other, but closely cite each other. Published by Elsevier Ltd. The practice of collaboration, and particularly international collaboration, is becoming ever more widespread in scientific research, and is likewise receiving greater interest and stimulus from policy-makers. However, the relation between research performance and degree of internationalization at the level of single researchers still presents unresolved questions. 
The present work, through a bibliometric analysis of the entire Italian university population working in the hard sciences over the period 2001-2005, seeks to answer some of these questions. The results show that the researchers with top performance with respect to their national colleagues are also those who collaborate more abroad, but that the reverse is not always true. Also, interesting differences emerge at the sectorial level. Finally, the effect of the nation involved in the international partnership plays a role that should not be ignored. (C) 2010 Elsevier Ltd. All rights reserved. Accurate computation of h indices or other indicators of research impact requires access to databases supplying complete and accurate citation information. The Web of Science (WoS) database is widely used for this purpose and it is generally deemed error-free. This note describes an inaccuracy that seems to affect non-English sources and targets in WoS differentially, namely, "phantom citations" (i.e., papers reported by WoS to cite some article when they actually did not) and their concentration around particular articles that are thus dubbed "strange attractors". The analysis of references in (and citations to) papers in two English sources and two non-English sources reveals that phantom citations and other errors of indexing occur about twice as often with non-English items. These and other errors of commission affect about 1% of the cited references in the WoS database, and they may reveal large-scale problems in the reference matching algorithm in WoS. (C) 2010 Elsevier Ltd. All rights reserved. The purpose of this study is to test the existence of the exposure effect in journal ranking decisions. The exposure effect emerges when participants of journal ranking surveys assign higher scores to some journals merely because they are more familiar with them rather than on their objective assessment of the overall journal's contribution to the field. 
Analysis of the journal ranking data from a survey of 233 active researchers in the field of knowledge management and intellectual capital confirmed the existence of the exposure effect. Specifically, it was found that: (1) those who previously published in a particular journal rated it higher than those who did not; (2) those who previously served as a reviewer or editor for a particular journal also rated it higher than those who did not; and (3) a very strong correlation was found between the respondents' perceptions of overall contribution of a journal and the degree of their familiarity with this outlet. This investigation confirmed a major limitation of the stated preference journal ranking approach that should be taken into consideration in future research and results interpretation. (C) 2010 Elsevier Ltd. All rights reserved. A proposal is made so that the p-index (a composite performance index that can effectively combine size and quality of scientific papers) can be extended for bibliometric research assessment in cases where multiple authorship is taken into account. The fractional and harmonic p-indices are applied to some recent examples to show their usefulness. It has long been known that scientific output proceeds on an exponential increase, or more properly, a logistic growth curve. The interplay between effort and discovery is clear, and the nature of the functional form has been thought to be due to many changes in the scientific process over time. Here I show a quantitative method for examining the ease of scientific progress, another necessary component in understanding scientific discovery. Using examples from three different scientific disciplines (mammalian species, chemical elements, and minor planets), I find the ease of discovery to conform to an exponential decay. In addition, I show how the pace of scientific discovery can be best understood as the outcome of both scientific output and ease of discovery. 
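The p-index mentioned above combines output and citation quality as p = (C²/P)^(1/3), where C is total citations and P the number of papers. A sketch of the index together with a fractional-counting variant is given below; the exact fractionation scheme in the proposal may differ from this illustration:

```python
def p_index(total_citations, papers):
    """p-index: p = (C**2 / P) ** (1/3), combining size and quality."""
    return (total_citations ** 2 / papers) ** (1.0 / 3)

def fractional_p_index(records):
    """Fractional variant (a sketch): credit both each paper's citations and
    the paper count by 1/number-of-authors before forming the p-index.
    records: list of (citations, n_authors) tuples for one author.
    """
    C = sum(c / a for c, a in records)
    P = sum(1.0 / a for _, a in records)
    return (C ** 2 / P) ** (1.0 / 3)

recs = [(30, 3), (12, 2), (4, 1)]  # hypothetical (citations, authors) records
print(round(p_index(sum(c for c, _ in recs), len(recs)), 2))
print(round(fractional_p_index(recs), 2))
```

A harmonic variant would weight authors by their byline position rather than equally, which is the other extension the proposal discusses.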
A quantitative study of the ease of scientific discovery in the aggregate, such as done here, has the potential to provide a great deal of insight into both the nature of future discoveries and the technical processes behind discoveries in science. Multiple-part manuscripts are those submitted to a journal and intended for publication as a series, usually having "Part 1," "Part I," ... "Part N" in the title. Although some journals prohibit such submissions, other journals (including Monthly Weather Review) have no such restrictions. To examine how reviewers and editors view multiple-part manuscripts, 308 multiple-part manuscripts submitted to Monthly Weather Review from May 2001 through February 2010 were examined. For multiple-part manuscripts having reached a final decision, 67% were accepted, which was also the average acceptance rate of all manuscripts (67%). Part I manuscripts submitted alone had a lower acceptance rate (61%) than the average, whereas Part II manuscripts submitted alone had a higher acceptance rate (77%) than the average. Two-part manuscripts submitted together had an acceptance rate (67%) comparable to the average. Typical reviewer comments for Part I manuscripts submitted alone included the manuscript being too long for the available results and the author making claims in Part I that would be supported in the unseen Part II. Typical comments for Part II manuscripts submitted alone included the somewhat contradictory statements that material was unnecessarily duplicated in the two manuscripts and more repetition was needed between the two parts. For two-part manuscripts submitted together, reviewers often recommended condensing the two manuscripts and merging them into one. In some cases, editors rejected manuscripts even though no reviewer recommended rejection because the sum of all reviewers' comments would require substantial reorganization of the manuscripts. 
The results of this study suggest the following recommendations for authors considering writing multiple-part manuscripts: Write manuscripts that are sensibly independent of each other, make minimal reference to unsubmitted manuscripts, and have sufficient and substantiated scientific content within each manuscript. The research output of India in computer science during 1999-2008 is analyzed in this paper on several parameters including total research output, its growth, rank and global publication share, citation impact, share of international collaborative papers and major collaborative partner countries and patterns of research communication in most productive journals. It also analyses the characteristics of most productive institutions, authors and high-cited papers. The publications output and impact of India is also compared with China, South Korea, Taiwan and Brazil. As obtaining indicator weights is often difficult in all types of evaluation, this paper describes an approach to improve the indicator weights of scientific and technological competitiveness evaluation of Chinese universities. As a public institution funded by Chinese government, the research center for Chinese science evaluation of Wuhan University has completed five annual evaluations for the scientific and technological competitiveness of Chinese universities since 2005, whose abundant and reliable data motivated us to try to improve the weights obtained by the AHP (analytical hierarchy process). Based on these data, we calculated the objective weights of the indicator using the representative mathematical methods of the least square and the variation coefficient. 
As the weights of the AHP can be influenced by the knowledge, experience and preferences of experts, while the calculated objective weights neglect subjective judgement information, we integrated the subjective and objective weights using, respectively, an additive and a multiplicative model to reflect both the subjective considerations of experts and the objective information, obtaining three kinds of integrative weights. Finally, we selected the integrative weights of the multiplicative model as the best weights by comparing and analyzing the evaluation results in 2005 and 2009 for each kind of weights. The results show that the evaluation effect of the multiplicative-model weights is indeed the best for all types of Chinese universities among these kinds of weights, and the experts and university principals consulted also basically reached a consensus on the university rankings produced by the integrative weights of the multiplicative model. China is becoming a leading nation in terms of its share of the world's publications in the emerging nanotechnology domain. This paper demonstrates that the international rise of China's position in nanotechnology has been underwritten by the emergence of a series of regional hubs of nanotechnology R&D activity within the country. We develop a unique database of Chinese nanotechnology articles covering the period 1990 to mid-2006 to identify the regional distribution of nanotechnology research in China. To build this database, a new approach was developed to clean and standardize the geographical allocation of Chinese publication records. We then analyze the data to understand the regional development of nanotechnology research in China over our study period and to map interregional and international research collaboration linkages. 
We find that the geographical distribution of China's domestic nanotechnology research is characterized by regional imbalance, with most of the leading regions located in eastern China, including not only Beijing and Shanghai but also a series of other new regional hubs. There is much less development of nanotechnology research in central and western China. Beijing, Shanghai, and Hong Kong are among the leading Chinese regions for international nanotechnology research collaboration. Other Chinese nanotechnology regions are less focused on international collaboration, although they have developed domestic interregional collaborations. Although new regional research hubs have emerged in the nanotechnology domain, the paper notes that their concentration in eastern China reinforces existing imbalances in science and technology capabilities in China, which in turn may further reinforce the dominant position of eastern China in the commercialization of new technologies such as nanotechnology. We obtained statistically significant data to verify the intuitive impression that collaboration leads to higher impact. We selected eight scientific journals to analyze the correlations between the number of citations and the number of coauthors. Across the journals, single-authored articles always received the fewest citations. Articles with fewer than five coauthors received fewer citations than the journal average. We also provide a simple measure of the value of authorship with regard to the increase in the number of citations. Compared to the citation distribution, similar but smaller fluctuations appeared in the coauthor distribution. Around 70% of the citations were accumulated in 30% of the papers, while 60% of the coauthors appeared in 40% of the papers. We find that predicting the citation number from the coauthor number can be more reliable than predicting the coauthor number from the citation number. 
For both the citation distribution and the coauthor distribution, the standard deviation is larger than the average value. We caution against the use of such an unrepresentative average value: it can be biased significantly by an extreme minority and might not reflect the majority. We define converging research as the emergence of an interdisciplinary research area from fields that did not show interdisciplinary connections before. This paper presents a process to search for converging research using journal subject categories as a proxy for fields and citations to measure interdisciplinary connections, as well as an application of this search. The search consists of two phases: a quantitative phase in which pairs of citing and cited fields are located that show a significant change in number of citations, followed by a qualitative phase in which thematic focus is sought in publications associated with the located pairs. Applying this search to publications from the Web of Science published between 1995 and 2005, 38 candidate converging pairs were located, 27 of which showed thematic focus, and 20 also showed a similar focus in the reciprocal pair. Although composition of bibliometric indicators appears to be desirable, in many cases it may be misleading. After a brief introduction on the properties of scales of measurement, the attention of this communication is focused on a recent composite indicator, the hg-index, suggested by Alonso et al. (Scientometrics 82(2):391-400, 2010). Specifically, the hg-index has three major criticalities: (1) the hg scale is the result of a composition of the h- and g-indices, which are both defined on ordinal scales; (2) the equivalence classes of hg are questionable, and the substitution rate between h and g may change arbitrarily depending on the specific h and g values; (3) the apparent increase in granularity of hg, with respect to h and g, is illusory and misleading. The argument is supported by several examples. 
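For reference, the hg-index criticized in the preceding abstract is defined by Alonso et al. as the geometric mean of a scientist's h- and g-indices. A minimal sketch of the three computations, assuming citation counts are supplied as a plain list of integers (the example data are hypothetical), is:

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def hg_index(citations):
    """Geometric mean of h and g (Alonso et al. 2010)."""
    return math.sqrt(h_index(citations) * g_index(citations))
```

The sketch makes the criticized composition visible: hg is a real-valued combination of two indices that are themselves only ordinal, so its extra decimal places add no genuine granularity.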
There is an evident and rapid trend towards the adoption of evaluation exercises for national research systems for purposes, among others, of improving allocative efficiency in the public funding of individual institutions. However, the desired macroeconomic aims could be compromised if the internal redistribution of government resources within each research institution does not follow a consistent logic: the intended effects of national evaluation systems can result only if a "funds for quality" rule is followed at all levels of decision-making. The objective of this study is to propose a bibliometric methodology for: (i) large-scale comparative evaluation of research performance by individual scientists, research groups and departments within a research institution, to inform selective funding allocations; and (ii) assessment of strengths and weaknesses by field of research, to inform strategic planning and control. The proposed methodology has been applied to the hard science disciplines of the Italian university research system for the period 2004-2006. The trend to use administrative health care databases as research material is increasing but not well explored. Taiwan's National Health Insurance Research Database (NHIRD), one of the largest administrative health care databases in the world, has been used widely in academic studies. This study analyzed 383 NHIRD studies published between 2000 and 2009 to quantify the effects on overall growth, scholarly response, and the spread of the study fields. NHIRD studies expanded rapidly in both quantity and quality after the first study was published in 2000. Researchers usually collaborated to share the knowledge that was crucial to processing the NHIRD data. However, once this fundamental problem had been overcome, success in getting published became more reproducible. NHIRD studies were also published in a diverse and growing number of journals. Both general health and clinical science studies benefited from the NHIRD. 
In conclusion, this new research material greatly promotes scientific production. The experience of Taiwan's NHIRD should encourage national- or institutional-level data holders to consider re-using their administrative databases for academic purposes. Inventions combine technological features. When features are barely related, burdensomely broad knowledge is required to identify the situations that they share. When features are overly related, burdensomely broad knowledge is required to identify the situations that distinguish them. Thus, according to my first hypothesis, when features are moderately related, the costs of connecting and the costs of synthesizing are cumulatively minimized, and the most useful inventions emerge. I also hypothesize that continued experimentation with a specific set of features is likely to lead to the discovery of decreasingly useful inventions; the earlier-identified connections reflect the more common consumer situations. Covering data from all industries, the empirical analysis provides broad support for the first hypothesis. Regressions to test the second hypothesis are inconclusive when examining industry types individually. Yet this study represents an exploratory investigation, and future research should test refined hypotheses with more sophisticated data, such as that found in literature-based discovery research. The aim of this study is to use the Japanese university employee list (published by Kojunsha) to compile a database of teacher transferrals in higher education (HM-DB) at 9 points in time over the 21-year period from 1988 to 2008, and then to use this database to assess and analyze the status of national university teachers immediately before and after assuming office as professors, in order to gain some understanding of the transferral mechanisms of teachers at Japan's national universities. 
From the results of cross-tabulation analysis, it has become clear that a growing proportion of transfers involving the appointment of professors involve movements between very similar universities (a transferral blocking phenomenon), and that there is a growing tendency for professorial appointments to involve a migration from universities with a lower share of published research papers to universities with a higher share. Also, by constructing a log-linear model and performing a residual analysis, we have found that although these trends are clearly apparent, they do not yet have a great deal of influence. As an adaptation to their new environment, universities have engaged in various organisational innovations and taken a more active role in the orientation of the researcher. The emerging institutional management imposes specific constraints and opportunities on researchers. Thus, the impact of institutional membership, notably through different institutional policies, is increasingly a dominant force in academic working lives. However, some scholars have argued that the context of researchers remains an Ivory Tower situation, where academic working life is defined through the twin discourses of academic freedom and professional autonomy. This article analyses the activities of research faculty members funded by the Natural Sciences and Engineering Research Council of Canada, in comparison to the theories that contribute to the explanation of researchers' behaviour. By using intra-class correlation, which is based on a multi-level analysis of the variance distribution, we find that the grouping effect is still small. In other words, despite the emerging constraints and opportunities determined by their institutional context, researchers still exist in an Ivory Tower, where the explanation of their behaviour is still a matter of individual differences. 
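The intra-class correlation used in the preceding abstract measures how much of the total variance in an outcome lies between groups rather than within them; a small value means institutional membership explains little. A hedged sketch using a simple variance decomposition (the grouping and the numbers are hypothetical, not the study's data) is:

```python
def intraclass_correlation(groups):
    """Share of total variance attributable to group membership:
    ICC = between-group variance / (between-group + within-group variance)."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # within-group variance: squared deviations from each group's own mean
    within = sum((x - sum(g) / len(g)) ** 2
                 for g in groups for x in g) / len(all_vals)
    # between-group variance: squared deviations of group means from the grand mean
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups) / len(all_vals)
    return between / (between + within)
```

An ICC near 0 (identical group means) corresponds to the "Ivory Tower" finding: differences are individual, not institutional.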
Disciplines vary in the types of communicative genres they use to disseminate knowledge and in the citing patterns used within these genres. However, citation analyses have predominantly relied on the references and citations of one type of communicative genre. It is argued that this is particularly problematic for studies of interdisciplinarity, where analyses are biased towards the disciplines that communicate using the genre under investigation. This may lead to inaccurate or incomplete results in terms of fully understanding the interrelationships between disciplines. This study analyzes a set of 15,870 references from 97 US dissertations in order to demonstrate the difference in discipline and author rankings depending on the genre under investigation. This work encourages future work that takes into account multiple types of citing and cited works, especially where indicators of interdisciplinarity are used for the allocation of resources or the ranking of scholars. This article analyzes some of the most popular scientific journals in the Manufacturing field from the point of view of four bibliometric indicators: the ISI impact factor (ISI-IF), the Hirsch (h) index for journals, the total number of citations, and the h-spectrum. The h-spectrum is a novel tool based on h, making it possible to (i) identify a reference profile of the typical authors of a journal, (ii) compare different journals and (iii) provide a rough indication of their "bibliometric positioning" in the scientific community. Results of this analysis can be helpful for guiding potential authors and members of the scientific community in the Manufacturing area. Of particular interest is the construction of maps based on the h-spectrum and the ISI-IF to compare journals and monitor their bibliometric positioning over time. A large amount of empirical data is presented and discussed. This study presents a historical overview of the International Conference on Human Robot Interaction (HRI). 
It summarizes the conference's growth, internationalization and collaboration. Rankings for countries, organizations and authors are provided. Furthermore, an analysis of the military funding for HRI papers is performed. Approximately 20% of the papers are funded by the US military. The proportion of papers from the US is around 65%, and the dominant role of the US is only challenged by the strong position of Japan, in particular by the contributions by AIR. The study focuses on publication activity, citation impact and citation links between publications and patents in biotechnology. The European Union (EU), US, Japan and China are the most important global players. However, the landscape is changing, since the EU and the US are losing ground to challenges from a group of emerging economies. National profiles differ between the two groups of main players and upcoming countries; the focus on red biotechnology in the US and Europe is contrasted by a propensity for white and green biotechnology in Asia. Furthermore, the subject profile of biotechnology papers citing patents and cited by patents, as well as the relationship between patent citations and citation impact in the scientific literature, is explored. Papers that cite patents tend to reflect a propensity towards white biotechnology, while patent-cited publications have a higher relative share in red biotechnology. No significant difference concerning the citation impact of publications citing patents and those not citing patents can be found. This is contrasted by the observation that patent-cited papers perform distinctly better in terms of standard bibliometric indicators than comparable publications that are not linked to technology in this way. Academic research groups are treated as complex systems and their cooperative behaviour is analysed from a mathematical and statistical viewpoint. 
Contrary to the naive expectation that the quality of a research group is simply given by the mean calibre of its individual scientists, we show that intra-group interactions play a dominant role. Our model manifests phenomena akin to phase transitions which are brought about by these interactions, and which facilitate the quantification of the notion of critical mass for research groups. We present these critical masses for many academic areas. A consequence of our analysis is that the overall research performance of a given discipline is improved by supporting medium-sized groups over large ones, while small groups must strive to achieve critical mass. The HIV/AIDS pandemic is of international interest, with the 2008 Nobel Prize in physiology or medicine having been awarded for the discovery of the virus that causes AIDS. South Africa has a particular interest in the field of HIV/AIDS research, as it is the country with the largest number of HIV infections in the world and the issue has created a number of political and scientific debates. This investigation identifies the state of HIV/AIDS related research in South Africa vis-a-vis the rest of the world using evaluative scientometrics in order to inform relevant policy. South Africa is identified as producing an increasing number of HIV/AIDS related publications, making it one of the most prolific fields in the country. The rest of the world appears to have stabilized its research efforts after the development of highly active antiretroviral therapies. The USA is identified as the main producer of HIV/AIDS research, while Europe appears to under-emphasise the issue. Comparison of the world's most prolific universities with those in South Africa shows that the latter has a fragmented system. A number of policy issues are discussed. The CiteSeer digital library is a useful source of bibliographic information. 
It allows for retrieving citations, co-authorships, addresses, and affiliations of authors and publications. In spite of this, it has been used relatively rarely for automated citation analyses. This article describes our findings after extensive mining of the CiteSeer data. We explored citations between authors and determined rankings of influential scientists using various evaluation methods, including citation and in-degree counts, HITS, PageRank, and its variations based on both the citation and collaboration graphs. We compare the resulting rankings with lists of computer science award winners and find that award recipients are almost always ranked high. We conclude that CiteSeer is a valuable, yet not fully appreciated, repository of citation data and is appropriate for testing novel bibliometric methods. The criteria for the evaluation of scientific journals have changed from characteristics of their contents to citations of their articles. Among the many problems associated with citation-based evaluation methods are that they are applicable only to a limited number of journals, the preferential selection of citable documents, the differential values given to citations, the time duration for assessment, etc. The proposed index, Aggregated Citations of Cited Articles (ACCA), is calculated from citation data derived only from cited articles, and therefore can be validated from standard databases. While giving more importance to citations, the number of cited articles published in a journal also has some influence in the new index. The calculated values are consistent over time and can be used to back-track the status of a journal in its past and for continued evaluation. The new index ensures neutrality, qualitative and quantitative hierarchy, and consistency in the estimation of journal rankings. In 1989 the Spanish Government established an individual retrospective research evaluation system (RES) for public researchers. 
Policy makers have associated the establishment of this evaluation system with the significant increase in the volume of scientific publications attributed to Spain over the last decades. In a similar vein to analyses of other country cases, some scholars have also claimed that the growth of Spain's international scientific publications is a result of the establishment of the new evaluation system. In this paper, we provide a methodological revision of the validity threats in previous research, including some interrupted time-series analyses and control groups, to investigate the effects of this policy instrument on the number of papers produced by Spanish authors. In the years following the establishment of the evaluation system, the results indicate a considerable increase in the number of papers attributed to Spanish authors among those eligible for evaluation (the "treated" group), but also in the control groups. After testing various alternative explanations, we conclude that the growth in Spanish publications cannot be attributed indisputably to the effect of the establishment of the RES, but rather to the increase in expenditure and number of researchers in the Spanish R&D system, along with some maturation effects. We take this case as an example of the need to improve and refine methodologies and to be more cautious when attributing effects to research evaluation mechanisms at the national level. The aim of this paper is to identify the research status quo of pervasive and ubiquitous computing via scientometric analysis. Information visualization and knowledge domain visualization techniques were adopted to determine how the study of pervasive and ubiquitous computing has evolved. A total of 5,914 papers published between 1995 and 2009 were retrieved from the Web of Science with a topic search of pervasive or ubiquitous computing. CiteSpace is a Java application for analyzing and visualizing a wide range of networks from bibliographic data. 
Using it, we generated the subject category network to identify the leading research fields, the research power network to find the most productive countries and institutes, the journal co-citation map to identify the distribution of core journals, the author co-citation map to identify key scholars and their co-citation patterns, and the document co-citation network to reveal the ground-breaking literature and detect the co-citation clusters on pervasive and ubiquitous computing; we also depicted the hybrid network of keywords and noun phrases to explore research foci on pervasive and ubiquitous computing over the entire span 1995-2009. This study used a bibliometric method to find quantitative evidence of publication and citing patterns within UK academia. The publications of a random sample of UK research-active academics for each of the years 2003 and 2008 were collected and analysed to gather data regarding referencing practices, along with any identifiable trends between the 2 years. References were categorised by type of material to show the proportions of each type used. Comparisons between the 2 years showed that the use of journal articles had increased. There was also an increase in the average number of publications per author. A large number of authors had no publications in the target years. Policy makers, at various levels of governance, generally encourage the development of research collaboration. However, the underlying determinants of collaboration are not completely clear. In particular, the literature lacks studies that, taking the individual researcher as the unit of analysis, attempt to understand if and to what extent the researcher's scientific performance might impact on his/her degree of collaboration with foreign colleagues. The current work examines the international collaborations of Italian university researchers for the period 2001-2005, and puts them in relation to each individual's research performance. 
The results of the investigation, which assumes co-authorship as a proxy of research collaboration, show that both research productivity and the average quality of output have positive effects on the degree of international collaboration achieved by a scientist. Yearly publication counts of research institutions and universities continue to be a widely-used parameter to assess their research productivity, and such evaluations have been successfully used to analyze the influence of research support policies at various levels. This study was designed to analyze the yearly number of articles having an Akdeniz University address that appeared in the Web of Science databases from 1996 to 2009. Time series analysis of the number of published articles was used to determine the impact of alterations in the number of faculty members and research funding, as well as changes in the institutional and country-wide research support policies and encouragement mechanisms. It was observed that alterations in both the number of faculty members who are active in research and the total amount of research funding each year may explain the general pattern of published articles. However, there is a period with significant deviations from the trend predicted by these relationships. This period, corresponding to the years 2002-2008, is discussed in terms of the effects of policy changes which may have made positive and negative contributions to the predicted pattern. Mathematical analysis of publication time series, together with parameters expected to affect research output, may provide valuable insight into the effectiveness of research support mechanisms. This paper aims to reveal the relationships and structure of library and information science (LIS) journals in China. 24 core LIS journals in China are selected, and the relevant journal co-citation data are retrieved from the Chinese Journal Full-Text Database constructed by the China National Knowledge Infrastructure for the period 1999-2009. 
By calculating mean co-citation frequencies and correlation coefficients, we find that there is a strong relationship among LIS journals in China. Utilizing the methods of cluster analysis, multidimensional scaling analysis and factor analysis, we analyze the journal co-citation data. LIS journals in China are divided into four clusters. The relatedness among journals is shown clearly through their locations in the two-dimensional map. A three-factor solution is obtained with the factor loading of each journal. Finally, we interpret and discuss the results to draw some conclusions, and we also expect to describe the network characteristics of journal co-citation in future research. The f-value is a new indicator that measures the importance of a research article by taking into account all citations received, directly and indirectly, up to depth n. The f-value considers all information present in a citation graph in order to produce a ranking of the articles. Apart from the mathematical equation that calculates the f-value, we also present the corresponding algorithm with its implementation, plus an experimental comparison of the f-value with two known indicators of an article's scientific importance, namely the number of citations and PageRank for citation analysis. Finally, we discuss the similarities and differences among the indicators. Technology analysis is a process which uses textual analysis to detect trends in technological innovation. Co-word analysis (CWA), a popular method for technology analysis, encompasses (1) defining a set of keyword or key phrase patterns which are represented in technology-dependent terms, (2) generating a network that codifies the relations between occurrences of keywords or key phrases, and (3) identifying specific trends from the network. However, defining the set of keyword or key phrase patterns relies heavily on the effort of experts, who may be expensive or unavailable. 
Furthermore, defining keyword or key phrase patterns for new or emerging technology areas may be a difficult task even for experts. To overcome this limitation of CWA, this research adopts a property-function based approach. The property is a specific characteristic of a product, and is usually described using adjectives; the function is a useful action of a product, and is usually described using verbs. Properties and functions represent the innovation concepts of a system, so they show innovation directions in a given technology. The proposed methodology automatically extracts properties and functions from patents using natural language processing. Using properties and functions as nodes, and co-occurrences as links, an invention property-function network (IPFN) can be generated. Using social network analysis, the methodology analyzes the technological implications of indicators in the IPFN. Therefore, without predefining keyword or key phrase patterns, the methodology helps experts concentrate on the knowledge services that identify trends in technological innovation from patents. The methodology is illustrated using a case study of patents related to silicon-based thin film solar cells. This article studies the interdisciplinarity and the intellectual base of 34 literature journals using citation data from the Web of Science. Data from two time periods, 1978-1987 and 1998-2007, were compared to reveal changes in the interdisciplinary citing of monographs. The study extends the analysis to non-source publications, using the classification of monographs to show changes in the intellectual base. There is support for increased interdisciplinary citing of sources, especially to the social sciences, and changes in the intellectual base reflect this. The results are explained using theories on the intellectual and social organization of scientific fields, and the use of bibliometric methods on the humanities is discussed. 
The article demonstrates how citation analysis can provide insights into the communication patterns and intellectual structure of scholarly fields in the arts and humanities. This paper describes the different forms of, and tries to give reasons for, international scientific collaboration in general. In particular, it focuses on eleven countries in the Asia-Pacific region, evaluating their national research output with the help of bibliometric indicators. Over two million journal articles published by these countries between 1998 and 2007 in ISI-listed periodicals are analyzed. Discipline-specific publication and citation profiles reveal national strengths and weaknesses in the different research domains. The exponential increase in publication output by China over the last few years is astonishing, but in terms of visibility, i.e. citation rates, China cannot keep up with the leading science nations, remaining below the world average. A discipline-specific analysis shows that Chinese authors took an active part in more than a quarter of all articles and reviews published in the field of materials science in 2007, while their contribution to medical research is very low. Co-publication networks among the eleven countries are generated to observe the development of cooperation bonds in the region. Applying Salton's measure of international collaboration strength, an above-average strengthening of scientific collaboration in the Asia-Pacific region can be observed. Better research quality not only inspires scholars to continue their research, but also increases the possibility of higher research budgets from sponsors. Given the importance of research quality, this study proposes that utilizing social capital (i.e., research collaboration) might be a promising avenue to achieve better research quality. In addition, as every scholar has his or her own expertise and knowledge, the diversity of collaborating members might be an extra resource for reinforcing research quality. 
The purpose of this study is to investigate the impact of research collaboration and member diversity on research quality, including the number of citations, the impact factor, and the size of the research award. To explore unknown associations, the author adopts two data sources, the Social Science Citation Index database and the academic database of a university, to verify the hypotheses. The results show that the more intensely scholars are embedded in a collaboration network, the higher their research quality. However, member diversity does not seem to be a major concern during the organization of a research group: research quality is not affected, regardless of whether a scholar collaborates with different or the same co-authors. This study analyzed the use of acknowledgements in medical articles published in five countries (Venezuela, Spain, France, the UK and the USA) from 1950 to 2010. For each country, we selected 54 papers (18 research papers, 18 reviews and 18 case reports), evenly distributed over six decades, from the two medical journals with the highest impact factors. Only papers written by native speakers in the national language were included. The evolution of the frequency and length of acknowledgments was analyzed. Of the 270 articles studied, 127 (47%) had acknowledgments. The presence of acknowledgments was associated with country (p = 0.001), this section being more common and longer in US and UK journals. Acknowledgments were most common in research papers (70 vs. 40% in case reports and 31% in reviews, p < 0.001). Reviews without acknowledgments were significantly more common than those with them (69 vs. 31%), but there was no trend in case reports. Altogether, articles with acknowledgments predominated only after 2000. Since the frequency of use of acknowledgments remained stable over time in US and UK journals but increased in non-Anglophone journals, the overall increase is attributed to the change in non-English publications. 
Authors acknowledged sub-authorship more in English-language journals than in those published in the national language in France, Spain and Venezuela. However, the practice of acknowledging is increasing in non-Anglophone journals. We conclude that the concept of intellectual indebtedness differs not only from one geographical context to another, but also over time and from one academic genre to another. In the assessment of the success of new analgesic drugs over the past 50 years (Kissin, Anesth Analg 110:780-789, 2010) we observed a difference in the publication response to a new drug between biomedical journals in general and top journals: the number of published articles on a drug increased (and declined) more rapidly in the top journals. Based on this phenomenon we present a new publication indicator, the Top Journal Selectivity Index (TJSI). It represents the ratio between the number of all types of articles in the top 20 biomedical journals and the number of articles in all (> 5,000) journals covered by Medline, over the 5 years after a drug's introduction. Ten analgesics developed during the period 1986-2009 were selected for analysis. Three publication indices were used for assessment: the number of all types of articles presented in Medline, the number of articles covering only randomized controlled trials (RCT), and the Top Journal Selectivity Index. We also assessed the success score in the development of these analgesics based on the following criteria: novelty of molecular target, analgesic efficacy, and response by the pharmaceutical market. The relationships between the publication indices and the analgesics' success scores were determined with the use of the Pearson correlation coefficient. A positive relationship was found only with the Top Journal Selectivity Index (r = 0.876, p < 0.001). We suggest that this index can predict success in drug development, at least in the field of analgesics. 
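As defined above, the TJSI is simply the ratio of top-journal articles to all Medline-indexed articles on a drug, accumulated over the 5 years after its introduction. A minimal sketch of the computation (the function name and the yearly counts are hypothetical, for illustration only):

```python
def top_journal_selectivity_index(top20_counts, all_counts):
    """Top Journal Selectivity Index (TJSI): ratio between the number of
    articles on a drug in the top 20 biomedical journals and the number
    of articles in all Medline-covered journals, over the 5 years after
    the drug's introduction. Both arguments are lists of yearly counts."""
    top = sum(top20_counts)
    total = sum(all_counts)
    if total == 0:
        raise ValueError("no articles recorded for this drug")
    return top / total

# Hypothetical yearly counts for an illustrative drug (years 1-5):
tjsi = top_journal_selectivity_index([3, 5, 4, 2, 1], [40, 80, 100, 60, 20])
print(tjsi)  # 15 / 300 = 0.05
```

A higher TJSI means a disproportionate share of the early literature on a drug appeared in the top journals, which the authors found to correlate with the drug's eventual success score.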
The Impact Factors (IFs) of the Institute for Scientific Information suffer from a number of drawbacks, among them the statistics used (why should one use the mean and not the median?) and the incomparability among fields of science because of systematic differences in citation behavior among fields. Can these drawbacks be counteracted by fractionally counting citation weights instead of using whole numbers in the numerators? (a) Fractional citation counts are normalized in terms of the citing sources and thus would take into account differences in citation behavior among fields of science. (b) Differences in the resulting distributions can be tested statistically for their significance at different levels of aggregation. (c) Fractional counting can be generalized to any document set, including journals or groups of journals, and thus the significance of differences among both small and large sets can be tested. A list of fractionally counted IFs for 2008 is available online at http://www.leydesdorff.net/weighted_if/weighted_if.xls. The between-group variance among the 13 fields of science identified in the U.S. Science and Engineering Indicators is no longer statistically significant after this normalization. Although citation behavior differs largely between disciplines, the reflection of these differences in fractionally counted citation distributions cannot be used as a reliable instrument for classification. I studied the factors (citations, self-citations, and number of articles) that influenced large changes, within a single year, in the impact factors (IFs) of journals. A set of 360 instances of journals with large increases or decreases in their IFs from a given year to the next was selected from journals in the Journal Citation Reports from 1998 to 2007 (40 journals each year). The main factor influencing large changes was the change in the number of citations. 
About 54% of the increases and 42% of the decreases in the journal IFs were associated with changes in the journal self-citations. This article aims to identify whether different weighted PageRank algorithms can be applied to author citation networks to measure the popularity and prestige of a scholar from a citation perspective. Information retrieval (IR) was selected as a test field and data from 1956-2008 were collected from Web of Science. Weighted PageRank scores, with citations and publications as weight vectors, were calculated on author citation networks. The results indicate that both popularity rank and prestige rank were highly correlated with the weighted PageRank. Principal component analysis was conducted to detect relationships among these different measures. For capturing prize winners within the IR field, prestige rank outperformed all the other measures. Effective patent management is essential for organizations to maintain their competitive advantage. The classification of patents is a critical part of patent management and industrial analysis. This study proposes a hybrid patent-classification approach that combines a novel patent-network-based classification method with three conventional classification methods to analyze query patents and predict their classes. The novel patent network contains various types of nodes that represent different features extracted from patent documents. The nodes are connected based on the relationship metrics derived from the patent metadata. The proposed classification method predicts a query patent's class by analyzing all reachable nodes in the patent network and calculating their relevance to the query patent. It then classifies the query patent with a modified k-nearest neighbor classifier. To further improve the approach, we combine it with content-based, citation-based, and metadata-based classification methods to develop a hybrid classification approach. 
We evaluate the performance of the hybrid approach on a test dataset of patent documents obtained from the U.S. Patent and Trademark Office, and compare its performance with that of the three conventional methods. The results demonstrate that the proposed patent-network-based approach yields more accurate class predictions than the three conventional methods. National exercises for the evaluation of research activity by universities are becoming regular practice in ever more countries. These exercises have mainly been conducted through the application of peer-review methods. Bibliometrics has not been able to offer a valid large-scale alternative because of almost overwhelming difficulties in identifying the true author of each publication. We address this problem by presenting a heuristic approach to author name disambiguation in bibliometric datasets for large-scale research assessments. The application proposed concerns the Italian university system, comprising 80 universities and a research staff of over 60,000 scientists. The key advantage of the proposed approach is the ease of implementation. The algorithms are of practical application and have considerably better scalability and expandability properties than state-of-the-art unsupervised approaches. Moreover, the performance in terms of precision and recall, which can be further improved, seems thoroughly adequate for the typical needs of large-scale bibliometric research assessments. The production of scientific knowledge has evolved from a process of inquiry largely based on the activities of individual scientists to one grounded in the collaborative efforts of specialized research teams. This shift brings to light a new question: how the composition of scientific teams affects their production of knowledge. 
This study employs data from 1,415 experiments conducted at the National High Magnetic Field Laboratory (NHMFL) between 2005 and 2008 to identify and select a sample of 89 teams and examine whether team diversity and network characteristics affect productivity. The study examines how the diversity of science teams along several variables affects overall team productivity. Results indicate that several diversity measures are associated with network position and team productivity. Teams with mixed institutional associations were more central to the overall network compared with teams that primarily comprised NHMFL's own scientists. Team cohesion was positively related to productivity. The study indicates that high productivity in teams is associated with high disciplinary diversity and low seniority diversity of team membership. Finally, an increase in the share of senior members negatively affects productivity, and teams with members in central structural positions perform better than other teams. This is a study of coverage and overlap in second-generation social sciences and humanities journal lists, with attention paid to curation and the judgment of scholarliness. We identify four factors underpinning coverage shortfalls: journal language, country, publisher size, and age. Analyzing these factors turns our attention to the process of assessing a journal as scholarly, which is a necessary foundation for every list of scholarly journals. Although scholarliness should be a quality inherent in the journal, coverage falls short because groups assessing scholarliness have different perspectives on the social sciences and humanities literature. That the four factors shape perspectives on the literature points to a deeper problem of fragmentation within the scholarly community. We propose reducing this fragmentation as the best method to reduce coverage shortfalls. 
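Coverage and overlap between curated journal lists of the kind studied above reduce to simple set operations on the journals each list contains. A minimal sketch, with hypothetical list contents (a locally focused title illustrates how the language/country factors produce coverage shortfalls):

```python
def coverage_stats(list_a, list_b):
    """Coverage and overlap between two curated journal lists.

    Returns the share of each list that also appears in the other,
    plus the Jaccard overlap of the two lists (all values in [0, 1]).
    """
    a, b = set(list_a), set(list_b)
    inter = a & b
    return {
        "a_covered_by_b": len(inter) / len(a),
        "b_covered_by_a": len(inter) / len(b),
        "jaccard_overlap": len(inter) / len(a | b),
    }

# Hypothetical second-generation journal lists:
stats = coverage_stats(
    ["JASIST", "Scientometrics", "J. Informetrics", "Revista Local"],
    ["JASIST", "Scientometrics", "J. Informetrics"],
)
print(stats)
```

Here list B covers only 3 of list A's 4 journals (0.75), while list A fully covers list B; asymmetries like this are what the coverage analysis quantifies across many lists.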
The development of visual retrieval methods requires information about user interaction with images, including their description and categorization. This article presents the development of a categorization model for magazine images based on two user studies. In Study 1, we elicited 10 main classes of magazine image categorization criteria through sorting tasks with nonexpert and expert users (N = 30). Multivariate methods, namely multidimensional scaling and hierarchical clustering, were used to analyze similarity data. Content analysis of category names gave rise to classes that were synthesized into a categorization framework. The framework was evaluated in Study 2 by experts (N = 24) who categorized another set of images consistently with the framework and found it to be useful in the task. Based on the evaluation study, the framework was solidified into a model for categorizing magazine imagery. Connections between classes were analyzed both from the original sorting data and from the evaluation study and incorporated into the final model. The model is a practical categorization tool that may be used in workplaces, such as magazine editorial offices. It may also serve to guide the development of computational methods for image understanding, selection of concepts for automatic detection, and approaches to support browsing and exploratory image search. The use of primary source materials is recognized as key to supporting history and social studies education. The extensive digitization of library, museum, and other cultural heritage collections represents an important teaching resource. Yet, searching for and selecting digital primary sources appropriate for classroom use can be difficult and time-consuming. 
This study investigates the design requirements and the potential usefulness of a domain-specific ontology to facilitate access to, and use of, a collection of digital primary source materials developed by the Library of the University of North Carolina at Chapel Hill. During a three-phase study, an ontology model was designed and evaluated with the involvement of social studies teachers. The findings revealed that the design of the ontology was appropriate to support the information needs of the teachers and was perceived as a potentially useful tool to enhance collection access. The primary contribution of this study is the introduction of an approach to ontology development that is user-centered and designed to facilitate access to digital cultural heritage materials. Such an approach should be considered on a case-by-case basis in relation to the size of the ontology being built, the nature of the knowledge domain, and the type of end users targeted. Webcasting systems were developed to provide remote access in real-time to live events. Today, these systems have an additional requirement: to accommodate the "second life" of webcasts as archival information objects. Research to date has focused on facilitating the production and storage of webcasts as well as the development of more interactive and collaborative multimedia tools to support the event, but research has not examined how people interact with a webcasting system to access and use the contents of those archived events. Using an experimental design, this study examined how 16 typical users interact with a webcasting system to respond to a set of information tasks: selecting a webcast, searching for specific information, and making a gist of a webcast. Using several data sources that included user actions, user perceptions, and user explanations of their actions and decisions, the study also examined the strategies employed to complete the tasks. 
The results revealed distinctive system-use patterns for each task and provided insights into the types of tools needed to make webcasting systems better suited for also using the webcasts as information objects. Large, data-rich organizations have tremendously large collections of digital objects to be "repurposed," to respond quickly and economically to publishing, marketing, and information needs. Managers typically assume that a content management system, or some other technique such as OWL and RDF, will automatically address the workflow and technical issues associated with this reuse. Four case studies show that the sources of some roadblocks to agile repurposing are as much managerial and organizational as they are technical in nature. The review concludes with suggestions on how digital object repurposing can be integrated given these organizations' structures. The Initiative for the Evaluation of XML retrieval (INEX) provides a TREC-like platform for evaluating content-oriented XML retrieval systems. Since 2007, INEX has been using a set of precision-recall based metrics for its ad hoc tasks. The authors investigate the reliability and robustness of these focused retrieval measures, and of the INEX pooling method. They explore four specific questions: How reliable are the metrics when assessments are incomplete, or when query sets are small? What is the minimum pool/query-set size that can be used to reliably evaluate systems? Can the INEX collections be used to fairly evaluate "new" systems that did not participate in the pooling process? And, for a fixed amount of assessment effort, would this effort be better spent in thoroughly judging a few queries, or in judging many queries relatively superficially? The authors' findings validate properties of precision-recall-based metrics observed in document retrieval settings. Early precision measures are found to be more error-prone and less stable under incomplete judgments and small topic-set sizes. 
They also find that system rankings remain largely unaffected even when assessment effort is substantially (but systematically) reduced, and confirm that the INEX collections remain usable when evaluating nonparticipating systems. Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries. However, when judging only a random sample of a pool, it is better to completely judge fewer topics than to partially judge many topics. This result confirms the effectiveness of pooling methods. Educational standards are a central focus of the current educational system in the United States, underpinning educational practice, curriculum design, teacher professional development, and high-stakes testing and assessment. Digital library users have requested that this information be accessible in association with digital learning resources to support teaching and learning as well as accountability requirements. Providing this information is complex because of the variability and number of standards documents in use at the national, state, and local level. This article describes a cataloging tool that aids catalogers in the assignment of standards metadata to digital library resources, using natural language processing techniques. The research explores whether the standards suggestor service would suggest the same standards as a human, whether relevant standards are ranked appropriately in the result set, and whether the relevance of the suggested assignments improves when, in addition to resource content, metadata is included in the query to the cataloging tool. The article also discusses how this service might streamline the cataloging workflow. The microblogging site Twitter generates a constant stream of communication, some of which concerns events of general interest. An analysis of Twitter may, therefore, give insights into why particular events resonate with the population. 
This article reports a study of a month of English Twitter posts, assessing whether popular events are typically associated with increases in sentiment strength, as seems intuitively likely. Using the top 30 events, determined by a measure of relative increase in (general) term usage, the results give strong evidence that popular events are normally associated with increases in negative sentiment strength and some evidence that peaks of interest in events have stronger positive sentiment than the time before the peak. It seems that many positive events, such as the Oscars, are capable of generating increased negative sentiment in reaction to them. Nevertheless, the surprisingly small average change in sentiment associated with popular events (typically 1% and only 6% for Tiger Woods' confessions) is consistent with events affording posters opportunities to satisfy pre-existing personal goals more often than eliciting instinctive reactions. This study examines the identity and development of the management information systems (MIS) field through a scientometric lens applied to three major global, regional and national conferences: International Conference on Information Systems (ICIS), Pacific Asia Conference on Information Systems (PACIS) and Administrative Sciences Association of Canada Annual Conference (ASAC). It adapts the conference stakeholder approach to the construction of the identity of the MIS discipline and analyzes the proceedings of these three conferences. The findings suggest that the MIS field has been evolving in terms of collaborative research and scholarly output and has been gradually moving towards academic maturity. The leading MIS conference contributors tend to establish loyalty to a limited number of academic meetings. At the same time, relatively low levels of repeat publication in the proceedings of ICIS, PACIS and ASAC were observed. 
It was suggested that Lotka's and Yule-Simon's bibliometric laws may be applied to measure and predict the degree of conference delegate loyalty. The paper is concerned with analysing what makes a great journal great in the sciences, based on quantifiable Research Assessment Measures (RAM). Alternative RAM are discussed, with an emphasis on the Thomson Reuters ISI Web of Science database (hereafter ISI). Various ISI RAM that are calculated annually or updated daily are defined and analysed, including the classic 2-year impact factor (2YIF), 5-year impact factor (5YIF), Immediacy (or 0-year impact factor (0YIF)), Eigenfactor, Article Influence, C3PO (Citation Performance Per Paper Online), h-index, Zinfluence, PI-BETA (Papers Ignored-By Even The Authors), Impact Factor Inflation (IFI), and three new RAM, namely Historical Self-citation Threshold Approval Rating (H-STAR), 2 Year Self-citation Threshold Approval Rating (2Y-STAR), and Cited Article Influence (CAI). The RAM data are analysed for the 6 most highly cited journals in 20 highly-varied and well-known ISI categories in the sciences, where the journals are chosen on the basis of 2YIF. The application to these 20 ISI categories could be used as a template for other ISI categories in the sciences and social sciences, and as a benchmark for newer journals in a range of ISI disciplines. In addition to evaluating the 6 most highly cited journals in each of 20 ISI categories, the paper also highlights the similarities and differences in alternative RAM, finds that several RAM capture similar performance characteristics for the most highly cited scientific journals, and determines that PI-BETA is not highly correlated with the other RAM and hence conveys additional information regarding research performance. In order to provide a meta-analysis summary of the RAM, which are predominantly ratios, harmonic mean rankings of the 13 RAM are presented for the 6 most highly cited journals in each of the 20 ISI categories. 
It is shown that emphasizing the impact factor, specifically the 2-year impact factor, of a journal to the exclusion of other informative RAM can lead to a distorted evaluation of journal performance and influence on different disciplines, especially in view of inflated journal self-citations. A ranking of 91 countries based on the Technology Achievement Index 2009 (TAI-09; 2009 refers to the year in which most of the data collection was carried out) is reported. Originally proposed in 2002, the TAI is a composite indicator which aggregates national technological capabilities and performance in terms of creation/diffusion of new technologies, diffusion of old technologies and development of human skills. In addition to the overall ranking of 91 countries, rankings in each sub-dimension of the Index are also reported. Comparative analysis of the TAI rankings of the 56 countries common to the present study and the previous study of 2002, conducted under similar conditions, is quite instructive and indicates shifts in the technological scenario of these countries even over a relatively short period of 5-6 years. A simple concept based on a standard deviation approach, as an indication of technological spread or otherwise, is proposed for the first time. Application of this concept to the 56 common countries is reported. This paper offers some insights into scientific collaboration (SC) at the regional level by drawing upon two lines of inquiry. The first involves examining the spatial patterns of university SC across the EU-15 (all countries belonging to the European Union between 1995 and 2004). The second consists of extending the current empirical analysis of regional SC by including the economic distance between regions in the model along with other variables suggested by the extant literature. 
The methodology relies on co-publications as a proxy for academic collaboration, and in order to test the relevance of economic distance for the intensity of collaboration between regions, we put forward a gravity equation. The descriptive results show that there are significant differences in the production of academic scientific papers between less-favoured regions and core regions. However, the intensity of collaboration is similar in both types of regions. Our econometric findings suggest that differences in scientific resources (as measured by R&D expenditure) between regions are relevant in explaining academic scientific collaborations, while distance in the level of development (as measured by per capita GDP) does not appear to play any significant role. Nevertheless, other variables in the analysis, including geographical distance, specialization and cultural factors, do yield significant estimated coefficients, and this is consistent with the previous literature on regional SC. We studied the effect on journal impact factors (JIF) of citations from documents labeled as articles and reviews (usually peer reviewed) versus citations coming from other documents. In addition, we studied the effect on JIF of the number of citing records. This number is usually different from the number of citations. We selected a set of 700 journals indexed in the SCI section of JCR that receive a low number of citations. The reason for this choice is that in these instances some citations may have a greater impact on the JIF than in more highly-cited journals. After excluding some journals for different reasons, our sample consisted of 674 journals. We obtained data on citations that contributed to the JIF for the years 1998-2006. In general, we found that most journals obtained citations that contribute to the impact factor from documents labeled as articles and reviews. 
In addition, in most journals the ratio between citations that contributed to the impact factor and citing records was greater than 80% in all years. Thus, in general, we did not find evidence that citations that contributed to the impact factor were dependent on non-peer-reviewed documents or on only a few citing records. The h-index has received enormous attention as an indicator that measures the quality of researchers and organizations. We investigate to what degree authors can inflate their h-index through strategic self-citations with the help of a simulation. We extended Burrell's publication model with a procedure for placing self-citations, following three different strategies: random self-citations, recent self-citations, and h-manipulating self-citations. The results show that authors can considerably inflate their h-index through self-citations. We propose the q-index as an indicator of how strategically an author has placed self-citations, and it serves as a tool to detect possible manipulation of the h-index. The results also show that the best strategy for a high h-index is publishing papers that are highly cited by others. Productivity also has a positive effect on the h-index. Bibliometric indicators are increasingly used to fund and evaluate scientific research. Since the number of authors per paper has increased, it is difficult to determine the individual contribution of authors. Suggested approaches include the study of author position or of the corresponding author. Our findings show that the corresponding author is most likely to appear first and then last in the byline. The results depend on the number of authors in a paper, and national differences exist. This underscores the need to take into account both the number of authors on a paper and their position in the byline to be accurate when measuring author contribution. The Hirsch index is a number that synthesizes a researcher's output. 
It is defined as the maximum number h such that the researcher has h papers with at least h citations each. Four characterizations of the Hirsch index are suggested. The most compact one relies on the interpretation of the index as providing the number of valuable papers in an output and postulates three axioms. One, only cited papers can be valuable. Two, the index is strongly monotonic: if output x has more papers than output y and each paper in x has more citations than the most cited paper in y, then x has more valuable papers than y. And three, the minimum amount of citations under which a paper becomes valuable is different for each paper. This study is an attempt to approach the intellectual structure of the stem cell research field 2004-2009 through a comprehensive author co-citation analysis (ACA), and to contribute to a better understanding of a field that has been brought to the forefront of research, therapy and political and public debates, which, hopefully, will in turn better inform research and policy. Based on a nearly complete and clean dataset of stem cell literature compiled from PubMed and Scopus, and using automatic author disambiguation to further improve results, we perform an exclusive all-author ACA of the 200 top-ranked researchers of the field by fractional citation count. We find that, despite the theoretically highly interdisciplinary nature of the field, stem cell research has been dominated by a few central medical research areas-cancer and regenerative medicine of the brain, the blood, the skin, and the heart-and a core of cell biologists trying to understand the nature and the molecular biology of stem cells along with biotechnology researchers investigating the practical identification, isolation, creation, and culturing of stem cells. It is also remarkably self-contained, drawing only on a few related areas of cell biology. 
This study also serves as a baseline against which the effectiveness of a range of author-based bibliometric methods and indicators can be tested, especially when based on less comprehensive datasets using less optimal analysis methods. We study global and local Q-measures, as well as betweenness centrality, as indicators of international collaboration in research. After a brief review of their definitions, we introduce the concepts of external and internal inter-group geodesics. These concepts are applied to a collaboration network of 1129 researchers from different countries, which is based on publications in bibliometrics, informetrics, webometrics, and scientometrics (BIWS in short) from the period 1990-2009. It is thus illustrated how international collaboration (among authors from different countries) in BIWS is carried out. Our results suggest that average scores for local Q-measures are typically higher, indicating a relatively low degree of international collaboration in BIWS. The dominating form of international collaboration is bilateral, whereas multilateral collaboration is relatively rare in the field of BIWS. We also identify and visualize the most important global and local actors. Dividing the entire period into four 5-year periods, it is found that most international collaboration in the field has happened in the last time slice (2005-2009). A comparison of the different time slices reveals the non-linear growth of the indicators studied and the international expansion of the field. There has been a substantial increase in the percentage of publications with co-authors located in departments from different countries in 12 major journals of psychology. The results are evidence of a remarkable internationalization of psychological research, starting in the mid 1970s and increasing in rate at the beginning of the 1990s. 
This growth occurs against a constant number of articles with authors from the same country; it is not due to a concomitant increase in the number of co-authors per article. Thus, international collaboration in psychology is clearly on the rise. Nanobiopharmaceuticals is a promising research domain arising from recent scientific advances, with massive market potential. Although some researchers have studied international collaboration from particular aspects, few articles consider international cooperation from as many different aspects as this one. We place particular emphasis on international collaboration in the field of nanobiopharmaceuticals involving China. Incremental citation impact values show that in order to move forward and improve its overall competitiveness in the field, China needs to carry out more international collaboration, especially with the USA, Germany, and England. Surprisingly, multinational collaboration does not affect Chinese citation impact in the field as much as we anticipated. China reached the first rank in the world in terms of publication output per year in the field in 2009. Few papers about international collaboration examine the small-world phenomenon. Using the small-world quotient, we find that it is important for Chinese international co-authors to cultivate cooperation networks in which a node's partners are also connected to each other. It is shown that the observations made in a recent contribution by Savanur and Srikanth (Scientometrics 84:365-371, 2010) are not new. On the contrary, much more refined collaboration measures were already proposed in 1991 by Egghe. Patents are the manifestation of industry's research and development (R&D) endeavor; therefore, this paper studies the evolution of industries and key technologies in China from the perspective of patent analysis. 
Patents in six types of industries, including Chemical (excluding Drugs), Computers and Communications, Drugs and Medical, Electrical and Electronics (E&E), Mechanical, and Others, are analyzed in this study. Findings from the analysis show a steady increase in US-granted utility patents from China, as well as in the percentage of these patents in the world, over the period between 2003 and 2008. All of the above industries in China grew rapidly during this period, which is very different from global industry development. Despite the rapid development, the citation rates of these patents have been low, reflecting a need to improve the quality of patents and R&D performance in these six industries in China in order to exert more influence in the industrial world. The analysis of patents also reveals China's industry distribution to be similar to the global industry distribution, with the exception of the E&E industry, which accounts for over one third of the total patents. The E&E industry is also the field with the largest growth, rising more rapidly after 2006 with a sudden increase of patents in USPC 361. Detailed tracking of the key technology evolution reveals that 90% of the newly issued patents in USPC 361 after 2006 are owned by Foxconn Technology Co., Ltd, pointing to an unbalanced R&D environment in China's E&E industry sector. By providing insight into the evolution of China's industrial and technological development through the perspective of patent analysis, this paper hopes to provide an objective statistical reference for future policy directions and academic research. In this article I study characteristics of the journal impact factor (JIF) computed using a 5-year citation window as compared with the classical JIF computed using a 2-year citation window. Since 2007, ISI-Thomson Reuters has published the new 5-year impact factor in the JCR database. 
I studied changes in the distribution of JIFs when the citation window was enlarged. The distributions of journals according to their 5-year JIFs were very similar in all years studied, and were also similar to the distribution according to the 2-year JIFs. In about 72% of journals, the JIF increased when the longer citation window was used. Plots of 5-year JIFs against rank closely followed a beta function with two exponents. Thus, the 5-year JIF seems to behave very similarly to the 2-year JIF. The results also suggest that gains in JIF with the longer citation window tend to be distributed similarly in all years. Changes in these gains also tend to be distributed similarly from one year to the next. Nowadays, scientometrics has become an important field of study for monitoring the progress in scientific performance of a research group, a department, a university, etc. A number of scientometric studies of Iranian scientific output have been carried out in recent years, but there is no comparison between major Iranian medical universities. In this study, using Scopus as the search engine, the scientific output of the Iran University of Medical Sciences, Isfahan University of Medical Sciences, Mashhad University of Medical Sciences, Shahid Beheshti University of Medical Sciences, Shiraz University of Medical Sciences, Tabriz University of Medical Sciences, and Tehran University of Medical Sciences was compared. These universities were compared by the number of published articles per year, number of citations received per year, number of citations received per year per article, total H-indices, top ten authors, and top ten journals. The results of this study show that the order of the studied universities in research performance is as follows: Tehran > Shiraz = Shahid Beheshti > Isfahan = Iran > Tabriz = Mashhad universities of medical sciences. 
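The 2-year versus 5-year JIF comparison discussed above rests on a simple ratio: citations received in the census year to items the journal published in the preceding window, divided by the citable items published in that window. A minimal sketch follows; all counts in the usage note are hypothetical.

```python
def impact_factor(cites_in_census_year, items_published, census_year, window=2):
    """Journal impact factor with a configurable citation window.

    cites_in_census_year: dict publication year -> citations received in
        `census_year` by the journal's items from that year.
    items_published: dict year -> number of citable items published.
    window=2 gives the classical JIF; window=5 the 5-year JIF.
    """
    years = range(census_year - window, census_year)
    cites = sum(cites_in_census_year.get(y, 0) for y in years)
    items = sum(items_published.get(y, 0) for y in years)
    return cites / items if items else 0.0
```

With hypothetical counts cites = {2002: 120, 2003: 100, 2004: 80, 2005: 40, 2006: 60} and 50 citable items per year, the 2007 2-year JIF is (40 + 60) / 100 = 1.0 while the 5-year JIF is 400 / 250 = 1.6, matching the abstract's observation that the JIF often rises under the longer window.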
In addition, the data of Tehran University of Medical Sciences, as the top medical university of Iran, were compared with those of some of the top medical universities around the world. Why authors cite particular documents has been the subject of both speculation and empirical investigation for decades. This article provides a short history of attempts to understand citation motivations and reports a replication of earlier surveys measuring reasons for citations. Comparisons are made among various types of scholars. The present study identified six highly cited articles in the topic area of bibliometrics and surveyed all of the locatable authors who cited those works (n = 112). It was thought that bibliometricians, given that this is their area of expertise, might have a heightened awareness of their own citation practices, and hence a different pattern of responses. Several reasons indicated by the 56% of the sample who identified themselves as bibliometricians differed in statistically significant ways from those of nonbibliometricians, and also from earlier samples of scholars in Communication and Psychology. By far the most common reason for citing a document is that it represents a genre. A factor analysis shows that 20 motivations, clustered in seven factors, can represent the most common motivations for citation. The implications of these findings are discussed in the light of recent debates about the role of social factors in citation. Alternative methods for investigating citation behavior are discussed. This paper deals with the specific features of historical papers relevant to information retrieval and bibliometrics. The analysis is based mainly on the citation indexes accessible under the Web of Science (WoS), but also on field-specific databases: the Chemical Abstracts Service (CAS) literature database and the INSPEC database. 
First, the journal coverage of the WoS (in particular of the WoS Century of Science archive), the limitations of specific search fields as well as several database errors are discussed. Then, the problem of misspelled citations and their "mutations" is demonstrated by a few typical examples. Complex author names, complicated journal names, and other sources of errors that result from prior citation practice are further issues. Finally, some basic phenomena limiting the meaning of citation counts of historical papers are presented and explained. This study investigated name changes of women authors to determine how they were represented in indexes and cited references and identify problem areas. A secondary purpose of the study was to investigate whether or not indexing services were using authority control and how this influenced the search results. The works of eight library science authors who had published under multiple names were examined. The researchers compared author names as they appeared on title pages of publications versus in four online databases and in bibliographies by checking 380 publications and 1,159 citations. Author names were correctly provided 81.22% of the time in indexing services and 90.94% in citation lists. The lowest accuracy (54.55%) occurred when limiting to publications found in Library Literature. The highest accuracy (94.18%) occurred with works published before a surname changed. Author names in indexes and citations correctly matched names on journal articles more often than for any other type of publication. Indexes and citation style manuals treated author names in multiple ways, often altering names substantially from how they appear on the title page. Recommendations are made for changes in editorial styles by indexing services and by the authors themselves to help alleviate future confusion in author name searching. Ranking authors is vital for identifying a researcher's impact and standing within a scientific field. 
There are many different ranking methods (e.g., citations, publications, h-index, PageRank, and weighted PageRank), but most of them are topic-independent. This paper proposes topic-dependent ranks based on the combination of a topic model and a weighted PageRank algorithm. The author-conference-topic (ACT) model was used to extract the topic distribution of individual authors. Two ways of combining the ACT model with the PageRank algorithm are proposed: simple combination (I_PR) or using a topic distribution as a weight vector for PageRank (PR_t). Information retrieval was chosen as the test field, and representative authors for different topics at different time phases were identified. Principal component analysis (PCA) was applied to analyze the ranking difference between I_PR and PR_t. Rankings of scientific productivity and prestige are often limited to homogeneous networks. These networks are unable to account for the multiple factors that constitute the scholarly communication and reward system. This study proposes a new informetric indicator, P-Rank, for measuring prestige in heterogeneous scholarly networks containing articles, authors, and journals. P-Rank differentiates the weight of each citation based on its citing papers, citing journals, and citing authors. Articles from 16 representative library and information science journals are selected as the dataset. Principal component analysis is conducted to examine the relationship between P-Rank and other bibliometric indicators. We also compare the correlation and rank variances between citation counts and P-Rank scores. This work provides a new approach to examining prestige in scholarly communication networks in a more comprehensive and nuanced way. This study investigated the effectiveness of query expansion using synonymous and co-occurrence tags in users' video searches as well as the effect of visual storyboard surrogates on users' relevance judgments when browsing videos. 
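One way to read the PR_t variant described above, where a topic distribution acts as a weight vector for PageRank, is as a personalized PageRank whose teleport vector is the author's topic probability. The sketch below is an illustrative interpretation, not the paper's ACT-based implementation; the function name, damping factor, and toy graph are assumptions.

```python
def topic_weighted_pagerank(links, topic_weight, d=0.85, iters=100):
    """PageRank with a per-author topic weight as the teleport vector.

    links: dict author -> list of linked (e.g. cited) authors.
    topic_weight: dict author -> probability of the topic of interest
        (for instance from a topic model); should sum to 1.
    """
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Teleport mass is distributed by topic weight, not uniformly.
        new = {n: (1 - d) * topic_weight.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = links[n]
            if not out:  # dangling node: spread its mass by topic weight
                for m in nodes:
                    new[m] += d * rank[n] * topic_weight.get(m, 0.0)
            else:
                share = d * rank[n] / len(out)
                for m in out:
                    new[m] += share
        rank = new
    return rank
```

Authors prominent in the chosen topic receive more teleport mass, so the same link structure yields different rankings per topic, which is the point of a topic-dependent rank.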
To do so, we designed a structured folksonomy-based system in which tag queries can be expanded via synonyms or co-occurring words, based on the use of WordNet 2.1 synonyms and Flickr's related tags. To evaluate the structured folksonomy-based system, we conducted an experiment, the results of which suggest that the mean recall rate in the structured folksonomy-based system is statistically higher than that in a tag-based system without query expansion; however, the mean precision rate in the structured folksonomy-based system is not statistically higher than that in the tag-based system. Next, we compared the precision rates of the proposed system with storyboards (SB), in which SB and text metadata are shown to users when they browse video search results, with those of the proposed system without SB, in which only text metadata are shown. Our results showed that browsing only text surrogates, including tags, without multimedia surrogates is not sufficient for users' relevance judgments. Flickr, the large-scale online photo-sharing website, is often viewed as one of the 'classic' examples of Web 2.0 applications through which researchers are able to observe the social behavior of online communities. One of the main features of Flickr is groups. These provide a means to organize, share, and discuss photos of potential interest to group members. This paper explores the scale of group creation on Flickr and proposes a new set of metrics for characterizing groups on Flickr, looking at aspects of membership, communication activity, and communication structure. 
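The tag-query expansion step described above, looking up synonyms and co-occurring (related) tags for each query tag, can be sketched as follows. The two dictionaries stand in for WordNet synsets and Flickr related-tag lists and are purely illustrative, as is the function name.

```python
def expand_tag_query(query_tags, synonyms, related):
    """Expand a tag query with synonyms and co-occurring tags.

    synonyms / related: dict tag -> list of tags (e.g. derived from
    WordNet synsets and Flickr's related tags; hypothetical here).
    Returns the original tags plus their expansions, deduplicated,
    preserving order.
    """
    expanded = list(query_tags)
    for tag in query_tags:
        for extra in synonyms.get(tag, []) + related.get(tag, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded
```

Matching videos against the expanded tag set rather than the raw query is what raises recall; precision depends on how noisy the related-tag lists are, consistent with the mixed results reported above.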
Data collected from a sample of 1,000 groups was used to confirm the metrics and provide new insights into group formation in Flickr, such as the nature of larger and smaller groups. The contributions of the article are as follows: a set of metrics for characterizing online groups that extend existing schemes; an approach for sampling Flickr to estimate the number of groups; new insights into Flickr groups based on results from analyzing 1,000 randomly selected groups; and reflections on our experiences with using publicly accessible, automatically collected data to characterize the types of groups on Flickr. Influxes of new documents over time necessitate reorganization of document categories that a user has created previously. As documents become available in increasing quantities and at accelerating frequencies, the manual approach to reorganizing document categories becomes prohibitively tedious and ineffective, thus making a system-oriented approach appealing. Previous research (Larsen & Aone, 1999; Pantel & Lin, 2002) largely has followed the category-discovery approach, which groups documents by using a document-clustering technique to partition a document corpus. This approach does not consider the existing categories a user created previously, which in effect reflect his or her document-grouping preference. A handful of studies (Wei, Hu, & Dong, 2002; Wei, Hu, & Lee, 2009) have taken a category-evolution approach to develop lexicon-based techniques for preserving user preference in document-category reorganizations, but these have serious limitations. 
Responding to the significance of document-category reorganizations and addressing the fundamental problems of existing lexicon-based techniques, we develop ontology-based category evolution (ONCE), a technique that first enriches a concept hierarchy by incorporating important concept descriptors (jointly referred to as an ontology) and then employs the resulting enriched ontology to support category evolution at the concept level rather than analyzing and comparing feature vectors at the lexicon level. We empirically evaluate our proposed technique and compare it with two benchmark techniques: CE2 (a lexicon-based category-evolution technique) and hierarchical agglomerative clustering (HAC; a conventional hierarchical document-clustering technique). Overall, our results show that the ONCE technique is more effective than CE2 and HAC across all the scenarios studied. Furthermore, the completeness of a concept hierarchy has important impacts on the performance of the proposed technique. Our results have some important implications for further research. Online Social Networks (OSNs) facilitate the creation and maintenance of interpersonal online relationships. Unfortunately, the availability of personal data on social networks may unwittingly expose users to numerous privacy risks. As a result, establishing effective methods to control personal data and maintain privacy within these OSNs has become increasingly important. This research extends the current access control mechanisms employed by OSNs to protect private information shared among users. The proposed approach presents a system of collaborative content management that relies on an extended notion of a "content stakeholder." A tool, Collaborative Privacy Management (CoPE), is implemented as an application within a popular social-networking site, facebook.com, to ensure the protection of shared images generated by users. 
We evaluate the CoPE tool through a survey-based user study (n = 80). The results demonstrate that, regardless of whether Facebook users are worried about their privacy, they like the idea of collaborative privacy management and believe that a tool such as CoPE would be useful for managing their personal information shared within a social network. In the information age, a common problem for employees is not lack of resources but rather how to sift through multiple resources, both electronic and interpersonal, to retrieve and locate true expert knowledge. The main objective of this study is hence to explore employees' simultaneous usage of both resources and to identify situations where employees showed a clear preference for interpersonal resources over electronic ones, and where employees found these two resources (a) (ir)replaceable and (b) complementary. Both qualitative interview data and quantitative social-network data were collected from a university-affiliated community educational office. Data analysis showed that (a) social relationships were crucial for seeking and gaining actual access to needed knowledge; (b) employees were task-driven in knowledge seeking and obtained different types of knowledge depending on availability; and (c) the choice between interpersonal and electronic resources was determined by the characteristics of the knowledge sought as well as by such contextual factors as time, cost, and location. Additional interviews from other study contexts validated most of our findings, except those that require collection of complete social-network data. The article ends with a discussion of how organizations can better leverage their investment in human and technical resources to facilitate knowledge seeking. Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. 
We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework. This paper reports results from an exploratory study investigating the factors affecting student learning outcomes of information literacy instruction (ILI) given at business schools. Specifically, the potential influence of student demographics, learning environment factors, and information literacy program components on behavioral, psychological, and benefit outcomes were examined. In total, 79 interviews with library administrators, librarians, teaching faculty, and students were conducted at three business schools with varying ILI emphases and characteristics. 
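The cluster-based round-robin selection mentioned above, taking documents from ranked clusters in turn to build a diversified result list, can be sketched like this. The function name and data layout are assumptions for illustration, not the paper's implementation.

```python
def diversify_round_robin(ranked_clusters, k):
    """Build a diversified ranked list of up to k documents by taking
    documents from the ranked clusters in round-robin fashion.

    ranked_clusters: list of clusters (each a list of documents, best
    first), already ordered by estimated cluster relevance.
    """
    result = []
    depth = 0
    # Take the depth-th document of each cluster in turn, cluster rank
    # first, until k documents are selected or clusters are exhausted.
    while len(result) < k and any(depth < len(c) for c in ranked_clusters):
        for cluster in ranked_clusters:
            if depth < len(cluster) and len(result) < k:
                result.append(cluster[depth])
        depth += 1
    return result
```

Because each pass visits clusters in rank order, top documents of high-quality clusters dominate the head of the list while every cluster (query facet) still gets represented early, which is the intuition behind restricting diversification to well-ranked clusters.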
During these interviews, participants discussed students' ILI experiences and the outcomes arising from those experiences. Data collection also involved application of a standardized information literacy testing instrument that measures student information literacy competency. Analysis yielded a new holistic theoretical model based on information literacy and educational assessment theories. The model identifies potentially salient factors of the learning environment, information literacy program components, and student demographics that may affect ILI student learning outcomes. Recommendations for practice and implications for future research are also made. This comparative case study explores the impact of four influential practitioner-generated theories from the 1970s to the present in the separate domains of finance, military strategy, nursing, and theology, and it discusses why bibliometric research tends to overlook such "invisible" theories from practice, despite their increasing importance in many areas. The concept of the "practice field" as a site for not only the reception of theories into practice but also the production of practice theories themselves may prove useful. The study by Bates (2007) is an important contribution to the literature on browsing in information science (IS). It is explicitly based on "behavioural science." I use this article as the point of departure for demonstrating how more social and interpretative understandings may provide fruitful improvements for research in information seeking, browsing, and related phenomena. It is part of my ongoing publication of articles about philosophical issues in IS and it is intended to be accompanied by analyses of other examples of contributions to core issues in IS. 
Although it is mainly formulated as a discussion based on a specific paper, it should be seen as part of a general discussion of the philosophical foundation of IS and as support for the emerging social paradigm in this field. The article argues that human browsing should not be conceptualized primarily in biological terms and should not be understood as a random exploratory process, but rather should be seen as a kind of orienting strategy governed by people's metatheories or "paradigms." Information professionals should know how different metatheories are distributed in the information ecology and thus be able to help people develop fruitful browsing strategies. The digital divide continues to challenge political and academic circles worldwide. A range of policy solutions is briefly evaluated, from laissez-faire on the right to "arithmetic" egalitarianism on the left. The article recasts the digital divide as a problem for the social distribution of presumptively important information (e.g., electoral data, news, science) within postindustrial society. Endorsing in general terms the left-liberal approach of differential or "geometric" egalitarianism, it seeks to invest this with greater precision, and therefore utility, by means of a possibly original synthesis of the ideas of John Rawls and R. H. Tawney. It is argued that, once certain categories of information are accorded the status of "primary goods," their distribution must then comply with principles of justice as articulated by those major 20th-century exponents of ethical social democracy. The resultant Rawls-Tawney theorem, if valid, might augment the portfolio of options for interventionist information policy in the 21st century. The three research domains of nanotechnology, biotechnology, and pharmaceuticals together breed a promising multidisciplinary domain in the post-genomic age, which was recently defined by the term "nanobiopharmaceuticals". 
In this paper, we first investigate the general development profile of nanobiopharmaceuticals, and then conduct cross-country comparisons of research performance, focusing on the world share, relative research effort, impact, and quality of five productive countries. Furthermore, from the science-mapping perspective, we build co-word and co-citation networks for detecting its intellectual structure as well as the evolution of intellectual turning points. The growth examinations based on datasets from WoS, MEDLINE, and BIOSIS Review confirm the exponential growth of publications and citations in nanobiopharm-research. The cross-country comparisons show that the USA is the leading country and China is an up-and-coming contributor. The visual mapping structures produced by co-occurrence analyses show that nanobiopharm-research is currently focused on drug development for improving biodistribution, bioavailability, and pharmacokinetics, and on drug delivery for improving delivery of existing drugs. Some pivotal publications are identified by CiteSpace; these act as structural holes, research fronts, and intellectual bases for the development of nanobiopharm-research in the given time window. The popular h-index used to measure scientific output can be described in terms of a pool of evaluated objects (the papers), a quality function on the evaluated objects (the number of citations received by each paper), and a sentencing line crossing the origin, whose intersection with the graph of the quality function yields the index value (in the h-index this is a line with slope 1). Based on this abstraction, we present a new index, the c-index, in which the evaluated objects are the citations received by an author, a group of authors, a journal, etc., the quality function of a citation is the collaboration distance between the authors of the cited and the citing papers, and the sentencing line can take slopes between 0 and infinity. 
As a result, the new index counts only those citations which are significant enough, where significance is proportional to collaboration distance. Several advantages of the new c-index with respect to previous proposals are discussed. As part of its program of 'Excellence in Research for Australia' (ERA), the Australian Research Council ranked journals into four categories (A*, A, B, and C) in preparation for its performance evaluation of Australian universities. The ranking is important because it is likely to have a major impact on publication choices and research dissemination in Australia. The ranking is problematic because it is evident that some disciplines have been treated very differently from others. This paper reveals weaknesses in the ERA journal ranking and highlights the poor correlation between ERA rankings and other acknowledged metrics of journal standing. It highlights the need for a reasonable representation of journals ranked as A* in each scientific discipline. In December 2003, seventeen years after the first UK research assessment exercise, Italy started its first-ever national research evaluation, with the aim of evaluating, using the peer review method, the excellence of the national research production. The evaluation involved 20 disciplinary areas, 102 research structures, 18,500 research products, and 6,661 peer reviewers (1,465 from abroad); it had a direct cost of 3.55 million Euros and spanned over 18 months. The introduction of ratings based on ex post quality of output, and not on ex ante respect for parameters and compliance, is an important leap forward for the national research evaluation system toward meritocracy. 
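The sentencing-line abstraction described above, a sorted quality function intersected with a line of chosen slope, can be sketched generically: with citation counts per paper and slope 1 it reduces to the h-index, and with collaboration distances of incoming citations and other slopes it illustrates the c-index idea. The computation of collaboration distance itself (a co-authorship-graph distance) is omitted here; the quality values in the usage note are hypothetical.

```python
def line_index(qualities, slope=1.0):
    """Generalized 'sentencing line' index.

    Sort quality values in decreasing order and return the largest n
    such that the n-th value lies on or above the line q = slope * n.
    qualities: iterable of quality values (citations per paper for the
    h-index; collaboration distances of citations for a c-index).
    """
    n = 0
    for i, q in enumerate(sorted(qualities, reverse=True), start=1):
        if q >= slope * i:
            n = i
        else:
            break
    return n
```

For citation counts [10, 8, 5, 4, 3] the slope-1 line gives 4 (the classic h-index); lowering the slope admits more objects, raising it demands higher quality per rank, which is how the c-index family trades off strictness against coverage.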
From the bibliometric perspective, the national assessment offered an unprecedented opportunity to perform a large-scale comparison of peer review and bibliometric indicators for an important share of the Italian research production. The present investigation takes full advantage of this opportunity to test whether peer review judgements and (article and journal) bibliometric indicators are independent variables and, if not, to measure the sign and strength of the association. The outcomes allow us to advocate the use of bibliometric evaluation, suitably integrated with expert review, for the forthcoming national assessment exercises, with the goal of shifting from the assessment of research excellence to the evaluation of average research performance without a significant increase in expenses. Recitation patterns for individual publications are examined and mathematically modeled by focusing on the origin of citations (citers), using data from Thomson Reuters Web of Science. The authors outline oeuvre citation exhaustivity, or the practice of citing across the body of work of an author, and model the resulting frequency distributions to identify research disciples and admirers. A Geeta distribution provided the best fits for both recitation frequency and oeuvre citation exhaustivity distributions. The findings provide a novel way to identify the influence of an author based on citer analysis. We propose a geometric interpretation of the ranking of patent assignees by their h-indices as indicating the relative positions of their rank-citation curves. We then propose two shape descriptors characterizing the rank-citation curves over the h-cores and h-tails, respectively. Together with the h-indices, the shape descriptors help verify the geometric relationship among rank-citation curves and the relative performance of the assignees' h-cores and h-tails. 
The geometric interpretation and shape descriptors are shown by empirical data to be reliable, accurate, robust, flexible, and insightful, and their application could be extended to research performance evaluation as well. Recognizing emotion is extremely important for a text-based communication tool such as a blog. On commercial blogs, bloggers' evaluation comments on a product can spread at an explosive rate in cyberspace, and negative comments can be very harmful to an enterprise. Lately, researchers have been paying much attention to sentiment classification. The goal is to identify customers' emotions efficiently so that companies can respond in the appropriate manner to what customers have to say. Semantic orientation indexes and machine learning methods are usually employed to achieve this goal. Semantic orientation indexes do not perform well, but they return results quickly. Machine learning techniques provide better classification accuracy, but require a lot of training time. In order to combine the advantages of these two methods, this study proposes a neural-network-based approach. It uses semantic orientation indexes as inputs for the neural networks to determine the sentiments of bloggers quickly and effectively. Several actual blogs are used to evaluate the effectiveness of the approach. The experimental results indicate that the proposed approach outperforms traditional approaches, including other neural networks and several semantic orientation indexes. This paper reports on an investigation into the scholarly impact of the TRECVid (Text REtrieval Conference Video Retrieval Evaluation) benchmarking conferences between 2003 and 2009. 
The contribution of TRECVid to research in video retrieval is assessed by analyzing publication content to show the development of techniques and approaches over time, and by analyzing publication impact through publication numbers and citation analysis. Popular conference and journal venues for TRECVid publications are identified in terms of the number of citations received. For a selection of participants at different career stages, the relative importance of TRECVid publications, in terms of citations vis-à-vis their other publications, is investigated. TRECVid, as an evaluation conference, provides data on which research teams 'scored' highly against the evaluation criteria, and the relationship between 'top scoring' teams at TRECVid and 'top scoring' papers in terms of citations is analyzed. A strong relationship was found between 'success' at TRECVid and 'success' in citations, both for high-scoring and low-scoring teams. The implications of the study in terms of the value of TRECVid as a research activity, and the value of bibliometric analysis as a research evaluation tool, are discussed. Numerous studies have shown that female scientists tend to publish significantly fewer publications than do their male colleagues. In this study, we have analyzed whether similar differences can also be found in citation rates. Based on a large-scale study of 8,500 Norwegian researchers and more than 37,000 publications covering all areas of knowledge, we conclude that the publications of female researchers are less cited than are those of men, although the differences are not large. The gender differences in citation rates can be attributed to differences in productivity. There is a cumulative advantage effect of increasing publication output on citation rates. Since the women in our study publish significantly fewer publications than do men, they benefit less from this effect. 
The study also provides results on how publication and citation rates vary according to scientific position, age, and discipline. Secondary journals such as Evidence-Based Medicine, ACP Journal Club, and Evidence-Based Nursing summarize, from over 150 clinical journals, articles that pass criteria for scientific merit, clinical relevance, and interest to practicing clinicians. We performed a retrospective cohort study to validate the selection process used to produce the secondary journals by calculating the 2007 impact factors for these journals using articles that were abstracted and originally published in 2005-2006. The 'impact factors' for the secondary journals were calculated using 2007 citations to the included articles. These were compared to the published impact factors and mean citations of the source journals. The 2005/2006 articles in the secondary journals were originally published in 82 journals with ISI impact factors (median 4.1, range 0.85-52.9). The calculated impact factors for the secondary journals were 39.5 for ACP Journal Club, 30.2 for Evidence-Based Medicine, and 9.3 for Evidence-Based Nursing. Limitations include the fact that articles come from only about 150 journal titles and that inclusion of these articles may itself stimulate citations. We conclude that evidence-based secondary journals include, at the time of publication, articles that go on to garner more citations on average than other articles in the source publications. An innovative model to measure the influence among scientific journals is developed in this study. This model is based on path analysis of a journal citation network, and its output is a journal influence matrix that describes the directed influence among all journals. Based on this model, an index of a journal's overall influence, the quality-structure index (QSI), is derived. Journal ranking based on QSI has the advantage of accounting for both intrinsic journal quality and the structural position of a journal in a citation network. 
The QSI also integrates the characteristics of two prevailing streams of journal-assessment measures: those based on bibliometric statistics to approximate intrinsic journal quality, such as the Journal Impact Factor, and those using a journal's structural position based on a PageRank-type of algorithm, such as the Eigenfactor score. Empirical results support our finding that the new index is significantly closer to scholars' subjective perception of journal influence than are the two aforementioned measures. In addition, the journal influence matrix offers a new way to measure two-way influences between any two academic journals, hence establishing a theoretical basis for future scientometrics studies to investigate the knowledge flow within and across research disciplines. How can citation analysis take into account the highly collaborative nature and unique research and publication culture of biomedical research fields? This study explores this question by introducing last-author citation counting and comparing it with traditional first-author counting and theoretically optimal all-author counting in the stem cell research field for the years 2004-2009. For citation ranking, last-author counting, which is directly supported by Scopus but not by ISI databases, appears to approximate all-author counting quite well in a field where heads of research labs are traditionally listed as last authors; however, first-author counting does not. For field mapping, we find that author co-citation analyses based on different counting methods all produce similar overall intellectual structures of a research field, but the detailed structures and minor specialties revealed differ to various degrees and thus require great caution to interpret. This is true especially when authors are selected into the analysis based on citedness, because author selection is found to have a greater effect on mapping results than does the choice of co-citation counting method. 
Findings are based on a comprehensive, high-quality dataset extracted in several steps from PubMed and Scopus and subjected to automatic reference and author name disambiguation. This paper proposes a methodology which discriminates the articles by the target authors ("true" articles) from those by other homonymous authors ("false" articles). Author name searches for 2,595 "source" authors in six subject fields retrieved about 629,000 articles. In order to extract true articles from the large number of retrieved articles, including many false ones, two filtering stages were applied. At the first stage, any retrieved article was eliminated as false if either its affiliation addresses had little similarity to those of its source article or there was no citation relationship between the journal of the retrieved article and that of its source article. At the second stage, a sample of retrieved articles was subjected to manual judgment, and utilizing the judgment results, discrimination functions based on logistic regression were defined. These discrimination functions achieved recall and precision of about 95% and accuracy (correct-answer ratio) of 90-95%. The existence of common coauthor(s), address similarity, title-word similarity, and interjournal citation relationships between the retrieved and source articles were found to be effective discrimination predictors. Whether or not the source author was from a specific country was also one of the important predictors. Furthermore, it was shown that a retrieved article is almost certainly true if it was cited by, or cocited with, its source article. The method proposed in this study would be effective when dealing with a large number of articles whose subject fields and affiliation addresses vary widely. Existing methods for automatically analyzing search logs describe search behavior on the basis of syntactic differences (overlapping terms) between queries. 
Although these analyses provide valuable insights into the complexity and success of search interactions, they offer a limited interpretation of the observed searching behavior, as they do not consider the semantics of users' queries. In this article we propose a method to exploit semantic information in the form of linked data to enrich search queries so as to determine the semantic types of the queries and the relations between queries that are consecutively entered in a search session. This work also provides an in-depth analysis of the search logs of professional users searching a commercial picture portal. Compared to previous image search log analyses, in particular those of professional users, we consider a much larger dataset. We analyze the logs both in a syntactic way and using the proposed semantic approach and compare the results. Our findings show the benefits of using semantics for search log analysis: the identified types of query modifications cannot be appropriately analyzed by only considering term overlap, since queries related in the most frequent ways do not usually share terms. Caching of query results is an important mechanism for the efficiency and scalability of web search engines. Query results are cached and presented in terms of pages, which typically include 10 results each. In navigational queries, users seek a particular website, which would typically be listed at the top ranks (perhaps first or second) by the search engine, if found. For this type of query, caching and presenting results in the 10-per-page manner may waste cache space and network bandwidth. In this article, we propose nonuniform result page models with varying numbers of results for navigational queries. The experimental results show that our approach reduces the cache miss count by up to 9.17% (because of better utilization of cache space). 
Furthermore, bandwidth usage, which is measured in terms of the number of snippets sent, is also reduced by 71% for navigational queries. This means a considerable reduction in the number of transmitted network packets, a crucial gain especially for mobile-search scenarios. A user study reveals that users easily adapt to the proposed result page model and that the efficiency gains observed in the experiments can be carried over to real-life situations. Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique were also analyzed: one processing unlabeled data partitions sequentially, the other processing them in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. Social technologies tend to attract research on social structure or interaction. In this paper we analyze the individual use of a social technology, specifically an enterprise people-tagging application. 
We focus on active participants of the system and distinguish between users who initiate activity and those who respond to activity. This distinction is situated within the preferential attachment theory in order to examine which type of participant contributes more to the process of tagging. We analyze the usage of the people-tagging application in a snapshot representing 3 years of activity, focusing on self-tagging compared to tagging by and of others. The main findings are: (1) People who tag themselves are the most productive contributors to the system. (2) Preferential attachment saturation is reached at 12-14 tags per user. (3) The nature of participation is more significant than the number of participants for system growth. The paper concludes with theoretical and practical implications. Pseudo-relevance feedback (PRF) via query expansion (QE) assumes that the top-ranked documents from the first-pass retrieval are relevant. The most informative terms in the pseudo-relevant feedback documents are then used to update the original query representation in order to boost the retrieval performance. Most current PRF approaches estimate the importance of the candidate expansion terms based on their statistics at the document level. However, a document used for PRF may consist of different topics, which may not all be related to the query even if the document is judged relevant. The main argument of this article is that PRF should be conducted at a granularity finer than the document level. In this article, we propose a topic-based feedback model with three different strategies for finding a good query-related topic based on the Latent Dirichlet Allocation model. The experimental results on four representative TREC collections show that QE based on the derived topic achieves statistically significant improvements over a strong feedback model in the language modeling framework, which updates the query representation based on the top-ranked documents. 
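The document-level PRF baseline that the abstract argues against can be made concrete with a small sketch. This is an illustrative toy, not the authors' topic-based model: the function name and the raw-frequency scoring are our assumptions, and real PRF systems typically weight candidate terms by language-model estimates rather than counts.

```python
from collections import Counter

def expand_query(query_terms, top_docs, n_expansion=5):
    """Naive document-level pseudo-relevance feedback: treat the
    top-ranked documents as relevant and append their most frequent
    non-query terms to the original query."""
    counts = Counter()
    for doc in top_docs:
        counts.update(t.lower() for t in doc.split())
    for t in query_terms:
        counts.pop(t.lower(), None)  # never re-add original query terms
    expansion = [t for t, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion
```

The abstract's point is precisely that this whole-document counting mixes topics: terms frequent in an off-topic portion of a relevant document get added anyway, which the topic-based model avoids.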
This paper reports on a transaction log analysis of the type and topic of search queries entered into the search engine Google (Australia). Two aspects, in particular, set this apart from previous studies: the sampling and analysis take account of the distribution of search queries, and lifestyle information about the searcher was matched with each search query. A surprising finding was that there was no observed statistically significant difference in search type or topics for different segments of the online population. It was found that queries about popular culture and e-commerce accounted for almost half of all search engine queries and that half of the queries were entered with a particular Website in mind. The findings of this study also suggest that the Internet search engine is not only an interface to information or a shortcut to Websites; it is equally a site of leisure. This study has implications for the design and evaluation of search engines as well as our understanding of search engine use. Over the last 7 years, the AIMTech Research Group at the University of Leeds has used cultural-historical activity theory (CHAT) to inform a range of research activities in the fields of information behavior and information systems. In this article, we identify certain openings and theoretical challenges in the field of information behavior, which sparked our initial interest in CHAT: context, technology, and the link between practice and policy. We demonstrate the relevance of CHAT in studying information behavior and addressing the identified openings and argue that by providing a framework and hierarchy of activity-action-operation and semantic tools, CHAT is able to overcome many of the uncertainties concerning information behavior research. In particular, CHAT provides researchers with a theoretical lens to account for context and activity mediation and, by doing so, can increase the significance of information behavior research to practice. 
In undertaking this endeavour, we have relied on literature from the fields of information science and others where CHAT is employed. We provide a detailed description of how CHAT may be applied to information behavior and account for the concepts we see as relevant to its study. Developed in the early 1980s, well before Internet and web-based technologies had arrived, Taylor's Value-Added Model introduced what is now better known as the human-actors' needs perspective on information systems/information technology (IS/IT) artifacts. Taylor distinguished six top-level criteria that mattered most to human actors when using IS/IT artifacts. We develop this approach further and present the TEDS framework as an analytical instrument for actor- and utilization-specific evaluation of IS/IT artifacts as well as a practical tool for moderating and formulating design specifications. We use the empirical case of a comprehensive comparative professional sports team web site evaluation project to illustrate the power and versatility of the extended analytical framework. Interdisciplinary research is expected to contribute to industrial and economic development. However, due to the expansion of knowledge and the fragmentation of research fields, knowledge dissemination among different research fields is rare, and we need a methodology for measuring such dissemination and promoting it. In this paper, we introduce a citation lag analysis of inter- and intra-clusters extracted by citation network analysis as a new indicator to represent the speed of knowledge diffusion in subfields of a research field. A case study was performed within supply chain research to investigate knowledge integration among its subfields. Based on the analysis, we discuss the knowledge structure and reciprocal influence of subfields in supply chain research. This study contributes to offering a new approach for analyzing and understanding the development of boundary-spanning research. 
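The citation-lag indicator described above can be sketched in a few lines: for every citation edge, the lag is the citing paper's publication year minus the cited paper's year, averaged per (citing cluster, cited cluster) pair. A minimal sketch under assumed inputs; the function name and data shapes are ours, not the paper's.

```python
from collections import defaultdict
from statistics import mean

def mean_citation_lags(citations, cluster_of, year_of):
    """Mean citation lag (citing year minus cited year) for each
    (citing cluster, cited cluster) pair. `citations` is an iterable
    of (citing_paper, cited_paper) edges; intra-cluster pairs have
    equal cluster labels, inter-cluster pairs differ."""
    lags = defaultdict(list)
    for citing, cited in citations:
        pair = (cluster_of[citing], cluster_of[cited])
        lags[pair].append(year_of[citing] - year_of[cited])
    return {pair: mean(v) for pair, v in lags.items()}
```

Shorter mean lags between two clusters would then be read as faster knowledge diffusion between the corresponding subfields.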
We present a computer model of opinion changes in a scientific community. The study takes into account two mechanisms of opinion formation for individual scientists: the influence of coworkers with whom there is direct interaction and the cumulative influence of the subject literature. We analyze the evolution of the relative popularity of different competing theories, depending on their accuracy in describing observed phenomena and on current social support for the theory. We include such aspects as the finite lifetime of publication impact and the tendency to 'defend' one's own opinions, especially if they were already published. A special class of publications, delivering crucial observational or experimental data, which may revolutionize the scientific worldview, is considered. The goal of the model is to discover which conditions lead to quick domination of one theory over others, or, conversely, in which situations one may expect several explanations to co-exist. This study aims to investigate the influence of different patterns of collaboration on the citation impact of Harvard University's publications. Those documents published by researchers affiliated with Harvard University in WoS from 2000-2009 constituted the population of the research, amounting to 124,937 records. Based on the results, only 12% of Harvard publications were single-author publications. Different patterns of collaboration were investigated in different subject fields. In all 22 examined fields, the number of co-authored publications is much higher than that of single-author publications. In fact, more than 60% of all publications in each field are multi-author publications. Also, the normalized citation per paper for co-authored publications is higher than that of single-author publications in all fields. 
In addition, the largest number of publications in all 22 fields were published through inter-institutional collaboration and resulted from collaboration among domestic researchers rather than international ones. In general, the results of the study showed that there was a significant positive correlation between the number of authors and the number of citations in Harvard publications. In addition, publications involving more institutions received more citations, whereas publications with more foreign collaborators were not cited substantially more. Many researchers have analyzed e-government literature as a whole or a specific area within it, focusing on statistical methodologies, lessons learnt, or problems related to the area. However, no investigation of e-government issues in developing countries (DCs) from a socio-technical perspective has been carried out. Utilizing a scientometrics approach, we analyzed and synthesized e-government (EG) literature that deals with the issues/topics in developing countries through the lens of socio-technical theory (STT). 145 articles from 7 core e-government journals published during the last decade were selected and reviewed for analyzing e-government literature related to developing countries. The growth pattern of e-government literature showed that e-government studies pertaining to developing-country issues/topics have rapidly increased during the last decade, covering a range of topics/issues studied from socio-technical aspects. We found that e-government literature in developing countries has somewhat adopted a balanced approach and is moving away from a merely theoretical or conceptual basis toward an empirical foundation; however, the literature lacked depth and balance in terms of the issues/topics discussed and methodologies applied. In the light of the findings, strengths, limitations, and future directions for e-government research in developing countries are discussed. 
We present a mathematical derivation of the scale-dependence of the h-index. This formula can be used in two cases: one where the units are scale-dependent and one where the units are not scale-dependent. Examples are given. The aim of peer review is to separate the wheat from the chaff for publication and research funding. Under excessive competition, this mechanism would select only the wheat of the mainstream. Up to now, almost all discussions of the consequences of the shortcomings of peer review have been limited to qualitative description. I propose a model of a "peer-group-assessed-grant-based funding system" combined with a tenure system and an over-competitive research funding review process. This is the first quantitative investigation that dramatizes the current shortcomings of the process. My simulation shows that it takes about two or three generations of researchers for the mainstream view of a complicated research topic to obtain monopoly supremacy, with only the aid of the mechanism the model describes. Based on the computation results, suggestions are proposed to avoid losing the capability for self-correction when popularity determines a single research direction that could be wrong for very complicated research topics. Data from 1,581 faculty members affiliated with 98 doctoral-granting Communication programs in the United States were analyzed to determine normative publication rates and predictors of position centrality in the faculty hiring network. The Communication Institute for Online Scholarship (CIOS) database was used to measure publication frequency in refereed journals. Position centrality was measured using a Communication program's relative position in the hiring network as established by Barnett, Danowski, Feeley, and Stalker (2010). 
The average publication frequencies by academic rank were as follows: assistant professors averaged 2.29 articles (N = 441, SD = 3.29); associate professors averaged 6.69 articles (N = 497, SD = 5.77); professors averaged 10.92 articles (N = 542, SD = 12.09). Results from multiple regression analyses indicate that the number of publications for faculty members and the position centrality of where one earned his or her doctoral degree significantly predicted current position centrality. Publication numbers for one's advisor and year of earned doctorate did not emerge as significant predictors of position centrality. This paper examines co-authorship of research articles in Thomson Reuters citation indexes in order to assess knowledge co-production in selected sub-Saharan African countries. Two indicators, namely publications and citations, were analysed to establish the patterns of knowledge co-production and its scientific impact, respectively. The study found that knowledge production through collaborative research among sub-Saharan African countries is minimal and contributes only a small percentage when compared to collaboration between sub-Saharan African countries and their foreign counterparts. Similarly, the scientific impact of international collaboration was higher than that of continental collaboration. Countries belonging to the same geographic region contributed to each other's knowledge production more frequently than they did to countries outside their region. It is recommended that, for knowledge co-production in sub-Saharan Africa to improve, various measures such as encouraging student and staff exchange, hosting more regional conferences and encouraging research networks need to be put in place. A review of 649 PhDs undertaken by Swedish nurses and midwives found no evidence that they stop publishing in English after their PhD. The proportion of 70% for any publication in English was similar to that of MDs. 
A higher proportion of male than female nurses were high publishers of six or more (52% vs. 23%) and eight or more papers (44% vs. 14%) in a 5-year period. The standard of the PhDs of Swedish nurses was comparable to those of other biomedical PhDs and was consistent in pattern over the past two decades. The gender pattern of external examiners of female nurses evolved: in 1992-94, 75% were men; during 1996-97, 54% were men; and from 2000 onwards, 46% were men. Nurses were examined by foreign examiners in 20% of examinations. These came primarily from Norway and the USA. This study investigates the incidence of self-citation (authors citing their own work) for scholarly articles in ten journals published by the American Physiological Society. We analysed the authorship and referencing practices of all original research articles published in the first ordinary issue of each journal in both 2000 and 2010, comprising 271 and 212 articles, respectively. Self-citation is common in these journals and represents a total of 17.75% of all citations. Only 9 (1.86%) of the articles analysed did not self-cite. Author position significantly influenced the rate of self-citation, with last authors being self-cited significantly more than any other author. This was likely a result of the cumulative nature of scientific research within a specific discipline and the desire to promote one's own work for associated academic benefit. The country in which the work was conducted also influenced the rate of self-citation, with last authors based in North America self-citing more than last authors from Asian countries. A comparison of self-citation rates between decades (2000 and 2010) revealed an increase in the number of authors and the number of citations per article between 2000 and 2010; however, the mean percentage of self-cited articles did not differ between the years. Finally, there were no differences in the percentage of self-citation between the different journals analysed. 
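The basic quantity behind such a study, the share of an article's references that are author self-citations, can be sketched as follows. This is a simplified illustration under the assumption that author names are already disambiguated; the function name and data shapes are ours, not the study's protocol.

```python
def self_citation_rate(citing_authors, reference_author_lists):
    """Fraction of an article's references that share at least one
    author with the citing article (a simple self-citation measure;
    assumes author names are already disambiguated)."""
    citing = set(citing_authors)
    if not reference_author_lists:
        return 0.0
    self_cited = sum(1 for ref_authors in reference_author_lists
                     if citing & set(ref_authors))
    return self_cited / len(reference_author_lists)
```

Per-position analyses like the one in the abstract would restrict `citing` to a single author (e.g. only the last author) before intersecting with each reference's author list.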
Measuring the intellectual diversity encoded in publication records as a proxy to the degree of interdisciplinarity has recently received considerable attention in the science mapping community. The present paper draws upon the use of the Stirling index as a diversity measure applied to a network model (customized science map) of research profiles, proposed by several authors. A modified version of the index is used and compared with the previous versions on a sample data set in order to rank top Hungarian research organizations (HROs) according to their research performance diversity. Results, unexpected in several respects, show that the modified index is a candidate for measuring the degree of polarization of a research profile. The study also points towards a possible typology of publication portfolios that instantiate different types of diversity. It is proposed that citation contexts, the text surrounding references in scientific papers, be analyzed in terms of an expanded notion of sentiment, defined to include attitudes and dispositions toward the cited work. Maps of science at both the specialty and global levels are used as the basis of this analysis. Citation context samples are taken at these levels and contrasted for the appearance of cue word sets, analyzed with the aid of methods from corpus linguistics. Sentiments are shown to vary within a specialty and can be understood in terms of cognitive and social factors. Within-specialty and between-specialty co-citations are contrasted and in some cases suggest a correlation of sentiment with structural location. For example, the sentiment of "uncertainty" is important in interdisciplinary co-citation links, while "utility" is more prevalent within the specialty. Suggestions are made for linking sentiments to technical terms, and for developing sentiment "baselines" for all of science. 
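The cue-word analysis described above can be sketched as a simple counting procedure over citation-context strings. The cue sets below are illustrative assumptions, not the paper's actual lexicons, and real corpus-linguistic analyses would normalize by sample size and apply significance testing.

```python
import re

# Illustrative cue-word sets for two of the sentiments named in the
# abstract; the study's actual lexicons differ.
CUE_SETS = {
    "uncertainty": {"may", "might", "unclear", "possibly"},
    "utility": {"useful", "practical", "applied", "effective"},
}

def sentiment_profile(citation_contexts):
    """Count occurrences of each cue-word set across a sample of
    citation-context strings."""
    counts = {label: 0 for label in CUE_SETS}
    for context in citation_contexts:
        words = re.findall(r"[a-z]+", context.lower())
        for label, cues in CUE_SETS.items():
            counts[label] += sum(1 for w in words if w in cues)
    return counts
```

Comparing such profiles for within-specialty versus between-specialty co-citation contexts is the kind of contrast the abstract reports (e.g. "uncertainty" cues more prevalent in interdisciplinary links).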
This paper describes and analyses the role played in the development of bibliometric indicators, and the use made of bibliometric indicators for policy purposes, by the European Commission's Directorate-General Research in the period 1990-2005. Research policies in the more developed nations are ever more oriented towards the introduction of productivity incentives and competition mechanisms intended to increase efficiency in research institutions. Assessments of the effects of these policy interventions on public research activity often neglect the normal, inherent variation in the performance of research institutions over time. In this work, we propose a cross-time bibliometric analysis of research performance by all Italian universities in two consecutive periods (2001-2003 and 2004-2008) not affected by national policy interventions. Findings show that productivity and impact increased at the level of individual scientists. At the university level, significant variation in rank was observed. Two paradigmatic approaches to the normalisation of citation-impact measures are discussed. The results of the mathematical manipulation of standard indicators such as citation means, notably journal Impact Factors (called a posteriori normalisation), are compared with citation measures obtained from fractional citation counting (called a priori normalisation). The distributions of two subfields of the life sciences and mathematics are chosen for the analysis. It is shown that both methods provide indicators that are useful tools for the comparative assessment of journal citation impact. Flattery citations of editors, potential referees, and so on have been claimed to be a common strategy among academic authors. From a sociology of science perspective as well as from a citation-analytical perspective, it is both an interesting claim and a consequential one. 
The article presents a citation analysis of the editorial board members entering the American Economic Review from 1984 to 2004, using a citation window of 11 years. To test the flattery citation hypothesis further, we have conducted a study applying the difference-in-differences estimator. We analyze the number of times the editors and editorial board members of the American Economic Review were cited in articles published in the journal itself as well as in a pool of documents comprising articles from the Journal of Political Economy and the Quarterly Journal of Economics. The results of the analyses do not support the existence of a flattery citation effect. This article employs citation analysis on a micro level: the level of the cited document, in this case Walter Benjamin's Illuminations (1968/2007). The study shows how this frequently cited publication (more than 4,000 citations in Web of Science) has been received. The growth of citations and interdisciplinary citing is studied, and a novel approach, page citation analysis, is applied to study how different parts of Illuminations have been cited. The article demonstrates how bibliometric methods can be used together with qualitative accounts to map the impact and dissemination of a particular publication. Furthermore, it shows how bibliometric methods can be utilized to study intellectual structures in the humanities, and highlights the influence of the humanities on the social sciences and sciences. Studying scientific collaboration using coauthorship networks has attracted much attention in recent years. How and in what context two authors collaborate remain among the major questions. Previous studies, however, have focused on either exploring the global topology of coauthorship networks (macro perspective) or ranking the impact of individual authors (micro perspective). 
Neither of them has provided information on the context of the collaboration between two specific authors, which may potentially imply rich socioeconomic, disciplinary, and institutional information on collaboration. Different from the macro perspective and micro perspective, this article proposes a novel method (meso perspective) to analyze scientific collaboration, in which a contextual subgraph is extracted as the unit of analysis. A contextual subgraph is defined as a small subgraph of a large-scale coauthorship network that captures the relationship and context between two coauthors. This method is applied to the field of library and information science. Topological properties of all the subgraphs in four time spans are investigated, including size, average degree, clustering coefficient, and network centralization. Results show that contextual subgraphs capture useful contextual information on two authors' collaboration. Grasping the fruits of "emerging technologies" is an objective of many government priority programs in a knowledge-based and globalizing economy. We use the publication records (in the Science Citation Index) of two emerging technologies to study the mechanisms of diffusion in the case of two innovation trajectories: small interference RNA (siRNA) and nanocrystalline solar cells (NCSC). Methods for analyzing and visualizing geographical and cognitive diffusion are specified as indicators of different dynamics. Geographical diffusion is illustrated with overlays to Google Maps; cognitive diffusion is mapped using an overlay to a map based on the ISI subject categories. The evolving geographical networks show both preferential attachment and small-world characteristics. The strength of preferential attachment decreases over time while the network evolves into an oligopolistic control structure with small-world characteristics. 
The transition from disciplinary-oriented ("Mode 1") to transfer-oriented ("Mode 2") research is suggested as the crucial difference in explaining the different rates of diffusion between siRNA and NCSC. This article proposes a novel application of a statistical language model to opinionated document retrieval targeting weblogs (blogs). In particular, we explore the use of the trigger model, originally developed for incorporating distant word dependencies, in order to model the characteristics of personal opinions that cannot be properly modeled by standard n-grams. Our primary assumption is that there are two constituents that form a subjective opinion. One is the subject of the opinion, or the object that the opinion is about, and the other is a subjective expression; the former is regarded as a triggering word and the latter as a triggered word. We automatically identify those subjective trigger patterns to build a language model from a corpus of product customer reviews. Experimental results on the Text Retrieval Conference Blog track test collections show that, when used for reranking initial search results, our proposed model significantly improves opinionated document retrieval. In addition, we report on an experiment on dynamic adaptation of the model to a given query, which is found effective for most of the difficult queries categorized under politics and organizations. We also demonstrate that, without any modification to the proposed model itself, it can be effectively applied to polarized opinion retrieval. This study explores, in 3 steps, how the 3 main library classification systems, the Library of Congress Classification, the Dewey Decimal Classification, and the Universal Decimal Classification, cover human knowledge. First, we mapped the knowledge covered by the 3 systems. We used the "10 Pillars of Knowledge: Map of Human Knowledge," which comprises 10 pillars, as an evaluative model. 
We mapped all the subject-based classes and subclasses that are part of the first 2 levels of the 3 hierarchical structures. Then, we zoomed into each of the 10 pillars and analyzed how the three systems cover the 10 knowledge domains. Finally, we focused on the 3 library systems. Based on the way each one of them covers the 10 knowledge domains, it is evident that they failed to adequately and systematically present contemporary human knowledge. They are unsystematic and biased, and, at the top 2 levels of the hierarchical structures, they are incomplete. Twitter, Facebook, and other related systems that we call social awareness streams are rapidly changing the information and communication dynamics of our society. These systems, where hundreds of millions of users share short messages in real time, expose the aggregate interests and attention of global and local communities. In particular, emerging temporal trends in these systems, especially those related to a single geographic area, are a significant and revealing source of information for, and about, a local community. This study makes two essential contributions for interpreting emerging temporal trends in these information systems. First, based on a large dataset of Twitter messages from one geographic area, we develop a taxonomy of the trends present in the data. Second, we identify important dimensions according to which trends can be categorized, as well as the key distinguishing features of trends that can be derived from their associated messages. We quantitatively examine the computed features for different categories of trends, and establish that significant differences can be detected across categories. Our study advances the understanding of trends on Twitter and other social awareness streams, which will enable powerful applications and activities, including user-driven real-time information services for local communities. 
Web data repositories usually contain references to thousands of real-world entities from multiple sources. It is not uncommon that multiple entities share the same label (polysemes) and that distinct label variations are associated with the same entity (synonyms), which frequently leads to ambiguous interpretations. Further, spelling variants, acronyms, abbreviated forms, and misspellings compound to worsen the problem. Solving this problem requires identifying which labels correspond to the same real-world entity, a process known as entity resolution. One approach to solve the entity resolution problem is to associate an authority identifier and a list of variant forms with each entity-a data structure known as an authority file. In this work, we propose a generic framework for implementing a method for generating authority files. Our method uses information from the Web to improve the quality of the authority file and, because of that, is referred to as WER-Web-based Entity Resolution. Our contribution here is threefold: (a) we discuss how to implement the WER framework, which is flexible and easy to adapt to new domains; (b) we run extended experimentation with our WER framework to show that it outperforms selected baselines; and (c) we compare the results of a specialized solution for author name resolution with those produced by the generic WER framework, and show that the WER results remain competitive. The theory-driven Electronic Health Information for Life-Long Learners via Collaborative Learning (eHILLL-CL) intervention, developed and tested in public libraries, aims to improve older adults' e-health literacy. A total of 172 older adults participated in this study from August 2009 to June 2010. 
Significant differences were found from pretest to posttest in general computer/Web knowledge and skill gains and in e-health literacy (p < 0.001 in all cases; effect sizes: 0.5-2.1; statistical power: 1.00 even at the 0.01 level) and in three attitude measures (p < 0.05 for both computer anxiety and attitudes toward the aging experience in physical change, and p < 0.01 for attitude toward the CL method; effect sizes: 0.2-0.3; statistical power: 0.4-0.8, at the 0.05 level). No significant difference was found in other variables. Participants were highly positive about the intervention and reported positive changes in health-related behavior and decision making. Group composition (based on gender, prior familiarity with peers, or prior computer experience) showed no significant impact on CL outcomes. These findings contribute to the CL and health literacy literatures and indicate that CL can be a useful method for improving older adults' e-health literacy when using the specific strategies developed for this study, which suggests that social interdependence theory can be generalized beyond the younger population and formal educational settings. Information technology (IT) provides resources with which human actors can change the patterns of social action in which they participate. Studies of genre change have been among those that have focused on such change. Those studies, though, have tended not to focus on creative genres. This study is one in a series of studies meant to examine IT use in literature and the arts. The current study mediates between smaller, qualitative studies and future, quantitative studies with larger sample sizes. The results demonstrate that the conceptual framework developed in earlier studies is essentially valid, but needs to be modified for future studies. The pilot survey approach paid off both in modifications to the conceptual framework under development and in refinement of the research questions for future iterations. 
Informational cities are prototypical cities of the knowledge society. If they are informational world cities, they are new centers of power. According to Manuel Castells (1989), in those cities space of flows (flows of money, power, and information) tend to override space of places. Information and communication technology infrastructures, cognitive infrastructures (as groundwork of knowledge cities and creative cities), and city-level knowledge management are of great importance. Digital libraries provide access to the global explicit knowledge. The informational city consists of creative clusters and spaces for personal contacts to stimulate sharing of implicit information. In such cities, we can observe job polarization in favor of well-trained employees. The corporate structure of informational cities is made up of financial services, knowledge-intensive high-tech industrial enterprises, companies of the information economy, and further creative and knowledge-intensive service enterprises. Weak location factors are facilities for culture, recreational activities, and consumption. Political willingness to create an informational city and e-governance activities are crucial aspects for the development of such cities. This conceptual article frames indicators which are able to mark the degree of "informativeness" of a city. Finally, based upon findings of network economy, we try to explain why certain cities master the transition to informational cities and others (lagging to relative insignificance) do not. The article connects findings of information science and of urbanistics and urban planning. Our objective was to determine the prevalence of the term preembryo in the scientific literature using a bibliometric study in the Web of Science database. We retrieved data from the Web of Science from 1986 to 2005, covering a range of 20 years since the term was first published. 
Searches for the terms embryo, blastocyst, preimplantation embryo, and preembryo were performed. Then, Boolean operators were applied to measure associations between terms. Finally, statistical assessments were made to compare the use of each term in the scientific literature, and in specific areas where preembryo is most used. From a total of 93,019 registers, 90,888 corresponded to embryo; 8,366 to blastocyst; 2,397 to preimplantation embryo; and 172 to preembryo. The use frequency for preembryo was 2:1000. The term preembryo showed a lower cumulative impact factor (343) in comparison with the others (25,448; 5,530; and 546; respectively) in the highest scored journal category. We conclude that the term preembryo is not used in the scientific community, probably because it is confusing or inadequate. The authors suggest that its use in the scientific literature should be avoided in future publications. The bibliometric analysis confirms this statement. While preembryo is hardly ever used, terms such as preimplantation embryo and blastocyst have gained wide acceptance in publications from the same areas of study. Recently, it was shown that among existing theoretical models for the h-index, the Glanzel-Schubert model provides the best fit for a chosen example involving the research evaluation of universities. In this brief communication, we propose a thermodynamic explanation for the success of the Glanzel-Schubert model of the h-index. Although data mining and customer relationship management (CRM) have become increasingly important in recent years, there are few comprehensive studies and categorization schemes that discuss their characteristics. Using a bibliometric approach, this paper analyzes data mining and CRM research trends from 1989 to 2009 by locating headings "data mining" and "customer relationship management" or "CRM" in topics in the SSCI database. 
The bibliometric analytical technique was used to examine these two topics in SSCI journals from 1989 to 2009; we found 1181 articles with data mining and 1145 articles with CRM. This paper classified the data mining and CRM articles using eight categories (publication year, citation, country/territory, document type, institute name, language, source title, and subject area) in order to explore the differences in how data mining and CRM technologies have developed in this period and to analyze data mining and CRM technology tendencies on the basis of these results. Also, the paper performs the K-S test to check whether the analysis follows Lotka's law. The research findings can be extended to investigate author productivity by analyzing variables such as chronological and academic age, number and frequency of previous publications, access to research grants, job status, etc. In such a way, characteristics of high, medium and low publishing activity of authors can be identified. Besides, these findings will also help to judge scientific research trends and understand the scale of development of research in data mining and CRM by comparing the growth in the number of articles and authors. Based on the above information, governments and enterprises may infer collective tendencies and demands for scientific researchers in data mining and CRM to formulate appropriate training strategies and policies in the future. This analysis provides a roadmap for future research, abstracts technology trends and facilitates knowledge accumulation so that data mining and CRM researchers can save some time, since core knowledge will be concentrated in core categories. This implies that the phenomenon "success breeds success" is more common in higher quality publications. This study proposes a quantitative analysis of researcher mobility (i.e. 
transfer from one institution to another) and collaborative networks on the basis of author background data extracted from biographical notes in scientific articles to identify connections that are not revealed via simple co-authorship analysis. Using a top-ranked journal in the field of computer vision, we create a layered network that describes various aspects of author backgrounds, demonstrating a geographical distribution of institutions. We classify networks according to various dimensions including authors, institutions and countries. The results of the quantitative analysis indicate that mobility networks extend beyond the typical collaborative networks describing institutional and international relationships. We also discuss sectoral collaboration considering the mobility networks. Our findings indicate a limitation of collaborative analysis based on bibliometric data and the importance of tracing researcher mobility within potential networks to identify the true nature of scientific collaboration. We present an empirical comparison between two normalization mechanisms for citation-based indicators of research performance. These mechanisms aim to normalize citation counts for the field and the year in which a publication was published. One mechanism is applied in the current so-called crown indicator of our institute. The other mechanism is applied in the new crown indicator that our institute is currently exploring. We find that at high aggregation levels, such as at the level of large research institutions or at the level of countries, the differences between the two mechanisms are very small. At lower aggregation levels, such as at the level of research groups or at the level of journals, the differences between the two mechanisms are somewhat larger. We pay special attention to the way in which recent publications are handled. These publications typically have very low citation counts and should therefore be handled with special care. 
Using the entire population of professors at universities in the province of Quebec (Canada), this article analyzes the relationship between sex and research funding, publication rates, and scientific impact. Since age is an important factor in research and the population pyramids of men and women are different, the role of age is also analyzed. The article shows that, after they have passed the age of about 38, women receive, on average, less funding for research than men, are generally less productive in terms of publications, and are at a slight disadvantage in terms of the scientific impact (measured by citations) of their publications. Various explanations for these differences are suggested, such as the more restricted collaboration networks of women, motherhood and the accompanying division of labour, women's rank within the hierarchy of the scientific community and access to resources as well as their choice of research topics and level of specialization. Bibliometric research assessment has matured into a quantitative phase using more meaningful measures and analogies. In this paper, we propose a thermodynamic analogy and introduce what are called the energy, exergy and entropy terms associated with a bibliometric sequence. This can be displayed as time series (variation over time), or in event terms (variation as papers are published) and also in the form of phase diagrams (energy-exergy-entropy representations). It is exergy which is the most meaningful single number scalar indicator of a scientist's performance while entropy then becomes a measure of the unevenness (disorder) of the publication portfolio. This study addresses whether interdisciplinarity is a prominent feature of climate research by means of a co-citation analysis of the IPCC Third Assessment Report. The debate on interdisciplinarity and bibliometric measures is reviewed to operationalize the contested notion of interdisciplinarity. 
The results, based on 6417 references of the 96 most frequently used journals, demonstrate that the IPCC assessment of climate change is best characterized by its multidisciplinarity, where the physical, biological, bodily and societal dimensions are clearly separated. Although a few fields and journals integrate a wide variety of disciplines, integration occurs mainly between related disciplines (narrow interdisciplinarity), which indicates an overall disciplinary basis of climate research. It is concluded that interdisciplinarity is not a prominent feature of climate research. The significance of this finding is explored, given that the problem scope of climate change necessitates interdisciplinarity. Ways to promote interdisciplinarity are suggested by way of conclusion. The Essential Science Indicators (ESI) database is widely used to evaluate institutions and researchers. The objective of this study was to analyze trends and characteristics of papers in the subject category of water resources in the ESI database of the Institute for Scientific Information (ISI). Distributions of document type, language of publication, scientific output, and publication of journals are reported in this article. Five indicators (the number and ranking of total papers, first-author papers, corresponding-author papers, independent papers, and collaborative papers) were applied to evaluate country, institute, and author performances. In addition, the numbers of authors cited, numbers of institutes cited, numbers of countries cited, and numbers of subject areas cited were also used to evaluate ESI papers. Results showed that 265 papers, all written in English, were listed in 27 journals in the field of water resources. A review paper was more likely to be included in the ESI than a research paper. Journal of Hydrology published the most papers. The USA and UK were the two leading nations. 
ESI papers published in the US were more likely to involve inter-institutional collaboration than papers published in the UK. The University of Arizona was the most productive institute. Some papers that were almost excluded from the ESI database appear to have consistently received high annual citation counts. Perhaps the 10 year criterion for inclusion in the ESI should be reassessed. There is increasing interest in assessing how sponsored research funding influences the development and trajectory of science and technology. Traditionally, linkages between research funding and subsequent results are hard to track, often requiring access to separate funding or performance reports released by researchers or sponsors. Tracing research sponsorship and output linkages is even more challenging when researchers receive multiple funding awards and collaborate with a variety of differentially-sponsored research colleagues. This article presents a novel bibliometric approach to undertaking funding acknowledgement analysis which links research outputs with their funding sources. Using this approach in the context of nanotechnology research, the article probes the funding patterns of leading countries and agencies, including patterns of cross-border research sponsorship. We identify more than 91,500 nanotechnology articles published worldwide during a 12-month period in 2008-2009. About 67% of these publications include funding acknowledgement information. We compare articles reporting funding with those that do not (for reasons that may include reliance on internal core-funding rather than external awards, as well as omissions in reporting). While we find some country and field differences, we judge that the level of reporting of funding sources is sufficiently high to provide a basis for analysis. The funding acknowledgement data is used to compare nanotechnology funding policies and programs in selected countries and to examine their impacts on scientific output. 
We also examine the internationalization of research funding through the interplay of various funding sources at national and organizational levels. We find that while most nanotechnology funding is nationally-oriented, internationalization and knowledge exchange do occur as researchers collaborate across borders. Our method offers a new approach not only in identifying the funding sources of publications but also in feasibly undertaking large-scale analyses across scientific fields, institutions and countries. The aim of this article is to present new ideas in evaluating Shanghai University's Academic Ranking of World Universities (ARWU). One issue frequently put forth in various publications is that the Shanghai rankings are sensitive to the relative weight they attribute to each variable. As a possible remedy to this issue, the use of the statistical I-distance method is proposed. Based on a sample containing the top 100 ranked universities, the results show a significant correlation with the official ARWU list. However, some inconsistencies concerning European universities have been noticed and elaborated upon. Scientific literature recommender systems (SLRSs) provide papers to researchers according to their scientific interests. Systems rely on inter-researcher similarity measures that are usually computed according to publication contents (i.e., by extracting paper topics and citations). We highlight two major issues related to this design. The required full-text access and processing are expensive and hardly feasible. Moreover, clues about meetings, encounters, and informal exchanges between researchers (which are related to a social dimension) have not been exploited to date. In order to tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue the case for defining inter-researcher similarity measures building on publicly available metadata. 
Second, we define topical and social measures that we combine to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers' perception against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations compared to baseline topical recommendations. A new bibliometric index is proposed, trying to preserve the advantages of the h-index and to overcome its disadvantages. Multivariate comparisons among 18 bibliometric indices are performed by using Hasse Diagram Technique (HDT) and Principal Component Analysis (PCA). The comparisons were performed on some artificial data sets, three of them well known in the literature. The obtained results seem to highlight some interesting properties of the new index and also reveal some relevant relationships among the considered bibliometric indices. An increasing number of nations allocate public funds to research institutions on the basis of rankings obtained from national evaluation exercises. Therefore, in non-competitive higher education systems where top scientists are dispersed among all the universities, rather than concentrated among a few, there is a high risk of penalizing those top scientists who work in lower-performance universities. Using a 5 year bibliometric analysis conducted on all Italian universities active in the hard sciences from 2004 to 2008, this work analyzes the distribution of publications and relevant citations by scientists within the universities, measures the research performance of individual scientists, quantifies the intensity of concentration of top scientists at each university, provides performance rankings for the universities, and indicates the effects of selective funding on the top scientists of low-ranked universities. 
The aim of this article is to observe differences between research areas when it comes to establishing collaboration ties with local, national or international partners. It also intends to determine to what extent collaboration can influence patent transfer. A collaboration network between CSIC researchers and their external collaborators was built. Several statistical tests were used to find significant differences between research areas. A multiple regression model was also used in order to determine which type of collaboration is most effective for patent transfer. The results show that there are two well defined groups: a "Bio" group with a high international collaboration pattern but less national participation, and a "Physicist" group supported by a high proportion of national partners but with few international connections. The regression analysis found that national collaboration is the variable that most increases patent transfer. In a previous article (Degli Esposti and Geraci, Bulletin of Italian Politics, 2011), we presented an historical survey of the university reform laws that took place in Italy in the last 30 years. On that occasion, we stressed how important merit evaluation is for academics and their institutions, especially in view of the much debated but not yet implemented 'Gelmini' reform with its long awaited new regulation for accessing academic positions (concorsi) and for determining individual weight in financial resource allocation among universities. Here, we present and compare several rankings used to evaluate the prestige and merit of Italian universities. We also consider alternative approaches to academic rankings that highlight peculiar aspects of the universities in Italy which cannot be reasonably accounted for by other international rankings. Finally, we propose a new approach that combines both national and international standing of Italian universities. 
It is hoped that this study will provide practical guidance to policy makers for establishing the criteria upon which merit should be assessed. Bibliometric data on psychology publications from 1977 through 2008 are modeled and forecasted for the 10 years following 2008. Data refer to the raw frequencies of the PsycINFO (94% English-language, mainly Anglo-American publications) and the English-language documents of PSYNDEX (publications from the German-speaking countries). The series were modeled by way of exponential smoothing. In contrast to Single Moving Average methods, which weight all observations equally, exponential smoothing assigns differential weights to observations. Weights reflect the distance from the most recent data point. Results suggest strongly expanding publication activities which can be represented by exponential functions. In addition, forecasted publication activities, estimated based on psychology publication frequencies in the past, show positive bibliometric trends in the Anglo-American research community. These trends parallel the bibliometric trends for the English-language publications of German-speaking authors. However, while positive trends were forecasted for all psychological subdisciplines of the Anglo-American publication database PsycINFO, negative bibliometric trends were estimated for English-language publications from German-speaking authors in 6 out of 20 subdisciplines. In this paper, we discuss the application of data mining tools to identify typical features of highly cited papers (HCPs). By integrating papers' external features and quality features, the feature space used to model HCPs was established. Then, a series of predictor teams were extracted from the feature space with a rough set reduction framework. Each predictor team was used to construct a base classifier. 
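The exponential smoothing described in the psychology-forecasting abstract above can be sketched in a few lines. The recursion and the toy publication counts below are textbook simple exponential smoothing, not the study's actual model, smoothing constant, or data.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: recent observations get higher weight.

    s_t = alpha * x_t + (1 - alpha) * s_{t-1}. A single moving average,
    by contrast, weights its last k observations equally.
    """
    s = series[0]  # initialize with the first observation
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s  # smoothed level, usable as a one-step-ahead forecast

# Toy annual publication counts (illustrative numbers only).
counts = [100, 110, 125, 150, 180]
print(exponential_smoothing(counts, alpha=0.5))  # 156.25
```

With alpha near 1 the forecast tracks the newest count almost exclusively; with alpha near 0 it behaves like a long, slowly updating average.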
Then the five base classifiers with the highest classification performance and the largest diversity on the whole were selected to construct a multi-classifier system (MCS) for HCPs. The combination prediction model obtained better performance than models of a single predictor team. Eleven typical prediction features for HCPs were extracted on the basis of the MCS. The findings show that both the papers' inner quality and external features, mainly represented by the reputation of the authors and journals, contribute to the generation of HCPs in the future. This paper attempts to build a classification model according to the research products created by those institutes and hence to design specific evaluation processes. Several scientific input/output indicators belonging to 109 research institutes from the Spanish National Research Council (CSIC) were selected. A multidimensional approach was proposed to summarize these indicators in various components. A clustering analysis was used to classify the institutes according to their scores on those components (principal component analysis). Moreover, the validity of the a priori classification was tested and the most discriminant variables were detected (linear discriminant analysis). Results show that there are three types of institutes according to their research outputs: Humanistic, Scientific and Technological. It is argued that these differences oblige evaluators to design more precise assessment exercises which focus on the particular results of each type of institute. We conclude that this method permits building more precise research assessment exercises which consider the varied nature of scientific activity. (C) 2011 Elsevier Ltd. All rights reserved. This paper reports on the first documented attempt to investigate the presence of the superstar (or Matthew) effect in the knowledge management and intellectual capital (KM/IC) scholarly discipline. 
The Yule-Simon model and Lotka's square law were applied to the publication data obtained from 2175 articles from 11 KM/IC journals. Based on the findings, it was concluded that the KM/IC discipline represents a very young, attractive academic field that welcomes contributions from a variety of academics and practitioners. In their paper acceptance decisions, KM/IC journal editors are not biased towards a small group of highly productive researchers, which is a positive sign that the field has been progressing in the right direction. The discipline is driven more by academics than by practitioners, and the distribution of articles is more concentrated among a few academic but not practitioner institutions. It was also observed that the Yule-Simon model and Lotka's square law may produce different distributions with respect to institutions. (C) 2011 Elsevier Ltd. All rights reserved. This paper presents the first meta-analysis of studies that computed correlations between the h index and variants of the h index (such as the g index; in total 37 different variants) that have been proposed and discussed in the literature. A high correlation between the h index and its variants would indicate that the h index variants hardly provide added information to the h index. This meta-analysis included 135 correlation coefficients from 32 studies. The studies were based on a total sample size of N = 9005; on average, each study had a sample size of n = 257. The results of a three-level cross-classified mixed-effects meta-analysis show a high correlation between the h index and its variants: depending on the model, the mean correlation coefficient varies between .8 and .9. This means that there is redundancy between most of the h index variants and the h index. There is also statistically significant study-to-study variation in the correlation coefficients. 
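The h index at the center of the meta-analysis above, and one of its best-known variants, the g index, are straightforward to compute from a citation list. This sketch uses the standard definitions of both indices; the citation profile is made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    cs = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cs, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    cs = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cs, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

# Hypothetical citation profile of one researcher.
profile = [10, 8, 5, 4, 3, 0]
print(h_index(profile), g_index(profile))  # 4 5
```

Because both indices are monotone summaries of the same sorted citation vector, high correlations between them across researchers, as the meta-analysis reports for most variants, are unsurprising.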
The lowest correlation coefficients with the h index are found for the h index variants MII and m index. Hence, these h index variants make a non-redundant contribution to the h index. (C) 2011 Elsevier Ltd. All rights reserved. In the case of the scientometric evaluation of multi- or interdisciplinary units one risks comparing apples with oranges: each paper has to be assessed in comparison to an appropriate reference set. We suggest that the set of citing papers can be considered as the relevant representation of the field of impact. In order to normalize for differences in citation behavior among fields, citations can be fractionally counted proportionately to the length of the reference lists in the citing papers. This new method enables us to compare among units with different disciplinary affiliations at the paper level and also to assess the statistical significance of differences among sets. Twenty-seven departments of the Tsinghua University in Beijing are thus compared. Among them, the Department of Chinese Language and Linguistics is upgraded from the 19th to the second position in the ranking. The overall impact of 19 of the 27 departments is not significantly different at the 5% level when thus normalized for different citation potentials. (C) 2011 Elsevier Ltd. All rights reserved. In 2008, the document type "proceedings paper" (PP) was assigned in the WoS database to journal articles which were initially presented at a conference and later adapted for publication in a journal. Since the use of two different labels ("article" and "proceedings paper") might lead to inferring differences in their relevance and/or quality, this paper presents a comparative study of standard journal articles and PP in journals to explore potential differences between them. The study focuses on the Library and Information Science field in the Web of Science database and covers the 1990-2008 period. 
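The fractional counting scheme proposed in the Tsinghua study above weights each citation by the inverse length of the citing paper's reference list, so papers cited from citation-dense fields (long reference lists) contribute less per citation. A minimal sketch, with a hypothetical function name and toy data:

```python
def fractional_citations(cited_paper, citing_reference_lists):
    """Count each citation as 1 / (length of the citing paper's reference list).

    Citations arriving from long reference lists (citation-dense fields)
    thus weigh less, normalizing for field-specific citation behaviour.
    """
    total = 0.0
    for refs in citing_reference_lists:
        if cited_paper in refs:
            total += 1.0 / len(refs)
    return total

# Toy example: paper "P" is cited by two papers with different list lengths.
citing = [
    ["P", "Q"],                 # short reference list: weight 1/2
    ["P", "Q", "R", "S", "T"],  # long reference list: weight 1/5
]
print(fractional_citations("P", citing))  # 0.7
```

A whole-counting scheme would score "P" as 2 here; under fractional counting the same two citations sum to 0.7, which is what allows units from fields with very different citation potentials to be compared on one scale.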
PP account for approximately 9% of the total number of articles in this field, two-thirds of which are published in monographic issues devoted to conferences, which tend to be concentrated in specific journals. Proceedings papers emerge as a heterogeneous set comprising PP in ordinary issues, similar to standard articles in structure and research impact, and PP in monographic issues, which seem to be less comprehensive and tend to receive fewer citations. Faster publication of PP in monographic than in ordinary issues may conceal differences in the review process undergone by either type of paper. The main implications of these results for authors, bibliometricians, journal editors and research evaluators are pointed out. (C) 2011 Elsevier Ltd. All rights reserved. The structural properties of the network generated by the editorial activities of the members of the boards of "Information Science & Library Science" journals are explored through network analysis techniques. The crossed presence of scholars on editorial boards, a phenomenon called interlocking editorship, is considered a proxy of the similarity of editorial policies. The evidence supports the idea that this group of journals is better described as a set of only loosely connected subfields. In particular, two main subfields are identified, consisting of research-oriented journals devoted respectively to LIS and MIS. The links between these two subsets are weak. Around these two subsets there are many (relatively) isolated professional journals, or journals characterized more by their subject-matter content than by their focus on information flows. It is possible to suggest that this configuration of the network is a consequence of the youthfulness of Information Science & Library Science, which has not yet allowed scholars to reach a general consensus on research aims, methods and instruments. (C) 2011 Elsevier Ltd. All rights reserved. 
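The interlocking-editorship network described above is essentially a bipartite projection: journals become nodes, and two journals are linked when they share at least one board member. A minimal stdlib sketch (the journal names and boards are invented for illustration):

```python
from itertools import combinations
from collections import Counter

def interlock_network(boards):
    """boards: dict mapping journal -> set of editorial board members.
    Returns a Counter mapping (journal, journal) pairs to the number
    of shared board members (the interlocking-editorship weight)."""
    edges = Counter()
    for j1, j2 in combinations(sorted(boards), 2):
        shared = len(boards[j1] & boards[j2])
        if shared:
            edges[(j1, j2)] = shared
    return edges
```

Clustering this weighted network (for example by its connected components or communities) would then expose the loosely connected LIS and MIS subfields that the abstract reports.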
In the recent debate on the use of averages of ratios (AoR) and ratios of averages (RoA) for the compilation of field-normalized citation rates, little evidence has been provided on the different results obtained by the two methods at various levels of aggregation. This paper provides such an empirical analysis at the level of individual researchers, departments, institutions and countries. Two datasets are used: 147,547 papers published between 2000 and 2008 and assigned to 14,379 Canadian university professors affiliated with 508 departments, and all papers indexed in the Web of Science for the same period (N = 8,221,926) assigned to all countries and institutions. Although there is a strong relationship between the two measures at each of these levels, a pairwise comparison of AoR and RoA shows that the differences between all the distributions are statistically significant and, thus, that the two methods are not equivalent and do not give the same results. Moreover, the difference between both measures is strongly influenced by the number of papers published as well as by their impact scores: the difference between AoR and RoA is greater for departments, institutions and countries with low RoA scores. Finally, our results show that RoA relative impact indicators do not add up to unity (as they should by definition) at the level of the reference dataset, whereas the AoR does have that property. (C) 2011 Elsevier Ltd. All rights reserved. This study uses bibliographic coupling to identify missing relevant patent links, in order to construct a comprehensive citation network. Missing citation links can be added by taking the missing relevant patent links into account. The Pareto principle is used to determine the threshold of bibliographic coupling strength, in order to identify the missing relevant patent links. 
Comparisons between the original patent citation network and the comprehensive patent citation network with the missing relevant patent links are illustrated at both the patent and assignee levels. Light emitting diode (LED) illuminating technology is chosen as the case study. The relationships between the patents and the assignees are markedly enhanced after adding the missing relevant patent links. The results show that the growth rates of both the total number and the average number of links improve appreciably at the patent level. At the assignee level, the number of linked assignees and the average number of links between two assignees are increased. The differences between the two citation networks are further examined by means of the Freeman vertex betweenness centrality and Johnson's hierarchical clustering. The patents with more new links to other patents show distinct results in terms of the Freeman vertex betweenness centrality. The enhancement of links among patents also results in different clustering. Crown Copyright (C) 2011 Published by Elsevier Ltd. All rights reserved. The outgrow index measures the extent to which an article outgrows, in terms of citations, the references on which it is based. In this article, three types of time series of outgrow indices and one outgrow index matrix are introduced. Examples of these time series are given, illustrating the newly introduced concepts. These time series expand the toolbox for citation analysis by focusing on a specific subnetwork of the global citation network. It is argued that citation analysis has three application areas: information retrieval, research evaluation and structural citation network studies. This contribution is explicitly placed among structural network studies. (C) 2011 Elsevier Ltd. All rights reserved. Complex networks may undergo random and/or systematic failures in some of their components, i.e. nodes and edges. These failures may influence various network properties. 
In this article, for a number of real-world as well as Watts-Strogatz model networks, we investigated the profile of network small-worldness as random failures, i.e. errors, or systematic failures, i.e. attacks, occurred in the nodes. In errors, nodes are removed at random along with all their incident edges, while in attacks the nodes with the highest degrees are removed from the network. Interestingly, in many cases, the small-worldness of the damaged networks increased as more nodes underwent an attack. This indicates an important role of the hub nodes in controlling the small-worldness of Watts-Strogatz networks. The profile of changes in small-worldness as a result of errors/attacks was independent of network size, while it was influenced by the average degree and rewiring probability of the Watts-Strogatz model. We also found that the pattern of changes in small-worldness in real-world networks is completely different from that of Watts-Strogatz networks. Therefore, although the Watts-Strogatz model is often used for constructing networks with the small-world property, the resulting networks differ from real-world ones in terms of the robustness of the small-worldness index against errors/attacks. (C) 2011 Elsevier Ltd. All rights reserved. This work maps and analyses cross-citations in the areas of Biology, Mathematics, Physics and Medicine in the English version of Wikipedia, which are represented as an undirected complex network where the entries correspond to nodes and the citations among the entries are mapped as edges. We found a high value of the clustering coefficient for the areas of Biology and Medicine, and a small value for Mathematics and Physics. The topological organization is also different for each network, including a modular structure for Biology and Medicine, a sparse structure for Mathematics and a dense core for Physics. The networks have degree distributions that can be approximated by a power-law with a cut-off. 
The assortativity of the isolated networks has also been investigated and the results indicate distinct patterns for each subject. We estimated the betweenness centrality of each node considering the full Wikipedia network, which contains the nodes of the four subjects and the edges between them. In addition, the average shortest path length between the subjects revealed a close relationship between the subjects of Biology and Physics, and also between Medicine and Physics. Our results indicate that the analysis of the full Wikipedia network cannot predict the behavior of the isolated categories, since their properties can be very different from those observed in the full network. (C) 2011 Elsevier Ltd. All rights reserved. We define the generalized Wu- and Kosmulski-indices, allowing for general parameters of multiplication or exponentiation. We then present formulae for these generalized indices in a Lotkaian framework. Next we characterise these indices in terms of their dependence on the quotient of the average number of items per source in the m-core divided by the overall average (where m is any generalized Wu- or Kosmulski-index). As a consequence of these results we show that the fraction of used items (used in the definition of m) in the m-core is independent of the parameter and equals one divided by the overall average. (C) 2011 Elsevier Ltd. All rights reserved. Web 2.0 technologies are finding their way into academia: specialized social bookmarking services allow researchers to store and share scientific literature online. By bookmarking and tagging articles, academic prosumers generate new information about resources, i.e. usage statistics and content descriptions of scientific journals. Given the lack of global download statistics, the authors propose the application of social bookmarking data to journal evaluation. 
For a set of 45 physics journals, all 13,608 bookmarks from CiteULike, Connotea and BibSonomy to documents published between 2004 and 2008 were analyzed. This article explores bookmarking data in STM and examines the extent to which it can be used to describe the perception of periodicals by the readership. Four basic indicators are defined, which analyze different aspects of usage: Usage Ratio, Usage Diffusion, Article Usage Intensity and Journal Usage Intensity. Tags are analyzed to describe a reader-specific view on journal content. (C) 2011 Elsevier Ltd. All rights reserved. This paper proposes an empirical analysis of several scientists based on their time regularity, defined as the ability to generate an active and stable research output over time, in terms of both quantity/publications and impact/citations. In particular, we empirically analyse three recent bibliometric tools to perform qualitative/quantitative evaluations under the new perspective of regularity. These tools are respectively (1) the PY/CY diagram, (2) the publication/citation Ferrers diagram and triad indicators, and (3) a year-by-year comparison of the scientists' output (Borda's ranking). Results of the regularity analysis are then compared with those obtained under the classical perspective of overall production. The proposed evaluation tools can be applied in competitive examinations for research positions/promotions, as complementary instruments to the commonly adopted bibliometric techniques. (C) 2011 Elsevier Ltd. All rights reserved. Meaning can be generated when information is related at a systemic level. Such a system can be an observer, but also a discourse, for example, operationalized as a set of documents. The measurement of semantics as similarity in patterns (correlations) and latent variables (factor analysis) has been enhanced by computer techniques and the use of statistics, for example in "latent semantic analysis". 
This communication provides an introduction, an example, pointers to relevant software, and a summary of the choices that can be made by the analyst. Visualization ("semantic mapping") is thus made more accessible. (C) 2011 Elsevier Ltd. All rights reserved. The Hirsch index and the Egghe index are both numbers that synthesize a researcher's output. The h-index associated with researcher r is the maximum number h such that r has h papers with at least h citations each. The g-index is the maximum number g of papers by r such that the average number of citations of the g papers is at least g. Both indices are characterized in terms of four axioms. One identifies the outputs deserving an index of at most one. A second one establishes a strong monotonicity condition. A third one requires the index to satisfy a property of subadditivity. The last one consists of a monotonicity condition, for the h-index, and an aggregate monotonicity condition, for the g-index. (C) 2011 Elsevier Ltd. All rights reserved. A paper that has received more citations than the number of references it contains is called a successful paper (SP). An assessment based on the number of SPs produces comparable scores for scientists working in different disciplines of science, and in different countries. (C) 2011 Elsevier Ltd. All rights reserved. Modern information retrieval (IR) has come to terms with numerous new media in efforts to help people find information in increasingly diverse settings. Among these new media are so-called microblogs. A microblog is a stream of text that is written by an author over time. It comprises many very brief updates that are presented to the microblog's readers in reverse-chronological order. Today, the service called Twitter is the most popular microblogging platform. Although microblogging is increasingly popular, methods for organizing and providing access to microblog data are still new. 
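The h- and g-index definitions quoted above translate directly into code. A minimal sketch (the handling of edge cases follows the definitions as stated; variants in the literature differ on ties and empty records):

```python
def h_index(citations):
    """Maximum h such that h papers have at least h citations each."""
    cs = sorted(citations, reverse=True)
    return max((i + 1 for i, c in enumerate(cs) if c >= i + 1), default=0)

def g_index(citations):
    """Maximum g such that the top g papers average at least g citations,
    i.e. their citation sum is at least g**2."""
    cs = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cs):
        total += c
        if total >= (i + 1) ** 2:
            g = i + 1
    return g
```

For citation counts [10, 8, 5, 4, 3] these give h = 4 and g = 5: only four papers have at least four citations each, but the top five papers' 30 citations average six, which is at least five.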
This review offers an introduction to the problems that face researchers and developers of IR systems in microblog settings. After an overview of microblogs and the behavior surrounding them, the review describes established problems in microblog retrieval, such as entity search and sentiment analysis, and modeling abstractions, such as authority and quality. The review also treats user-created metadata that often appear in microblogs. Because the problem of microblog search is so new, the review concludes with a discussion of particularly pressing research issues yet to be studied in the field. The enormous amount of valuable information that is produced today and needs to be made available over the long-term has led to increased efforts in scalable, automated solutions for long-term digital preservation. The mission of preservation planning is to define the optimal actions to ensure future access to digital content and react to changes that require adjustments in repository operations. Considerable effort has been spent in the past on defining, implementing, and validating a framework and system for preservation planning. This article sheds light on the actual decision criteria and influence factors to be considered when choosing digital preservation actions. It is based on an extensive evaluation of case studies on preservation planning for a range of different types of objects with partners from different institutional backgrounds. We categorize decision criteria from a number of real-world decision-making instances in a taxonomy. We show that a majority of the criteria can be evaluated by applying automated measurements under realistic conditions, and demonstrate that controlled experimentation and automated measurements can be used to substantially improve repeatability of decisions and reduce the effort needed to evaluate preservation components. 
The presented measurement framework enables scalable preservation and monitoring, and supports trust in preservation decisions because extensive evidence is produced in a reproducible, automated way and documented as the basis of decision making in a standardized form. With the increasing availability of digital information, semantic web technologies have been employed to construct semantic digital libraries in order to ease information comprehension. The use of semantic web technologies enables users to search or visualize resources in a semantic fashion. Semantic web generation is a key process in semantic digital library construction, which converts metadata of digital resources into semantic web data. Many text mining technologies, such as keyword extraction and clustering, have been proposed to generate semantic web data. However, one important type of metadata in publications, called affiliation, is hard to convert into semantic web data precisely, because different authors who have the same affiliation often express it in different ways. To address this issue, this paper proposes a clustering method based on normalized compression distance for the purpose of affiliation disambiguation. The experimental results show that our method is able to identify different affiliation strings that denote the same institutes. The clustering results outperform the well-known k-means clustering method in terms of average precision, F-measure, entropy, and purity. In both the popular press and scholarly research, digital information is persistently discussed in terms that imply its immateriality. In this characterization, the digital derives its power from its nature as a mere collection of 0s and 1s, wholly independent from the particular media on which it is stored (hard drive, network wires, optical disk, etc.) and the particular signal carrier which encodes bits (variations of magnetic field, voltages, or pulses of light). 
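The normalized compression distance (NCD) behind the affiliation-disambiguation method above has a standard formulation: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. A minimal sketch using zlib as the compressor (the example affiliation strings are invented; the paper's exact compressor and preprocessing are not specified here):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib as the compressor C:
    close to 0 for very similar strings, close to 1 for unrelated ones."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Two spellings of the same institute compress well together and so sit close under this distance, which is what allows a clustering algorithm to group them into one affiliation.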
This purported immateriality endows bits with considerable advantages: they are immune to the economics and logistics of analog media, and to the corruption, degradation, and decay that necessarily result from the handling of material carriers of information, resulting in a worldwide shift "from atoms to bits", as captured by Negroponte. This is problematic: however immaterial it might appear, information cannot exist outside of given instantiations in material forms. But what might it mean to talk of bits as material objects? In this paper I argue that bits cannot escape the material constraints of the physical devices that manipulate, store, and exchange them. Such an analysis reveals a surprising picture of computing as a material process through and through. Given their ease of use and capability for interactivity, new media are seen as having the potential to make visible previously marginalized voices. The online presence of the writing of history is increasing, and the realization of this potential would be a welcome development for the field, as it would create a much richer set of easily available historical perspectives. However, this article suggests that the achievement of this promise is fraught with difficulty and that a more likely outcome is a mapping of the status quo in historical representation onto the new media. To illustrate this, I present an analysis of the Wikipedia accounts of Singaporean and Philippine history. For Singapore, alternative historical visions are not as developed as those for the Philippines, and this is reflected in the nature of the respective Wikipedia accounts. I suggest that a possible means to achieve something more of the promise of digital media for history is for information professionals to take a keener interest in Wikipedia, with an eye to helping include accounts of documented historical perspectives that are ignored by mainstream historiographical traditions. 
Given the prevalence of community-driven knowledge services (CKSs) such as Yahoo! Answers and Naver Knowledge iN, it has become important to understand the effect of social networks on user behaviors in CKS environments. CKSs allow various relationships between askers and answerers as well as among answerers. This study classifies social ties in CKSs into three kinds: answering ties, co-answering ties, and getting-answers ties. It examines the influence of the structural and relational attributes of social networks on the quality of answers at CKSs for each of these three kinds of ties. Data collected from the top-100 heavy users of Yahoo! Answers and of Naver Knowledge iN are used to test the research model. The analysis results show that the centrality of the answering ties significantly influences the quality of answers, while the average strength of the answering ties has an insignificant effect on the quality of answers. Interestingly, both the centrality and average strength of the co-answering ties negatively affect the quality of answers. Moreover, the centrality and average strength of getting-answers ties do not significantly influence the quality of answers. Research indicates that migrants' social media usage in Ireland enables a background awareness of friends and acquaintances that supports bonding capital and transnational communities in ways not previously reported. Interview data from 65 Polish and Filipino non-nationals in Ireland provide evidence that their social media practices enable a shared experience with friends and relations living outside Ireland that is not simply an elaboration of the social relations enabled by earlier Internet applications. Social media usage enables a passive monitoring of others, through the circulation of voice, video, text, and pictures, that maintains a low-level mutual awareness and supports a dispersed community of affinity. 
This ambient, or background, awareness of others enhances and supports dispersed communities by contributing to bonding capital. This may lead to significant changes in the process of migration by slowing down the process of integration and participation in host societies while also encouraging continual movement of migrants from one society to another. An important question in information-seeking behavior is where people go for information and why information seekers prefer to use one source type rather than another when faced with an information-seeking task or need for information. Prior studies have paid little attention to contingent variables that could change the cost-benefit calculus in source use. They have also defined source use in one way or another, or treated it as a monolithic construct. Through an empirical survey of 352 working professionals in Singapore, this study carried out a context-based investigation into source use by information seekers. Different measures of source use were incorporated, and various contextual variables that could affect the use of source types were identified. The findings suggest that source quality and access difficulty are important antecedents of source use, regardless of the source type. Moreover, seekers place more weight on source quality when the task is important. Other contextual factors, however, are generally less important to source use. Seekers also demonstrate a strong pecking order in the use of source types, with online information and face-to-face contact being the two most preferred types. The goal of this article is to explore the manifestation of culture in the design of English-language and Chinese-language corporate websites, using Hofstede's dimensions of culture. Data were gathered from the 2010 Global 500 list published by Fortune magazine. Only multinational corporations that have both English-language and Chinese-language websites were analyzed (N = 223). 
The results indicate that the Chinese-language and English-language websites differ significantly on 4 of Hofstede's 5 cultural dimensions: power distance, uncertainty avoidance, individualism/collectivism, and long-term/short-term orientation. Cultural differences are indeed reflected in the website designs of the Global 500 corporations, though not exactly in the direction predicted by Hofstede's model. Increasing interdisciplinarity has been a policy objective since the 1990s, promoted by many governments and funding agencies, but the question is: How deeply has this affected the social sciences? Although numerous articles have suggested that research has become more interdisciplinary, no study has compared the extent to which the interdisciplinarity of different social science subjects has changed. To address this gap, changes in the level of interdisciplinarity since 1980 are investigated for subjects with many articles in the Social Sciences Citation Index (SSCI), using the percentage of cross-disciplinary citing documents (PCDCD) to evaluate interdisciplinarity. For the 14 SSCI subjects investigated, the median level of interdisciplinarity, as measured using cross-disciplinary citations, declined from 1980 to 1990, but rose sharply between 1990 and 2000, confirming previous research. This increase was not fully matched by an increase in the percentage of articles that were assigned to more than one subject category. Nevertheless, although on average the social sciences have recently become more interdisciplinary, the extent of this change varies substantially from subject to subject. The SSCI subject with the largest increase in interdisciplinarity between 1990 and 2000 was Information Science & Library Science (IS&LS), but there is evidence that the level of interdisciplinarity of IS&LS increased less quickly during the first decade of this century. 
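The PCDCD indicator used above reduces to a simple proportion: the share of citing documents assigned to a subject other than the cited article's subject. A minimal sketch (the subject labels and the single-subject simplification are assumptions for illustration; SSCI documents can carry multiple categories):

```python
def pcdcd(cited_subject, citing_subjects):
    """Percentage of cross-disciplinary citing documents: the share of
    citing documents whose subject differs from the cited subject."""
    if not citing_subjects:
        return 0.0
    cross = sum(1 for s in citing_subjects if s != cited_subject)
    return 100.0 * cross / len(citing_subjects)
```

Tracking this percentage per subject and per publication year is what yields the decline-then-rise trajectory the abstract reports.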
The last few years have seen the emergence of several open access (OA) options in scholarly communication, which can be grouped broadly into two areas referred to as the gold and green roads. Several recent studies have shown how large the extent of OA is, but there have been few studies showing the impact of OA on the visibility of journals covering all scientific fields and geographical regions. This research presents a series of informative analyses providing a broad overview of the degree of proliferation of OA journals in a data sample of about 17,000 active journals indexed in Scopus. This study shows a new approach to scientific visibility based on a systematic combination of four databases (Scopus, the Directory of Open Access Journals, Rights Metadata for Open Archiving (RoMEO)/Securing a Hybrid Environment for Research Preservation and Access (SHERPA), and the SCImago Journal Rank) and provides an overall, global view of journals according to their formal OA status. The results primarily relate to the number of journals, not to the number of documents published in these journals, and show that in all the disciplinary groups, the presence of green road journals widely surpasses the percentage of gold road publications. The peripheral and emerging regions have greater proportions of gold road journals. These journals belong for the most part to the last quartile. The benefits of OA for the visibility of journals are to be found on the green route, but paradoxically, this advantage is conferred not by OA per se, but rather by the quality of the articles/journals themselves, regardless of their mode of access. Fractional counting of citations can improve the ranking of multidisciplinary research units (such as universities) by normalizing the differences among fields of science in terms of differences in citation behavior. 
Furthermore, normalization in terms of citing papers sidesteps the unsolved questions in scientometrics about the delineation of fields of science in terms of journals and about normalization when comparing among different (sets of) journals. Using publication and citation data of seven Korean research universities, we demonstrate the advantages and the differences in the rankings, explain the possible statistics, and suggest ways to visualize the differences in (citing) audiences in terms of a network. Using a database of 1.4 million papers indexed by the Web of Science, we examined the global trends in publication inequality and international collaboration in physics. The publication output and citations received by authors hosted in each country were taken into account. Although inequality decreased over time, further progress toward equality has somewhat abated in recent years. The skewness of the global distribution in publication output was shown to be correlated with article impact, that is, the inequality is more significant in articles of higher impact. It was also observed that, despite the trend toward a more egalitarian distribution, scholarly participation in physics is still dominated by a select group. Particularly noteworthy has been China's rapid growth in publication output and a gradual improvement in its impact. Finally, the data also suggested regional differences in scientific collaboration. A distinctively high concentration of transnational collaboration and publication performance was found among EU countries. In an increasingly global research landscape, it is important to identify the most prolific researchers in various institutions and their influence on the diffusion of knowledge. Knowledge diffusion within institutions is influenced by not just the status of individual researchers but also the collaborative culture that determines status. 
There are various methods to measure individual status, but few studies have compared them or explored the possible effects of different cultures on the status measures. In this article, we examine knowledge diffusion within science and technology-oriented research organizations. Using social network analysis metrics to measure individual status in large-scale coauthorship networks, we studied an individual's impact on the recombination of knowledge to produce innovation in nanotechnology. Data from the most productive and high-impact institutions in China (Chinese Academy of Sciences), Russia (Russian Academy of Sciences), and India (Indian Institutes of Technology) were used. We found that boundary-spanning individuals influenced knowledge diffusion in all countries. However, our results also indicate that cultural and institutional differences may influence knowledge diffusion. In this article, we address the problem of question clustering and study its use for re-ranking question search results. In question clustering we have to organize question search results into meaningful, condensed groups. Specifically, we propose to use a data structure consisting of question topic and question focus for modeling questions, and then cluster questions on the basis of this data structure. Experimental results show that our approach to question clustering improves the performance of question search significantly over an approach that does not utilize the topic-focus structure. Computing document similarity directly from a "bag of words" vector space model can be problematic because term independence causes the relationships between synonymous terms, and the contextual influences that determine the sense of polysemous terms, to be ignored. This study compares two methods that potentially address these problems by deriving the higher-order relationships that lie latent within the original first-order space. 
The first is latent semantic analysis (LSA), a dimension reduction method that is a well-known means of addressing the vocabulary mismatch problem in information retrieval systems. The second is the lesser known yet conceptually simple approach of second-order similarity (SOS) analysis, whereby latent similarity is measured in terms of mutual first-order similarity. Nearest neighbour tests show that SOS analysis derives similarity models that are superior to both first-order and LSA-derived models at both coarse and fine levels of semantic granularity. SOS analysis has been criticized for its computational complexity. A second contribution is the novel application of vector truncation to reduce run-time by a constant factor. Speed-ups of 4 to 10 times are achievable without compromising the structural gains achieved by full-vector SOS analysis. A simple distribution function f(x, t) = p(x + q)^(-beta) e^(alpha t) obeys wave and heat equations, constituting a theoretical approach to the unification of informetric models with which all informetric laws can be unified. While its space-type distributions naturally yield Lotka-type laws in size approaches and Zipf-type laws in rank approaches, its time-type distributions introduce the mechanism of Price-type and Brookes-type laws. This article proposes a theory of information need for information retrieval (IR). Information need traditionally denotes the start state for someone seeking information, which includes information search using an IR system. There are two perspectives on information need. The dominant, computer science perspective is that the user needs to find an answer to a well-defined question which is easy for the user to formulate into a query to the system. Ironically, information science's best known model of information need (Taylor, 1968) deems it to be a "black box": unknowable and nonspecifiable by the user in a query to the information system. 
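Second-order similarity as described above can be sketched in a few lines: compute a first-order similarity matrix, then compare documents by their rows of first-order similarities. A minimal sketch assuming cosine similarity over term-frequency vectors (the original study's exact weighting scheme is not specified here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def second_order(docs):
    """docs: list of term vectors over a shared vocabulary.
    First order: cosine between term vectors.
    Second order: cosine between each document's vector of
    first-order similarities to all documents."""
    n = len(docs)
    first = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    second = [[cosine(first[i], first[j]) for j in range(n)] for i in range(n)]
    return first, second
```

With two documents that share no terms but are both similar to a third, the first-order similarity is zero while the second-order similarity is positive; this is how latent relationships between synonymous terms surface.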
Information science has instead devoted itself to studying eight adjacent or surrogate concepts (information seeking, search and use; problem, problematic situation and task; sense making and evolutionary adaptation/information foraging). Based on an analysis of these eight adjacent/surrogate concepts, we create six testable propositions for a theory of information need. The central assumption of the theory is that while computer science sees IR as an information- or answer-finding system, focused on the user finding an answer, an information science or user-oriented theory of information need envisages a knowledge formulation/acquisition system. In the past few decades, the task of judging the credibility of information has shifted from trained professionals (e.g., editors) to end users of information (e.g., casual Internet users). Because these end users lack training in the task, it is highly relevant to research their behavior. In this article, we propose a new model of trust in information, in which trust judgments are dependent on three user characteristics: source experience, domain expertise, and information skills. Applying any of these three characteristics leads to different features of the information being used in trust judgments; namely source, semantic, and surface features (hence, the name 3S-model). An online experiment was performed to validate the 3S-model. In this experiment, Wikipedia articles of varying accuracy (semantic feature) were presented to Internet users. Trust judgments of domain experts on these articles were largely influenced by accuracy, whereas trust judgments of novices remained mostly unchanged. Moreover, despite the influence of accuracy, the percentage of trusting participants, both experts and novices, was high in all conditions. Along with the rationales provided for such trust judgments, the outcome of the experiment largely supports the 3S-model, which can serve as a framework for future research on trust in information. 
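Returning to the second-order similarity (SOS) analysis discussed earlier: its core idea, measuring latent similarity as mutual first-order similarity, can be sketched in a few lines. This is a minimal illustration assuming plain cosine similarity over term-count vectors; the function names and toy vectors are ours, not the study's:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def first_order(docs):
    """Pairwise first-order similarity matrix over term vectors."""
    return [[cosine(d1, d2) for d2 in docs] for d1 in docs]

def second_order(docs):
    """Second-order similarity: compare documents by their rows of
    first-order similarities, so two documents count as similar when
    they are similar to the same other documents."""
    s1 = first_order(docs)
    return [[cosine(r1, r2) for r2 in s1] for r1 in s1]
```

Two documents that share no terms can still receive a positive second-order score if each is similar to the same third document; this is exactly the kind of higher-order relationship that a first-order "bag of words" comparison ignores.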
This exploratory study contributes to research on relevance assessment by specifying criteria that are used in the judgment of information quality and credibility in Internet discussion forums. To this end, 4,739 messages posted to 160 Finnish discussion threads were analyzed. Of the messages, 20.5% contained explicit judgments of the quality of information and credibility in other messages. In the judgments, the forum participants employed both positive criteria such as validity of information and negative criteria such as dishonesty in argumentation. In the evaluation of the quality of a message's information content, the most frequently used criteria pertained to the usefulness, correctness, and specificity of information. In the judgment of information credibility, the main criteria included the reputation, expertise, and honesty of the author of the message. Since Internet discussion forums tend to emphasize the role of disputational discourse, questioning rather than accepting the views presented by others, mainly negative criteria were used in the judgments. The generality of our claims is limited because we chose forums that focused on sensitive and value-laden topics; future work could explore credibility and quality judgment in other forums and forum-like venues, such as question-and-answer sites, as well as how quality and credibility judgments interact with other aspects of forum use. The goal of this study is to understand how consultants' information seeking from human and digital knowledge sources is influenced by their relationships with both types of knowledge sources and the characteristics of the knowledge domain in which information seeking takes place. Grounded on and extending transactive memory theory, this study takes a multidimensional approach to predict consultants' information seeking based on expertise recognition, source accessibility, peer information-seeking behaviors, knowledge complexity, and codifiability. 
Using data collected from 110 consultants across 9 project teams from 2 multinational consulting firms, this study found that consultants' information seeking from human knowledge sources was mostly driven by the expertise and accessibility level of their team members, whereas their information seeking from digital knowledge repositories was strongly influenced by how much information the digital knowledge source had and whether colleagues with whom they had strong social communication ties were seeking information from the digital source. Finally, knowledge complexity had a negative influence on consultants' information seeking from digital knowledge repositories, but knowledge codifiability had no significant effects on information seeking from either knowledge source. This study demonstrates the importance and viability of using a multidimensional network approach to advancing transactive memory theory to study consultants' information-seeking practices. Previous work has established that search engine queries can be classified according to the intent of the searcher (i.e., why is the user searching, what specifically do they intend to do). In this article, we describe an experiment in which four sets of queries, each set representing a different user intent, are repeatedly submitted to three search engines over a period of 60 days. Using a variety of measurements, we describe the overall stability of the search engine results recorded for each group. Our findings suggest that search engine results for informational queries are significantly more stable than the results obtained using transactional, navigational, or commercial queries. This research investigated preservation practices of personal digital information by public library users. This qualitative study used semistructured interviews and two visual representation techniques, information source horizons and matrices, for data collection. 
The constant comparison method and descriptive statistics were used to analyze the data. A model emerged which describes the effects of social, cognitive, and affective influences on personal preservation decisions, as well as the effects of fading cognitive associations and technological advances, combined with information escalation over time. Because the preservation of personal digital information involves personal, social, and technological interactions, the integration of these factors is necessary for a viable solution to the digital preservation problem. Evidence-based practice (EBP) is an influential interdisciplinary movement that originated in medicine as evidence-based medicine (EBM) around 1992. EBP is of considerable interest to library and information science (LIS) because it focuses on thorough documentation of the research basis for decision making, as well as on optimization of every link in documentation and search processes. EBP is based on the philosophical doctrine of empiricism and, therefore, it is subject to the criticism that has been raised against empiricism. The main criticism of EBP is that practitioners lose their autonomy, that the understanding of theory and of underlying mechanisms is weakened, and that the concept of evidence is too narrow in the empiricist tradition. In this article, it is suggested that we should speak of "research-based practice" rather than EBP, because this term is open to more fruitful epistemologies and provides a broader understanding of evidence. The focus on scientific argumentation in EBP is an important contribution from EBP to LIS, which is long overdue, but parts of the underlying epistemological assumptions should be replaced: EBP is too narrow, too formalist, and too mechanical an approach on which to base scientific and scholarly documentation. 
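A concrete way to picture the result-stability measurements from the 60-day search engine study mentioned earlier: the abstract does not spell out its metrics, so as an illustrative stand-in (our assumption, not the article's actual measure), the day-to-day overlap of a query's result sets can be scored with the Jaccard coefficient:

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of result URLs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def stability(daily_results):
    """Mean day-to-day Jaccard overlap of a query's result lists,
    one list per day; 1.0 means perfectly stable results."""
    pairs = zip(daily_results, daily_results[1:])
    overlaps = [jaccard(x, y) for x, y in pairs]
    return sum(overlaps) / len(overlaps)
```

Under a measure of this kind, the reported finding would show informational queries scoring closer to 1.0 than transactional, navigational, or commercial ones.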
Applications that use games to harness human intelligence to perform various computational tasks are increasing in popularity and may be termed human computation games (HCGs). Most HCGs are collaborative in nature, requiring players to cooperate within a game to score points. Competitive versions, where players work against each other, are a more recent entrant, and have been claimed to address shortcomings of collaborative HCGs such as quality of computation. To date, however, little work has been conducted on understanding how different HCG genres influence computational performance and players' perceptions of it. In this paper we study these issues using image tagging HCGs in which users play games to generate keywords for images. Three versions were created: a collaborative HCG, a competitive HCG, and a control application for manual tagging. The applications were evaluated to uncover the quality of the image tags generated as well as users' perceptions. Results suggest that there is a tension between entertainment and tag quality. While participants reported liking the collaborative and competitive image tagging HCGs over the control application, those using the latter seemed to generate better quality tags. Implications of the work are discussed. In a previous question answering study, we identified nine semantic-relationship types, including synonyms, hypernyms, word chains, and holonyms, that exist between terms in Text Retrieval Conference queries and those in their supporting sentences in the Advanced Question Answering for Intelligence (Graff, 2002) corpus. The most frequently occurring relationship type was the hypernym (e.g., Katherine Hepburn is an actress). The aim of the present work, therefore, was to develop a method for determining a person's occupation from syntactic data in a text corpus. First, in the P-System, we compared predicate-argument data involving a proper name for different occupations using Okapi's BM25 weighting algorithm. 
When classifying actors and using sufficiently frequent names, an accuracy of 0.955 was attained. For evaluation purposes, we also implemented a standard apposition-based classifier (A-System). This performs well, but only if a particular name happens to appear in apposition with the corresponding occupation. Finally, we created a hybrid (H-System) which combines the strengths of P with those of A. Using data with a minimum of 100 predicate-argument pairs, H performed best with an overall lenient accuracy of 0.750, while A and P scored 0.615 and 0.656, respectively. We therefore conclude that a hybrid approach combining information from different sources is the best way to predict occupations. Large online social rating networks (e.g., Epinions, Blippr) containing information related to various types of products have recently come into being. Typically, each product in these networks is associated with a group of members who have provided ratings and comments on it. These people form a product community. A potential member can join a product community by giving a new rating to the product. We refer to this phenomenon of a product community's ability to "attract" new members as product affinity. Knowledge of a ranked list of products based on product affinity is of much importance for implementing policies, marketing research, online advertisement, and other applications. In this article, we identify and analyze an array of features that affect product affinity and propose a novel model, called AffRank, that utilizes these features to predict the future rank of products according to their affinities. Evaluated on two real-world datasets, we demonstrate the effectiveness and superior prediction quality of AffRank compared with baseline methods. Our experiments show that features such as affinity rank history, affinity evolution distance, and average rating are the most important factors affecting the future rank of products. 
At the same time, interestingly, traditional community features (e.g., community size, member connectivity, and social context) have negligible influence on product affinities. As academic disciplines are segmented and specialized, it becomes more difficult to capture relevant research areas precisely by common retrieval strategies using either keywords or journal categories. This paper proposes a method of measuring the relatedness among sets of academic papers in order to detect communities that are not related to the target topic. A citation network, extracted by given keywords, is divided into communities based on the density of links. We measured and compared four measures of relatedness between two communities in a citation network for three large-scale citation datasets, using both link and semantic similarities. The topological distance from the center of a citation network is a more efficient measure for removing the unrelated communities than the other three measures: the ratio of intercluster links to all links, the ratio of common terms to all terms, and the cosine similarity of tf-idf vectors. We submit newly developed citation impact indicators based not on arithmetic averages of citations but on percentile ranks. Citation distributions are, as a rule, highly skewed and should not be arithmetically averaged. With percentile ranks, the citation score of each paper is rated in terms of its percentile in the citation distribution. The percentile ranks approach allows for the formulation of a more abstract indicator scheme that can be used to organize and/or schematize different impact indicators according to three degrees of freedom: the selection of the reference sets, the evaluation criteria, and the choice of whether or not to define the publication sets as independent. 
Bibliometric data of seven principal investigators (PIs) of the Academic Medical Center of the University of Amsterdam are used as an exemplary dataset. We demonstrate that the proposed family of indicators [R(6), R(100), R(6, k), R(100, k)] is an improvement on averages-based indicators because one can account for the shape of the distributions of citations over papers. A bibliometric method for analyzing and visualizing national research profiles is adapted to describe national preferences for publishing particular document types. Similarities in national profiles and national peculiarities are discussed based on the publication output of the 26 most active countries indexed in the Web of Science annual volume 2007. In order to examine the phenomena of eponymy and Obliteration by Incorporation at both the aggregate and individual subject level, the literature relating to the game-theoretic concept of the Nash Equilibrium was studied over the period 1950-2008. Almost 5,300 bibliographic database records for publications explicitly citing at least one of two papers by John Nash and/or using the phrase "Nash Equilibrium/Nash Equilibria" were retrieved from the Web of Science and various subject-related databases. Breadth of influence is demonstrated by the wide variety of subject areas in which Nash Equilibrium-related publications occur, including the natural and social sciences, humanities, law, and medicine. Fifty percent of all items have been published since 2002, suggesting that Nash's papers have experienced "delayed recognition." A degree of Obliteration by Incorporation is observed in that implicit citations (use of the phrase without citation) increased over the time period studied, although the proportion of all citations that are implicit has remained relatively stable during the most recent decade, at an annual rate of between 60% and 70%; subject areas vary in their level of obliteration. 
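The percentile-ranks approach described above rates each paper by its percentile in the citation distribution of its reference set rather than by an arithmetic average. A minimal sketch; the "fraction at or below" tie-handling convention is our assumption, as several conventions exist:

```python
def percentile_ranks(citations):
    """Percentile rank (0-100) of each paper's citation count within
    its reference set, computed as the share of papers cited at most
    as often ('fraction at or below' convention)."""
    n = len(citations)
    return [100.0 * sum(1 for x in citations if x <= c) / n
            for c in citations]
```

Because ranks depend only on ordering, a single highly cited outlier no longer dominates the score the way it dominates the arithmetic mean of a skewed citation distribution.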
This paper examines the rapid growth of China in the field of nanotechnology and the rise of collaboration between China and the US in this emerging domain. Chinese scientific papers in nanotechnology are analyzed to indicate overall trends, leading fields, and the most prolific institutions. Patterns of China-US nanotechnology paper co-authorship are examined over the period 1990-2009, with an analysis of how these patterns have changed over time. The paper combines bibliometric analysis and science mapping. We find rapid development in the number of China-US co-authored nanotechnology papers as well as structural changes in the array of collaborative nanotechnology sub-fields. Implications for both China and the US of this evolving relationship are discussed. Macro-environmental trends such as technological changes, declining trade and investment barriers, and globalizing forces impacting both markets and production worldwide point to the heightened importance of international business (IB) and the relevance of IB research today. Despite this, a leading scholar has expressed concerns that the IB research agenda could be 'running out of steam' (Buckley, Journal of International Business Studies 33(2):365-373, 2002), prompting ongoing introspection within the IB field. We contribute to this debate by investigating the evolution of the IB field through a scientometric examination of articles published in its premier journal, the Journal of International Business Studies (JIBS), from 1970 until 2008. We introduce a new analytical tool, Leximancer, to the fields of international business and scientometrics. We show an evolution from an initial and extended emphasis on macro-environmental issues to a more recent focus on micro-economic, firm-level ones, with the multinational enterprise (MNE) as an organizational form enduring throughout the entire period. 
We observe a field that has established a justifiable claim for relevance, participating actively in the interdisciplinary exchange of ideas. Drawing on a database of competitive research funds in Japanese academia, this study examines the distribution of research grants at the university and individual levels. The data indicate high inequality at the university level and slightly lower inequality at the individual level. Over the last three decades, the total grant budget has greatly increased and an increasing number of researchers have received the funds. Simultaneously, large-size grants have become more common and multiple awarding (i.e., one researcher receiving more than one grant simultaneously) has become more frequent. Taken together, these changes mean that the level of inequality has not changed substantially. The extent of inequality differs greatly between scientific fields; it is especially high in the basic natural sciences and relatively low in the social sciences. A close examination of inequality over researchers' careers indicates different patterns of transition between fields and cohorts. Finally, at both the university and individual levels, the funding distribution is found to be more unequal than the distribution of publications as an output indicator. This study describes the development process of Kor-Factor, a novel composite evaluation index that was developed to promote Korean domestic academic journals. As more data accumulated, the Kor-Factor's optimization process was modified in an attempt to address possible drawbacks of the original form; the result is presented in this study. This study compares Kor-Factor with the Impact Factor, the most well-known single-element evaluation index. We found that Kor-Factor demonstrates a better power of differentiation and a greater capacity to reflect the reputability of key journals. 
The modified Kor-Factor, which has been developed through an optimization process, reveals a greater power of differentiation than the original Kor-Factor; however, the modified version has less capacity to reflect reputability. The evaluation elements of the modified Kor-Factor are better and more evenly reflected in the index value than those of the original version. Finally, we propose the establishment of an appropriate data measurement period for the actual application of the index. China's biotechnology patents in the United States Patent and Trademark Office during 1995-2008 are analyzed in this paper with the help of bibliometrics and social network analysis techniques. The analysis has been carried out from several perspectives, including the total patent output of industries, universities and public research institutes (PRIs) and their positions in the knowledge network, the main innovators and their interactions, the collaboration among Chinese regions, and the collaborations from abroad. The results show that, though with some improvements, the patent performance of Chinese organizations and regions in biotechnology still needs to be improved. The connections between Chinese innovators are not very cohesive and they depend heavily on foreign knowledge, especially knowledge from U.S. multinational firms and universities. The important innovators of China in this field are mainly PRIs and universities. More and stronger firm innovators, especially large and powerful multinational companies, are strongly needed for the nation's biotechnology industry. This exploratory study aims at answering the following research question: Are the h-index and some of its derivatives discriminatory when applied to rank social scientists with different epistemological beliefs and methodological preferences? 
This study reports the results of five Tobit and two negative binomial regression models taking as dependent variable the h-index and six of its derivatives, using a dataset combining bibliometric data collected with the PoP software with cross-sectional data on 321 Quebec social scientists in Anthropology, Sociology, Social Work, Political Science, Economics and Psychology. The results reveal an epistemological/methodological effect making positivists and quantitativists globally more productive than constructivists and qualitativists. Devising an index to measure the quality of research is a challenging task. In this paper, we propose a set of indices to evaluate the quality of research produced by an author. Our indices utilize a policy that assigns weights to the multiple authors of a paper. We have considered two weight-assignment policies: positionally weighted and equally weighted. We propose two classes of weighted indices: weighted h-indices and weighted citation h-cuts. Further, we compare our weighted h-indices with the original h-index for a selected set of authors. As opposed to the h-index, our weighted h-indices take into account the weighted contributions of individual authors in multi-authored papers, and may serve as an improvement over the h-index. The other class of weighted indices, which we call weighted citation h-cuts, takes into account the number of citations that are in excess of those required to compute the index, and may serve as a supplement to the h-index or its variants. In this paper, co-word analysis is used to analyze the evolution of the stem cell field. Articles in stem cell journals are downloaded from PubMed for analysis. Term selection is one of the most important steps in co-word analysis, so useless and overly general subject headings are removed first, and then major and minor subject headings are weighted respectively. 
Then an improved information-entropy measure, combined with expert consultation, is used to select the subject headings. Hierarchical cluster analysis is used to cluster the subject headings, and a strategic diagram is formed to analyze evolutionary trends in the stem cell field. Scientific authorship has important implications in science, since it reflects the contribution to research of different individual scientists and is considered by evaluation committees in research assessment processes. This study analyses the order of authorship in the scientific output of 1,064 permanent scientists at the Spanish CSIC (WoS, 1994-2004). The influence of the age, professional rank and bibliometric profile of scientists on the position of their names in the byline of publications is explored in three different research areas: Biology and Biomedicine, Materials Science and Natural Resources. There is a strong trend for signatures of younger researchers and those in the lower professional ranks to appear in the first position (junior signing pattern), while more veteran or highly-ranked ones, who tend to play supervisory functions in research, are proportionally more likely to sign in the last position (senior signing pattern). Professional rank and age have an effect on authorship order in the three fields analysed, but there are inter-field differences. Authorship patterns are especially marked in the most collaboration-intensive field (i.e. Biology and Biomedicine), where professional rank seems to be more significant than age in determining the role of scientists in research as seen through their authorship patterns, while age has a more significant effect in the least collaboration-intensive field (Natural Resources). The paper introduces a concept for measuring the interpretive fragmentation of scientific fields through the analysis of their citation networks. Transitive closure in two-mode networks is the basis of the proposed measurement. 
To test the validity of the concept, two analyses are presented. One compares the integrity of two social sciences, sociology and economics, and a natural science, biophysics. The results are in line with the widely held opinion that, because of the lack of cumulative and consensual knowledge-production mechanisms, the social sciences are more disintegrated. Sociology is considerably more fragmented than economics, as the different paradigm structures of these disciplines would predict. As a second test, the fragmentation of scholarly communication inside and between the sub-fields of sociology is measured. The results correctly indicate that meaning-making processes are taking place inside invisible colleges. The article focuses on the evolution of scientific publications released in the Baltic States (Lithuania, Latvia and Estonia) and refers to international databases that contain scientific papers produced over the last 20 years of independence. The countries share the same history of restoration of independence after 40 years of occupation. The article specifically focuses on the period after EU accession in 2004. It discusses the contribution of Kaunas University of Technology, Vilnius Gediminas Technical University, Riga Technical University and Tallinn University of Technology to the total number of publications in these countries. The investigation was based on the Thomson Reuters Web of Science, Essential Science Indicators and Journal Citation Reports databases. Additionally, it employed the Scimago ranking system based on the Scopus database. Data analysis also involved similar indices that provide the number of papers and their citation results as well as the average number of citations per paper. The power-law distribution and Garfield's Law of Concentration of journal citations have long been verified by empirical data. 
As a relatively new type of reference, URL references are cited more and more frequently in scientific papers, and their distribution has been shown to fit Garfield's Law of Concentration too. In this article, we collect three URL reference datasets extracted from papers written by researchers belonging to three big research groups: the Chinese Academy of Sciences, the Max Planck Institute, and Chinese scientific researchers as a whole. Through curve-fitting with SPSS and comparing the results against the judgment standard for a power-law distribution, we verify that the citation frequency of hostnames in these three URL reference datasets also follows a power-law distribution. Our experimental results further show that the ranges of the power exponents for journal references and URL references differ. Starting from the concrete empirical procedures and the final experimental results, we analyze four factors that may lead to this difference between journal references and URL references: the sample size, the sampling method, the concentration of citation, and the type of citation. The study of citation distribution provides a retrospective and prospective picture of the evolving impact of a corpus of publications on the knowledge community. All distribution models agree that the number of citations rises in the first years following publication, reaches a peak, and then declines as time passes. However, questions such as how long a corpus will continue being cited and what, objectively, the rate of the decline is remain unanswered. Built from a simple polynomial function, the proposed model is shown to be suitable for representing the observed citation distribution over time and for identifying with accuracy when the major loss of citations happens. I calculate from the model the 'residual citations', representing the citations retained long after the publication year. 
I demonstrate that the residual citations may be greater than or equal to zero, meaning that the 'life-cycle' of the corpus is infinite, contrary to what some researchers have estimated to be around 21 years. The model fits the observed data from the SCI with an R-squared greater than 98.9%. Moreover, it is very simple and easy to implement, and can be used by scientometric users who are not highly skilled. Finally, the model serves as a citation prediction tool for a corpus by estimating the citations it would obtain at any point in its life-cycle. Patents constitute an up-to-date source of competitive intelligence in technological development; thus, patent analysis has been a vital tool for identifying technological trends. Patent citation analysis is easy to use, but fundamentally has two main limitations: (1) new patents tend to be less cited than old ones and may miss citations to contemporary patents; (2) citation-based analysis cannot be used for patents in databases which do not require citations. Naturally, citation-based analysis tends to underestimate the importance of new patents and may not work in rapidly-evolving industries in which technology life-cycles are shortening and new inventions are increasingly patented worldwide. As a remedy, this paper proposes a patent network based on semantic patent analysis using subject-action-object (SAO) structures. SAO structures represent the explicit relationships among components used in a patent, and are considered to represent key concepts of the patent or the expertise of the inventor. Based on the internal similarities between patents, the patent network provides the up-to-date status of a given technology. Furthermore, this paper suggests new indices to identify the technological importance of patents, the characteristics of patent clusters, and the technological capabilities of competitors. The proposed method is illustrated using patents related to the synthesis of carbon nanotubes. 
We expect that the proposed procedure and analysis will be incorporated into technology planning processes to assist experts such as researchers and R&D policy makers in rapidly-evolving industries. National research evaluation exercises provide a comparative measure of the research performance of a nation's institutions, and as such represent a tool for stimulating research productivity, particularly if the results are used to inform selective funding by government. While one school of thought welcomes frequent changes in evaluation criteria in order to prevent the subjects evaluated from adopting opportunistic behaviors, it is evident that the "rules of the game" should above all be functional towards policy objectives, and therefore be known with adequate forewarning prior to the evaluation period. Otherwise, the risk is that policy-makers will find themselves faced with a dilemma: should they reward universities that responded best to the criteria in effect at the outset of the observation period, or those that come out best according to rules that emerged during or after the observation period? This study verifies whether and to what extent some universities are penalized instead of rewarded for good behavior, in pursuit of the objectives of the "known" rules of the game, by comparing the research performances of Italian universities for the period of the nation's next evaluation exercise (2004-2008): first as measured according to criteria available at the outset of the period, and next according to those announced at the end of the period. This paper provides a quantitative assessment of the scientific and technological productivity of FP6 projects by exploiting a new database on articles and patents resulting from EU-funded projects. Starting from FP6, the design of European technology policy has undergone significant changes with the introduction of new funding instruments aimed at achieving a "critical mass" of resources. 
Our empirical results lend support to the concerns, expressed by several observers, that the new funding instruments may have resulted in artificially "too large'' research consortia. The available empirical evidence shows that scientific productivity increases with the number of participants following an inverted-U shape, thereby indicating the existence of decreasing marginal returns to an increase in the size of research consortia. A second key result of the paper relates to the existence of significant differences in performance among funding instruments. In particular, after accounting for the larger amount of resources allocated to them, Integrated Projects perform less well in terms of scientific output than both STRePs and Networks of Excellence, and they do not exhibit superior performance to STRePs in terms of patent applications. This paper addresses the profiling of research papers on 'standardization and innovation', exploring major topics and arguments in this field. Drawing on 528 papers retrieved from the Web of Science database, we employed trend, factor, and clustering analyses to demonstrate that research on standardization and innovation has grown continuously, from 13 papers published in 1995 to 68 papers in 2008; the majority of these papers have been published in the six subject group domains of management, economics, environment, chemistry, computer science, and telecommunications. Technology innovation management specialty journals are the most central sources for these themes. We also present an exploratory taxonomy of nine topical clusters to demonstrate the contextual structures of standardization and innovation. The implications of our results for ongoing consistent policy and future research into standardization and innovation are discussed. Evaluating the career of individual scientists according to their scientific output is a common bibliometric problem. 
Two aspects are classically taken into account: overall productivity and overall diffusion/impact, which can be measured by a plethora of indicators that consider publications and/or citations separately or synthesise these two quantities into a single number (e.g. the h-index). A secondary aspect, which is sometimes mentioned in the rules of competitive examinations for research positions/promotions, is the time regularity of a researcher's scientific output. Although it is sometimes invoked, a clear definition of regularity is still lacking. We define it as the ability to generate an active and stable research output over time, in terms of both publications/quantity and citations/diffusion. The goal of this paper is to introduce three analysis tools for performing qualitative/quantitative evaluations of the regularity of a scientist's output in a simple and organic way. These tools are respectively (1) the PY/CY diagram, (2) the publication/citation Ferrers diagram and (3) a simplified procedure for comparing the research output of several scientists according to their publication and citation temporal distributions (Borda's ranking). Description of these tools is supported by several examples. The notion of 'core documents', first introduced in the context of co-citation analysis and later re-introduced for bibliographic coupling, refers to the representation of the core of a publication set according to given criteria. In the present study, the notion of core documents is extended to the combination of citation-based and textual links. It is shown that core documents defined this way can be used to represent and describe document clusters and topics at different levels of aggregation. The methodology is illustrated using the example of two ISI Subject Categories selected from the applied and social sciences. Brazilian science has grown rapidly over the last decades. 
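Borda's ranking, mentioned above as a way to compare the research output of several scientists across publication and citation criteria, can be sketched as a simple rank-aggregation routine. The input format and point scheme below are illustrative assumptions, not taken from the paper:

```python
def borda_ranking(scores_by_scientist):
    """Borda rank aggregation sketch.

    scores_by_scientist maps name -> {criterion: value}, higher is better.
    Under each criterion, scientists receive Borda points (worst = 0,
    best = n-1); points are summed across criteria and the total
    determines the aggregate ranking.
    """
    names = list(scores_by_scientist)
    criteria = set().union(*(s.keys() for s in scores_by_scientist.values()))
    points = {name: 0 for name in names}
    for c in criteria:
        # sort ascending so position in the list equals Borda points
        ranked = sorted(names, key=lambda n: scores_by_scientist[n][c])
        for pos, name in enumerate(ranked):
            points[name] += pos
    return sorted(names, key=points.get, reverse=True)
```

A scientist who dominates on both publications and citations ends up first regardless of how the two criteria are weighted, which is the appeal of rank-based aggregation.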
An example is the increase in the country's share of the world's scientific publications within the main international databases. But what is the actual weight of international publications in overall Brazilian productivity? To answer this question, we have elaborated a new indicator, the International Publication Ratio (IPR). The data source was the Lattes Database, organized by one of the main Brazilian S&T funding agencies, which encompasses publication data from 1997 to 2004 for about 51,000 Brazilian researchers. The influence of distinct parameters, such as sector, field, career age and gender, is analyzed. We hope the data presented may help S&T managers and other S&T stakeholders to better understand the complexity underlying the concept of scientific productivity, especially in countries peripheral to science, such as Brazil. Bibliometric measures based on citations are widely used in assessing the scientific publication records of authors, institutions and journals. Yet currently favored measures lack a clear theoretical foundation and are known to have counter-intuitive properties. The paper proposes a new approach that is grounded on a theoretical "influence function,'' representing explicit prior beliefs about how citations reflect influence. Conditions are derived for robust qualitative comparisons of influence, conditions that can be implemented using readily-available data. Two examples are provided, one using the world's top-10 economics departments, the other using the top-10 economics journals. Only a few cases of systematic empirical research have been reported investigating collaborative knowledge production in China and its implications for China's national and regional innovation systems. Using Chinese patent data in the US Patent and Trademark Office (USPTO), this paper examines the geographic variations in intraregional, inter-regional and international knowledge exchanges of China from 1985 to 2007. 
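The International Publication Ratio described above is, on a plausible reading of the abstract, the percentage of a researcher's total output that appears in international publications. The exact formula is not given in the abstract, so the following is a hedged sketch of that reading:

```python
def international_publication_ratio(intl_pubs, total_pubs):
    """International Publication Ratio (IPR) sketch: the share of a
    researcher's (or group's) output published in international venues,
    expressed as a percentage. Assumes IPR = 100 * international / total,
    which is an interpretation of the abstract, not a quoted definition."""
    if total_pubs == 0:
        return 0.0
    return 100.0 * intl_pubs / total_pubs
```

Aggregating such ratios by sector, field, career age or gender would reproduce the kind of comparison the study reports.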
Degree centrality reveals that intraregional and international collaborations are the main channels of knowledge exchange for the provinces and municipalities of China, while inter-regional knowledge exchange is relatively weak. Moreover, over the two decades, the knowledge exchange network has been expanding (connecting an increasing number of provinces and countries), becoming more decentralized (an increasing number of hubs) and more cohesive (more linkages). A blockmodel analysis further reveals that the inter-regional network of China begins to show the characteristics of a core-periphery structure. The most active knowledge exchange occurs between members of the core block, composed of the most advanced provinces, while members of the peripheral block, from less favored regions, have little or no local and extra-local knowledge exchange. Building a strong knowledge transfer network would much improve the innovation capacities of less favored regions and help them break out from their "locked-in" development trajectories. If we have two information production processes with the same h-index, random removal of items causes one system to have a higher h-index than the other, while random removal of sources causes the opposite effect. In a Lotkaian framework we prove formulae for the h-index in the case of random removal of items and in the case of random removal of sources. In conclusion, we warn against using the h-index on incomplete data sets. This paper studies the production of dissertations in eight research fields in the natural sciences, the social sciences and the humanities. In using doctoral dissertations it builds on de Solla Price's seminal study, which used PhD dissertations as one of several indicators of scientific growth (Price, Little science, big science, 1963). Data from the ProQuest Dissertations and Theses database covering the years 1950-2007 are used to depict historical trends, and the Gompertz function was used for analysing the data. 
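The contrast drawn above between randomly removing items (citations) and sources (papers) can be illustrated with a small simulation. This is a numerical sketch, not the paper's Lotkaian derivation:

```python
import random

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
    return h

def h_after_item_removal(citations, p, rng):
    """Each citation (item) is removed independently with probability p;
    papers survive but with thinned citation counts."""
    thinned = [sum(1 for _ in range(c) if rng.random() >= p) for c in citations]
    return h_index(thinned)

def h_after_source_removal(citations, p, rng):
    """Each paper (source) is removed independently with probability p,
    taking all of its citations with it."""
    kept = [c for c in citations if rng.random() >= p]
    return h_index(kept)
```

Running both removals at the same rate on two publication lists with equal h-indices makes the paper's asymmetry observable empirically.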
A decline in the growth of dissertations can be seen in all fields in the mid-eighties, and several fields show only modest growth during the entire period. The growth profiles of specific disciplines could not be explained by traditional dichotomies such as pure/applied or soft/hard; rather, the age of the discipline appears to be an important factor. Thus, the growth of dissertations must be explained using several factors emerging both inside and outside academia. Consequently, we propose that the output of dissertations can be used as an indicator of growth, especially in fields like the humanities, where journal or article counts are less applicable. This paper studies evidence from Thomson Scientific (TS) about the citation process of 3.7 million articles published in the period 1998-2002 in 219 Web of Science (WoS) categories, or sub-fields. Reference and citation distributions have very different characteristics across sub-fields. However, when analyzed with the Characteristic Scores and Scales (CSS) technique, which is replication and scale invariant, the shape of these distributions over three broad categories of articles appears strikingly similar. Reference distributions are mildly skewed, but citation distributions with a 5-year citation window are highly skewed: the mean is 20 points above the median, while the 9-10% of all articles in the upper tail account for about 44% of all citations. The aggregation of sub-fields into disciplines and fields according to several aggregation schemes preserves this feature of citation distributions. It should be noted that when we look into subsets of articles within the lower and upper tails of citation distributions, the universality partially breaks down. On the other hand, for 140 of the 219 sub-fields the existence of a power law cannot be rejected. 
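The Characteristic Scores and Scales technique mentioned above partitions a citation distribution by iterated conditional means: the first threshold is the mean of all scores, the next is the mean of the scores at or above that threshold, and so on. A minimal sketch of the threshold computation (the stopping rule and class labels are simplified here):

```python
def css_thresholds(citations, k=3):
    """Characteristic Scores and Scales thresholds: s1 is the mean of
    all citation counts, and s_{j+1} is the mean of the counts that are
    at least s_j, iterated k times. Articles below s1 are 'poorly cited',
    those above the last threshold are the most highly cited class."""
    thresholds = []
    pool = list(citations)
    for _ in range(k):
        if not pool:
            break
        s = sum(pool) / len(pool)
        thresholds.append(s)
        pool = [c for c in pool if c >= s]
    return thresholds
```

Because each threshold is a conditional mean of the same distribution, the resulting class boundaries rescale with the data, which is the replication- and scale-invariance the abstract refers to.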
However, contrary to what is generally believed, at the sub-field level the scaling parameter is above 3.5 most of the time, and power laws are relatively small: on average, they represent 2% of all articles and account for 13.5% of all citations. The results of the aggregation into disciplines and fields reveal that power law algebra is a subtle phenomenon. Graphene is a rising star among promising materials with many applications, and its global literature has grown rapidly in recent years. In this work, bibliometric analysis and knowledge visualization technology were applied to evaluate the global scientific production and development trends of graphene research. The data were collected for 1991 to 2010 from the Science Citation Index database, the Conference Proceedings Citation Index database and the Derwent Innovation Index database, integrated by Thomson Reuters. The distribution of published papers across subjects, journals, authors, countries and keywords in several research topics shows that graphene research grew rapidly over the past 20 years and accelerated in the most recent 5 years. The knowledge maps show that keyword clusters in patents filed in the most recent 5 years have become more regular, as the potential applications of graphene research were gradually identified. The analytical results provide several key findings on bibliometric trends. First digits in natural and social data sets often follow a distribution called Benford's law. We studied the numbers of articles published, citations received and impact factors of all journals indexed in the Science Citation Index from 1998 to 2007, and tested their compliance with Benford's law. Citation data followed Benford's law remarkably well in all years studied. However, for the data on the numbers of articles, the differences between the values predicted by Benford's law and the observed values were always statistically significant. 
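Testing compliance with Benford's law, as done above for citation counts, means comparing observed first-digit frequencies with the predicted probabilities P(d) = log10(1 + 1/d). A sketch of the chi-squared goodness-of-fit statistic (the abstract does not specify which test the authors used, so this is one standard choice):

```python
import math
from collections import Counter

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

def benford_chi2(values):
    """Chi-squared statistic of observed first digits against Benford's
    law, P(d) = log10(1 + 1/d) for d = 1..9. Larger values indicate
    worse agreement; significance requires comparing against the
    chi-squared distribution with 8 degrees of freedom."""
    counts = Counter(first_digit(v) for v in values if v > 0)
    n = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat
```

Applied year by year to citation counts versus article counts, a statistic like this would reproduce the contrast the study reports.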
This was also the case for most of the impact factor data. This study applies patent analysis to discuss the influence of three aspects of patent traits: a firm's revealed technology advantage in its most important technological field (RTA(MIT)), relative patent position in its most important technological field (RPP(MIT)), and patent share in its most important technological field (PS(MIT)), upon corporate growth, and discusses the moderating effect of the relative growth rate of its most important technological field (RGR(MIT)) in the American pharmaceutical industry. The results demonstrate that the three relationships between corporate growth and the three aspects of patent traits are positive, and verify that RGR(MIT) moderates the three relationships. This study suggests that pharmaceutical companies should enhance their R&D capabilities, the degree of leading position, and the concentration of R&D investment in their most important technological fields to increase their growth. Finally, this study classifies the pharmaceutical companies into four types, and provides some suggestions for them. This article describes the results of a network analysis based on citations among Communication journals and those academic disciplines that are cited by the journals labeled as "Communication" by the Web of Science. The results indicate that the journals indexed solely as Communication, rather than those also tagged as another social science, are more central in the citation network. Further, a cluster analysis of the cited disciplines revealed three groupings: a micro psychological cluster, a macro socio-political group and a women's studies clique. A two-mode network analysis found that the most central Communication journals cited multiple clusters, while the peripheral journals cited only one, suggesting that the structure of influence on the field of Communication is more complex than suggested by Park and Leydesdorff (Scientometrics 81(1):157-175, 2009). 
Also, the results indicate that the macro cluster is about twice as influential as the micro cluster, rather than Psychology being the discipline's primary influence, as Park and Leydesdorff suggest. Understanding the nature and dynamics of Africa's collaborative research networks is critical for building and integrating the African innovation system. This paper investigates the collaborative structure of African research systems, with a focus on regions and integration. Drawing on a bibliometric analysis of co-authorship of African research publications in 2005-2009, we propose an empirically derived grouping of the African research community into three distinct research regions: Southern-Eastern, Western, and Northern. The three regions are established and defined in terms of active co-authorship clusters within Africa, as well as through co-authorship links with non-African countries and regions. We examine co-authorship links at both the national and city levels in order to provide a robust and nuanced empirical basis for the three African research regions. The collaboration patterns uncovered cast light on the emerging innovation systems in Africa by pointing out the differing national, regional, and global roles of countries and cities within collaborative research networks. Lack of research capabilities is the primary factor arresting the development of African innovation systems, but our analysis also suggests that Africa's internal research collaboration suffers from structural weaknesses and uneven integration. We also find that South Africa, and some emerging new research hubs, hold a critical networking function for linking African researchers. We applied a set of standard bibliometric indicators to monitor the scientific state of the art of 500 universities worldwide and constructed a ranking on the basis of these indicators (Leiden Ranking 2010). 
We find a dramatic and hitherto largely underestimated language effect in bibliometric, citation-based measurements of research performance when comparing the ranking based on all Web of Science (WoS) covered publications with one based only on English-language WoS covered publications, particularly for Germany and France. Using aggregated journal-journal citation networks, the measurement of the knowledge base in empirical systems is factor-analyzed in two cases of interdisciplinary developments during the period 1995-2005: (i) the development of nanotechnology in the natural sciences and (ii) the development of communication studies as an interdiscipline between social psychology and political science. The results are compared with a case of stable development: the citation networks of core journals in chemistry. These citation networks are intellectually organized by networks of expectations in the knowledge base at the specialty (that is, above-journal) level. The "structuration" of structural components (over time) can be measured as configurational information. The latter is compared with the Shannon-type information generated in the interactions among structural components: the difference between these two measures provides us with a measure of the redundancy generated by the specification of a model in the knowledge base of the system. This knowledge base incurs (against the entropy law) to variable extents on the knowledge infrastructures provided by the observable networks of relations. Database management technology has played a vital role in the advancement of the information technology field. Database researchers are among the key players and main sources of the growth of database systems, playing a foundational role in creating the technological infrastructure from which database advancements evolve. We analyze the database research publications of nine top-tier and prestigious database research venues. 
In particular, we study the publications of four major core database technology conferences (SIGMOD, VLDB, ICDE, EDBT), two main theoretical database conferences (PODS, ICDT) and three database journals (TODS, VLDB Journal, TKDE) over a period of 10 years (2001-2010). Our analysis considers only regular papers; we do not include short papers, demo papers, posters, tutorials or panels in our statistics. In this study, we report the list of authors with the highest number of publications for each conference/journal separately and for all venues combined. We analyze the preference of the database research community for publishing their work in prestigious conferences or major database journals. We report on the most successful co-authorship relationships in the database research community in the last decade. Finally, we analyze the growth in the number of research publications and the size of the research community in the last decade. Bibliographic databases are frequently used and analysed for the purpose of assessing the capacity and performance of individual researchers or entire research systems. Many of their advantages and disadvantages are the subject of continued discussion in the relevant literature, although only rarely with respect to the regional dimension of scientific publication activity. The importance of the regional dimension of science is reflected in many theoretical concepts, ranging from innovation system theories to territorial cluster concepts and learning regions. This article makes use of the extensive information found in bibliographic data and assesses the reliability of this information as a proxy indicator for the spatial dimension of scientific collaboration in emerging economies. This is undertaken using the example of the emerging field of biotechnology in China from 2000 onwards. Two data sets have been prepared: (1) the frequently used ISI Web of Knowledge database (SCI-Expanded) and (2) the domestic Chinese Chongqing VIP database. 
Both data sources were analysed using a variety of bibliometric and network scientific methods. The structural and topological similarity of the networks built from co-authorship data is apparent between the two databases. At an abstract level, general network forces are present, resulting in similar network sizes, clustering, and assortativity. However, introducing additional complexity through regional subdivision reveals many differences between the two data sources that must be accounted for in the analytic design of future scientometric research in dynamic spaces. Quality, quantity, performance... An unresolved challenge in performance evaluation, in a very general context that goes beyond scientometrics, has been to determine a single indicator that can combine quality and quantity of output or outcome. Toward this end, we start from metaphysical considerations and propose introducing a new term, Quasity, to describe those quantity terms which incorporate a degree of quality and best measure the output. The product of quality and quasity then becomes an energy term which serves as a performance indicator. Lessons from kinetics, bibliometrics and sportometrics are used to build up this theme. A bibliometric analysis of the 50 most frequently publishing Spanish universities shows large differences in publication activity and citation impact among research disciplines within an institution. The Gini Index is a useful measure of an institution's disciplinary specialization and can roughly categorize universities as general versus specialized. A study of the Spanish academic system reveals that assessment of a university's research performance must take into account the disciplinary breadth of its publication activity and citation impact. It proposes the use of graphs showing not only a university's article production and citation impact, but also its disciplinary specialization. 
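The Gini Index used above as a measure of disciplinary specialization can be computed directly from a university's publication counts per discipline. A minimal sketch (values near 0 indicate output spread evenly across disciplines, values near 1 indicate concentration in a few):

```python
def gini(counts):
    """Gini index of a distribution, e.g. a university's publication
    counts across disciplines. Uses the standard formula
    G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n
    over the counts sorted in ascending order."""
    x = sorted(counts)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    cum = sum(i * xi for i, xi in enumerate(x, start=1))
    return (2 * cum) / (n * total) - (n + 1) / n
```

Feeding in per-discipline citation impact instead of publication counts gives the impact-side specialization measure the study also discusses.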
Such graphs constitute both a warning and a remedy against one-dimensional approaches to the assessment of institutional research performance. In the current scenario of the global economy and the race to become the next Asian superpower, the overall economic strength of India and China is a much-debated topic. The future role of intellectual property protection, especially in the form of the patent system, and the growth of industrialization may prove crucial for these two developing Asian economies beyond all other assets. In the changing global market, the intangible assets of inventions, protected mainly through patents, are emerging to play an important role in development. This paper elaborates a statistical study of the patents granted/filed by these two Asian countries over the last 35 years in the US Patent and Trademark Office (USPTO), the PCT of WIPO, and their home countries. It is found that differences in the economic and technological growth of the two countries may arise primarily from their levels of patenting activity. To predict the success of an analgesic drug we have suggested a bibliometric indicator, the Top Journals Selectivity Index (TJSI) (Kissin, Scientometrics, 86:785-795, 2011). It represents the ratio (as %) between the number of all types of articles on a particular drug in the top 20 biomedical journals and the number of articles on that drug in all (> 5,000) journals covered by Medline over the first 5 years after a drug's introduction. For example, the highest TJSI score among analgesics was that of sumatriptan, the most successful drug for the treatment of migraine. The aim of this study was to demonstrate that the TJSI may be used not only in the field of analgesics, but also for various other categories of drugs. The study tested two hypotheses. First, that the difference between the most successful and less successful drugs in any pharmacological class can be reliably detected by the TJSI. 
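The Top Journals Selectivity Index defined above is a simple percentage ratio, and can be sketched directly from the abstract's definition:

```python
def tjsi(top20_count, all_count):
    """Top Journals Selectivity Index (TJSI): the percentage of all
    Medline-covered articles on a drug, in its first 5 years after
    introduction, that appeared in the top 20 biomedical journals."""
    if all_count == 0:
        return 0.0
    return 100.0 * top20_count / all_count
```

The illustrative counts in the test below are hypothetical; the abstract reports only the resulting scores (e.g. 19.2 for sumatriptan), not the underlying article counts.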
Second, that drugs with TJSI indicators as high as that of sumatriptan can be found in other pharmacological classes as well. Drugs from various pharmacological classes approved by the Food and Drug Administration (FDA) during the 10-year period 1980-1989 were used in this study. Two groups of 10 drugs were selected to test the first hypothesis. One group included the most successful (breakthrough) drugs; the other included less successful drugs matched with the breakthrough drugs according to mechanism of action. The difference between the two groups was compared using three publication indices: the TJSI, the number of all types of articles on a drug in journals indexed by Medline (AJI), and the number of articles covering only randomized controlled trials (RCT). It was found that the TJSI can detect the difference between the two groups of drugs better than the two other indices. The TJSI detected the difference between a breakthrough drug and its less successful counterpart at least 69% of the time with 95% confidence. With the other two indices the difference was not distinguishable from random chance. Some of the breakthrough drugs (zidovudine, omeprazole, lovastatin) have TJSIs as high as or even higher than that of sumatriptan (19.2 vs. 23.0, 21.4, and 20.6, respectively). In conclusion, the TJSI can be useful not only in the field of analgesics, but also for drugs belonging to other pharmacological classes. This study presents a time-series analysis of the field-standardized average impact of Italian research compared to the world average. The approach is purely bibliometric, based on a census of the full scientific production of all Italian public research organizations active in 2001-2006 (hard sciences only). The analysis is conducted both at the sectorial level (aggregated, by scientific discipline and for single fields within disciplines) and at the organizational level (by type of organization and for single organizations). 
The essence of the methodology should be replicable in other national contexts. It offers support to policy-makers and administrators for strategic analysis aimed at identifying strengths and weaknesses of national research systems and institutions. This study applies bibliometric analysis to investigate the quantity and citation impact of scientific papers in the field of complementary and alternative medicine (CAM). The data are collected from 19 CAM journals in the Science Citation Index Expanded (SCI-E) database during 1980-2009, and 17,002 papers are identified for analysis. The study analyzes the document types and the geographical and institutional distribution of authorship, including international scientific collaboration. This study suggests that the major document type is the original article. CAM papers are mostly published in North America, East Asia, and European countries, of which publications authored in East Asia are cited most. Country-wise, the major contributors of CAM papers are the USA, the People's Republic of China, India, England and Germany. India has the highest CPP (citations per paper) value, attracting high attention in the CAM community. This article also finds that international co-authorship in the CAM field increased rapidly during this period. In addition, internationally collaborated publications generate higher citation impact than papers published by authors from a single country. Finally, the research identifies productive institutions in CAM; China Medical University, located in Taiwan, is the most productive organization. Commemorating the 100th anniversary of the death of Francis Galton, this paper is a bibliometric impact analysis of the works of this outstanding scientist and predecessor of scientometrics. Citation analysis was done in Web of Science, Scopus and Google Scholar (Publish or Perish) in order to retrieve the most cited books and journal articles. 
Additionally, references were identified in which Galton was mentioned rather than cited, in order to analyze the phenomenon of obliteration by incorporation. Finally, occurrence counts of Galton's works in obituaries, Festschriften, the website Galton.org, major encyclopaedias and biographical indexes were compared to citation counts. The results show that Galton's works are increasingly cited or mentioned. Obliteration (use of eponyms) applies to one-third of Galton's works and seems to be typical for fields like mathematics or statistics, whereas citations are more common in psychology. The most cited books and journal articles are also the most mentioned, with remarkable correlation. Overall, citation analysis and occurrence counting are complementary, useful methods for the impact analysis of the works of "giants". The title of an article can be descriptive, declarative or a question. It plays an important role in both the marketing and the findability of an article. We investigate the impact of the type of article title on the number of citations and downloads articles receive. The numbers of downloads and citations for all articles published in six PLoS (Public Library of Science) journals (2,172 articles) were obtained from PLoS, and the type of each article's title (descriptive, declarative or question) was determined, as well as the number of substantive words in the title (title length). Statistical difference and correlation tests were carried out. The findings showed that differences exist between articles with different types of titles in terms of downloads and citations; in particular, articles with question titles tended to be downloaded more but cited less than the others. Articles with longer titles were downloaded slightly less than articles with shorter titles. Titles with colons tended to be longer and to receive fewer downloads and citations. As expected, numbers of downloads and citations were positively correlated. 
In this short communication we give critical comments on the paper of Perakakis et al. (Scientometrics 85(2):553-559, 2010) on "Natural selection of academic papers". The criticism mainly focusses on their unbalanced criticism of peer review and their negative evaluation of the link of peer review with commercial publishing. Teenagers are among the most prolific users of social network sites (SNS). Emerging studies find that youth spend a considerable portion of their daily life interacting through social media. Subsequently, questions and controversies emerge about the effects SNS have on adolescent development. This review outlines the theoretical frameworks researchers have used to understand adolescents and SNS. It brings together work from disparate fields that examine the relationship between SNS and social capital, privacy, youth safety, psychological well-being, and educational achievement. These research strands speak to high-profile concerns and controversies that surround youth participation in these online communities, and offer ripe areas for future research. Limited research has investigated the role of multitasking, cognitive coordination, and cognitive shifts during web search. Understanding these three behaviors is crucial to web search model development. This study aims to explore characteristics of multitasking behavior, types of cognitive shifts, and levels of cognitive coordination as well as the relationship between them during web search. Data collection included pre- and postquestionnaires, think-aloud protocols, web search logs, observations, and interviews with 42 graduate students who conducted 315 web search sessions with 221 information problems. Results show that web search is a dynamic interaction including the ordering of multiple information problems and the generation of evolving information problems, including task switching, multitasking, explicit task and implicit mental coordination, and cognitive shifting. 
Findings show that explicit task-level coordination is closely linked to multitasking, and implicit cognitive-level coordination is related to the task-coordination process, including information problem development and task switching. Coordination mechanisms directly result in cognitive state shifts including strategy, evaluation, and view states that affect users' holistic shifts in information problem understanding and knowledge contribution. A web search model integrating multitasking, cognitive coordination, and cognitive shifts (MCC model) is presented. Implications and further research also are discussed. As the web has grown into an integral part of daily life, social annotation has become a popular way for web users to manage resources. This method of management has many potential applications, but its applicability is limited by the cold-start problem, especially for new resources on the web. In this article, we study automatic tag prediction for web pages comprehensively and utilize the predicted tags to improve search performance. First, we explore the stabilizing phenomenon of tag usage in a social bookmarking system. Then, we propose a two-stage tag prediction approach, which is efficient and is effective in making use of early annotations from users. In the first stage, content-based ranking, candidate tags are selected and ranked to generate an initial tag list. In the second stage, random-walk re-ranking, we adopt a random-walk model that utilizes tag co-occurrence information to re-rank the initial list. The experimental results show that our algorithm effectively proposes appropriate tags for target web pages. In addition, we present a framework to incorporate tag prediction in a general web search. The experimental results of the web search validate the hypothesis that the proposed framework significantly enhances the typical retrieval model. 
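The second-stage random-walk re-ranking described above can be sketched as score propagation over a tag co-occurrence graph. The restart weight `alpha`, the iteration count, and the row normalization below are illustrative assumptions, not details from the paper:

```python
def rerank_tags(initial_scores, cooccur, alpha=0.85, iters=50):
    """Random-walk re-ranking sketch: propagate initial content-based
    tag scores over tag co-occurrence weights (random walk with restart
    to the initial scores). cooccur maps (tag_u, tag_v) -> weight."""
    tags = list(initial_scores)
    scores = dict(initial_scores)
    for _ in range(iters):
        new = {}
        for t in tags:
            # mass walking into t from every tag u, proportional to u's
            # row-normalized co-occurrence weight with t
            walk = 0.0
            for u in tags:
                row_sum = sum(cooccur.get((u, v), 0.0) for v in tags)
                if row_sum > 0:
                    walk += scores[u] * cooccur.get((u, t), 0.0) / row_sum
            new[t] = (1 - alpha) * initial_scores[t] + alpha * walk
        scores = new
    return sorted(tags, key=scores.get, reverse=True)
```

With no co-occurrence evidence the initial content-based order is preserved; a strong co-occurrence edge can promote a weakly scored candidate, which is the point of the second stage.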
The primary webometric method for estimating the online impact of an organization is to count links to its website. Link counts have been available from commercial search engines for over a decade but this was set to end by early 2012 and so a replacement is needed. This article compares link counts to two alternative methods: URL citations and organization title mentions. New variations of these methods are also introduced. The three methods are compared against each other using Yahoo!. Two of the three methods (URL citations and organization title mentions) are also compared against each other using Bing. Evidence from a case study of 131 UK universities and 49 US Library and Information Science (LIS) departments suggests that Bing's Hit Count Estimates (HCEs) for popular title searches are not useful for webometric research but that Yahoo!'s HCEs for all three types of search and Bing's URL citation HCEs seem to be consistent. For exact URL counts the results of all three methods in Yahoo! and both methods in Bing are also consistent. Four types of accuracy factors are also introduced and defined: search engine coverage, search engine retrieval variation, search engine retrieval anomalies, and query polysemy. The objective of this research is to examine the interaction of institutions, based on their citation and collaboration networks. The domain of library and information science is examined, using data from 1965-2010. A linear model is formulated to explore the factors that are associated with institutional citation behaviors, using the number of citations as the dependent variable, and the number of collaborations, physical distance, and topical distance as independent variables. It is found that institutional citation behaviors are associated with social, topical, and geographical factors. Dynamically, the number of citations is becoming more associated with collaboration intensity and less dependent on the country boundary and/or physical distance. 
This research is informative for scientometricians and policy makers. Bias quantification of retrieval functions with the help of document retrievability scores has recently evolved as an important evaluation measure for recall-oriented retrieval applications. While numerous studies have evaluated retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well-accepted criteria for query generation for estimating retrievability. Commonly, random queries are used for approximating documents' retrievability due to the prohibitively large query space and time involved in processing all queries. Additionally, a cumulative retrievability score of documents over all queries is used for analyzing retrieval functions' (retrieval) bias. However, this approach does not consider the difference between different query characteristics (QCs) and their influence on retrieval functions' bias quantification. This article provides an in-depth study of retrievability over different QCs. It analyzes the correlation of lower/higher retrieval bias with different query characteristics. The presence of a strong correlation between retrieval bias and query characteristics in experiments indicates the possibility of determining retrieval bias of retrieval functions without processing an exhaustive query set. Experiments are validated on the TREC Chemical Retrieval Track, consisting of 1.2 million patent documents. The popularity of online and anonymous options to report crimes, such as tips websites and text messaging, has led to an increasing amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining information to solve crimes is a time-consuming task. In response, we are developing entity and document similarity algorithms to automatically identify overlapping and complementary information. 
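The cumulative retrievability score and its bias summary discussed in the retrievability study above can be sketched compactly. The top-c cutoff and the Gini coefficient as a bias measure are common choices in this literature; they are used here as illustrative assumptions rather than the article's exact formulation.

```python
def retrievability(doc_ids, ranked_results, c=10):
    """r(d): number of queries for which document d appears in the top-c results."""
    r = {d: 0 for d in doc_ids}
    for ranking in ranked_results:          # one ranked list per query
        for d in ranking[:c]:
            if d in r:
                r[d] += 1
    return r

def gini(values):
    """Gini coefficient: 0 = perfectly even retrievability, 1 = maximal bias."""
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * v for i, v in enumerate(vals))
    return (2 * cum) / (n * total) - (n + 1) / n
```

Comparing the Gini value of two retrieval functions over the same query set is the basic form of the bias comparison described above.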
These are essential components for systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared the new algorithm and four generic algorithms with a gold standard. The strongest correlation with the gold standard, r = 0.710, was found with our entity similarity algorithm. To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in how much importance is assigned to different entity types. Cosine similarity is then used as a baseline comparison to evaluate the performance of the document similarity algorithms for accuracy in recognizing reports describing the same crime and distinguishing them from reports on different crimes. The best version achieved 92% accuracy. This article studies one of the main bottlenecks in providing more effective information access: the poverty on the query end. We explore whether users can classify keyword queries into categories from the DMOZ directory on different levels and whether this topical context can help retrieval performance. We have conducted a user study to let participants classify queries into DMOZ categories, either by freely searching the directory or by selection from a list of suggestions. Results of the study show that DMOZ categories are suitable for topic categorization. Both free search and list selection can be used to elicit topical context. Free search leads to more specific categories than the list selections. Participants in our study show moderate agreement on the categories they select, but broad agreement on the higher levels of chosen categories. 
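The two similarity components named above, entity-level Jaccard and document-level Dice, can be illustrated with a generic sketch. The domain-specific hierarchical lexicon and the entity-type weighting from the study are omitted, and the match threshold is an assumption, so this reduces to plain set overlap.

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of entity terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice_document_similarity(doc_a, doc_b, threshold=0.5):
    """Dice coefficient over entities, counting an entity as matched when its
    entity-level Jaccard similarity with some entity in the other document
    reaches the (illustrative) threshold."""
    matches = sum(
        1 for ea in doc_a if any(jaccard(ea, eb) >= threshold for eb in doc_b)
    )
    denom = len(doc_a) + len(doc_b)
    return 2 * matches / denom if denom else 0.0
```

Two witness reports mentioning largely the same entities thus score near 1, while reports on unrelated crimes score near 0, which is the behavior the document similarity evaluation measures.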
The free search categories significantly improve retrieval effectiveness. The more general list selection categories and the top-level categories do not lead to significant improvements. Combining topical context with blind relevance feedback leads to better results than applying either of them separately. We conclude that DMOZ is a suitable resource for interacting with users on topical categories applicable to their query, and can lead to better search results. Owing to the inherent difficulty in obtaining experimental data from wikis, past quantitative wiki research has largely focused on Wikipedia, limiting the ability to generalize such research. To facilitate the analysis of wikis other than Wikipedia, we developed WikiCrawler, a tool that automatically gathers research data from public wikis without supervision. We then built a corpus of 151 wikis, which we have made publicly available. Our analysis indicated that these wikis display signs of collaborative authorship, validating them as objects of study. We then performed an initial analysis of the corpus and discovered some similarities with Wikipedia, such as users contributing at unequal rates. We also analyzed distributions of edits across pages and users, resulting in data which can motivate or verify mathematical models of behavior on wikis. By providing data collection tools and a corpus of already-collected data, we have completed an important first step for investigations that analyze user behavior, establish measurement baselines for wiki evaluation, and generalize Wikipedia research by testing hypotheses across many wikis. Digital library evaluation is a complex field, as complex as the phenomena it studies. The interest of the digital library community remains vibrant after all these years of solidification, as these systems have entered real-life applications. However, the community has yet to reach a consensus on what evaluation is and how it can effectively be planned. 
In the present article, an ontology of the digital library evaluation domain, named DiLEO, is proposed. It aims to make explicit the main concepts of this domain and their correlations, and tries to creatively combine and integrate several scientific paradigms, approaches, methods, techniques, and tools. This article demonstrates the added-value features of the ontology: support for comparative studies between different evaluation initiatives and assistance in effective digital library evaluation planning. This article explores location-based questions, local knowledge, and the implications stemming from these concepts for digital reference staff in consortial question-answering services. Location-based questions are inquiries that concern a georeferencable site. Digital reference personnel staffing the statewide chat reference consortium used in this study respond to location-based questions concerning over 100 participating information agencies. Some literature has suggested that nonlocal digital reference staff have difficulties providing accurate responses to location-based questions concerning locations other than their own. This study utilized content analysis to determine the quantity of location-based questions and the question-negotiation process in responding to location-based questions. Key findings indicate location-based questions comprised 50.2% of the total questions posed to the statewide service, 73.6% of location-based questions were responded to by nonlocal digital reference staff, and 37.5% of location-based questions ended in referral. This article's findings indicate that despite digital reference's capability to provide anyplace, anytime question-answering service, proximity to local knowledge remains relevant. Consumer categorizations based on innovativeness were originally proposed by E. M. Rogers (2003) and remain of relevance for predicting purchasing behavior in high-tech domains such as consumer electronics. 
We extend such innovativeness-based categorizations in two directions: We first take into account the existence of technology clusters within product domains and then enrich the definition of consumer innovativeness by considering not only past adoption behavior but also future purchase intentions. We derive a novel consumer categorization based on data from a sample of 2,094 Dutch consumers for the case of consumer electronics. In so doing, we apply endogenous categorization techniques that represent a methodological improvement with respect to previously applied techniques. We investigate the extent to which open-access (OA) journals and articles in biology, computer science, economics, history, medicine, and psychology are indexed in each of 11 bibliographic databases. We also look for variations in index coverage by journal subject, journal size, publisher type, publisher size, date of first OA issue, region of publication, language of publication, publication fee, and citation impact factor. Two databases, Biological Abstracts and PubMed, provide very good coverage of the OA journal literature, indexing 60 to 63% of all OA articles in their disciplines. Five databases provide moderately good coverage (22-41%), and four provide relatively poor coverage (0-12%). OA articles in biology journals, English-only journals, high-impact journals, and journals that charge publication fees of $1,000 or more are especially likely to be indexed. Conversely, articles from OA publishers in Africa, Asia, or Central/South America are especially unlikely to be indexed. Four of the 11 databases index commercially published articles at a substantially higher rate than articles published by universities, scholarly societies, nonprofit publishers, or governments. Finally, three databases (EBSCO Academic Search Complete, ProQuest Research Library, and Wilson OmniFile) provide less comprehensive coverage of OA articles than of articles in comparable subscription journals. 
The annual citation counts of 1,172 articles published in 1985 by 13 American Psychological Association journals were analyzed over a 25-year period. Despite a 61% reduction in citation counts from the peak year (1989: Year 4) to the final year (2010: Year 25), many of these articles were still being cited 25 years after they had been published. When the sample was divided into four categories of impact using the total citation counts for each article (low impact: 0-24 citations; moderate impact: 25-99 citations; high impact: 100-249 citations; very high impact: 250-1,763 citations), the yearly citation counts of low- to high-impact articles peaked earlier and displayed a steeper decline than the yearly citation counts of very high-impact articles. Using 5 or more citations a year, 10 or more citations a year, and 20 or more citations a year as markers of moderate-impact, high-impact, and very high-impact articles, respectively, and using the most cited articles in a journal during the first 5 years of the follow-up period as indicators of high impact and very high impact showed promise for predicting impact over the entire 25-year period. Contrary to what one might expect, Nobel laureates and Fields medalists have a rather large fraction (10% or more) of uncited publications. This is the case for (in total) 75 examined researchers from the fields of mathematics (Fields medalists), physics, chemistry, and physiology or medicine (Nobel laureates). We study several indicators for these researchers, including the h-index, total number of publications, average number of citations per publication, the number (and fraction) of uncited publications, and their interrelations. The most remarkable result is a positive correlation between the h-index and the number of uncited articles. We also present a Lotkaian model, which partially explains the empirically found regularities. 
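The indicators compared above can be computed directly from a researcher's list of per-publication citation counts; a minimal sketch:

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In descending order, the condition c >= rank holds for exactly the
    # first h publications, so counting the hits yields h.
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def uncited_fraction(citations):
    """Fraction of publications with zero citations."""
    if not citations:
        return 0.0
    return sum(1 for c in citations if c == 0) / len(citations)
```

Applied to a set of researchers, the pair (h_index, number of uncited papers) per researcher is the raw material for the correlation reported above.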
In this article, we report on an experiment to assess the possibility of rigorous evaluation of interactive question-answering (QA) systems using the cross-evaluation method. This method takes into account the effects of tasks and context, and of the users of the systems. Statistical techniques are used to remove these effects, isolating the effect of the system itself. The results show that this approach yields meaningful measurements of the impact of systems on user task performance, using a surprisingly small number of subjects and without relying on predetermined judgments of the quality or relevance of materials. We conclude that the method is indeed effective for comparing end-to-end QA systems, and for comparing interactive systems with high efficiency. Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. 
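A document's relevance flow is simply the sequence of per-sentence relevance scores for one query. A minimal sketch of the kind of summary features such a framework might extract follows; the exact feature set in the article differs, so these are illustrative choices.

```python
def relevance_flow_features(flow):
    """Summary features of a sentence-relevance sequence (the relevance flow)."""
    n = len(flow)
    mean = sum(flow) / n
    # Least-squares slope over sentence positions 0..n-1: rising vs. falling flow.
    xbar = (n - 1) / 2
    denom = sum((i - xbar) ** 2 for i in range(n)) or 1.0
    slope = sum((i - xbar) * (v - mean) for i, v in enumerate(flow)) / denom
    return {
        "mean": mean,
        "max": max(flow),
        "first": flow[0],          # relevance of the opening sentence
        "slope": slope,
    }
```

Feature vectors of this kind, one per document, would then feed the learned ranking function described above.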
Experimental results demonstrate that the proposed approach can significantly improve retrieval performance at the top ranks compared with state-of-the-art retrieval models, regardless of document type. The overarching goal of this study is to place additional emphasis on the ability to observe, extend, and apply findings across different experimental studies and/or digital projects by providing further insight into some of the broader processes within interactive (user-centered) video retrieval. The variable at the center of this particular investigation was a user-centered definition of topic complexity, or "situated topic complexity," whose influence on other factors within the interactive video process, namely users' actions, satisfaction, performance, and judgments on other topical qualities, was analyzed. Findings revealed that as users' impressions of complexity increased during experimental search topics, search times and the number of user actions also increased. Even more compelling, users convincingly shifted away from keyword (transcript) searching, predominantly used for easy to moderate topics, toward categorical and linear browse strategies at a clear point within the interactive process. Users' assessments of their search experiences and their topic performances also varied, and were significantly correlated with the situated complexity of a given topic. Such findings are relevant to the design of digital libraries in that researchers should continue improving ways to deliver video through keyword search functions, but should also further recognize the multidimensionality of video by offering other paths or interface options that support highly complex needs as well as exploration of the collection's boundaries on easier topics. 
Overlapping structures in XML are neither symptoms of a misunderstanding of the intrinsic characteristics of a text document nor evidence of extreme scholarly requirements far beyond those needed by the most common XML-based applications. On the contrary, overlaps have started to appear in a large number of incredibly popular applications, hidden under the guise of syntactical tricks applied to the basic hierarchy of the XML data format. Unfortunately, syntactical tricks have the drawback that the affected structures require complicated workarounds to support even the simplest query or usage. In this article, we present Extremely Annotational Resource Description Framework (RDF) Markup (EARMARK), an approach to overlapping markup that simplifies and streamlines the management of multiple hierarchies on the same content, and provides an approach to sophisticated queries and usages over such structures without the need for ad hoc applications, simply by using Semantic Web tools and languages. We show how relevant tasks (e.g., identifying an author's contribution in a word-processor document) that are substantially complex in the original data formats become more or less trivial when using EARMARK. Finally, we evaluate the memory and disk requirements of EARMARK documents, which compare favorably to Open Office and Microsoft Word XML-based formats. This article examines the influence of task type on the users' preferred level of document elements (full articles, sections, or subsections) during interaction with an XML-version of Wikipedia. We found that in general articles and subsections seemed to be the most valuable elements for our test subjects. For information-gathering tasks, this tendency was stronger, whereas for fact-finding tasks, the sections seemed to play a more important role. We assume from this that users select different information search strategies for the two task types. 
When dealing with fact-finding tasks, users seem more likely to use a single element as an answer, whereas when gathering information they pick information from several elements. Patent citations added by examiners are often used as indicators of technological impact and knowledge flows, despite various criticisms. In this study we analyze the distribution of examiner patent citations according to patent characteristics in order to show their limitations. According to our findings, the number of applicant citations included is dependent on the science base of the technology. However, this gets masked by the citations added by patent examiners, who smooth the distribution of citations across technology classes and include the same number of citations regardless of whether applicants cite any references. Some researchers have called for the use of applicant rather than examiner patent citations as indicators of technology impact and knowledge flows. Nevertheless, we show that the former also have important caveats, because applicants may increase the number of citations in international patents and when there are coapplicants. The implication is that analysts should consider a context-driven use of citation-based indicators. We extracted gender-specific actions from text corpora and Twitter, and compared them with stereotypical expectations of people. We used Open Mind Common Sense (OMCS), a common sense knowledge repository, to focus on actions that are pertinent to common sense and daily life of humans. We used the gender information of Twitter users and web-corpus-based pronoun/name gender heuristics to compute the gender bias of the actions. With high recall, we obtained a Spearman correlation of 0.47 between corpus-based predictions and a human gold standard, and an area under the ROC curve of 0.76 when predicting the polarity of the gold standard. 
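A corpus-based bias score of the sort described above can be sketched as a smoothed log-odds of male- versus female-subject counts for an action. The smoothing constant and the log-odds form are illustrative assumptions, not necessarily the study's exact measure.

```python
import math

def gender_bias(male_count, female_count, smoothing=1.0):
    """Log-odds score from subject-gender counts for one action:
    positive = male-leaning, negative = female-leaning, near zero = neutral."""
    return math.log((male_count + smoothing) / (female_count + smoothing))
```

Ranking actions by such a score and comparing the ranking against human polarity judgments is the kind of evaluation that Spearman correlation and ROC analysis summarize.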
We conclude that it is feasible to use natural text (and a Twitter-derived corpus in particular) in order to augment common sense repositories with the stereotypical gender expectations of actions. We also present a dataset of 441 common sense actions with human judges' ratings on whether the action is typically/slightly masculine/feminine (or neutral), and another larger dataset of 21,442 actions automatically rated by the methods we investigate in this study. Social tags have become an important tool for improving access to online resources, particularly non-text media. With the dramatic growth of user-generated content, the importance of tags is likely to grow. However, while tagging behavior is well studied, the relationship between tagging behavior and features of the media being tagged is not well understood. In this paper, we examine the relationship between tagging behavior and image type. Through a lab-based study with 51 subjects and an analysis of an online dataset of image tags, we show that there are significant differences in the number, order, and type of tags that users assign based on their past experience with an image, the type of image being tagged, and other image features. We present these results and discuss the significant implications this work has for tag-based search algorithms, tag recommendation systems, and other interface issues. The standard data that we use when computing bibliometric rankings of scientists are their publication/citation records, i.e., so many papers with 0 citations, so many with 1 citation, so many with 2 citations, etc. The standard data for bibliometric rankings of departments have the same structure. It is therefore tempting (and many authors gave in to temptation) to use the same method for computing rankings of scientists and rankings of departments. Depending on the method, this can yield quite surprising and unpleasant results. 
Indeed, with some methods, it may happen that the "best" department contains the "worst" scientists, and only them. This problem will not occur if the rankings satisfy a property called consistency, recently introduced in the literature. In this article, we explore the consequences of consistency and we characterize two families of consistent rankings. International collaborative papers are increasingly common in journals of many disciplines. These types of papers are often cited more frequently. To identify the coauthorship trends within Library and Information Science (LIS), this study analyzed 7,489 papers published in six leading publications (ARIST, IP&M, JAMIA, JASIST, MISQ, and Scientometrics) over the last three decades. Logistic regression tested the relationships between citations received and seven factors: authorship type, author's subregion, country income level, publication year, number of authors, document type, and journal title. The main authorship type since 1995 was national collaboration. It was also the dominant type for all publications studied except ARIST, and for all regions except Africa. For citation counts, the logistic regression analysis found all seven factors were significant. Papers that included international collaboration, Northern European authors, and authors in high-income nations had higher odds of being cited more. Papers from East Asia, Southeast Asia, and Southern Europe had lower odds than North American papers. As discussed in the bibliometric literature, Merton's Matthew Effect sheds light on the differential citation counts based on the authors' subregion. This researcher proposes geographies of invisible colleagues and a geographic scope effect to further investigate the relationships between author geographic affiliation and citation impact. 
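The logistic-regression analysis above models a binary outcome (e.g., whether a paper's citation count exceeds some threshold) against a set of factors. Below is a minimal from-scratch sketch for illustration only; a real study would use a statistics package, and the data here are hypothetical.

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Stochastic gradient ascent on the log-likelihood.
    xs: list of numeric feature vectors; ys: list of 0/1 labels."""
    w = [0.0] * (len(xs[0]) + 1)            # last weight is the intercept
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):      # w_i += lr * (y - p) * x_i
                w[i] += lr * (y - p) * xi
            w[-1] += lr * (y - p)
    return w

def predict(w, x):
    """Predicted probability of the positive outcome."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))
```

In the study's setting, each feature vector would encode factors such as the number of collaborations or number of authors, and the fitted weights indicate which factors raise the odds of being cited more.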
Although neutrality has been extensively questioned as a design principle for document collections and their descriptive infrastructures, little research has investigated how this conceptual shift might affect the collection designer's task. This article describes the development and evaluation of a design process to author document collections with an acknowledged rhetorical purpose: collections with a design goal to persuasively communicate a position on their material to an identified audience. Following principles of design research, the process was developed via the creation of two prototype collections. The process was then implemented in a classroom setting. Over the course of a semester, 16 participants used the design process both as individuals and in teams to create rhetorically aware document collections. Although study participants successfully used the process to create collections that persuasively expressed a position on their subject matter, reflections on their design experiences showed that the student designers felt some ambivalence regarding the assumption of authorial power. A theory-driven, older adult-oriented e-health literacy intervention was developed and tested to generate scientific knowledge about the potential impact of learning methods and information presentation channels. The experimental design was a 2x2x2 mixed factorial design with learning method (collaborative, individualistic) and presentation channel (visual only, visual plus auditory) as the between-subjects variables and time of measurement (pre-, post-) as the within-subjects variable. One hundred twenty-four older adults (age: M=68.15, SD=9.00) participated during September 2010-February 2011. 
No significant interaction or main effect of learning method and information presentation channel was found, suggesting the advantages of collaborative learning over individualistic learning or the redundancy effect might not be easily generalized to older adults in similar experimental conditions. Time of measurement had significant main effects on e-health literacy efficacy, perceived usefulness of e-health literacy skills, and e-health literacy skills (p<.001 in all three cases; power=1.00 or .98). These findings suggest that the intervention, regardless of its specific combination of learning method and information presentation channel, was effective in improving e-health literacy from pre- to postintervention. The findings contribute to the collaborative learning, multimedia learning, and e-health literacy literatures. Interdisciplinarity has been studied using cognitive connections among individuals in corresponding domains, but rarely from the perspective of academic genealogy. This article utilizes academic genealogy network data from 3,038 PhD dissertations in Library and Information Science (LIS) over a span of 80 years (1930-2009) to describe interdisciplinary changes in the discipline. Aspects of academic pedigree of advisors and committee members are analyzed, such as country, school, and discipline of highest degree, to reveal the interdisciplinary features of LIS. The results demonstrate a strong history of mentors from fields such as education and psychology, a decreasing trend of mentors with LIS degrees, and an increasing trend in mentors receiving degrees in computer science, business, and communication, among other disciplines. This work proposes and explores the use of academic genealogy as an indicator of interdisciplinarity and calls for additional research on the role of doctoral committee composition in a student's subsequent academic career. 
The editor's decision where and how to place items on a screen is crucial for the design of information displays, such as websites. We developed a statistical model that can facilitate automating this process by predicting the perceived importance of screen items from their location and size. The model was developed based on a 2-step experiment in which we asked participants to rate the importance of text articles that differed in size, screen location, and title size. Articles were either presented for 0.5 seconds or for unlimited time. In a stepwise regression analysis, the model's variables accounted for 65% of the variance in the importance ratings. In a validation study, the model predicted 85% of the variance of the mean apparent importance of screen items. The model also predicted individual raters' importance perception ratings. We discuss the implications of such a model in the context of automating layout generation. An automated system for layout generation can optimize data presentation to suit users' individual information and display preferences. In this article, we present a study of classification methods for large-scale categorization of product offers on e-shopping web sites. We examined the performance of previously proposed approaches and deployed a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the usage of the price and store of product offers as features in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site. In these experiments, our method achieved an improvement of up to 9% in the quality of the categorization in comparison with the best baseline we have found. 
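A probabilistic categorizer of the general kind discussed above can be sketched as a multinomial Naive Bayes over the offer's description tokens. This is a generic illustration; the article's model, and its use of price and store as additional features, is not reproduced here.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesCategorizer:
    def fit(self, offers, categories):
        """offers: list of token lists; categories: parallel list of labels."""
        self.cat_counts = Counter(categories)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, cat in zip(offers, categories):
            self.word_counts[cat].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        """Category maximizing log P(category) + sum of log P(token | category)."""
        best, best_lp = None, float("-inf")
        total = sum(self.cat_counts.values())
        v = len(self.vocab)
        for cat, n in self.cat_counts.items():
            lp = math.log(n / total)
            denom = sum(self.word_counts[cat].values()) + v
            for t in tokens:                 # Laplace-smoothed likelihoods
                lp += math.log((self.word_counts[cat][t] + 1) / denom)
            if lp > best_lp:
                best, best_lp = cat, lp
        return best
```

With large taxonomies, as in the collections above, hierarchical variants and additional features (price, store) typically improve on this flat baseline.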
The presence of social networks in complex systems has made networks and community structure a focal point of study in many domains. Previous studies have focused on the structural emergence and growth of communities and on the topics displayed within the network. However, few scholars have closely examined the relationship between the thematic and structural properties of networks. Therefore, this article proposes the Tagger Tag Resource-Latent Dirichlet Allocation-Community model (TTR-LDA-Community model), which combines the Latent Dirichlet Allocation (LDA) model with the Girvan-Newman community detection algorithm through an inference mechanism. Using social tagging data from Delicious, this article demonstrates the clustering of active taggers into communities, the topic distributions within communities, and the ranking of taggers, tags, and resources within these communities. The data analysis evaluates patterns in community structure and topical affiliations diachronically. The article evaluates the effectiveness of community detection and the inference mechanism embedded in the model and finds that the TTR-LDA-Community model outperforms other traditional models in tag prediction. This has implications for scholars in domains interested in community detection, profiling, and recommender systems. Despite the vitality and dynamism that the field of entrepreneurship has experienced in the last decade, the issue of whether it comprises an effective network of (in)formal communication linkages among the most influential scholars within the area has yet to be examined in depth. This study follows a formal selection procedure to delimit the 'relational environment' of the field of entrepreneurship and to analyze the existence and characterization of (in)visible college(s) based on a theoretically well-grounded framework, thus offering a comprehensive and up-to-date empirical analysis of entrepreneurship research. 
Based on more than 1,000 papers published between 2005 and 2010 in seven core entrepreneurship journals and the corresponding (85,000) citations, we found that entrepreneurship is an (increasingly) autonomous, legitimate and cohesive (in)visible college, fine-tuned through the increasing visibility of certain subject specialties (e.g., family business, innovation, technology and policy). Moreover, the rather dense formal links that characterize the entrepreneurship (in)visible college are accompanied by a reasonably solid network of informal relations maintained and sustained by the mobility of 'stars' and highly influential scholars. The limited internationalization of the entrepreneurship community, reflected in the almost total absence of non-English-speaking authors/studies/outlets, remains a major challenge for the field. Nanosciences and nanotechnologies are considered important for the development of science, technology and innovation, and the study of their characteristics can greatly inform the decisions of policy makers and practitioners. This work is centred on the issue of the time relations between science and technology/innovation, and in particular on the speed of transfer of science-generated knowledge towards its exploitation in patenting. A methodology based on patent citations is used to measure the time lag between cited journal articles and citing patents, and thus the time proximity between the two steps. Keywords regarding nanotechnology/nanoscience items are searched in order to collect data useful for the analysis. Collateral measures, performed on another class of materials and on the spatial origin of citing/cited documents, help give evidence of the peculiarity of the behaviour and of its nature. The most representative time lag between the production of scientific knowledge and its technological exploitation appears to be around 3-4 years. 
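The core measurement in the patent-citation methodology above is simple: for each citation, subtract the publication year of the cited article from that of the citing patent, then summarize the lags. The year pairs below are invented for illustration; only the calculation mirrors the described approach.

```python
import statistics

# Hypothetical (cited_article_year, citing_patent_year) pairs; the study
# collected such pairs from nanoscience/nanotechnology patent citations.
pairs = [(2001, 2004), (2000, 2004), (2002, 2005), (1999, 2003), (2003, 2006)]

# Science-to-patent lag in years for each citation.
lags = [patent - article for article, patent in pairs]

print(statistics.median(lags))  # -> 3
```

With real data, the distribution of these lags (not just the median) would be the object of study, compared across material classes and across the geographic origin of the citing and cited documents.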
The aim of this paper is to identify the research paradigms of digital libraries in China, as compared with international digital library research, via scientometric analysis. A co-word network, constructed from keywords in documents and their co-occurrence relationships, is a kind of knowledge-domain map that represents the cognitive and intellectual structure of a science. A total of 6068 and 1250 papers published between 1994 and 2010 were retrieved, respectively, from the China National Knowledge Infrastructure (CNKI) and ScienceDirect databases with a topic search for "digital libraries" or "digital library" in paper abstracts. This paper uses the methods of co-word analysis, social network analysis and knowledge-domain mapping as its theoretical basis, with the assistance of the UCINET and NetDraw software, to construct the co-word network of digital library research in China, present the current status and evolution of digital library research in China, and analyze the research paradigm structure of digital library research in China. This study builds the interdisciplinary knowledge network of China, which is used to capture the knowledge exchange structure of disciplines, and investigates its evolution from 1981 to 2010. A network analysis was performed to examine the network's structure, and we compare the state of the network in different periods to determine how it acquired its properties. The dataset is derived from the reference relationships in the literature of important Chinese academic journals from 1980 to 2010. The analytical results reveal the hidden network structure of interdisciplinary knowledge flows in China and demonstrate that the network is highly connected and has a homogeneous link structure and a heterogeneous weight distribution. 
By comparing the network across three periods, namely 1981-1990, 1991-2000 and 2001-2010, we find that the particular evolution process, which is limited by the number of nodes, has an important influence on interdisciplinary knowledge flows. This article seeks to examine the relationship between scientific output and the knowledge economy index in 10 South East Asian (ASEAN) countries. Using bibliometric data from the Institute of Scientific Information, we analyzed the number of scientific articles published in international peer-reviewed journals between 1991 and 2010 for Vietnam, Cambodia, Laos, Thailand, Myanmar, Malaysia, Indonesia, Brunei, the Philippines, and Singapore. During the 20-year period, scientists from the ASEAN countries published 165,020 original articles in ISI-indexed journals, which represents approximately 0.5% of the world scientific output. Singapore led the region with the highest number of publications (accounting for 45% of the countries' total publications), followed by Thailand (21%), Malaysia (16%), Vietnam (6%), and Indonesia and the Philippines (5% each). The number of scientific articles from those countries has increased by 13% per year, with the rate of increase being highest in Thailand and Malaysia, and lowest in Indonesia and the Philippines. At the country level, the correlation between the knowledge economy index and scientific output was 0.94. Based on the relationship between scientific output and knowledge economy, we identified 4 clusters of countries: Singapore as the first group; Thailand and Malaysia in the second group; Vietnam, Indonesia and the Philippines in the third group; and Cambodia, Laos, Myanmar and Brunei in the fourth group. These data suggest that there is a strong relationship between scientific research and the degree of "knowledgization" of an economy. 
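The country-level correlation of 0.94 reported above is an ordinary Pearson coefficient between the knowledge economy index and publication counts. The figures below are hypothetical stand-ins, not the actual ASEAN data; only the computation is the point.

```python
import math

# Hypothetical (knowledge economy index, article count) values per country.
kei = [8.3, 5.5, 6.1, 3.4, 3.2, 3.9]
output = [74000, 34000, 26000, 10000, 8000, 8500]

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of std deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson(kei, output), 2))
```

A high coefficient like the study's 0.94 indicates a strong linear association, though with only ten countries the clustering the authors perform is arguably more informative than the single summary number.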
Research activities and collaborations in nanoscale science and engineering have major implications for advancing technological frontiers in many fields including medicine, electronics, energy, and communication. The National Nanotechnology Initiative (NNI) promotes efforts to cultivate effective research and collaborations among nano scientists and engineers to accelerate the advancement of nanotechnology and its commercialization. As of August 2008, there were over 800 products considered to benefit from nanotechnology directly or indirectly. However, today's accomplishments in nanotechnology cannot be transformed into commercial products without productive collaborations among experts from disparate research areas such as chemistry, physics, math, biology, engineering, manufacturing, environmental sciences, and social sciences. To study the patterns of collaboration, we build and analyze the collaboration network of scientists and engineers who conduct research in nanotechnology. We study the structure of information flow through the citation network of papers authored by nano-area scientists. We believe that the study of nano-area co-author and paper citation networks improves our understanding of the patterns and trends of current research efforts in this field. We construct these networks from publication data collected for the years 1993 through 2008 from the scientific literature database Web of Science. We explore these networks to find out whether they follow power-law degree distributions and whether they show a signature of hierarchy. We investigate the small-world characteristics and the existence of possible community structures in these networks. We estimate the statistical properties of the networks and interpret their significance with respect to the nano field. 
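The network properties the study examines, such as degree distributions and the high clustering that signals small-world structure, can be computed directly from an adjacency structure. The toy co-authorship edges below are illustrative only, not the Web of Science data.

```python
from collections import defaultdict

# Toy co-authorship edges (pairs of authors on joint papers).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "E")]

# Build an undirected adjacency structure.
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

degrees = {n: len(adj[n]) for n in adj}
avg_clust = sum(clustering(n) for n in adj) / len(adj)
print(degrees)
print(round(avg_clust, 2))
```

On real co-authorship data one would additionally fit the degree histogram against a power law and compare the average clustering and path lengths to a random-graph baseline to test the small-world hypothesis.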
This study is based on the fact that the surnames of many Russian scientists have gender endings, with "a" denoting a female, so that the sex of most of them can be readily determined from the listing of authors in the Web of Science (WoS). A comparison was made of the proportion of females in 1985, 1995, and 2005, with a corresponding analysis of the major fields in which they worked, their propensity to co-author papers internationally (which often necessitates having the opportunity to travel to conferences abroad to meet possible colleagues), and their citation records. We found, as expected, that women had a higher presence in the biological sciences and a very low presence in engineering, mathematics, and physics. Their citation scores, on a fractionated basis, were lower than those for men in almost all fields and years, and were not explained by their writing fewer reviews and papers in English (both of which lead to higher citations), or by their lower amount of international collaboration in 1995 and 2005 after Russia had become a more open society. Composite indicators play an essential role in benchmarking higher education institutions. One of the main sources of uncertainty, and undoubtedly the most debated problem, in building composite indicators is the weighting scheme (assigning weights to the simple indicators or subindicators) together with the aggregation scheme (the final composite indicator formula). Except in the ideal situation where weights are provided by theory, there is clearly a need to improve the quality assessment of the final rank linked with a fixed vector of weights. We propose to use simulation techniques to generate random perturbations around any initial vector of weights to obtain robust and reliable ranks, allowing universities to be ranked within a range bracket. The proposed methodology is general enough to be applied no matter which weighting scheme is used for the composite indicator. 
The immediate benefits achieved are a reduction of the uncertainty associated with the assessment of a specific rank that is not representative of the real performance of the university, and an improvement of the quality assessment of composite indicators used to rank. To illustrate the proposed methodology, we rank the French and German universities involved in their respective 2008 Excellence Initiatives. The obsolescence and "durability" of scientific literature have been important elements of debate for many years, especially regarding the proper calculation of bibliometric indicators. The effects of "delayed recognition" on impact indicators are important and of interest not only to bibliometricians but also to research managers and scientists themselves. It has been suggested that the "Mendel syndrome" is a potential drawback when assessing individual researchers through impact measures. If publications from particular researchers need more time than "normal" to be properly acknowledged by their colleagues, the impact of these researchers may be underestimated with common citation windows. In this paper, we address the question of whether the bibliometric indicators for scientists can be significantly affected by the Mendel syndrome. Applying a methodology developed previously for the classification of papers according to their durability (Costas et al., J Am Soc Inf Sci Technol 61(8):1564-1581, 2010a; J Am Soc Inf Sci Technol 61(2):329-339, 2010b), the scientific production of 1,064 researchers working at the Spanish Council for Scientific Research (CSIC) in three different research areas has been analyzed. Cases of potential "Mendel syndrome" are rarely found among researchers, and these cases do not significantly outperform the impact of researchers with a standard pattern of reception of their citations. 
The analysis of durability could be included as a parameter in the consideration of the citation windows used in the bibliometric analysis of individuals. Research productivity affects the careers of academic psychologists. Unfortunately, there is a surprising lack of consensus on productivity's meaning, its measurement, and how to compare the productivity of one academic psychologist to another. In the present study, we review academic productivity research within psychology and, using a sample of 673 psychologists, compute six indexes of productivity. Most productivity metrics (publication count, citation count, or some combination of the two) were substantially interrelated, and one (the Integrated Research Productivity Index) was independent of years in the field. Female psychologists were equally as productive as male psychologists after accounting for years in the field, and pre-tenure psychologists showed steeper change-over-time productivity slopes than post-tenure psychologists. Based on these findings, we provide recommendations for the use and measurement of academic research productivity. Here we present a longitudinal analysis of the overall prestige of first-quartile journals during the period between 1999 and 2009, across the subject areas of Scopus. This longitudinal study allows us to analyse developmental trends over time in different subject areas with distinct citation and publication patterns. To this aim, we first introduce an axiomatic index of the overall prestige of journals with a ranking score above a given threshold. We demonstrate that, between 1999 and 2009, there was high and increasing overall prestige of first-quartile journals in only four areas of Scopus. There was also high and decreasing overall prestige of first-quartile journals in five areas, while two subject areas showed high and oscillating overall prestige of first-quartile journals. Finally, there was low but increasing overall prestige in four areas since 1999. 
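The notion of "overall prestige of first-quartile journals" can be illustrated with a simple proxy: take the journals whose ranking score falls in the top quartile and measure the share of total prestige they hold. This is not the paper's axiomatic index, only a hypothetical sketch of the quartile bookkeeping with invented scores.

```python
# Hypothetical journal ranking scores (e.g., prestige-style values) for one
# subject area, sorted from most to least prestigious.
scores = sorted([4.1, 3.2, 2.8, 2.5, 1.9, 1.4, 1.1, 0.9, 0.7, 0.5, 0.4, 0.2],
                reverse=True)

q1_size = len(scores) // 4                 # the first quartile: top 25% of journals
q1 = scores[:q1_size]

# Illustrative aggregate: the share of total prestige held by Q1 journals.
overall_prestige = sum(q1) / sum(scores)
print(q1, round(overall_prestige, 2))
```

Tracking such an aggregate year by year per subject area is what yields the "increasing", "decreasing", and "oscillating" trend classifications the abstract reports.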
This article defines different perspectives on citations and introduces four concepts: Self-expected Citations, Received Citations, Expected Citations, and Deserved Citations. When comparing permutations of these four classes of perspectives, there are up to 145 kinds of equality/inequality relations. From these numerous relations, we analyze the difference between the Matthew Effect and the Matthew Phenomenon. We provide a precise definition and point out that many previous empirical research studies on the Matthew Effect based on citations belong primarily to the Matthew Phenomenon, and not to the true meaning of the Matthew Effect. Due to the difficulty in determining the Deserved Citations, the Matthew Effect is in itself difficult to measure, although it is commonly believed to influence citation counts. Furthermore, from these theoretical facts, we outline four new effects/phenomena: the Self-confidence Effect/Phenomenon, the Narcissus Effect/Phenomenon, the Other-confidence Effect/Phenomenon, and the Flattery Effect/Phenomenon, and we discuss additional influencing factors. USPTO patent data covering the years 1994-2008 is used in this study to examine the citation networks of electronic-paper display technology. Our primary aim is to provide a better understanding of the ways in which emerging firms interact with, and learn from, technology diffusers. Two implications can be drawn from our analysis. Firstly, emerging firms within an emerging industry can enhance their technological capabilities through positive external learning activity. Secondly, despite the fact that technology diffusers have clear technological advantages, with the emergence of a new field their influence within the network could potentially decay if they fail to remain proactive in terms of the absorption of available external knowledge. 
Surnames have been used as a proxy in studies on health care for various ethnic groups and have also been applied to ascribe ethnicity in studies on the genetic structure of a population. The aim of this study was to use a surname-based bibliometric indicator to assess the representation of Jewish authors in US biomedical journals. A second aim was to test the hypothesis that the representation of Jewish authors in US biomedical journals corresponds to their representation among US Nobel Prize winners in Medicine, 1960-2009. From among articles published between 1960 and 2009 in all journals covered by Medline (> 5,000), and in the top 10 US biomedical journals, we counted articles by authors from the following three groups: Kohenic-Levitic surnames, other common Jewish surnames, and the most frequent non-Jewish surnames in the USA. The frequency of a surname in the US population (1990 US Census) was used to calculate the expected number of scientific publications: the total number of published articles multiplied by a surname's frequency. The actual number of articles with that surname was also determined. The ratio of actual to expected number of articles was used as a measure of representation proportionality. It was found that the ratio of actual to expected number of articles in both Jewish groups is close to 10 among all (> 5,000) journals, and close to 20 in the top 10 journals. The ratio of actual to expected numbers of Jewish Nobel Laureates in the USA is also close to 20. In conclusion, the representation of Jewish authors in the top 10 US biomedical journals corresponds to the representation of Jewish Nobel Laureates among US laureates. We hypothesize that the disproportional representation of Jewish scientists as authors in top biomedical journals and among Nobel Prize laureates in Medicine is mostly due to their overrepresentation as research participants, not because of increased chances of reward for a Jewish researcher per se. 
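The proportionality measure described above is plain arithmetic: the expected article count is the total number of articles multiplied by the surname's population frequency, and representation is the actual-to-expected ratio. The counts below are hypothetical, chosen only so the ratio lands near the study's top-journal figure of 20.

```python
# Hypothetical inputs (not the study's data):
total_articles = 2_000_000       # articles in the journal set
surname_per_million = 400        # surname frequency: bearers per million people
actual_articles = 16_000         # observed articles carrying that surname

# Expected articles if authorship matched population frequency.
expected = total_articles * surname_per_million // 1_000_000  # 800 articles

# Representation proportionality: actual / expected.
ratio = actual_articles / expected
print(expected, ratio)  # -> 800 20.0
```

A ratio of 1 would mean authorship exactly proportional to population frequency; values near 10 or 20, as reported in the abstract, indicate over-representation by that factor.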
It has been shown that papers in stem cell research submitted from institutions in the USA are accepted faster than those submitted from elsewhere, and that the cause might, at least partly, be some bias in the refereeing process. We investigate whether there is a similar difference in time scale for papers in astronomy, astrophysics, and cosmology, and look briefly at some of the possible causes. We find a publication time lag of 3.8 days (out of a median time of 105 days), while in the stem cell case it is 24 days out of a median of 83 days. One of many possible causes is a difference in how useful the papers are to the community, and we will assess this in a second paper making use of citation analysis. I propose a new method (Pareto weights) to objectively attribute citations to co-authors. Previous methods either profess ignorance about the seniority of co-authors (egalitarian weights) or are based in an ad hoc way on the order of authors (rank weights). Pareto weights are based on the respective citation records of the co-authors. Pareto weights are proportional to the probability of observing the number of citations obtained. Assuming a Pareto distribution, such weights can be computed with a simple, closed-form equation, but require a few iterations and data on a scholar, her co-authors, and her co-authors' co-authors. The use of Pareto weights is illustrated with a group of prominent economists. In this case, Pareto weights are very different from rank weights. Pareto weights are more similar to egalitarian weights but can deviate up to a quarter in either direction (for reasons that are intuitive). Two commonly used ideas in the development of citation-based research performance indicators are the idea of normalizing citation counts based on a field classification scheme and the idea of recursive citation weighting (as in PageRank-inspired indicators). 
We combine these two ideas in a single indicator, referred to as the recursive mean normalized citation score indicator, and we study the validity of this indicator. Our empirical analysis shows that the proposed indicator is highly sensitive to the field classification scheme that is used. The indicator also has a strong tendency to reinforce biases caused by the classification scheme. Based on these observations, we advise against the use of indicators in which the idea of normalization based on a field classification scheme is combined with the idea of recursive citation weighting. This paper presents an in-depth study of an interesting analogy, recently proposed by Prathap (Scientometrics 87(3):515-524, 2011a), between the evolution of thermodynamic and bibliometric systems. The goal is to highlight some weaknesses and clarify some "dark sides" in the conceptual framework of this analogy, discussing the formal validity and practical meaning of the concepts of Energy, Exergy and Entropy in bibliometrics. Specifically, this analogy exhibits the following major criticalities: (1) the definitions of E and X are controversial, (2) the equivalence classes of E and X are questionable, (3) the parallel between the evolution of thermodynamic and bibliometric systems is forced, (4) X is a non-monotonic performance indicator, and (5) in bibliometrics the condition of "thermodynamic perfection" is questionable. The argument is supported by many analytical demonstrations and practical examples. A new family of citation normalization methods appeared recently, in addition to the classical methods of "cited-side" normalization and the iterative measures of intellectual influence in the wake of Pinski and Narin's influence weights. These methods have a quite global scope in citation analysis but were first applied to journal impact, in the experimental Audience Factor (AF) and the Scopus Source-Normalized Impact per Paper (SNIP). 
Analyzing some properties of Garfield's Journal Impact Factor, this note highlights the rationale of citing-side (or source-level, fractional citation, ex ante) normalization. The paper introduces scholarly Information Retrieval (IR) as a further dimension that should be considered in the science modeling debate. The IR use case is seen as a validation model of the adequacy of science models in representing and predicting structure and dynamics in science. Particular conceptualizations of scholarly activity and structures in science are used as value-added search services to improve retrieval quality: a co-word model depicting the cognitive structure of a field (used for query expansion), the Bradford law of information concentration, and a model of co-authorship networks (both used for re-ranking search results). An evaluation of retrieval quality when science-model-driven services are used showed that the models proposed actually provide beneficial effects to retrieval quality. From an IR perspective, the models studied are therefore verified as expressive conceptualizations of central phenomena in science. Thus, it could be shown that the IR perspective can significantly contribute to a better understanding of scholarly structures and activities. A quantitative modification to keep the number of published papers invariant under multiple authorship is suggested. In such cases, fractional allocations summing to one are attributed to each co-author. These allocations are tailored on the basis of each author's contribution. The scheme is denoted "Tailor Based Allocations (TBA)" for multiple authorship. Several protocols for TBA are suggested. The choice of a specific TBA may vary from one discipline to another. In addition, TBA is applied to the number of citations of a multiple-author paper so that this number is also conserved. Each author gets only a specific fraction of the total number of citations according to his or her fractional paper allocation. 
The equivalent of the h-index obtained by using TBA is denoted the gh-index. It yields values which differ drastically from those given by the h-index. The gh-index also departs from the h̄ (h-bar) index recently proposed by Hirsch to account for multiple authorship. Contrary to the h-index, the gh-index is a function of the total number of citations of each paper. A highly cited paper allows a better allocation for all co-authors, while a less cited paper contributes essentially to one or two of the co-authors. The scheme produces a substantial redistribution of the ranking of scientists in terms of quantitative records. A few illustrations are provided. This paper investigates the role of homophily and focus constraint in shaping collaborative scientific research. First, homophily structures collaboration when scientists adhere to a norm of exclusivity in selecting similar partners at a higher rate than dissimilar ones. Two dimensions on which similarity between scientists can be assessed are their research specialties and status positions. Second, focus constraint shapes collaboration when connections among scientists depend on opportunities for social contact. Constraint comes in two forms, depending on whether it originates in institutional or geographic space. Institutional constraint refers to the tendency of scientists to select collaborators within rather than across institutional boundaries. Geographic constraint is the principle that, when collaborations span different institutions, they are more likely to involve scientists who are geographically co-located than dispersed. To study homophily and focus constraint, the paper argues in favour of an idea of collaboration that moves beyond formal co-authorship to include also other forms of informal intellectual exchange that do not translate into the publication of joint work. 
A community-detection algorithm for formalising this perspective is proposed and applied to the co-authorship network of the scientists that submitted to the 2001 Research Assessment Exercise in Business and Management in the UK. While the results only partially support research-based homophily, they indicate that scientists use status positions to discriminate between potential partners by selecting collaborators from institutions with a rating similar to their own. Strong support is provided in favour of institutional and geographic constraints. Scientists tend to forge intra-institutional collaborations; yet, when they seek collaborators outside their own institutions, they tend to select those who are in geographic proximity. The implications of this analysis for tie creation in joint scientific endeavours are discussed. We develop a model of scientific creativity and test it in the field of rare diseases. Our model is based on the results of an in-depth case study of Rett Syndrome. Archival analysis, bibliometric techniques and expert surveys are combined with network analysis to identify the most creative scientists. First, we compare alternative measures of generative and combinatorial creativity. Then, we generalize our results in a stochastic model of socio-semantic network evolution. The model's predictions are tested with an extended set of rare diseases. We find that new scientific collaborations among experts in a field enhance combinatorial creativity. By contrast, high entry rates of novices are negatively related to generative creativity. By expanding the set of useful concepts, creative scientists gain in centrality. At the same time, by increasing their centrality in the scientific community, scientists can replicate and generalize their results, thus contributing to a scientific paradigm. This study presents a mixed model that combines different indicators to describe and predict key structural and dynamic features of emerging research areas. 
Three indicators are combined: sudden increases in the frequency of specific words; the number and speed by which new authors are attracted to an emerging research area; and changes in the interdisciplinarity of cited references. The mixed model is applied to four emerging research areas: RNAi, Nano, h-Index, and Impact Factor research, using papers published in the Proceedings of the National Academy of Sciences of the United States of America (1982-2009) and in Scientometrics (1978-2009). Results are compared in terms of strengths and temporal dynamics. Results show that the indicators are indicative of emerging areas and exhibit interesting temporal correlations: new authors enter the area first, then the interdisciplinarity of paper references increases, then word bursts occur. All workflows are reported in a manner that supports replication and extension by others. Agent-based simulation can model simple micro-level mechanisms capable of generating macro-level patterns, such as frequency distributions and network structures found in bibliometric data. Agent-based simulations of organisational learning have provided analogies for collective problem solving by boundedly rational agents employing heuristics. This paper brings these two areas together in one model of knowledge seeking through scientific publication. It describes a computer simulation in which academic papers are generated with authors, references, contents, and an extrinsic value, and must pass through peer review to become published. We demonstrate that the model can fit bibliometric data for a token journal, Research Policy. Different practices for generating authors and references produce different distributions of papers per author and citations per paper, including the scale-free distributions typical of cumulative advantage processes. We also demonstrate the model's ability to simulate collective learning or problem solving, for which we use Kauffman's NK fitness landscape. 
The model provides evidence that those practices leading to cumulative advantage in citations, that is, papers with many citations becoming even more cited, do not improve scientists' ability to find good solutions to scientific problems, compared to those practices that ignore past citations. By contrast, what does make a difference is referring only to publications that have successfully passed peer review. Citation practice is one of many issues that a simulation model of science can address when the data-rich literature on scientometrics is connected to the analogy-rich literature on organisations and heuristic search. This paper shows the main lines of research concerning health and women, as registered in the Medline database, broken down into four 10-year periods: 1965-1974, 1975-1984, 1985-1994, and 1995-2005. The units of analysis used were the Medline "MeSH" major terms, processed by means of co-term analysis. For graphic representation, the social network approach was used, with pruning performed by Pathfinder Networks (PFNET), so as to concentrate the displays. Factor analysis was used to group the descriptors and identify the main lines of research involving health and women. The results show that research on Health and Women has increased and undergone significant changes over the past 40 years, yet such studies are not given due importance. This article presents for the first time a portrait of intramural research conducted by the U.S. Department of Agriculture (USDA). We describe the nature, characteristics, and use of USDA research based on scientometric indicators using patent analysis and three bibliometric methods: publication analysis, citation analysis, and science mapping. Our analyses are intended to be purely descriptive in nature. They demonstrate that USDA maintains several core scientific competencies and its research is much broader than and reaches well beyond traditional agricultural sciences for which it is best known. 
We illustrate the current status, recent trends, and clear benchmarks for planning and assessing future USDA research across an array of scientific disciplines. The citation distribution of a researcher shows the impact of their production and determines the success of their scientific career. However, its application in scientific evaluation is difficult due to the bi-dimensional character of the distribution. Some bibliometric indexes that try to synthesize the principal characteristics of this distribution in a numerical value have been proposed recently. In contrast with other bibliometric measures, the h-index reduces the biases provoked by the tails of the distribution. However, this index has some limitations in discriminating among researchers with different publication habits. It penalizes selective researchers, distinguished by the large number of citations received, as compared to large producers. In this work, two original sets of indexes, the central area indexes and the central interval indexes, which complement the h-index to include the central shape of the citation distribution, are proposed and compared. The influence of the National Research Foundation's (NRF) rating system on the productivity of South African social science researchers is investigated scientometrically for the period from 1981 to 2006. Their output performance is mainly indicated by their research publications. Following international best practice in scientometrics as well as behavioural reinforcement theory, we employed the "before/after control impact" (BACI) method, as well as the well-known econometric breakpoint test proposed by Chow. We use as a control group the publications in the field of clinical medicine, a field not supported by the NRF, so that clinical medicine researchers are not affected by the evaluation and rating system. 
The findings show a positive impact of the NRF programme on the research outputs of social sciences researchers, and the implementation of the programme has increased the relevant population of research articles by an average of 24.5% (during the first 5 years) over the expected number of publications without the programme. The results confirm the scientometric findings of other studies (e.g., that of Nederhof) that ratings promote research productivity. This paper provides an overview of the progression of technology structure based on patent co-citation networks. Methods of patent bibliometrics, social network analysis, and information visualization are employed to analyze patents of Fortune 500 companies indexed in the Derwent Innovations Index, the largest patent database in the world. Based on the co-citation networks, several main technology groups are identified, including Chemicals, Petroleum Refining, Motor Vehicles, Pharmaceuticals, Electronics, etc. Relationships among the leading companies and technology groups are also revealed. The problem of comparing academic institutions in terms of their research production is nowadays a priority issue. This paper proposes a relative bidimensional index that takes into account both the net production and its quality, as an attempt to provide a comprehensive and objective way to compare the research output of different institutions in a specific field, using journal contributions and citations. The proposed index is then applied, as a case study, to rank the top Spanish universities in the fields of Chemistry and Computer Science in the period ranging from 2000 until 2009. A comparison with the top 50 universities in the ARWU rankings is also made, showing the proposed ranking is better suited to distinguish among non-elite universities. Patent counts have been extensively used to measure the innovative capacities of countries.
However, since the economic values of patents may differ, simple patent counts may give misleading rankings if the patents of one country are on average more valuable than those of another. Several methods that are intended to adjust for these differences have been proposed in the literature. However, these often lack a solid economic micro-foundation and are therefore ad hoc and arbitrary procedures. In this paper, we present an adjustment method that is based on the analysis of renewal decisions. The method builds on the theoretical model used in Schankerman and Pakes (1986) and Besson (2008) but goes beyond both approaches in that it recovers the important long tail of the value distribution. It also transfers Besson's (2008) econometric methodology (developed for the organisational structure of the US Patent and Trademark Office) to the European Patent Office, which is necessary since each application there may split up into several national patent documents. The analysis is performed for 22 countries. For example, we find that in the cohort of 1986 patent applications, Danish patents are about 60% more valuable than the average patent. German patents are a bit below average. Japanese patents are of least value. In the cohort of 1996, Danish patents lose some of their lead but are still more valuable than the average. While German patents are a bit above average, Japanese patents fall even further behind (possibly due to the economic downturn since the mid-1990s). The publication Approved Drug Products with Therapeutic Equivalence Evaluations (commonly known as the Orange Book) identifies drug products approved by the United States Food and Drug Administration (USFDA) for safety and effectiveness, and provides substantial information on new drug applications (NDAs) with patent data. To explore the patterns among drug patents in the Orange Book, this study used patent bibliometric analysis.
The productivity and impact are presented at the assignee level and applicant level, respectively, and the applicant's patent portfolio is further discussed. A total of 2,033 drug patents are identified in this study. Our findings indicate that the applicant's patent portfolio in the Orange Book is helpful in revealing the technological capability and patent strategy of the pharmaceutical incumbents. By linking drug data and patent information, this study sheds light on patent research in the pharmaceutical industry. With modern technology developing rapidly, most entities can be observed from different perspectives. This multi-view information allows us to find better patterns, provided we integrate it in an appropriate way. Clustering by integrating multi-view representations that describe the same class of entities has therefore become a crucial issue for knowledge discovery. We integrate multi-view data using a tensor model and present a hybrid clustering method based on the Tucker-2 model, which can be regarded as an extension of spectral clustering. We apply our hybrid clustering method to scientific publication analysis by integrating citation links and lexical content. Clustering experiments are conducted on a large-scale journal set retrieved from the Web of Science (WoS) database. Several relevant hybrid clustering methods are cross-compared with our method. The analysis of the clustering results demonstrates the effectiveness of the proposed algorithm. Furthermore, we provide a cognitive analysis of the clustering results as well as a visualization as a mapping of the journal set. To provide an overview of the characteristics of research in China, a bibliometric evaluation of highly cited papers with high-level representation was conducted for the period from 1999 to 2009 based on the Essential Science Indicators (ESI) database.
A comprehensive assessment covered overall performance, journals, subject categories, internationally collaborative countries, national inter-institutionally collaborative institutions, and most-cited papers in 22 scientific fields. China saw strong growth in scientific publications in the last decade, to some extent due to increasing research and development expenditure. China has been more active in the ESI fields of chemistry and physics, but has excelled more in materials science, engineering, and mathematics. Most publications were concerned with the common Science Citation Index subject categories of multidisciplinary chemistry, multidisciplinary materials science, and physical chemistry. About one half of China's ESI papers were internationally collaborative, and the eight major industrialized countries (the USA, Germany, the UK, Japan, France, Canada, Russia, and Italy) played a prominent role in scientific collaboration with China, especially the USA. The Chinese Academy of Sciences, with its many branches, took the leading position among institutions. The "985 Project'' stimulated the most productive institutions for academic research with a huge funding injection, and the universities in Hong Kong showed good scientific performance. The citation impact of internationally collaborative papers differed among fields, and international collaborations made positive contributions to academic research in China. This paper suggests a method for Subject-Action-Object (SAO) network analysis of patents for identifying technology trends by using the concept of function. The proposed method addresses the shortcoming of the keyword-based approach to identifying technology trends, i.e., that it cannot represent how technologies are used or for what purpose. The concept of function provides information on how a technology is used and how it interacts with other technologies; the keyword-based approach does not provide such information.
The proposed method uses an SAO model and represents "key concepts'' instead of "key words''. We present a procedure that formulates an SAO network by using SAO models extracted from patent documents, and a method that applies actor-network theory to analyze the technological implications of the SAO network. To demonstrate the effectiveness of the SAO network, this paper presents a case study of patents related to Polymer Electrolyte Membrane technology in Proton Exchange Membrane Fuel Cells. In academia, the term "inbreeding'' refers to a situation wherein PhDs are employed by the very same institution that trained them during their doctoral studies. Academic inbreeding is perceived negatively, on the grounds that it damages both scientific effectiveness and productivity. In this article, the effect of inbreeding on scientific effectiveness is investigated through a case study. This problem is addressed by utilizing the Hirsch index as a reliable metric of an academic's scientific productivity. Utilizing a dataset constructed from the academic performance indicators of individuals from the Mechanical and Aeronautical Engineering Departments of the Turkish Technical Universities, we demonstrate through a negative binomial model that academic inbreeding has a negative impact on apparent scientific effectiveness. This model appears to be the most suitable one for the dataset, which is a type of count data. We report chi-square statistics and the likelihood ratio test for the parameter alpha. According to the chi-square statistics, the model is significant as a whole. The incidence rate ratio for the variable "inbreeding'' is estimated to be 0.11, and this ratio indicates that, holding all the other factors constant, the h-index of inbred faculty is about 89% lower than that of non-inbred faculty.
Furthermore, there is a negative and statistically significant correlation between an individual's productivity and the percentage of inbred faculty members in the very same department. Excessive practice of inbreeding adversely affects overall productivity. Decision makers are urged to limit this practice to a minimum in order to foster a vibrant research environment. Furthermore, it is also found that the scientific productivity of an individual decreases towards the end of his or her scientific career. The study presents the state of bibliographical research in the discipline of Hebrew printing during a 30-year period, ranging from the latter quarter of the twentieth century until the beginning of the third millennium (1976-2006). Through bibliographical parameters it characterizes the publications dealing with Hebrew printing, examines whether the published material exhibits laws and systematic regularities that are consistent with bibliometrics, and describes directions in which the field has developed. This work analyses the links between individual research performance and academic rank. A typical bibliometric methodology is used to study the performance of all Italian university researchers active in the hard sciences for the period 2004-2008. The objective is to characterize the performance of the ranks of full professors (FPs) and associate and assistant professors (APs) along various dimensions, in order to verify the existence of performance differences among the ranks in general and for single disciplines. The aim of this study was to explore the research trends and the evolution of publications on diadromous fish from the 1970s to 2010. We conducted a bibliometric analysis on seven patrimonial species: Atlantic salmon (Salmo salar), Brown and Sea trout (Salmo trutta), Allis shad (Alosa alosa), Twaite shad (Alosa fallax), Eel (Anguilla anguilla), Sea lamprey (Petromyzon marinus), and River lamprey (Lampetra fluviatilis).
We used bibliometric techniques on the total number of research outputs (articles, books, and conference papers) in all countries as a function of main fields such as growth/age, reproduction, migration, habitat, aquaculture, diseases, diet, abundance, fisheries, climate change, toxicology, dams/fishways, genetics, taxonomy, modelling, resource management, and stocking. The results revealed a clear difference in the evolution of scientific studies by species and by country. The comparisons showed the intensity of certain topics by species with the emergence of new ones, the economic impact on the sciences, and the increased support of conservation plan management for certain species, such as salmon and lamprey in France. This study also showed that French research is not always consistent with the international trend, which suggests the dominance of management systems over scientific studies. The Hsinchu Science Park in Taiwan has been synonymous with dynamic and flourishing high-tech industries and companies since the 1980s. Using patent citation data, this empirical study shows that Taiwan's Hsinchu Science Park is a healthy and knowledge-based cluster anchored by the semiconductor sector, in which external knowledge continues to play an important role while internalized capability is building up quickly; new and extended industrial clusters are being established by the growth of new ventures; and the linkages of capital, manpower, and technology flows are conducted respectively by the large business groups, the NTHU and NCTU, and the ITRI in the region. Subsequent sectors, repeating the successful model created by and catalyzed from the semiconductor sector, are flourishing; the thin-film transistor-liquid crystal display (TFT-LCD) and integrated circuit (IC) design sectors have been growing rapidly since the beginning of the 2000s, and the solar photovoltaic and LED (Light-Emitting Diode) sectors emerged quickly in mid-2005.
The continuously evolving and growing industries, along with the significant increase in value added in the Hsinchu Science Park, demonstrate that it is acting as a healthy and vibrant innovation region. The policy implications derived from this study can thus shed light, for Southeast Asian, Latin American, or other latecomers, on strategies for formulating regional research and innovation policies in the process of developing a knowledge-based economy. Using data envelopment analysis (DEA) and statistical inference, this paper evaluates the citation performance of 229 economic journals. The paper categorizes the journals into four main categories (A-D) based on their efficiency levels. The results are then compared to the 27 "core economic journals'' as introduced by Diamond (Curr Contents 21(1):4-11, 1989). The results reveal that after more than 20 years Diamond's list of "core economic journals'' is still valid. Finally, for the first time the paper uses data from four well-known databases (SSCI, Scopus, RePEc, Econlit) and two quality ranking reports (the Kiel Institute internal ranking and the ABS quality ranking report) in a DEA setting in order to derive the ranking of 229 economic journals. The ten economic journals with the highest citation performance are Journal of Political Economy, Econometrica, Quarterly Journal of Economics, Journal of Financial Economics, Journal of Economic Literature, American Economic Review, Review of Economic Studies, Journal of Econometrics, Journal of Finance, and Brookings Papers on Economic Activity. The single-publication H-index, introduced by A. Schubert in 2009, can be applied to all articles in the Hirsch core of a researcher. In this way one can define the "indirect H-index'' of a researcher.
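The Hirsch-type indicators recurring in these abstracts lend themselves to a compact sketch. The following is a minimal illustration, not any of the authors' implementations; in particular, the indirect H-index is rendered here under one plausible reading of Schubert's idea: compute, for each paper in a researcher's Hirsch core, that paper's own single-publication h-index (the h-index of the citation counts of its citing papers), then apply the h-index once more to those values.

```python
def h_index(citations):
    """h = the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still supports a core of size `rank`
        else:
            break
    return h

def indirect_h_index(citing_counts_per_core_paper):
    """For each Hirsch-core paper, take the citation counts of the papers
    that cite it, compute that paper's single-publication h-index, and
    return the h-index of the resulting values (illustrative definition)."""
    single_pub_h = [h_index(counts) for counts in citing_counts_per_core_paper]
    return h_index(single_pub_h)
```

For example, a researcher whose papers have citation counts [10, 8, 5, 4, 3] has h = 4, since four papers each have at least four citations but not five papers with five.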
Opthof and Leydesdorff (Scientometrics, 2011) reanalyze data reported by Van Raan (Scientometrics 67(3):491-502, 2006) and conclude that there is no significant correlation between on the one hand average citation scores measured using the CPP/FCSm indicator and on the other hand the quality judgment of peers. We point out that Opthof and Leydesdorff draw their conclusions based on a very limited amount of data. We also criticize the statistical methodology used by Opthof and Leydesdorff. Using a larger amount of data and a more appropriate statistical methodology, we do find a significant correlation between the CPP/FCSm indicator and peer judgment. Inspired by the acquisition-cognition-application model (T. Saracevic & K. B. Kantor, 1997), we developed a tool called the Information Assessment Method to more clearly understand how physicians use clinical information. In primary healthcare, we conducted a naturalistic and longitudinal study of searches for clinical information. Forty-one family physicians received a handheld computer with the Information Assessment Method linked to one commercial electronic knowledge resource. Over an average of 320 days, 83% of 2,131 searches for clinical information were rated using the Information Assessment Method. Searches to address a clinical question, as well as the retrieval of relevant clinical information, were positively associated with the use of that information for a specific patient. Searches done out of curiosity were negatively associated with the use of clinical information. We found significant associations between specific types of cognitive impact and information use for a specific patient. For example, when the physician reported "My practice was changed and improved" as a result of this clinical information, the odds that information was used for a specific patient increased threefold. 
Our findings provide empirical data to support the applicability of the acquisition-cognition-application model, as operationalized through the Information Assessment Method, in primary healthcare. Capturing the use of research-based information in medicine opens the door to further study of the relationships between clinical information and health outcomes. It is argued that to evaluate an information source (e.g., a Wikipedia article), it is necessary to relate the content of that source to an interpretation of the state of knowledge at the research front (which is typically developing dynamically). In the research literature, there is a controversy about the effect of screening programs for breast cancer. This controversy is used to compare the value of Wikipedia with Encyclopedia Britannica and two Danish encyclopedias as information sources. It is argued that this method of examining information sources is preferable to other methods which have been suggested in the literature. Wikipedia advocates a strict "neutral point of view" (NPOV) policy. However, although originally a U.S.-based, English-language phenomenon, the online, user-created encyclopedia now has versions in many languages. This study examines the extent to which content and perspectives vary across cultures by comparing articles about famous persons in the Polish and English editions of Wikipedia. The results of quantitative and qualitative content analyses reveal systematic differences related to the different cultures, histories, and values of Poland and the United States; at the same time, a U.S./English-language advantage is evident throughout. In conclusion, the implications of these findings for the quality and objectivity of Wikipedia as a global repository of knowledge are discussed, and recommendations are advanced for Wikipedia end users and content developers. Is Web 2.0 more than a buzzword?
In recent years, technologists and others have heatedly debated this question, even in Wikipedia, itself an example of Web 2.0. From the perspective of the present study, Web 2.0 may indeed be a buzzword, but more substantially it is also an example of an organizing vision that drives a community's discourse about certain new information technology (IT), serving to advance the technology's adoption and diffusion. Every organizing vision has a career that reflects its construction over time, and in the present study we examine Web 2.0's career as captured in its Wikipedia entry over a 5-year period, finding that it falls into three distinct periods termed Germination, Growth, and Maturation. The findings reveal how Wikipedia, as a discourse vehicle, treats new IT and its many buzzwords, and more broadly captures the careers of their organizing visions. They also further our understanding of Wikipedia as a new encyclopedic form, providing novel insights into its uses, its community of contributors, and their editing activities, as well as the dynamics of article construction. This study comprises a suite of analyses of words in article titles in order to reveal the cognitive structure of Library and Information Science (LIS). The use of title words to elucidate the cognitive structure of LIS has been relatively neglected. The present study addresses this gap by performing (a) co-word analysis and hierarchical clustering, (b) multidimensional scaling, and (c) determination of trends in the usage of terms. The study is based on 10,344 articles published between 1988 and 2007 in 16 LIS journals. Methodologically, the novel aspects of this study are: (a) its large scale, (b) removal of non-specific title words based on the "word concentration" measure, (c) identification of the most frequent terms, including both single words and phrases, and (d) presentation of the relative frequencies of terms using "heatmaps".
Conceptually, our analysis reveals that LIS consists of three main branches: the traditionally recognized library-related and information-related branches, plus an equally distinct bibliometrics/scientometrics branch. The three branches focus on: libraries, information, and science, respectively. In addition, our study identifies substructures within each branch. We also tentatively identify "information seeking behavior" as a branch that is establishing itself separate from the three main branches. Furthermore, we find that cognitive concepts in LIS evolve continuously, with no stasis since 1992. The most rapid development occurred between 1998 and 2001, influenced by the increased focus on the Internet. The change in the cognitive landscape is found to be driven by the emergence of new information technologies, and the retirement of old ones. The methods presented in this paper allow for a statistical analysis revealing centers of excellence around the world using programs that are freely available. Based on Web of Science data (a fee-based database), field-specific excellence can be identified in cities where highly cited papers were published more frequently than can be expected. Compared to the mapping approaches published hitherto, our approach is more analytically oriented by allowing the assessment of an observed number of excellent papers for a city against the expected number. Top performers in output are cities in which authors are located who publish a statistically significant higher number of highly cited papers than can be expected for these cities. As sample data for physics, chemistry, and psychology show, these cities do not necessarily have a high output of highly cited papers. The pi-indicator (or pi(v)-indicator) of a set of journal papers is equal to a hundredth of the total number of citations obtained by the elite set of publications. The number of publications in the elite set P(pi) is calculated as the square root of total papers. 
For larger sets the following equation is used: P(pi(v)) = (10 log P) - 10, where P is the total number of publications. For sets comprising a single or several extremely frequently cited papers, the pi-index may be distorted. Therefore, a new indicator based on the distribution of citations is suggested. Accordingly, the publications are classified into citation categories, of which the lower limits are given as 0 and (2^n + 1), whereas the upper limits as 2^(n+1) (n = 0, 1, 2, 3, etc.). The citation distribution score (CDS) index is defined as the sum of the weighted numbers of publications in the individual categories. The CDS-index increases logarithmically with the increasing number of citations. The citation distribution rate indicator is introduced by relating the actual CDS-index to the possible maximum. Several size-dependent and size-independent indicators were calculated. It has been concluded that relevant, already accepted scientometric indicators may validate novel indices by yielding similar conclusions ("converging validation of indicators"). Bibliometric evaluations of research outputs in the social sciences and humanities are challenging due to limitations associated with Web of Science data; however, the background literature has shown that scholars are interested in stimulating improvements. We give special attention to book reviews processed by Web of Science history and literature journals, focusing on two types: Type I (i.e., reference to the book only) and Type II (i.e., reference to the book and other scholarly sources). Bibliometric data are collected and analyzed for a large set of reviews (1981-2009) to observe general publication patterns and patterns of citedness and co-citedness with the books under review. Results show that reviews giving reference only to the book (Type I) are published more frequently, while reviews referencing the book and other works (Type II) are more likely to be cited.
The referencing culture of the humanities makes it difficult to understand patterns of co-citedness between books and review articles without further in-depth content analyses. Overall, citation counts for book reviews are typically low, but our data showed that they are scholarly and do play a role in the scholarly communication system. In the disciplines of history and literature, where book reviews are prominent, counting the number and type of reviews that a scholar produces throughout his/her career is a positive step forward in research evaluations. We propose a new set of journal quality indicators for the purpose of monitoring their scholarly influence. Co-authorship in publications within a discipline uncovers interesting properties of the analyzed field. We represent collaboration in academic papers in computer science in terms of differently grained networks, namely affiliation and collaboration networks. We also build the sub-networks that emerge from either conference or journal co-authorship only. We take advantage of the tools of network science to take a snapshot of computer science collaboration, including all papers published in the field since 1936. Furthermore, we observe how collaboration in computer science has evolved over time since 1960. We investigate bibliometric properties such as the size of the discipline, the productivity of scholars, and the level of collaboration in papers, as well as global network properties such as reachability and average separation distance among scholars, the distribution of the number of scholar collaborators, network resilience and dependence on star collaborators, network clustering, and network assortativity by number of collaborators. Here, we present a longitudinal analysis of the ranking of the subject areas of Elsevier's Scopus. To this aim, we present three summary measures based on the journal ranking scores for academic journals in each subject area.
This longitudinal study allows us to analyze developmental trends over time in different subject areas with distinct citation and publication patterns. We evaluate the relative performance of each subject area by using the overall prestige of the most important journals, with ranking scores above a given threshold (e.g., in the first quartile), as well as the overall prestige gap of the less important journals, with ranking scores below a given threshold (e.g., below the top 10 journals). Thus, we propose that it should be possible to study different subject areas by means of appropriate summary measures of the journal ranking scores, which provide additional information beyond analyzing the inequality of the whole ranking-score distribution for academic journals in each subject area. This allows us to investigate whether subject areas with high levels of overall prestige for the first-quartile journals also tend to achieve low levels of overall prestige gap for the journals below the top 10. This article describes the outcomes of research that examined the practice of web information architecture (IA) in large organizations. Using a grounded theory approach, seven large organizations were investigated and the data were analyzed for emerging themes and concepts. The research finds that the practice of web IA is characterized by unpredictability, multiple perspectives, and a need for responsiveness, agility, and negotiation. This article claims that web IA occurs in a complex environment and has emergent, self-organizing properties. There is value in examining the practice as a complex adaptive system. Using this metaphor, a pre-determined, structured methodology that delivers a documented, enduring information design for the web is found inadequate - dominant and traditional thinking and practice in the organization of information are challenged.
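The quartile-prestige and prestige-gap summary measures described in the Scopus subject-area study above can be illustrated with a short sketch. This is one plausible formalisation, not the authors' definitions: the choice of the 75th percentile as the quartile cut-off, the use of simple sums as aggregates, and the function names are all assumptions made for illustration.

```python
from statistics import quantiles

def quartile_prestige(scores):
    """Overall prestige of the 'first quartile': here, the sum of journal
    ranking scores at or above the 75th percentile of the subject area."""
    q3 = quantiles(list(scores), n=4, method="inclusive")[2]  # 75th percentile
    return sum(s for s in scores if s >= q3)

def prestige_gap(scores, top_k=10):
    """Overall prestige gap for journals below the top-k: here, how far each
    remaining journal's score falls short of the k-th best score, summed."""
    ordered = sorted(scores, reverse=True)
    threshold = ordered[min(top_k, len(ordered)) - 1]
    return sum(threshold - s for s in ordered[top_k:])
```

Comparing subject areas then reduces to comparing these two scalars per area: a high quartile prestige with a low prestige gap would indicate an area whose elite journals are strong and whose remaining journals do not lag far behind.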
It is argued and demonstrated that Birger Hjorland's critiques of Marcia Bates' articles on the nature of information and the nature of browsing misrepresent the content of these articles and, further, frame the argument as a Manichean conflict between Hjorland's enlightened "discursive" and social approach versus Bates' benighted behavioral approach. It is argued that Bates' work not only contains much of value that has been ignored by Hjorland but also contains ideas that mostly complement, rather than conflict with, those of Hjorland. Sentiment analysis is a challenging new task related to text mining and natural language processing. Although there are, at present, several studies related to this theme, most of these focus mainly on English texts. The resources available for opinion mining (OM) in other languages are still limited. In this article, we present a new Arabic corpus for the OM task that has been made available to the scientific community for research purposes. The corpus contains 500 movie reviews collected from different web pages and blogs in Arabic, 250 of them considered positive reviews and the other 250 negative opinions. Furthermore, different experiments have been carried out on this corpus using machine learning algorithms such as support vector machines and Naive Bayes. The results obtained are very promising, and we are encouraged to continue this line of research. The progressive increase in information content has recently made it necessary to create systems for the automatic classification of documents. In this article, a system is presented for the categorization of multiclass Farsi documents that requires fewer training examples and can help to compensate for the shortcomings of the standard training dataset.
The new idea proposed in the present article is based on extending the feature vector by adding some words extracted from a thesaurus and then filtering the new feature vector by applying secondary feature selection to discard inappropriate features. In effect, the phase of secondary feature selection chooses the more appropriate features among those added from the thesaurus, enhancing the effect of using a thesaurus on the efficiency of the classifier. To evaluate the proposed system, a corpus was gathered from the Farsi Wikipedia website and some articles in the Hamshahri newspaper, the Roshd periodical, and the Soroush magazine. In addition to studying the role of a thesaurus and applying secondary feature selection, the effects of varying the number of categories, the size of the training dataset, and the average number of words in the test data are also examined. As the results indicate, classification efficiency improves by applying this approach, especially when the available data are not sufficient for some text categories. Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes, defined as clusters of highly related sentences, to avoid redundancy and cover more diverse information. As sentences are short and the content they contain is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable. Special treatment for measuring sentence similarity is necessary. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization - integrated clustering and interactive clustering - both allowing words and documents to play an explicit role in sentence clustering as independent text objects, rather than using words or concepts as features of a sentence in a document set.
In each framework, we experiment with two-level co-clustering (i.e., sentence-word co-clustering or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets. President Obama's inaugural flagship Open Data program emphasizes the values of transparency, participation, and collaboration in governmental work. The Open Data performance data analysis, published here for the first time, proposes that most federal agencies have adopted a passive-aggressive attitude toward this program by appearing to cooperate with the program while in fact effectively ignoring it. The analysis further suggests that a tiny group of agencies are the only "real players" in the Data.gov web arena. This research highlights the contradiction between Open Data's transparency goal ("All data must be freed") and federal agencies' goal of collaborating with each other through data trade. The research also suggests that agencies comprehended that Open Data is likely to exacerbate three critical, back-office data-integration problems: inclusion, confusion, and diffusion. The article concludes with a proposal to develop an alternative Federal Information Marketplace (FIM) to incentivize agencies to improve data sharing. This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts.
It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking. Many information portals are adding social features with hopes of enhancing the overall user experience. Invitations to join and welcome pages that highlight these social features are expected to encourage use and participation. While this approach is widespread and seems plausible, the effect of providing and highlighting social features remains to be tested. We studied the effects of emphasizing social features on users' response to invitations, their decisions to join, their willingness to provide profile information, and their engagement with the portal's social features. The results of a quasi-experiment found no significant effect of social emphasis in invitations on receivers' responsiveness. However, users receiving invitations highlighting social benefits were less likely to join the portal and provide profile information. Social emphasis in the initial welcome page for the site also was found to have a significant effect on whether individuals joined the portal, how much profile information they provided and shared, and how much they engaged with social features on the site. Unexpectedly, users who were welcomed in a social manner were less likely to join and provided less profile information; they also were less likely to engage with social features of the portal. This suggests that even in online contexts where social activity is an increasingly common feature, highlighting the presence of social features may not always be the optimal presentation strategy. 
This article examines the relationship between acquaintanceship and coauthorship patterns in a multi-disciplinary, multi-institutional, geographically distributed research center. Two social networks are constructed and compared: a network of coauthorship, representing how researchers write articles with one another, and a network of acquaintanceship, representing how those researchers know each other on a personal level, based on their responses to an online survey. Statistical analyses of the topology and community structure of these networks point to the importance of small-scale, local, personal networks predicated upon acquaintanceship for accomplishing collaborative work in scientific communities. In bibliometrics, the association of "impact" with central-tendency statistics is mistaken. Impacts add up, and citation curves therefore should be integrated instead of averaged. For example, the journals MIS Quarterly and Journal of the American Society for Information Science and Technology differ by a factor of 2 in terms of their respective impact factors (IF), but the journal with the lower IF has the higher impact. Using percentile ranks (e.g., top-1%, top-10%, etc.), an Integrated Impact Indicator (I3) can be based on integration of the citation curves, but after normalization of the citation curves to the same scale. The results across document sets can be compared as percentages of the total impact of a reference set. Total number of citations, however, should not be used instead because the shape of the citation curves is then not appreciated. I3 can be applied to any document set and any citation window. The results of the integration (summation) are fully decomposable in terms of journals or institutional units such as nations, universities, and so on because percentile ranks are determined at the paper level.
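The paper-level percentile integration that defines I3 can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions (the function names and the handling of citation ties are ours, not the authors' implementation): a paper's percentile rank is the percentage of the reference set it outperforms, and I3 sums these ranks over the document set.

```python
from bisect import bisect_left

def percentile_rank(citations, reference):
    """Percentage of papers in the reference set with fewer citations."""
    ref = sorted(reference)
    return 100.0 * bisect_left(ref, citations) / len(ref)

def i3(document_set, reference):
    """Integrated Impact Indicator: sum of paper-level percentile ranks."""
    return sum(percentile_rank(c, reference) for c in document_set)
```

Because I3 is a plain sum over papers, it decomposes exactly as described: the I3 of a journal, nation, or university is the sum of the percentile-rank contributions of its individual papers.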
In this study, we first compare I3 with IFs for the journals in two Institute for Scientific Information subject categories ("Information Science & Library Science" and "Multidisciplinary Sciences"). The library and information science set is additionally decomposed in terms of nations. Policy implications of this possible paradigm shift in citation impact analysis are specified. Citation indicators are increasingly used in some subject areas to support peer review in the evaluation of researchers and departments. Nevertheless, traditional journal-based citation indexes may be inadequate for the citation impact assessment of book-based disciplines. This article examines whether online citations from Google Books and Google Scholar can provide alternative sources of citation evidence. To investigate this, we compared the citation counts to 1,000 books submitted to the 2008 U.K. Research Assessment Exercise (RAE) from Google Books and Google Scholar with Scopus citations across seven book-based disciplines (archaeology; law; politics and international studies; philosophy; sociology; history; and communication, cultural, and media studies). Google Books and Google Scholar citations to books were 1.4 and 3.2 times more common than were Scopus citations, and their medians were more than twice and three times as high as were Scopus median citations, respectively. This large number of citations is evidence that in book-oriented disciplines in the social sciences, arts, and humanities, online book citations may be sufficiently numerous to support peer review for research evaluation, at least in the United Kingdom. This article reviews results from a research project designed to understand the mediating influence of information technology on information behavior. During the analysis of the data, five modes of information behavior were uncovered.
These provide us with a reconceptualization of core information-seeking and search activities, as well as a fruitful opening to redevelop, augment, or complement existing models of information behavior. The findings resonate with emerging theories of decision making and judgement and illustrate the need for information behavior researchers to undertake research in differing contexts. The work illuminates an issue of current concern for public policy: police use of information in decision making. Motivated by the increasing popularity of computer-mediated communication (CMC) media in university students' learning, this study employs a novel four-stage approach for analyzing and developing a structured hierarchy framework for students' usage of CMC media in learning contexts. First, media characteristics and the Uses and Gratifications (U&G) approach were adopted to understand student-specific reasons for using media. Second, a set of relevant data concerning the university students' reasons for using CMC media was collected by the Repertory Grid Interview Technique (RGT) and analyzed qualitatively using content analysis. The Interpretive Structural Modeling (ISM) technique was then used to develop a six-level hierarchical structural model of media use reasons. Finally, the cross-impact matrix multiplication applied to classification (MICMAC) technique was used to analyze the driver and dependence power for each media use reason and identify the hidden and indirect relationships among all reasons. The reasons related to students' use of CMC were classified as independent variables, linkage variables, and dependent variables. The study provides a validated typology of different clusters of interrelated students' reasons for using CMC media in learning contexts.
The findings of this study will have significant implications and will be helpful for researchers, university policy-makers, instructors, and organizations in framing CMC technology implementation and use strategies. Compared with search queries, which are usually composed of a few keywords, natural language questions can demonstrate detailed information needs through searchers' richer expressions. This study aims to provide an understanding of ordinary people's image needs in their daily lives, by analyzing 474 questions obtained from a social question and answer (social Q&A) site. The study found that image needs reflected through the natural language questions contain several components: context of image needs (motive and intervening variables), image attributes (descriptive metadata, syntactic, and semantic attributes), and associated information (information on known/similar/comparative images and related stories). Characteristics of each component of image needs were analyzed, and accordingly image-indexing guidelines were suggested. Because image needs comprise diverse attributes, a single indexing approach might not support all complex needs for images. Therefore, this study proposes that different indexing approaches should be integrated for enhancing keyword search and browsing effectiveness. Such approaches include descriptive metadata assigned by a creator and/or automatic algorithms, user-assigned tags (or users' reactions), indexing through associated text, and content-based image retrieval. This methods-oriented paper introduces visual methods and specifically photography to study immediate information space (Lee, 2003); that is, information-rich settings such as offices or homes. It draws upon the authors' firsthand ethnographic field experiences, a review of relevant theoretical and methodological literature, and an analysis of cases within information studies that have made use of visual and photographic techniques.
To begin, the traditions of visual research within anthropology and sociology are traced and major epistemological, methodological, and disciplinary debates associated with visual scholarship are presented. Then, investigations of immediate information space that utilize photography are analyzed, including examples from the areas of personal information management, health informatics, information behavior, and computer-supported cooperative work. Moreover, a section entitled "Applying Photographic Techniques ..." supplies guidelines for employing photography in a research design, as well as a question-based research framework and tips for photographing information phenomena. We have developed a system that segments web pages into blocks and predicts those blocks' importance (block importance prediction or BIP). First, we use VIPS to partition a page into a tree composed of blocks, then extract features from each block and label all leaf nodes. This paper makes two main contributions. Firstly, we are pioneering the formulation of BIP as a sequence tagging task. We employ depth-first search (DFS), which outputs a single sequence for the whole tree in which related sub-blocks are adjacent. Our second contribution is using the conditional random fields (CRF) model for labeling these sequences. CRF's transition features model correlations between neighboring labels well, and CRF can simultaneously label all blocks in a sequence to find the global optimal solution for the whole sequence, not only the best solution for each block. In our experiments, our CRF-based system achieves an F1-measure of 97.41%, which significantly outperforms our ME-based baseline (95.64%). Lastly, we tested the CRF-based system using sites which were not covered in the training data. On completely novel sites CRF performed slightly worse than ME. However, when given only two training pages from a given site, CRF improved almost three times as much as ME.
The traditional view of data, information, and knowledge as a hierarchy fosters an understanding of information as an independent entity with objective meaning: while information is tied to data and knowledge, its existence is not dependent upon them. While traditional conceptions assume a static nature of information, expressed by the equation information = data + meaning, we have argued that this understanding is based on an ontologization of an entwined process of sense making and meaning making. This process starts from the recognition of a pattern that is interpreted in a way that influences our behavior. At the same time, the process character of meaning making makes us aware of the fact that this ontologized hierarchy is in fact an interwoven process. We conclude that the phenomenological analysis of this ontologization that brings data, information, and knowledge into being has to go back to this process to reveal the essential underlying dependencies. Interdisciplinary communication, and thus the rate of progress in scholarly understanding, would be greatly enhanced if scholars had access to a universal classification of documents or ideas not grounded in particular disciplines or cultures. Such a classification is feasible if complex concepts can be understood as some combination of more basic concepts. There appear to be five main types of concept theory in the philosophical literature. Each provides some support for the idea of breaking complex concepts into basic concepts that can be understood across disciplines or cultures, but each has detractors. None of these criticisms represents a substantive obstacle to breaking complex concepts into basic concepts within information science. Can we take the subject entries in existing universal but discipline-based classifications, and break these into a set of more basic concepts that can be applied across disciplinary classes? The author performs this sort of analysis for Dewey classes 300 to 339.9.
This analysis will serve to identify the sort of 'basic concepts' that would lie at the heart of a truly universal classification. There are two key types of basic concept: the things we study (individuals, rocks, trees), and the relationships among these (talking, moving, paying). Field-association (FA) terms give us the knowledge to identify document fields using a limited set of discriminating terms. Although many earlier methods have tried to extract relevant FA terms automatically, an effective method for building a comprehensive dictionary is still lacking. Moreover, all previous studies are based on FA terms in English and Japanese, and the extension of FA terms to other languages such as Arabic could benefit future research in the field. We present a new method to build a comprehensive Arabic dictionary using part-of-speech tagging, pattern rules, and Arabic-language corpora. Experimental evaluation is carried out for various fields using 251 MB of domain-specific corpora obtained from Arabic Wikipedia dumps and Alhayah news, from which an average of 2,825 FA terms (single and compound) per field were selected. From the experimental results, recall and precision are 84% and 79%, respectively. We also propose an amended text classification methodology based on field association terms. Our approach is compared with Naive Bayes (NB) and kNN classifiers on 5,959 documents from Wikipedia dumps and Alhayah news. The new approach achieved a precision of 80.65%, followed by NB (72.79%) and kNN (36.15%). Continuing education for information professionals serves to inform professional conduct and broaden theoretical applications.
Since its establishment as a regional organization in 1961, the Los Angeles Chapter of the American Society for Information Science and Technology (LACASIS&T) has provided a program for professional development relevant across a range of information institutions within a major metropolitan area. This research examines three factors influencing continuing education activities within the Los Angeles region: founding principles, services contributed by members, and program topics drawn from the range of information professions. Methods employed comprise archival research and oral history conducted with the Chapter's founder, with attention given to the industrial context of the region. The study reveals the centrality of members' contributions to professional trends and information science applications, and a continuous history of individual leaders, contributors, and program offerings also emerges. The effectiveness of library and information science continuing education, when presented and coordinated by groups such as chapters, is observed in successful consortial activities consisting of both inreach and outreach to prospective program participants. It was argued recently that the g-index is a measure of a researcher's specific impact (i.e., impact per paper) as much as it is a measure of overall impact. While this is true for the productive "core" of publications, it can be argued that the g-index does not differ from the square root of the total number of citations in a bibliometrically meaningful way when the entire publication list is considered. The R-index also has a tendency to follow total impact, leaving only the A-index as a true measure of specific impact. The main difference between the g-index and the h-index is that the former penalizes consistency of impact whereas the latter rewards such consistency.
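The indices compared in the g-index discussion can all be computed directly from a researcher's citation list. A minimal sketch (our illustration, using the common convention in which g is capped at the number of papers):

```python
import math

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    cs = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cs, 1) if c >= rank)

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    cs = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cs, 1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def sqrt_total_citations(citations):
    """Square root of total citations, the overall-impact proxy discussed above."""
    return math.sqrt(sum(citations))
```

For a citation list such as [10, 8, 5, 4, 3], h = 4 while g = 5, and the square root of the 30 total citations is about 5.5; this illustrates how g tends to track total citations more closely than h does.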
It is concluded that the h-index is a better bibliometric tool than is the g-index, and that the square root of the total number of citations is a convenient measure of a researcher's overall impact. The approach based on "thermodynamic" considerations, which quantifies research performance using an exergy term defined as X = iC, where i is the impact and C is the number of citations, is now extended to cases where fractionalized counting of citations is used instead of integer counting. The correlation between GDP and research publications is an important issue in scientometrics. This article provides further empirical evidence connecting revealed comparative advantage in national research with effects on economic productivity. Using quantitative time series analysis, this study attempts to determine the nature of causal relationships between research output and economic productivity. One empirical result is that there is mutual causality between research and economic growth in Asia, whereas in Western countries the causality is much less clear. The results may be of use to underdeveloped nations deciding how to direct their academic investment and industry policy. Agent-based computing is a diverse research domain concerned with the building of intelligent software based on the concept of "agents". In this paper, we use Scientometric analysis to analyze all sub-domains of agent-based computing. Our data consists of 1,064 journal articles indexed in the ISI web of knowledge published during a 20 year period: 1990-2010. These were retrieved using a topic search with various keywords commonly used in sub-domains of agent-based computing. In our proposed approach, we have employed a combination of two applications for analysis, namely Network Workbench and CiteSpace: Network Workbench allowed for the analysis of complex network aspects of the domain, while detailed visualization-based analysis of the bibliographic data was performed using CiteSpace.
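If the impact i in the exergy abstract is taken as citations per paper (i = C/P), the exergy term reduces to X = iC = C²/P. A minimal sketch; the fractional variant shown divides each paper's citations equally among its authors, which is an assumed fractionalization scheme for illustration and not necessarily the one used in the article:

```python
def exergy(citation_counts):
    """X = i * C, with i = C / P (citations per paper) and C total citations."""
    C = sum(citation_counts)
    P = len(citation_counts)
    return C * C / P

def exergy_fractional(papers):
    """Fractional counting: each (citations, n_authors) paper contributes
    citations / n_authors to the fractional citation total."""
    frac = [c / n for c, n in papers]
    C = sum(frac)
    return C * C / len(frac)
```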
Our results include the identification of the largest cluster based on keywords, the timeline of publication of index terms, the core journals and key subject categories. We also identify the core authors and the top countries of origin of the manuscripts, along with the core research institutes. Finally, our results have interestingly revealed the strong presence of agent-based computing in a number of non-computing related scientific domains including Life Sciences, Ecological Sciences and Social Sciences. This paper analyses the growth pattern of Nanoscience and Nanotechnology literature in India during 1990-2009 (20 years). The Scopus international multidisciplinary bibliographical database has been used to identify the Indian contributions in the field of nanoscience and nanotechnology. The study measures performance based on several parameters: country annual growth rate, authorship pattern, collaborative index, collaborative coefficient, modified collaborative coefficient, subject profile, etc. Further, the study examines national publication output and impact in terms of average citations per paper, international collaboration output and share, the contribution and impact of Indian institutions, and the impact of Indian journals. As they are used to evaluate the importance of research at different levels by funding agencies and promotion committees, bibliometric indices have received a lot of attention from the scientific community over the last few years. Many bibliometric indices have been developed in order to take into account aspects not previously covered. The result is that, nowadays, the scientific community faces the challenge of selecting which of this pool of indices meets the required quality standards. In view of the vast number of bibliometric indices, it is necessary to analyze how they relate to each other (irrelevant, dependent and so on). Our main purpose is to learn a Bayesian network model from data to analyze the relationships among bibliometric indices.
The induced Bayesian network is then used to discover probabilistic conditional (in)dependencies among the indices and also for probabilistic reasoning. We also run a case study of 14 well-known bibliometric indices on computer science and artificial intelligence journals. This paper aims to analyze and extract the research groups from the co-authorship network of oncology in China. By use of centrality, component analysis, K-Core, M-Slice, Hierarchical Clustering analysis, and Multidimensional Scaling analysis, we studied the data from 10 core Chinese oncology journals between 2000 and 2009, and analyzed the structural characteristics of Chinese oncology research institutes. This study advances the methods for selecting the most prolific research groups and individuals in the Chinese oncology research community, and provides a basis for more productive cooperation in the future. This study also provides scientific evidence and suggestions for policymakers to establish a more efficient system for managing and financing Chinese oncology research in the future. This paper analyzes whether methods from social network analysis can be adopted for the modeling of scientific fields in order to obtain a better understanding of the respective scientific area. The approach proposed is based on articles published within the respective scientific field and certain types of nodes deduced from these papers, such as authors, journals, conferences and organizations. As a proof of concept, the techniques discussed here are applied to the field of 'Mobile Social Networking'. For this purpose, a tool was developed to create a large data collection representing the aforementioned field. The paper analyzes various views on the complete network and discusses these on the basis of the data collected on Mobile Social Networking.
The authors demonstrate that the analysis of particular subgraphs derived from the data collection allows the identification of important authors as well as separate sub-disciplines such as classic network analysis and sensor networks, and also contributes to the classification of the field of 'Mobile Social Networking' within the greater context of computer science, applied mathematics and social sciences. Based on these results, the authors propose a set of concrete services which could be offered by such a network and which could help the user to deal with the scientific information process. The paper concludes with an outlook upon further possible research topics. Employing a citation analysis, this study explored and compared the bibliometric characteristics and the subject relationships with other disciplines of and among the three leading information science journals, Journal of the American Society for Information Science and Technology (JASIST), Information Processing and Management and Journal of Documentation. The citation data were drawn from the references of each article in the three journals between 1998 and 2008. Ulrich's Periodical Directory, Library of Congress Subject Headings retrieved from WorldCat, and the LISA database were used to identify the main class, subclass and subject of cited journals and books. Quantitative results on the number of JASIST, IPM and JOD literature references, the average number of references cited per paper, the document type of cited literature and the journal self-citation rate are reported. Moreover, the highly cited journals and books, the main classes and subclasses of cited journals and books in papers of the three journals, and the highly cited subjects in journals and books of library and information science were identified and analyzed.
Comparison of the characteristics of cited journals and books confirmed that all three journals under study are information science oriented, except for JOD, which is library science oriented. JASIST and IPM have much in common and diffuse to other disciplines more deeply than JOD. The assessment of individual researchers using bibliometric indicators is more complex than that of a region, country or university. For large scientific bodies, averages over a large number of researchers and their outputs are generally believed to give an indication of the quality of the research work. For an individual, detailed peer evaluation of his or her research outputs is required, and even this may fail in the short term to make a final, long-term assessment of the relevance and originality of the work. Scientometric assessment at the individual level is not an easy task, not only because of the smaller number of publications being evaluated, but also because other factors can significantly influence the bibliometric indicators applied. Citation practices vary widely among disciplines and subdisciplines, and this may justify the lack of good bibliometric indicators at the individual level. The main goal of this study was to develop an indicator that considers in its calculation some of the aspects that we must take into account in the assessment of scientific performance at the individual level. The indicator developed, the h(nf) index, considers the different citation cultures of each field and the number of authors per publication. The results showed that the h(nf) index can be used in the assessment of the scientific performance of individual researchers and for following the performance of a researcher. This paper presents the journal relative impact (JRI), an indicator for the scientific evaluation of journals. The JRI considers in its calculation the different citation cultures presented by the Web of Science subject categories. The JRI is calculated considering a variable citation window.
This citation window is defined taking into account the time required by each subject category for the maturation of citations. The type of document considered in each subject category depends on its outputs in relation to the citations. The scientific performance of each journal in relation to each subject category it belongs to is considered, allowing the comparison of the scientific performance of journals from different fields. The results obtained show that the JRI can be used for the assessment of the scientific performance of a given journal, and that the SJR and SNIP should be used to complement the information provided by the JRI. The JRI presents good features such as stability over time and predictability. The increasing literature addressing the international mobility of researchers has repeatedly pointed out the lack of empirical data compiled over the last two decades, jeopardizing progress in the understanding of the characteristics and impacts of such human flows. This paper makes a contribution to the field by exploring the extent to which information obtained from researchers' electronic curricula vitae (CVs) may be used to study temporary geographical mobility. We exploit a new type of data set, a comprehensive database of electronic CVs, developing a broad set of cross-discipline mobility indicators to assess the dimensions and characteristics of international research visits among a population of over 10,000 researchers. The sample population is made up of PhD holders working in the regional research system of Andalusia, Spain. Information regarding their international research visits over the last four decades is downloaded from CVs contained in the electronic scientific information system of the region. We assess mobility rates and the characteristics of the temporarily mobile population.
The analysis of visiting patterns shows significant differences in mobility profiles in terms of frequency, duration and destination of visits, across disciplines, career stages and time periods. The study also shows how different definitions of international mobility lead to substantial variations in cross-discipline mobility rates. In the near future, Brazil is expected to face a number of challenges with regard to economic and social development, and scientific production is a critical aspect of this development process. Over the past 30 years, there has been an almost 18-fold increase in the number of Brazilian papers published, up from about 2,000 in 1980 to more than 35,000 in 2009. In this study we analyze the evolution of scientific production in terms of input (resources and permanent investigators) and output (scientific papers and doctorate graduates). We evaluate whether structural investments and the number of investigators at universities are both able to explain the increase in the number of papers, by investigating the relationships among growth rates in investments and the quantity of papers published, as well as the number of doctorate graduates and active permanent investigators. As an indication of the fluctuations in investments pertaining to academic research, we consider the budget history of the largest Brazilian federal agencies charged with providing academic grants. We observe that the burgeoning number of papers has occurred independently of investments and the number of established investigators, thus suggesting an increase in the efficiency of Brazilian scientific output. Moreover, this increase in efficiency has occurred in conjunction with an increased number of doctorate graduates per year. In this context, we propose that an evaluation of the academic structure is necessary in order to ascertain the risks of this increased "efficiency".
Moreover, the recent cut of over US$ 1 billion announced by the Brazilian government may jeopardize the quality of scientific output in the future. Several studies exist which use scientific literature for comparing scientific activities (e.g., productivity and collaboration). In this study, using co-authorship data over the last 40 years, we present the evolutionary dynamics of multi-level (i.e., individual, institutional and national) collaboration networks for exploring the emergence of collaborations in the research field of "steel structures". The collaboration network of scientists in the field has been analyzed using author affiliations extracted from Scopus between 1970 and 2009. We have studied collaboration distribution networks at the micro-, meso- and macro-levels for the 40 years. We compared and analyzed a number of properties of these networks (i.e., density, centrality measures, the giant component and clustering coefficient) for presenting a longitudinal analysis and statistical validation of the evolutionary dynamics of "steel structures" collaboration networks. At all levels, the scientific collaboration network structures were centralized in terms of closeness centralization, while betweenness and degree centralization were much lower. In general, network density, connectedness, centralization and clustering coefficient were highest at the macro-level and decreased, as the network size grew, to their lowest at the micro-level. We also find that the average distance is about two between countries, five between institutes and eight between authors, meaning that only about eight steps are necessary to get from one randomly chosen author to another. In the last two decades international collaboration in the Eastern European academic communities has strongly intensified. Scientists from developed countries within the European Union play a key role in stimulating the international collaboration of academics in this region. 
In addition, many of the research projects that engage East-European scholars are only possible in the framework of the large European programmes. The present study focuses on the role of the EU and other developed nations as partners of these countries and the analysis of the performance of collaborative research as reflected by the citation impact of internationally co-authored publications. The single publication H-index of Schubert is applied to the papers in the Hirsch-core of a researcher, journal or topic. Four practical examples are given and regularities are explained: the regression line of the single publication H-index of the ranked papers in the Hirsch-core is decreasing. We propose two measures of indirect citation impact: the average of the single publication H-indices of the papers in the Hirsch-core and the H-index of these single publication H-indices, defined as the indirect H-index. Formulae for these indirect citation impact measures are given in the Lotkaian context. In order to re-categorize the SCImago Journal & Country Rank (SJR) journals based on Scopus, as well as improve the SJR subject classification scheme, an iterative process built upon reference analysis of citing journals was designed. The first step entailed construction of a matrix containing citing journals and cited categories obtained through the aggregation of cited journals. Assuming that the most representative categories in each journal would be represented by the highest citation values regarding categories, the matrix vectors were reduced using a threshold to discern and discard the weakest relations. The process was refined on the basis of different parameters of a heuristic nature, including (1) the development of several tests applying different thresholds, (2) the designation of a cutoff, (3) the number of iterations to execute, and (4) a manual review operation of a certain amount of multi-categorized journals. 
Despite certain shortcomings related to journal classification, the method showed a solid performance in grouping journals at a level higher than categories, that is, aggregating journals into subject areas. It also enabled us to redesign the SJR classification scheme, providing for a more cohesive one that covers a good proportion of re-categorized journals. In this article, we propose mapping and visualizing the core of scientific domains using social network analysis techniques derived from mathematical graph theory. In particular, the concept of Network of the Core is introduced, which can be employed to visualize scientific domains by constructing a network among theoretical constructs, models, and concepts. A Network of the Core can be used to reveal hidden properties and structures of a research domain such as connectedness, centrality, density, structural equivalence, and cohesion, by modeling the causal relationship among theoretical constructs. The Network of the Core concept can be used to explore the strengths and limitations of a research domain, and to graphically and mathematically derive a number of research hypotheses. The Network of the Core approach can be applied to any domain given that the investigator has a deep understanding of the area under consideration, a graphical or conceptual view (in the form of a network of association among the theoretical constructs and concepts) of the scientific domain can be obtained, and an underlying theory is available or can be constructed to support Network of the Core formation. Future research directions and several other issues related to the Network of the Core concept are also discussed. Since the relationship between patents and Tobin's q remains unclear, this paper utilizes a panel threshold regression model to re-examine the relationship between patent counts/sales and Tobin's q. 
This study finds that patent citations/sales has a single threshold effect on the relationship between patent counts/sales and Tobin's q in the US pharmaceutical industry. The single threshold value of patent citations/sales is 328.81, and it divides the value of patent citations/sales into two regimes: the first regime (patent citations/sales ≤ 328.81) and the second regime (patent citations/sales > 328.81). The results indicate that patent counts/sales positively affect Tobin's q in the two regimes. In addition, this study demonstrates that the extent of the positive effect of patent counts/sales on Tobin's q differs between the regimes. This study verifies that patent citations/sales moderates the relationship between patent counts/sales and Tobin's q. When patent citations/sales is below the threshold value, the extent of the positive relationship between patent counts/sales and Tobin's q is greatest. Therefore, this study finds that the first regime is optimal. Identifying core technologies and emerging technologies is essential for formulating national technology strategies and policies for pursuing technological competitive advantage. This study presents a quantitative method for identifying core technologies and emerging technologies in the Taiwan technological innovation system. The objective was to gain an overview of technological development in the country by analyzing patent citation networks and by identifying five core technologies and emerging technologies in Taiwan based on United States Patent and Trademark Office (USPTO) patents granted to Taiwan during 1997-2008. The findings indicate the most appropriate management of technology and innovation and the best patent strategy and technology policy that the Taiwan government should pursue. 
Research institutes, industries and academia are also given research directions for choosing the technologies in which they should invest resources in order to strengthen the Taiwan technological innovation system and to increase its competitive advantage in global technology. Triadic patents minimise home bias effects in studies that focus on patent counts as a measure of innovative activity. Yet, biases in qualitative patent indicators have been largely neglected. This article advocates that forward patent citations, and triadic citations in particular, can shed further light on home bias, self citations, and the speed of knowledge flows for drug patents published by the USPTO for the period 1980-2008. The evidence shows that triadic citations help to minimize the home bias in citations as well as to make patent quality more transparent. Also, it indicates that self citations and the age distribution of citations are important factors to consider when explaining cross-country differences in pharmaceutical citations. Schubert (Scientometrics, 78:559-565, 2009) showed that "a Hirsch-type index can be used for assessing single highly cited publications by calculating the h index of the set of papers citing the work in question" (p. 559). To demonstrate that this single publication h index is a useful yardstick to compare the quality of different publications, the index should be strongly related to the assessment by peers. In a comprehensive research project we investigated the peer review process of the Angewandte Chemie International Edition. The data set contains manuscripts reviewed in the year 2000 and accepted by the journal or rejected but published elsewhere. Single publication h index values were calculated for a total of 1,814 manuscripts. 
The results show a correlation in the expected direction between peer assessments and single publication h index values: After publication, manuscripts with positive ratings by the journal's reviewers show on average higher h index values than manuscripts with negative ratings by reviewers (and later published elsewhere). However, our findings do not support Schubert's (2009) assumption that the additional dimension of indirect citation influence contributes to a more refined picture of the most cited papers. The demographical data of the National Science Foundation on research doctorate awardees in the United States is studied in this article. While the overall growth rate of research doctorate awardees is approximately the same as the growth rate of the whole population in the U.S., there are considerable changes in the sub-populations of research doctorate awardees. The demographic data is evaluated and discussed in more detail with respect to gender and research fields of the doctorate awardees. In particular, the notion of the primacy of technology over science in the postmodern era is examined and found to be justified. The increased use of e-learning techniques as an accepted form of teaching has resulted in a growing volume of academic research dedicated to their assessment. Despite the importance of the technique, there is little comprehensive knowledge on e-learning, especially in non-educational fields. Author co-citation analysis (ACA) is an analytical method for identifying the intellectual structure of specific knowledge domains through the relationship between two similar authors. ACA has been applied to many fields, such as information retrieval, knowledge management, and strategic management; however, it has not yet been used to analyze e-learning development. This study examines the intellectual structure of e-learning from the perspective of management information systems (MIS). 
By applying the ACA method, we analyze and categorize international and Taiwanese research topics into clusters. Our results show that Taiwanese authors put more effort into practical studies of business training, while international authors focus on users' psychological reactions to learning contexts. Altogether, our research provides a clear intellectual analysis of e-learning practices from 1996 to 2009, enabling us to thoroughly study and understand the influence of these techniques on modern education. The purpose of this study is to examine efficiency and its determinants in a set of higher education institutions (HEIs) from several European countries by means of non-parametric frontier techniques. Our analysis is based on a sample of 259 public HEIs from 7 European countries across the time period of 2001-2005. We conduct a two-stage DEA analysis (Simar and Wilson in J Economet 136:31-64, 2007), first evaluating DEA scores and then regressing them on potential covariates with the use of a bootstrapped truncated regression. Results indicate a considerable variability of efficiency scores within and between countries. Unit size (economies of scale), number and composition of faculties, sources of funding and gender staff composition are found to be among the crucial determinants of these units' performance. Specifically, we found evidence that a higher share of funds from external sources and a higher number of women among academic staff improve the efficiency of the institution. A keyword analysis was applied in this work to evaluate research trends of eutrophication papers published between 1991 and 2010 in any journal of all the subject categories of the Science Citation Index compiled by the Institute for Scientific Information, Philadelphia, USA. Eutrophication was used as a keyword to search parts of titles, abstracts, or keywords. 
The published output analysis showed that eutrophication research steadily increased over the past 20 years and the annual publication outputs in 2008, 2009 and 2010 were about four times that of 1991. The total number of papers published by China ranked third, but the impact factors of these papers were lower than the world average. "Water Framework Directive" and "Life Cycle Assessment" were two of the most frequently used author keywords in the period between 1999 and 2010, whilst they did not appear before 1998. These new author keywords indicated that the focus of eutrophication research was shifting from technological studies towards policy and management. We have developed a set of routines that makes it easy to draw different maps of the research carried out in a scientific institution. Our toolkit uses OpenSource elements to analyze bibliometric data gathered from the Web of Science. We take the example of our institution, ENS de Lyon, to show how different maps, using co-occurrence (of authors, keywords, institutions, etc.) and bibliographic coupling, can be built. These maps may become a valuable tool for discussing institutions' policies, as they offer different views on the institution at a global scale. Citation frequency is often used in hiring and tenure decisions as an indicator of the quality of a researcher's publications. In this paper, we examine the influence of discipline, institution, journal impact factor, length of article, number of authors, seniority of author, and gender on citation rate of top-cited papers for academic faculty in geography and forestry departments. Self-citation practices and patterns of citation frequency across post-publication lifespan were also examined. 
Citation rates of the most-highly cited paper for all tenured forestry (N = 122) and geography (N = 91) faculty at Auburn University, Michigan State University, Northern Arizona University, Oklahoma State University, Pennsylvania State University, Texas A&M University, University of Florida, University of Massachusetts, University of Washington, and Virginia Tech were compared. Foresters received significantly more citations than geographers (t = 2.46, P = 0.02) and more senior authors received more citations than junior researchers (r² = 0.14, P = 0.03). Articles published in journals with higher impact factors also received more citations (r² = 0.28, P = 0.00). The median self-citation rate was 10% and there was no temporal pattern to the frequency of citations received by an individual article (χ² = 176). Our results stress the importance of only comparing citation rates within a given discipline and confirm the importance of author-seniority and journal rankings as factors that influence citation rate of a given article. Given the high priority accorded to research collaboration on the assumption that it yields higher productivity and impact rates than non-collaborative research, research collaboration modes are assessed for their benefits and costs before being executed. Researchers are accountable for selecting their collaboration modes, a decision made through strategic decision making influenced by their environments and the trade-offs among alternatives. In this context, by using bibliographic information and related internal data from the Korea Institute of Machinery and Materials (KIMM, a representative Korean government institute of mechanical research), this paper examines the suggested yet unproven determinants of research collaboration modes that the SCI data set cannot reveal, through a Multinomial Probit Model. 
The results indicate that informal communication, cultural proximity, academic excellence, external fund inspiration, and technology development levels play significant roles in the determination of specific collaboration modes, such as sole research, internal collaboration, domestic collaboration, and international collaboration. This paper refines collaboration mode studies by describing the actual collaboration phenomenon as it occurs in research institutes and the motivations prompting research collaboration, allowing research managers to encourage researchers to collaborate in an appropriate decision-making context. Based on historical citation data from the ISI Web of Science, this paper introduces a methodology to automatically calculate and classify the real career h-index sequences of scientists. Such a classification is based on the convexity-concavity features of the different temporal segments of h-index sequences. Five categories are identified, namely convex, concave, S-shaped, IS-shaped and linear. As a case study, the h-index sequences of several Nobel Prize winners in Medicine, Chemistry and Economics are investigated. Two proposed factors influencing the growth of the h-index, namely the "freshness" of the h-index core and changes in the rank positions of papers near the h-index point, are studied. It is found that the h-index core's "freshness" is particularly relevant to the growth of the h-index. Moreover, although in general more publications lead to an increase of the h-index, the key role is played by those papers near the h-index point. (C) 2011 Elsevier Ltd. All rights reserved. The evolution of the Web has promoted a growing interest in social network analysis, such as community detection. 
Among many different community detection approaches, there are two kinds that we want to address: one considers the graph structure of the network (topology-based community detection approach); the other one takes the textual information of the network nodes into consideration (topic-based community detection approach). This paper conducted a systematic analysis of applying a topology-based community detection approach and a topic-based community detection approach to the coauthorship networks of the information retrieval area and found that: (1) communities detected by the topology-based community detection approach tend to contain different topics within each community; and (2) communities detected by the topic-based community detection approach tend to contain topologically-diverse sub-communities within each community. Future community detection approaches should not only emphasize the relationship between communities and topics, but also consider the dynamic changes of communities and topics. Published by Elsevier Ltd. We propose a novel yet practical method capturing an individual's research or innovation performance by the shape centroids of the h-core and h-tail areas of its publications or patents. A large number of individuals' relative performance with respect to their h-cores and h-tails can be simultaneously positioned and conveniently observed in two-dimensional coordinate systems. Two approaches are further proposed for the utilization of the two-dimensional distribution of shape centroids. The first approach specifically determines, within a group of individuals, those outperforming or being outperformed by a target individual. The second approach provides a quick qualitative categorization of the individuals so that the nature of their performance is revealed. Using patent assignees as an illustrative case, the approaches are tested with empirical patent assignee data. Crown Copyright (C) 2011 Published by Elsevier Ltd. All rights reserved. 
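The h-core/h-tail partition underlying the shape-centroid method can be illustrated in a few lines. The sketch below computes the h-index of a citation list and splits it into h-core (the top h papers) and h-tail (the rest); the `centroid` helper is a plain mean of (rank, citations) points, an illustrative stand-in for the authors' shape centroids rather than their exact formula, and the citation counts are invented.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def core_tail_split(citations):
    """Partition a publication list into its h-core (top h papers) and h-tail."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return ranked[:h], ranked[h:]

def centroid(values, offset=0):
    """Mean of (rank, citations) points -- an illustrative stand-in for the
    shape centroids of the abstract, not the authors' exact formula."""
    n = len(values)
    mean_rank = sum(range(offset + 1, offset + n + 1)) / n
    mean_cites = sum(values) / n
    return mean_rank, mean_cites

cites = [25, 18, 12, 9, 7, 4, 2, 1, 1, 0]   # toy citation counts
core, tail = core_tail_split(cites)          # core = [25, 18, 12, 9, 7]
core_centroid = centroid(core)               # (3.0, 14.2)
tail_centroid = centroid(tail, offset=len(core))
```

Plotting many individuals' core and tail centroids in one coordinate system then gives the kind of simultaneous two-dimensional comparison the abstract describes.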
The basic concepts of progressive nucleation mechanism are described and the final equations of the mechanism are used to analyze the growth of articles in three randomly selected databases from 20 different databases in humanities (philosopher's index, set 1), social sciences (exceptional child education, set 5) and science and technology (food science and technology, set 10), respectively, covering the period 1968-1987, previously analyzed by Egghe and Ravichandra Rao (1992, Scientometrics 25 (1), 5-46), and the growth of journals, articles and authors in malaria research for the period 1955-2005, reported recently by Ravichandra Rao and Srivastava (2010, Journal of Informetrics 4 (1), 249-256), and compared with the predictions of the power-law equation. Analysis of the former data revealed that: (1) the progressive nucleation mechanism describes the data better than the power-law relation, (2) the field of social sciences is saturated much earlier than science and technology but publication activity in humanities is saturated much later, and (3) social sciences have the maximum growth, followed by lower growth in humanities and the lowest growth in science and technology. The data on journals J(t), papers N(t) and authors W(t) against publication year Y in malaria research can be described equally well by equations of the power-law and progressive nucleation mechanism, and the growth of journals J(t) and articles N(t) is intimately connected with the growth of authors W(t). (C) 2011 Elsevier Ltd. All rights reserved. As research becomes an ever more globalized activity, there is growing interest in national and international comparisons of standards and quality in different countries and regions. A sign of this trend is the increasing interest in rankings of universities according to their research performance, both inside and outside the scientific environment. 
New methods presented in this paper enable us to map centers of excellence around the world using programs that are freely available. Based on Scopus data, field-specific excellence can be identified and agglomerated in regions and cities where recently highly cited papers were published. Differences in performance rates can be visualized on the map using colours and sizes of the marks. (C) 2011 Elsevier Ltd. All rights reserved. Spatial scientometrics has attracted a lot of attention in the very recent past. The visualization methods (density maps) presented in this paper allow for an analysis revealing regions of excellence around the world using computer programs that are freely available. Based on Scopus and Web of Science data, field-specific and field-overlapping scientific excellence can be identified in broader regions (worldwide or for a specific continent) where high quality papers (highly cited papers or papers published in Nature or Science) were published. We used a geographic information system to produce our density maps. We also briefly discuss the use of Google Earth. (C) 2011 Elsevier Ltd. All rights reserved. Dependence of citations L(t) at time t on the publication duration t of 10 arbitrarily selected Polish professors is analyzed using equations based on power law and exponential growth and on progressive nucleation mechanism for overall crystallization in fixed volume. The former two approaches are well known in the bibliometric literature, but the last approach is new for the analysis of growth of citations and other related phenomena. 
It was found that: (1) power-law relation and exponential growth are relatively inadequate to analyze the data of all authors due to large scatter in the L(t) data of various authors, (2) in view of poor fit at low or high values of t, the exponential growth relation is worse than the power-law relation, and (3) the progressive nucleation mechanism describes the data reasonably well and gives information on the processes of sources of citations and the growth of these citation sources. (C) 2011 Elsevier Ltd. All rights reserved. Scientometric models, which consider papers in an undifferentiated way, are blind to important features of the citation network. We propose an approach for the definition of a function P_S, for any set of scientific articles S, which reflects global properties of the citation network associated to S. Such a function, which we propose as a measure of the impact of scientific papers, is constructed as the solution of an iterated system of Perron-eigenvalue problems. We discuss differences with previously defined measures, in particular of the PageRank type. (C) 2011 Elsevier Ltd. All rights reserved. The ongoing globalisation of science has undisputedly a major impact on how and where scientific research is being conducted nowadays. Yet, the big picture remains blurred. It is largely unknown where this process is heading, and at which rate. Which countries are leading or lagging? Many of its key features are difficult if not impossible to capture in measurements and comparative statistics. Our empirical study measures the extent and growth of scientific globalisation in terms of physical distances between co-authoring researchers. Our analysis, drawing on 21 million research publications across all countries and fields of science, reveals that contemporary science has globalised at a fairly steady rate during recent decades. The average collaboration distance per publication has increased from 334 km in 1980 to 1553 km in 2009. 
Despite significant differences in globalisation rates across countries and fields of science, we observe a pervasive process in motion, moving towards a truly interconnected global science system. (C) 2011 Elsevier Ltd. All rights reserved. In the present study we attempt to trace the diffusion of h-related literature over a five-year period beginning with the introduction of the h-index. The study is based on a reliable and representative publication set of 755 papers retrieved from the Web of Science database using keywords and citation links. In the course of the study we analyse several aspects of the emergence of this topic, the differentiation of methodological research, its application within and outside the field and the dissemination process of information among different disciplines in the sciences and social sciences. Finally, a cluster analysis of h-related literature is conducted. The hybrid clustering algorithm results in four clusters, which depict two different aspects each of basic and of applied research related to the h-index and its derivatives. Crown Copyright (C) 2011 Published by Elsevier Ltd. All rights reserved. In this study, we develop a theoretical model based on social network theories and analytical methods for exploring collaboration (co-authorship) networks of scholars. We use measures from social network analysis (SNA) (i.e., normalized degree centrality, normalized closeness centrality, normalized betweenness centrality, normalized eigenvector centrality, average ties strength, and efficiency) for examining the effect of social networks on the (citation-based) performance of scholars in a given discipline (i.e., information systems). Results from our statistical analysis using a Poisson regression model suggest that research performance of scholars (g-index) is positively correlated with four of the SNA measures; the exceptions are the normalized betweenness centrality and the normalized closeness centrality measures. 
Furthermore, it reveals that only normalized degree centrality, efficiency, and average ties strength have a positive significant influence on the g-index (as a performance measure). The normalized eigenvector centrality has a negative significant influence on the g-index. Based on these results, we can imply that scholars who are connected to many distinct scholars have a better citation-based performance (g-index) than scholars with fewer connections. Additionally, scholars with large average ties strengths (i.e., repeated co-authorships) show a better research performance than those with low tie strengths (e.g., single co-authorships with many different scholars). The results related to efficiency show that scholars who maintain a strong co-authorship relationship to only one co-author of a group of linked co-authors perform better than those researchers with many relationships to the same group of linked co-authors. The negative effect of the normalized eigenvector suggests that scholars should work with many students instead of other well-performing scholars. Consequently, we can state that the professional social network of researchers can be used to predict the future performance of researchers. (C) 2011 Elsevier Ltd. All rights reserved. We investigated the occurrence of non-alphanumeric characters in a randomized subset of almost 650,000 titles of scientific publications from the Web of Science database. Additionally, for almost 500,000 of these publications we correlated occurrence with impact, using the field-normalised citation metric CPP/FCSm. We compared occurrence and correlation with impact both in general and for specific disciplines, and took into account the variation within sets by (non-parametrically) bootstrapping the calculation of impact values. 
We also compared use and impact of individual characters in the 30 fields in which non-alphanumeric characters occur most frequently, by using heatmaps that clustered and reordered fields and characters. We conclude that the use of some non-alphanumeric characters, such as the hyphen and colon, is common in most titles and that not including such characters generally correlates negatively with impact. Specific disciplines on the other hand, may show either a negative, absent, or positive correlation. We also found that thematically related science fields use non-alphanumeric characters in comparable numbers, but that impact associated with such characters shows a less strong thematic relation. Overall, it appears that authors cannot influence success of publications by including non-alphanumeric characters in fields where this is not already commonplace. (C) 2011 Published by Elsevier Ltd. The current work proposes an application of DEA methodology for measurement of technical and allocative efficiency of university research activity. The analysis is based on bibliometric data from the Italian university system for the five-year period 2004-2008. Technical and allocative efficiency is measured with input being considered as a university's research staff, classified according to academic rank, and with output considered as the field-standardized impact of the research product realized by these staff. The analysis is applied to all scientific disciplines of the so-called hard sciences, and conducted at subfield level, thus at a greater level of detail than ever before achieved in national-scale research assessments. (C) 2011 Elsevier Ltd. All rights reserved. 
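The kind of title tally behind the non-alphanumeric-character analysis can be sketched briefly: count the non-alphanumeric, non-whitespace characters in each title with a regular expression. The titles below are invented examples, not data from the study, and the pairing with any impact measure would happen in a separate correlation step.

```python
import re

# Matches any character that is neither alphanumeric nor whitespace
# (hyphen, colon, parentheses, question mark, ...).
NON_ALNUM = re.compile(r"[^0-9A-Za-z\s]")

def special_characters(title):
    """Return the list of non-alphanumeric characters occurring in a title."""
    return NON_ALNUM.findall(title)

titles = [
    "Co-authorship networks: a longitudinal analysis",
    "Mapping centers of excellence",
]
counts = {t: len(special_characters(t)) for t in titles}
# counts: 2 for the first title (hyphen and colon), 0 for the second
```

Aggregating such counts per field, and correlating them with a citation indicator, gives the per-discipline comparisons the abstract reports.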
The purpose of this study is to: (1) develop a ranking of peer-reviewed AI journals; (2) compare the consistency of journal rankings developed with two dominant ranking techniques, expert surveys and journal impact measures; and (3) investigate the consistency of journal ranking scores assigned by different categories of expert judges. The ranking was constructed based on the survey of 873 active AI researchers who ranked the overall quality of 182 peer-reviewed AI journals. It is concluded that expert surveys and citation impact journal ranking methods cannot be used as substitutes. Instead, they should be used as complementary approaches. The key problem of the expert survey ranking technique is that in their ranking decisions, respondents are strongly influenced by their current research interests. As a result, their scores merely reflect their present research preferences rather than an objective assessment of each journal's quality. In addition, the application of the expert survey method favors journals that publish more articles per year. (C) 2011 Elsevier Ltd. All rights reserved. This paper analyses relationships between university research performance and concentration of university research. Using the number of publications and their citation impact extracted from Scopus as proxies of research activity and research performance, respectively, it examines at a national level for 40 major countries the distribution of published research articles among its universities, and at an institutional level for a global set of 1500 universities the distribution of papers among 16 main subject fields. Both at a national and an institutional level it was found that a larger publication output is associated with a higher citation impact. 
If one conceives the number of publications as a measure of concentration, this outcome indicates that, in university research, concentration and performance are positively related, although the underlying causal relationships are complex. But a regression analysis found no evidence that more concentration of research among a country's universities or among an institution's main fields is associated with better overall performance. The study reveals a tendency for research in a particular subject field conducted in universities specializing in other fields to outperform the work in that field at institutions specializing in that field. This outcome may reflect that it is multi-disciplinary research that is the most promising and visible at the international research front, and that this type of research tends to develop better in universities specializing in a particular domain and expanding their capabilities in that domain towards other fields. (C) 2011 Elsevier Ltd. All rights reserved. As more time passes from the original date of publication, the measurement of the impact of scientific works using subsequent citation counts becomes more accurate. However, the measurement of individual and organizational research productivity should ideally refer to a period with closing date just prior to the evaluation exercise. Therefore, it is necessary to compromise between accuracy and timeliness. This work attempts to provide an order of magnitude for the error in measurement that occurs as the time lapse between date of publication and citation count decreases. The analysis is conducted by scientific discipline on the basis of publications indexed in the Thomson Reuters Italian National Citation Report. (C) 2011 Elsevier Ltd. All rights reserved. We introduce the h-degree of a node as a basic indicator for weighted networks. 
The h-degree (d(h)) of a node is the number d(h) if this node has at least d(h) links with other nodes and the strength of each of these links is greater than or equal to d(h). Based on the notion of h-degree, other notions are developed, such as h-centrality and h-centralization, leading to a new set of indicators characterizing nodes in a network. (C) 2011 Elsevier Ltd. All rights reserved. In this paper CITAN, the CITation ANalysis package for the R statistical computing environment, is introduced. The main aim of the software is to support bibliometricians with a tool for preprocessing and cleaning bibliographic data retrieved from SciVerse Scopus and for calculating the most popular indices of scientific impact. To show the practical usability of the package, an exemplary assessment of authors publishing in the fields of scientometrics and webometrics is performed. (C) 2011 Elsevier Ltd. All rights reserved. Journal impact factors continue to play an important role in research output assessment, in spite of the criticisms and debates around them. The impact factor rankings provided in the Journal Citation Reports (JCR (TM)) database by Thomson Reuters have enjoyed a position of monopoly for many years. But this has recently changed with the availability of the Scopus (TM) database and its associated journal ranking published in the Scimago Journal Rank (SJR) Web page, as the former provides a citation database with similar inclusion criteria to those used in the JCR and the latter an openly accessible impact factor-based ranking. The availability of alternatives to the JCR impact factor listings using a different citation database raises the question of the extent to which the two rankings can be considered equally valid for research evaluation purposes. This paper reports the results of a contrast of both listings in Computer Science-related topics. 
It attempts to answer the validity question by comparing the impact factors of journals ranked in both listings and their relative position. The results show that impact factors for journals included in both rankings are strongly correlated, with SJR impact factors in general slightly higher, confirming previous studies related to other disciplines. Nonetheless, the consideration of the tercile and quartile positions of journals yields some divergences for journals appearing in both rankings that need to be accounted for in research evaluation procedures. (C) 2011 Elsevier Ltd. All rights reserved. This article presents a review and analysis of the research literature in social Q&A (SQA), a term describing systems where people ask, answer, and rate content while interacting around it. The growth of SQA is contextualized within the broader trend of user-generated content from Usenet to Web 2.0, and alternative definitions of SQA are reviewed. SQA sites have been conceptualized in the literature as simultaneous examples of tools, collections, communities, and complex sociotechnical systems. Major threads of SQA research include user-generated and algorithmic question categorization, answer classification and quality assessment, studies of user satisfaction, reward structures, and motivation for participation, and how trust and expertise are both operationalized by and emerge from SQA sites. Directions for future research are discussed, including more refined conceptions of SQA site participants and their roles, unpacking the processes by which social capital is achieved, managed, and wielded in SQA sites, refining question categorization, conducting research within and across a wider range of SQA sites, the application of economic and game-theoretic models, and the problematization of SQA itself. The arbitrary division into lines and pages of the book in its present format does not correspond at all with the presentation of ideas. (Otlet, 1911, p. 
291) Most historical explanations of interfaces are technological and start with the computer age. We propose a different approach by focusing on the history of library and information sciences, particularly on the case of Paul Otlet (1868-1944). Otlet's attempts to integrate and distribute knowledge imply the need for interfaces, and his conceptualizations are reminiscent of modern versions of interfaces that are intended to facilitate manual and mechanical data integration and enrichment. Our discussion is based on a selection from the hundreds of images of what we may think of as "interfaces" that Otlet made or commissioned during his life. We examine his designs for interfaces that involve bibliographic cards and allow data enrichment, his attempts to visualize interfaces between the sciences and between universal and personal classifications, and even his attempts to create interfaces to the world. In particular, we focus on the implications of Otlet's dissection of the organization of the book for the creation of interfaces to a new order of public knowledge. Our view is that the creative ways in which he faces tensions of scalability, representation, and perception of relationships between knowledge objects might be of interest today. Computer users today rely on a wide variety of software tools to manage an ever-increasing amount of information and resources. We developed the Global Information Gatherer (GIG) system to help students in higher education manage, understand, and keep their academic work. GIG provides a comprehensive, integrative interface through which students can access commonly used programs and simultaneously record notes and organize files. This article presents an overview of the GIG program before describing a large-scale, longitudinal, and unrestricted evaluation of its use. 
We investigate how such a program is received by nontechnical users, which features prove most helpful to students as they work to complete their everyday tasks, how it compares to other software solutions, and whether it helps with information assimilation and management tasks. Results of our study indicate that participants have a strong preference for software that minimizes program window manipulation, facilitates information consolidation and organization, provides citation support and integrated web browsing, and incorporates a progressive user interface design. When comparing GIG to their normal way of accomplishing tasks, students gave particularly high marks for its ability to save materials from the web, gather sources for academic research, manage windows, and copy/paste from the web. On the third and final survey of our evaluation, we learned that a majority (>70%) of remaining participants believed that GIG was helpful for managing and making sense of the large volume of information to which they are exposed every day, and over half (55%) said they would continue using the software if it were freely available. Repurposive appropriation is a creative everyday act in which a user invents a novel use for information technology (IT) and adopts it. This study is the first to address its prevalence and predictability in the consumer IT context. In all, 2,379 respondents filled in an online questionnaire on creative uses of digital cameras, such as using them as scanners, periscopes, and storage media. The data reveal that such creative uses are adopted by about half of the users, on average, across different demographic backgrounds. Discovery of a creative use on one's own is slightly more common than is learning it from others. Most users discover the creative uses either completely on their own or wholly through learning from others. 
Our regression model explains 34% of the variance in adoption of invented uses, with technology cognizance orientation, gender, exploration orientation, use frequency, and use tenure as the strongest predictors. These findings have implications for both design and marketing. Tags associated with social images are a valuable information source for superior image search and retrieval experiences. Although various heuristics can boost tag-based search for images, there is a lack of a general framework to study their impact. Specifically, the task of ranking images matching a given tag query based on their associated tags in descending order of relevance has not been well studied. In this article, we take the first step to propose a generic, flexible, and extensible framework for this task and exploit it for a systematic and comprehensive empirical evaluation of various methods for ranking images. To this end, we identified five orthogonal dimensions to quantify the matching score between a tagged image and a tag query. These five dimensions are: (i) tag relatedness to measure the degree of effectiveness of a tag describing the tagged image; (ii) tag discrimination to quantify the degree of discrimination of a tag with respect to the entire tagged image collection; (iii) tag length normalization analogous to document length normalization in web search; (iv) tag-query matching model for the matching score computation between an image tag and a query tag; and (v) query model for tag query rewriting. For each dimension, we identify a few implementations and evaluate their impact on the NUS-WIDE dataset, the largest human-annotated dataset consisting of more than 269K tagged images from Flickr. 
We evaluated 81 single-tag queries and 443 multi-tag queries over 288 search methods and systematically compared their performances using standard metrics including Precision at top-K, Mean Average Precision (MAP), Recall, and Normalized Discounted Cumulative Gain (NDCG). This study examines the aggregate bandwagon effect of popularity cues on the viewership of online user-generated videos. Cognitive and behavioral theories of information processing suggest that Web users, overwhelmed by information and quality uncertainty, will gravitate toward the popular choices made by earlier decision makers, which appear via indicators such as hit counts to forge quality impressions. Building on the theories, we hypothesize that how much viewer exposure videos will attract at any future time depends on their individually accumulated viewership; furthermore, this viewership cascade is moderated by pictorial and verbal preview because such information reduces quality uncertainty for content shoppers. Our longitudinal model tests these hypotheses using an extensive real-life dataset on video clips retrieved from a video-sharing site. Building upon previous research on the concepts of inquiring minds and elicitation styles (Wu, 2005; Wu & Liu, 2003), this study aims to identify the relationships between the theoretical constructs of elicitation behavior and user satisfaction in terms of the relevance, utility, and satisfaction of search results, search interaction processes, and overall search activities. Descriptive statistical analysis is applied to compare the user satisfaction ratings with respect to the concepts of inquiring minds and elicitation styles. The results suggest that the stereotyped elicitation style received the lowest user satisfaction ratings compared with functionally and situationally oriented styles. 
It is suggested that the intermediaries take into account the characteristics of search questions and, accordingly, adapt their professional mindsets to search interview situations; that is, using an inquiring mind in the query formulation process as the default mode with functional and situational styles of elicitations would be helpful for enhancing the user's satisfaction ratings. Future research is suggested to better understand and to improve professional talk in information services. We propose a minimum span clustering (MSC) method for clustering and visualizing complex networks using the interrelationship of network components. To demonstrate this method, it is applied to classify the social science network in terms of aggregated journal-journal citation relations of the Institute of Scientific Information (ISI) Journal Citation Reports. This method of network classification is shown to be efficient, with a processing time that is linear in network size. The classification results provide an in-depth view of the network structure at various scales of resolution. For the social science network, there are 4 resolution scales, including 294 batches of journals at the highest scale, 65 categories of journals at the second, 15 research groups at the third scale, and 3 knowledge domains at the lowest resolution. By comparing the relatedness of journals within clusters, we show that our clustering method gives a better classification of social science journals than ISI's heuristic approach and hierarchical clustering. In combination with the minimum spanning tree approach and multi-dimensional scaling, MSC is also used to investigate the general structure of the network and construct a map of the social science network for visualization. Using the Arts & Humanities Citation Index (A&HCI) 2008, we apply mapping techniques previously developed for mapping journal structures in the Science and Social Sciences Citation Indices. 
Citation relations among the 110,718 records were aggregated at the level of 1,157 journals specific to the A&HCI, and we examine whether a cognitive structure can be reconstructed and visualized from these journal structures. Both cosine-normalization (bottom up) and factor analysis (top down) suggest a division into approximately 12 subsets. The relations among these subsets are explored using various visualization techniques. However, we were not able to retrieve this structure using the Institute for Scientific Information Subject Categories, including the 25 categories that are specific to the A&HCI. We discuss options for validation, such as comparison against the categories of the Humanities Indicators of the American Academy of Arts and Sciences and the panel structure of the European Reference Index for the Humanities, and we compare our results with the curriculum organization of the Humanities Section of the College of Letters and Sciences of the University of California at Los Angeles as an example of institutional organization. The counting of papers and citations is fundamental to the assessment of research productivity and impact. In an age of increasing scientific collaboration across national borders, the counting of papers produced by collaboration between multiple countries, and citations of such papers, raises concerns in country-level research evaluation. In this study, we compared the number counts and country ranks resulting from five different counting methods. We also observed inflation depending on the method used. Using the 1989 to 2008 physics papers indexed in ISI's Web of Science as our sample, we analyzed the counting results in terms of paper count (research productivity) as well as citation count and citation-paper ratio (CP ratio) based evaluation (research impact). The results show that at the country-level assessment, the selection of counting method had only minor influence on the number counts and country rankings in each assessment. 
However, the influences of counting methods varied between paper count, citation count, and CP ratio based evaluation. The findings also suggest that the popular counting method (whole counting), which gives each collaborating country one full credit, may not be the best counting method. Straight counting, which accredits only the first or the corresponding author, or fractional counting, which accredits each collaborator with partial, weighted credit, might be the better choice. In non-English-speaking countries, the measurement of research output in the social sciences and humanities (SSH) using standard bibliographic databases suffers from a major drawback: the underrepresentation of articles published in local, non-English, journals. Using papers indexed (1) in a local database of periodicals (Erudit) and (2) in the Web of Science, assigned to the population of university professors in the province of Quebec, this paper quantifies, for individual researchers and departments, the importance of papers published in local journals. It also analyzes differences across disciplines and between French-speaking and English-speaking universities. The results show that, while the addition of papers published in local journals to bibliometric measures has little effect when all disciplines are considered and for anglophone universities, it increases the output of researchers from francophone universities in the social sciences and humanities by almost a third. It also shows that there is very little relation, at the level of individual researchers or departments, between the output indexed in the Web of Science and the output retrieved from the Erudit database; a clear demonstration that the Web of Science cannot be used as a proxy for the "overall" production of SSH researchers in Quebec. The paper concludes with a discussion on these disciplinary and language differences, as well as on their implications for rankings of universities. 
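The whole, straight, and fractional counting methods contrasted in the counting-methods study above can be sketched as follows. This is a simplified illustration: the sample papers are hypothetical, and the equal-fraction scheme used here is only one variant of fractional counting (weighted variants also exist).

```python
from collections import Counter

def count_papers(papers, method="whole"):
    """Credit countries for (possibly multi-country) papers.
    papers: list of country lists, first entry = first author's country.
    whole: each distinct country gets one full credit per paper;
    straight: only the first author's country gets credit;
    fractional: each distinct country gets 1/k of one credit."""
    credit = Counter()
    for countries in papers:
        distinct = list(dict.fromkeys(countries))  # dedupe, keep order
        if method == "whole":
            for c in distinct:
                credit[c] += 1
        elif method == "straight":
            credit[countries[0]] += 1
        elif method == "fractional":
            for c in distinct:
                credit[c] += 1 / len(distinct)
    return dict(credit)
```

For three papers authored in (KR, US), (US), and (KR, US, DE), whole counting credits the US with 3 papers, straight counting with only 1, and fractional counting with 11/6, which is how the choice of method inflates or deflates country-level counts.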
This study examines longitudinal trends in the university-industry-government (UIG) relationship on the web in the Korean context by using triple helix (TH) indicators. The study considers various Internet resources, including websites/documents, blogs, online cafes, Knowledge-In (comparable to Yahoo! Answers), and online news sites, by employing webometric and co-word analysis techniques to ascertain longitudinal trends in the UIG relationship, which have received considerable attention in the last decade. The results indicate that the UIG relationship varied according to the government's policies and that there was some tension in the longitudinal UIG relationship. Further, websites/documents and blogs were the most reliable sources for examining the strength of and variations in the bilateral and trilateral UIG relationships on the web. In addition, web-based T(uig) values showed a stronger trilateral relationship and larger variations in the UIG relationship than Science Citation Index-based T(uig) values. The results suggest that various Internet resources (e.g., advanced search engines, websites/documents, blogs, and online cafes), together with TH indicators, can be used to explore the UIG relationship on the web. Accurately evaluating a researcher and the quality of his or her work is an important task when decision makers have to decide on such matters as promotions and awards. Publications and citations play a key role in this task, and many previous studies have proposed using measurements based on them for evaluating researchers. Machine learning techniques as a way of enhancing the evaluating process have been relatively unexplored. We propose using a machine learning approach for evaluating researchers. In particular, the proposed method combines the outputs of three learning techniques (logistic regression, decision trees, and artificial neural networks) to obtain a unified prediction with improved accuracy. 
We conducted several experiments to evaluate the model's ability to: (a) classify researchers in the field of artificial intelligence as Association for the Advancement of Artificial Intelligence (AAAI) fellows and (b) predict the next AAAI fellowship winners. We show that both our classification and prediction methods are more accurate than previous measurement methods, and reach a precision rate of 96% and a recall of 92%. In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone. Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the huge amount of texts retrieved for the query, ranks relevant texts higher for further processing. 
However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlated concepts and favor those texts that discuss all of the concepts together. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers by term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity to measure the contextual completeness of query terms appearing in nearby areas in the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) those rankers that consider TF in ranking, without the need to change the algorithms and development processes of the rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and when compared with several state-of-the-art techniques that enhance rankers by term-proximity information, PRE may more stably and significantly enhance the rankers. In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with the words is drawn from Wikipedia. The purpose is to utilize the abundant semantic relatedness information available in Wikipedia in an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated using the training documents. The learning involves a step-wise iterative process that helps in generating a value function with an appropriate set of concepts (dimensions) chosen from a collection of concepts. Once the value function is formulated, it is utilized to make a decision between relevance and irrelevance. The value assigned to a particular document from the value function can be further used to rank the documents according to their relevance. Reuters newswire documents have been used to evaluate the efficacy of the procedure. 
An extensive comparison with other frameworks has been performed. The results are promising. In this paper, a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection since they are able to capture syntactic similarities between suspicious and original documents and they can be used to detect the exact plagiarized passage boundaries. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive when compared with the best reported results. More importantly, it achieves significantly better results when dealing with difficult plagiarism cases where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms. Boundary objects are artifacts that reside in the interface between communities and are capable of bridging assumed and experienced differences. Bridging is not, however, necessarily a neutral or a consensual activity. With an emphasis on documents, the present article discusses the politics of boundary objects by analyzing the role of archaeological reports at boundaries between communities with conflicting interests. The analysis demonstrates and discusses the political and purposeful nature of boundary objects: how they are devices for creating and maintaining hegemonies within communities and achieving authority over other intersecting groups of people. The study uses the notion of hegemony and the discourse theory of Laclau and Mouffe (2001) to conceptualize the role of boundary objects as articulations of power and to explicate the dynamics of how the power is exercised. 
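The stopword n-gram idea in the plagiarism-detection abstract above can be sketched minimally: keep only very frequent function words, form n-grams over them, and compare the resulting sets, so that syntactic structure rather than topic is matched. The tiny stopword list and the Jaccard overlap below are illustrative assumptions, not the method's actual parameters.

```python
# A hypothetical, deliberately tiny stopword list; the original
# method uses a longer list of very frequent words.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "that", "it", "with"}

def stopword_ngrams(text, n=3):
    """Discard content words, keep only stopwords, and form n-grams
    over the surviving sequence of function words."""
    kept = [w for w in text.lower().split() if w in STOPWORDS]
    return {tuple(kept[i:i + n]) for i in range(len(kept) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap of stopword n-gram sets (one plausible way to
    score a suspicious/original document pair)."""
    ga, gb = stopword_ngrams(a, n), stopword_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Because content words are dropped before matching, a plagiarized passage whose nouns and verbs were replaced with synonyms can still score highly, which is the regime where the abstract reports the largest gains.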
There is a burgeoning interest among academic scientists and policy-makers in the development and employment of TH (Triple Helix) and WSI (Webometrics, Scientometrics, and Informetrics) research methods. However, the international literature has not systematically examined TH and WSI approaches in an Asian context. Furthermore, previous literature published in international journals does not adequately address the social forces shaping TH development in Asia. Therefore, the purpose of this special issue is to bring researchers together to discuss university-industry-government (U-I-G) relations and innovation diffusion in Asia employing WSI alongside other methods. Noting the government's role in diffusing information across various sectors of society, this study analyzes the Twitter activity of the Ministry for Food, Agriculture, Forestry and Fisheries (MFAFF), one of Korea's government organizations. From a broad perspective, this study provides a better understanding of innovation activity mediated by social media, particularly the government's Twitter activity (a topic that has not been addressed by previous webometric research on Triple Helix relationships), by employing social network analysis and content analysis. The results indicate some limitations of the MFAFF's activity on Twitter as a mutual communication channel, although Twitter has the potential to facilitate risk management. Further, based on the MFAFF's confined use of its Twitter account, the results suggest that its Twitter account can be an effective information distribution channel, indicating Twitter's value as a communication tool for innovation activity through social media. This study provides an empirical analysis of the government's Twitter activity and contributes to the literature by providing an in-depth understanding of the Triple Helix relationship on the Web. 
Assuming the OECD member states as 'advanced' nations equipped with basic scientific capacities, the present research addresses the network configuration of these countries in international scientific collaboration and the transformation of this network along with globalization. The results suggest that geographical, linguistic, and economic affinities did not have a meaningful impact on the formation of the co-authorship network between 'advanced' nations, in contrast to previous research, which claimed such affinities were important for international cooperation. Globalization facilitated by the development of information and transportation technologies was found to influence the co-authorship link between countries, but not to accelerate centralization of the network in the past 15 years. Though the core-periphery pattern still persists, new rising stars, namely Korea and Turkey, have emerged in the co-authorship network among 'advanced' nations. These two countries, having a rapid increase in the share of degree centrality from 1995 to 2010, had strategic financial support from the government which stimulated the collaboration between universities and industries and emphasized the development of science and engineering fields. In order to explore new scientific and innovative communities, analyses based on a technological infrastructure and its related tools, for example, the 'Web of Science' database for scientometric analysis, are necessary. However, there is little systematic documentation of social media data and webometric analysis in relation to Korean and broader Asian innovation communities. In this short communication, we present (1) webometric techniques to identify communication processes on the Internet, such as social media data collection and analysis using an API-based application; and (2) experimentation with new types of data visualization using NodeXL, such as social and semantic network analysis. 
Our research data is drawn from the social networking site, Twitter. We also examine the overlap between innovation communities in terms of their shared members, and then, (3) calculate entropy values for trilateral relationships. Triple helix (TH) collaborations involving university, industry and government provide a networked infrastructure for shaping the dynamic fluxes of the knowledge base of innovations locally, and these fluxes remain emergent within the domains. This study maps these emergence dynamics of the knowledge base of innovations of Research & Development (R&D) by exploring the longitudinal trend of systemness within the networked research relations in Bangladesh on the TH model. Bibliometric data on publications were collected from the Science Citation Index (SCI) and the citation indices for the social sciences and the arts and humanities for the analysis of science indicators, and patent data were collected from the US Patent Office to analyze the patent success ratio as a measure of innovation within TH domains. The findings show that the network dynamics have varied considerably according to the R&D policies of the government. The collaboration patterns of co-authorship relations in the SCI publications increased prominently, with some variation, from 1996 to 2006. Nevertheless, inter-institutional collaboration was negatively influenced by the national science and technology (S&T) research policies in the last 5 years due to their evaluation criteria. Finally, the findings reveal that the R&D system of Bangladesh is still undergoing a process of institutionalizing S&T and has failed to boost its research capacity for building the knowledge base of innovations by neglecting the network effects of TH dynamics. The Triple Helix (TH) model and its indicators are typically used for exploring university-industry-government relations prevalent in knowledge-based economies. 
However, this exploratory study extends the TH model, together with webometric analysis, to the musical industry to explore the performance of social hubs from the perspectives of entropy and the Web. The study investigates and compares two social hubs, Daegu and Edinburgh, from the perspective of musicals by using data obtained through two search engines (Naver.com and Bing.com). The results indicate that although Daegu is somewhat integrated into the local musical industry, it is not yet fully embedded in the international musical industry, even though it is international in scope. In terms of social events (i.e., musicals), unlike Daegu, Edinburgh is fully integrated into both the local and international musical industries and attracts diverse domains over the Internet. In this paper, the agricultural innovation systems of two Northeast Asian countries, Korea and China, are investigated and compared from the perspective of triple helix innovation. Specifically, the current study examines the nature of agricultural innovation in the two countries and considers agricultural R&D investments and activities as well as the roles of university, industry, and government (UIG), the three units comprising the triple helix. As an empirical extension of the qualitative analysis, we collected bibliometric information on agricultural scientific publications from 1990 to 2010 and patent information from 1980 to 2010. By calculating the transmission of uncertainty, which indicates collaboration among UIG, this paper tracks the relationship dynamics of the units comprising the triple helix. In addition, we analyze topics in scientific publications and patents in order to observe and compare the subareas that are the focus in the two countries. The findings reveal both commonalities and differences between the two countries, thus providing knowledge of and insights into the agricultural sector. 
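The "transmission of uncertainty" used in these triple-helix studies is, in the usual formulation, the mutual information among the three institutional dimensions of a publication's address list; negative values indicate synergy (a reduction of uncertainty at the network level). A minimal sketch, with entirely invented publication records, might look like:

```python
from collections import Counter
from math import log2

def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def triple_helix_T(records):
    """Mutual information in three dimensions, T(u, i, g).

    `records` is a list of (u, i, g) tuples, one per publication, where
    each element is 1 if that sector (university, industry, government)
    appears in the address list. Negative T indicates synergy.
    """
    def H(project):
        return shannon_entropy(Counter(project(r) for r in records).values())
    Hu, Hi, Hg = H(lambda r: r[0]), H(lambda r: r[1]), H(lambda r: r[2])
    Hui = H(lambda r: (r[0], r[1]))
    Hug = H(lambda r: (r[0], r[2]))
    Hig = H(lambda r: (r[1], r[2]))
    Huig = H(lambda r: r)
    return Hu + Hi + Hg - Hui - Hug - Hig + Huig

# hypothetical publication records: (university, industry, government)
pubs = ([(1, 0, 0)] * 40 + [(0, 1, 0)] * 25 + [(0, 0, 1)] * 15
        + [(1, 1, 0)] * 10 + [(1, 0, 1)] * 6 + [(1, 1, 1)] * 4)
print(round(triple_helix_T(pubs), 3))
```

When the three dimensions are statistically independent, T is zero; the sign and magnitude of T on real address data are what the studies above track over time.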
This study investigates the role of Twitter in political deliberation and participation by analyzing the ways in which South Korean politicians use Twitter. In addition, the study examines the rise of Twitter as a user-generated communication system for political participation and deliberation by using Triple Helix indicators. For this, we considered five prominent politicians, each belonging to one of four political parties, using data collected in June 2010. The results suggest that non-mainstream, resource-deficient politicians are more likely to take advantage of Twitter's potential as an alternative means of political participation and that a small number of Twitter users lead political discourse in the Twittersphere. We also examined the occurrence and co-occurrence of politicians' names in Twitter posts, and then calculated entropy values for trilateral relationships. The results suggest that the level of political deliberation, expressed in terms of the level of balance in the communication system, is higher when politicians with different political orientations form the trilateral relationships. The importance of domestic technology transfer from the public sector (universities and public research institutes) to industry is increasing in the era of science-driven innovation. One of the purposes of a triple helix of evolving university-government-industry relations is to make use of universities and public research institutes for industrial development. After discussing the relation between a triple helix and technology transfer, this paper examines the means of domestic technology transfer and points out that spinning off companies is one ultimate way to transfer technology. The paper then presents a unique case of a public research institute in Japan from before the end of World War II. This research institute established 63 companies, such as Ricoh and Okamoto. At the same time, the institute excelled in science as well. 
The first two Nobel Prize laureates of Japan were researchers at this institute. The paper discusses the management of this institute and its group companies, and the enabling environment surrounding them at that time. At the end, the paper draws some lessons for public research institutes and their spin-off companies today. We trace the structural patterns of co-authorship between Korean researchers at three institutional types (university, government, and industry) and their international partners in terms of the mutual information generated in these relations. Data were collected from the Web of Science for the period 1968-2009. The traditional Triple-Helix indicator was modified to measure the evolving network of co-authorship relations. The results show that international co-authorship relations have varied considerably over time and with changes in government policies, but most relations have become stable since the early 2000s. In other words, the national publication system of Korea has gained some synergy from R&D internationalization during the 1990s, but the development seems to stagnate, particularly at the national level: whereas both university and industrial collaborations are internationalized, the cross-connection within Korea has steadily eroded. What factors influence the relationship between academic research and the knowledge-transfer activities of academics, in particular in 'catch-up' countries like South Korea? To address this research question, after first conducting a critical review of existing theoretical and empirical studies, we put forward a conceptual framework based on the twin concepts of 'synergy' and 'separation' modes, together with a number of accompanying hypotheses. These hypotheses, along with others that emerged from subsequent interviews, are then tested using various statistical models. 
After taking into account the specific characteristics of scientific communities in rapidly catching-up countries such as Korea, we find that not only are individual characteristics of academics (such as gender, age, discipline, and patenting activity) significantly related to the generation of a 'synergy mode' (i.e., a positive relationship between academic research and knowledge-transfer activities) among academics, but so too are a number of contextual characteristics (e.g., laboratory size and type of university). With the rapid development of the Internet, there is a need to evaluate the public visibility of universities on the Internet (i.e., web visibility) in terms of its implications for university management, planning, and governance. The data were collected in December 2010 by using Yahoo, one of the most widely used search engines. Specifically, we gathered "Single Mention" data to measure the number of times that each university was mentioned on websites. In addition, we collected network-based data on Single Mentions. We obtained another data set based on the 2010 world university rankings by Shanghai Jiao Tong University (SJTU). We employed several analytical methods for the analysis, including correlations, nonparametric tests (e.g., the Mann-Whitney test), and multidimensional scaling (MDS). The significant positive correlation between university rankings and web visibility suggests that indicators of web visibility can function as a proxy measure of conventional university rankings. Another distinctive implication can be drawn from the pattern of disparity in web visibility stemming from the linguistic divide: universities in English-speaking countries dominated the central positions in various network structures of web visibility, whereas those in non-English-speaking countries were located in the periphery of these structures. 
In this regard, further research linking web visibility to university management, planning, and governance is needed. The era of open and sustainable innovation has arrived and demands new kinds of human resources (HR) development at Korean universities. Typical academic and vocational education at universities does not work effectively in the age of technological convergence and open innovation. Knowledge and skills for green growth and rapid technological innovation demand highly skilful, broad, and complex competencies of HRs. This study outlines the competencies required for green growth and disruptive innovation and suggests various methods to increase these competencies at Korean universities. It explores the kinds of competencies a future society will need and suggests how universities can contribute to cultivating talent with multi-functional and high competencies. The author sketches the competence and skill structure in Korea, summarized as a value chain of competencies spanning HRs with high, medium, and low competencies. The author particularly addresses innovation-oriented fields such as engineering and chemistry/pharmaceuticals; therefore, the picture may differ from typical manufacturing sectors such as automobiles and shipbuilding. However, the manufacturing fields are also progressing into innovation-centred sectors. The author then explores the flow of HRs across levels and fields and how they affect the Korean innovation system. China's economy and technology have experienced spectacular growth since the Opening-up Policy adopted in 1978. In order to explore the innovation process and development of China, this study examines the inventive activities and the collaboration pattern of university, industry, and government (UIG) in China. This study analyzes Chinese patent data retrieved from the United States Patent and Trademark Office. 
Three models of UIG relations, representing different triple helix configurations, are introduced. According to the property of the patent assignee, patent ownership can be divided into three types: individuals; enterprises; and universities and research institutes. Furthermore, enterprises can be classified into state-owned enterprises (SOEs), private-owned enterprises (POEs), and foreign enterprises (FEs). The corresponding relationship of patent ownership with UIG is set up. An analysis of issue years shows that the inventive activities of China have experienced three developmental phases and have advanced quickly in recent years. The achievement of innovation activities in China falls primarily on enterprises, especially FEs and POEs. The innovation strengths of the three developmental phases have shifted from government, to universities and research institutes, and then to industry. Co-patent analysis shows that the collaboration between university and industry is the strongest and has intensified in recent years, but other forms of collaboration among UIG have been weak. In addition, an innovation relation model of China was set up. The evolution of the innovation system was explored, from an etatistic model, through an improved "laissez-faire" model, and then shifting toward a triple helix model. In this paper, the role of Chinese universities in enterprise-university research collaboration is investigated. This study focuses on a special aspect of this collaboration: co-authored articles. Two cases are analyzed: (1) research collaboration between Baosteel Group Corporation and Chinese universities; and (2) research collaboration between China Petroleum & Chemical Corporation and Chinese universities. The co-authorship data for the period 1998-2007 were retrieved from the CNKI database, the largest Chinese publication and citation database. 
The main findings are as follows: the number of articles co-authored by enterprise and university scientists has been increasing rapidly; the share of co-authored articles has been growing; authors from universities are more likely to be first authors; as a whole, enterprise-university co-authored articles tend to receive more citations and to be downloaded more frequently; and a mathematical orientation emerges in the enterprise-university articles. To reveal and describe this trend, the methods of keyword analysis and co-occurrence analysis are applied. The Chinese government's policy instruments and substantial support for pushing and improving enterprise-university research collaboration are introduced and analyzed. This paper provides a first-ever look at differences in centrality scores (i.e., networks) over time and across research specializations in Korea. This is a much needed development, given the variance that is effectively ignored when Science Citation Index (SCI) publications are aggregated. Three quantitative tests, OLS, two-sample t-tests, and unit-root tests, are used to establish the patterns of centrality scores across Korea over time. The unit-root test is particularly important, as it helps identify patterns of convergence in each region's centrality scores. For all geographic regions besides Seoul, Gyeonggi, and Daejeon, there appears to be little promise, at least in the immediate future, of becoming network hubs. For these top three regions, though, there is a pattern of convergence in three-quarters of all research specializations, which we attribute in part to policies of the mid- and late-1990s. This article examines the incentive structure underlying information transfers received by the three key players of the Triple Helix paradigm: universities, industry, and government research institutes (GRIs). 
For Korea and Taiwan, the cases under analysis here, such an empirical examination has not yet been conducted on a quantitative level. Using a unique dataset of survey responses from up to 325 researchers based in Korean and Taiwanese universities, industry, and GRIs, this article shows that there are some significant differences both between and within countries. Most importantly, policy interventions to promote university-industry-GRI interactions affect the degree to which specific information transfers are considered useful. In Korea, formal transfers are emphasized, while both formal and, in particular, informal transfers are emphasized in Taiwan. This study analyzed the research productivity of Saudi academics using the triple-helix model. In the analysis, we combined domestic and international collaboration by three sectors, university, industry, and government, according to the triple-helix model. This approach produces better results than simply including international collaboration as a fourth sector. According to the analysis, research collaboration in Saudi Arabia, as measured by the triple-helix indicator, showed negative uncertainty transmission (a negative T-value), while scientific productivity has been increasing dramatically since the late 2000s. The triple-helix collaboration does not differ much between domestic collaboration and combined "domestic and international" collaboration. In our further analysis, we found that technological development in Saudi Arabia was not based on scientific research; rather, it relies on prior technology (patent references). In this light, Saudi Arabia's current long-term strategic plan to develop a scientific base for a knowledge-based industry is well aligned with the country's current context. Field-normalized citation rates are well-established indicators of research performance, from the broadest aggregation levels, such as countries, down to institutes and research teams. 
When applied to still more specialized publication sets at the level of individual scientists, a more accurate delimitation is also required of the reference domain that provides the expectations against which a performance is compared. This necessity for sharper accuracy challenges standard methodology based on pre-defined subject categories. This paper proposes a way to define a reference domain that is more strongly delimited than in standard methodology, by building it up out of cells of the partition created by the pre-defined subject categories and their intersections. This partition approach can be applied to different existing field normalization variants. The resulting reference domain lies between those generated by standard field normalization and journal normalization. Examples based on fictive and real publication records illustrate how the potential impact on results can exceed or fall short of the effect of other currently debated normalization variants, depending on the case studied. The proposed Partition-based Field Normalization is expected to offer advantages in particular at the level of individual scientists and other very specific publication records, such as publication output from interdisciplinary research. (C) 2011 Elsevier Ltd. All rights reserved. A number of bibliometric studies point out that citation counts are a function of many variables besides scientific quality. In this paper, our aim is to investigate the factors that typically affect citation counts, using an extensive data set from the field of chemistry. The data set contains roughly 2000 manuscripts that were submitted to the journal Angewandte Chemie International Edition (AC-IE) as short communications, reviewed by external reviewers, and either published in AC-IE or, if not accepted for publication by AC-IE, published elsewhere. 
As the reviewers' ratings of the importance of the manuscripts' results are also available to us, we can examine the extent to which certain factors that previous studies demonstrated to be generally correlated with citation counts increase the impact of papers, controlling for the quality of the manuscripts (as measured by reviewers' ratings of the importance of the findings) in the statistical analysis. As the results show, besides being associated with quality, citation counts are correlated with the citation performance of the cited references, the language of the publishing journal, the chemical subfield, and the reputation of the authors. In this study no statistically significant correlation was found between citation counts and number of authors. (C) 2011 Elsevier Ltd. All rights reserved. Publication patterns of 79 forest scientists awarded major international forestry prizes during 1990-2010 were compared with the journal classification and ranking promoted as part of the 'Excellence in Research for Australia' (ERA) by the Australian Research Council. The data revealed that these scientists exhibited an elite publication performance during the decade before and two decades following their first major award. An analysis of their 1703 articles in 431 journals revealed substantial differences between the journal choices of these elite scientists and the ERA classification and ranking of journals. Implications from these findings are that additional cross-classifications should be added for many journals, and there should be an adjustment to the ranking of several journals relevant to the ERA Field of Research classified as 0705 Forestry Sciences. Crown Copyright (C) 2011 Published by Elsevier Ltd. All rights reserved. 
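The kind of analysis described in the chemistry study above, regressing citation impact on candidate factors while controlling for reviewer-rated quality, can be sketched with ordinary least squares on synthetic data. Every variable name and coefficient below is invented for illustration; the real study's factors and effect sizes are its own:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # roughly the size of the manuscript sample described above

quality = rng.normal(0, 1, n)        # reviewers' rating (proxy for quality)
ref_impact = rng.normal(0, 1, n)     # citation performance of cited references
reputation = 0.5 * quality + rng.normal(0, 1, n)  # correlated with quality

# synthetic log-citation counts: quality matters, but so do the other factors
log_cites = (1.0 + 0.8 * quality + 0.4 * ref_impact + 0.3 * reputation
             + rng.normal(0, 0.5, n))

# OLS with quality in the design matrix: the remaining coefficients estimate
# each factor's effect net of quality
X = np.column_stack([np.ones(n), quality, ref_impact, reputation])
beta, *_ = np.linalg.lstsq(X, log_cites, rcond=None)
print(np.round(beta, 2))
```

The point of including `quality` as a regressor is that any correlation between, say, author reputation and citations that merely reflects quality is absorbed by the quality coefficient, leaving the net effect.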
This article reports a comparative study of five measures that quantify the degree of research collaboration: the collaborative index, the degree of collaboration, the collaborative coefficient, the revised collaborative coefficient, and degree centrality. The empirical results showed that these measures all capture the notion of research collaboration, consistent with prior studies. Moreover, the results showed that degree centrality, the revised collaborative coefficient, and the degree of collaboration had the highest coefficient estimates on research productivity, the average JIF, and the average number of citations, respectively. Overall, this article suggests that the degree of collaboration and the revised collaborative coefficient are superior measures that future researchers can apply in bibliometric studies. (C) 2011 Elsevier Ltd. All rights reserved. The h-index can be written in such a way that it formally resembles an I3 score. Yet, this formal correspondence should not hide the fact that these indicators are fundamentally different. (C) 2011 Elsevier Ltd. All rights reserved. This paper presents an empirical analysis of two different methodologies for calculating national citation indicators: whole counts and fractionalised counts. The aim of our study is to investigate the effect on relative citation indicators when citations to documents are fractionalised among the authoring countries. We performed two analyses: a time series analysis of one country and a cross-sectional analysis of 23 countries. The results show that all countries' relative citation indicators are lower when fractionalised counting is used. Further, the difference between whole and fractionalised counts is generally greatest for the countries with the highest proportion of internationally co-authored articles. 
In our view there are strong arguments in favour of using fractionalised counts to calculate relative citation indexes at the national level, rather than whole counts, which are the most common practice today. (C) 2011 Elsevier Ltd. All rights reserved. In May 2011, the Bing Search API 2.0 became the only major international web search engine data source available for automatic offline processing in webometric research. This article describes its key features, contrasting them with previous web search data sources, and discusses the implications for webometric research. Overall, it seems that large-scale quantitative web research is possible with the Bing Search API 2.0, including query splitting, but that legal issues require the redesign of webometric software to ensure that all results obtained from Bing are displayed directly to the user. (C) 2011 Elsevier Ltd. All rights reserved. Recent years have seen concerted efforts by emerging countries to transform their industrial economies into post-industrial knowledge-based economies. The growth of science and technology is necessary to support this economic transformation strategy. Based on the concept of functionality development in a growth model, this study attempts to analyze the dynamism and sustainability of growth in science and technology in selected Asian emerging economies. Using the numbers of published papers and patents as proxies, bi-logistic growth functions were fitted to examine the prolongation ability of science and technology and the time at which each functionality development emerges. The perspective of a paradigm shift from industrial to knowledge-based economic development is taken into consideration in the analysis. 
The estimated prolongation ability of the newly industrialized economies (NIEs), including South Korea, Taiwan and Singapore, suggests a significant transformation of their innovation systems that led to a higher degree of functionality, while developing economies such as China, Malaysia and Thailand show no significant change over the years. The results suggest that the NIEs have succeeded in developing new growth trajectories that are beneficial for the transformation towards a knowledge-based economy. (C) 2011 Elsevier Ltd. All rights reserved. The study of informetric distributions, such as distributions of citations and impact factors, is one of the most relevant topics in current informetric research. Several laws for modeling impact factors based on ranks have been proposed, including Zipf, Lavalette, and the two-exponent law proposed by Mansilla et al. (2007). In this paper, the underlying probabilistic quantile function corresponding to Mansilla's two-exponent law is obtained. This result is particularly relevant, since it allows us to know the underlying population, to learn about all its features, and to use statistical inference procedures. Several probabilistic descriptive measures are obtained, including moments, Lorenz and Leimkuhler curves, and the Gini index. The distribution of the order statistics is derived. Least squares estimates are obtained. The different results are illustrated using data on the impact factors in eight relevant scientific fields. (C) 2011 Elsevier Ltd. All rights reserved. A variant of the h-index, named the stochastic h-index, is proposed. This new index is obtained by adding to the h-index the probability, under a specific stochastic model, that the h-index will increase by one or more within a given time interval. The stochastic h-index thus extends the h-index to the real line and has a direct interpretation as the distance to the next higher index value. 
We show how the stochastic h-index can be evaluated and compare it with other variants of the h-index that purportedly indicate the distance to a higher h-index. (C) 2011 Elsevier Ltd. All rights reserved. In the present paper we proceed from recent results by Gonzalez-Albo and Bordons (2011) obtained from studying the role and impact of proceedings literature in LIS journals. We extend the research to all fields of the sciences and social sciences and address additional research questions concerning publication frequency and citation impact, as well as their differences across individual journals and subject fields. (C) 2011 Elsevier Ltd. All rights reserved. We propose a method to identify the journals or proceedings that are most highly esteemed by a research group over some time frame. Using open publication databases, we identify the experts in the community, analyse their publication patterns, and then use these as a guideline for evaluating the scientific output of other groups of researchers publishing in the same domain. To illustrate the practicality of our method, we analyse the scientific output of Korean researchers in the security subject domain from 2004 to 2009, comparing this group's output with that of well-known researchers. Our empirical analysis demonstrates that there is a persistent gap between the publication impact of these two research groups over this period, although the absolute number of journal publications has greatly increased in recent years. (C) 2011 Elsevier Ltd. All rights reserved. We show that essentially local dynamics of citation networks carry special information about the relevance/quality of a paper. Up to some rescaling, they exhibit universal behavior in citation dynamics: temporal patterns are remarkably consistent across disciplines, and they suggest a method for predicting citations, at publication time, from the structure of references alone. 
Above-average cited papers universally focus extensively on their own recent subfield; as such, citation counts essentially select what may plausibly be considered the most disciplinary and normal science, whereas papers with peculiar dynamics, such as re-birthing scientific works ('rediscovered classics' or 'early birds'), are comparatively poorly cited, despite their plausible relevance for the underlying communities. The "rebirth index" that we propose to quantify this phenomenon may be used as a complementary quality-defining criterion, in addition to final citation counts. (C) 2011 Elsevier Ltd. All rights reserved. Citation numbers are extensively used for assessing the quality of scientific research. The use of raw citation counts is generally misleading, especially when applied to cross-disciplinary comparisons, since the average number of citations received depends strongly on the scientific discipline of the paper. Measuring and eliminating biases in citation patterns is crucial for a fair use of citation numbers. Several numerical indicators have been introduced with this aim, but so far no specific statistical test for estimating the fairness of these indicators has been developed. Here we present a statistical method aimed at estimating the effectiveness of numerical indicators in suppressing citation biases. The method is simple to implement and can easily be generalized to various scenarios. As a practical example, we test, in a controlled case, the fairness of the fractional citation count, which has recently been proposed as a tool for cross-discipline comparison. We show that this indicator is not able to remove biases in citation patterns and performs much worse than rescaling citation counts by average values. (C) 2011 Elsevier Ltd. All rights reserved. 
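The rescaling of citation counts by field averages, which this test favours over fractional counting, is straightforward to sketch. The field names and citation counts below are invented; field B is constructed to cite exactly three times as much as field A, so rescaling makes the two sets directly comparable:

```python
from statistics import mean

# hypothetical citation counts for papers in two fields with different
# citation cultures (field B cites roughly three times as much as field A)
fields = {
    "A": [2, 3, 5, 4, 6],
    "B": [6, 9, 15, 12, 18],
}

def rescaled(counts):
    """Normalize each paper's citations by its field's average."""
    m = mean(counts)
    return [c / m for c in counts]

norm = {f: rescaled(c) for f, c in fields.items()}
# after rescaling, both fields have mean 1.0, so papers can be ranked
# across fields without the systematic field-level bias
print(norm["A"], norm["B"])
```

With real data the two fields would of course not be exact multiples of one another, but the field means would still be equalized, which is the bias suppression the test measures.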
This study utilizes a panel regression model to explore the relationships between corporate performance and patent performance, measured by the patent H index, the current impact index (CII), and the essential patent index (EPI), in pharmaceutical companies. The results demonstrate that the patent H index and the EPI have positive influences upon corporate performance. Furthermore, this study developed a classification that divides pharmaceutical companies into four types, and provided some suggestions for them. (C) 2011 Elsevier Ltd. All rights reserved. Research topics and research communities are not disconnected from each other: communities and topics are interwoven and co-evolving. Yet, scientometric evaluations of topics and communities have been conducted independently and synchronically, with researchers often relying on a homogeneous unit of analysis, such as authors, journals, institutions, or topics. Therefore, new methods are warranted that examine the dynamic relationship between topics and communities. This paper examines how research topics are mixed and matched in evolving research communities by using a hybrid approach that integrates both topic identification and community detection techniques. Using a data set of information retrieval (IR) publications, two layers of enriched information are constructed and contrasted: one is the communities detected through the topology of the coauthorship network, and the other is the topics of those communities detected through the topic model. We find evidence to support the assumption that IR communities and topics are interwoven and co-evolving, and that topics can be used to understand the dynamics of community structures. We recommend the use of the hybrid approach to study the dynamic interactions of topics and communities. (C) 2011 Elsevier Ltd. All rights reserved. 
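The patent H index used in the panel-regression study above is the Hirsch index computed over a firm's patent portfolio rather than an author's papers: the largest h such that h patents have each received at least h citations. A minimal sketch, with invented citation counts:

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank  # the top `rank` items each have >= rank citations
        else:
            break
    return h

# hypothetical per-patent citation counts for a firm's portfolio:
# four patents have at least 4 citations each, so h = 4
print(h_index([10, 8, 5, 4, 3, 0]))
```

The same function applies unchanged to paper citation counts, which is what makes the index attractive as a common yardstick across publication and patent data.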
We investigated whether papers on Neglected Tropical Zoonoses are published in journals with lower impact factors than research on diseases with a similar global health burden. We found that, despite being cited equally often, the papers on Neglected Tropical Zoonoses were published in journals with lower impact factors. The scopes of these journals are mainly restricted to tropical medicine. A clustering analysis revealed that The Lancet, a high-impact general medical journal, does pay attention to Neglected Tropical Zoonoses. We discuss our findings in the context of the ongoing discussion about the publishing policies of medical journals. Moreover, our findings stress the importance of recent suggestions that impact factors should not be used for assigning public funding to research (programs) on Neglected Tropical Zoonoses. This paper attempts to identify the relationship between co-authorship and the currency of references and author self-citations in the key journals of environmental engineering. The results show that the self-citation rate of co-authored articles is higher than that of single-authored articles. A statistically significant correlation is identified between the number of co-authors, the author self-citing rate, and the author self-cited rate, though the correlation is low. The correlation coefficient between the number of co-authors and the author self-citing rate is slightly higher than that between the number of co-authors and the author self-cited rate, which indicates that the number of co-authors holds a stronger correlation with the self-citing rate than with the self-cited rate. Meanwhile, self-citing references are found to be more up-to-date than references to others. The range of publication years of self-citing references is smaller than that of references to others, indicating that researchers tend to preferentially cite their own recent works. 
There is no significant difference in the latest references between self-citing references and references to others. This might result from electronic journals, which provide easy access to the most current publications. This paper identifies the main references, authors and journals influencing the sustainable development literature. The task is accomplished by means of a citation analysis based on the records of ISI Web of Science. We found that the core of sustainability thinking is framed by a pattern of landmark studies published roughly every 5 years. Only 380 publications have been cited at least ten times. References with the highest influence are those with a global dimension and large diffusion, such as the Brundtland Commission's "Our common future" (1987) and classics such as Meadows et al.'s "Limits to growth" (1972). The list of the most influential references over the period 1960-2005 is dominated by contributions from economics (particularly ecological economics) and environmental science, but includes many other disciplines such as urban planning, political science and sociology. References are also made to policy documents such as "Agenda 21", one of the main outcomes of the Rio Summit in 1992. In analyzing citation trends, we found that classics, because of their high rates of citations per year, seem to have a more enduring and stable influence. Citations to published work are gaining increasing prominence in evaluations of the research performance of scientists. Considering the importance accorded to gender issues in South African science, it is surprising that (to our knowledge) no research has yet ascertained the extent of sex differences in citations to the published work of scientists in this country. 
Our literature study shows that studies conducted elsewhere tend to neglect important gender-related and other factors in their analyses, such as the sex composition of multi-authored papers and the extent of foreign co-authorship. Against this background, we illustrate the difficulties inherent in measuring the quality aspect of sex-specific research performance by means of an analysis of a dataset of articles (n = 229) that were published between 1990 and 2002 in the field of invasion ecology and in journals included in the Thomson Reuters Web of Science. Each article has at least one South African author address. The results indicate that foreign co-authorship is a better correlate of high citations than the sex of South African authors, and this is true irrespective of whether the annual citation rate or a citation window is used, whether or not self-citations are excluded, and whether or not the number of authors is controlled for by calculating fractional counts. The paper highlights these and other considerations that are relevant for future gender-focused bibliometric research, both in South Africa and beyond. We introduce an indicator to measure the diffusion of scientific research. Consistent with Stirling's 3-factor diversity model, the diffusion score captures not only variety and balance, but also disparity among citing article cohorts. We apply it to benchmark article samples from six 1995 Web of Science subject categories (SCs) to trace trends in knowledge diffusion over time since publication. Findings indicate that, for most SCs, diffusion scores steadily increase with time; mathematics is an outlier. We employ a typology of citation trends among benchmark SCs and correlate it with diffusion scores. We also find that self-cites do not, in most cases, significantly influence diffusion scores. Quantifying the relative performance of individual scholars has become an integral part of decision-making in research policy. 
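A diffusion score built on Stirling's 3-factor model (variety, balance, disparity) is commonly computed as the Rao-Stirling diversity index, Δ = Σ_{i≠j} d_ij · p_i · p_j, where p_i is the share of citations in category i and d_ij the disparity between categories. The paper's exact score may differ; this is a sketch of the standard integrated form, with toy proportions and a maximal-disparity matrix.

```python
def rao_stirling(proportions, disparity):
    """Rao-Stirling diversity: sum over pairs of distinct categories of
    disparity d_ij weighted by proportions p_i * p_j. Larger values mean
    citations are spread over more, more even, and more distant fields."""
    cats = list(proportions)
    total = 0.0
    for i in cats:
        for j in cats:
            if i != j:
                total += disparity[(i, j)] * proportions[i] * proportions[j]
    return total

# Toy example: citations to one article spread over three categories.
p = {"math": 0.2, "physics": 0.5, "biology": 0.3}
d = {(a, b): (0.0 if a == b else 1.0) for a in p for b in p}  # max disparity
score = rao_stirling(p, d)
```

With all disparities equal to 1, the index reduces to 1 − Σ p_i² (the Simpson/Gini-style balance term), here 1 − 0.38 = 0.62.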
The objective of the present study was to evaluate whether the scholarship rank of Brazilian Council for Scientific and Technological Development (CNPq) researchers in Medicine is consistent with their scientific productivity. The Lattes curricula of 411 researchers (2006-2008) were included in the study. Scholarship category was the variable of interest. Other variables analyzed were: time since receiving the doctorate, teaching activity (undergraduate, master's and doctoral students), number of articles published, and number of papers indexed in the Institute for Scientific Information (ISI) and Scopus databases. Additional performance indicators included were: citations, h-index, and m-index. There was a significant difference among scholarship categories regarding the number of papers per year, considering the entire scientific career (P < 0.001) or the last 5 years (P < 0.001). There was no significant difference among scholarship categories regarding the number of citations per article in the ISI (Thomson Reuters) database (P = 0.23). There was a significant difference in h-index among scholarship categories in both databases, ISI (P < 0.001) and Scopus (P < 0.001). Regarding the m-index, there was a significant difference among categories only in the ISI database (P = 0.012). According to our findings, a better instrument based on qualitative and quantitative indicators is needed to identify researchers with outstanding scientific output. In the competitive business environment, early identification of technological opportunities is crucial for technology strategy formulation and research and development planning. Previous studies have identified technological directions or areas of opportunity from a broad view, while few have researched ways to detect distinctive patents that can act as new technological opportunities at the individual patent level. 
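The h-index and m-index used as performance indicators above have compact definitions: h is the largest number such that the researcher has h papers with at least h citations each, and the m-quotient divides h by career length in years. The citation counts below are illustrative, not taken from the CNPq dataset.

```python
def h_index(citations):
    """h = largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank          # this paper still clears the threshold
        else:
            break             # counts are sorted, so no later paper can
    return h

def m_index(citations, career_years):
    """m-quotient: h-index divided by years since first publication."""
    return h_index(citations) / career_years
```

For example, a researcher with papers cited [10, 8, 5, 4, 3] times has h = 4 (four papers with at least 4 citations), and over an 8-year career an m-quotient of 0.5.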
This paper proposes a method of detecting new technological opportunities by using subject-action-object (SAO)-based semantic patent analysis and outlier detection. SAO structures are syntactically ordered sentences that can be automatically extracted by natural language processing of patent text; they explicitly show the structural relationships among technological components in a patent, and thus encode key findings of inventions and the expertise of inventors. The proposed method therefore allows quantification of structural dissimilarities among patents. We use outlier detection to identify unusual or distinctive patents in a given technology area; some of these outlier patents may represent new technological opportunities. The proposed method is illustrated using patents related to organic photovoltaic cells. We expect that this method can be incorporated into the research and development process for early identification of technological opportunities. Using the participation in peer-reviewed publications of all doctoral students in Quebec over the 2000-2007 period, this paper provides the first large-scale analysis of their research effort. It shows that PhD students contribute to about a third of the publication output of the province, with doctoral students in the natural and medical sciences being present in a higher proportion of published papers than their colleagues in the social sciences and humanities. Collaboration is an important component of this socialization: disciplines in which student collaboration is higher are also those in which doctoral students are the most involved in peer-reviewed publications. In terms of scientific impact, papers co-signed by doctoral students obtain significantly lower citation rates than other Quebec papers, except in the natural sciences and engineering. Finally, this paper shows that involving doctoral students in publications is positively linked with degree completion and a subsequent career in research. 
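The SAO-based dissimilarity and outlier-detection idea can be sketched by representing each patent as a set of (subject, action, object) triples, using Jaccard distance as the structural dissimilarity, and flagging patents whose mean distance to all others is unusually high. This is one plausible instantiation, not the authors' exact NLP pipeline or outlier method; the patent IDs and triples are invented.

```python
def jaccard_distance(a, b):
    """Dissimilarity between two patents represented as sets of
    (subject, action, object) triples extracted from their text."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def outlier_scores(patents):
    """Mean pairwise distance of each patent to all others; unusually
    high scores flag candidate 'outlier' (distinctive) patents."""
    names = list(patents)
    return {
        n: sum(jaccard_distance(patents[n], patents[m])
               for m in names if m != n) / (len(names) - 1)
        for n in names
    }

patents = {
    "P1": {("cell", "absorbs", "light"), ("layer", "conducts", "electron")},
    "P2": {("cell", "absorbs", "light"), ("layer", "blocks", "hole")},
    "P3": {("drone", "sprays", "crop")},  # structurally unlike the others
}
scores = outlier_scores(patents)
```

P3 shares no triples with P1 or P2, so it scores the maximum distance of 1.0 and would be surfaced for review as a possible new technological opportunity.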
To better understand the distribution of words across syntactic structures, this paper calculates the word distribution in the syntactic structures of both English and Chinese. On the basis of this calculation, the article defines a measure of words' syntactic distribution complexity. After arranging the Chinese and English words according to their syntactic distribution complexity, the results clearly attest to the Lotka phenomenon. This discovery reveals, on the one hand, a law of words' syntactic distribution in linguistic studies and, on the other, the statistically supported fact that the syntax of Chinese words is much more complex than that of English words, as shown by comparing the Lotka phenomenon in the syntactic distribution complexity of Chinese and English words. Two layers of enriched information are constructed for communities: a paper-to-paper network based on shared author relations and a paper-to-paper network based on shared word relations. k-means and VOSviewer, a modularity-based clustering technique, are used to identify publication clusters in the two networks. Results show that a few research topics, such as webometrics, bibliometric laws, and language processing, form their own research communities, while other research topics contain different research communities, which may be caused by physical distance. Companies should investigate possible patent infringement and cope with potential risks because patent litigation may have a tremendous financial impact. An important factor in identifying the possibility of patent infringement is the technological similarity among patents, so this paper considers technological similarity as a criterion for judging the possibility of infringement. 
Technological similarities can be measured by transforming patent documents into abstracted forms which contain specific technological key findings and the structural relationships among technological components in the invention. Although keyword-based technological similarity has been widely adopted in patent analysis research, it is inadequate for identifying patent infringement because a keyword vector cannot reflect specific technological key findings and structural relationships among technological components. As a remedy, this paper exploits a subject-action-object (SAO) based semantic technological similarity. An SAO structure explicitly describes the structural relationships among technological components in a patent, and the set of SAO structures can be considered a detailed picture of the inventor's expertise, that is, the specific key findings of the patent. Therefore, an SAO-based semantic technological similarity can identify patent infringement. Semantic similarity between SAO structures is measured automatically using a WordNet-based method, and the technological relationships among patents are mapped onto a 2-dimensional space using multidimensional scaling (MDS). Furthermore, a clustering algorithm is used to automatically suggest possible patent infringement cases, allowing large sets of patents to be handled with minimal effort by human experts. The proposed method is verified by detecting real patent infringement in prostate cancer treatment technology, and we expect this method to relieve human experts' work in identifying patent infringement. For a long time, rankings have been overused in evaluating Chinese universities' research performance. The relationship between research production and research quality has not been taken seriously in ranking systems. Most university rankings in China put more weight on research production than on research quality. 
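Setting the WordNet similarity and MDS steps aside, the final step, clustering patents into candidate infringement cases, can be sketched as grouping patents into connected components of the graph that links pairs whose semantic similarity exceeds a threshold. The similarity values, patent IDs, and threshold below are hypothetical; the paper's actual clustering algorithm is not specified here.

```python
def infringement_clusters(sim, names, threshold=0.7):
    """Group patents into connected components of the graph linking pairs
    whose pairwise semantic similarity is >= threshold. Each multi-patent
    cluster is a candidate set for closer infringement review."""
    parent = {n: n for n in names}          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)       # union similar patents
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Hypothetical SAO-based similarities between four patents.
sim = {("P1", "P2"): 0.85, ("P1", "P3"): 0.10,
       ("P2", "P3"): 0.15, ("P3", "P4"): 0.92}
clusters = infringement_clusters(sim, ["P1", "P2", "P3", "P4"])
```

Here P1/P2 and P3/P4 each exceed the threshold, so two candidate infringement pairs are suggested for expert review.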
Recently, the developmental strategy of Chinese universities has shifted from 'quantity' to 'quality'. Accordingly, a two-dimensional approach was developed in this article to balance 'quantity' and 'quality'. A research production index and a research quality index were produced to locate research universities (RU) from Mainland China, Hong Kong (HK) and Taiwan (TW) in a two-dimensional graph. Fifty-nine RU were classified into three categories according to their locations, which indicated their relative level of research performance. The University of Hong Kong, National Taiwan University, Tsing Hua University and Peking University appeared to be the leading universities in research performance. The results showed that the mainland universities generally had higher research production and lower research quality than the HK and TW universities, and suggested that the merger waves among Chinese universities enlarged their research production while also lowering research quality. Authority generally relates to expertise, recognition of the official status of a source, and the reputation of the author and publisher. As the Internet has become a ubiquitous tool in modern science and scholarly research, evaluating the authority of free online scholarly information is becoming crucial. However, few empirical studies have focused on this issue. Using a modified version of Jim Kapoun's "Five criteria for evaluating web pages" as a framework, this research selected 32 keywords from eight disciplines, entered them into three search engines (Google, Yahoo and AltaVista), and used the Analytic Hierarchy Process to determine the weights. The first batches of results (web pages) from keyword searching were selected as evaluation samples (in the two search phases, the first 50 and 10 results were chosen, respectively), and a total of 3,134 samples were evaluated for authority based on the evaluation framework. 
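The two-dimensional placement of universities can be sketched as a simple classifier over the production and quality indices. The category names and cut-offs below are illustrative assumptions; the paper's actual boundaries for its three categories are not reproduced here.

```python
def classify(production, quality, p_cut, q_cut):
    """Place a university in a two-dimensional production/quality graph.
    p_cut and q_cut are illustrative cut-offs, not the paper's values."""
    high_p = production >= p_cut
    high_q = quality >= q_cut
    if high_p and high_q:
        return "leading"       # strong on both dimensions
    if high_p or high_q:
        return "intermediate"  # strong on exactly one dimension
    return "lagging"           # weak on both dimensions
```

A mainland university with high production but low quality would land in the "intermediate" band, matching the abstract's observation that high output does not by itself indicate high research performance.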
The results show that the average authority value for free online scholarly information is about 3.63 (out of five), which falls in the "fair" level (3 ≤ Z < 4, where Z is the value assigned to each sample). About 41% of all samples collected provide more authoritative scholarly information. Different domain names, resource types, and disciplines of free online scholarly information perform differently when scored in terms of authority. In conclusion, the authority of free online scholarly information is unsatisfactory and needs to be improved. Furthermore, the evaluation framework and its application developed herein could be a useful instrument for librarians, researchers, students, and the public in selecting Internet resources. Nanotechnology is a promising research domain with enormous potential economic value. It is widely acknowledged that nanotechnology, as an emerging and rapidly evolving field of a multidisciplinary nature, involves closely proximate fields of science and technology. This study provides a further description of the relationship between science and technology at the macro level. The core objective of this paper is to qualify and assess the dynamic associations between scientific activity and technological output. We attempt to illustrate how science and technology relate to one another within an innovation system. In this paper, we take advantage of a simultaneous equations model to analyze the reciprocal dependence between science and technology; previous studies of the relationship between science and technology have rarely adopted this model. Our results show that there is no significant connection between R&D expenditures and actual research practice, in terms of publications and patents, for the universities in zones 1 and 2. 
Our results provoke questions about whether policy-makers should reallocate scientific and technological resources and other R&D expenditures so as to obtain an optimal allocation of resources and maximize scientific research and innovation performance. This article analyzes the relationship between the private and social value of patents, comparing discrete and cumulative innovation. Indicators of the social value of patents are known to be less correlated with measures of private value in technological fields where innovation is more cumulative. We test whether this is because the link between private and social value is weaker, or because the indicators are less informative of the underlying concepts of value. Furthermore, we analyze whether these differences between technological fields are really due to cumulativeness. We observe cumulative innovation by making use of databases of patents declared essential for technological standards. Using factor analysis and a set of patent quality indicators, we test the relevance of social value for predicting the private value of a patent as measured by renewal and litigation. Whereas we establish a robust and significant link for discrete technologies, neither common factors nor any indicator of social value allows prediction of the private value of essential, highly cumulative patents. Nevertheless, this result cannot be generalized to whole technological classes identified as "complex" by the literature. In our previous work (Scientometrics 87:293-301, 2011), a numerical model of over-competitive research funding in a "peer-group-assessed-grant-based funding system" was proposed and the process was first investigated quantitatively. The simulation results show that the mainstream of a very complicated research topic could obtain monopoly supremacy with only the aid of the mechanism the model describes. 
Here, the numbers of publications in cosmology back to 1950 are used to empirically test this positive feedback mechanism. The development of three main theories of cosmology, Big Bang, Steady State and Plasma Universe, is revisited. The latter two, which are non-mainstream opinions, both state in their peer-reviewed papers that their theories fit the phenomena that support the standard theory. The ratios of publications of the orthodox theory, Big Bang, approximately match the numerical results of our model. The reason for the discrepancy between the model and the actual situation is discussed, and a further question about the controversy is presented. In computer science, as opposed to many other disciplines, papers published in conference and workshop proceedings count as formal publications when evaluating the scholarship of an academic. We consider the relationship between high-quality journals and conferences in the computer vision (CV) subfield of computer science. We determined that 30% of papers in the top-3 CV journals base their work on top-3 conference papers by the same authors (which we call priors; see the "Methods" section for the definition of a prior). Journal papers with priors are cited significantly more than journal papers without priors, and the priors themselves are cited more than other papers from the conferences. For a period of 3-5 years after the journal paper's publication, the priors receive more citations than the follow-up journal paper; after that period, the journal paper starts receiving most of the citations. Furthermore, we found that having the prior conference paper did not make it any easier (faster) to publish in a journal. We also surveyed journal authors, and based on their answers and the priors analysis, we discovered that authors seem to be divided into different groups depending on their preferred method of publication. We combine two seemingly distinct perspectives regarding the modeling of network dynamics. 
One perspective is found in the work of physicists and mathematicians who formally introduced the small-world model and the mechanism of preferential attachment. The other perspective is sociological: it focuses on the process of cumulative advantage and considers the agency of individual actors in a network. We test hypotheses, based on work drawn from these perspectives, regarding the structure and dynamics of scientific collaboration networks. The data we use cover four scientific disciplines in the Slovene system of science. The results deal with the overall topology of these networks and the specific processes that generate them. The two perspectives can be joined to mutual benefit. Within this combined approach, the presence of small-world structures was confirmed. However, preferential attachment is far more complex than advocates of a single autonomous mechanism claim. The journal impact factor (JIF) has long been used for journal evaluation, but has also been accompanied by continuing controversy. In this study, a new indicator, the Journal's Integrated Impact Index (JIII), is proposed for journal evaluation. The JIII uses a journal's average citations per paper and total citations, together with all journals' average levels of average citations per paper and total citations, to characterize the integrated impact of journals. Some contrastive analyses were carried out between the JIII and the JIF. The results show some interesting properties of the new indicator, and also reveal some relevant relationships among the JIII, the JIF, and other bibliometric indicators. Visualization of subject structure based on co-word analysis is used to explore the concept network and developmental tendencies in a given field. There are many visualization methods for co-word analysis; however, the integration of results from different methods is rarely reported. This article addresses that knowledge gap. 
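The abstract names the four ingredients of the JIII (a journal's citations per paper and total citations, normalized against the all-journal averages of both) but not the formula itself. Purely as an illustration of how such an integrated index could combine those ingredients, one might average the two normalized ratios; the published JIII formula may well differ.

```python
def jiii_illustrative(cpp_j, tc_j, cpp_all, tc_all):
    """Illustrative combination ONLY, not the published JIII formula:
    normalise a journal's citations per paper (cpp_j) and total citations
    (tc_j) by the all-journal averages, then average the two ratios.
    A value of 1.0 means the journal sits at the all-journal average on
    both dimensions."""
    return 0.5 * (cpp_j / cpp_all + tc_j / tc_all)
```

A journal with twice the average citations per paper and twice the average total citations would score 2.0, capturing both per-paper impact (the JIF-like component) and overall volume of impact in one number.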
We compare three visualization methods: cluster trees, strategic diagrams and social network maps, and integrate their different results into one through a co-word analysis of medical informatics. The three visualization methods each have their own character: cluster trees show the subject structure, strategic diagrams reveal the importance of topic themes within the structure, and social network maps interpret the internal relationships among themes. Integrating the different visualization results into a single, more readable map lets the methods complement each other, and helps researchers grasp the concept network and developmental tendencies in a given field. The measurement of similarity between objects plays a role in several scientific areas. In this article, we deal with document-document similarity in a scientometric context. We compare experimentally, using a large dataset, first-order with second-order similarities with respect to the overall quality of partitions of the dataset, where the partitions are obtained by optimizing weighted modularity. The quality of a partition is defined in terms of textual coherence. The results show that the second-order approach consistently outperforms the first-order approach. Each difference between the two approaches in overall partition quality is significant at the 0.01 level. Although co-authorship in scientific research has a long history, the analysis of co-authorship networks to explore scientific collaboration among authors is a relatively new research area. Studies in the current literature about co-authorship networks mostly emphasize understanding patterns of scientific collaboration, capturing collaborative statistics, and proposing valid and reliable measures for identifying prominent authors. However, there is no study in the literature that conducts a longitudinal analysis of co-authorship networks. 
Using a dataset that spans over 20 years, this paper explores the efficiency and trends of co-authorship networks. Two scientists are considered connected if they have co-authored a paper, and these connections eventually constitute co-authorship networks. Co-authorship networks evolve among researchers over time in specific research domains as well as in interdisciplinary research areas. Scientists from diverse research areas and different geographical locations may participate in one specific co-authorship network, whereas an individual scientist may belong to several co-authorship networks. In this paper, we study a longitudinal co-authorship network of a specific scientific research area. By applying approaches for analyzing longitudinal network data, in addition to known methods and measures from the current co-authorship literature, we explore the co-authorship network of a relatively young and emerging research discipline to understand its evolution pattern and efficiency. In this article I introduce a new indicator that measures the presence of a higher education system in the Shanghai Jiao Tong Academic Ranking of World Universities (ARWU). First, the benefits of introducing such a measure and the drawbacks associated with the possible choices of indicator are discussed. To analyze the drawbacks, the sample of countries present in the ARWU is split into two groups with small and large shares of world GDP. A raw indicator based upon the sum of the scores of all the universities from a country divided by its share of world GDP shows a noticeable bias in favor of small countries, so a one-way between-groups analysis of variance is conducted to help cancel the bias. This leads to the introduction of a new aggregate indicator that can be computed in a very simple fashion. A discussion of the performance of higher education systems using this new indicator closes the paper. 
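The raw ARWU indicator described above is simple arithmetic: the sum of a country's university scores divided by that country's share of world GDP. The scores and GDP share below are invented, and the paper's bias-corrected aggregate indicator is not reproduced here.

```python
def raw_indicator(university_scores, gdp_share):
    """Raw indicator from the abstract: sum of the ARWU scores of a
    country's ranked universities divided by the country's share of
    world GDP (e.g. gdp_share = 0.04 for 4%). The abstract notes this
    raw form is biased in favor of small countries."""
    return sum(university_scores) / gdp_share

# Hypothetical country: two ranked universities, 4% of world GDP.
value = raw_indicator([50.0, 30.0], 0.04)
```

Dividing by a small GDP share inflates the quotient, which is exactly the small-country bias the abstract's analysis-of-variance correction is designed to cancel.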
Sturgeon species are among the most commercially valuable and most endangered groups of fish. To assess the literature published in the field of sturgeon research over the past 15 years (1996-2010), we applied a bibliometric approach to identify patterns and trends in the published research in this field. The analysis was performed on articles obtained from the ISI Web of Knowledge online database. The results revealed that although all 27 sturgeon species have been objects of research, species that are endangered or facing a high probability of extinction have received disproportionately less attention. White sturgeon (Acipenser transmontanus) was the most frequently studied species, but it was recently surpassed by Persian sturgeon (A. persicus). Early life phases have been among the central objects of the research, and genetics, especially the use of microsatellite DNA, is becoming increasingly popular and had the highest impact. Research related to aquaculture was prominent, while research related to hybrids (as a commodity of aquaculture production) was decreasing in popularity. Papers dealing with conservation issues most frequently focused on European sturgeon (A. sturio). A steady increase in the number of published articles over time was observed; however, the overall citation rate declined significantly over time. During the period reviewed, the sturgeon research published in peer-reviewed journals predominantly originated from the USA and the EU. Nevertheless, considering the current trend in output, it is very likely that the Asian countries, mainly Iran and China, will surpass them within the next 5-10 years. International and inter-institutional collaboration both tended to increase the impact of the research. Stimulation and improvement of international cooperation should be considered future priorities. 
This paper introduces a diffusion network model, an individual-citation-based directed network model with a time dimension, as a potentially useful approach to capturing the diffusion of research topics. The approach combines social network analysis, network visualization and citation analysis to discuss some of the issues concerning the spread of scientific ideas, tracing the process of knowledge diffusion from a network point of view. Using research on the h-index as a case study, we built detailed networks of individual publications and demonstrated the feasibility of applying the diffusion network model to the spread of a research topic. The model shows the specific paths and associations of individual papers, potentially complementing epidemic models, which primarily deal with average properties of entire scientific communities. Also, based on the citation network, the technique of main path analysis identified the articles that influenced the research over time and linked them into a research tradition that forms the backbone of the h-index field. Performance measures of individual scholars tend to ignore context. I introduce contextualised metrics: cardinal and ordinal pseudo-Shapley values that measure a scholar's contribution to (perhaps power over) her own school and her market value to other schools should she change job. I illustrate the proposed measures with business scholars and business schools in Ireland. Although conceptually superior, the power indicators imply a ranking of scholars within a school that is identical to the corresponding conventional performance measures. The market value indicators imply an identical ranking within schools and a very similar ranking between schools. The ordinal indices further contextualise performance measures and thus deviate further from the corresponding conventional indicators. As the ordinal measures are discontinuous by construction, a natural classification of scholars emerges. 
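Main path analysis, as used above to find the backbone of the h-index field, is usually run by weighting each citation edge with its Search Path Count (SPC: the number of source-to-sink paths traversing it) and then following high-weight edges. The sketch below implements the SPC weighting with a greedy forward traversal, which is one standard variant; the paper's exact variant is not specified, and the toy citation DAG is invented (an edge u -> v means knowledge flows from cited paper u to citing paper v).

```python
from collections import defaultdict

def main_path(edges):
    """SPC-weighted main path over a citation DAG given as (u, v) edges."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in nodes}
    queue, order = [n for n in nodes if indeg[n] == 0], []
    while queue:
        n = queue.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # n_in: paths from any source to n; n_out: paths from n to any sink.
    n_in = {n: (1 if not pred[n] else 0) for n in nodes}
    for n in order:
        for m in succ[n]:
            n_in[m] += n_in[n]
    n_out = {}
    for n in reversed(order):
        n_out[n] = 1 if not succ[n] else sum(n_out[m] for m in succ[n])
    spc = {(u, v): n_in[u] * n_out[v] for u, v in edges}
    # Greedy forward search from the strongest source edge.
    start = max((e for e in edges if not pred[e[0]]), key=spc.get)
    cur, path = start[1], [start[0], start[1]]
    while succ[cur]:
        cur = max(succ[cur], key=lambda m: spc[(cur, m)])
        path.append(cur)
    return path, spc

edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
path, spc = main_path(edges)
```

In the toy graph, edge C -> D carries both source-to-sink paths (SPC = 2), so paper C sits on the main path: it is the article through which the tradition flows.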
Averaged over schools, the market values offer little extra information over the corresponding production and impact measures. The ordinal power measure indicates the robustness or fragility of an institution's place in the rank order, and is only weakly correlated with the concentration of publications and citations. The development, current status and dynamics of research in biology-related domains in Venezuela are examined through a study of demographics, academic distribution, scientific output and productivity for two sets of investigators that fit a profile outlined for life sciences researchers or scientists. The first group corresponds to biologists drawn from the ranks of the official Program for the Promotion of Researchers (PPI); the other is drawn from those who publish in biologically oriented journals indexed by the Institute for Scientific Information (ISI). Both sets of biology scientists, PPI researchers and Web of Science/ISI scientists, show similar characteristics. The number (absolute and relative) of PPI members who are supposedly dedicated to biological research but do not publish in ISI-indexed journals was found to be very similar to the number of supposedly non-biologist members of the PPI Program who do publish biological articles in ISI-indexed journals. There is also an ongoing feminization of academic hierarchies: female biologists predominate in the lower academic ranks and in research cadres, making up as many as 70% in some areas of biology. This contrasts with the pattern of male predominance observed in the country during the second half of the twentieth century. The productivity of Venezuelan biologists seems to depend on gender; men are more productive than their female counterparts. From the bibliometric standpoint, it is found that, on average, 30% of all publications produced in the country are related to biology (or the life sciences). 
The Venezuelan biologists' network qualifies neither as a 'Small World' nor does it follow the 'Scale Free' model. Finally, in a country rich in renewable natural resources, the Venezuelan community of researchers in biology appears to be in decline, despite the fact that they constitute its most productive group of investigators. The main purpose of this study was to evaluate the global progress of, and quantitatively assess, current research trends in family therapy, using a bibliometric approach to explore the related literature in the Social Science Citation Index (SSCI) database from 1992 to 2009. This study used the bibliometric approach to identify the subject categories, core journals, top countries and leading research institutes in publication, the most frequently used author keywords, and the most frequently used KeyWords Plus. This study also used a "word cluster analysis" method to locate hot research topics. A majority of the subject categories were located in clinical psychology and family studies. The core journals for family therapy were the Journal of Marital and Family Therapy, Contemporary Family Therapy, and the Journal of Family Therapy. The US ranked as the top country for world articles with the highest h-index, followed distantly by the UK and Germany. The leading research institutes were Purdue University, the University of Miami, and Brigham Young University. "Adolescents" and "adolescent" were highly used words in article titles. The top three most frequently used author keywords were "anorexia nervosa", "adolescents", and "psychotherapy". Finally, the top three most frequently used KeyWords Plus were "psychotherapy", "children", and "intervention". Based on the "word cluster analysis" used to determine research hotspots, the hot research topics of family therapy fall into three categories: treated subjects, treated matters, and treatment issues. 
The research trend in family therapy seems to involve the therapist often treating adolescents or children for eating disorders, substance abuse, depression, or schizophrenia. During treatment or therapy, therapists and researchers must pay attention to the issues of gender, training, and therapeutic alliance. This study applies the entropy-based patent measure to explore the influences of related technological diversification (RTD) and unrelated technological diversification (UTD) upon technological competences and firm performance. The results show that RTD has a monotonically positive effect on technological competences and UTD has an inverse U-shaped effect on technological competences. Moreover, the results demonstrate that the positive influence of RTD upon technological competences is greater than that of UTD. If American pharmaceutical companies would like to adopt technological diversification, this study suggests that they undertake RTD rather than UTD. In addition, this study finds that technological competences mediate the relationship between firm performance and both RTD and UTD. Although RTD and UTD cannot significantly influence firm performance directly, they can positively affect firm performance indirectly through technological competences. Although the use of bibliometric indicators for evaluations in science is becoming increasingly widespread, little is known about how future publication success can be predicted from past publication success. Here, we investigated how the post-2000 publication success of 85 researchers in oncology could be predicted from their previous publication record. Our main findings are: (i) Rates of past achievement were better predictors than measures of cumulative achievement. 
(ii) A combination of authors' past productivity and the past citation rate of their average paper was most successful in predicting future publication success (R² ≈ 0.60). (iii) This combination of traditional bibliographic indicators clearly outperformed predictions based on the rate of the h index (R² between 0.37 and 0.52). We discuss implications of our findings for views on creativity and for science evaluation. Through an analysis of the problems of keywords and indexes used in co-word analysis, we find that the key to solving these problems is to integrate experts' knowledge into co-word analysis. Therefore, this paper proposes a new co-word analysis: semantic-based co-word analysis, which can integrate experts' knowledge into co-word analysis effectively. The method has been shown to perform well: it can solve the problems of keywords and indexes used in co-word analysis effectively and can improve the accuracy of co-word analysis. Using this method, the research field of "human intelligence network" in China has been analyzed. According to the results of the analysis, we point out that there are currently four research focuses in China: "methods and theories of human intelligence network", "human intelligence network", "competitive intelligence system (CIS for short)", and "the construction and visualization of human intelligence network". The findings of this study not only advance the state of co-word analysis research but also shed light on future research directions. This study analyzes the level of co-authorship of Spanish research in Library and Information Science (LIS) until 2009, the chronological development that has taken place, and the level of local, domestic and international cooperation. 
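The entropy-based diversification measure used in the pharmaceutical-patent study above is commonly computed as an entropy decomposition: total entropy over patent subclasses splits into a between-group (unrelated, UTD) and a within-group (related, RTD) component. A minimal sketch, assuming the standard Jacquemin-Berry decomposition with subclasses grouped by their section (the study's exact class scheme is not stated in the abstract):

```python
from collections import Counter
from math import log

def entropy_decomposition(patent_classes):
    """Entropy decomposition of technological diversification.
    `patent_classes` is a list of (section, subclass) pairs, one per
    patent. Returns (total, unrelated, related), where
    total = unrelated + related by construction."""
    n = len(patent_classes)
    sub = Counter(patent_classes)                 # counts per subclass
    grp = Counter(s for s, _ in patent_classes)   # counts per section
    # total entropy over subclass shares p_i: sum p_i * ln(1/p_i)
    total = sum((c / n) * log(n / c) for c in sub.values())
    # unrelated (between-group) entropy over section shares
    utd = sum((c / n) * log(n / c) for c in grp.values())
    rtd = total - utd                             # related (within-group)
    return total, utd, rtd

# toy portfolio: 6 patents spread over two sections
patents = [("A", "A01"), ("A", "A01"), ("A", "A61"),
           ("B", "B01"), ("B", "B01"), ("B", "B60")]
total, utd, rtd = entropy_decomposition(patents)
```

A firm active in many subclasses of one section scores high on RTD and low on UTD; spreading over many sections raises UTD.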
This bibliometric study was made using the data retrieved from the Web of Knowledge (WoK) following a dual strategy: on the one hand through the filter of the category Information Science & Library Science, and on the other hand through a subject search. In this way a significant number of works have been retrieved, some of which are in journals indexed in SCI or A&HCI and not in the SSCI. The results show a significant increase in all co-authorship, including publications in English and those involving international collaboration. As with the increase in Spanish participation in the social sciences (WoK), this growth, coupled with the significant increase in Spanish scientific production in the area of LIS, suggests that the discipline in Spain has entered a more mature phase, although so far it has focused particularly on bibliometric studies. Concerns that the growing competition for funding and citations might distort science are frequently discussed, but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavours negative results not only distorts the scientific literature directly, but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analysed over 4,600 papers published in all disciplines between 1990 and 2007, measuring the frequency of papers that, having declared that they had "tested" a hypothesis, reported positive support for it. The overall frequency of positive supports grew by over 22% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom). 
Methodological artefacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing. The first part of the paper deals with the assessment of international databases in relation to the number of historical publications (representation and relevance in comparison with the model database). The second part is focused on answering the question of whether historiography is governed by bibliometric rules similar to those of the exact sciences or whether it has its own specific character. The empirical basis for this part of the research was a database prepared ad hoc: the Citation Index of the History of Polish Media (CIHPM). Among the numerous typically historical features, the main focus was placed on: linguistic localism, the specific character of publishing forms, differences in the citing of various sources (contributions and syntheses) and the specific character of authorship (the Lorenz Curve and Lotka's Law). Slightly more attention was devoted to the half-life indicator and its role in a diachronic study of a scientific field; also, a new indicator (HL14), depicting the distribution of citations younger than the half-life, was introduced. Additionally, the comparison and correlation of selected parameters for the body of historical science (citations, HL14, the Hirsch Index, number of publications, volume and others) were also conducted. Here we study the relationship between journal quartile rankings of the ISI impact factor (for 2010) and journal classification into four impact classes, i.e., highest impact, medium highest impact, medium lowest impact, and lowest impact journals, in the subject category computer science artificial intelligence. To this aim, we use fuzzy maximum likelihood estimation clustering in order to identify groups of journals sharing similar characteristics in a multivariate indicator space. 
The seven variables used in this analysis are: (1) SCImago Journal Rank (SJR); (2) H-Index (H); (3) ISI impact factor (IF); (4) 5-Year Impact Factor (5IF); (5) Immediacy Index (II); (6) Eigenfactor Score (ES); and (7) Article Influence Score (AIS). The fuzzy clustering allows impact classes to overlap, thereby accommodating uncertainty about the impact-class attribution of a journal and vagueness in the definition of the impact classes. This paper demonstrates the complex relationship between quartiles of the ISI impact factor and journal impact classes in the multivariate indicator space. It also shows that several indicators should be used for a distinct analysis of structural changes in the score distribution of journals in a subject category. We propose that this can be performed in a multivariate indicator space using a fuzzy classifier. Traditional Chinese medicine (TCM), which is divided into three subfields, namely Chinese medicine, Chinese herb and acupuncture, attracts increasing attention due to its challenging and significant medical value. This study employs bibliometric analysis to examine the profile of publication activity in the TCM field as well as its subfields. The data are retrieved from the Science Citation Index Expanded database during 1980-2009, and 16,536 papers are identified for analysis. Generally speaking, the proportion of papers in the subfield of acupuncture decreased dramatically, while the proportions of papers in Chinese medicine and Chinese herb rose steadily. This study finds that East Asia has the largest number of TCM papers, followed by North America and Europe. Furthermore, while China is ranked first in terms of the amount of TCM publications, the USA gains the highest percentage of citations. As for regional specialty, scholars in East Asia publish intensively in Chinese medicine, while most of the scholars in North America and Europe probe into the study of acupuncture. 
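The fuzzy journal classification described above assigns each journal a degree of membership in every impact class rather than a single crisp label. A minimal fuzzy c-means sketch illustrates the idea (fuzzy c-means is a standard fuzzy clustering algorithm; the study's fuzzy maximum likelihood estimation variant uses a different distance model), run here on a toy two-indicator space:

```python
def fuzzy_cmeans(points, c=2, m=2.0, iters=50):
    """Minimal fuzzy c-means. Returns (centers, u) where u[i][k] is the
    membership of point i in cluster k; each membership row sums to 1."""
    n, d = len(points), len(points[0])
    # deterministic seeding: centers start at evenly spaced data points
    idx = [round(k * (n - 1) / (c - 1)) for k in range(c)] if c > 1 else [0]
    centers = [list(points[i]) for i in idx]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # membership update: inverse-distance weighting
        for i, p in enumerate(points):
            dist = [max(1e-12, sum((p[j] - centers[k][j]) ** 2
                                   for j in range(d)) ** 0.5)
                    for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
        # center update: membership-weighted means
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers[k] = [sum(w[i] * points[i][j] for i in range(n)) / tw
                          for j in range(d)]
    return centers, u

# toy "journals" in a 2-D indicator space: two well-separated groups
data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
        (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
centers, u = fuzzy_cmeans(data)
```

A journal near an impact-class boundary would receive comparable memberships in the adjacent classes, which is exactly the overlap the study exploits.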
In the last two decades, China overtook Japan in the subfields of both Chinese medicine and Chinese herb, while the US has always kept the largest share in acupuncture, with a marked upward trend. Regarding the top-ranked TCM institutions, the Chinese Academy of Sciences, located in China, is ranked first in the subfields of both Chinese medicine and Chinese herb. Kyung Hee University, located in South Korea, is ranked first in the number of acupuncture papers, and Harvard University is ranked first in the number of acupuncture citations. We argue that the creation of new knowledge is both difficult and rare. More specifically, we posit that the creation of new knowledge is dominated by a few key insights that challenge the way people think about an idea, generating high interest and use. We label this the blockbuster hypothesis. Using two large samples of published management studies over the period 1998-2007, we find support for the blockbuster hypothesis. We also find that numerous studies in the leading management journals are flops, having little impact on the profession as measured using citation data. Additional tests indicate that journal "quality" is related to the ratio of blockbusters to flops a journal publishes and that journal rankings are a poor proxy for study influence. Consistent with the notion that editorial boards are able to identify new knowledge, we find that research notes significantly under-perform both articles in the same journal and articles published in lower-ranked journals. Taken together, the results imply that only a few scientific studies, out of the thousands published in a given area, change or influence the boundaries of knowledge, with many appearing to have little impact on the frontiers of knowledge. Overall, this analysis indicates that the development of new knowledge is rare even though it appears to be recognizable to knowledge gatekeepers like journal editors. 
Research on aquaculture is expanding along with the exceptional growth of the sector and has an important role in supporting even further the future developments of this relatively young food production industry. In this paper we examined the aquaculture literature using bibliometrics and computational semantics methods (latent semantic analysis, topic model and co-citation analysis) to identify the main themes and trends in research. We analysed bibliographic information and abstracts of 14,308 scientific articles on aquaculture recorded in Scopus. Both the latent semantic analysis and the topic model indicate that the broad themes of research on aquaculture are related to genetics and reproduction, growth and physiology, farming systems and environment, nutrition, water quality, and health. The topic model gives an estimate of the relevance of these research themes by single articles, authors, research institutions, species and time. With the co-citation analysis it was possible to identify more specific research fronts, which are attracting a high number of co-citations from the scientific community. The largest research fronts are related to probiotics, benthic sediments, genomics, integrated aquaculture and water treatment. In terms of temporal evolution, some research fronts, such as probiotics, genomics, sea-lice, and environmental impacts from cage aquaculture, are still expanding, while others, such as mangroves and shrimp farming, and benthic sediments, are gradually losing ground. While bibliometric methods do not necessarily provide a measure of the output or impact of research activities, they proved useful for mapping a research area, identifying the relevance of themes in the scientific literature and understanding how research fronts evolve and interact. 
By using different methodological approaches the study takes advantage of the strengths of each method in mapping research on aquaculture, while showing possible limitations and some directions for further improvements. A survey of scientific periodical publications (or venues, as distinct from articles) from BRIC country practitioners counted more than 15,000 national publications. Data collected from and about Brazil, Russia, India, and China (BRIC countries) show that 495 venues, or about 3%, are listed in the Science Citation Index Expanded (SCIE) in 2010. Contrary to our expectation of overall under-representation and limited SCIE coverage, the average percentage of SCIE-listed venues for the BRICs is about the same as that for advanced countries. China has the lowest representation of national venues in SCIE, at 2% of all publications; Russia has the highest, at about 8%. India has about 6% of venues in SCIE; Brazil has about 4%. In other words, SCIE includes about the same percentage of high-quality science from these four countries as for North America and Europe, meaning that these countries are not under-represented in SCIE. Moreover, the number of national venues available as outlets suggests that national scientists in these countries have good access to publications and venues. Some of the BRIC national publications are difficult to "see" at the global level because of language barriers, diverse publication formats, and lack of digitization. Other national differences represent historical traditions surrounding publication. The understanding of scientific knowledge itself may promote further advances in science, and research on the organization of knowledge may be a first step in this effort. This stream of research, however, has been mainly driven by the analysis of citation networks. 
This study uses, as an alternative knowledge element, information on the keywords of papers published in business research and examines how they are associated with each other to constitute a body of scientific knowledge. The results show that, unlike most citation networks, keyword networks are not small-world networks but, rather, locally clustered scale-free networks with a hierarchic structure. These structural patterns are robust against the scope of the scientific fields involved. In addition, this paper discusses the origins and implications of the identified structural characteristics of keyword networks. A critical part of a scientific activity is to discern how a new idea is related to what we know and what may become possible. As new scientific publications arrive at a rate that rapidly outpaces our capacity for reading, analyzing, and synthesizing scientific knowledge, we need to augment ourselves with information that can effectively guide us through the rapidly growing intellectual space. In this article, we address a fundamental issue concerning what kinds of information may serve as early signs of potentially valuable ideas. In particular, we are interested in information that is routinely available and derivable upon the publication of a scientific paper, without assuming the availability of additional information such as its usage and citations. We propose a theoretical and computational model that predicts the potential of a scientific publication in terms of the degree to which it alters the intellectual structure of the state of the art. The structural variation approach focuses on the novel boundary-spanning connections introduced by a new article to the intellectual space. 
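Classifying a network such as the keyword networks above as small-world (high local clustering) or scale-free (heavy-tailed degree distribution) starts from two easily computed quantities: the average clustering coefficient and the degree histogram. A toy sketch on an undirected co-occurrence graph:

```python
from collections import defaultdict

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph
    given as {node: set(neighbours)}."""
    coeffs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # count links among the neighbours of v (each pair once)
        links = sum(1 for x in nbrs for y in nbrs if x < y and y in adj[x])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

def degree_histogram(adj):
    """Map degree -> number of nodes with that degree."""
    hist = defaultdict(int)
    for nbrs in adj.values():
        hist[len(nbrs)] += 1
    return dict(hist)

# toy keyword co-occurrence network: a triangle plus one pendant node
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = defaultdict(set)
for s, t in edges:
    adj[s].add(t)
    adj[t].add(s)
cc = clustering_coefficient(adj)
hist = degree_histogram(adj)
```

In practice one compares the clustering coefficient against an equivalent random graph (small-world test) and fits a power law to the degree histogram (scale-free test).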
We validate the role of boundary-spanning in predicting future citations using three metrics of structural variation, namely modularity change rate, cluster linkage, and Centrality Divergence, along with more commonly studied predictors of citations such as the number of coauthors, the number of cited references, and the number of pages. Main effects of these factors are estimated for five cases using zero-inflated negative binomial regression models of citation counts. Key findings indicate that (a) structural variations measured by cluster linkage are a better predictor of citation counts than are more commonly studied variables such as the number of references cited, (b) the number of coauthors and the number of references are both good predictors of global citation counts, though to a lesser extent, and (c) the Centrality Divergence metric is potentially valuable for detecting boundary-spanning activities at interdisciplinary levels. The structural variation approach offers a new way to monitor and discern the potential of newly published papers in context. The boundary-spanning mechanism offers a conceptually simplified and unifying explanation of the roles played by commonly studied extrinsic properties of a publication in the study of citation behavior. We conducted a fine-grained prosopography of six distinguished information scientists to explore commonalities and differences in their approaches to scholarly production at different stages of their careers. Specifically, we gathered data on authors' genre preferences, rates and modes of scholarly production, and coauthorship patterns. We also explored the role played by gender and place in determining mentoring and collaboration practices across time. Our biobibliometric profiles of the sextet reveal the different shapes a scholar's career can take. We consider the implications of our findings for new entrants into the academic marketplace. 
Few studies have been done concerning document components and their effects on information use. This research empirically tested a taxonomy of functional units in a prototype journal system. This taxonomy was developed by identifying the functions of the smallest information units within four journal article components (i.e., introduction, methods, results, discussion), and their associations with the information tasks involved in using journal articles. Experimental results show that functional units can be utilized to support navigation, close reading, comprehension, and information use of journal articles to various extents. The results provide evidence that an individual functional unit has varying relevance to information use tasks, and varying relevance to other functional units in the same or another component for a particular task. This research suggests that the information within a journal article can be organized and presented by function to enhance effectiveness and efficiency in the reading process and reading outcome. Domestic citation of papers from the same country and the greater citation impact of documents involving international collaboration are two phenomena that have been extensively studied and contrasted. Here, however, we show that it is not so much a matter of national bias as that papers have a greater impact on their immediate environments, an impact that is diluted as that environment grows. For this reason, the greatest biases are observed in countries with a limited production. Papers that involve international collaboration have a greater impact in general, on the one hand because they have multiple "immediate environments," and on the other because of their greater quality or prestige. In short, one can say that science knows no frontiers. 
Certainly there is a greater impact on the authors' immediate environment, but this does not necessarily have to coincide with their national environments, which fade in importance as the collaborative environment expands. Bioinformatics journals publish research findings of intellectual synergies among subfields such as biology, mathematics, and computer science. The objective of this study is to characterize the citation patterns in bioinformatics journals and their corresponding knowledge subfields. Our study analyzed bibliometric data (impact factor, cited half-life, and references per article) of bioinformatics journals and their related subfields collected from the Journal Citation Reports (JCR). The findings showed that bioinformatics journals' citations are field-dependent, with scattered patterns in article life span and citing propensity. Bioinformatics journals originally derived from biology-related subfields have shorter article life spans, more citing on average, and higher impact factors. Journals derived from mathematics and statistics demonstrate the converse citation patterns. Journal impact factors were normalized, taking into account the effects of article life spans and citing propensity. A comparison of these normalized factors to JCR journal impact factors showed rearrangements in the ranking order of a number of individual journals, but a high overall correlation with JCR impact factors. Multivariate linear regression models suggest a trade-off in allocations of national research and development (R&D). Government funding and spending in the higher education sector encourage publications as a long-term research benefit. Conversely, other components such as industrial funding and spending in the business sector encourage patenting. Our results help explain why the United States trails the European Union in publications: the focus in the United States is on industrial funding, some 70% of its total R&D investment. 
Likewise, our results also help explain why the European Union trails the United States in patenting, since its focus on government funding is less effective than industrial funding in predicting triadic patenting. Government funding contributes negatively to patenting in a multiple regression, and this relationship is significant in the case of triadic patenting. We provide new forecasts about the relationships of the United States, the European Union, and China in publishing; these results suggest much later dates for changes than previous forecasts because Chinese growth has been slowing down since 2003. Models for individual countries might be more successful than regression models whose parameters are averaged over a set of countries, because nations can be expected to differ historically in terms of institutional arrangements and funding schemes. The rapid increase in global competition demands increased protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by utilizing the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents, along with the retrieval score. 
An extensive set of empirical results carried out on a large-scale, real-world dataset shows that utilizing 20 or 30 query terms extracted from all fields of an original query patent according to their log(tf)·idf values helps form a representative search query out of the query patent and is found to be more effective than using any number of query terms from any single field. It is shown that combining terms extracted from different fields of the query patent, giving higher importance to terms extracted from the abstract, claims, and description fields than to terms extracted from the title field, is more effective than treating all extracted terms equally while forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and retrieved patents is shown to be beneficial in improving the effectiveness of the prior art search. This study enhances main path analysis by proposing several variants of the original approach. Main path analysis is a bibliometric method capable of tracing the most significant paths in a citation network and is commonly used to trace the development trajectory of a research field. We highlight several limitations of the original main path analysis and suggest new, complementary approaches to overcome these limitations. In contrast to the original local main path, the new approaches generate the global main path, the backward local main path, multiple main paths, and key-route main paths. Each of them is obtained via a perspective different from the original approach. By simultaneously conducting the new, complementary approaches, one uncovers the key developments of the target discipline from a broader view. To demonstrate the value of these new approaches, we simultaneously apply them to a set of academic articles related to the Hirsch index. The results show that the integrated approach discovers several paths that are not captured by the original approach. 
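The log(tf)·idf term selection used in the prior-art query-generation study above can be sketched as follows. This is a hedged illustration, not the original system: the exact weighting and tokenization may differ, and the 1 + log(tf) form used here is just one common log-tf variant:

```python
from collections import Counter
from math import log

def top_query_terms(query_doc, collection, k=20):
    """Rank the terms of a query patent by log(tf)*idf against a
    collection of documents (each a list of tokens) and keep the
    top-k as the search query."""
    n = len(collection)
    df = Counter()                    # document frequency per term
    for doc in collection:
        df.update(set(doc))
    tf = Counter(query_doc)           # term frequency in the query patent
    scores = {t: (1 + log(tf[t])) * log(n / df[t])
              for t in tf if df[t]}   # skip terms unseen in the collection
    # sort by score (descending), then alphabetically for stable ties
    return [t for t, _ in sorted(scores.items(),
                                 key=lambda x: (-x[1], x[0]))][:k]

# toy collection of tokenized patents and one tokenized query patent
collection = [["valve", "fluid", "pump"],
              ["valve", "seal"],
              ["battery", "anode", "cathode"]]
query = ["valve", "valve", "seal", "fluid", "novel"]
terms = top_query_terms(query, collection, k=3)
```

In the study's setting, tokens from the abstract, claims, and description fields would additionally be weighted more heavily than title tokens before ranking.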
Among these new approaches, the key-route approach is especially useful and hints at a divergence-convergence-divergence structure in the development of the Hirsch index. In Web 2.0 environments, people commonly share their knowledge and personal experiences with others, but little is known about their background characteristics and motivations. Thus, the current study examines some of the characteristics and motivations common among answerers, who produce health-related answers to questions asked by anonymous others on a social Q&A site, Yahoo! Answers. An online survey questionnaire was distributed to top and recent answerers to investigate their demographics, areas of health expertise, and other characteristics related to online answering behaviors. Also, 10 motivation factors are proposed and tested in the survey: enjoyment, efficacy, learning, personal gain, altruism, community interest, social engagement, empathy, reputation, and reciprocity. Findings show that altruism is the most influential motivation, while personal gain is the least. Enjoyment and efficacy are more influential than other social motivations, such as reputation or reciprocity, although there are some variations across different groups of answerers. Motivational factors among top answerers and health experts are further analyzed. The findings of this study have practical implications for encouraging health answerers to share knowledge and experiences in social contexts. Furthermore, the design of the current study can be used to examine the motivations of answerers in other topic areas as well as other social contexts. This paper presents the results of a large-scale, qualitative study, conducted in the homes of children aged 7, 9, and 11, investigating Internet searching processes on Google. 
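Main path analysis, as discussed above, typically weights each citation link by a traversal count such as the Search Path Count (SPC) and then traces a path through the highest-weight links. A small sketch of SPC and the greedy local main path, under the assumption that edges run from cited (earlier) to citing (later) papers:

```python
from functools import lru_cache

def spc_main_path(edges):
    """Search Path Count (SPC) weights and a greedy local main path for
    a citation DAG whose edges run from cited to citing papers."""
    nodes = {n for e in edges for n in e}
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def n_source(n):  # number of paths reaching n from any source
        return 1 if not pred[n] else sum(n_source(p) for p in pred[n])

    @lru_cache(maxsize=None)
    def n_sink(n):    # number of paths from n to any sink
        return 1 if not succ[n] else sum(n_sink(s) for s in succ[n])

    # SPC of edge (u, v): source-to-u paths times v-to-sink paths
    spc = {(u, v): n_source(u) * n_sink(v) for u, v in edges}
    # start at the source edge with the highest SPC, then extend greedily
    u, v = max((e for e in spc if not pred[e[0]]),
               key=lambda e: (spc[e], e))
    path = [u, v]
    while succ[path[-1]]:
        nxt = max(succ[path[-1]], key=lambda s: (spc[(path[-1], s)], s))
        path.append(nxt)
    return path, spc

# toy citation network: A is the oldest paper, E the newest
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
path, spc = spc_main_path(edges)
```

The study's global and key-route variants differ in how the weighted edges are assembled into a path (maximizing total weight, or forcing the path through the top-weight edges) rather than in how SPC itself is computed.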
Seven search roles, representing distinct behavior patterns displayed by children when interacting with the Google search engine, are described: Developing Searchers, Domain-specific Searchers, Power Searchers, Nonmotivated Searchers, Distracted Searchers, Rule-bound Searchers, and Visual Searchers. Other trends are described and selected to present a view of the whole child as searcher. These roles and trends are used to make recommendations to designers, researchers, educators, and parents about the directions to take when considering how best to help children become search-literate. This study replicates a previous study based on work in psychology, which demonstrates that students who score as below proficient in information literacy (IL) skills have a miscalibrated self-view of their ability. Simply stated, these students tend to believe that they have above-average IL skills when, in fact, an objective test of their ability indicates that they are below proficient in terms of their actual skills. This investigation was part of an Institute of Museum and Library Services-funded project and includes demographic data about participants, their scores on an objective test of their information literacy skills, and self-estimates of their ability. Findings support previous research indicating that many students come to college without proficient IL skills, that students with below-proficient IL skills have inflated views of their ability, and that this miscalibration can also be expressed by students who test as proficient. Implications for research and practice are discussed. The user-centered approach to information retrieval emphasizes the importance of a user model in determining what information will be most useful to a particular user, given their context. Mediated search provides an opportunity to elaborate on this idea, as an intermediary's elicitations reveal what aspects of the user model they think are worth inquiring about. 
However, empirical evidence is divided over whether intermediaries actually work to develop a broadly conceived user model. Our research revisits the issue in a web research services setting, whose characteristics are expected to result in more thorough user modeling on the part of intermediaries. Our empirical study confirms that intermediaries engage in rich user modeling. While intermediaries behave differently across settings, our interpretation is that the underlying user model characteristics that intermediaries inquire about in our setting are applicable to other settings as well. Online communication is an indispensable tool for communication and management. The network structure of communication is considered to affect team and individual performance, but this has not been empirically tested. In this article, we collected a set of 1-month e-mail logs of a company and conducted an e-mail network analysis. We calculated the network centralities of 72 managerial candidates, and investigated the relationship between positions in the network and leadership performance with partial least squares structural equation modeling. Betweenness and in-degree network centralities of those middle managers are correlated with their leadership performance; on the other hand, for this management group, out-degree has no correlation, and PageRank is a negative indicator of leadership. Leaders with high performance are trusted in their communities as hubs of the information channel of the communication network. Since the end of World War II, the English language has become the lingua franca of science publications worldwide. Science publications written in other languages do not gain the same exposure to the international scientific community as does material in English. In this sense, non-English articles constitute an "invisible science" for the rest of the scientific world. 
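The centrality measures in the e-mail study above (in-degree, PageRank) can be computed directly from an edge list. A minimal power-iteration PageRank sketch on a toy who-wrote-to-whom network:

```python
def pagerank(edges, d=0.85, iters=100):
    """Plain power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = d * pr[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:                        # dangling node: spread evenly
                for v in nodes:
                    nxt[v] += d * pr[u] / n
        pr = nxt
    return pr

# toy e-mail network: (sender, recipient) pairs
mails = [("ann", "bob"), ("carol", "bob"), ("dave", "bob"), ("bob", "ann")]
pr = pagerank(mails)
indegree = {v: sum(1 for _, t in mails if t == v)
            for v in {n for e in mails for n in e}}
```

Betweenness centrality, the study's other main predictor, requires shortest-path counting (e.g., Brandes' algorithm) and is omitted here for brevity.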
This study compares publications indexed in the academic-oriented Hebrew Index of Periodicals (IHP) database with those in the Science Citation Index Expanded (SCIE) in order to document the amount of scientific material published in Israel, where Hebrew is the native language. Except for abstracts, which are sometimes given in English as well as Hebrew and therefore provide some idea of a paper's content, most of this research remains hidden from the international scientific community. The SCIE and IHP databases for our examination cover the three grand disciplines: the exact and life sciences, the social sciences, and the humanities. Additionally, the study probes the coverage of medical publications in the two databases. The difference between old and emerging disciplines in the use of a language other than Hebrew is observed, and non-English citation patterns for various disciplines are examined. The results confirm the dominance of English as the lingua franca of science and point to the large number of scientific studies in Hebrew that lack international exposure. The article attempts to assess the results of the independent development of the CIS countries in the field of science over the period 1990-2009. The analysis of numerous scientometric indicators reveals a decrease in the number of expert researchers and a significant decrease in scientific and technical output. The article also provides information about the dynamics of a set of indicators which makes it possible to draw conclusions about the effectiveness of research activity in the CIS countries. This article analyses scientific growth time series using data for Spanish doctoral theses from 1848 to 2009, retrieved from national databases and an in-depth archive search. Data are classified into subseries by historical periods. 
The analytical techniques employed range from visual analysis of deterministic graphs to curve-fitting with exponential smoothing and AutoRegressive Integrated Moving Average models. Forecasts are made using the best model. The main finding is that Spanish output of doctoral theses appears to fit a quasi-logistic growth model in line with Price's predictions. An additional control variable, termed year-on-year General Welfare, is shown to modulate scientific growth, especially in the historical period from 1899 to 1939. Except for papers whose authors are listed alphabetically, the citations of multi-authored papers are allocated to the authors based on their contributions to the paper. For papers that do not specify contribution proportions, a function of the number of authors and of author rank is presented to determine the proportion of credit and the citations allocated to each author. Our citation allocation scheme lies between equal fractional counting and counting based on the inverse of author rank. It has a parameter to adjust the credit distribution among the different authors. The allocated citations can either be used alone to indicate one's performance in a paper, or can be applied in modifications of the h-index and g-index to represent the achievement of a scientist as a whole. The modified h-index and g-index of an author make use of more papers in which he or she played important roles. Our method is suitable for papers with a wide range of author numbers. A bibliometric analysis was performed in this work to determine research trends in oxidative stress publications published between 1991 and 2010 in journals of all the subject categories of the Science Citation Index. Publication trends were analyzed from the retrieved results by publication type and language, characteristics of article outputs, country, subject categories and journals, and the frequency of title-words and keywords used. 
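The citation-allocation abstract above does not give the exact functional form, but a scheme that interpolates between equal fractional counting and inverse-rank counting can be sketched with a single tuning parameter theta (theta = 0 reproduces equal shares, theta = 1 the pure inverse-rank weights). This is an illustrative assumption for exposition, not the authors' published formula:

```python
def author_credit(n_authors, theta=0.5):
    """Share of a paper's citations allocated to each author by rank.

    theta = 0 gives equal fractional counting (1/n each);
    theta = 1 gives weights proportional to the inverse of author rank.
    Intermediate theta interpolates between the two counting schemes.
    """
    raw = [1.0 / (rank ** theta) for rank in range(1, n_authors + 1)]
    total = sum(raw)
    return [w / total for w in raw]
```

Multiplying these shares by a paper's citation count gives per-author allocated citations, which could then feed a modified h-index or g-index of the kind the abstract describes.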
Over the years, there was significant growth in article output, with more countries participating and collaborating. The seven major industrialized countries (G7) published the majority of the world's articles, while the USA contributed about one-third of the total. Chinese and Indian outputs grew much faster than those of other countries in the past 5 years. Oxidative stress research in food- and environment-related fields gradually became the mainstream of the research. An analysis of the title-words, author keywords and KeyWords Plus showed that antioxidants in human or rat cells were the hot topic in the field. In addition, "reactive oxygen species", "apoptosis", and "nitric oxide" were major topics of oxidative stress research recently. More articles dealt with diseases that have a strong relationship with oxidative stress, such as inflammation, Alzheimer's disease, diabetes, and atherosclerosis. Since China adopted its Reform and Opening-Up Policy for global collaboration, China's science and technology have experienced astounding growth. Papers and patents encompass valuable scientific and technological (S&T) information and collaborative efforts. This article studies China's international S&T collaboration from the perspective of paper and patent analysis. The results show that China's total papers and patents continuously increased from 2004 to 2008, and the papers and patents resulting from China's international collaboration also show steady growth. However, there is a decline, within a certain range, in the share of internationally collaborative papers and patents, due to rapid independent R&D. China's international scientific collaboration (ISC) is broadly distributed over many countries, the USA being the most important ISC partner. China's international technological collaboration (ITC) is mainly carried out with the USA and Taiwan, and Taiwan has been the most significant ITC partner when taking countries' patent output into account. 
Moreover, ISC is associated with a continuous rise in the citation impact of Chinese papers. Even countries with a small number of papers and little ISC with China exert a positive influence on the citation impact of Chinese papers. However, ITC does not always play an active role in improving the citation impact of Chinese patents. Many forms of technology cycle models have been developed and utilized to identify new and convergent technologies and forecast social changes, and among these, the technology hype cycle introduced by Gartner has become established as an effective method that is widely utilized in the field. Despite the popularity of this commonly deployed model, however, the existing research literature fails to provide sufficient consideration of its theoretical frame or its empirical verification. This paper presents a new method for the empirical measurement of this hype cycle model. In particular, it presents a method for measuring the hype of users, rather than the hype cycle generated by research activities or by the media, by analyzing the hype cycle using search traffic analysis. The analytical results derived from the case study of hybrid automobiles empirically demonstrate that, following the introductory stage and the early growth stage of the life cycle, the positive hype curve and the negative hype curve, the representative figures of the hype cycle, were present in the bell curve of the users' search behavior. Based on this finding, this paper proposes a new method for measuring users' expectations and suggests a new direction for future research that enables the forecasting of promising technologies and technological opportunities in linkage with the conventional technology life cycle model. 
In particular, by interpreting the empirical results using the consumer behavior model and the adoption model, this study empirically demonstrates that the characteristics of each user category can be identified through differences in the hype cycle in the process of the diffusion of new technological products discussed in the past. The editorial handling of articles in scientific journals as a human activity process is considered. Using recently proposed approaches from human dynamics theory, we examine the probability distributions of random variables reflecting the temporal characteristics of the studied processes. The first part of this article contains our analysis of real data about articles published in scientific journals. The second part is devoted to modeling of time series connected with editorial work. The purpose of our study is to present a new object that can be studied in terms of human dynamics theory and to corroborate the scientometric application of the results obtained. Recent studies have suggested that a causal link exists between the reputation of an institution and subsequent demand indicators. However, it is unclear how these effects vary across institutional characteristics or whether they persist when considering other factors that affect demand outcomes. On the other hand, student demand studies have almost always focused on the demand side of the equilibrium but not the supply side, although both demand and supply equations relate quantity to price. Although supply is clearly a driver of demand, there are other variables that significantly influence demand rates. The Spanish public university system shows particular features not considered in the studies mentioned. This paper has two objectives. The first is to model the demand for Masters Programs in the Spanish public university system. 
We propose a panel methodology to estimate the behavior of the demand for Masters Programs based on data provided by the seventeen Spanish Autonomous Communities. Disaggregated analyses are presented for domestic demand and international demand. We conclude that supply is a powerful attractor of demand for domestic and international students, and therefore actions of supply reduction should be carefully applied and always according to strategic university policy criteria. The second aim of the article is to analyze the Masters Programs in the Spanish public university system and to provide a benchmark of the current situation of supply (number of programs) and demand (enrollment) at the regional level (Spanish Autonomous Communities) and in relation to European scenarios. To assess the probability of success of an analgesic drug we have proposed a bibliometric indicator, the Top Journals Selectivity Index (TJSI) (Kissin 2011). It represents the ratio (as a percentage) between the number of all types of articles on a particular drug in the top 20 biomedical journals and the number of articles on that drug in all (> 5,000) journals covered by Medline over the first 5 years after that drug's introduction. The aim of this study was to demonstrate that the TJSI may be used for the assessment of follow-on drugs (those that follow a first-in-class drug). The study tested two hypotheses. First, the TJSI can detect the difference (in the same class) between drugs with distinguishing features and drugs without them ("me-too" drugs) better than other publication indices, i.e., the number of all types of articles on a drug in journals indexed by Medline (AJI), and the number of articles covering only randomized controlled trials (RCT). Second, there is a relationship between the TJSI of "me-too" drugs and the order (sequential number) in which those drugs reached the market. The study was based on drug classes approved for marketing between the 1960s and the early 2000s. 
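As defined above, the TJSI is simply the percentage of a drug's first five years of Medline-indexed articles that appeared in the top 20 biomedical journals; as a one-line sketch:

```python
def tjsi(top20_articles, all_medline_articles):
    """Top Journals Selectivity Index: share (%) of all Medline-indexed
    articles on a drug (first 5 years after introduction) that appeared
    in the top 20 biomedical journals."""
    return 100.0 * top20_articles / all_medline_articles
```

For example, a drug with 15 top-journal articles out of 300 Medline-indexed articles would have a TJSI of 5 (the counts here are illustrative, not from the study).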
The eight classes that had 4 or more drugs were included in the analysis. Five specific indicators were used to determine a drug's distinguishing pharmacological properties. It was found that the TJSI can detect the difference between follow-on drugs with distinguishing features and those without them better than the other publication indices (AJI or RCT). Our analysis also demonstrated a negative correlation (r = -0.372, p = 0.014) between the TJSI of drugs without distinguishing features ("me-too" drugs) and the order of the drug's market entry. This implies that the TJSI could be useful for the assessment of situations with multiple market entrants in the same class, when a new addition has questionable value. Recently, geographical information systems have been applied very intensively in social life and in public health in particular. A retrospective problem-oriented search on their use in health planning was performed in the Web of Science (Web of Knowledge), three versions of MEDLINE, Scopus, EMBASE, and ProQuest Medical for 1990-2010. The annual dynamics of a set of scientometric parameters characterizing several aspects of the abstracted publications, authors' scientific institutions, journals, authors, citations, and languages were comparatively analyzed. It was established that world publication output on such a relatively narrow topic was reflected to a different extent in these databases. MEDLINE (PubMed) yielded 484 papers published in 243 journals, followed by MEDLINE (WoK) with 360 papers in 215 journals. The abstracted publications were mainly in English, but 14 other languages were present in significant numbers. Publications by authors from 44 countries were abstracted in WoS but from 29 countries in MEDLINE (Ebsco). The most productive authors and institutions as well as the 'core' journals were identified. The International Journal of Health Geographics occupied the leading position. 
The Centers for Disease Control and Prevention (USA) was one of the most productive research institutions in WoS and in Scopus. Scientific institutions and journals belonged to problem-oriented as well as mono-, two- and three-disciplinary thematic profiles. Some essential peculiarities of the dynamics of research institutionalization and internationalization in this interdisciplinary field were illustrated. The constellation of specific semantically-loaded indicators could be applied for the purposes of problem-oriented analyses, as it can identify in a timely manner the essential patterns of scientific advances in rapidly expanding interdisciplinary topics. Based on the fact that women's performance in terms of research productivity is weaker than men's, and because little is known about the factors affecting academic women's productivity in Iran, the present article aims to study factors affecting the research productivity of Iranian women in ISI. To do this, women who had already published documents indexed in ISI were first identified through the Web of Science. Afterwards, in order to collect their views regarding factors affecting women's research productivity, a researcher-made questionnaire was used. To analyze the collected data, the statistical software SPSS (version 17) was used. Both descriptive (percentage and frequency) and inferential (ANOVA) statistics were employed to reach valid findings. The findings indicate that the most motivational factors positively affecting the publishing of scholarly articles by Iranian women are 'Getting promoted in scientific rank', 'Intrinsic talents', 'Perseverance and adventitious knowledge', 'Feeling of being useful in society', 'Getting promoted in job', 'Being encouraged by friends and family', 'Religious lessons regarding the importance of science', and 'Attempt to show individual capabilities'. Finally, some remarks for the improvement of the current condition are highlighted. 
Database management technology has played a vital role in facilitating key advancements of the information technology field. Database researchers, and computer scientists in general, consider prestigious conferences their favorite and most effective venues for presenting their original research and for gaining visibility. With the main aim of retaining the high quality and prestige of these conferences, program committee members play the major role of evaluating the submitted articles and deciding which submissions are to be included in the conference programs. In this article, we study the program committees of four top-tier and prestigious database conferences (SIGMOD, VLDB, ICDE, EDBT) over a period of 10 years (2001-2010). We report on the growth in the number of program committee members in comparison to the size of the research community in the last decade. We also analyze the rate of change in the membership of the committees across the different editions of these conferences. Finally, we report on the major contributing scholars in the committees of these conferences as a means of acknowledging their impact in the community. We investigated author information in scientific articles by approximately 7,000 researchers for a quantitative analysis of researchers' international mobility. From top journals, we traced the movements of more than 2,200 researchers in the research domains of robotics, computer vision and electron devices. We categorized countries' characteristics by the balance between the inflow and the outflow of researchers moving internationally. Flow patterns of international mobility confirm that the United States, China and India exhibit the greatest global flows of researchers, with Singapore and Hong Kong attracting remarkable numbers of researchers from other countries. 
International mobility focusing on institutions reveals that universities in Singapore receive as many foreign researchers as do research universities in the United States. Furthermore, firms and international collaborative research institutes act as alternative receivers to the universities in the electron devices research domain. The aim of this study is to map the intellectual structure of the digital library (DL) field in China during the period 2002-2011. Co-word analysis was employed to reveal the patterns of the DL field in China by measuring the association strength of keywords in relevant journals. Data were collected from the Chinese Journal Full-Text Database for the period 2002-2011. Then, the co-occurrence matrix of keywords was analyzed by the methods of multivariate statistical analysis and social network analysis. The results mainly include five parts: seven clusters of keywords, a two-dimensional map, the density and centrality of clusters, a strategic diagram, and a relation network. The results show that there are some hot research topics and marginal topics in the DL field in China, but the research topics are relatively decentralized compared with international studies. Diversification of R&D projects can not only reduce overall risk, but also create a value-enhancement effect. A useful guideline for the optimal diversification of R&D projects is important to R&D organizations. This paper extends financial portfolio analyses for R&D management, particularly by incorporating technology risk. This study uses a survival model to describe technology risk, since termination of an R&D project can be caused by any of several technology risk factors. A formula for optimal R&D resource allocation that can dynamically achieve the greatest diversification effect is offered. Furthermore, we provide an alternative method for estimating correlations between R&D portfolios, which have a critical influence on the diversification effect. 
The method can be useful in risk assessment when measuring the exposure of an R&D portfolio to particular sources of uncertainty. The evaluation framework for R&D portfolio optimization can also be applied in project-selection decisions. Investigating Iran's scientific proficiency as reflected in its scholarly outputs indexed in SCI during the 1980s and the 21st century, the present study proposes the use of three features of science production, namely Specialty Diversity, Specialty Stability, and the growth of publications in the specialties, as primary criteria in evaluating the contribution sustainability of a science system at the macro level. They can be seen as prerequisites every science system should realize to ensure a sustainable movement towards scientific development. The results reveal that Iran's contributions were not only limited in number in the 1980s, but also exposed to serious subject fluctuations, so that only a few fields were found to be stable with regard to Iranian contributions. Moreover, none of them experienced significant, exponential positive growth during that decade. The situation contrasts sharply with the 21st century, when Iran's contributions were as diversified as almost all of the SCI subject categories. Iran also reached long- or short-term stability in a majority of the categories. None of the previously stabilized specialties collapsed in the second 6-year sub-period. On the other hand, previously fluctuating fields mostly stabilized later. Moreover, a majority of the fields experienced significant exponential growth. Overall, according to the results, a developing science system might be characterized by its Specialty Diversity and Stability, as well as by annual growth in its publications in the specialties. Though meeting these criteria does not necessarily guarantee the achievement of quality standards, it may enhance the visibility of the contributions and thereby their recognition. 
A thermodynamic analogy allows bibliometric research assessment of information production processes to be based on a scalar indicator, an energy-like term called exergy. Derived from standard indicators such as impact, citations and number of papers, the exergy indicator X is a multiplicative product of the quality and quantity of a scientist's or group's performance using available bibliometric information. Thus, given the bibliometric sequences of leading research agencies and institutions, research performance can be displayed as trajectories on a two-dimensional map as time progresses. In this paper, we track the performance of several of the leading players contributing to academic scientific research in India. A change has been taking place in the world of nanotechnologies since 2009, marking the beginning of a new era of end consumer goods related to these new technologies. In this article, our aim is to identify the dominant tendencies observed in scientific output on carbon nanotubes at centres and poles in different countries considered to be at the forefront of nanotechnology research. We have selected a sample comprising eight universities and locally coherent concentrations from different geographic areas: Europe, America and Asia. Based on this sample, we have used the Scopus database to analyse scientific output on carbon nanotubes in order to determine whether there are significant differences in behaviour. We observe that the dynamics of scientific output on nanotubes are similar over time in the universities and clusters analysed, although a drop in publications was noted in 2009 in some of the organizations included in the sample. A large number of publications on graphene have appeared in the last several years, as researchers working in the field of carbon nanotubes gradually move towards the study of graphene, a shift explained by the high expectations concerning the use of this material. 
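Under a common reading of the thermodynamic analogy described above, quality is citations per paper (i = C/P) and quantity is total citations C, so the exergy term reduces to X = iC = C^2/P. That reading is an assumption on our part, not a formula quoted from the abstract; a sketch under that assumption:

```python
def exergy(citations, papers):
    """Energy-like exergy term X = quality x quantity, assuming
    quality i = C/P (citations per paper) and quantity = C."""
    impact = citations / papers   # quality i = C / P
    return impact * citations     # X = i * C = C**2 / P
```

With 100 citations spread over 25 papers, for instance, i = 4 and X = 400; two groups with the same total citations are thus separated by their per-paper impact.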
The results lead us to conclude that advances in knowledge on carbon nanotubes and graphene will make it possible to meet the growing needs of a new and powerful market for products that progressively include these new elements. Quantitative assessment of information production processes requires the definition of a robust citation performance indicator. This is particularly so where there is a need to introduce a normalization mechanism for correcting for quality across fields and disciplines. In this paper, we offer insights from the "thermodynamic" approach, in terms of quality, quantity and quasity, and energy, exergy and entropy, to show how the recently introduced expected value measure can be rationalized and improved. The normalized energy indicator E is proposed as a suitable single-number scalar indicator of a scientist's or group's performance (i.e. as a multiplicative product of quality and quantity) when complete bibliometric information is available. An objective assessment, using bibliometric indicators, of research productivity in education and psychology in the Philippines was conducted. Results were then benchmarked against the research productivity of the Philippines' Southeast Asian neighbors in the same fields. Results showed that the Philippines ranked low in research productivity compared to Singapore, Thailand, and Malaysia, particularly from the 1990s onward. Only a few researchers, mainly coming from a small number of higher education institutions, were publishing papers on a regular basis in a small range of journals. Those journals had either no or low impact factors, and most papers had low citation counts. The Philippines also collaborated less with domestic and international institutions. This low research productivity was explained in terms of economic indicators, the local orientation of many social science research studies, funding, individual characteristics of researchers, and the epistemic culture of knowledge production in the country. 
However, the reforms initiated by the government, particularly in the higher education sector, will hopefully lead to a better research landscape and, consequently, improved research productivity in the near future. One of the major drawbacks of the classical Lotka function is that its arguments only start from the value 1. However, in many applications one may want to start from the value 0, e.g. when including zero received citations. In this article we consider the shifted Lotka function, which includes the case of zero items. Basic results for the total number of sources, the total number of items and the average number of items per source are given in this framework. Next we give the rank-frequency function (Zipf-type function) corresponding to the shifted Lotka function and prove their exact relation. The article ends with a practical example which can be fitted by a shifted Lotka function. The partnership ability index (phi) combines the number of co-authors and the number of times each of them acted as a co-author with a given author in exactly the same way as Hirsch's h-index combines the number of publications and their citation rates. The index phi was tested on a sample of Hevesy medal awardees. It was found that phi is consistent with Glanzel's model of the h-index, and that higher phi values, at least up to a certain limit, may be accompanied by higher citation visibility (h-index). Some further possibilities of application, both within and outside the area of scientometrics, are suggested. Higher education systems in competitive environments generally feature top universities that are able to attract top scientists, top students, and public and private financing, with notable socio-economic benefits for their regions. The same does not hold true for non-competitive systems. In this study we measure the dispersion of research performance within and between universities in the Italian university system, which is typically non-competitive. 
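The partnership ability index phi described above applies the Hirsch-type rule to co-author frequencies: an author has phi = k if k of their co-authors each appeared on at least k joint papers. A minimal sketch:

```python
def phi_index(coauthor_counts):
    """Hirsch-type partnership ability index: the largest k such that
    k co-authors each collaborated on at least k papers with the author.

    coauthor_counts: number of joint papers with each distinct co-author.
    """
    counts = sorted(coauthor_counts, reverse=True)
    phi = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            phi = rank
        else:
            break
    return phi
```

An author whose five co-authors appeared on 10, 5, 3, 3 and 1 joint papers, for example, has phi = 3: the same computation applied to per-paper citation counts yields the ordinary h-index.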
We also investigate the level of correlation that occurs between performance in research and its dispersion in universities. The findings may represent a first benchmark for similar studies in other nations. Furthermore, they lead to policy indications, questioning the effectiveness of selective funding of universities based on national research assessment exercises. The field of observation is composed of all Italian universities active in the hard sciences. Research performance is evaluated using a bibliometric approach, through publications indexed in the Web of Science between 2004 and 2008. (C) 2011 Elsevier Ltd. All rights reserved. The journal impact factor (JIF) proposed by Garfield in 1955 is one of the most prominent and common measures of the prestige, position, and importance of a scientific journal. The JIF may profit from its comprehensibility, robustness, methodological reproducibility, simplicity, and rapid availability, but these come at the expense of serious technical and methodological flaws. The paper discusses two core problems with the JIF: first, citations of documents are generally not normally distributed, and, furthermore, the distribution is affected by outliers, which has serious consequences for the use of the mean value in the JIF calculation. Second, the JIF is affected by bias factors that have nothing to do with the prestige or quality of a journal (e.g., document type). To solve these two problems, we suggest using McCall's area transformation and the Rubin Causal Model. Citation data for documents of all journals in the ISI Subject Category "Psychology, Mathematical" (Journal Citation Report) are used to illustrate the proposal. We introduce layered systems such as the citations-citing authors-citing institutes-citing countries system. Diffusion of scientific ideas flows through such layered systems. 
Our contribution contains three main topics: a fractional counting system for the number of different units in a layer; the fractional number of items of the same type, i.e. in the same layer, over which ideas have been diffused; and the evenness of diffusion over different layers. In this way we construct a coherent system to measure the extent to which scientific ideas are diffused. In this work we investigate the sensitivity of individual researchers' productivity rankings to the time of citation observation. The analysis is based on observation of research products for the 2001-2003 triennium for all research staff of Italian universities in the hard sciences, with the year of citation observation varying from 2004 to 2008. The 2008 rankings list is assumed to be the most accurate, as citations have had the longest time to accumulate and thus represent the best possible proxy of impact. By comparing the rankings lists from each year against the 2008 benchmark, we provide policy-makers and research organization managers with a measure of the trade-off between timeliness of evaluation execution and accuracy of performance rankings. The results show that, as the evaluation citation window varies, rates of inaccuracy vary across researchers' disciplines. The inaccuracy is negligible for Physics, Biology and Medicine. This study proposes a way of mapping the open innovation research structure by quantitatively analyzing open innovation research papers retrieved from the Web of Science database. A total of 130 papers are retrieved in this study, and 62 papers which contain keywords are chosen for research structure visualization. Open innovation research networks are quantitatively investigated by combining network theory and keyword co-occurrence. Contour maps of open innovation are also created on the basis of the networks for visualization. 
The networks and contour maps can be expressed differently by choosing different information as the main actors, such as the paper author, the institute, the country or the author keywords, to reflect open innovation research structures at the micro, meso, and macro levels, respectively. The quantitative ways of exploring the open innovation research structure are investigated to unveil important or emerging open innovation components as well as to demonstrate visualization of the structure of global open innovation research. The quantitative method provided in this project shows a possible way of visualizing and evaluating research community structure, and thus a computerized calculation is possible for potential quantitative applications in open innovation research management, e.g. R&D resource allocation, research performance evaluation, and science mapping. Scientific collaboration is often perceived as a joint global process that involves researchers worldwide, regardless of their place of work and residence. Globalization of science, in this respect, implies that collaboration among scientists takes place along the lines of common topics and irrespective of the spatial distances between the collaborators. The networks of collaborators, termed 'epistemic communities', should thus have a space-independent structure. This paper shows that such a notion of globalized scientific collaboration is not supported by empirical data. It introduces a novel approach of analyzing distance-dependent probabilities of collaboration. The results of the analysis of six distinct scientific fields reveal that intra-country collaboration is about 10-50 times more likely to occur than international collaboration. Moreover, strong dependencies exist between collaboration activity (measured in co-authorships) and spatial distance when confined to national borders. 
However, the fact that distance becomes irrelevant once collaboration is taken to the international scale suggests a globalized science system that is strongly influenced by the gravity of local science clusters. The similarity of the probability functions of the six science fields analyzed suggests a universal mode of spatial governance that is independent of the mode of knowledge creation in science. This study examines global collaborative creativity through patentometrics and social network analysis. Because patents are a direct output of innovative activities, cross-border patents are used to analyze the trend of global collaborative creativity. The results show linear growth of cross-border patents, while the number of inventors engaged in collaborative creativity has grown exponentially. The number of inventors involved in global collaborative creativity has increased more rapidly than the number of patents. The network of global collaborative creativity is denser and shows a growing trend over a five-year interval. Both the observed and cosine-normalized numbers of k-cores in global collaborative creativity show a growing trend, while the cosine-normalized k-cores increase slowly compared to the observed ones. Similarly, the social network analysis confirms a growing network of global collaborative creativity, which is dense despite its small degree of growth. This study also found that high values of "betweenness" tend to spread from core countries to periphery countries. Collaborative creativity has globalized but remains concentrated in core countries such as the U.S., the UK, France, Germany, and Canada. The detection of communities in large social networks is receiving increasing attention in a variety of research areas. 
Most existing community detection approaches focus on the topology of social connections (e.g., coauthor, citation, and social conversation) without considering their topic and dynamic features. In this paper, we propose two models to detect communities by considering both topic and dynamic features. First, the Community Topic Model (CTM) can identify communities sharing similar topics. Second, the Dynamic CTM (DCTM) can capture the dynamic features of communities and topics based on the Bernoulli distribution that leverages the temporal continuity between consecutive timestamps. Both models were tested on two datasets: ArnetMiner and Twitter. Experiments show that communities with similar topics can be detected and the co-evolution of communities and topics can be observed by these two models, which allow us to better understand the dynamic features of social networks and make improved personalized recommendations. (C) 2011 Elsevier Ltd. All rights reserved. The effect of two different calculation methods for obtaining relative impact indicators is modelled. Science policy considerations make it clear that evaluating the sets of publications, the "ratio of the sums" method should be preferred over the "mean of the ratios" method. Accordingly, determining the relative total impact against the mean relative impact of the publications of teams or institutes may be preferred. The special problem caused by relating the number of citations of an individual article to the Garfield (Impact) Factor (or mean citedness) of the publishing journal (or a set of journals selected as standard) lower than zero is demonstrated by examples. The possible effects of the different share of publications in different fields on the value of the "new crown" index are also modelled. The assessment methods using several appropriately weighted indicators which result in a composite index are recommended. The acronym "BMV" is suggested to term the relative impact indicators (e.g. 
RCR, CPP/JCS(m), CPP/FCSm and RW) in scientometrics. (C) 2011 Elsevier Ltd. All rights reserved. This paper investigates the impact of referee reliability on the quality and efficiency of peer review. We modeled peer review as a process based on knowledge asymmetries and subject to evaluation bias. We tested various levels of referee reliability and different mechanisms of reviewing effort distribution among agents. We also tested different scientific community structures (cohesive vs. parochial) and competitive science environments (high vs. low competition). We found that referee behavior drastically affects peer review and an equal distribution of the reviewing effort is beneficial only if the scientific community is homogeneous and referee reliability is the rule. We also found that the Matthew effect in the allocation of resources and credit is inherent to a 'winner takes all' well functioning science system, more than a consequence of evaluation bias. (C) 2011 Elsevier Ltd. All rights reserved. The level of consensus in science has traditionally been measured by a number of different methods. The variety is important as each method measures different aspects of science and consensus. Citation analytical studies have previously measured the level of consensus using the scientific journal as their unit of analysis. To produce a more fine grained citation analysis one needs to study consensus formation on an even more detailed level - i.e. the scientific document or article. To do so, we have developed a new technique that measures consensus by aggregated bibliographic couplings (ABC) between documents. The advantages of the ABC-technique are demonstrated in a study of two selected disciplines in which the levels of consensus are measured using the proposed technique. (C) 2011 Elsevier Ltd. All rights reserved. 
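The aggregated bibliographic coupling (ABC) idea described above rests on a simple primitive: two documents are coupled to the degree that they cite the same references. The following is a minimal sketch of that primitive, not the authors' implementation; the documents and reference identifiers are invented for illustration.

```python
# Illustrative sketch of bibliographic coupling: coupling strength between
# two documents = number of cited references they share; aggregating the
# pairwise strengths over a document set gives a crude consensus proxy.
from itertools import combinations

def coupling_strength(refs_a, refs_b):
    """Number of references shared by two documents."""
    return len(set(refs_a) & set(refs_b))

def aggregated_coupling(docs):
    """Sum of pairwise coupling strengths over a set of documents."""
    return sum(coupling_strength(a, b) for a, b in combinations(docs, 2))

# Each document is represented only by its reference list (hypothetical IDs).
docs = [
    ["smith1990", "jones1985", "lee2001"],
    ["smith1990", "jones1985", "kim1999"],
    ["smith1990", "park2003"],
]
print(aggregated_coupling(docs))  # 2 + 1 + 1 = 4
```

A discipline whose papers repeatedly couple through the same references would score high under such a measure, which is the intuition behind using couplings as a consensus signal.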
In the Essential Science Indicators (Thomson Reuters), a research front exists related to the h index (entitled "GOOGLE SCHOLAR H-INDEX; SCIENCE CITATION INDEX; GENERALIZED HIRSCH H-INDEX; H INDEX; GOOGLE SCHOLAR CITATIONS") consisting of a group of highly cited papers. We used HistCite to analyze the structure and relationships of the 45 papers forming the h index research front. Since we were interested in the topics of research on the h index at the front, we classified each paper according to its main topic. Six topics (inductively generated) were sufficient to classify the 45 papers: (1) citation database, (2) empirical validation study, (3) new application, (4) theoretical analysis, (5) new index development, and (6) literature review. (C) 2011 Elsevier Ltd. All rights reserved. Suppose a scientist writes a paper for a peer-reviewed journal. How likely is it that a natural disaster will terminate, change, suspend or discontinue some aspect of this editorial process? To answer this question, the aim of the present study was to determine the effects of a natural disaster on progress in materials science research. The Tsunami event in Japan and materials science are well suited to serve as a case study for both the development and application of a system to evaluate the Academic Research Output immediately after a natural disaster. In particular, the analysis focused on the short-term impacts of Japan's triple disaster - earthquake, Tsunami, and nuclear accident (11 March, 2011) - on the Academic Research Output in materials science from three different areas: Sendai (Miyagi Prefecture), Tsukuba (Ibaraki Prefecture) and Kyoto (Kyoto Prefecture). The last one has been used as an internal reference standard (normal/non-disaster situation) for the comparison. A geographical cluster-based study was conducted between 9 February and 10 April 2011. 
Consistent with the hypothesis that a disaster might slow down knowledge production, the conclusion showed that Japan's triple disaster strongly influenced the Academic Research Output of papers in the selected field of science. Using statistical data, these findings show that the number of submitted papers and the cumulative number of authors contributing to the field of materials science decreased immediately after the March 11th events in the areas affected by disaster. (C) 2012 Elsevier Ltd. All rights reserved. One of the critical issues in bibliometric research assessments is the time required to achieve maturity in citations. Citation counts can be considered a reliable proxy of the real impact of a work only if they are observed after sufficient time has passed from the publication date. In the present work the authors investigate the effect of varying the time of citation observation on the accuracy of productivity rankings for research institutions. Research productivity measures are calculated for all Italian universities active in the hard sciences in the 2001-2003 period, by individual field and discipline, with the time of citation observation varying from 2004 to 2008. The objective is to support policy-makers in choosing a citation window that optimizes the tradeoff between accuracy of rankings and timeliness of the exercise. (C) 2011 Elsevier Ltd. All rights reserved. The recently awakened discussion on the usability of averages of ratios (AoR) compared to ratios of averages (RoA) has led to the mathematical results in this paper. Based on the empirical results in Larivière and Gingras (2011) we prove, under reasonable conditions, the following relations between AoR and RoA for a set of points: (i) The regression line of RoA as a function of AoR is the first bisectrix. (ii) (AoR - RoA)/AoR as a function of the number N of papers is a cloud of points comprised between a multiple of 1/√N and −1/√N. 
(iii) (AoR - RoA)/AoR versus RoA has a decreasing regression line. (C) 2012 Elsevier Ltd. All rights reserved. Recent advances in methods and techniques enable us to develop interactive overlays to a global map of science based on aggregated citation relations among the 9162 journals contained in the Science Citation Index and Social Science Citation Index 2009. We first discuss the pros and cons of the various options: cited versus citing, multidimensional scaling versus spring-embedded algorithms, VOSViewer versus Gephi, and the various clustering algorithms and similarity criteria. Our approach focuses on the positions of journals in the multidimensional space spanned by the aggregated journal-journal citations. Using VOSViewer for the resulting mapping, a number of choices can be left to the user; we provide default options reflecting our preferences. Some examples are also provided; for example, the potential of using this technique to assess the interdisciplinarity of organizations and/or document sets. (C) 2011 Elsevier Ltd. All rights reserved. Bornmann and Leydesdorff (2011) proposed methods based on Web of Science data to identify field-specific excellence in cities where highly cited papers were published more frequently than can be expected. Top performers in output are cities in which authors are located who publish a number of highly cited papers that is statistically significantly higher than can be expected for these cities. Using papers published between 1989 and 2009 in information science, improvements to the methods of Bornmann and Leydesdorff (2011) are presented and an alternative mapping approach based on the Integrated Impact Indicator (I3) is introduced here. The I3 indicator was developed by Leydesdorff and Bornmann (2011b). (C) 2011 Elsevier Ltd. All rights reserved. 
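The averages-of-ratios (AoR) versus ratios-of-averages (RoA) distinction discussed above is easy to make concrete. In the sketch below, each paper carries an observed citation count and an expected count (e.g., the mean citedness of its journal); the citation counts are invented purely for illustration.

```python
# AoR: average the per-paper ratios (citations / expected citations).
# RoA: divide total citations by total expected citations.
# The two can disagree substantially for small or skewed sets of papers.
papers = [(10, 5.0), (2, 4.0), (30, 6.0)]  # (citations, expected) per paper

aor = sum(c / e for c, e in papers) / len(papers)            # (2.0 + 0.5 + 5.0)/3
roa = sum(c for c, _ in papers) / sum(e for _, e in papers)  # 42 / 15

print(round(aor, 3))  # 2.5
print(round(roa, 3))  # 2.8
```

RoA weights each paper by its expected citedness, so the highly cited paper with a high expectation pulls RoA above AoR here; the paper's result (ii) above bounds exactly this kind of discrepancy as the set size N grows.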
Climate change has become a major area of concern over the past few years and consequently many governments, international bodies, businesses, and institutions are taking measures to reduce their carbon footprint. However, to date very little research has taken place on information and sustainable development in general, and on the environmental impact of information services in particular. Based on the data collected from various research papers and reports, this review article shows that information systems and services for the higher education and research sector currently generate massive greenhouse gas (GHG) emissions, and it is argued that there is an urgent need for developing a green information service, or green IS in short, that should be based on minimum GHG emissions throughout its lifecycle, from content creation to distribution, access, use, and disposal. Based on an analysis of the current research on green information technology (IT), it is proposed that a green IS should be based on the model of cloud computing. Finally, a research agenda is proposed that will pave the way for building and managing green ISs to support education and research/scholarly activities. Although electronic medical records (EMR) adoption by health care organizations has been widely studied, little is known about the determinants of EMR individual use by physicians after institutional adoption has taken place. In this study, the determinants of inpatient physicians' continuous use of EMR were studied. Four dimensions of EMR use were analyzed: use intensity, use extent, use frequency, and use scope. A web-based survey was administered to physicians at a large university hospital; respondents filled out a survey with questions relating to their EMR use, attitude, beliefs, work style, and dispositional resistance to change. Structural equation modeling was carried out to analyze the relationship between these factors. 
Physicians were found to differ substantially in the scope, extent, and intensity of their EMR use. Their attitude toward EMR use was associated with all use dimensions. Dispositional resistance to change was negatively related to perceived ease of use and to perceived usefulness, both directly and through the mediation of compatibility with preferred work style. Time loss was negatively related to both perceived usefulness and attitude toward EMR use. Implications for research and practice are discussed. We analyzed how effort in searching is associated with search output and task outcome. In a field study, we examined how students' search effort for an assigned learning task was associated with precision and relative recall, and how this was associated with the quality of the learning outcome. The study subjects were 41 medical students writing essays for a class in medicine. Searching in Medline was part of their assignment. The data comprised students' search logs in Medline, their assessment of the usefulness of references retrieved, a questionnaire concerning the search process, and evaluation scores of the essays given by the teachers. Pearson correlations were calculated to answer the research questions. Finally, a path model for predicting task outcome was built. We found that effort in the search process degraded precision but improved task outcome. There were two major mechanisms reducing precision while enhancing task outcome. Effort in expanding Medical Subject Heading (MeSH) terms within search sessions and effort in assessing and exploring documents in the result list between the sessions degraded precision, but led to better task outcome. Thus, human effort compensated for poor retrieval results on the way to a good task outcome. Findings suggest that traditional effectiveness measures in information retrieval should be complemented with evaluation measures for the search process and outcome. 
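The two search-output measures in the Medline study above have standard definitions worth stating explicitly: precision is the fraction of retrieved references judged useful, and relative recall divides the useful references a searcher found by the pooled set of useful references found by all searchers (since true recall is unknowable). A minimal sketch, with hypothetical PubMed identifiers:

```python
# Precision and relative recall over sets of document identifiers.
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def relative_recall(retrieved, relevant, pooled_relevant):
    """Relevant documents found, relative to the pooled relevant set."""
    return len(retrieved & relevant) / len(pooled_relevant) if pooled_relevant else 0.0

retrieved = {"pmid1", "pmid2", "pmid3", "pmid4"}   # one student's results
relevant = {"pmid2", "pmid3", "pmid9"}             # judged useful for this student
pooled = {"pmid2", "pmid3", "pmid9", "pmid7"}      # useful refs across all students
print(precision(retrieved, relevant))              # 2/4 = 0.5
print(relative_recall(retrieved, relevant, pooled))  # 2/4 = 0.5
```

The study's finding is then easy to restate: broader querying lowers the first number while effort in assessing results can still raise the quality of the final essay.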
This study presents and tests a research model of the outcomes of information literacy instruction (ILI) given to undergraduate business students. This model is based on expectation disconfirmation theory and insights garnered from a recent qualitative investigation of student learning outcomes from ILI given at three business schools. The model was tested through a web survey administered to 372 students. The model represents psychological, behavioral, and benefit outcomes as second-order molecular constructs. Results from a partial least squares (PLS) analysis reveal that expectation disconfirmation influences perceived quality and student satisfaction. These in turn affect student psychological outcomes. Further, psychological outcomes influence student behaviors, which in turn affect benefit outcomes. Based on the study's findings, several recommendations are made. The "Folktales and Facets" project proposes ways to enhance access to folktales-in written and audiovisual formats-through the systematic and rigorous development of user-focused and task-focused models of information representation. Methods used include cognitive task analysis and facet analysis to better understand the information-seeking and information-use practices of people working with folktales and the intellectual dimensions of the domain. Interviews were conducted with 9 informants, representing scholars, storytellers, and teachers who rely on folktales in their professional lives to determine common tasks across user groups. Four tasks were identified: collect, create, instruct, and study. Facet analysis was conducted on the transcripts of these interviews, and a representative set of literature that included subject indexing material and a random stratified set of document surrogates drawn from a collection of folktales, including bibliographic records, introductions, reviews, tables of contents, and bibliographies. 
Eight facets were identified as most salient for this group of users: agent, association, context, documentation, location, subject, time, and viewpoint. Implications include the need for systems designers to devise methods for harvesting and integrating extant contextual material into search and discovery systems, and to take into account user-desired features in the development of enhanced services for digital repositories. In the summer of 2008 all University of North Carolina libraries switched from a traditional library catalog interface supporting text-based searching (TextOnly) to a text and facet-based interface (TextFacet) to improve users' search experiences. This study seeks to understand the differences between these two interfaces and how they affect the search experience of the novice user. In this study, 40 participants were asked to search for resources using both interfaces. Their search times and accuracy were measured across three types of search tasks (known, partially known, and exploratory). After completing the searches, they were asked a series of questions about their experiences. The data were analyzed in order to identify strengths and weaknesses in both search interfaces. Thirty-six out of 40 participants preferred the TextFacet interface to the TextOnly interface. Using three dependent variables-time, accuracy, and rating-the two interfaces were compared and interactions were tested with the three task types. Search times for the TextFacet were shorter and participants preferred the TextFacet search interface over the TextOnly search interface. Performances across the three task types were different in terms of search time. The partially known and exploratory task types showed similar distributions for rating and accuracy. These distributions were distinctly different from the known task type. 
The results of this study may assist libraries in developing improved library catalog search interfaces that utilize facets as well as text searching. This article addresses the question "what is information?" by comparing the meaning of the term "information" and the epistemological assumptions of three theories in library and information science: the "Shannon-Weaver model," Brookes' interpretation of Popper's World 3, and the Data-Information-Knowledge-Wisdom model. It shows that the term "information" in these theories refers to empirical entities or events and is conceptualized as having causal powers upon human minds. It is argued that the epistemological assumptions have led to the negligence of the cultural and social aspects of the constitution of information (i.e., how something is considered to be and not to be information) and the unquestioned nature of science in research methodologies. Completeness of metadata is one of the most essential characteristics of metadata quality. An incomplete metadata record is a record of degraded quality. Existing approaches to measuring metadata completeness limit their scope to counting the existence of values in fields, regardless of the metadata hierarchy as defined in international standards. Such a traditional approach overlooks several issues that need to be taken into account. This paper presents a fine-grained metrics system for measuring metadata completeness, based on field completeness. A metadata field is considered to be a container of multiple pieces of information. In this regard, the proposed system is capable of following the hierarchy of metadata as it is set by the metadata schema and measuring the effect of multiple values of multivalued fields. An application of the proposed metrics system, after being configured according to specific user requirements, to measure completeness of a real-world set of metadata is demonstrated. 
The results prove its ability to assess the sufficiency of metadata to describe a resource and provide targeted measures of completeness throughout the metadata hierarchy. Biodiversity information organization is looking beyond the traditional document-level metadata approach and has started to look into factual content in textual documents to support more intelligent and semantic-based access. This article reports the development and evaluation of CharaParser, a software application for semantic annotation of morphological descriptions. CharaParser annotates semistructured morphological descriptions in such a detailed manner that all stated morphological characters of an organ are marked up in Extensible Markup Language (XML) format. Using an unsupervised machine learning algorithm and a general purpose syntactic parser as its key annotation tools, CharaParser requires minimal additional knowledge engineering work and seems to perform well across different description collections and/or taxon groups. The system has been formally evaluated on over 1,000 sentences randomly selected from Volume 19 of Flora of North America and Part H of the Treatise on Invertebrate Paleontology. CharaParser meets or exceeds 90% in sentence-wise recall and precision, exceeding other similar systems reported in the literature. It also significantly outperforms a heuristic rule-based system we developed earlier. Early evidence that enriching the lexicon of a syntactic parser with domain terms alone may be sufficient to adapt the parser for the biodiversity domain is also observed and may have significant implications. Privacy concerns can greatly hinder consumers' intentions to interact with a website. The success of a website therefore depends on its ability to improve consumers' perceptions of privacy assurance. 
Seals and assurance statements are mechanisms often used to increase this assurance; however, the findings of the extant literature regarding the effectiveness of these tools are mixed. We propose a model based on the elaboration likelihood model (ELM) that explains conditions under which privacy assurance is more or less effective, clarifying the contradictory findings in previous literature. We test our model in a free-simulation online experiment, and the results of the analysis indicate that the inclusion of assurance statements and the combination, understanding, and assurance of seals influence privacy assurance. Privacy assurance is most effective when seals and statements are accompanied by the peripheral cues of website quality and brand image, and when counter-argumentation, through transaction risk, is minimized. Importantly, we show ELM to be an appropriate theoretical lens to explain the equivocal results in the literature. Finally, we suggest theoretical and practical implications. Multisource web news portals provide various advantages such as richness in news content and an opportunity to follow developments from different perspectives. However, in such environments, news variety and quantity can have an overwhelming effect. New-event detection and topic-tracking studies address this problem. They examine news streams and organize stories according to their events; however, several tracking stories of an event/topic may contain no new information (i.e., no novelty). We study the novelty detection (ND) problem on the tracking news of a particular topic. For this purpose, we build a Turkish ND test collection called BilNov-2005 and propose the usage of three ND methods: a cosine-similarity (CS)-based method, a language-model (LM)-based method, and a cover-coefficient (CC)-based method. 
For the LM-based ND method, we show that a simpler smoothing approach, Dirichlet smoothing, can have similar performance to a more complex smoothing approach, Shrinkage smoothing. We introduce a baseline that shows the performance of a system with random novelty decisions. In addition, a category-based threshold learning method is used for the first time in ND literature. The experimental results show that the LM-based ND method significantly outperforms the CS- and CC-based methods, and category-based threshold learning achieves promising results when compared to general threshold learning. Knowledge engineering and information mapping are two recent scientific disciplines in constant development where mathematics, linguistics, computer science, and information visualization converge. Their main focus is to discover and display new knowledge in large document databases. They have broad and innovative fields of application for strategic scouting in science and technology, knowledge management, business intelligence, and scientific and technological evaluation. This article presents a new method for mapping the strategic research network and illustrates its application to the strategic analysis of the knowledge domain "Spanish Research in Protected Areas for the Period 1981-2005." This strategic knowledge is displayed through a set of two-dimensional cartographic maps and three-dimensional images of two networks: the international network WoS_KWAJ (1981-2005) and the national network IEDCYT_KWAJ (1981-2005). These maps can be very useful in decision-making processes for science and technology policy. Webometric network analyses have been used to map the connectivity of groups of websites to identify clusters, important sites or overall structure. Such analyses have mainly been based upon hyperlink counts, the number of hyperlinks between a pair of websites, although some have used title mentions or URL citations instead. 
The ability to automatically gather hyperlink counts from Yahoo! ceased in April 2011 and the ability to manually gather such counts was due to cease by early 2012, creating a need for alternatives. This article assesses URL citations and title mentions as possible replacements for hyperlinks in both binary and weighted direct link and co-inlink network diagrams. It also assesses three different types of data for the network connections: hit count estimates, counts of matching URLs, and filtered counts of matching URLs. Results from analyses of U.S. library and information science departments and U.K. universities give evidence that metrics based upon URLs or titles can be appropriate replacements for metrics based upon hyperlinks for both binary and weighted networks, although filtered counts of matching URLs are necessary to give the best results for co-title mention and co-URL citation network diagrams. We characterize the research performance of a large number of institutions in a two-dimensional coordinate system based on the shapes of their h-cores so that their relative performance can be conveniently observed and compared. The 2D distribution of these institutions is then utilized (1) to categorize the institutions into a number of qualitative groups revealing the nature of their performance, and (2) to determine the position of a specific institution among the set of institutions. The method is compared with some major h-type indices and tested with empirical data using clinical medicine as an illustrative case. The method is extensible to research performance evaluation at other aggregation levels such as researchers, journals, departments, and nations. The controversial use of bibliometrics in scientific decision making has made it necessary for researchers to remain informed and engaged about bibliometrics. 
Glänzel and Schoepflin (1994) first raised the issue of bibliometric standards in bibliometric research, and this concern has been echoed by several additional bibliometric researchers over time (Braun, 2010; Glänzel, 1996; Abbott, Cyranoski, Jones, Maher, Schiermeier, & Van Noorden, 2010; Lane, 2010; Nature, 2010; van Noorden, 2010; Wallin, 2005). We compare the characteristics of articles published within and outside the Library and Information Science (LIS) field, including the relative impact and the affiliation of the contributing authors. We find that although the visibility of bibliometric articles within LIS is higher, the difference is not significant. However, a statistically significant growth in the number of articles written by authors without a bibliometric affiliation was found. This article provides an independent empirical investigation of publication trends potentially underlying Glänzel and Schoepflin's (1994) concerns regarding the misuse of bibliometric results, and the inaccurate dissemination of concepts, results, and methods outside of the bibliometric field. We analyze the large-scale structure of the journal citation network built from information contained in the Thomson-Reuters Journal Citation Reports. To this end, we explore network properties such as density, percolation robustness, average and largest node distances, reciprocity, incoming and outgoing degree distributions, and assortative mixing by node degrees. We discover that the journal citation network is a dense, robust, small, and reciprocal world. Furthermore, in- and out-degree node distributions display long tails, with few vital journals and many trivial ones, and they are strongly positively correlated. Literature-based discovery (LBD) refers to a particular type of text mining that seeks to identify nontrivial assertions that are implicit, and not explicitly stated, and that are detected by juxtaposing (generally a large body of) documents. 
In this review, I will provide a brief overview of LBD, both past and present, and will propose some new directions for the next decade. The prevalent ABC model is not "wrong"; however, it is only one of several different types of models that can contribute to the development of the next generation of LBD tools. Perhaps the most urgent need is to develop a series of objective literature-based interestingness measures, which can customize the output of LBD systems for different types of scientific investigations. An efficient and robust medical-image indexing procedure should be user-oriented. It is essential to index the images at the right level of description and ensure that the indexed levels match the user's interest level. This study examines 240 medical-image descriptions produced by three different groups of medical-image users (novices, intermediates, and experts) in the area of radiography. This article reports several important findings: First, the effect of domain knowledge has a significant relationship with the use of semantic image attributes in image-users' descriptions. We found that experts employ more high-level image attributes which require high-reasoning or diagnostic knowledge to search for a medical image (Abstract Objects and Scenes) than do novices; novices are more likely to describe some basic objects which do not require much radiological knowledge to search for an image they need (Generic Objects) than are experts. Second, all image users in this study prefer to use image attributes of the semantic levels to represent the image that they desired to find, especially using those specific-level and scene-related attributes. Third, image attributes generated by medical-image users can be mapped to all levels of the pyramid model that was developed to structure visual information. Therefore, the pyramid model could be considered a robust instrument for indexing medical imagery. 
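The "ABC model" referred to in the literature-based-discovery review above can be sketched in a few lines: if term A co-occurs with intermediate terms B in one literature, and those B terms co-occur with term C in another, an implicit A-C link is hypothesized (Swanson's classic example linked fish oil to Raynaud's syndrome through blood-viscosity intermediates). The miniature document sets below are invented for illustration, not real corpora.

```python
# Minimal sketch of ABC-style literature-based discovery: rank candidate
# C-terms by how many B-terms connect them to the starting term A.
from collections import Counter

# Each "document" is just its set of index terms.
lit_a = [{"fish oil", "blood viscosity"}, {"fish oil", "platelet aggregation"}]
lit_c = [{"blood viscosity", "raynaud"}, {"platelet aggregation", "raynaud"}]

def b_terms(literature, a):
    """Terms co-occurring with A anywhere in the literature."""
    return {t for doc in literature if a in doc for t in doc} - {a}

def candidate_c(lit_a, lit_c, a):
    """C-terms reached from A via B-terms, ranked by number of B bridges."""
    bs = b_terms(lit_a, a)
    counts = Counter()
    for doc in lit_c:
        for b in bs & doc:
            for c in doc - bs - {a}:
                counts[c] += 1
    return counts.most_common()

print(candidate_c(lit_a, lit_c, "fish oil"))  # [('raynaud', 2)]
```

As the review argues, co-occurrence counting like this is only one of several possible models; real systems add interestingness measures to filter the (usually enormous) candidate list.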
This study proposes an approach for visualizing knowledge structures that creates a "research-focused parallelship network," "keyword co-occurrence network," and a knowledge map to visualize Sci-Tech policy research structure. A total of 1,125 Sci-Tech policy-related papers (873 journal papers [78%], 205 conference papers [18%], and 47 review papers [4%]) have been retrieved from the Web of Science database for quantitative analysis and mapping. Different network and contour maps based on these 1,125 papers can be constructed by choosing different information as the main actor, such as the paper title, the institute, the country, or the author keywords, to reflect Sci-Tech policy research structures in micro-, meso-, and macro-levels, respectively. The quantitative way of exploring Sci-Tech policy research papers is investigated to unveil important or emerging Sci-Tech policy implications as well as to demonstrate the dynamics and visualization of the evolution of Sci-Tech policy research. Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. 
Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance. As online communities grow and the volume of user-generated content increases, the need for community management also rises. Community management has three main purposes: to create a positive experience for existing participants, to promote appropriate, socionormative behaviors, and to encourage potential participants to make contributions. Research indicates that the quality of content a potential participant sees on a site is highly influential; off-topic, negative comments with malicious intent are a particularly strong barrier to participation, or set the tone for encouraging similar contributions. A problem for community managers, therefore, is the detection and elimination of such undesirable content. As a community grows, this undertaking becomes more daunting. Can an automated system aid community managers in this task? In this paper, we address this question through a machine learning approach to automatic detection of inappropriate negative user contributions. Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults. Support vector machines trained on these data are combined with relevance and valence analysis systems in a multistep approach to the detection of inappropriate negative user contributions. The system shows great potential for semiautomated community management. 
In plagiarism detection (PD) systems, two important problems should be considered: the problem of retrieving candidate documents that are globally similar to a document q under investigation, and the problem of side-by-side comparison of q and its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the usage of structural information of scientific publications in both problems, and the consideration of citation evidence in the second problem. Three statistical measures, namely Inverse Generic Class Frequency, Spread, and Depth, are introduced to assign a degree of importance (i.e., weight) to structural components in scientific articles. A term-weighting scheme is adjusted to incorporate component-weight factors, which is used to improve the retrieval of potential sources of plagiarism. A plagiarism screening process is applied based on a measure of resemblance, in which component-weight factors are exploited to ignore minor or nonsignificant plagiarism cases. Using the notion of citation evidence, parts with proper citation evidence are excluded, and the remaining cases are flagged as suspicious and used to calculate the similarity index. The authors compare their approach to two flat-based baselines: TF-IDF weighting with a Cosine coefficient, and shingling with a Jaccard coefficient. In both baselines, they use different comparison units with overlapping measures for plagiarism screening. They conducted extensive experiments using a dataset of 15,412 documents divided into 8,657 source publications and 6,755 suspicious queries, which included 18,147 automatically inserted plagiarism cases. Component-weight factors are assessed using precision, recall, and F-measure averaged over a 10-fold cross-validation and compared using the ANOVA statistical test. 
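The two flat baselines used for comparison can be sketched briefly; the tokenization, TF-IDF variant, and shingle size below are illustrative assumptions, not the authors' exact configuration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF weight dict per document (raw TF, smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency counts each term once per doc
    return [{t: tf * math.log(1 + n / df[t]) for t, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    """Cosine coefficient between two sparse weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = lambda vec: math.sqrt(sum(w * w for w in vec.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def shingles(doc, k=3):
    """Set of overlapping k-word shingles of a document."""
    toks = doc.lower().split()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def jaccard(a, b):
    """Jaccard coefficient between two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0
```

In the article's setting, a retrieval or screening threshold on these coefficients would decide which document pairs proceed to detailed comparison.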
Results from structural-based candidate retrieval and plagiarism detection are evaluated statistically against the flat baselines using paired-t tests on 10-fold cross-validation runs, which demonstrate the efficacy achieved by the proposed framework. An empirical study on the system's response shows that structural information, unlike existing plagiarism detectors, helps to flag significant plagiarism cases, improve the similarity index, and provide human-like plagiarism screening results. Although Web 2.0 contains many tools with different functionalities, they all share a common social nature. One tool in particular, social bookmarking systems (SBSs), allows users to store and share links to different types of resources, e.g., websites, videos, and images. To identify and classify these resources so that they can be retrieved and shared, fragments of text are used. These fragments of text, usually words, are called tags. A tag that appears within the text of a resource is referred to as an obvious or explicit tag. There are also nonobvious or implicit tags, which do not appear in the resource text. The purpose of this article is to describe the present situation of SBSs and to determine the principal features and usage of explicit tags. Special consideration is given to which HTML tags most frequently contain explicit tags. International collaboration is being heralded as the hallmark of contemporary scientific production. Yet little quantitative evidence has portrayed the landscape and trends of such collaboration. To this end, 14,000,000 documents indexed in Thomson Reuters's Web of Science (WoS) were studied to provide a state-of-the-art description of scientific collaborations across the world. The results indicate that the number of authors in the largest research teams has not grown significantly during the past decade; the number of smaller research teams, however, has grown significantly. 
In terms of composition, the largest teams have become more diverse than smaller teams and tend more toward interinstitutional and international collaboration. Investigating the size of teams showed large variation between fields. Mapping scientific cooperation at the country level reveals that Western countries situated at the core of the map cooperate extensively with each other. High-impact institutions are significantly more collaborative than others. This work should inform policy makers, administrators, and those interested in the progression of scientific collaboration. In an effort to understand how academic scientists seek information relevant to their research in today's environment of ubiquitous electronic access, a correlation framework is built and regression analysis is applied to the survey results from 2,063 academic researchers in natural science, engineering, and medical science at five research universities in the United States. Previous work has reported descriptive statistics about these scientists' information-seeking behavior. This study extends that work to examine relationships between scientists' information-seeking behaviors and their personal and environmental factors. Several regression models, including the Poisson model, the logit model, and the ordered logit model, are built to interpret the correlations among scientists' behaviors. In addition, exploratory factor analysis is used for data reduction. Overall, many factors were found to affect the specific information-seeking behaviors of scientists, including demographic, psychological, role-related, and environmental factors. Of the factors having an effect, academic position was the most important determinant of information behavior. The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches view it as an expert search or resource selection problem. 
We investigate the effect of content-based similarity between posts on the performance of the retrieval system. We test two different approaches for smoothing (regularizing) relevance scores of posts based on their dependencies. In the first approach, we smooth term distributions describing posts by performing a random walk over a document-term graph in which similar posts are highly connected. In the second, we directly smooth scores for posts using a regularization framework that aims to minimize the discrepancy between scores for similar documents. We then extend these approaches to consider the time interval between the posts in smoothing the scores. The idea is that if two posts are temporally close, then they are good sources for smoothing each other's relevance scores. We compare these methods with the state-of-the-art approaches in blog search that employ Language Modeling-based resource selection algorithms and fusion-based methods for aggregating post relevance scores. We show performance gains over the baseline techniques which do not take advantage of the relation between posts for smoothing relevance estimates. This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. A novel porn-detection framework that can find newly appearing pornographic web pages by mining search query logs is proposed. First, suspected queries are identified along with their clicked URLs by an automatically constructed lexicon. Then, a candidate URL is determined if the number of clicks satisfies majority voting rules. Finally, a candidate whose URL contains at least one categorical keyword will be included in a blacklist. Several experiments are conducted on an MSN search porn dataset to demonstrate the effectiveness of our method. The resulting blacklist generated by our search-intent-based method achieves high precision (0.701) while maintaining a favorably low false-positive rate (0.086). 
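The three-step blacklist generation just described (lexicon-matched queries, click majority voting, categorical keyword filtering) can be sketched as follows; the lexicon, keywords, and voting threshold are hypothetical placeholders, not the values used in the study.

```python
from collections import Counter

def build_blacklist(query_log, lexicon, categorical_keywords, min_clicks=3):
    """query_log: iterable of (query, clicked_url) pairs.
    Step 1: identify suspected queries via a lexicon and collect their clicked URLs.
    Step 2: keep candidate URLs whose click counts satisfy a voting threshold.
    Step 3: include a candidate only if its URL contains a categorical keyword."""
    clicks = Counter(url for query, url in query_log
                     if any(term in query.lower() for term in lexicon))
    candidates = [url for url, count in clicks.items() if count >= min_clicks]
    return {url for url in candidates
            if any(kw in url.lower() for kw in categorical_keywords)}
```

Under the accumulative update strategy mentioned below, such a function would be re-run as new query-log batches arrive, growing the blacklist over time.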
The experiments of a real-life filtering simulation reveal that our proposed method, with its accumulative update strategy, can achieve a macro-averaged blocking rate of 44.15% when the update frequency is set to 1 day. In addition, the overblocking rates remain below 9% over time, owing to the strengths of our search-intent-based method. This user-behavior-oriented method can be easily applied to search engines, incorporating only implicit collective intelligence from query logs without additional effort. In practice, it is complementary to intelligent content analysis for keeping up with the changing trails of objectionable websites from users' perspectives. Understanding search behavior is important and leads to more effective interfaces that support searchers throughout the search process. In this article, through an observational user study, we investigate the search behavior of 15 visually impaired and 15 sighted searchers while they complete complex search tasks online. We study complex search tasks because they are challenging and cognitively intensive, and they affect the performance of searchers. We compare the behavior of the two groups of searchers at four stages of the information-seeking process, namely, Query Formulation, Search Results Exploration, Query Reformulation, and Search Results Management. For each stage, we identify research questions to investigate the impact of speech-based screen readers on the information-seeking behavior of visually impaired users. Significant differences were observed during query formulation and in the use of query-level support features such as query suggestions and spelling suggestions. In addition, screen-reader users submitted a lower number of queries and displayed comparatively limited exploratory behavior during search results exploration. 
We investigated how a lack of visual cues affected visually impaired searchers' approach to query reformulation, and we observed the different strategies they used to manage and use information encountered during the search process. We discuss the implications that our findings have for the design of search interfaces and propose a set of design guidelines to consider when designing interfaces that are usable and accessible with screen readers. This work also enhances our understanding of search behavior when using an auditory interface and could be useful when designing audio-based information retrieval systems. Web accessibility (WA) is an innovation in Web design; it can be considered part of the corporate social responsibility (CSR) strategy of firms. As adoption of innovations and CSR commitment are linked with firm size and national culture/legislation, we hypothesize that size and national culture/legislation may have an effect on WA level. The authors studied an international sample made up of companies included in EUROSTOXX600 (The STOXX Europe 600 Index). The main results suggest that both size and culture have a significant effect on WA. Large firms as well as Anglo-Saxon companies are more prone to have higher WA levels. A deeper analysis, done through the estimation of quantile regression equations, showed that the influence of size is significant for companies trying to excel or for those trying to avoid the worst WA. However, the effect of size is significant only in the lowest part of the conditional distribution. The h-index is a popular bibliometric indicator for assessing individual scientists. We criticize the h-index from a theoretical point of view. We argue that for the purpose of measuring the overall scientific impact of a scientist (or some other unit of analysis), the h-index behaves in a counterintuitive way. 
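The counterintuitive behavior of the h-index can be illustrated with a small constructed example; the two scientists and their citation counts below are hypothetical. Both receive an identical increment (two new papers with five citations each), yet their h-index ranking reverses.

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

a = [3, 3, 3]   # hypothetical scientist A: h_index(a) == 3
b = [5, 5]      # hypothetical scientist B: h_index(b) == 2, so A ranks above B

increment = [5, 5]  # identical gain for both: two papers, five citations each
# After the increment, h_index(a + increment) == 3 (unchanged),
# while h_index(b + increment) == 4, so B now ranks above A.
```

This is exactly the kind of rank inconsistency under identical performance increments that the article argues against.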
In certain cases, the mechanism used by the h-index to aggregate publication and citation statistics into a single number leads to inconsistencies in the way in which scientists are ranked. Our conclusion is that the h-index cannot be considered an appropriate indicator of a scientist's overall scientific impact. Based on recent theoretical insights, we discuss what kind of indicators can be used as an alternative to the h-index. We pay special attention to the highly cited publications indicator. This indicator has a lot in common with the h-index, but unlike the h-index it does not produce inconsistent rankings. We introduce the notions of congruous indicator of relative performance and congruous indicator of absolute performance. These notions are very similar to the notions of independence and consistency, yet slightly different. It is shown that percentile rank scores, as recently introduced by Leydesdorff, Bornmann, Mutz, and Opthof (2011), are strictly congruous indicators of relative performance, and similarly, that the Integrated Impact Indicator (I3), introduced by Leydesdorff and Bornmann (2011), is a strictly congruous indicator of absolute performance. Our analysis highlights the challenge of finding adequate axioms for ranking and for research evaluation. During the 20th century there was a strong desire to develop an information science from librarianship, bibliography, and documentation and in 1968 the American Documentation Institute changed its name to the American Society for Information Science. By the beginning of the 21st century, however, departments of (library and) information science had turned instead towards the social sciences. These programs address a variety of important topics, but they have been less successful in providing a coherent explanation of the nature and scope of the field. 
Progress can be made towards a coherent, unified view of the roles of archives, libraries, museums, online information services, and related organizations if they are treated as information-providing services. However, such an approach seems significantly incomplete on ordinary understandings of the providing of information. Instead of asking what information science is or what we might wish it to become, we ask instead what kind of field it can be given our assumptions about it. We approach the question by examining some keywords: science, information, knowledge, and interdisciplinary. We conclude that if information science is concerned with what people know, then it is a form of cultural engagement, and at most, a science of the artificial. We provide evidence and discuss findings regarding the intellectual distribution and faculty composition of academic units involved in the iSchool community. To better understand the intellectual heritage and major influences shaping the development of the individual and collective identities in iSchools, we develop a classification of the intellectual domains of iSchool faculty education. We use this to develop a descriptive analysis of the community's intellectual composition. The discussion focuses on characterizing intellectual diversity in the iSchools. We conclude with a discussion of the potential implications of these trends relative to the future development of the iSchool community. This study uses three bibliometric methods: direct citation, bibliographic coupling, and co-authorship analysis, to investigate interdisciplinary changes in library and information science (LIS) from 1978 to 2007. The results reveal that LIS researchers most frequently cite publications in their own discipline. In addition, half of all co-authors of LIS articles are affiliated with LIS-related institutes. The results confirm that the degree of interdisciplinarity within LIS has increased, particularly co-authorship. 
However, the study found that sources of direct citations in LIS articles are widely distributed across 30 disciplines, whereas co-authors of LIS articles are distributed across only 25 disciplines. The degree of interdisciplinarity was found to range from 0.61 to 0.82, with citation of references in all articles being the highest and co-authorship the lowest. Percentages of contribution attributable to LIS show a decreasing tendency based on the results of direct citation and co-authorship analysis, but an increasing tendency based on those of bibliographic coupling analysis. Such differences indicate that each of the three bibliometric methods has its own strengths and provides its own insights into interdisciplinarity, suggesting that no single bibliometric method can reveal all aspects of interdisciplinarity, given its multifaceted nature. The purpose of this study is to understand how microblogging communications change and contribute to collective sense-making over time during a crisis. Using B. Dervin's (1983) theory of sense-making applied to crises and communications during crises, we examined 7,184 microblogging communications sent in response to three violent crises that occurred on U.S. college campuses. The analysis of patterns of microblogging communications found that information-sharing behaviors dominated the early response phase of violent crises, and opinion sharing increased over time, peaking in the recovery phase of the crises. The analysis of individual microblogging communications identified various themes in the conversation threads that not only helped individual contributors make sense of the situation but also helped others who followed the conversation. The results of this study show that microblogging can play a vital role in collective sense-making during crises. In a contemporary user environment, there are often multiple information systems available for a certain type of task. 
Based on the premises of Activity Theory, this study examines how user characteristics, system experiences, and task situations influence an individual's preferences among different systems in terms of user readiness to interact with each. It hypothesizes that system experiences directly shape specific user readiness at the within-subject level, user characteristics and task situations make differences in general user readiness at the between-subject level, and task situations also affect specific user readiness through the mediation of system experiences. An empirical study was conducted, and the results supported the hypothesized relationships. The findings provide insights on how to enhance technology adoption by tailoring system development and management to various task contexts and different user groups. This article reports the findings of a bibliometric study of the measurable effects of experience and prestige on researchers' citing behavior. All single authors from two econometrics journals over a 10-year time period form the basis of the analysis of how experience and prestige affect the number of references in their publications. Preliminary results from linear regression models suggest that two author types can be characterized using this analysis. Review experience seems to be the decisive factor in the data. The article discusses the implications of the findings and offers suggestions for future research within this new and promising area. Radicchi, Fortunato, and Castellano (2008) claim that, apart from a scaling factor, all fields of science are characterized by the same citation distribution. We present a large-scale validation study of this universality-of-citation-distributions claim. Our analysis shows that claiming citation distributions to be universal for all fields of science is not warranted. Although many fields indeed seem to have fairly similar citation distributions, there are exceptions as well. 
We also briefly discuss the consequences of our findings for the measurement of scientific impact using citation-based bibliometric indicators. In this article, we build models to predict the existence of citations among papers by formulating link prediction for 5 large-scale datasets of citation networks. A supervised machine-learning model is applied with 11 features. As a result, our learner performs very well, with F1 values between 0.74 and 0.82. Three features in particular, the link-based Jaccard coefficient, the difference in betweenness centrality, and the cosine similarity of term frequency-inverse document frequency vectors, largely affect the predictions of citations. The results also indicate that different models are required for different types of research areas: research fields with a single issue or research fields with multiple issues. In the case of research fields with multiple issues, there are barriers among research fields, because our results indicate that papers tend to be cited locally within each research field. Therefore, one must consider the typology of targeted research areas when building models for link prediction in citation networks. We analyze barriers to task-based information access in molecular medicine, focusing on research tasks, which provide task performance sessions of varying complexity. Molecular medicine is a relevant domain because it offers thousands of digital resources as the information environment. Data were collected through shadowing of real work tasks. Thirty work task sessions were analyzed and the barriers in them identified. The barriers were classified by their character (conceptual, syntactic, and technological) and by their context of appearance (work task, system integration, or system). Also, work task sessions were grouped into three complexity classes, and the frequency of barriers of varying types across task complexity levels was analyzed. 
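Returning to the citation link-prediction study above: of its three most influential features, the link-based Jaccard coefficient is the simplest to sketch. The tiny network below is hypothetical, and this feature is only one of the 11 the authors feed to their learner.

```python
def jaccard_coefficient(graph, u, v):
    """Overlap of the neighbor sets of nodes u and v;
    graph is an adjacency mapping from node to a set of neighbors."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical citation network: each paper maps to the papers it links to
network = {
    "p1": {"p3", "p4", "p5"},
    "p2": {"p3", "p4", "p6"},
}
```

A high coefficient (here 0.5 for p1 and p2, which share two of four linked papers) signals overlapping citation neighborhoods, which the study found predictive of a citation link.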
Our findings indicate that although most of the barriers arise at the system level, a number of barriers remain in the integration and work-task contexts. These barriers might be overcome through attention to the integrated use of multiple systems, at least for the most frequent uses. This can be done by means of standardization and harmonization of the data and by taking the requirements of the work tasks into account in system design and development, because information access is seldom an end in itself, but rather serves to reach the goals of work tasks. Open access (OA) journals distribute their content at no charge and use other means of funding the publication process. Publication fees or article-processing charges (APCs) have become the predominant means for funding professional OA publishing. We surveyed 1,038 authors who recently published articles in 74 OA journals that charge APCs, stratified into seven discipline categories. Authors were asked about the source of funding for the APC, the factors influencing their choice of a journal, and their past history of publishing in OA and subscription journals. Additional information about the journal and the authors' country was obtained from the journal website. A total of 429 (41%) authors from 69 journals completed the survey. There were large differences in the source of funding among disciplines. Journals with impact factors charged higher APCs, as did journals from disciplines where grant funding is plentiful. Fit, quality, and speed of publication were the most important factors in the authors' choice of a journal. OA was a less important but still significant factor for many authors in their choice of a journal in which to publish. These findings are consistent with other research on OA publishing and suggest that OA publishing funded through APCs is likely to continue to grow. A text table is a simple table, with no or minimal chartlike elements, that is incorporated directly within a sentence. 
It can be very efficient in conveying quantitative (and sometimes qualitative) information that is difficult to read within one or two sentences, but too simple to present within a regular table. Although this format has been used in the scientific literature, and indeed recommended in some sources, its effectiveness has not been studied in formal surveys. This article presents the results of one such survey in which three examples were considered. Scientists representing mathematics, statistics, and similar disciplines and scientists representing biology, agriculture, and similar disciplines were asked to participate in the survey; 189 representing the former and 201 representing the latter agreed. For both groups, when the data presented were suitable for such a layout, the results clearly showed that the text tables were much preferred to the original sentences. The main conclusion from this work, therefore, is that scientific authors should use text tables whenever appropriate. This study examined how searchers interact with a web-based, faceted library catalog when conducting exploratory searches. It applied multiple methods, including eye tracking and stimulated recall interviews, to investigate important aspects of faceted search interface use, specifically: (a) searcher gaze behavior, i.e., what components of the interface searchers look at; (b) how gaze behavior differs when training is and is not provided; (c) how gaze behavior changes as searchers become familiar with the interface; and (d) how gaze behavior differs depending on the stage of the search process. The results confirm previous findings that facets account for approximately 10-30% of interface use. They show that providing a 60-second video demonstration increased searcher use of facets. However, searcher use of the facets did not evolve during the study session, which suggests that searchers may not, on their own, rapidly learn to use faceted interfaces. 
The findings also suggest that searcher use of interface elements varied by the stage of their search during the session, with higher use of facets during decision-making stages. These findings will be of interest to librarians and interface designers who wish to maximize the value of faceted searching for patrons, as well as to researchers who study search behavior. This article investigates the dynamic features of social tagging vocabularies in Delicious, Flickr, and YouTube from 2003 to 2008. Three algorithms are designed to study macro- and micro-tag growth as well as the dynamics of taggers' activities, respectively. Moreover, we propose a Tagger-Tag-Resource Latent Dirichlet Allocation (TTR-LDA) model to explore the evolution of topics emerging from those social vocabularies. Our results show that (a) at the macro level, tag growth in all three tagging systems obeys a power law distribution with exponents lower than 1; at the micro level, the tag growth of popular resources in all three tagging systems follows a similar power law distribution; (b) the exponents of tag growth vary in different evolving stages of resources; (c) the growth in the number of taggers associated with different popular resources presents a feature of convergence over time; (d) the activity level of taggers has a positive correlation with the macro-tag growth of different tagging systems; and (e) some topics evolve into several subtopics over time while others experience relatively stable stages in which their contents do not change much, and certain groups of taggers continue their interests in them. Sentiment analysis is concerned with the automatic extraction of sentiment-related information from text. Although most sentiment analysis addresses commercial tasks, such as extracting opinions from product reviews, there is increasing interest in the affective dimension of the social web, and Twitter in particular. 
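The sublinear power-law tag growth reported in finding (a) above can be checked on any tag-count series with a log-log least-squares fit; the series below is synthetic, generated with an assumed exponent of 0.8 rather than real Delicious, Flickr, or YouTube data.

```python
import math

def powerlaw_exponent(xs, ys):
    """Estimate b in y = a * x**b by least squares on log-transformed data."""
    lx, ly = [math.log(x) for x in xs], [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    den = sum((x - mx) ** 2 for x in lx)
    return num / den

# Synthetic cumulative tag counts growing as t**0.8 (exponent below 1,
# i.e., sublinear growth as in the article's macro-level finding)
times = range(1, 101)
tag_counts = [t ** 0.8 for t in times]
```

An estimated exponent below 1 on a real series would match the article's macro-level observation of decelerating vocabulary growth.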
Most sentiment analysis algorithms are not ideally suited to this task because they exploit indirect indicators of sentiment that can reflect genre or topic instead. Hence, such algorithms used to process social web texts can identify spurious sentiment patterns caused by topics rather than affective phenomena. This article assesses an improved version of the algorithm SentiStrength for sentiment strength detection across the social web that primarily uses direct indications of sentiment. The results from six diverse social web data sets (MySpace, Twitter, YouTube, Digg, Runners World, BBC Forums) indicate that SentiStrength 2 is successful in the sense of performing better than a baseline approach for all data sets in both supervised and unsupervised cases. SentiStrength is not always better than machine-learning approaches that exploit indirect indicators of sentiment, however, and is particularly weaker for positive sentiment in news-related discussions. Overall, the results suggest that, even unsupervised, SentiStrength is robust enough to be applied to a wide variety of different social web contexts. Evaluation of machine translation output is an important task. Various human evaluation techniques as well as automatic metrics have been proposed and investigated in the last decade. However, very few evaluation methods take the linguistic aspect into account. In this article, we use an objective evaluation method for machine translation output that classifies all translation errors into one of the five following linguistic levels: orthographic, morphological, lexical, semantic, and syntactic. Linguistic guidelines for the target language are required, and human evaluators use them to classify the output errors. The experiments are performed on English-to-Catalan and Spanish-to-Catalan translation outputs generated by four different systems: two rule-based and two statistical. 
All translations are evaluated using the three following methods: a standard human perceptual evaluation method, several widely used automatic metrics, and the human linguistic evaluation. Pearson and Spearman correlation coefficients between the linguistic, perceptual, and automatic results are then calculated, showing that the semantic level correlates significantly with both perceptual evaluation and automatic metrics. Information studies, from its origins in the field of documentation, has long been concerned with the question, What is a document? The purpose of this study is to examine Christian icons (typically tempera paintings on wooden panels) as information objects, as documents: documents that obtain meaning through tradition and standardization, documents around which a sophisticated scaffolding of classification and categorization has developed, documents that highlight their own materiality. Theological arguments that associate the icon with the Incarnation are juxtaposed with theories on the materiality of the document and "information as thing." Icons are examined as visual and multimedia documents: all icons are graphic; many also incorporate textual information. Icons emerge as a complex information resource: a resource, with origins in the earliest years of Christianity, that developed over centuries with accompanying systems of standardization and classification; a resource at the center of theological and political differences that shook empires; a primarily visual resource within a theological framework that affords the visual equal status with the textual; a resource with enduring relevance to hundreds of millions of Christians; a resource that continues to evolve as ancient and modern icons take on new material forms made possible through digital technologies. And crist was all, by reason as I preve, Firste a prophete by holy informacion, And by his doctryne, most worthy of byleve. -John Lydgate. Life of Our Lady. IV. II. 
309-311 We confess and proclaim our salvation in word and images. -Kontakion of the Sunday of Orthodoxy The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scientific practice, poses new challenges to management of the quality of scientific data. This study contributes towards a better understanding of scientists' perceptions of and priorities for data quality and the data quality assurance skills needed in genome annotation. This study was guided by a previously developed general framework for assessment of data quality and by a taxonomy of data-quality (DQ) skills, and was intended to define context-sensitive models of criteria for data quality and skills for genome annotation. Analysis of the results revealed that genomics scientists recognize specific sets of criteria for quality in the genome-annotation context. Seventeen data quality dimensions were reduced to 5-factor constructs, and 17 relevant skills were grouped into 4-factor constructs. The constructs defined by this study advance the understanding of data quality relationships and are an important contribution to data and information quality research. In addition, the resulting models can serve as valuable resources to genome data curators and administrators for developing data-curation policies and designing DQ-assurance strategies, processes, procedures, and infrastructure. The study's findings may also inform educators in developing data quality assurance curricula and training courses. An online diary study was performed to investigate deception across different media. One hundred and four individuals participated in the study, with 76 completing the diaries. Individuals were most likely to lie on the telephone. Planned lies, which participants also rated the most serious, were more likely told via SMS (short message service) text messaging. Most lies were told to people participants felt closest to.
The feature-based model provides a better account of the deceptions reported by participants than do media richness theory or social distance theory. However, the authors propose a reworked feature-based model to explain deception across different media. They suggest that instant messaging should be treated as a near-synchronous mode of communication, and that the model needs to distinguish between spontaneous and planned lies. This article provides an analysis of scientometric research in South Africa and discusses sources of growth in the country's research literature in general. South Africa is found to have limited expertise in the field, developed mainly during the last decade. However, the country is ranked 21st in the world among the countries publishing in the journal Scientometrics and it is the only African country with such a standing in the field. Identification of the forces positively affecting the growth in the number of research publications in the country indicates that the primary incentive fuelling the recent growth is the new funding formula in the country, which subsidizes the universities by more than R100 000 for each publication that their staff produces. The increase in the number of journals indexed in the ISI Thomson Reuters database and the incorporation of the social sciences at the NRF have also affected the growth of research publications, but to a lesser extent. This work examines a scientometric model that tracks the emergence of an identified technology from initial discovery (via original scientific and conference literature), through critical discoveries (via original scientific, conference literature and patents), transitioning through Technology Readiness Levels (TRLs) and ultimately on to commercial application. During the period of innovation and technology transfer, the impact of scholarly works, patents and on-line web news sources is identified.
As trends develop, currency of citations, collaboration indicators, and on-line news patterns are identified. The combinations of four distinct and separate searchable on-line networked sources (i.e., scholarly publications and citations, patents, news archives, and on-line mapping networks) are assembled to become one collective network (a dataset for analysis of relations). This established network becomes the basis from which to quickly analyze the temporal flow of activity (searchable events) for the example subject domain we investigated. Google Scholar, the academic bibliographic database provided free-of-charge by the search engine giant Google, has been suggested as an alternative or complementary resource to commercial citation databases like Web of Knowledge (ISI/Thomson) or Scopus (Elsevier). In order to check the usefulness of this database for bibliometric analysis, and especially research evaluation, a novel approach is introduced. Instead of names of authors or institutions, a webometric analysis of academic web domains is performed. The bibliographic records for 225 top-level web domains (TLD), 19,240 university and 6,380 research centre institutional web domains have been collected from the Google Scholar database. About 63.8% of the records are hosted in generic domains like .com or .org, confirming that most of the Scholar data come from large commercial or non-profit sources. Considering only institutions with at least one record, one-third of the other items (10.6% of the global total) are hosted by the 10,442 universities, while 3,901 research centres account for an additional 7.9% of the total. The individual analyses show that universities from China, Brazil, Spain, Taiwan or Indonesia are far better ranked than expected. In some cases, large international or national databases or repositories are responsible for the high numbers found.
However, in many others, the local contents, including papers in low impact journals, popular scientific literature, and unpublished reports or teaching support materials, are clearly overrepresented. Google Scholar lacks the quality control needed for its use as a bibliometric tool; the larger coverage it provides consists in some cases of items not comparable with those provided by other similar databases. In this paper we have looked at a new measure of connectedness between research areas, namely, the migration of authors between subfields as seen from their contributions to different areas. Migration may be considered an embodied knowledge flow that bridges some part of the cognitive gap between fields. Our hypothesis is that the rate of author migration will reflect cognitive similarity or affinity between disciplines. This is graphically shown to be reasonable, but only above certain levels of migration for our data from Mathematical Reviews spanning 17 years (1959-1975). The inter-related structure of Mathematics is then mapped using migration data in the appropriate range. We find the resulting map to be a good reflection of the disciplinary variation in the field of Mathematics. The aim of this paper is to study the knowledge production of Colombian universities in terms of their accumulation of intellectual capital (IC). We observe Colombian universities' publications between 1958 and 2008, categorizing each university according to growth trends in its scientific publications: early exponential growth, late exponential growth, and linear and irregular growth. This work describes the relationships between these growth trends and IC accumulation. It presents an historical description of some institutional changes in Colombian universities that improved research activity. In addition, we present an empirical study of IC accumulation in universities from the three growth trend categories between 2003 and 2009.
We suggest that the adapting capacity, the accumulation time, and the strategies of IC accumulation related to feedback structures are key factors in explaining the differences in knowledge production between growth categories of Colombian universities. The results show critical differences-of orders of magnitude-in IC accumulation across the three categories. Therefore, it would be possible to define a roadmap to improve knowledge production in Colombian universities. Indicators based on non-patent references (NPRs) are increasingly being used for measuring and assessing science-technology interactions. But NPRs in patent documents contain noise, as not all of them can be considered 'scientific'. In this article, we introduce the results of a machine-learning algorithm that allows identifying scientific references in an automated manner. Using the obtained results, we analyze indicators based on NPRs, with a focus on the difference between indicators based on all NPRs and those based only on scientific non-patent references. Differences between both indicators are significant and dependent on the considered patent system, the applicant country and the technological domain. These results signal the relevance of delineating scientific references when using NPRs to assess the occurrence and impact of science-technology interactions. The notion of 'core documents', first introduced in the context of co-citation analysis and later re-introduced for bibliographic coupling and extended to hybrid approaches, refers to the representation of the core of a document set according to given criteria. In the present study, core documents are used for the identification of new emerging topics. The proposed method proceeds from independent clustering of disciplines in different time windows. Cross-citations between core documents and clusters in different periods are used to detect new, exceptionally growing clusters or clusters with changing topics.
Three paradigmatic types of new, emerging topics are distinguished. The methodology is illustrated using the example of four ISI subject categories selected from the life sciences, applied sciences and the social sciences. This bibliometric study on the collaboration of Austria and six target countries (Slovenia, Hungary, Czech Republic, Denmark, Switzerland and Israel) reveals the importance of differentiation between the bilateral and multilateral contingents in the assessment of international scientific collaboration. For this purpose a "degree of bilaterality" (DB) and a "citation degree of bilaterality" (CDB) are introduced. In our findings the DB and the CDB have values lower than 1/3 and 1/5, respectively. Therefore, the total collaboration is mostly shaped in its volume and impact by the multilateral contingent. Regarding the impact estimation of the collaboration publication output, a multi-faceted approach was used. It is recommended to separately analyze the following three aspects: the un-cited range, the average range and the excellence range. Considering different country-specific parameters, the total number of publications and citations was resized for each type of collaboration and the results discussed. Only a very weak correlation between 'times cited' and the number of affiliations or authors was observed at the publication level. Neither the number of authors nor the number of affiliations determines the increase in impact. Rather, internationalisation and cooperation seem to be the crucial factors. Key to accurate bibliometric analyses is the ability to correctly link individuals to their corpus of work, with an optimal balance between precision and recall. We have developed an algorithm that performs this disambiguation task with very high recall and precision. The method addresses the issues of discarded records due to null data fields and their resultant effect on recall, precision and F-measure results.
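The precision/recall/F-measure trade-off central to the disambiguation abstract above can be made concrete with a small sketch. The pairwise evaluation convention and the sample record-to-author assignments below are illustrative assumptions, not the authors' actual algorithm or data:

```python
from itertools import combinations


def precision_recall_f(true_ids, predicted_ids):
    """Pairwise precision, recall, and F-measure for a disambiguation run.

    Each argument maps a record identifier to an author identifier; a
    "decision" is an unordered pair of records assigned to the same author.
    """
    def same_author_pairs(assignment):
        # Group records by author, then enumerate within-cluster pairs.
        clusters = {}
        for record, author in assignment.items():
            clusters.setdefault(author, []).append(record)
        pairs = set()
        for records in clusters.values():
            pairs.update(combinations(sorted(records), 2))
        return pairs

    truth = same_author_pairs(true_ids)
    pred = same_author_pairs(predicted_ids)
    tp = len(truth & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


# Hypothetical example: the algorithm wrongly merges r3 into author A.
truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
pred = {"r1": "A", "r2": "A", "r3": "A", "r4": "B"}
p, r, f = precision_recall_f(truth, pred)  # p = 1/3, r = 1/2, f = 0.4
```

Discarding records with null data fields shrinks both maps, which is exactly why the abstract stresses the resultant effect on recall and F-measure.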
We have implemented a dynamic approach to similarity calculations based on all available data fields. We have also included differences in author contribution and age difference between publications, both of which have meaningful effects on overall similarity measurements, resulting in significantly higher recall and precision of returned records. The results are presented for a test dataset of heterogeneous catalysis publications. Results demonstrate significantly high average F-measure scores and substantial improvements over previous and stand-alone techniques. New indicators characterizing an article in its ego citation network, including the outgrow index, are introduced. We take full advantage of the existing duality (cites-is cited by) in a citation network. Although algebraic aspects are emphasized, a first step towards their interpretation is attempted. Examples of their calculation and of future applications are provided. This paper investigates whether CiteULike and Mendeley are useful for measuring scholarly influence, using a sample of 1,613 papers published in Nature and Science in 2007. Traditional citation counts from the Web of Science (WoS) were used as benchmarks to compare with the number of users who bookmarked the articles in one of the two free online reference manager sites. Statistically significant correlations were found between the user counts and the corresponding WoS citation counts, suggesting that this type of influence is related in some way to traditional citation-based scholarly impact, but the number of users of these systems seems to be still too small for them to challenge traditional citation indexes. Previous studies have shown that hybrid clustering methods based on textual and citation information outperform clustering methods that use only one of these components. However, former methods focus on the vector space model.
In this paper we apply a hybrid clustering method which is based on the graph model to map the Web of Science database in the mirror of the journals covered by the database. Compared with former hybrid clustering strategies, our method is very fast and even achieves better clustering accuracy. In addition, it detects the number of clusters automatically and provides a top-down hierarchical analysis, which fits in with the practical application. We quantitatively and qualitatively assess the added value of such an integrated analysis and we investigate whether the clustering outcome provides an appropriate representation of the field structure by comparing it with a text-only or citation-only clustering and with another hybrid method based on a linear combination of distance matrices. Our dataset consists of about 8,000 journals published in the period 2002-2006. The cognitive analysis, including the ranked journals, term annotation and the visualization of cluster structure, demonstrates the efficiency of our strategy. Traditional co-citation analysis has not taken the proximity of co-cited references into account. As long as two references are cited by the same article, they are treated equally regardless of the distance between where the citations appear in the article. Little is known about what additional insights into citation and co-citation behaviours one might gain from studying distributions of co-citations in terms of such proximity. How are citations distributed in an article? What insights does the proximity of co-citation provide? In this article, the proximity of a pair of co-cited references is defined as the nearest instance of the co-citation relation in text. We investigate the proximity of co-citation in the full text of scientific publications at four levels, namely, the sentence level, the paragraph level, the section level, and the article level.
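The four proximity levels just listed, with proximity taken over the nearest pair of citation instances, can be operationalized in a short sketch. The tuple encoding of citation locations is a hypothetical simplification of full-text processing, not the article's own implementation:

```python
def cocitation_proximity(locs_a, locs_b):
    """Finest co-citation proximity level for two references co-cited in
    one article. Each location is a (section, paragraph, sentence) index
    tuple; the result is taken over the *nearest* pair of instances,
    matching the nearest-instance definition of co-citation proximity.
    """
    order = ["article", "section", "paragraph", "sentence"]
    best = "article"  # co-cited in the same article at minimum
    for sec_a, par_a, sen_a in locs_a:
        for sec_b, par_b, sen_b in locs_b:
            if sec_a != sec_b:
                level = "article"      # different sections
            elif par_a != par_b:
                level = "section"      # same section, different paragraphs
            elif sen_a != sen_b:
                level = "paragraph"    # same paragraph, different sentences
            else:
                level = "sentence"     # cited in the very same sentence
            if order.index(level) > order.index(best):
                best = level
    return best


# Hypothetical example: one instance pair shares a paragraph, so the
# pair's proximity is "paragraph" even though another pair does not.
level = cocitation_proximity([(0, 0, 0), (1, 2, 3)], [(1, 2, 4)])
```

Aggregating this level over all co-cited pairs in a corpus would yield the distributions across sentence, paragraph, section, and article levels that the study examines.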
We conducted four studies of co-citation patterns in the full text of articles published in 22 open access journals from BioMed Central. First, we compared the distributions of co-citation instances at the four proximity levels in journal articles to the traditional article-level co-citation counts. Second, we studied the distributions of co-citations of various proximities across organizational sections in articles. Third, the distribution of co-citation proximity in different co-citation frequency groups was investigated. Fourth, we identified the occurrences of co-citations at different proximity levels with reference to the corresponding traditional co-citation network. The results show that (1) the majority of co-citations are loosely coupled at the article level, (2) a higher proportion of sentence-level co-citations is found in high co-citation frequency groups than in low-frequency groups, and (3) tightly coupled sentence-level co-citations not only preserve the essential structure of the corresponding traditional co-citation network but also form a much smaller subset of the entire set of co-citation instances typically considered by traditional co-citation analysis. Implications for improving our understanding of underlying factors concerning co-citations and developing more efficient co-citation analysis methods are discussed. The Novosibirsk region is one of the most industrialized in Siberia. In 1957 the Siberian Branch of the Academy of Sciences of the USSR (now the Siberian Branch of the RAS (SBRAS)) was set up to stimulate a rapid development of the Siberian and Far East research forces. The goal of this mainly bibliometric, empirical study is to obtain insight into R&D performance in the Novosibirsk region, domestic and international collaborations and the impact of new government science policies focused on boosting the research and innovation activities of regional universities. The key drivers of research performance are the institutions of the SBRAS.
Second place in terms of research output belongs to Novosibirsk State University. Its research focuses on the hard sciences. 75% of its papers were published in collaboration with SBRAS institutions. Research output is growing. The Novosibirsk area's share of RFBR grants was stable at around 8%. Publications from RFBR grantees in 34 subject categories had a level-aggregated indicator value of one or higher. In these hard-science areas Russian research develops in accordance with global trends. We observed a concentration of domestic collaboration in the Novosibirsk area as well as strong international collaboration with advanced economies, in particular in the Asia-Pacific region. The individualistic nature of research in the humanities is a well-known fact, as is the notion that boundaries in the humanities are poorly defined. When using citation analysis, we have to take into account differences in citation practices not only between the humanities and the sciences but also within narrower fields of the humanities. In the current study we observe differences between the publication behaviour of historians and archaeologists, examine some aspects of citation practices in those fields, and show their effect on visibility. This article, elaborating on the mutuality of knowledge and social structure, a theory borrowed from the sociology of knowledge literature, where knowledge is perceived as an essentially social and societal category, develops a coherent research framework which relates cognitive structure and collaboration patterns in an integrated socio-knowledge analysis of a given scientific community. The framework extends co-word analysis, combining it with social network analysis. The framework is enhanced by introducing a novel model. The new model maps actors from co-authorship networks into a strategic diagram of scientists. The mapping is based on the cohesiveness and pervasiveness of the issues each author has published on in the field.
The exemplary longitudinal case from Turkey covers scientific publication activities in Turkish management academia spanning the years from 1922 until 2008. It is seen that, while within the local community the diffusion of management knowledge is led by academicians with certain socio-cognitive properties, academicians publishing in the international arena do not show any significantly differing socio-cognitive properties; instead, they are merely embedded in strongly connected groups. Leading academicians within the local community, however, exhibit a common socio-cognitive structure relative to the rest of the community. They have more social ties and more diversified disseminated knowledge compared to the rest. The knowledge they disseminate is distinct compared to their peers in the network; they hold a certain part of their knowledge exclusively, thus knowledge-wise they do not resemble the rest, but they keep a level of common knowledge with the rest of the community. In this work the well-known scientometric concepts of bibliographically coupled publications and co-cited references were applied to produce interactive maps of research fronts and knowledge bases of research fields. This article proposes a method and some standardization for the detection and visualization of research fronts and knowledge bases with two- and three-dimensional graphics inspired by geographical maps. Agglomerations of bibliographically coupled publications with a common knowledge base are identified and graphically represented by a density function of publications per area unit. The research fronts become visible if publications with similar vectors of common citations are associated and visualized as an ensemble in a three-dimensional graphical representation as a mountain scenery measured with the help of a spatial density. Knowledge bases were calculated in the same way.
Maps similar to the geographic representation of oceans and islands are used to visualize the two-dimensional spatial density function of references weighted by individual links. The proposed methodology is demonstrated by publications in the field of battery research. A well-designed and comprehensive citation index for the social sciences and humanities has many potential uses, but has yet to be realised. Significant parts of the scholarly production in these areas are not published in international journals, but in national scholarly journals, in book chapters or in monographs. The potential for covering these literatures more comprehensively can now be investigated empirically using a complete publication output data set from the higher education sector of an entire country (Norway). We find that while the international journals in the social sciences and humanities are rather small and more dispersed in specialties, representing a large but not unlimited number of outlets, the domestic journal publishing, as well as book publishing on both the international and domestic levels, shows a concentration of many publications in a few publication channels. These findings are promising for a more comprehensive coverage of the social sciences and humanities. Citation relationships are commonly described with a citation network or citation graph, but in this article the author introduces the notion of citation genetic genealogy and applies it in citation analysis. A citing document usually only uses pieces of its cited document, so the author defines these valuable pieces of a scientific document, which carry the information that has been used by citing documents, as its document genes. Besides, with the definition of the symbolic information of a scientific document, the conclusion that a citing document inherits document genes from its references can be drawn.
Based on these understandings, citation genetic genealogy is constructed to describe citation relationships. With citation genetic genealogy, it is easy to map citation relationships such as bibliographic coupling and co-citation onto familiar family relationships and to illustrate the inheritance relationships in the scientific literature. Also, citation genetic genealogy may provide an interface between the citation analysis of a document set and the content analysis of each individual document inside this document set. This study reports research on analyzing the impact of government funding on research output. 500,807 SCI papers published in 2009 in 10 countries are collected and analyzed. The results show that, in China, 70.34% of SCI papers are supported by some research funding, among which 89.57% are supported by the National Natural Science Foundation of China (NSFC). The average number of grants per funding-supported paper in China is 2.95, while in the USA the number is 2.93 and in Japan it is 2.40. The results of the funding agency analysis show that China, Germany and Spain are dominated by a single funding agency, while the USA, Japan, Canada and Australia are dominated by two funding agencies, and the sources of funding in the UK, France and Italy are diversified. While there is a consensus that there is a core-periphery structure in the global scientific enterprise, not many methodologies have been developed for identifying this structure. This paper develops a methodology by looking at the differences in the power law structure of article outputs and degree centrality distributions of countries. This methodology is applied to five different scientific fields: astronomy and astrophysics, energy and fuels, nanotechnology and nanosciences, nutrition, and oceanography. This methodology uncovers a two-tiered power law structure that exists in all examined fields.
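The abstract does not specify how the power-law exponents are estimated. One standard choice, offered here only as a hedged sketch rather than the paper's method, is the continuous maximum-likelihood (Hill-type) estimator, which could be applied separately to the core and periphery tiers around a chosen minimum degree:

```python
import math


def power_law_alpha(values, xmin):
    """MLE of the exponent alpha of a continuous power law p(x) ~ x^(-alpha),
    fitted to the tail of the distribution (values >= xmin):

        alpha = 1 + n / sum(ln(x_i / xmin))
    """
    tail = [v for v in values if v >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    return 1.0 + len(tail) / sum(math.log(v / xmin) for v in tail)


# Hypothetical two-tier use: a made-up degree distribution of countries,
# split at an assumed minimum core degree of 8.
degrees = [1, 1, 2, 2, 3, 5, 8, 13, 40, 90]
core_alpha = power_law_alpha(degrees, xmin=8)       # exponent of the core tier
full_alpha = power_law_alpha(degrees, xmin=1)       # exponent over all countries
```

Comparing the exponent (and minimum degree) of the two fits is one way to characterize the two-tiered structure the abstract describes.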
The core-periphery structure that is unique to each field is characterized by the core's size, minimum degree, and the exponent of its power law distribution. Stark differences are identified between technology-intensive and non-technology-intensive scientific fields. In an earlier exercise some demographic methods were reformulated for application in a scientometric context. Age pyramids based on annual publication output and citation impact were supplemented by the change of the mean age of the publications in the h-core at any time. Although the method was introduced to shed some demographic-scientometric light on the careers of individual researchers, the second component, i.e., the age dynamics of the h-core, can be applied to higher levels of aggregation as well. Moreover, the paradigmatic shapes and patterns found not only characterise individual careers and positions, but are also typical of life cycles and subject-specific peculiarities. In the present study, the proposed approach is used to visualise the careers of scientists active in different fields of the sciences and social sciences and notably the second component, the h-core dynamics, is extended to the analysis of scientific journals from the same fields. In addition to the dynamics of productivity and citation impact, the evolution of co-authorship patterns of the same scientists is studied to capture another facet of individual academic careers. The Science of Science and Innovation Policy (SciSIP) program at the National Science Foundation (NSF) supports research designed to advance the scientific basis of science and innovation policy. The program was established at NSF in 2005 in response to a call from Dr. John Marburger III, then science advisor to the U.S. President, for a "science" of science policy.
As of January 2011, it had co-funded 162 awards that aim to develop, improve, and expand data, analytical tools, and models that can be directly applied in the science policy decision-making process. The long-term goals of the SciSIP program are to provide a scientifically rigorous and quantitative basis for science policy and to establish an international community of practice. The program has an active listserv that, as of January 2011, had almost 700 members from academia, government, and industry. This study analyzed all SciSIP awards (through January 2011) to identify existing collaboration networks and co-funding relations between SciSIP and other areas of science. In addition, listserv data was downloaded and analyzed to derive complementary discourse information. Key results include evidence of rich diversity in communication and funding networks and effective strategies for interlinking researchers and science policy makers, prompting discussion, and sharing resources. Multidocument summarization (MDS) aims, for each given query, to extract compressed and relevant information with respect to the different query-related themes present in a set of documents. Many approaches operate in two steps. Themes are first identified from the set, and then a summary is formed by extracting salient sentences within the different documents of each of the identified themes. Among these approaches, latent semantic analysis (LSA) based approaches rely on spectral decomposition techniques to identify the themes. In this article, we propose a major extension of these techniques that relies on the quantum information access (QIA) framework. The latter is a framework developed for modeling information access based on the probabilistic formalism of quantum physics. The QIA framework not only points out the limitations of the current LSA-based approaches, but motivates a new principled criterion to tackle multidocument summarization that addresses these limitations.
As a byproduct, it also provides a way to enhance the LSA-based approaches. Extensive experiments on the DUC 2005, 2006 and 2007 datasets show that the proposed approach consistently improves over both the LSA-based approaches and the systems that competed in the yearly DUC competitions. This demonstrates the potential impact of quantum-inspired approaches to information access in general, and of the QIA framework in particular. We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions, and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as on short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems. This paper presents a two-phased research project aiming to improve email triage for public administration managers. The first phase developed a typology of email classification patterns through a qualitative study involving 34 participants.
Inspired by the fields of pragmatics and speech act theory, this typology, comprising four top-level categories and 13 subcategories, represents the typical email triage behaviors of managers in an organizational context. The second study phase was conducted on a corpus of 1,703 messages using email samples of two managers. Using the k-NN (k-nearest neighbor) algorithm, statistical treatments automatically classified the email according to lexical and nonlexical features representative of managers' triage patterns. The automatic classification of email according to the lexicon of the messages was found to be substantially more efficient when k = 2 and n = 2,000. For four categories, the average recall rate was 94.32%, the average precision rate was 94.50%, and the accuracy rate was 94.54%. For 13 categories, the average recall rate was 91.09%, the average precision rate was 84.18%, and the accuracy rate was 88.70%. It appears that a message's nonlexical features are also deeply influenced by email pragmatics. Features related to the recipient and the sender were the most relevant for characterizing email. This study aimed to integrate the linguistic theory of syntagmatic relations and the concept of topic and comment into an empirical analysis of user tagging. User tags on documents in a social bookmarking site reflect a user's views of an information object, which can augment the content description and provide a more effective representation of information. The article presents a tag analysis to uncover semantic relations among tag terms implicit in user tagging. The objective was to identify the syntagmatic semantic cores of topic and comment in user tags, evidenced by the meaning attached to the information object by users. The study focused on syntagmatic relations, which were based on the way in which terms were used within the information content among users.
Analysis of descriptive tag terms found three primary categories of concepts: content-topic, content-comment, and context of use. The relations among terms within a group and between the content-topic and content-comment groups were determined by inferring user meaning from the user notes and from the context of the source text. Intergroup relations showed syntagmatic associations between the topic and comment, whereas intragroup relations were more general but were limited in the document context. The findings are discussed with regard to the semantics of concepts and relations in user tagging. An implication of syntagmatic relations for information search is that concepts can be combined by a specific association in the context of the actual use of terms. In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended view of relevance informed by a social paradigm (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of relevance work in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains. 
We address how individuals' (workers') knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative of successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent with people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused. The study focuses on the web presence of the main Spanish media and seeks to determine whether hyperlink analysis of media and political parties can provide insight into their political orientation. The research included all major national media and political parties in Spain. Inlink and co-link data about these organizations were collected and analyzed using multidimensional scaling (MDS) and other statistical methods. In the MDS map, media are clustered based on their political orientation. 
There are significantly more co-links between media and parties with the same political orientation than there are between those with different political orientations. Findings from the study suggest the potential of using link analysis to gain new insights into the interactions among media and political parties. Characteristics of the Journal of the American Society for Information Science and Technology and 76 other journals listed in the Information Systems category of the Journal Citation Reports (Science edition, 2009) were analyzed. Besides reporting usual bibliographic indicators, we investigated the human cornerstone of any peer-reviewed journal: its editorial board. Demographic data about the 2,846 gatekeepers serving in information systems (IS) editorial boards were collected. We discuss various scientometric indicators supported by descriptive statistics. Our findings reflect the great variety of IS journals in terms of research output, author communities, editorial boards, and gatekeeper demographics (e.g., diversity in gender and location), seniority, authority, and degree of involvement in editorial boards. We believe that these results may help the general public and scholars (e.g., readers, authors, journal gatekeepers, policy makers) to revise and increase their knowledge of scholarly communication in the IS field. The EB_IS_2009 dataset supporting this scientometric study is released as online supplementary material to this article to foster further research on editorial boards. This paper presents a condensed history of Library and Information Science (LIS) over the course of more than a century using a variety of bibliometric measures. It examines in detail the variable rate of knowledge production in the field, shifts in subject coverage, the dominance of particular publication genres at different times, prevailing modes of production, interactions with other disciplines, and, more generally, observes how the field has evolved. 
It shows that, despite a striking growth in the number of journals, papers, and contributing authors, a decrease was observed in the field's market-share of all social science and humanities research. Collaborative authorship is now the norm, a pattern seen across the social sciences. The idea of boundary crossing was also examined: in 2010, nearly 60% of authors who published in LIS also published in another discipline. This high degree of permeability in LIS was also demonstrated through reference and citation practices: LIS scholars now cite and receive citations from other fields more than from LIS itself. Two major structural shifts are revealed in the data: in 1960, LIS changed from a professional field focused on librarianship to an academic field focused on information and use; and in 1990, LIS began to receive a growing number of citations from outside the field, notably from Computer Science and Management, and saw a dramatic increase in the number of authors contributing to the literature of the field. A journal may be considered as having dimension-specific prestige when its score, based on a given journal ranking model, exceeds a threshold value. But a journal has multidimensional prestige only if it is a prestigious journal with respect to a number of dimensions, e.g., Institute for Scientific Information Impact Factor, immediacy index, eigenfactor score, and article influence score. The multidimensional prestige of influential journals takes into account the fact that several prestige indicators should be used for a distinct analysis of the impact of scholarly journals in a subject category. After having identified the multidimensionally influential journals, their prestige scores can be aggregated to produce a summary measure of multidimensional prestige for a subject category, which satisfies numerous properties. 
Using this measure of multidimensional prestige to rank subject categories, we have found the top scientific subject categories of Web of Knowledge as of 2010. We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first bootstrap stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with a B3 F1 of 0.807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from 0.748 to 0.767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1. Based on earlier results about the shifted Lotka function, we prove an implicit functional relation between the Hirsch index (h-index) and the total number of sources (T). It is shown that the corresponding function, h(T), is concavely increasing. Next, we construct an implicit relation between the h-index and the impact factor IF (an average number of items per source). 
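The B3 (B-cubed) F1 used to score the author disambiguation above averages a per-mention precision and recall over all author instances. A minimal sketch, using invented toy cluster assignments rather than the study's data:

```python
def b_cubed(pred, truth):
    """B-cubed precision, recall and F1 for two clusterings given as
    {item: cluster_id} mappings over the same set of items."""
    def members(assign):
        # Map each item to the full member set of its cluster.
        groups = {}
        for item, cid in assign.items():
            groups.setdefault(cid, set()).add(item)
        return {item: groups[cid] for item, cid in assign.items()}

    p_cl, t_cl = members(pred), members(truth)
    p = sum(len(p_cl[i] & t_cl[i]) / len(p_cl[i]) for i in pred) / len(pred)
    r = sum(len(p_cl[i] & t_cl[i]) / len(t_cl[i]) for i in pred) / len(pred)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Toy case: two mentions of one "J. Smith" wrongly merged with a third author.
pred  = {"a": 1, "b": 1, "c": 1, "d": 2}
truth = {"a": 1, "b": 1, "c": 2, "d": 2}
print(b_cubed(pred, truth))  # precision 2/3, recall 3/4
```

Unlike pairwise F1, B-cubed penalizes over-merging and over-splitting per mention, which is why it is a common choice for disambiguation benchmarks.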
The corresponding function h(IF) is increasing and we show that if the parameter C in the numerator of the shifted Lotka function is high, then the relation between the h-index and the impact factor is almost linear. This study examined the relationships among perceived editorial responsiveness, perceived journal quality, and review time of submissions for authors in mainland China. Online review data generated by authors who have experienced the submission process in 10 Chinese academic journals were collected. The results of Spearman correlation analysis show that Chinese authors' perceived responsiveness of an editorial office is positively correlated with perceived quality of the journal, and the total review time does not affect perceptions of the quality of a journal and its editorial responsiveness. Given that novel inventions are crucial factors for technology companies, this article contributes to the identification of inventions of high novelty in patent data. As companies are confronted with an information overflow, and having patents reviewed by experts is a time-consuming task, we introduce a new approach to the identification of inventions of high novelty: a specific form of semantic patent analysis. Subsequent to the introduction of the concept of novelty in patents, the classical method of semantic patent analysis will be adapted to support novelty measurement. By means of a case study from the automotive industry, we corroborate that semantic patent analysis is able to outperform available methods for the identification of inventions of high novelty. Accordingly, semantic patent information possesses the potential to enhance technology monitoring while reducing both costs and uncertainty in the identification of inventions of high novelty. Access to public knowledge is a prerequisite for the good functioning of developed economies. 
Universities strive and are also requested to contribute to this knowledge both locally and internationally. Traditional studies on the geography of knowledge flows have identified a localisation effect; however, these studies do not use the country as the unit of observation and hence do not explore national patterns. In this paper, we hypothesise that the localisation of university knowledge flows is directly related to the share of firm expenditure on research and development. To test this hypothesis, we use references to universities in patent documents as indicators based on a data set of around 20,000 university references, for 37 countries in the period 1990-2007, resulting in panels of around 300-500 observations. We build indicators for the university knowledge flows both inside and outside the applicant country, which we explain as a function of some proxies for national size and research structure based on econometric estimations. We draw some conclusions as to the importance of national business scientific strength for fostering increased domestic university knowledge flows. This article explores a method of evaluating the comprehensive competitiveness of American universities in Bridge Engineering, which can help students pick an ideal university for further study in America. Making use of the ESI, SCI, and EI databases as well as the ranking of American universities from U.S. News and World Report, the author evaluates the comprehensive competitiveness of American universities in Bridge Engineering, and then develops the ranking of comprehensive competitiveness of American universities in the Bridge Engineering specialty. From the ranking, the author reaches the conclusion that American universities such as the University of Illinois at Urbana-Champaign and the Georgia Institute of Technology have comparatively higher international influence and competitiveness in the field of Bridge Engineering. 
To demonstrate the importance of Arctic studies in the humanities and social sciences, we collected data from the SSCI and A&HCI covering a period of over 100 years and focused on the number of papers published each year, the major journals, types of documents, major languages represented, authors and their countries publishing the most articles, authors' affiliations, collaboration and the major research subjects covered. The results indicate that scholars worldwide have remained active in this field for more than a century. Countries near the Arctic, particularly in North America and the Nordic countries, show the most interest and have the most research results. Universities and colleges are the most important research institutions in this field. North America is the area that has conducted the largest amount of research, while some Western European countries, such as Germany and France, pursued research related to North Pole expeditions with great enthusiasm. Arctic research in the humanities and social sciences has gradually expanded from the historical, archaeological, and anthropological fields to the realm of political, social, and educational sciences, including international relations, music, art, etc. This paper suggests an empirical framework to classify research collaboration activities with developed indicators that carry on a previous theoretical framework (Wagner [Science and Technology Policy for Development, Dialogues at the Interface, 2006]; Wagner et al. [Linking effectively: Learning lessons from successful collaboration in science and technology. DB-345-OSTP, 2002]) by employing the Gaussian mixture model, an advanced probabilistic clustering analysis. By further exploring the method upon a profound evidence-based reflection of actual phenomena, this paper also proposes an exploratory analysis to manage and evaluate research projects upon their differentiated classification in a preceding perspective of research collaboration and R&D management. 
In addition, the results show that international collaboration tends to be associated with more evenly committed collaboration, and that collaboration featuring a higher degree of funding or dispersed commitments generally results in larger outcomes than research clustered on the opposite side of the framework. In this paper, we use bibliometric methods and social network analysis to analyze the pattern of China-US scientific collaboration at the individual level in nanotechnology. Results show that Chinese-American scientists have been playing an important role in China-US scientific collaboration. We find that China-US collaboration in nanotechnology mainly occurs between Chinese and Chinese-American scientists. In the co-authorship network, Chinese-American scientists tend to have higher betweenness centrality. Moreover, the series of policies implemented by the Chinese government to recruit overseas experts appears to have contributed substantially to China-US scientific collaboration. Excellence in Research for Australia (ERA) is an attempt by the Australian Research Council to rate Australian universities on a 5-point scale within 180 Fields of Research using metrics and peer evaluation by an evaluation committee. Some of the bibliometric data contributing to this ranking suffer from statistical issues associated with skewed distributions. Other data are standardised year-by-year, placing undue emphasis on the most recent publications, which may not yet have reliable citation patterns. The bibliometric data offered to the evaluation committees is extensive, but lacks effective syntheses such as the h-index and its variants. The indirect H-2 index is objective, can be computed automatically and efficiently, is resistant to manipulation, and is a good indicator of impact to assist the ERA evaluation committees and similar evaluations internationally. This study attempts to expand the work on patenting activities of China. 
The characteristics of foreign multinationals' and indigenous entities' patenting activities in the US patent system are examined in our analysis. This study also attempts to model the diffusion trajectories of patenting activities that result from the functioning of two competing innovation system models adopted by China (FDI-driven and indigenous) to compare the extent of divergence of technological innovations. The findings are useful for highlighting the path of technological innovations and understanding the dynamic potentials through analysis of the growth process. While the results suggest a dominance of foreign firms in patenting activities since the early 2000s, there is a sign of transition from industrial-based to knowledge-driven activities and the formation of evolving propagating behaviour in the production of indigenous technology. This study employs the method of direct citation to analyze and compare the interdisciplinary characteristics of the two disciplines of library science and information science during the period of 1978-2007. Based on research published in five library science journals and five information science journals, library science researchers tend to cite publications from library and information science (LIS), education, business/management, sociology, and psychology, while researchers of information science tend to cite more publications from LIS, general science, computer science, technology, and medicine. This means that the disciplines with larger contributions to library science are almost entirely different from those contributing to information science. In addition, researchers of library science frequently cite publications from LIS; the rate is as high as 65.61%, which is much higher than the rate for information science, 49.50%. However, a decreasing trend in the percentage of LIS in library science indicates that library science researchers tend to cite more publications from non-LIS disciplines. 
A rising trend in the proportion of references to education sources is reported for library science articles, while a rising trend in the proportion of references to computer science sources has been found for information science articles. In addition, this study applies an interdisciplinary indicator, Brillouin's Index, to measurement of the degree of interdisciplinarity. The results confirm that the trend toward interdisciplinarity in both information science and library science has risen over the years, although the degree of interdisciplinarity in information science is higher than that in library science. For certain tasks in patent management it makes sense to apply a quantitative measure of textual similarity between patents and/or parts thereof: be it the analysis of freedom to operate, the analysis of technology convergence, or the mapping of patents for strategic purposes. In this paper we intend to outline the process of measuring textual patent similarity on the basis of elements referred to as 'combined concepts'. We are going to use this process in various operations leading to design decisions, and shall also provide guidance regarding these decisions. By way of two applications from patent management, namely the prioritization of patents and the analysis of convergence between two technological fields, we mean to demonstrate the crucial importance of design decisions in terms of patent analysis results. The number of citations received by authors in scientific journals has become a major parameter to assess individual researchers and the journals themselves through the impact factor. A fair assessment therefore requires that the criteria for selecting references in a given manuscript should be unbiased with regard to the authors or journals cited. 
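Brillouin's Index, applied above as the measure of interdisciplinarity, can be computed directly from a distribution of reference counts across disciplines. A minimal sketch, with invented counts and natural logarithms (the base is a convention choice):

```python
import math

def brillouin(counts):
    """Brillouin's diversity index H = (ln N! - sum ln n_i!) / N,
    where n_i is the count in category i and N their total.
    math.lgamma(n + 1) gives ln(n!) without huge intermediate factorials."""
    n_total = sum(counts)
    return (math.lgamma(n_total + 1)
            - sum(math.lgamma(n + 1) for n in counts)) / n_total

# References of one article, tallied by source discipline (toy numbers):
refs_library_science = [20, 5, 3, 2]   # concentrated in one discipline
refs_info_science    = [10, 8, 6, 6]   # spread more evenly
print(brillouin(refs_library_science) < brillouin(refs_info_science))  # True
```

A more even spread of references over disciplines yields a higher index, which is what lets the index track a rising trend toward interdisciplinarity.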
In this paper, we assess approaches for citations considering two recommendations for authors to follow while preparing a manuscript: (i) consider similarity of contents with the topics investigated, lest related work should be reproduced or ignored; (ii) perform a systematic search over the network of citations including seminal or very related papers. We use formalisms of complex networks for two datasets of papers from the arXiv and the Web of Science repositories to show that neither of these two criteria is fulfilled in practice. By representing the texts as complex networks we estimated a similarity index between pieces of texts and found that the list of references did not contain the most similar papers in the dataset. This was quantified by calculating a consistency index, whose maximum value is one if the references in a given paper are the most similar in the dataset. For the areas of "complex networks" and "graphenes", the consistency index was only 0.11-0.23 and 0.10-0.25, respectively. To simulate a systematic search in the citation network, we employed a traditional random walk search (i.e. diffusion) and a random walk whose probabilities of transition are proportional to the number of the ingoing edges of the neighbours. The frequency of visits to the nodes (papers) in the network had a very small correlation with either the actual list of references in the papers or with the number of downloads from the arXiv repository. Therefore, apparently the authors and users of the repository did not follow the criterion related to a systematic search over the network of citations. Based on these results, we propose an approach that we believe is fairer for evaluating and complementing citations of a given author, effectively leading to a virtual scientometry. The Hirsch h-index is widely used to measure a researcher's major publications. It has the advantage of being easy to compute. 
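The two citation-network searches compared above, plain diffusion versus a walk whose transition probabilities are proportional to the neighbours' in-degrees, can be sketched on a toy graph. The three-paper network and all names below are invented for illustration:

```python
import random

def biased_walk(graph, start, steps, rng, in_degree=None):
    """Walk the citation graph for `steps` transitions. When in_degree is
    given, transition probabilities are proportional to each neighbour's
    in-degree (number of ingoing edges); otherwise the walk is uniform
    (plain diffusion)."""
    visits = {node: 0 for node in graph}
    node = start
    for _ in range(steps):
        nbrs = graph[node]
        if not nbrs:            # dead end: restart at the seed paper
            node = start
            continue
        if in_degree:
            node = rng.choices(nbrs, weights=[in_degree[n] for n in nbrs])[0]
        else:
            node = rng.choice(nbrs)
        visits[node] += 1
    return visits

# Toy citation graph: A cites B and C, B cites C, C cites A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
in_degree = {"A": 1, "B": 1, "C": 2}
visits = biased_walk(graph, "A", 3000, random.Random(42), in_degree)
print(visits["C"] > visits["B"])  # the twice-cited paper C is visited more
```

Comparing each node's visit frequency against an article's actual reference list is the kind of correlation check the study performs at scale.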
However, it increases steeply with time and therefore does not allow a comparison of young and mature researchers. We find that if the h-index is divided by the number of decades since publication of the researcher's first paper, the result is statistically constant with age. The resulting index can then be compared for young and old researchers. Its accuracy is the same as that of the h-index and it is as easy to compute as the h-index. Prior to the beginning of a scientific career, every new scientist is obliged to confront the critical issue of defining the subject area where his/her future research will be conducted. Regardless of the capabilities of a new scholar, an erroneous selection may condemn a dignified effort and result in wasted energy, time and resources. In this article we attempt to identify the research fields which are attractive to these individuals. To the best of our knowledge, this is a new topic that has never been discussed or addressed in the literature. Here we formally set the problem and we propose a solution combining the characteristics of the attractive research areas and the new scholars. Our approach is compared against a statistical model which reveals popular research areas. The comparison of this method to our proposed model leads to the conclusion that not all trendy research areas are suitable for new scientists. A secondary outcome reveals the existence of scientific fields which, although not especially emerging, are promising for scientists who are starting their careers. This study analysed the technical and publication activities of the Institute of Electrical and Electronics Engineers (IEEE), the most influential academic publisher in engineering. We first constructed an original comprehensive database of periodicals (journal and magazine) and conference proceedings published by the IEEE between 1980 and 2008, which comprised approximately 0.36 million periodical articles and 1.14 million conference articles. 
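The rescaling described above, dividing the h-index by the number of decades since a researcher's first paper, is easy to make concrete. The citation records below are invented for illustration:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def h_per_decade(citations, career_years):
    """Age-normalized variant: h-index divided by the number of decades
    since the researcher's first paper."""
    return h_index(citations) / (career_years / 10)

young = [12, 9, 7, 6, 5]                        # 1 decade of activity
senior = [40, 30, 22, 15, 11, 9, 8, 8, 6, 3]    # 3 decades of activity
print(h_index(young), h_index(senior))          # 5 8
print(h_per_decade(young, 10), round(h_per_decade(senior, 30), 2))  # 5.0 2.67
```

The raw h-index favors the senior researcher, while the per-decade form makes the two careers directly comparable, which is the point of the normalization.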
We analysed the transitions in technical innovations from two perspectives: trends within (1) individual countries and (2) specialized fields represented in IEEE societies. The number of published periodical articles increased fourfold between 1980 and 2008, while that of published conference articles increased nearly 20-fold in the same period. In particular, the number of conference articles published by China increased dramatically from 2002, exceeding even the number published by the US in 2008. The IEEE has increasingly shifted away from its US-centred origins to literally becoming the 'electrical and electronics association of the world'. The proportion of articles published by authors in North America, Europe and East Asia has become increasingly balanced, thus leading to the formation of a tri-polar structure of IEEE technological activities. This comprehensive analysis of IEEE publications over a period of almost 30 years revealed that with the emergence of more active international competition, 'glocalisation' is occurring among publications and research activities of the IEEE. Consequently, quantitative analysis revealed structural changes in global competition and technological transition characterized by five phases. Variation of citation counts by subdisciplines within a particular discipline is known but rarely systematically studied. This paper compares citation counts for award-winning mathematicians in different subdisciplines of mathematics. Mathematicians were selected for study in groups of rough equivalence with respect to peer evaluation, where this evaluation is given by the awarding of major prizes and grants: Guggenheim fellowships, Sloan fellowships, and National Science Foundation CAREER grants. We find a pattern in which mathematicians working in some subdisciplines have fewer citations than others who won the same award, and this pattern is consistent for all awards. 
So even after adjustment at the discipline level for different overall citation rates for disciplines, citation counts for different subdisciplines do not match peer evaluation. Demographic and hiring data for mathematics provides a context for a discussion of reasons and interpretations. This study examines the impact of collaborating patterns on the R&D performance of public research institutions (PRIs) in Korea's science and engineering fields. For the construction of R&D collaborating networks based on the co-authorship data of 127 institutions in Scopus, this paper proposes four types of collaborations by categorizing network analyses into two dimensions: structural positions (density, efficiency, and betweenness centrality) and the relational characteristics of individual nodes (eigenvector and closeness centralities). To explore the research performance by collaboration type, we employ a data envelopment analysis window analysis of a panel of 23 PRIs over a 10-year period. Comparing the R&D productivities of each group, we find that the PRIs of higher productivity adhere to a cohesive networking strategy, retaining intensive relations with their existing partners. The empirical results suggest that excessively cohesive alliances might end up in 'lock-in' relations, hindering the exploitation of new opportunities for innovation. These findings have implications for the Korean Government's R&D policies on collaboration strategies to produce sustained research results with the advent of the convergence research era. Ecologists writing research articles frequently cite their own papers. Self-citations are frequent in science, but the reasons behind abnormally high rates of self-citations are questionable. My goals were to assess the prevalence of author self-citations and to identify the combination of attributes that best predict high levels of self-citations in ecology articles. 
I searched 643 articles from 9 different ecology journals of various impact factors for synchronous (i.e., within reference lists) and diachronous (i.e., following publication) self-citations, using the Web of Science online database. I assessed the effect of the number of authors, pages, and references/citations, the proportion of diachronous/synchronous self-citations, and the impact factor, on the proportion of synchronous and diachronous self-citations separately. I compared various candidate models made of these covariates using Akaike's Information Criterion. On average, ecologists made 6.0 synchronous self-citations (12.8% of references), and 2.5 diachronous self-citations (25.5% of citations received 2.8 to 4.5 years after publication) per article. The best predictor of the proportion of synchronous self-citations was the number of authors. My study is the first to report recidivism in the inclusion of self-citations by researchers, i.e., the proportion of diachronous self-citations was best explained by the proportion of synchronous self-citations. The proportion of self-citations also increased with the number of pages and the impact factor of ecology journals, and decreased with the number of references/citations. Although a lot of variance remained unexplained, my study successfully showed regularities in the propensity of ecologists to include self-citations in their research articles. The present study proposes a bibliometric methodology for measuring the grade of correspondence between regional industry's demand for research collaboration and supply from public laboratories. The methodology also permits measurement of the intensity and direction of the regional flows of knowledge in public-private collaborations. 
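A synchronous self-citation rate like the 12.8% reported above can be counted directly from author lists: a reference counts as a self-citation if it shares at least one author with the citing article. The sketch below uses invented names:

```python
def self_citation_share(article_authors, references):
    """Share of an article's references that are synchronous self-citations,
    i.e. that cite at least one of the article's own authors.
    `references` is a list of author lists, one per cited work."""
    authors = set(article_authors)
    selfed = sum(1 for ref_authors in references if authors & set(ref_authors))
    return selfed / len(references)

# Toy reference list for an article by Smith and Doe:
refs = [
    ["Smith", "Jones"],   # shares Smith -> self-citation
    ["Brown"],
    ["Lee", "Smith"],     # shares Smith -> self-citation
    ["Garcia", "Kim"],
]
print(self_citation_share(["Smith", "Doe"], refs))  # 0.5
```

A real analysis would match authors by disambiguated identities rather than surname strings; the string match here is only a stand-in.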
The aim is to provide a diagnostic instrument for regional and national policy makers, which could add to existing ones to plan interventions for re-balancing the sectoral public supply of knowledge with industrial absorptive capacity, and maximizing the appropriability of knowledge spillovers. The methodology is applied to university-industry collaborations in the hard sciences in all Italian administrative regions. Over the last decades there has been a growing interest in developing research and formulating public policy by using the Innovation Systems approach. However, as evidenced in the academic literature, there is a lack of systematic, chronological and synthesizing studies indicating how this field has evolved over time. This paper has as its main objective to consolidate the state of the art of academic research on IS, based on a bibliometric study of the literature published over the past 35 years. The results are discussed under the following perspectives: general results, chronological distribution, author relevance, articles and cited references of relevance, journal relevance, and institution and country relevance. The paper ends with a discussion of the main implications and limitations of the study. Scalar measures of research performance (Energy, Exergy, and Entropy or EEE) are based on what can be called the bibliometrics-thermodynamics consilience. Here, their application to the percentile ranking normalization scheme is demonstrated. The aim was to investigate the prevalence and characteristics of the practice of explicitly giving authors equal credit in publications of major anesthesiology journals. Four major anesthesiology journals (Anesthesia and Analgesia (AA), Anesthesiology, British Journal of Anaesthesia (BJA) and Pain) were searched manually to identify original research articles published between January 1st, 2001 and December 31st, 2010 with respect to equally credited authors (ECAs). 
It was found that all journals explicitly gave authors equal credit, and articles with ECAs accounted for a greater proportion of the total number of articles published in each journal in 2010 than in 2001 (AA: 3.3% vs. 0%; Anesthesiology: 7.1% vs. < 1%; BJA: 5.7% vs. 0%; Pain: 11.0% vs. < 1%). The number of ECA articles increased significantly year by year in all journals (P < 0.0001 for each journal). The first two authors in the byline received equal credit in most cases. Furthermore, ECA articles involved institutions from different countries and regions and were sponsored by various funds. However, no specific guidance concerning this practice was provided in the instructions to authors of the four journals. It is increasingly common to give authors equal credit in original research articles in major anesthesiology journals; detailed guidelines regarding this practice are warranted. Generally speaking, citation relationships among authors can be divided into three types: co-citation, coupling, and cross-citation. Since author co-citation analysis was first introduced in 1982, it has been widely applied to study discipline structure, the state of research, and research trends. Later, the concept of author bibliographic-coupling analysis was put forward, and related empirical studies provided a method for mapping active authors in a research field for a more realistic picture of the current state of its research activities. Additionally, if one of author A's papers cites one of author B's, there is a cross-citation relationship between A and B. However, studies based on author cross-citation relationships mainly describe citation behaviors themselves using citation identity and citation image; they rarely address implicit knowledge communication, author research correlation, or the discovery of academic communities. 
Author cross-citation analysis covers both the citing and the cited phenomena, which roughly correspond to citation identity and citation image. This study further explores author cross-citation relationships, taking core authors in the scientometrics field as its object of study, in order to inform the development of the scientometrics field and the in-depth application of citation analysis. Two relevant recent developments in the area of science and technology (S&T) and related policy-making motivate this article. First, bibliometric data on a specific research area's performance is becoming an increasingly relevant source for S&T policy-making and evaluation; this trend is embedded in wider discussions on evidence-based policy-making. Second, the scientific output of Southeast Asian countries is rising, as is the number of international research collaborations with the second area of our interest: Europe. Against this background, we employ basic bibliometric methodology to draw a picture of Southeast Asian research strengths as well as the amount and focus of S&T cooperation between the countries of Southeast Asia and the European Union. The results can prove useful for an interested public as well as for the scientific community and science, technology, and innovation policy-making. It is shown that the age-independent index based on an h-type index per decade, called hereafter an alpha index instead of the a index, as suggested by Kosmulski (Journal of Informetrics 3, 341-347, 2009) and Abt (Scientometrics, 2012), is related to the square root of the ratio of the citation acceleration a to the Hirsch constant A. The purpose of this article is to arrive at a valid categorization and to examine the performance and properties of a wide range of h-type indices presented recently in the relevant literature. 
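One way to see the stated square-root relation is a short back-of-the-envelope derivation. It assumes Hirsch's empirical rule that total citations grow as C ≈ A h² (with A the Hirsch constant) and defines citation acceleration as a = C/t²; the exact proportionality constant depends on whether the career length t is counted in years or decades:

```latex
h \approx \sqrt{C/A}
\quad\Longrightarrow\quad
\alpha \equiv \frac{h}{t}
\approx \frac{\sqrt{C/A}}{t}
= \sqrt{\frac{C/t^{2}}{A}}
= \sqrt{\frac{a}{A}}
```

That is, the per-decade (age-independent) index is, up to a scale factor, the square root of citation acceleration over the Hirsch constant, matching the relation stated above.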
By exploratory factor analysis (EFA) we study the relationship between the h-index, its variants, and some standard bibliometric indicators of 26 physicists compiled from the Science Citation Index in the Web of Science. (C) 2012 Elsevier Ltd. All rights reserved. The author order of multi-authored papers can reveal subtle patterns of scientific collaboration and provide insights on the nature of credit assignment among coauthors. This article proposes a sequence-based perspective on scientific collaboration. Using frequently occurring sequences as the unit of analysis, this study explores (1) what types of sequence patterns are most common in the scientific collaboration at the level of authors, institutions, U. S. states, and nations in Library and Information Science (LIS); and (2) the productivity (measured by number of papers) and influence (measured by citation counts) of different types of sequence patterns. Results show that (1) the productivity and influence approximately follow the power law for frequent sequences in the four levels of analysis; (2) the productivity and influence present a significant positive correlation among frequent sequences, and the strength of the correlation increases with the level of integration; (3) for author-level, institution-level, and state-level frequent sequences, short geographical distances between the authors usually co-present with high productivities, while long distances tend to co-occur with large citation counts; (4) for author-level frequent sequences, the pattern of "the more productive and prestigious authors ranking ahead" is the one with the highest productivity and the highest influence; however, in the rest of the levels of analysis, the pattern with the highest productivity and the highest influence is the one with "the less productive and prestigious institutions/states/nations ranking ahead." (C) 2012 Elsevier Ltd. All rights reserved. 
In the past, recursive algorithms such as PageRank, originally conceived for the Web, have been successfully used to rank nodes in the citation networks of papers, authors, or journals. Unlike citation counts, they have proved to measure prestige rather than popularity. However, bibliographic networks, in contrast to the Web, have some specific features that enable the assignment of different weights to citations, thus adding more information to the process of finding prominence. For example, a citation between two authors may be weighted according to whether and when those two authors collaborated with each other, information that can be found in the co-authorship network. In this study, we define a couple of PageRank modifications that weight citations between authors differently based on information from the co-authorship graph. In addition, we put emphasis on the time of publications and citations. We test our algorithms on Web of Science data on computer science journal articles and determine the most prominent computer scientists in the 10-year period 1996-2005. Besides a correlation analysis, we also compare our rankings to the lists of ACM A. M. Turing Award and ACM SIGMOD E. F. Codd Innovations Award winners and find the new time-aware methods to outperform standard PageRank and its time-unaware weighted variants. (C) 2012 Elsevier Ltd. All rights reserved. In economics, the Research Papers in Economics (RePEc) network has become an essential source for the gathering and dissemination of both existing and new economic research. Furthermore, it is currently the largest bibliometric database in the economic sciences, containing 33 different indicators for more than 30,000 economists. Based on this bibliographic information, RePEc calculates well-known rankings for authors and academic institutions. We provide some cautionary remarks concerning the interpretation of some of the bibliometric measures provided in RePEc. 
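The co-authorship-weighted PageRank idea described above can be sketched in a few lines: run the usual power iteration, but discount the weight of any citation edge between authors who have collaborated. The toy graph, the 0.5 discount factor, and the damping value are illustrative assumptions, not the paper's actual parameterization:

```python
def pagerank(weights, damping=0.85, iters=200):
    """Power iteration on a weighted directed graph.
    weights: dict mapping (src, dst) -> edge weight."""
    nodes = sorted({n for edge in weights for n in edge})
    out = {n: sum(w for (s, _), w in weights.items() if s == n) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for (s, d), w in weights.items():
            new[d] += damping * rank[s] * w / out[s]
        # redistribute the rank of dangling nodes uniformly
        dangling = sum(rank[n] for n in nodes if out[n] == 0)
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank

# Toy author citation graph; citations between co-authors ("a" and "b")
# are discounted by an illustrative factor of 0.5.
coauthors = {frozenset(("a", "b"))}
raw = {("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "a"): 1.0, ("c", "b"): 1.0}
weighted = {e: (0.5 * w if frozenset(e) in coauthors else w) for e, w in raw.items()}
ranks = pagerank(weighted)
```

A time-aware variant in the same spirit would further scale each edge weight by a decay in the citation's age before normalizing.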
Moreover, we show how individual and aggregated rankings can be biased by the employed ranking methodology. To select key indicators for describing and assessing the research performance of scientists, we propose applying principal component analysis in this data-rich environment. This approach allows us to assign weights to each indicator prior to aggregation. We illustrate the approach by providing a new overall ranking of economists based on RePEc data. (C) 2012 Elsevier Ltd. All rights reserved. We analyze whether preferential attachment in scientific coauthorship networks differs for authors with different forms of centrality. Using a complete database for the scientific specialty of research about "steel structures," we show that the betweenness centrality of an existing node is a significantly better predictor of preferential attachment by new entrants than degree or closeness centrality. During the growth of a network, preferential attachment shifts from (local) degree centrality to betweenness centrality as a global measure. One interpretation is that supervisors of PhD projects and postdocs broker between new entrants and the already existing network, and thus become focal to preferential attachment. Because of this mediation, scholarly networks can be expected to develop differently from networks predicated on preferential attachment to nodes with high degree centrality. (C) 2012 Elsevier Ltd. All rights reserved. We point out some theoretical problems in the construction of the activity index and related indicators. Concretely, if the activity index is larger than one, then it is, at least theoretically, possible to decrease its value by increasing the activity in that field. Although for some practical applications these problems do not seem to have serious consequences, our investigation adds to the list of problematic indicators. 
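The PCA step described above, deriving indicator weights from the first principal component before aggregating into a composite score, can be sketched as follows. The indicator matrix and its three columns (papers, citations, h-index) are synthetic examples, not RePEc data:

```python
import numpy as np

def pca_composite(X):
    """Score rows of X (authors x indicators) on the first principal component.
    Indicators are standardized first so scale differences do not dominate."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                       # loadings of the largest component
    if w.sum() < 0:                          # fix the arbitrary sign so weights point "up"
        w = -w
    return Z @ w

# Synthetic indicators for five authors: papers, citations, h-index
X = np.array([[10, 100, 5],
              [20, 250, 8],
              [5, 40, 3],
              [40, 600, 15],
              [15, 180, 6]], dtype=float)
scores = pca_composite(X)
ranking = np.argsort(-scores)   # author indices, best first
```

Because the indicators here are strongly correlated, the first component captures a general "output" factor and its loadings act as data-driven weights, which is the aggregation idea the abstract describes.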
As the problems we point out are due to the mathematical structure of this indicator, our analysis also applies to all indicators formed in the same way, such as the revealed comparative advantage index or Balassa index. (C) 2012 Elsevier Ltd. All rights reserved. At the end of 2011, the Journal of Informetrics (Elsevier) had existed for five years. We review its scope, published articles (topics, co-authorship, authors' countries), editorial decisions, editorial and production times, impact factor, and article downloads. Finally, we present a local citation environment map of JOI. (C) 2012 Elsevier Ltd. All rights reserved. Various factors are believed to govern the selection of references in citation networks, but a precise, quantitative determination of their importance has remained elusive. In this paper, we show that three factors can account for the referencing pattern of citation networks for two topics, namely "graphenes" and "complex networks", thus allowing one to reproduce the topological features of the networks built with papers as the nodes and edges established by citations. The most relevant factor was content similarity, while the other two, in-degree (i.e., citation counts) and age of publication, had varying importance depending on the topic studied. This dependence indicates that additional factors could play a role. Indeed, one should intuitively expect the reputation (or visibility) of authors and/or institutions to affect the referencing pattern, and this is only indirectly considered via the in-degree, which should correlate with such reputation. Because information on reputation is not readily available, we simulated its effect on artificial citation networks considering two communities with distinct fitness (visibility) parameters. One community was assumed to have twice the fitness value of the other, which amounts to a double probability of a paper being cited. 
While the h-index for authors in the community with larger fitness evolved over time with slightly higher values than in the control network (no fitness considered), a drastic effect was noted for the community with smaller fitness. (C) 2012 Elsevier Ltd. All rights reserved. We examine the extent to which the presence and number of web links between higher education institutions can be predicted from a set of structural factors such as country, subject mix, physical distance, academic reputation, and size. We combine two datasets on a large sample of European higher education institutions (HEIs) containing information on inter-university web links and organizational characteristics, respectively. Descriptive and inferential analyses provide strong support for our hypotheses: we identify factors predicting the connectivity between HEIs and the number of web links existing between them. We conclude that, while the presence of a web link cannot be directly related to its underlying motivation and the type of relationship between HEIs, patterns of network ties between HEIs present interesting statistical properties which reveal new insights into the function and structure of the interorganizational networks in which HEIs are embedded. (C) 2012 Elsevier Ltd. All rights reserved. A set of authors whose scientific output can be unequivocally ranked from the highest to the lowest is used to assess methods of assessing scientific output. The rank-rank correlation coefficient between the known order in the calibration set and the order produced by a given method of assessment is a quantitative measure of the quality of that method. A common-sense-based reference may play a positive role in the communication between the enthusiasts and antagonists of bibliometric indices. (C) 2012 Elsevier Ltd. All rights reserved. 
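The rank-rank correlation used above to grade an assessment method against the calibration set is Spearman's rho. A minimal sketch for untied rankings; the five-author calibration order and the index's output are illustrative:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

# Known calibration order of five authors vs. the order produced
# by a hypothetical bibliometric index (it swaps the top two authors):
true_order = [1, 2, 3, 4, 5]
index_order = [2, 1, 3, 4, 5]
rho = spearman_rho(true_order, index_order)
# A perfect method would score 1.0; this one scores 0.9.
```

A method whose rho against the calibration set is closer to 1 is, in the abstract's terms, a better method of assessment.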
The existing literature on electronic personal health records (PHRs) has typically focused on the discussion of several key issues, namely their design, functional evaluation, privacy, security, and architecture. The benefits of PHRs and the barriers preventing their adoption are also widely discussed. These issues are affected by technology infrastructure, and current and planned technology infrastructure deployment will be a key determinant in the selection and design of PHR architectures. Assumptions about the community-wide deployment of required technologies such as hardware and internet accessibility are implicit in the architectural selection of PHRs, and these dependencies have not been fully appreciated or addressed in the existing literature. This review article introduces and describes two infrastructural drivers, the ubiquitous technology baseline for PHRs and connectivity coverage, and examines their interrelationships with the selected PHR architectures. Eleven functional capabilities are also described, providing a basis for the analysis of the relationships between the two infrastructural drivers and architectural selection. This article explores the opportunities and challenges of creating and sustaining large-scale content curation communities through an in-depth case study of the Encyclopedia of Life (EOL). Content curation communities are large-scale crowdsourcing endeavors that aim to curate existing content into a single repository, making them different from content creation communities such as Wikipedia. In this article, we define content curation communities and provide examples of this increasingly important genre. We then present EOL, a compelling example of a content curation community, and describe a case study of EOL based on analysis of interviews, online discussions, and survey data. Our findings fall into two broad categories: information integration and social integration. 
Information integration challenges at EOL include the need to (a) accommodate and validate multiple sources and (b) integrate traditional peer-reviewed sources with user-generated, non-peer-reviewed content. Social integration challenges at EOL include the need to (a) establish the credibility of open-access resources within the scientific community and (b) facilitate collaboration between experts and novices. After identifying the challenges, we discuss the potential strategies EOL and other content curation communities can use to address them, and provide technical, content, and social design recommendations for overcoming them. The issue of how teens choose social networks and information communication technologies (ICTs) for personal communication is complex. This study focused on describing how U.S. teens from a highly technological suburban high school select ICTs for personal communication purposes. Two research questions guided the study: (a) What factors influence high school seniors' selection of online social networks and other ICTs for everyday communication? (b) How can social network theory (SNT) help to explain how teens select online social networks and other ICTs for everyday communication purposes? Using focus groups, a purposive sample of 45 teens was asked to discuss (a) their preferred methods for communicating with friends and family and why, (b) the reasons why they chose to engage (or not to engage) in online social networking, (c) how they selected ICTs for social networking and other communication purposes, and (d) how they decided whom to accept as online friends. Findings indicated that many factors influenced participants' ICT selection practices, including six major categories of selection factors: relationship factors, information/communication factors, social factors, systems factors, self-protection factors, and recipient factors. 
SNT was also helpful in explaining how friendship was a major determining factor in their communication media and platform choices. The concepts of place and the social have been put forward as significant intertwined explanatory contexts for information behavior. Much of the research that approaches information behavior from this perspective, however, has focused on static contexts or virtual contexts and has not addressed the influence of technology in physical spaces. In this article, we explore the influence of mobile technologies in two settings. The first is a site where a social space was augmented by the introduction of technology with the potential to provide information with the belief that information behavior, use, and services would evolve. The second is where intermediaries and information systems were designed and introduced into existing social spaces (an individual's home, community center, or street) with the explicit intention of providing information. The intention of both implementations was to reduce social exclusion. This exploratory research used activity theory as a theoretical lens to explore end-user reaction. The findings of the research illustrate how information and service needs are now addressed through a combination of mobile information and communication technologies and human intermediary knowledge. In this study, we consider the structure and linking strategy of Hebrew websites of several nonprofit organizations. Because nonprofit organizations differ from commercial, educational, or governmental sectors, it is important to understand the ways they utilize the web. To the best of our knowledge, the linking structure of nonprofit organizations has not been previously studied. We surveyed websites of 54 nonprofit organizations in Israel; most of these sites have at least 100 volunteers. We compared their orientation and contents and we built their linking map. 
We divided the organizations into four main groups: economic aid and citizen rights organizations, health aid organizations, organizations supporting families and individuals with special needs, and organizations for women and children. We found that the number of links inside the special needs group is much higher than in the other groups. We tried to explain this behavior by considering the data obtained from the site-linking graph. The value of our results lies in defining and testing a method to investigate a group of nonprofit organizations, using a case study of Israeli organizations. This article discusses people's understanding of reality through representations from the Internet. The Hegelian inquiry system is used here to explain the nature of informing on the Internet as activities of information masters to influence information slaves' opinions and as activities of information slaves to become well informed. The key assumption of Hegelianism regarding information is that information has no value independent of the interests and worldviews (theses) it supports. As part of the dialectic process of generating syntheses, we propose a role for information science of offering methods to critically evaluate the master's information, and thereby develop an opinion (thesis) independent of the master's power. To this end we offer multiple methods for information criticism, collectively named triangulation, which may help users to evaluate a master's evidence. This article also presents a prototype of a Hegelian information triangulator tool for information slaves (i.e., nonexperts). The article concludes with suggestions for further research on informative triangulation. Today, people perform many types of tasks on the web, including those that require multiple web sessions. In this article, we build on research about web tasks and present an in-depth evaluation of the types of tasks people perform on the web over multiple web sessions. 
Multisession web tasks are goal-based tasks that often contain subtasks requiring more than one web session to complete. We detail the results of two longitudinal studies that we conducted to explore this topic. The first study was a weeklong web-diary study in which participants self-reported information on their own multisession tasks. The second study was a monthlong field study in which participants used a customized version of Firefox, which logged their interactions for both their own multisession tasks and their other web activity. The results from both studies found that people perform eight different types of multisession tasks, that these tasks often consist of several subtasks and last different lengths of time, and that users have unique strategies for continuing such tasks, involving a variety of web and browser tools (such as search engines and bookmarks) and external applications (such as Notepad or Word). Using the results from these studies, we suggest three guidelines for developers to consider when designing browser-tool features to help people perform these types of tasks: (a) maintain a list of current multisession tasks, (b) support multitasking, and (c) manage task-related information between sessions. Scholars and practitioners alike increasingly emphasize the importance of the virtual world as a new medium of communication. Key to the success of this digital medium is its ability to support information exchange when compared with face-to-face communication. Its potential is highlighted by the literature illustrating the inadequacy of traditional computer-mediated communication (CMC) tools, such as e-mail and video conferencing, to support communication among geographically dispersed coworkers. Many traditional CMC tools lack, to varying degrees, the support needed for effective information exchange. The emergence of sophisticated virtual worlds, such as Second Life, has addressed this gap. 
We draw on the theories of task closure and media richness to propose a parsimonious model of information exchange behavior in a virtual world context. Observations from a series of group-based project discussion sessions in face-to-face and virtual world settings, respectively, suggest that the information exchange between coworkers in both settings could be similar. Specifically, virtual coworkers might be able to achieve task closure (i.e., the complete transmission of intended work-related information) in the same way as their counterparts in the face-to-face context. In this article, we report on a discrete choice experiment to determine the willingness-to-wait (WTW) in the context of journal submissions. Respondents to our survey are mostly active in the information sciences, including librarians. Besides WTW, other attributes included in the study are the quality of the editorial board, the quality of referee reports, the probability of being accepted, the ISI impact factor, and the standing of the journal among peers. Interaction effects originating from scientists' personal characteristics (age, region of origin, motivations to publish) with the WTW are highlighted. A difference was made between submitting a high quality article and a standard article. Among the interesting results obtained from our analysis we mention that for a high-quality article, researchers are willing to wait some 18 months longer for a journal with an ISI impact factor above 2 than for a journal without an impact factor, keeping all other factors constant. For a standard article, the WTW decreases to some 8 months. Gender had no effect on our conclusions. The study presented here shows a method based on network theory to identify the most important journals related to a given journal, the seed journal. In just one simple network map, we get the relevant citation environment of a specific seed journal. 
It is of interest to librarians, publishers, scientists and science policy makers. These journal citation network maps are useful for these various stakeholders in and around the science system, as they provide information on the level of journal connections, unlike the more traditional structures such as the Journal Subject Categories, the classification system applied in the products of Thomson Reuters (Journal Citation Reports, Web of Science, etc.). These network maps show the closest relations journals can have, based on citation relations, suggesting influence relations between journals in such a way that traditional field boundaries are transcended. Past work suggests that anchor text is a good source of evidence that can be used to improve web searching. Two approaches for making use of this evidence include fusing search results from an anchor text representation and the original text representation based on a document's relevance score or rank position, and combining term frequency from both representations during the retrieval process. Although these approaches have each been tested and compared against baselines, different evaluations have used different baselines; no consistent work enables rigorous cross-comparison between these methods. The purpose of this work is threefold. First, we survey existing fusion methods of using anchor text in search. Second, we compare these methods with common testbeds and web search tasks, with the aim of identifying the most effective fusion method. Third, we try to correlate search performance with the characteristics of a test collection. Our experimental results show that the best performing method in each category can significantly improve search results over a common baseline. However, there is no single technique that consistently outperforms competing approaches across different collections and search tasks. 
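The two fusion families the anchor-text survey above compares, score-based fusion (e.g., CombSUM over normalized scores) and rank-based fusion, can each be sketched in a few lines. The document IDs and scores below are illustrative, and reciprocal rank fusion is used here as one representative rank-based method:

```python
def comb_sum(*runs):
    """Score-based fusion: sum each document's min-max-normalized scores.
    runs: dicts mapping doc id -> retrieval score."""
    fused = {}
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        for doc, s in run.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 0.0
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused, key=fused.get, reverse=True)

def reciprocal_rank_fusion(*rankings, k=60):
    """Rank-based fusion: each list contributes 1/(k + position)."""
    fused = {}
    for ranking in rankings:
        for pos, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + pos)
    return sorted(fused, key=fused.get, reverse=True)

# An anchor-text run and a full-text run over the same query:
anchor = {"d1": 2.0, "d2": 1.5, "d3": 0.2}
fulltext = {"d2": 9.0, "d1": 3.0, "d4": 1.0}
by_score = comb_sum(anchor, fulltext)
by_rank = reciprocal_rank_fusion(["d2", "d1", "d3"], ["d2", "d4", "d1"])
```

Score normalization before summing matters because the two representations produce scores on different scales; the rank-based variant sidesteps that issue entirely, which is one reason the survey treats the two families separately.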
This study empirically evaluates the effectiveness of different feature types for the classification of the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the Linguistic Inquiry and Word Count (LIWC) tool, that have not previously been applied to the task of author profiling. As LIWC is a tool that has been developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective, both as a single type feature set because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. It was found that LIWC features were competitive with previously used feature types in identifying the first language of an author, and that combined feature sets including LIWC features consistently showed better accuracy rates and average F measures than were achieved by the same feature sets without the LIWC features. As a secondary issue, this study also examined how effectively first language classification scaled up to a larger number of possible languages. It was found that the classification scheme scaled up effectively to the entire 16 language collection from the International Corpus of Learner English, when compared with results achieved on just 5 languages in previous research. This study evaluates the utility of a publication power approach (PPA) for assessing the quality of journals in the field of artificial intelligence. PPA is compared with the Thomson-Reuters Institute for Scientific Information (TR) 5-year and 2-year impact factors and with expert opinion. The ranking produced by the method under study is only partially correlated with citation-based measures (TR), but exhibits close agreement with expert survey rankings. A simple average of TR and power rankings results in a new ranking that is highly correlated with the expert survey rankings. 
This evidence suggests that power ranking can contribute to the evaluation of artificial intelligence journals. How does publication pressure in modern-day universities affect the intrinsic and extrinsic rewards in science? Using a worldwide survey among demographers in developed and developing countries, the authors show that the large majority perceive the publication pressure as high, most strongly in Anglo-Saxon countries and to a lesser extent in Western Europe. However, scholars see both the pros (upward mobility) and cons (excessive publication and uncitedness, neglect of policy issues, etc.) of the so-called publish-or-perish culture. Measuring behavior in terms of reading and publishing, together with perceived extrinsic rewards and stated intrinsic rewards of practicing science, the authors find that publication pressure negatively affects the orientation of demographers towards policy and knowledge sharing. There are no signs that the pressure affects reading and publishing outside the core discipline. The capability to build nuclear weapons is a key national security factor that has a profound influence on the balance of international relations. In addition to longstanding players, regional powers and peripheral countries have sought ways of acquiring and/or developing them. The authors postulate that dynamic network analysis provides valuable insight into the capabilities, relative positions, and interrelations of the countries involved in the production of nuclear weaponization knowledge. In this article, the authors use a computational framework that combines techniques from dynamic network analysis and text mining to extract and analyze large-scale networks from the open theoretical and experimental nuclear research publications of the last two decades. 
More specifically, they build interlinked, dynamic networks that model relationships among nuclear researchers based on the open literature, and supplement this information with text mining to classify the nuclear weaponization capabilities of each publication, and of each author, organization, city, and country. Using such a comprehensive computational framework, they are able to (a) elicit the hot topics in nuclear weaponization research, (b) assess the nuclear expertise level of each country, (c) differentiate between established and emergent players, and (d) identify the key entities at various levels such as organization, city, and country. This study explores the similarity among six types of scholarly networks aggregated at the institution level: bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks. Cosine distance is chosen to measure the similarities among the six networks. The authors found that topical networks and coauthorship networks have the lowest similarity; cocitation networks and citation networks have high similarity; bibliographic coupling networks and cocitation networks have high similarity; and coword networks and topical networks have high similarity. In addition, through multidimensional scaling, two dimensions can be identified among the six networks: Dimension 1 can be interpreted as citation-based versus noncitation-based, and Dimension 2 can be interpreted as social versus cognitive. The authors recommend the use of hybrid or heterogeneous networks to study research interaction and scholarly communications. Using a large database (approximately 215,000 records) of relevant articles, we empirically study the complex systems field and its claims to find universal principles applying to systems in general. The study of references shared by the articles allows us to obtain a global point of view on the structure of this highly interdisciplinary field. 
We show that its overall coherence does not arise from a universal theory, but instead from computational techniques and fruitful adaptations of the idea of self-organization to specific systems. We also find that communication between different disciplines goes through specific "trading zones," i.e., subcommunities that create an interface around specific tools (e.g., a DNA microchip) or concepts (e.g., a network). A recent study of patient decision making regarding acceptance of an implantable cardiac defibrillator (ICD) provides a substantial but nonrandom sample (N = 191) of telephone interviews with persons who have made an affirmative decision regarding an ICD. Using a coding scheme developed through qualitative analysis of transcribed interviews, these data can be subjected to exploratory statistical analysis. The reasons given by respondents for getting the ICD differed by both region and gender, and show some correlations with whether the device has or has not delivered any stimulation (shocks) since implantation. Cluster analysis reveals associations among certain important themes in the discussion of the decision process, particularly linking rather opposite concepts into clusters related to specific dimensions. The results suggest the importance, to patients, of maintaining the integrity of the self by asserting control and independence. The majority of the respondents (61%) have not received the primary intended benefit of the device (stimulation). Thus, the findings suggest that the psychological benefits of having the device alone (such as anxiety reduction) serve to justify acceptance of a computerized device. Implications for other lines of computerized health support and for further study of these issues are discussed. This article introduces the problem of collocative integrity present in long-lived classification schemes that undergo several changes. 
A case study of the subject "eugenics" in the Dewey Decimal Classification is presented to illustrate this phenomenon. Eugenics is a strange case because of the kinds of changes it undergoes. The article closes with a discussion of subject ontogeny as the name for this phenomenon and describes implications for information searching and browsing. Net neutrality is the focus of an important policy debate that is tied to technological innovation, economic development, and information access. We examine the role of human values in shaping the Net neutrality debate through a content analysis of testimonies from U.S. Senate and FCC hearings on Net neutrality. The analysis is based on a coding scheme that we developed in a pilot study using the Schwartz Value Inventory. We find that the policy debate surrounding Net neutrality revolves primarily around differences in the frequency of expression of the values of innovation and wealth, such that the proponents of Net neutrality more frequently invoke innovation, while the opponents of Net neutrality more frequently invoke wealth in their prepared testimonies. The paper provides a novel approach for examining the Net neutrality debate and sheds light on the connection between information policy and research on human values. This study combines Web usage mining, Web link analysis, and bibliometric methods for analyzing research activities in research organizations. It uses visits to the Expert Protein Analysis System (ExPASy) server, a virtual research infrastructure for bioinformatics, as a proxy for measuring bioinformatic research activity. The study finds that in the United Kingdom (UK), Germany, and Spain the number of visits to the ExPASy Web server made by research organizations is significantly positively correlated with research output in the field of biochemistry, molecular biology, and genetics. 
Only in the UK do we find a significant positive correlation between ExPASy visits per publication and the normalized impact of an organization's publications. The type of indicator developed in this study can be used to measure research activity in fields in which e-research has become important. In addition, it can be used for the evaluation of e-research infrastructures. State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence on any text collection, whereas previous approaches do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces query processing to simple additions of such impacts. 
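The efficiency argument in the LePrEF description above is that once each (term, document) pair carries a single precomputed impact, ranking a query reduces to additions. A minimal sketch of that query-time step; the impact numbers here are hypothetical stand-ins for values that the genetic-programming fusion would produce at indexing time:

```python
from collections import defaultdict

# Hypothetical precomputed impacts: term -> {doc_id: unified impact value}.
# In LePrEF these values would be learned at indexing time by fusing
# several relevance-evidence sources with genetic programming.
index = {
    "museum":  {"d1": 2.5, "d2": 0.4},
    "archive": {"d1": 1.1, "d3": 3.0},
}

def rank(query_terms):
    """Query processing reduces to simple additions of precomputed impacts."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc, impact in index.get(term, {}).items():
            scores[doc] += impact
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank(["museum", "archive"])
```

All the expensive evidence fusion happens before any query arrives, which is exactly the trade-off the abstract describes.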
We show that this approach, while leading to results comparable to state-of-the-art ranking methods, can also lead to a significant decrease in computational costs during query processing. Detecting negative and speculative information is essential in most biomedical text-mining tasks where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research is focused on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal or not. Then another classifier determines, at sentence level, the tokens which are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two different systems, one based on regular expressions and the other based on machine learning. Our system's results outperformed those obtained by these two systems. In the signal detection task, the F-score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal showed an F-score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, obtaining F-scores of 90.9% in negation and 71.9% in speculation. This article analyzes the effects of publication language on the international scientific visibility of Russia using the Web of Science (WoS). 
Like other developing and transition countries, Russia is subject to growing pressure to "internationalize" its scientific activities, which primarily means a shift to English as a language of scientific communication. But to what extent does the transition to English improve the impact of research? The case of Russia is of interest in this respect, as the existence of many combinations of national journals and languages of publication (namely, Russian and English, including translated journals) provides a kind of natural experiment to test the effects of language and publisher's country on the international visibility of research through citations, as well as on the referencing practices of authors. Our analysis points to the conclusion that the production of original English-language papers in foreign journals is a more efficient strategy of internationalization than the mere translation of domestic journals. If the objective of a country is to maximize the international visibility of its scientific work, then the efforts should go into the promotion of publication in reputed English-language journals to profit from the Matthew effect of these venues. A new visual representation of the response time, i.e., the time elapsed between the publication year and the date of the first citation of a paper, is provided. This representation can be used to detect and describe different paradigmatic types of reception speed for scientific journals. Traffic from search engines is important for most online businesses, with the majority of visitors to many websites being referred by search engines. Therefore, an understanding of this search engine traffic is critical to the success of these websites. Understanding search engine traffic means understanding the underlying intent of the query terms and the corresponding user behaviors of searchers submitting keywords. 
In this research, using 712,643 query keywords from a popular Spanish music website relying on contextual advertising as its business model, we use a k-means clustering algorithm to categorize referral keywords with similar characteristics of onsite customer behavior, including attributes such as clickthrough rate and revenue. We identified six clusters of consumer keywords, ranging from a large group of low-impact users to a small group of high-impact users. We demonstrate how online businesses can leverage this segmentation clustering approach to provide a more tailored consumer experience. The implication is that businesses can effectively segment customers to develop better business models and increase advertising conversion rates. A technique is developed using patent information available online (at the U.S. Patent and Trademark Office) for the generation of Google Maps. The overlays indicate both the quantity and the quality of patents at the city level. This information is relevant for research questions in technology analysis, innovation studies, and evolutionary economics, as well as economic geography. The resulting maps can also be relevant for technological innovation policies and research and development management, because the U.S. market can be considered the leading market for patenting and patent competition. In addition to the maps, the routines provide quantitative data about the patents for statistical analysis. The cities on the map are colored according to the results of significance tests. The overlays are explored for the Netherlands as a "national system of innovations" and further elaborated in two cases of emerging technologies: ribonucleic acid interference (RNAi) and nanotechnology. The application of micro-level citation indicators is not without controversy. The procedure requires the availability of both adequate data sets and trusted metrics. 
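The keyword-segmentation study above groups referral keywords by onsite customer behavior with k-means. A minimal self-contained sketch of that clustering step, using two hypothetical features (clickthrough rate, revenue per visit) rather than the study's actual attributes:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    recompute centroids as cluster means, and repeat."""
    centroids = random.Random(seed).sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical per-keyword features: (clickthrough rate, revenue per visit).
keywords = [(0.01, 0.0), (0.02, 0.1), (0.015, 0.05),   # many low-impact keywords
            (0.20, 2.0), (0.25, 2.5)]                  # few high-impact keywords
centroids, clusters = kmeans(keywords, k=2)
```

With real data, each keyword would carry the study's behavioral attributes and k would be chosen so that segments (here, low-impact vs. high-impact) are interpretable.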
Few indicators have been developed to deal specifically with individual assessment. The h-type indices are the most popular category; however, the dependence of h-type metrics on publication age and field often makes their application unjustified. This article studies the effects that publication age and field normalization have on h-type citation values of German Leibniz Prize winners. This data set is exclusive in that it has been highly scrutinized for homonyms. Results are compared with other field-normalized citation rates, contributing to the debate on using demarcation versus average citation approaches to evaluate top researchers. In a recent article, L. Egghe, R. Guns, and R. Rousseau (2011) noted that in a study of some eminent scientists, many of them had a fair proportion of papers which were uncited, and found this to be surprising. Here, we use the stochastic publication/citation model of Q.L. Burrell (2007) to show that the result might in fact be expected. This brief communication is in the spirit of Q.L. Burrell (2002, 2005), showing that results that might at first sight seem surprising can in fact often be explained in a stochastic framework. Distributed information retrieval (DIR), where a single broker coordinates retrieval from many independent search services, has been extensively studied, but typically without any particular application and sometimes even without any explicit motivation. A handful of arguments have been given for DIR (coverage, effectiveness, and ease of use, for example), but these are not borne out by experience. Are there uses for DIR? There are, but generally for organizational rather than technical reasons, and they have not been well studied. This study determines how library and information science (LIS) research in Taiwan has changed between 2001 and 2010. 
The major research questions address the research status of LIS in Taiwan, how the Taiwanese government supports the field, and the collaborative authorship of LIS journal articles in Taiwan. Bibliometric and content analysis methods were used to analyze 2,494 journal articles, 983 theses, and 191 research projects between 2001 and 2010. The results show LIS and Technology to be the most popular topics in journal articles. The most popular thesis topics are LIS and Technology and User Services, accounting for more than 50% of graduate theses. The same is true for research projects, with the subjects of LIS and Technology, LIS Theory and Foundation, and User Services accounting for more than 70%. In government-sponsored research projects, the average amount of funding obtained showed no significant differences or trends across subjects over time. In the authorship of journal articles, individual researchers produced 66.11% of the articles in key LIS scholarly journals in Taiwan between 2001 and 2010. With the growth of competition between nations in our knowledge-based world economy, excellence programs are becoming a national agenda item in developing as well as developed Asian countries. The main purpose of this paper is to compare the goals, funding policies, and selection criteria of excellence programs in China, Japan, Korea, and Taiwan and to analyze the academic achievement of their top-ranked universities in three areas: research output, internationalization, and excellence, using data from the Shanghai Jiao Tong, QS, and HEEACT rankings. The effectiveness of Taiwan's "Development Plan for World Class Universities and Research Centers of Excellence" was assessed as a case study via a survey targeting 138 top administrators from 11 Taiwanese universities and 30 reviewers. The study found that the more funding a nation provided, the more outputs and outcomes it gained; China is a case in point. 
The Taiwan case demonstrates that world-class universities and research centers are needed in Asian nations despite the concerns about inequality that they raise. This article identifies patterns and structures in the social tagging of scholarly articles in CiteULike. Using a dataset of 4,215 tags attributed to 1,600 scholarly articles from 15 library and information science journals, a network was built to understand users' information organization behavior. Social network analysis and the frequent-pattern tree method were used to discover the implicit patterns and structures embedded in social tags as well as in their use, based on 26 proposed tag categories. The pattern and structure of this network of social tags is characterized by power-law distribution, centrality, co-used tag categories, role sharing among tag categories, and similar roles of tag categories in associating distinct tag categories. Furthermore, researchers generated 21 path-based decision-making sub-trees, providing valuable insights into user tagging behavior for information organization professionals. The limitations of this study and future research directions are discussed. Knowledge flow between public and private sectors is widely recognized as a way to stimulate innovation and regional development, particularly in science parks. This work employs a bibliometric approach, based on patent citations, non-patent citations, and public-private co-authorship of scientific publications, to measure the use of public research in Hsinchu Science Park (HSP) in Taiwan. The results show that the number of jointly published papers has increased steadily, implying that collaboration between HSP and universities has become more common. However, in terms of co-patenting, patent citations, and non-patent references, technological innovation stemming from public research needs to be enhanced. 
Research performance is difficult to evaluate because most of the criteria are incommensurable, and assessing its improvement over time is even more difficult. This paper assesses the performance improvement in management research in Taiwan between 2006 and 2010 using the Malmquist productivity index (MPI). The criteria for measuring research performance are journal publications, where the journals are classified as SI-, TI-, other international-, and other local-types. While the number of papers has increased for three types and decreased for one, the MPI indicates that the aggregate performance has improved significantly. The areas of management covered in this study are management information systems, production and operations management, and marketing. For all these areas the performance has improved, although the improvement in marketing is insignificant. The assessment sheds some light on the areas and categories of journals that contribute to the improvement of research performance, which is useful for setting goals to reach higher levels. Recently, many organizations have been conducting projects on ranking world universities from different perspectives. These ranking activities have made an impact and caused controversy. This study neither favors using bibliometric indicators to evaluate universities' performance nor opposes the idea. We regard these ranking activities as important phenomena and aim to investigate the correlation of different ranking systems using a bibliometric approach. Four research questions are discussed: (1) the inter-correlation among different ranking systems; (2) the intra-correlation within ranking systems; (3) the correlation of indicators across ranking systems; and (4) the impact of different citation indexes on rankings. The preliminary results show that 55% of the top 200 universities are covered in all ranking systems. The rankings of ARWU and PRSPWU show a stronger correlation. 
With the inclusion of another ranking, WRWU (2009-2010), these rankings tend to converge. In addition, intra-correlation is significant, which means that it is possible to identify ranking indicators with a high degree of discriminative power or representativeness. Finally, it is found that there is no significant impact of using different citation indexes on the ranking results for the top 200 universities. Literature review is an important but time-consuming task that involves many disparate steps. A simple query to a library database may return voluminous literature that often bewilders novices. We believe the bibliographic techniques developed by information scientists provide useful processes and methods that facilitate literature analysis and review. We therefore developed a citation-based literature analyzing and structuring system, which may help novices perform tasks that are usually carried out by trained professionals. A field study was carried out to gauge the utility as well as users' perceptions using a questionnaire adapted from relevant empirical studies. Graduate students who participated in the field study were able to publish papers in their first semester by using this system. The utility and usefulness of the intellectual structuring system are demonstrated by the objective evidence of the high acceptance rate of papers produced with the system as well as the subjective positive responses from the users. A system utilization model based on the structural equation modeling technique found that the task characteristics construct affects the information quality construct, which in turn affects the perceived usefulness of the system. This study uses the entropy-based patent measure to discuss the effects of related technological diversification (RTD) and unrelated technological diversification (UTD) on innovation performance and corporate growth. 
The results indicate that RTD has a monotonically positive influence on both innovation performance and corporate growth, whereas UTD has an inverse U-shaped influence on both. Furthermore, the results show that the positive effect of RTD on innovation performance and corporate growth is greater than that of UTD. If Taiwan's semiconductor companies would like to undertake technological diversification, this study suggests that they should adopt RTD rather than UTD. In addition, this study points out that innovation performance mediates the relationship between corporate growth and both RTD and UTD. It demonstrates that RTD and UTD can directly affect corporate growth or indirectly influence it via innovation performance. The rapid economic growth in East Asia might have an impact on the development of research output. Because previous bibliometric analyses of anesthesiology in this region had been limited to research within anesthesiology journals or anesthesia-related research, the total publications from anesthesia departments might not be fully captured. In this study, the databases of Web of Science and PubMed were used to assess the academic productivity and distribution of research diversity of anesthesia departments from four major countries in East Asia and to compare these with the USA. From 2001 to 2010 the volume of scientific research from anesthesia departments in East Asia increased steadily. Although Japan was the most productive contributor in East Asia, its share declined annually. China increased most rapidly and exceeded Japan in 2010 in terms of the annual number of papers. Research attributed to anesthesia departments in East Asia was diverse and present in a wide range of non-anesthesia journals. Notably, the annual number of randomized controlled trials in East Asia also grew strongly. 
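The entropy-based patent measure in the RTD/UTD study above is typically computed by decomposing total diversification entropy into a between-group (unrelated) component and a within-group (related) component. A minimal sketch of that standard decomposition, with hypothetical patent counts and class labels:

```python
from math import log

def entropy(shares):
    """Shannon entropy of a share distribution (natural log)."""
    return sum(p * log(1 / p) for p in shares if p > 0)

# Hypothetical patent counts per subclass, grouped into broader technology groups.
groups = {
    "G06": {"G06F": 40, "G06Q": 10},
    "H01": {"H01L": 45, "H01S": 5},
}
total = sum(sum(g.values()) for g in groups.values())
group_totals = {name: sum(g.values()) for name, g in groups.items()}

# Unrelated technological diversification: entropy across the broad groups.
utd = entropy(t / total for t in group_totals.values())

# Related technological diversification: share-weighted entropy within groups.
rtd = sum((group_totals[name] / total)
          * entropy(c / group_totals[name] for c in g.values())
          for name, g in groups.items())

# The two components add up to the total entropy over all subclasses.
total_entropy = entropy(c / total for g in groups.values() for c in g.values())
```

The identity total_entropy = UTD + RTD is what makes the entropy measure decomposable in the first place, which is why it is a common choice for separating related from unrelated diversification.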
This paper employs bibliometric methods to observe collaboration patterns of scientific publications in biotechnology, information and computer technology, future energy, and nanotechnology among different institutions in Taiwan. The results show primary domestic and international collaborative patterns, the effect of collaborative papers relative to the worldwide average, collaborative networks, and the distribution of institutions on a global map. The findings suggest that domestic collaboration in each area accounts for a higher proportion than international collaboration. Biotechnology leads in both domestic and international collaborative percentages. Among cooperative benchmarking countries, the US and China are the main partners. Collaboration among research institutes and universities is the most frequent collaborative pattern in each area except biotechnology, where collaboration tends to occur between hospitals and universities. On average, international collaborative papers tend to have a greater effect, except in nanotechnology. Academia Sinica collaborated frequently with foreign institutes in each research field. Further analysis of how each collaborative group forms is recommended, especially of collaboration within Triple-Helix relationships. We examine how strategic partnership affects the external learning of technology descendants from emerging markets in the context of Taiwan's flat panel display industry. The study takes patent citation as a trail of knowledge flow, and incorporates 1,726 paired relations between cited and citing firms. Our empirical evidence shows a positive pattern of external learning through strategic technology partnerships. After controlling for the quality of the knowledge, technology descendants do learn more from their alliance partners than from other non-allied firms; in particular, trading partnerships characterized by asymmetric relations appear to bring more impact. 
Furthermore, a focused approach in extrapolating knowledge from strategic partners seems to be the dominant practice. This study aims to propose an early precaution method that allows predicting the probability of patent infringement as well as evaluating patent value. To this end, a large-scale analysis of both litigated and non-litigated patents issued by the USPTO between 1976 and 2010 is conducted. Such a holistic analysis of the two types of patents (3,878,852 non-litigated patents and 31,992 litigated patents in total) issued by the USPTO from 1976 to 2010 has not previously been conducted in the literature, and it allows patent researchers to understand the overall picture of USPTO patents. Also, by comparing the characteristics of all litigated patents to those of non-litigated patents, a precaution method for patent litigation can be obtained. Both litigated and non-litigated patents are analyzed to understand the differences between the two types of patents in terms of different variables. It is found that there are statistically significant differences between the two types of patents in the following 11 variables: (1) No. of Assignee, (2) No. of Assignee Country, (3) No. of Inventor, (4) Inventor Country, (5) No. of Patent Reference, (6) No. of Patent Citation Received, (7) No. of IPC, (8) No. of UPC, (9) No. of Claim, (10) No. of Non-Patent Reference, and (11) No. of Foreign Reference. Finally, logistic regression is used to predict the probability of patent litigation by fitting the 11 characteristics of 3,910,844 USPTO patents to a logistic function curve. The present study analyzes bibliometric characteristics of Taiwan's highly cited papers published from 2000 to 2009. 
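Fitting patent characteristics to a logistic function curve, as in the litigation study above, means modeling the litigation probability as a sigmoid of a weighted sum of the variables. A minimal sketch; the weights, intercept, and the three features shown are hypothetical illustrations, not the study's estimates over its 11 variables:

```python
from math import exp

def litigation_probability(features, weights, bias):
    """Logistic model: P(litigated) = 1 / (1 + exp(-(w.x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients for three of the 11 variables, e.g.
# (no. of claims, no. of patent citations received, no. of non-patent references).
weights = [0.05, 0.02, 0.01]
bias = -4.0   # litigated patents are rare, hence a strongly negative intercept

p = litigation_probability([20, 30, 5], weights, bias)
```

In the actual study the coefficients would be estimated by maximum likelihood over the ~3.9 million patents; the strongly negative intercept reflects the base rate (31,992 litigated out of 3,910,844).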
During this period, Taiwan ranked within the top 30 countries by number of highly cited papers, defined in Thomson Reuters' Essential Science Indicators (ESI) as those that rank in the top 1% by citations for their category and year of publication. Taiwan made notable progress in world-class research in the two consecutive 5-year periods 2000-2004 and 2005-2009. For Taiwan's highly cited papers, the USA, China, Germany, and Japan were the top collaborating countries over the decade. In recent years, Taiwan has increasingly collaborated with European countries whose output of highly cited papers is relatively high and increasing, rather than with its neighboring countries in Asia. Overall, Taiwan produced highly cited papers in all 22 ESI subject categories during the 10-year period. Taiwan's output of highly cited papers was greatest in the categories of Engineering, Clinical Medicine, and Physics, while those in Agricultural Sciences and Mathematics exceeded the expected output level in relative terms. More detailed analyses would be useful for a holistic understanding of Taiwan's research landscape and its progress in world-class research, combining both bibliometric and non-bibliometric data, such as researcher mobility, research grants, and output from internationally collaborated research programs. A review of Garfield's journal impact factor and its specific implementation as the Thomson Reuters impact factor reveals several weaknesses in this commonly used indicator of journal standing. Key limitations include the mismatch between citing and cited documents, the deceptive display of three decimals that belies the real precision, and the absence of confidence intervals. These are minor issues that are easily amended and should be corrected, but more substantive improvements are needed. There are indications that the scientific community seeks and needs better certification of journal procedures to improve the quality of published science. 
Comprehensive certification of editorial and review procedures could help ensure adequate procedures to detect duplicate and fraudulent submissions. The Hirsch citation index h is nowadays the most frequently used numerical indicator of the performance of scientists, as reflected in their output and in the reaction of the scientific community, expressed in citations of individual contributions. A few of the possible improvements of h are briefly reviewed. Garfield's journal impact factor (IF) characterizes the reaction of the scientific community to publications in journals, reflected in citations of all papers published in any given journal during the preceding 2 years, and normalized against all citable articles during the same period. Again, a few of the possible improvements or supplements of IF are briefly reviewed, including the journal h-index proposed by Braun, Glänzel, and Schubert. Ascribing higher weighting factors to citations of individual papers in proportion to IF is considered to be a misuse of useful numerical indices based on citations. If anything, one could turn this argument on its head and find reasons to ascribe an inverse proportionality relative to IF for individual citations: if a paper is considered worth citing even though it was published in a low-IF journal, that citation ought to be worth more than if the citation had come from a higher-impact journal. A weight factor reflecting the prestige of the citing author(s) may also be considered. The impact factor is one of the most used scientometric indicators. Its proper and improper uses have been discussed extensively before. It has been criticized extensively, yet it is still here. In this paper I propose the journal report card, which is a set of measures, each with an easily comprehensible meaning, that together provide a fuller picture of a journal's standing. 
The set of measures in the report card includes the impact factor, the h-index, the number of citations at different points on the ranked list of citations, the extent of uncitedness, and the coverage of the h-core. The report card is computed for two sets of journals, the top-20 journals in JCR 2010 and the top-20 journals in JCR 2010 for the category Information and Library Science. This paper is a response to that of Vanclay, who proposes that, since the impact factor (IF) is so seriously flawed, Thomson Reuters should either correct the measure or, preferably, no longer publish it and restrict itself to journal certification. It is argued here that Vanclay's analysis is itself seriously flawed, because he appears totally ignorant of the thought structure of Eugene Garfield, IF's creator. As a result, Vanclay appears unaware of the importance of total cites and the close connection of IF with review journals, where the paradigms of science are defined. This paper's author agrees that IF is a defective measure, analyzing its defects from the perspective of the frequency theory of probability, on which modern inferential statistics is based. However, he asserts that abandoning it would be counterproductive because of its demonstrated ability, even with its defects, to identify small but important journals such as review journals, giving it an important role in science evaluation and library collection management. In the discussion paper on this issue, Vanclay (2011) describes and uncovers several weaknesses of the JIF based on a thorough literature review and detailed empirical analyses. In this short comment we would like to add the results of two studies to the discussion around the JIF. In these studies we investigated the effect of several versions of one and the same manuscript published by a journal on its JIF. The impact factor is a highly polemic metric. 
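Most of the report-card measures above are simple functions of a journal's ranked citation list. A minimal sketch computing a few of them (the impact factor is omitted here because it additionally needs publication-year windows); the citation counts are hypothetical:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for position, c in enumerate(ranked, start=1) if c >= position)

def report_card(citations):
    """A few report-card-style measures from one journal's citation list."""
    ranked = sorted(citations, reverse=True)
    h = h_index(citations)
    return {
        "h_index": h,
        "max_cites": ranked[0],
        "mid_cites": ranked[len(ranked) // 2],          # middle of the ranked list
        "uncited_share": sum(1 for c in citations if c == 0) / len(citations),
        "h_core_coverage": sum(ranked[:h]) / sum(ranked),  # share of cites in top-h
    }

card = report_card([25, 12, 8, 6, 3, 1, 0, 0])
```

Reporting several points on the ranked list together with uncitedness and h-core coverage is exactly what distinguishes the report-card idea from a single summary number.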
It was designed to help scientists search for bibliographic references for their own works, to enable communication among researchers, and to help librarians decide which journals they should purchase. Nevertheless, it soon became the most important measure of scientific performance, applied to journals, articles, scientists, universities, etc. Since then, some researchers have argued that it is a useless and flawed measure, while others defend its utility. The current study is the first survey of opinion on this topic across a broad sample of scientists from all over the world. The questionnaire was answered by 1,704 researchers from 86 different countries, covering all the continents and all the UNESCO major fields of knowledge. The results show that the opinion is slightly above the median, which could be understood as "neither positive nor negative". Surprisingly, there is a negative correlation between the number of articles published by the respondents and their opinion of the impact factor. In the provocative paper on Journal Impact Factors by Vanclay (in press) there are some interesting points worth further reflection. In this short commentary I will focus on those that I consider most relevant, because they suggest some ideas that could be addressed by researchers interested in this topic. The representativeness of the ISI-Thomson Impact Factor rankings and the relationship between countries' national languages and the diffusion of scientific publications are analyzed. We discuss literature on the Impact Factor related to language use, publication strategies for authors and editors from non-English-speaking countries, the effects of the inclusion of a new journal in the ISI-Thomson databases, and the scientific policies articulated in some non-English-speaking countries. 
The adoption of the Impact Factor as the valuation criterion for scientific activities has favoured the consolidation of English-language journals in the diffusion of scientific knowledge. Vernacular languages conserve only part of their importance, in certain disciplines such as Clinical Medicine or the Social Sciences and Humanities. The Impact Factor, invented over 50 years ago, can be a limitation for non-English authors and scientific journals, and does not take into account some practices widely used in the scientific community concerning the development of the Internet as a means for the diffusion of knowledge. Author self-citations are another factor that affects the impact factor of a journal. Typically these self-citations are simply counted as such. To be more meaningful, I suggest that when examining the contribution of authors' self-citations to impact factors, one should first count the number of citations in the text rather than in the reference list, and then discriminate between different kinds of author self-citations, from those that are informative to those that are self-enhancing, if these data are to be more credible. With reference to Vanclay (Scientometrics, in press, 2012), the paper argues for a pragmatic approach to the Thomson Reuters journal impact factor. The paper proposes and discusses replacing the current synchronous Thomson Reuters journal impact factor with an up-to-date diachronic version (DJIF), consisting of a three-year citation window over a one-year publication window. The DJIF online data collection and calculation is exemplified and compared to the present synchronous journal impact factor. The paper briefly discusses the dimensions of currency, robustness, understandability and comparability to other impact factors used in research evaluation. 
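A minimal sketch of the DJIF idea described above (a three-year citation window over a one-year publication window), with invented figures rather than the paper's data:

```python
# Diachronic impact factor (DJIF) sketch: citations received in years
# Y, Y+1 and Y+2 by the items published in year Y, divided by the number
# of items published in Y. All numbers below are invented.

def diachronic_if(cites_by_year, n_items, pub_year):
    """cites_by_year: {year: citations to the pub_year cohort in that year}."""
    window = [pub_year, pub_year + 1, pub_year + 2]
    return sum(cites_by_year.get(y, 0) for y in window) / n_items

cites = {2009: 10, 2010: 45, 2011: 65}   # citations to the 2009 cohort
print(diachronic_if(cites, 60, 2009))    # -> 2.0
```

The contrast with the synchronous JIF is that here the publication cohort is fixed and its citations are followed forward in time, rather than fixing a citing year and looking back.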
This paper reflects on the most current and some of the recent contributions of J. K. Vanclay, focusing on his methods, findings, and criticism of the Journal Citation Reports and the Web of Science databases, the journal impact factor and the h-index. It is argued and demonstrated that some of the author's recent papers about scientometric issues, measures and sources show so much demagoguery, ignorance and arrogance, have so much prejudice and bias, and contain such profound errors in using the databases, calculating metrics, and interpreting search results, that the papers are very unlikely to be meant as a genuine contribution from an academic who is a graduate of, among others, Oxford University, a professor and dean at a respected university, a well-published and well-cited author, and a recipient of the Queen's Award (all the above in forest science). The papers are much more likely to serve as props for a staged, mock-up scenario based on slipshod research in an experiment, meant to illustrate the deficiencies in the processes and in the assessment of scholarly publishing productivity and impact, in order to present Vanclay's idealized solution: using the h-index, portrayed as the Prince, mounted on the shoulder of the White Horse, Google Scholar. Journal impact factors (IFs) can be considered historically as the first attempt to normalize citation distributions, by using averages over 2 years. However, it has been recognized that citation distributions vary among fields of science and that one needs to normalize for this. Furthermore, the mean, or any central-tendency statistic, is not a good representation of the citation distribution, because these distributions are skewed. Important steps have been taken to solve these two problems during the last few years. First, one can normalize at the article level, using the citing audience as the reference set. Second, one can use non-parametric statistics to test the significance of differences among ratings. 
A proportion of most-highly-cited papers (the top-10% or top quartile), on the basis of fractional counting of the citations, may provide an alternative to the current IF. This indicator is intuitively simple, allows for statistical testing, and accords with the state of the art. In a reply to Jerome K. Vanclay's manuscript "Impact Factor: outdated artefact or stepping-stone to journal certification?", we discuss the value of journal metrics for the assessment of scientific-scholarly journals from a general bibliometric perspective, and from the point of view of creators of new journal metrics, journal editors and publishers. We conclude that citation-based indicators of journal performance are appropriate tools in journal assessment, provided that they are accurate and used with care and competence. The journal impact factor (JIF), proposed by Garfield in the year 1955, is one of the most commonly used and prominent citation-based indicators of the performance and significance of a scientific journal. The JIF is simple, reasonable, clearly defined and comparable over time, and, what is more, can be easily calculated from data provided by Thomson Reuters, but at the expense of serious technical and methodological flaws. The paper discusses one of the core problems: the JIF is affected by bias factors (e.g., document type) that have nothing to do with the prestige or quality of a journal. To solve this problem, we suggest using the generalized propensity-score methodology based on the Rubin Causal Model. Citation data for papers of all journals in the ISI subject category "Microscopy" (Journal Citation Reports) are used to illustrate the proposal. In the almost 40 years since we wrote Evaluative Bibliometrics, enormous advances have been made in data availability and analytic technique. The journal impact factor of the 1960s has clearly not kept up with the state of the art. 
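The top-10% proposal above can be sketched as follows. The fractional-counting scheme assumed here (each citation weighted by the inverse of the citing paper's reference count) is one common variant, not necessarily the exact one intended; all data are invented.

```python
# Sketch of a top-10% indicator with fractionally counted citations:
# each citation is weighted by 1 / (number of references in the citing
# paper), then a journal's share of papers at or above the top-10%
# cutoff of a reference set is reported. Invented data throughout.

def fractional_score(citing_ref_counts):
    """Fractionally counted citations for one paper, given the reference-list
    lengths of the papers that cite it."""
    return sum(1.0 / n for n in citing_ref_counts)

def top_share(scores, reference_scores, quantile=0.10):
    """Share of `scores` at or above the top-`quantile` cut of the reference set."""
    ranked = sorted(reference_scores, reverse=True)
    cutoff = ranked[max(0, int(len(ranked) * quantile) - 1)]
    return sum(1 for s in scores if s >= cutoff) / len(scores)

print(fractional_score([10, 20, 50]))               # -> 0.17
print(round(top_share([20, 19, 5], list(range(1, 21))), 3))
```

Unlike a mean-based IF, this proportion is insensitive to the skew of the citation distribution, which is the point of the proposal.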
However, for both old and new indicators, basic validity and relevance issues remain, such as: by what standard can we validate our results, and what external use can appropriately be made of them? As funding support becomes more difficult, we should not lose sight of the necessity to again demonstrate the importance of our research, and must keep in mind that it is the relevance of our results that counts, not the elegance of our mathematics. We discuss research evaluation, the nature of impact, and the use of the Thomson Reuters journal impact factor and other indicators in scientometrics, in the light of recent commentary. Journals have been ranked on the basis of impact factors for a long time. This is a quality indicator, and often favours review journals with few articles. Integrated impact indicators try to factor in size (quantity) as well, and are correlated with the total number of citations. The total number of papers in a portfolio can be considered a zeroth-order performance indicator, and the total number of citations a first-order performance indicator. Indicators like the h-index and the g-index are actually performance indicators in that they integrate both quality and quantity assessment into a single number. The p-index is another variant of this class of performance indicators, and is based on the cube root of a second-order performance indicator called the exergy indicator. The Eigenfactor score and Article Influence are, respectively, first-order quantity and quality indicators. In this paper, we confirm the above relationships. The Thomson Reuters impact factor is a viable, widely used and informative measure of journal visibility and frequency of use. It is accurate, transparent and easy to use. It is a live and evolving system that can broaden its scope and implement new features and methods. Some of Vanclay's suggestions, like wider use of order statistics, or our suggestion of rank normalization, might be implemented in the JCR in the future. 
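The indicator hierarchy described above can be sketched numerically. The p-index form used here, the cube root of C^2/P (quality C/P times quantity C), follows Prathap's common definition; this is an assumption about the exact formula, and the numbers are illustrative only.

```python
# Sketch of the indicator hierarchy above: P (papers) as a zeroth-order
# indicator, C (citations) as first-order, and the p-index as the cube
# root of the second-order "exergy" term C^2 / P. Invented portfolio.

def p_index(total_citations, total_papers):
    # exergy = (C / P) * C = C**2 / P; p-index is its cube root
    return (total_citations ** 2 / total_papers) ** (1 / 3)

P, C = 100, 1000                 # invented portfolio: 100 papers, 1000 cites
print(round(p_index(C, P), 2))   # -> 21.54
```

Note how the p-index sits between pure quantity (C would rank this portfolio by 1000) and pure quality (C/P = 10), compressing both into one number.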
Vanclay's proposal (Vanclay (2012). Impact factor: outdated artefact or stepping-stone to journal certification? Scientometrics, doi: 10.1007/s11192-011-0561-0) is discussed. We agree that a major overhaul is necessary: journal evaluation must be performed using instruments, not artefacts. Few contemporary inventions have influenced academic publishing as much as journal impact factors. On the other hand, debates and discussion on the potential limitations of, and appropriate uses for, journal performance indicators are almost as old as the measures themselves. Given that scientometrics is often undertaken using bibliometric techniques, the history of the former is inextricably linked to that of the latter. As with any controversy it is difficult to separate an invention from its history, and for these reasons the current article provides an overview of some key historical events of relevance to the impact factor. When he first proposed the concept over half a century ago, Garfield did not realise that impact factors would one day become the subject of such widespread controversy. As the current special issue of Scientometrics suggests, this debate continues today. In theory, the web has the potential to provide information about the wider impact of academic research, beyond traditional scholarly impact, because the web can reflect non-scholarly uses of research, such as in online government documents, press coverage or public discussions. Nevertheless, there are practical problems with creating metrics for journals based on web data, principally that most such metrics would be easy for journal editors or publishers to manipulate. Even so, two alternatives seem to have both promise and value: citations derived from digitised books, and download counts for journals within specific delivery platforms. 
In this study, the validity of the argument against the length of the citation window applied in Journal Impact Factor calculations is critically re-analyzed. While previous studies argued against the relatively short citation window of 1-2 years, this study shows that the relatively short-term citation impact measured in the window underlying the Journal Impact Factor is a good predictor of the citation impact of the journals in the years to come. Possible exceptions to this observation relate to journals with relatively low numbers of publications, and to the citation impact of publications in the year of publication itself. The study focuses on five Journal Subject Categories from the sciences and social sciences, and on normal articles published in these journals in the two years 2000 and 2004. In this paper we present a compilation of journal impact properties in relation to other bibliometric indicators, as found in our earlier studies, together with new results. We argue that journal impact, even calculated in a sufficiently advanced way, becomes important in evaluation practices based on bibliometric analysis only at an aggregate level. In the relation between average journal impact and the actual citation impact of groups, the influence of research performance is substantial. Top-performance as well as lower-performance groups publish in more or less the same range of journal impact values, but top-performance groups are, on average, more successful across the entire range of journal impact. We find that for groups in high field-citation-density areas, a larger size implies a lower average journal impact. For groups in low field-citation-density regions, however, a larger size implies a considerably higher average journal impact. Finally, we found that top-performance groups have relatively fewer self-citations than lower-performance groups, and that this fraction decreases with journal impact. 
The paper summarizes some basic features of the Garfield impact factor (GF). Accordingly, GF should be regarded as a scientometric indicator representing the relative contribution of journals to the total impact of information in a field. For calculating GF, for both theoretical and practical reasons, the "ratio of the sums" method is recommended over the "mean of the ratios" method. Scientific advances are made by the most influential, presumably most frequently cited, articles. The distribution of citations among the publications in journals is skewed. Consequently, the GF index is influenced primarily by the highly cited papers. It follows that GF quantitatively represents the most valuable part of the information in journals, and therefore it may be regarded as a reliable impact indicator. J. K. Vanclay's article is a bold attempt to review recent works on the journal impact factor (JIF) and to call for alternative certifications of journals. The overly broad scope did not allow the author to fulfil all his purposes. Attempting, after many others, to organize the various forms of criticism, whose targets are often broader than the JIF, we shall try to comment on a few points. This will hopefully enable us to infer in which cases the JIF is an angel, a devil, or a scapegoat. We shall also expand on a crucial question that Vanclay could not really develop in the reduced article format: field-normalization. After a short recap of classical cited-side or ex post normalization and of the powerful influence measures, we will devote some attention to the novel citing-side or ex ante normalization, not only for its own interest, but because it proceeds directly from disassembling the JIF clockwork. Article processing charges (APCs) are a central mechanism for funding open access (OA) scholarly publishing. We studied the APCs charged and the article volumes of journals that were listed in the Directory of Open Access Journals as charging APCs. 
These included 1,370 journals that published 100,697 articles in 2010. The average APC was $906 U.S. dollars (USD) calculated over journals, and $904 USD calculated over articles. The price range varied between $8 and $3,900 USD, with the lowest prices charged by journals published in developing countries and the highest by high-impact-factor journals from major international publishers. Journals in biomedicine represent 59% of the sample and 58% of the total article volume. They also had the highest APCs of any discipline. Professionally published journals, both for-profit and nonprofit, had substantially higher APCs than journals published by societies, universities, or scholars/researchers. These price estimates are lower than in some previous studies of OA publishing, and much lower than is generally charged by subscription publishers making individual articles OA in what are termed hybrid journals. Since 2004, mainstream scholarly publishers have been offering authors publishing in their subscription journals the option to free their individual articles from access barriers against a payment (hybrid OA). This has been marketed as a possible gradual transition path between subscription and open access to the scholarly journal literature, and the publishers have pledged to decrease their subscription prices in proportion to the uptake of the hybrid option. The number of hybrid journals has doubled in the past couple of years and is now over 4,300; the number of such articles was around 12,000 in 2011. On average only 12% of eligible authors utilize the OA option, due mainly to the generally high price level, typically 3,000 USD. There are, however, a few publishers and individual journals with a much higher uptake. This article takes a closer look at the development of hybrid OA and discusses, from an author-centric viewpoint, the possible reasons for the lack of success of this business model. 
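The two APC averages quoted above (per journal vs. per article) differ because the per-article figure weights each journal by its article volume. A minimal sketch with invented journals shows how far the two can diverge when volumes are uneven:

```python
# Per-journal vs. per-article average APC on invented data: the article-
# weighted mean is pulled toward the prices of high-volume journals.

journals = [
    {"apc": 8,    "articles": 50},    # invented low-price, mid-volume journal
    {"apc": 900,  "articles": 200},   # invented mid-price, high-volume journal
    {"apc": 3900, "articles": 20},    # invented high-price, low-volume journal
]

mean_over_journals = sum(j["apc"] for j in journals) / len(journals)
mean_over_articles = (sum(j["apc"] * j["articles"] for j in journals)
                      / sum(j["articles"] for j in journals))
print(round(mean_over_journals, 2))  # -> 1602.67
print(round(mean_over_articles, 2))  # -> 957.04
```

That the study found the two averages nearly equal ($906 vs. $904) indicates APC levels were not strongly correlated with journal size in its sample.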
The proliferation of discipline-specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. The authors considered this problem by examining the domains, objectives, and architectures of nine metadata schemes used to document scientific data in the physical, life, and social sciences. They used a mixed-methods content analysis and Greenberg's metadata objectives, principles, domains, and architectural layout (MODAL) framework, and derived 22 metadata-related goals from textual content describing each metadata scheme. Relationships are identified between the domains (e.g., scientific discipline and type of data) and the categories of scheme objectives. For each strong correlation (>0.6), a Fisher's exact test for nonparametric data was used to determine significance. Using data from Campanario and Coslado, we apply nonlinear least squares to determine the optimal beta, and show that this generalized law of Benford fits the data better than the classical law of Benford. This position paper analyzes the current situation in scholarly publishing and peer review practices and presents three theses: (a) we are going to run out of peer reviewers; (b) it is possible to replace referees with readers, an approach that I have named Readersourcing; and (c) it is possible to avoid potential weaknesses in the Readersourcing model by adopting an appropriate quality control mechanism. The readersourcing.org system is then presented as an independent, third-party, nonprofit, academic/scientific endeavor aimed at the quality rating of scholarly literature and scholars, and some possible criticisms are discussed. The purpose of this study is to determine the usage patterns of core journals by scholars, and to address the differences among various academic disciplines. 
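The generalized-Benford fit mentioned above can be sketched as a nonlinear least-squares problem. The functional form assumed here, P(d; b) = ((d+1)^(1-b) - d^(1-b)) / (10^(1-b) - 1), is one common generalization that reduces to the classical law as b tends to 1; it is not necessarily the paper's exact form, and a simple grid search stands in for a full nonlinear optimizer.

```python
import math

# One common generalized Benford law for first digits d = 1..9, with
# shape parameter b; b = 1 recovers the classical log10(1 + 1/d) law.
def gen_benford(d, b):
    if abs(b - 1.0) < 1e-9:               # classical Benford limit
        return math.log10(1 + 1 / d)
    return ((d + 1) ** (1 - b) - d ** (1 - b)) / (10 ** (1 - b) - 1)

def fit_beta(observed):
    """Least-squares fit of b by grid search; observed: frequencies for d=1..9."""
    best_b, best_sse = None, float("inf")
    for i in range(1, 301):               # search b in (0, 3]
        b = i / 100
        sse = sum((observed[d - 1] - gen_benford(d, b)) ** 2 for d in range(1, 10))
        if sse < best_sse:
            best_b, best_sse = b, sse
    return best_b

data = [gen_benford(d, 1.3) for d in range(1, 10)]  # synthetic data, b = 1.3
print(fit_beta(data))  # -> 1.3
```

A production fit would use a proper optimizer (e.g. Gauss-Newton) and observed digit frequencies rather than synthetic ones, but the objective function is the same.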
To this end, the references of 11,230 corresponding authors over the past 35 years, from the world's top five highly cited universities and institutions, were analyzed. To build robust models of information scattering, we need a deeper understanding of this phenomenon. The results show that core-journal usage is a social phenomenon, in exactly the same way as Bradford's law, Zipf's law and Lotka's law. The analysis of author references shows that if core scientific journals are arranged in order of decreasing productivity, they can be divided into a small group of highly cited periodicals and a large group of minimally cited ones. Scholars may browse and carry out similar information-seeking activities to form their core journals, and the findings may support Bates's hypothesis that Bradford's core zone is best searched by browsing. Bradford's law and related research may consequently help to solve many of the practical problems that practitioners of the profession face, particularly in collection development in libraries, and help users to gather highly scattered information. Many emerging countries in Asia demonstrate a strong pattern of growth and potential for diffusion in science and technology that is dynamic and self-propagating. To elucidate the evolution in science and technology and the institutional dynamics that drive this self-propagating behavior, this paper examines the divergent models pursued by selected Asian economies in regard to scientific and technological catch-up. An analysis of paper and patent production for each nation was conducted to examine indigenous science and technology capabilities. This study focuses on six major economies, namely China, Malaysia, South Korea, Singapore, Taiwan and Thailand. In addition, Japan, a country with advanced development of science and technology, is included for comparison. 
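The Bradford-style core/scatter division discussed above can be sketched as follows: rank journals by productivity, then split the ranked list into zones each holding roughly one third of all articles, after which the journal counts per zone typically approximate Bradford's 1 : n : n^2 pattern. The journal counts below are invented.

```python
# Partition a ranked journal-productivity list into Bradford zones of
# roughly equal article totals; returns journals per zone. Invented data.

def bradford_zones(articles_per_journal, n_zones=3):
    ranked = sorted(articles_per_journal, reverse=True)
    target = sum(ranked) / n_zones
    zones, current, acc = [], [], 0
    for a in ranked:
        current.append(a)
        acc += a
        if acc >= target and len(zones) < n_zones - 1:
            zones.append(current)
            current, acc = [], 0
    zones.append(current)
    return [len(z) for z in zones]

counts = [120, 60, 40, 30, 25, 20, 15, 12, 10, 8, 8, 7, 6, 5, 4, 4, 3, 3]
print(bradford_zones(counts))  # -> [2, 5, 11]
```

Here the small first zone is the "core" of highly productive journals and the large last zone the scatter, mirroring the core/minimally-cited split described in the study.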
The findings provide insight into and understanding of the evolving science and technology waves and the dynamic potentials in science and technology. We demonstrate the catch-up models pursued that drive the self-propagating behavior and industrialization, thus providing a more complete understanding of the innovation systems than that offered by previous studies. This article provides an empirical assessment of the performance of the member states of the Association of Southeast Asian Nations in terms of science, technology, and innovation. This study is relevant because it employs a larger data set, examines more countries, and covers more years than previous studies. The results indicate that these countries had differing patterns of performance, and that the pattern of growth among them was asymmetrical. Additional findings suggest that these countries performed idiosyncratically with respect to the six quantitative dimensions we examined. Our research includes a form of comparative policy evaluation that might assist the monitoring of the implementation of "Vision 2020". The results simplify the determination of the relative strengths and weaknesses of national innovation systems, and are relevant to policy discussions. In relation to transferability, the findings demonstrate similarities to the European Union with regard to performance and governance. A general expression based on the concepts of the progressive nucleation mechanism is proposed, in the form alpha(t) = 1 - exp[-(t/Theta)^q], to describe the growth behavior of items in an individual system and in a collective of systems. In this relation, alpha(t) is the ratio of the number of items N(t) at time t to the maximum number C of possible items for the system, Theta is the corresponding time constant and q is the exponent. 
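Assuming the standard progressive-nucleation (Avrami-type) form alpha(t) = 1 - exp[-(t/Theta)^q], consistent with the quantities named above (alpha(t) = N(t)/C, time constant Theta, exponent q), the growth curve can be evaluated as:

```python
import math

# Progressive-nucleation growth sketch, assuming the Avrami-type form
#   alpha(t) = 1 - exp(-(t / theta) ** q)
# where alpha(t) = N(t)/C, theta is the time constant and q the exponent.

def alpha(t, theta, q):
    return 1.0 - math.exp(-((t / theta) ** q))

def n_items(t, c_max, theta, q):
    """Cumulative items N(t) for a system with maximum size c_max."""
    return c_max * alpha(t, theta, q)

# At t = theta the fraction is always 1 - 1/e, regardless of q.
print(round(alpha(10, 10, 2.0), 3))  # -> 0.632
```

The exponent q controls how sharply the S-shaped curve rises, while Theta sets the characteristic time scale; both are fitted per system (here, per author) in the studies above.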
The above relation is then used to analyze: (1) the growth behavior of the cumulative number N(t) of papers published by individual authors and the cumulative citations L(t) of the N(t) papers of an author as a function of citation duration t, and (2) the relationship between the cumulative citations L(t) of papers and the cumulative number N(t) of papers. The proposed approach predicts that: (1) the fraction of items produced by successive systems is additive, (2) the cumulative fraction alpha_sum(t) of the maximum number of sites is the sum of the contributions of the fractions of the maximum number of items produced by the different systems, and (3) the values of the time constant Theta and the exponent q increase with the addition of the fractions of items produced by subsequent systems, but their values are lowest for individual systems. The approach is applied to explain the growth behavior of the cumulative N(t) papers and L(t) citations of four selected Polish professors. Nanoscience and technology (NST) is a relatively new interdisciplinary scientific domain, and scholars from a broad range of different disciplines are contributing to it. However, there is ambiguity in its structure and in the extent of multidisciplinary scientific collaboration in NST. This paper investigates the multidisciplinary patterns of Iranian research in NST based on a selection of 1,120 ISI-indexed articles published during 1974-2007. Using text-mining techniques, 96 terms were identified as the main terms of the Iranian publications in NST. The scientific structure of Iranian NST was then mapped through multidimensional scaling, based upon the co-occurrence of the main terms in the academic publications. The results showed that the NST domain in Iranian publications has a multidisciplinary structure composed of different fields, such as pure physics, analytical chemistry, chemical physics, materials science and engineering, polymer science, biochemistry and new emerging topics. 
Whether the singleton approach (citation analysis of identified source journals) used by Gross and Gross (Science 66(1713):385-389, 1927) or the differential approach (citation analysis of articles in a specific subject field) applied by Bradford (Engineering 137:85-86, 1934) is more suitable for selecting or ranking journals in a multifaceted subject, 'Oceanography', is examined. This study discusses both approaches, analyzing citations of the published literature in oceanography from 30 countries. The ranking correlation of journals showed a better positive correlation (lowest rho = 0.662 for 2005-2009 to highest rho = 0.817 for 1995-1999) when the top-ranked journals from the list generated using the Gross and Gross approach (GA) were correlated with the same journal titles in the list generated using the Bradford approach, than the other way around (lowest rho = 0.588 for 2005-2009 to highest rho = 0.726 for 1990-1994). Both approaches matched similar numbers of journals to the country-wise lists and give an unbiased choice in preferring a ranking list. The journal distribution graphs showed typical Bradford-Leimkuhler curves in both approaches for all the datasets, but the Groos droop appears comparatively early, with a shorter straight line, in GA. The high clustering of literature in a limited number of journals is a disadvantage in a multifaceted subject. Therefore, the differential approach used by Bradford is considered suitable for a multifaceted subject like 'Oceanography'. Among the most recent bibliometric indicators for normalizing the differences among fields of science in terms of citation behaviour, Kosmulski (J Informetr 5(3):481-485, 2011) proposed the NSP (number of successful papers) index. According to the authors, NSP deserves much attention for its great simplicity and immediate meaning, equivalent to those of the h-index, while it has the disadvantage of being prone to manipulation and not very efficient in terms of statistical significance. 
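The rank correlations reported above between the two journal rankings are Spearman's rho; for untied ranks it reduces to the classical d-squared formula. A minimal sketch with an invented five-journal example:

```python
# Spearman rank correlation between two rankings of the same journals,
# assuming no tied ranks: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

def spearman_rho(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Same five journals ranked by the two approaches (invented example).
ga_ranks = [1, 2, 3, 4, 5]   # e.g. Gross-and-Gross-style ranking
br_ranks = [2, 1, 3, 5, 4]   # e.g. Bradford-style ranking
print(spearman_rho(ga_ranks, br_ranks))  # -> 0.8
```

With tied ranks (common in real journal lists), average ranks should be assigned first, or the Pearson correlation of the rank vectors used instead.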
In the first part of the paper, we introduce the success-index, aimed at reducing the NSP-index's limitations, although it requires more computing effort. Next, we present a detailed analysis of the success-index from the point of view of its operational properties, and a comparison with those of the h-index. Particularly interesting is the examination of the success-index's scale of measurement, which is much richer than that of the h-index. This makes the success-index much more versatile for different types of analysis, e.g., (cross-field) comparisons of the scientific output of (1) individual researchers, (2) researchers with different seniority, (3) research institutions of different sizes, (4) scientific journals, etc. The basic concepts and equations of the progressive nucleation mechanism (PNM) are first presented for the growth and decay of items. The mechanism is then applied to describe the cumulative citations L and the citations Delta L per year of the individual most-cited papers i of four selected Polish professors as a function of citation duration t. It was found that the PNM satisfactorily describes the time dependence of the cumulative citations L of papers published by different authors with sufficiently high citations Delta L, as represented by the highest yearly citations Delta L_max during the entire citation period t (normal citation behavior). The citation period for these papers is less than 15 years, and is even 6-8 years in several cases. However, for papers with citation periods exceeding about 15 years, the growth behavior of citations does not follow the PNM over the entire citation period (anomalous citation behavior), although there are regions of citations in which the citation data may be described by the PNM. Normal and anomalous citation behaviors are attributed, respectively, to the occurrence and nonoccurrence of stationary nucleation of citations for the papers. 
The PNM also explains the growth and decay of the citations Delta L per year of papers exhibiting normal citation behavior. This paper profiles research on the 'relative absorptive capacity of knowledge' to provide insights into the field, based on data collected from the ISI Web of Science database for the years 2001-2010. The analysis is conducted in three phases, namely general publication profiling, subject-area profiling, and topic profiling. The study obtains patterns, characteristics, and attributes at the country, institution, journal, author, and core-reference levels. It shows an increase in research activity in the field, based on publication productivity during the years mentioned. Most of these publications are classified in the subject areas of business and economics, engineering, and operations research and management science. We highlight the nascent interest of the computer science subject area as a way to operationalize the different studies conducted. We found a lack of contributions from African and Latin American countries, despite the importance of the field for them. Our results are useful in terms of science strategy, science and technology policy, research agendas, research alliances, and research networks, according to the special interests of specific actors at the individual, institutional, and national levels. South Africa has 23 universities, of which five are placed in one or more of the 2011 Shanghai Jiao Tong, Times Higher Education, and Quacquarelli Symonds world university rankings. The five are: Cape Town, Witwatersrand, KwaZulu-Natal, Stellenbosch and Pretoria. They are ranked above the other 18 universities, with Cape Town in top position, mainly because they have significantly higher publication and citation counts. 
In the Shanghai Jiao Tong ranking, Cape Town's Nobel Prize alumni and highly cited researchers give it an additional lead over second-placed Witwatersrand, which has Nobel Prize alumni but no highly cited researchers. KwaZulu-Natal, in third place, has no Nobel Prize alumni but one highly cited researcher, which places it ahead of Stellenbosch and Pretoria despite the latter two having higher publication output. However, in the Times Higher Education ranking, which places Cape Town first and Witwatersrand second, Stellenbosch is ranked but not KwaZulu-Natal, presumably because Stellenbosch's publication and citation counts are higher. The other 18 universities are ranked by the SCImago and Webometrics rankings in an order consistent with bibliometric indicators, and consistent with approximate simulations of the Shanghai Jiao Tong and Times Higher Education methods. If a South African university aspires to rise in the rankings, it needs to increase its publications, citations, staff-student ratio, and proportions of postgraduate students, international students and international staff. In the present study we analyzed Brazilian scientific production in the area of science education. The study was based on: data on research groups registered with the Conselho Nacional de Desenvolvimento Científico e Tecnológico; analysis of stricto sensu post-graduate programs; analysis of theses and dissertations linked to post-graduate programs; and papers in international databases. Our research was conducted entirely via the World Wide Web, from December 2009 to September 2010. It was found that the numbers of research groups, researchers, post-graduate programs, theses, dissertations and papers all showed a marked increase, especially in the last decade (from 2000 onwards). The major research centers were found to be located in public universities in the Brazilian southeast and south regions. 
However, a tendency toward decentralization was observed, due to recent investment in new public universities in the other Brazilian regions. This study thus presents an overview of scientific production on science education, and we expect this information to help broaden the view of how this research area is developing in Brazil. Citation studies have become an important tool for understanding scientific communication processes, as they enable the identification of several characteristics of information-retrieval behavior. This study seeks to analyze citation behavior using two popular ethnobotany articles, and our analysis is guided by the following question: when authors reference a work, are they pointing out the work's theoretical contribution, or is bias a factor in citing this reference? Citation analysis reveals an interesting phenomenon, as the majority of citing texts do not consider the theoretical contributions made by the articles cited. Two possible conclusions can be drawn from this scenario: (1) citing authors read the original texts that they cite only superficially, and (2) the works cited are not read by the vast majority of people who reference them. Thus, it is clear that even with sufficient access to reference texts, ethnobotanical studies highlight elements less relevant to the research and reproduce discussions in a non-reflective manner. Conventional patent citation analyses have focused mainly on the presence of citation relationships, the number of patents cited by the subject patent, and the number of times the subject patent is cited by others (i.e., the numbers of backward and forward citations of the subject patent). However, most of them have not focused on patent classifications. 
Assuming that a patent based on a variety of technological bases tends to be an important patent that is cited more often, this study examines and clarifies the relationship between the diversity of classifications assigned to backward citations and the number of forward citations for Japanese patents. The results show notable differences in the number of classifications assigned to backward citations between the often-cited and less frequently cited groups. The diversity of backward citations can thus be used as an evaluation criterion for roughly identifying often-cited patents or for screening out a large proportion of less frequently cited ones. A bibliometric analysis was applied in this work to evaluate global scientific production in the subject category of "limnology" from 2001 to 2010. Data were based on the Science Citation Index compiled by the Institute for Scientific Information (ISI), Philadelphia, USA. The h-index and NetDraw were used to characterize the limnology publications. The results showed that limnology research increased steadily over the past decade. Researchers paid most attention to "diatoms", "eutrophication", and "phosphorus". Moreover, the KeyWords Plus terms "growth", "model", and "dynamic" offered a thorough description of limnology research. Among the research institutes engaged in limnological research, the US Geological Survey was the flagship, while the USA held a dominant position in global research in the field. We evaluated earthquake research performance based on a bibliometric analysis of 84,051 documents published in journals and other outlets contained in the Science Citation Index (SCI) and Social Science Citation Index (SSCI) bibliographic databases for the period 1900-2010. 
We summarized significant publication indicators in earthquake research, evaluated national and institutional research performance, and presented earthquake research development from a supplementary perspective. Research output descriptors suggested a solid development in earthquake research, in terms of increasing scientific production and research collaboration. We identified leading authors, institutions, and nations in earthquake research, and there was an uneven distribution of publications at authorial, institutional, and national levels. The most commonly used keywords appearing in the articles were evolution, California, deformation, model, inversion, seismicity, tectonics, crustal structure, fault, zone, lithosphere, and attenuation. We compared three different bibliometric evaluation approaches: two citation-based approaches and one based on manual classification of publishing channels into quality levels. Publication data for two universities were used, and we worked with two levels of analysis: article and department. For the article level, we investigated the predictive power of field normalized citation rates and field normalized journal impact with respect to journal level. The results for the article level show that evaluation of journals based on citation impact correlates rather well with manual classification of journals into quality levels. However, the prediction from field normalized citation rates to journal level was only marginally better than random guessing. At the department level, we studied three different indicators in the context of research fund allocation within universities and the extent to which the three indicators produce different distributions of research funds. It turned out that the three distributions of relative indicator values were very similar, which in turn implies that the corresponding distributions of hypothetical research funds would be very similar. 
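The field normalization underlying the citation-based approaches above can be illustrated with a minimal sketch. The data, field labels, and function name here are hypothetical; real evaluations normalize against reference values per field, publication year, and document type:

```python
from collections import defaultdict

def field_normalized_scores(articles):
    """articles: list of (field, citations) pairs. Each article's score is
    its citation count divided by the mean citation count of its field,
    so a score of 1.0 means 'cited exactly as much as the field average'."""
    by_field = defaultdict(list)
    for field, cites in articles:
        by_field[field].append(cites)
    field_mean = {f: sum(cs) / len(cs) for f, cs in by_field.items()}
    return [cites / field_mean[field] for field, cites in articles]

# Hypothetical corpus: a high-citation-density field and a low-density one.
articles = [("oncology", 40), ("oncology", 10),
            ("mathematics", 4), ("mathematics", 1)]
scores = field_normalized_scores(articles)
# Oncology mean is 25 and mathematics mean is 2.5, so both fields' articles
# land on the same normalized scale: [1.6, 0.4, 1.6, 0.4].
mncs = sum(scores) / len(scores)  # mean normalized citation score
```

Averaging the normalized scores over a unit's publications gives an indicator of the mean-normalized-citation-score family, which makes citation counts comparable across fields with very different citation densities.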
The Leiden ranking 2011/2012 provides the Proportion top-10% publications (PP(top 10%)) as a new indicator. This indicator allows for testing performance differences between two universities for statistical significance. It is often said that successive generations of researchers face an increasing educational burden due to knowledge accumulation. On the other hand, technological advancement over time can improve the productivity of researchers and even change their cognitive processes. This paper presents a longitudinal study (2004-2011) of citation behavior in doctoral theses at the Massachusetts Institute of Technology's Department of Electrical Engineering and Computer Science. It is found that the number of references cited has increased over the years. At the same time, there has been a decrease in the length of time in the doctoral program and a relative constancy in the culture of the department. This suggests that students are more productive in facing an increased knowledge burden, and indeed seem to encode prior literature as transactive memory to a greater extent, as evidenced by the greater use of older literature. The phenomenon of all-elements-sleeping-beauties in science is revealed by four special cases. The 'sleeping beauties' prick their fingers on the 'spindles' so that they fall asleep and are then awakened by their 'princes'. The authors speculate that the phenomenon could occur in high-quality scientific literature. Although there is some evidence that online videos are increasingly used by academics for informal scholarly communication and teaching, the extent to which they are used in published academic research is unknown. This article explores the extent to which YouTube videos are cited in academic publications and whether there are significant broad disciplinary differences in this practice. To investigate, we extracted the URL citations to YouTube videos from academic publications indexed by Scopus. 
A total of 1,808 Scopus publications cited at least one YouTube video, and there was a steady upward growth in citing online videos within scholarly publications from 2006 to 2011, with YouTube citations being most common within arts and humanities (0.3%) and the social sciences (0.2%). A content analysis of 551 YouTube videos cited by research articles indicated that in science (78%) and in medicine and health sciences (77%), over three fourths of the cited videos had either direct scientific (e.g., laboratory experiments) or scientific-related contents (e.g., academic lectures or education), whereas in the arts and humanities, about 80% of the YouTube videos had art, culture, or history themes, and in the social sciences, about 63% of the videos were related to news, politics, advertisements, and documentaries. This shows both the disciplinary differences and the wide variety of innovative research communication uses found for videos within the different subject areas. Eye movement data can provide an in-depth view of human reasoning and the decision-making process, and modern information retrieval (IR) research can benefit from the analysis of this type of data. The aim of this research was to examine the relationship between relevance criteria use and visual behavior in the context of predictive relevance judgments. To address this objective, a multimethod research design was employed that involved observation of participants' eye movements, talk-aloud protocols, and postsearch interviews. Specifically, the results reported in this article came from the analysis of 281 predictive relevance judgments made by 24 participants using the Google search engine. We present a novel stepwise methodological framework for the analysis of relevance judgments and eye movements on the Web and show new patterns of relevance criteria use during predictive relevance judgment. 
For example, the findings showed an effect of ranking order and surrogate components (Title, Summary, and URL) on the use of relevance criteria. Also, differences were observed in the cognitive effort spent between very relevant and not relevant judgments. We conclude with the implications of this study for IR research. Social tagging and controlled indexing both facilitate access to information resources. Given the increasing popularity of social tagging and the limitations of controlled indexing (primarily cost and scalability), it is reasonable to investigate to what degree social tagging could substitute for controlled indexing. In this study, we compared CiteULike tags to Medical Subject Headings (MeSH) terms for 231,388 citations indexed in MEDLINE. In addition to descriptive analyses of the data sets, we present a paper-by-paper analysis of tags and MeSH terms: the number of common annotations, Jaccard similarity, and coverage ratio. In the analysis, we apply three increasingly progressive levels of text processing, ranging from normalization to stemming, to reduce the impact of lexical differences. Annotations of our corpus consisted of over 76,968 distinct tags and 21,129 distinct MeSH terms. The top 20 tags/MeSH terms showed little direct overlap. On a paper-by-paper basis, the number of common annotations ranged from 0.29 to 0.5 and the Jaccard similarity from 2.12% to 3.3% using increased levels of text processing. At most, 77,834 citations (33.6%) shared at least one annotation. Our results show that CiteULike tags and MeSH terms are quite distinct lexically, reflecting different viewpoints/processes between social tagging and controlled indexing. Scholars in library and information science are under increasing pressure to seek external funding for research. The National Science Foundation (NSF), which is often the source of this funding, considers proposed projects based on the criteria of Intellectual Merit and Broader Impacts. 
However, these merit review criteria have been criticized as being insufficiently specific and not appropriate for all types of scientific research. In an effort to examine the extent to which funded projects represented Broader Impacts, the researchers performed a content analysis of the abstracts from projects in the National Science Digital Library, an NSF project that crossed many disciplines and applications, but is of particular relevance to information scientists. When the results of these analyses are placed in the context of the controversy surrounding the Broader Impacts merit review criterion, it is clear that this criterion is interpreted broadly and that even successful proposals often include aspirational or incomplete claims of impact. Because current proposed revisions to the merit review criteria that include emphases on demonstrable innovation and economic benefit will likely only complicate proposers' abilities to describe their projects' potentials, researchers may benefit from a greater understanding of Broader Impacts and how they can be clearly expressed to reviewers. The purpose of this article is to test the reliability of query intents derived from queries, either by the user who entered the query or by another juror. We report the findings of three studies. First, we conducted a large-scale classification study (approximately 50,000 queries) using a crowdsourcing approach. Next, we used clickthrough data from a search engine log and validated the judgments given by the jurors from the crowdsourcing study. Finally, we conducted an online survey on a commercial search engine's portal. Because we used the same queries for all three studies, we also were able to compare the results and the effectiveness of the different approaches. 
We found that neither the crowdsourcing approach, using jurors who classified queries originating from other users, nor the questionnaire approach, using searchers who were asked about their own query that they just entered into a Web search engine, led to satisfying results. This leads us to conclude that there was little understanding of the classification tasks, even though both groups of jurors were given detailed instructions. Although we used manual classification, our research also has important implications for automatic classification. We must question the success of approaches using automatic classification and comparing its performance to a baseline from human jurors. This study explores the factors influencing citations to Internet studies by assessing the relative explanatory power of three perspectives: normative theory, the social constructivist approach, and a natural growth mechanism. Using data on 7,700+ articles of Internet studies published in 100+ Social Sciences Citation Index (SSCI)-listed journals in 2000-2009, the study adopted a multilevel model to disentangle the impacts of article- and journal-level factors on citations. This research strategy resulted in a number of both expected and surprising findings. The primary determinants for citations are found to be journal-level factors, accounting for 14% of the variances in citations of Internet studies. The impact of some, if not all, article-level factors on citations is moderated by journal-level factors. Internet studies, like studies in other areas (e.g., management, demography, and ecology), are cited more for rhetorical purposes, as suggested by the social constructivist approach, than as a form of reward, as argued by normative theory. The impact of time on citations varies across journals, which creates a growing citation gap for Internet studies published in journals with different characteristics. 
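The journal-level share of citation variance reported above (14%) comes from a multilevel model. As a rough illustration of the idea only, not the study's actual estimator, the between-journal share of variance can be computed from a simple one-way between/within decomposition on toy data:

```python
from collections import defaultdict

def journal_variance_share(records):
    """records: list of (journal, citations). Returns the fraction of total
    citation variance explained by grouping articles into journals
    (eta-squared from a one-way between/within decomposition)."""
    grand_mean = sum(c for _, c in records) / len(records)
    groups = defaultdict(list)
    for journal, cites in records:
        groups[journal].append(cites)
    ss_total = sum((c - grand_mean) ** 2 for _, c in records)
    ss_between = sum(len(cs) * (sum(cs) / len(cs) - grand_mean) ** 2
                     for cs in groups.values())
    return ss_between / ss_total

# Invented data: journal A's articles are cited far more than journal B's,
# so most of the variance sits between journals rather than within them.
data = [("A", 10), ("A", 12), ("B", 2), ("B", 4)]
share = journal_variance_share(data)
```

A proper multilevel analysis additionally partials out article-level covariates and estimates variance components by maximum likelihood; the ratio above only conveys what "variance attributable to journal-level factors" means.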
This study augments the transtheoretical model (TTM) of behavior change with the concepts of information behavior and employs this framework to understand young men's needs for and practices of obtaining and avoiding information on physical activity and exercise in relation to their readiness to change exercise behavior. The results, based on statistical analyses of a population-based survey (N = 616) conducted in Finland, indicate that health information behavior is influenced by an individual's stage of change in the context of physical activity and exercise. In pre-action stages (precontemplation, contemplation, preparation), where individuals do not exercise regularly and are uninformed or lack motivation, commitment, or skills to change behaviors, information is most often encountered through the passive practice of nondirected monitoring. In the action stage, where individuals have recently changed their exercise behaviors, information is obtained most frequently by active seeking. In the maintenance stage, where individuals maintain earlier adopted behaviors, information is habitually obtained through active scanning. These results support the TTM in its postulation that individuals may benefit from stage-tailored health-communication strategies. The limitations of this study include self-reported behaviors, cross-sectional study design, and a possibly biased sample. Further research is needed to explore the role of information behavior in the process of behavior change in greater detail. In this article, we explore how strongly author name disambiguation (AND) affects the results of an author-based citation analysis study, and identify conditions under which the traditional simplified approach of using surnames and first initials may suffice in practice. 
We compare author citation ranking and cocitation mapping results in the stem cell research field from 2004 to 2009 using two AND approaches: the traditional simplified approach of using author surname and first initial and a sophisticated algorithmic approach. We find that the traditional approach leads to extremely distorted rankings and substantially distorted mappings of authors in this field when based on first- or all-author citation counting, whereas last-author-based citation ranking and cocitation mapping both appear relatively immune to the author name ambiguity problem. This is largely because Romanized names of Chinese and Korean authors, who are very active in this field, are extremely ambiguous, but few of these researchers consistently publish as last authors in bylines. We conclude that a more earnest effort is required to deal with the author name ambiguity problem in both citation analysis and information retrieval, especially given the current trend toward globalization. In the stem cell research field, in which laboratory heads are traditionally listed as last authors in bylines, last-author-based citation ranking and cocitation mapping using the traditional approach to author name disambiguation may serve as a simple workaround, but likely at the price of largely filtering out Chinese and Korean contributions to the field as well as important contributions by young researchers. Technology sectors differ in terms of technological complexity. When studying technology and innovation through patent analysis it is well known that similar amounts of technological knowledge can produce different numbers of patented innovation as output. A new multilayered approach to measure the technological value of patents based on ego patent citation networks (PCNs) is developed in this study. 
The results show that the structural indicators for the ego PCN developed in this contribution can characterize groups of patents and, hence, in an indirect way, the health of companies. We explore the feasibility of intermediary-based crosswalking and alignment of K-12 science education standards. With the increasing availability of K-12 science, technology, engineering, and mathematics (STEM) digital library content, alignment of that content with educational standards is a significant and continuous challenge. Whereas direct, one-to-one alignment of standards is preferable but currently unsustainable in its resource demands, less resource-intensive intermediary-based alignment offers an interesting alternative. But will it work? We present the results from an experiment in which the machine-based Standard Alignment Tool (SAT), incorporated in the National Science Digital Library (NSDL), was used to collect over half a million direct alignments between standards from different standard-authoring bodies. These were then used to compute intermediary-based alignments derived from the well-known AAAS Project 2061 Benchmarks and NSES standards. The results show strong variation among authoring bodies in their success at crosswalking, with the best results for those who modeled their standards on the intermediaries. The results furthermore show a strong inverse relationship between recall and precision when both intermediaries were involved in the crosswalking. Mixed methods research (MMR) has been described as the third research paradigm that combines qualitative and quantitative research methods. The mixing of research methods requires an epistemological framework that embraces the reality uncovered by different research methods. 
Three formal ontological categories are introduced for deconstructing the polarized view of reality in objectivism and relativism and for differentiating the nature and characteristics of objective, subjective, and normative validity claims as well as the conditions for justifying objectivity in social research. The characterization of information as objective, subjective, and normative-evaluative simultaneously demands the study of conditions of information-related phenomena that may call for mixed methods research in library and information science. Wikipedia is frequently viewed as an inclusive medium. But inclusivity within this online encyclopedia is not a simple matter of just allowing anyone to contribute. In its quest for legitimacy as an encyclopedia, Wikipedia relies on outsiders to judge claims championed by rival editors. In choosing these experts, Wikipedians define the boundaries of acceptable comment on any given subject. Inclusivity then becomes a matter of how the boundaries of expertise are drawn. In this article I examine the nature of these boundaries and the implications they have for inclusivity and credibility as revealed through the talk pages produced and sources used by a particular subset of Wikipedia's creators: those involved in writing articles on the topic of Philippine history. This study employed benchmarking and intellectual relevance judgment in evaluating Google, Yahoo!, Bing, Yahoo! Kids, and Ask Kids on 30 queries that children formulated to find information for specific tasks. Retrieved hits on given queries were benchmarked to Google's and Yahoo! Kids' top-five ranked hits retrieved. Relevancy of hits was judged on a graded scale; precision was calculated using the precision-at-ten metric (P@10). Yahoo! and Bing produced a similar percentage in hit overlap with Google (nearly 30%), but differed in the ranking of hits. Ask Kids retrieved 11% in hit overlap with Google versus 3% by Yahoo! Kids. 
The engines retrieved 26 hits across query clusters that overlapped with Yahoo! Kids' top-five ranked hits. Precision (P) that the engines produced across the queries was P = 0.48 for relevant hits, and P = 0.28 for partially relevant hits. Precision by Ask Kids was P = 0.44 for relevant hits versus P = 0.21 by Yahoo! Kids. Bing produced the highest total precision (TP) of relevant hits (TP = 0.86) across the queries, and Yahoo! Kids yielded the lowest (TP = 0.47). Average precision (AP) of relevant hits was AP = 0.56 by leading engines versus AP = 0.29 by small engines. In contrast, average precision of partially relevant hits was AP = 0.83 by small engines versus AP = 0.33 by leading engines. Average precision of relevant hits across the engines was highest on two-word queries and lowest on one-word queries. Google performed best on natural language queries; Bing did the same (P = 0.69) on two-word queries. The findings have implications for search engine ranking algorithms, relevance theory, search engine design, research design, and information literacy. This study investigates the trend of global concentration in scientific research and technological innovation around the world. It accepts papers and patents as appropriate data for revealing the development and status of science and technology respectively. The performance of these outputs in production and citation impact is taken into consideration in the analysis. The findings suggest that both papers and patents are geographically concentrated on a small number of countries, including the United States, the United Kingdom, Japan, Germany, and France. China has made great progress in paper production and citation impact, and Taiwan and Korea have experienced a rapid growth in patents over the past years. The degree of concentration dramatically decreases when the data from the United States are excluded, indicating the effects of the U.S.'s participation on the concentration. 
Patents show a higher degree of concentration than papers. With time-varying aspects taken into consideration, the study indicates that the degree of concentration of papers and patents has gradually decreased over time. The concentration of patents has declined more slowly than that of papers. This decrease in concentration is mainly due to the reduction of the predominant role of the U.S. in world R&D output. (C) 2012 Elsevier Ltd. All rights reserved. Over the past decade, national research evaluation exercises, traditionally conducted using the peer review method, have begun opening to bibliometric indicators. The citations received by a publication are assumed to be a proxy for its quality, but they must be standardized prior to use in comparative evaluation of organizations or individual scientists, owing to varying citation behavior across research fields. The objective of this paper is to compare the effectiveness of different methods of normalizing citations, in order to provide useful indications to research assessment practitioners. Simulating a typical national research assessment exercise, the analysis is conducted for all subject categories in the hard sciences and is based on the Thomson Reuters Science Citation Index-Expanded (R). Comparisons show that the citation average is the most effective scaling parameter when the average is based only on the publications actually cited. (C) 2012 Elsevier Ltd. All rights reserved. 'Discometrics', a long neglected area of informetric studies, was revisited in a network context. Cooperation between jazz musicians was analysed using the recent 'Hirschian' concepts of network informetrics. The Partnership Ability Index (phi) was found to be a useful measure to characterize the way performers are embedded in their partnership network. Indications of some positive relations between phi and other 'qualities' of the performers were found. (C) 2012 Elsevier Ltd. 
All rights reserved. This study utilizes an artificial neural network to explore the nonlinear relationships between patent performance and the corporate performance of pharmaceutical companies. Patent performance was measured by the patent H index, patent citations, and essential technological strength (ETS). The results show that the patent H index, patent citations, and ETS have nonlinear effects on the corporate performance of the pharmaceutical companies. (C) 2012 Elsevier Ltd. All rights reserved. We propose two new indices that are able to measure a scientific researcher's overall influence and the level of his/her works' association with the mainstream research subjects within a scientific field. These two new measures - the total influence index and the mainstream index - differ from traditional performance measures such as the simple citation count and the h-index in that they take into account the indirect influence of an author's work. Indirect influence describes a scientific publication's impact upon subsequent works that do not reference it directly. The two measures capture indirect influence information from the knowledge-emanating paths embedded in the citation network of a target scientific field. We take the Hirsch index, data envelopment analysis, and the lithium iron phosphate battery technology field to examine the characteristics of these two measures. The results show that the total influence index favors earlier researchers and successfully highlights those researchers who have made crucial contributions to the target scientific field. The mainstream index, in addition to underlining total influence, also spotlights active researchers who enter a scientific field at a later development stage. In summary, these two new measures are valuable complements to traditional scientific performance measures. (C) 2012 Elsevier Ltd. All rights reserved. 
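The indirect influence captured by the total influence index can be sketched by counting all direct and indirect citers reachable through a citation graph. This is an illustrative simplification, not the authors' exact path-based formula, and the toy graph is invented:

```python
def total_influence(cited_by, paper):
    """cited_by maps each paper to the list of papers citing it directly.
    Total influence here = number of distinct papers that cite `paper`
    directly or indirectly through chains of citations."""
    seen, stack = set(), [paper]
    while stack:
        current = stack.pop()
        for citer in cited_by.get(current, []):
            if citer not in seen:
                seen.add(citer)
                stack.append(citer)
    return len(seen)

# Toy citation graph: B and C cite A directly; D cites B, so D is an
# indirect citer of A that a simple citation count would miss.
cited_by = {"A": ["B", "C"], "B": ["D"]}
direct = len(cited_by["A"])              # simple citation count of A
total = total_influence(cited_by, "A")   # also counts the indirect citer D
```

The contrast between `direct` and `total` is the point of the index: a paper whose citers are themselves heavily cited exerts influence beyond its raw citation count.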
Given the recent trend in bibliometrics and information science to use increasingly complex statistical methods, it is necessary to have powerful toolboxes to work with data from Web of Science (Thomson Reuters). We developed such a toolbox with four specific commands for the statistical software package Stata. These commands refer to (1) the import of downloads from Web of Science to Stata, (2) the preprocessing of address information from authors of publications in the downloaded set, (3) the geocoding of address information, and (4) the calculation of the minimum and maximum distance between several co-authors of a single paper. An advantage of developing commands for an established and comprehensive statistical software package (like Stata) is that a large number of further commands are available for the analysis of bibliometric data. We will describe some of these useful commands as well. (C) 2012 Elsevier Ltd. All rights reserved. The relationship between researchers' publishing and citing behaviours has received little examination despite its potential importance in scholarly communication, particularly at an international level. To remedy this we studied documents and their references indexed in Thomson Reuters's Web of Science (WoS) in the period 2000-2009 to compare journal publishing behaviours against journal citing behaviours across the world. The results reveal that most publications in, and citations to, all five quality based strata of journals examined come from scientifically and economically advanced countries. Nevertheless, in proportion to their total number of citations given to WoS journals, it seems that less developed countries cite high-quality journals at the same rate as developed countries and so the poorer publishing of less developed countries does not seem to be due to a lack of access to top journals. 
Moreover, examining the publishing and citing trends of countries revealed a decreasing rate of publications in, and citations to, all quality ranges of journals by high-income and Scientifically Advanced Countries (SACs), in comparison to the increasing rates of publications and citations of other groups. Finally, research cooperation between developed and less developed countries seems to positively influence the publishing behaviour of the latter, as their publications co-authored with developed countries were published more often in top journals. (C) 2012 Elsevier Ltd. All rights reserved. The paper first introduces the basic problems of author bibliographic coupling, including the relationship between author bibliographic coupling and document bibliographic coupling, as well as the three methods of calculating author coupling strength, namely the simple method, the minimum method, and the combined method. Next I choose a small sample of authors in Chinese library and information science (LIS) for a comparative analysis of the three author coupling strength algorithms (the data source is the Chinese Social Sciences Citation Index (CSSCI)). The result shows that the minimum method is the most appropriate for calculating author coupling strength. Then a large sample of authors is chosen to analyze the intellectual structure of Chinese LIS. The result shows that author bibliographic coupling analysis (ABCA) can effectively reveal the intellectual structure of a discipline. It is also found that, compared with author cocitation analysis (ACA), ABCA not only can reveal the intellectual structure of a discipline more comprehensively and concretely but also can reflect the research frontier of the discipline. Finally, some practical problems that arose during this research are discussed. (C) 2012 Elsevier Ltd. All rights reserved. 
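The three coupling-strength variants named above (simple, minimum, combined) can be sketched for a pair of authors, reducing each author's oeuvre to counts of cited references. The exact definitions here are an assumed reading of the paper, and the reference data are hypothetical:

```python
from collections import Counter

def coupling_strength(refs_a, refs_b, method="minimum"):
    """refs_a, refs_b: Counter mapping a cited reference to how many times
    the author's papers cite it. Three variants:
      simple   - number of distinct shared references;
      minimum  - sum over shared references of min(count_a, count_b);
      combined - sum over shared references of count_a + count_b."""
    shared = set(refs_a) & set(refs_b)
    if method == "simple":
        return len(shared)
    if method == "minimum":
        return sum(min(refs_a[r], refs_b[r]) for r in shared)
    if method == "combined":
        return sum(refs_a[r] + refs_b[r] for r in shared)
    raise ValueError(f"unknown method: {method}")

# Hypothetical oeuvres: the two authors share references ref1 and ref3.
author_a = Counter({"ref1": 3, "ref2": 1, "ref3": 2})
author_b = Counter({"ref1": 1, "ref3": 5, "ref4": 2})
simple = coupling_strength(author_a, author_b, "simple")      # 2
minimum = coupling_strength(author_a, author_b, "minimum")    # 1 + 2 = 3
combined = coupling_strength(author_a, author_b, "combined")  # 4 + 7 = 11
```

The minimum variant, which the paper finds most appropriate, caps each shared reference's contribution at what both authors actually have in common, so one author's heavy reuse of a reference cannot inflate the pair's coupling.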
This paper investigates the citation impact of three large geographical areas - the U.S., the European Union (EU), and the rest of the world (RW) - at different aggregation levels. The difficulty is that 42% of the 3.6 million articles in our Thomson Scientific dataset are assigned to several sub-fields among a set of 219 Web of Science categories. We follow a multiplicative approach in which every article is wholly counted as many times as it appears at each aggregation level. We compute the crown indicator and the Mean Normalized Citation Score (MNCS) using for the first time sub-field normalization procedures for the multiplicative case. We also compute a third indicator that does not correct for differences in citation practices across sub-fields. It is found that: (1) No geographical area is systematically favored (or penalized) by either of the two normalized indicators. (2) According to the MNCS, only in six out of 80 disciplines - but in none of 20 fields - is the EU ahead of the U.S. In contrast, the normalized U.S./EU gap is greater than 20% in 44 disciplines, 13 fields, and for all sciences as a whole. The dominance of the EU over the RW is even greater. (3) The U.S. appears to devote relatively more - and the RW less - publication effort to sub-fields with a high mean citation rate, which explains why the U.S./EU and EU/RW gaps for all sciences as a whole increase by 4.5 and 5.6 percentage points in the un-normalized case. The results with a fractional approach are very similar indeed. (C) 2012 Elsevier Ltd. All rights reserved. In the present work we introduce a modification of the h-index for multi-authored papers with contribution-based author name ranking. The modified h-index is denoted by h(mc)-index. It employs the framework of the h(m)-index, which in turn is a straightforward modification of the Hirsch index, proposed by Schreiber. 
To retain the h(m)-index's merit of requiring no additional rearrangement of papers, and to overcome its shortcoming of benefiting secondary authors at the expense of primary authors, the h(mc)-index uses combined credit allocation (CCA) in place of the fractionalized counting of the h(m)-index. The h(m)-index is a special case of the h(mc)-index and is suited to papers with equally important authors or alphabetically ordered authorship. An author contributing less to the scientific community as a whole may nevertheless obtain a higher h(mc)-index; the rational h(mc)-index, denoted by h(mcr)-index, avoids this. A fictitious example as a model case and two empirical cases are analyzed. The correlations of the h(mcr)-index with the h-index and several of its variants that consider multiple co-authorship are inspected with 30 researchers' citation data. The results show that the h(mcr)-index is more reasonable for authors with different contributions. A researcher playing a more important role in significant work will obtain a higher h(mcr)-index. (C) 2012 Elsevier Ltd. All rights reserved. The process of assessing individual authors should rely upon a proper aggregation of reliable and valid paper quality metrics. Citations are merely one possible way to measure appreciation of publications. In this study we propose some new, SJR- and SNIP-based indicators, which take into account not only the broadly conceived popularity of a paper (manifested by the number of citations), but also other factors like its potential, or the quality of the papers that cite a given publication. We explore the relation and correlation between different metrics and study how they affect the values of a real-valued generalized h-index calculated for 11 prominent scientometricians. We note that the h-index is a very unstable impact function, highly sensitive to scaling of its input elements. 
Our analysis is not only of theoretical significance: data scaling is often performed to normalize citations across disciplines. Uncontrolled application of this operation may lead to unfair decisions biased toward some groups. This puts the validity of author assessment and ranking using the h-index into question. Obviously, a good impact function to be used in practice should not be as sensitive to changes in the input data as the one analyzed here. (C) 2012 Elsevier Ltd. All rights reserved. Project funding is an increasingly important mode of research funding. The rationale is that through project funding new fields and new themes can be supported more effectively. Furthermore, project funding improves competition, which is expected to select the better research projects and researchers. However, project funding has a price, as it requires researchers to invest time in reviewing proposals, and to participate in selection committees. In that perspective, selection committee membership can be seen as a service to the scholarly community. However, what do committee members themselves get from membership? In this paper we show that committee members on average are more successful in grant applications than other principal investigators, and that this is not explained by performance differences. The findings suggest that committee membership is not only service, but also self-service. (C) 2012 Elsevier Ltd. All rights reserved. In this paper, we discuss the feasibility of early recognition of highly cited papers with citation prediction tools. Because papers' citation behaviors contain noise, the soft fuzzy rough set (SFRS), which is robust to noise, is introduced in constructing the case-based classifier (CBC) for highly cited papers. 
After a careful design that included (a) feature reduction by SFRS, (b) case selection by the combined use of SFRS and the concept of case coverage, and (c) reasoning by the two classification techniques of case coverage based prediction and case score based prediction, this study demonstrates that highly cited papers can be predicted by objectively assessed factors. It shows that features including the research capability of the first author, the quality of the paper and the reputation of the journal are the most relevant predictors of highly cited papers. (C) 2012 Elsevier Ltd. All rights reserved. This study constructs diffusion models of crisis information in microblogs. We propose three information release patterns in microblogs according to the duration over which crisis information is released, namely concentrated release, continuous release, and pulse release. Based on the logistic function, three corresponding diffusion models are constructed. We choose three crisis events to test the diffusion models using the variables of the number of microblogs with the crisis information (NMCI) and the increment of NMCI. The estimation results show that the diffusion of crisis information in microblogs can be described by the logistic function, and the growth curve of NMCI is S-shaped. (C) 2012 Elsevier Ltd. All rights reserved. Newly introduced bibliometric indices may be biased by the preference of scientists for bibliometric indices in which their own research receives a high score. To test this hypothesis, the publication and citation records of nine scientists who recently proposed new bibliometric indices were analyzed in terms of standard indicators, their own indicators, and indicators recently proposed by other scientists. The result of the test was negative, that is, newly introduced bibliometric indices did not favor their authors. (C) 2012 Elsevier Ltd. All rights reserved. Most networks in information science appear as weighted networks, while many of them (e.g. 
author citation networks, web link networks and knowledge flow networks) are directed networks. Based on the definition of the h-degree, the directed h-degree is introduced for measuring weighted and directed networks. After analyzing the properties and derived measures of the directed h-degree, an application to an LIS journal citation network is worked out. (C) 2012 Elsevier Ltd. All rights reserved. Metrics based on percentile ranks (PRs) for measuring scholarly impact involve complex treatment because of various defects such as overvaluing or devaluing an object caused by percentile ranking schemes, ignoring precise citation variation among those ranked next to each other, and inconsistency caused by additional papers or citations. These defects are especially obvious in a small-sized dataset. To avoid the complicated treatment of PR-based metrics, we propose two new indicators - the citation-based indicator (CBI) and the combined impact indicator (CII). Document types of publications are taken into account. With the two indicators, one is no longer bothered by the complex issues encountered with PR-based indicators. For a small-sized dataset with fewer than 100 papers, no special calculation is needed. The CBI is based solely on citation counts and the CII measures the integrated contributions of publications and citations. Both virtual and empirical data are used so as to compare the effects of the related indicators. The CII and the PR-based indicator I3 are highly correlated, but the former reflects citation impact more and the latter relates more to publications. (C) 2012 Elsevier Ltd. All rights reserved. The leaders of scientific groups appear in the last place (or in the first place) of the author lists of multi-author papers more often than other scientists (group members). The preferential position of the group leader depends on the branch of science, geographical location and the time point. 
New tools to study the order of authors were introduced. The validity of assessing the contributions of particular authors to a paper solely from their ranks in the author list was challenged. (C) 2012 Elsevier Ltd. All rights reserved. Because of the variations in citation behavior across research fields, appropriate standardization must be applied as part of any bibliometric analysis of the productivity of individual scientists and research organizations. Such standardization involves scaling by some factor that characterizes the distribution of the citations of articles from the same year and subject category. In this work we conduct an analysis of the sensitivity of researchers' productivity rankings to the scaling factor chosen to standardize their citations. To do this we first prepare the productivity rankings for all researchers (more than 30,000) operating in the hard sciences in Italy, over the period 2004-2008. We then measure the shifts in rankings caused by adopting scaling factors other than the particular factor that seems most effective for comparing the impact of publications in different fields: the citation average of the distribution of cited-only publications. (C) 2012 Elsevier Ltd. All rights reserved. A novel method is proposed to monitor and record scientists' working timetables. We record paper download information from Springer in real time, around the clock, and try to explore scientists' working habits. As our observations demonstrate, many scientists are still engaged in their research after working hours every day. Many of them work far into the night, even until the next morning. In addition, research work also intrudes into their weekends. Different working time patterns are revealed. In the US, overnight work is more prevalent among scientists, while Chinese scientists mostly have busy weekends with their scientific research. (C) 2012 Elsevier Ltd. All rights reserved. 
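The field standardization used in the ranking-sensitivity study above - scaling a paper's citations by the citation average of the distribution of cited-only publications from the same year and subject category - can be sketched as follows; the function name and the citation counts are illustrative only:

```python
from statistics import mean

def normalized_citation_score(citations, field_year_citations):
    """Scale a paper's citations by the mean of the citation counts
    of cited-only (citations > 0) publications from the same
    subject category and publication year."""
    cited_only = [c for c in field_year_citations if c > 0]
    return citations / mean(cited_only)

# Hypothetical citation counts for one field-year stratum
field_year = [0, 0, 4, 8, 12]
print(normalized_citation_score(16, field_year))  # 16 / mean([4, 8, 12]) = 2.0
```

Other scaling factors examined in the study (e.g. the average over all publications, including uncited ones) would simply replace the `cited_only` filter.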
One hundred scientific and scholarly journal web sites were investigated to find out their use of social media tools and to examine attention data revealed by them. Seventy-eight scientific journals used social media tools, RSS being the most common. Interactive social media tools - Facebook, Twitter and blogs - were present on 19 journal web sites. Attention data were operationalised as liking, commenting on or sharing postings on Facebook, Twitter or blogs, linking to articles, liking a YouTube entry or following a journal on Twitter. Facebook and blog sites of the journals had varying roles with respect to content generated by readers and the journal, and the amount of attention data received by the journals' Facebook, Twitter and blog sites also showed great variation. In scientific communication, social media have a role of their own, complementing that of scientific journals, and their active use indicates a clear demand for them. Attention is difficult to measure even with social media, but their interactive features clearly capture one part of it, and the attention economy presents a fruitful viewpoint for studying scientific communication by providing relevant and useful concepts that describe its characteristics and the factors that influence the attention it receives. (C) 2012 Elsevier Ltd. All rights reserved. The aim of this brief communication is to reply to a letter by Kosmulski (Journal of Informetrics 6(3):368-369, 2012), which criticizes a recent indicator called "success-index". The most interesting features of this indicator, presented in Franceschini et al. (Scientometrics, in press), are: (i) allowing the selection of an "elite" subset from a set of publications and (ii) implementing field-normalization at the level of an individual publication. We show that Kosmulski's criticism is unfair and inappropriate, as it is the result of a misinterpretation of the indicator. (C) 2012 Elsevier Ltd. All rights reserved. 
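A rough illustration of the success-index's elite-subset idea described above - counting the publications whose citations reach a publication-level, field-normalized comparison threshold. This is only a sketch of the selection step, not the authors' exact formula; the per-paper thresholds below are hypothetical:

```python
def success_index(papers):
    """Count publications whose citation count reaches their own
    field-normalized comparison threshold (elite-subset selection)."""
    return sum(1 for citations, threshold in papers if citations >= threshold)

# Hypothetical (citations, comparison threshold) pairs for one author;
# each threshold would be derived from the paper's own field.
papers = [(25, 10.3), (9, 12.0), (14, 14.0), (2, 6.5)]
print(success_index(papers))  # papers 1 and 3 reach their thresholds -> 2
```

The point of computing the threshold per publication, rather than per author, is exactly the feature (ii) highlighted in the abstract: field-normalization happens at the level of the individual paper.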
A new size-independent indicator of scientific journal prestige, the SJR2 indicator, is proposed. This indicator takes into account not only the prestige of the citing scientific journal but also its closeness to the cited journal, using the cosine of the angle between the vectors of the two journals' cocitation profiles. To eliminate the size effect, the accumulated prestige is divided by the fraction of the journal's citable documents, thus eliminating the decreasing tendency of this type of indicator and giving meaning to the scores. Its method of computation is described, and the results of its implementation on the Scopus 2008 dataset are compared with those of an ad hoc Journal Impact Factor, JIF(3y), and SNIP, the comparison being made both overall and within specific scientific areas. All three distributions - the SJR2 indicator, the SNIP indicator and the JIF - were found to fit well to a logarithmic law. Although the three metrics were strongly correlated, there were major changes in rank. In addition, the SJR2 was more evenly distributed than the JIF by Subject Area and almost as evenly distributed as the SNIP, and better than both at the lower level of Specific Subject Areas. The incorporation of the cosine increased the values of the flows of prestige between thematically close journals. (C) 2012 Elsevier Ltd. All rights reserved. Citation curves for researchers with the same h index can vary greatly in the heaviness of their top (excess citations to core papers) or the heaviness of their tail (citations to non-core papers), revealing quantitative differences across researchers. Also, promotion to the next higher h depends only on citations received by a small subset of papers, so that researchers with a given h may have citation curves whose top and tail reveal a weaker impact than that of researchers with a lower h. 
To overcome these problems, we propose a two-sided h index, an extension that computes additional h indices progressively up the top and out the tail of the citation curve. This extension represents a citation curve descriptor one of whose elements is the scalar h. The advantages of the two-sided h index are illustrated through analysis of citation curves for 88 researchers with h indices ranging from 8 to 20. Several schemes are also discussed that use the two-sided h index to define criteria for ranking researchers within and across scalar h indices, according to whether the top of the citation curve, its tail, or both are deemed relevant under the circumstances in which research accomplishments are assessed. (C) 2012 Elsevier Ltd. All rights reserved. There are different ways in which the authors of a scientific publication can determine the order in which their names are listed. Sometimes author names are simply listed alphabetically. In other cases, authorship order is determined based on the contribution authors have made to a publication. Contribution-based authorship can facilitate proper credit assignment, for instance by giving most credit to the first author. In the case of alphabetical authorship, nothing can be inferred about the relative contribution made by the different authors of a publication. In this paper, we present an empirical analysis of the use of alphabetical authorship in scientific publishing. Our analysis covers all fields of science. We find that the use of alphabetical authorship is declining over time. In 2011, the authors of less than 4% of all publications intentionally chose to list their names alphabetically. The use of alphabetical authorship is most common in mathematics, economics (including finance), and high energy physics. Also, the use of alphabetical authorship is relatively more common in the case of publications with either a small or a large number of authors. (C) 2012 Elsevier Ltd. All rights reserved. 
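The basic detection step behind the alphabetical-authorship analysis above can be sketched as follows. Note this only checks whether an author list happens to be in alphabetical order; the study itself estimates *intentional* alphabetical ordering, which additionally requires correcting for lists that are alphabetical by chance (that correction is not shown here):

```python
def is_alphabetical(authors):
    """True if the surnames appear in case-insensitive alphabetical order."""
    surnames = [a.lower() for a in authors]
    return surnames == sorted(surnames)

# Hypothetical author lists
papers = [
    ["Abel", "Brown", "Chen"],  # alphabetical
    ["Zhou", "Abel"],           # contribution-ordered
    ["Meyer"],                  # single author: trivially alphabetical
]
share = sum(is_alphabetical(p) for p in papers) / len(papers)
print(share)  # 2 of 3 lists are in alphabetical order
```

Single-author papers and two-author papers that are alphabetical by coincidence are exactly why the raw share overstates intentional alphabetical authorship.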
Evaluating the performance of institutions with different resources is not easy, as any comparison of citation distributions is strongly affected by differences in the number of articles published. The paper introduces a method for comparing citation distributions of research groups that differ in size. The citation distribution of a larger group is reduced by a certain factor and compared with the original distribution of a smaller group. Expected values and tolerance intervals of the reduced set of citations are calculated. A comparison of both distributions can be conveniently viewed in a graph. The size-independent reduced Hirsch index - a function of the reducing factor that allows the comparison of groups within a scientific field - is calculated in the same way. The method can be used for comparing groups or units differing in full-time equivalents, funding or the number of researchers, or for comparing countries by population, gross domestic product, etc. It is shown that for the calculation of the reduced Hirsch index, the upper part of the original citation distribution is sufficient. The method is illustrated through several case comparisons. (C) 2012 Elsevier Ltd. All rights reserved. Despite the huge amount of literature concerning the h-index, few papers have been devoted to its statistical analysis when a probabilistic distribution is assumed for citation counts. The present contribution mainly aims to divulge the inferential techniques recently introduced by Pratelli et al. (2012), by explaining the details of proper point and set estimation of the theoretical h-index. Moreover, some new achievements on simultaneous inference - intended to produce suitable scholar comparisons - are presented. Finally, the analysis of the citation dataset for the Nobel Laureates (in the last five years) and for the Fields medallists (from 2002 onward) is considered in order to exemplify the theoretical issues. (C) 2012 Elsevier Ltd. All rights reserved. 
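Several of the abstracts above (the h(mc)-index, the two-sided h index, the reduced Hirsch index, and the inferential work of Pratelli et al.) all build on Hirsch's h-index. Its basic computation, which those variants extend, can be sketched as:

```python
def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)  # citation curve: papers by rank
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4 papers with >= 4 citations -> 4
print(h_index([25, 8, 5, 3, 3]))  # only 3 papers with >= 3 citations at rank -> 3
```

The second example illustrates the "heavy top" problem discussed above: the 25-citation paper contributes excess citations that the scalar h ignores, which is precisely what the two-sided extension sets out to capture.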
Communication and information sharing during the response to a major incident on oil rigs have been identified as significantly influencing capability to control, manage, and limit the effect of the incident. This article reports on one of the few studies of information sharing during such incidents. Interviews drawing on the critical incident technique were conducted with offshore emergency responders and supplemented by internal organizational reports and observations of emergency response exercises. We propose a counterintuitive relationship between trust and information sharing. We argue that better information sharing plays a crucial role in instilling or enhancing trust and that in the time-bound, uncertain, and highly volatile context of offshore emergency response, if trust collapses, then it must be rebuilt swiftly and this can be done through more effective information sharing. We explore this argument using the activity theory concept of contradictions and argue that apparent contradictions in the activity system and the behavior of emergency responders should be analyzed and interpreted by taking into account crucial contextual characteristics. The article draws on further support from relevant literature, including that of the information science, organization, and communication fields. Drawing from the resource complementarity perspective of the resource-based view of a firm, this study examines the complementary role of governance dimensions, namely voice and accountability, political stability, government effectiveness, regulatory quality, rule of law, and control of corruption, on the relationship between information infrastructure in a country and its e-government development. Based on publicly available archival data from 178 countries, our results provide support for the hypothesized model. 
Specifically, whereas political stability, government effectiveness, and rule of law moderated the relationship of information infrastructure with e-government development in a positive direction, voice and accountability and control of corruption moderated the relationship negatively. Further, the relationship between information infrastructure and e-government development was not contingent on regulatory quality. Our findings contribute to the theoretical discourse on e-government development by highlighting the complementary role of governance and provide suggestions for practice in managing e-government development by enhancing governance, thereby leveraging the effect of information infrastructure on e-government development. The open content creation process has proven itself to be a powerful and influential way of developing text-based content, as demonstrated by the success of Wikipedia and related sites. Distributed individuals independently edit, revise, or refine content, thereby creating knowledge artifacts of considerable breadth and quality. Our study explores the mechanisms that control and guide the content creation process and develops an understanding of open content governance. The repertory grid method is employed to systematically capture the experiences of individuals involved in the open content creation process and to determine the relative importance of the diverse control and guiding mechanisms. Our findings illustrate the important control and guiding mechanisms and highlight the multifaceted nature of open content governance. A range of governance mechanisms is discussed with regard to the varied levels of formality, the different loci of authority, and the diverse interaction environments involved. Limitations and opportunities for future research are provided. Earlier studies found that web hyperlink data contain various types of information, ranging from academic to political, that can be used to analyze a variety of social phenomena. 
Specifically, the numbers of inlinks to academic websites are associated with academic performance, while the counts of inlinks to company websites correlate with business variables. However, the scarcity of sources from which to collect inlink data in recent years has required us to seek new data sources. The recent demise of the inlink search function of Yahoo! made this need more pressing. Different alternative variables or data sources have been proposed. This study compared three types of web data to determine which serve better as academic and business quality estimates, and what the relationships among the three data sources are. The study found that Alexa inlink and Google URL citation data can replace Yahoo! inlink data and that the former is better than the latter. Alexa is even better than Yahoo!, which has been the main data source in recent years. The unique nature of Alexa data could explain its relative advantages over other data sources. Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on latent Dirichlet allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. 
The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map. This experiment studied the impact of various task phrasings on the search process. Eighty-eight searchers performed four web search tasks prescribed by the researchers. Each task was linked to an existing target web page, containing a piece of text that served as the basis for the task. A matching phrasing was a task whose wording matched the text of the target page. A nonmatching phrasing was synonymous with the matching phrasing, but had no match with the target page. Searchers received tasks of both types in English and in Hebrew. The search process was logged. The findings confirm that task phrasing shapes the search process and outcome, and also user satisfaction. Each search stage (retrieval of the target page, visiting the target page, and finding the target answer) was associated with different phenomena; for example, target page retrieval was negatively affected by persistence in search patterns (e.g., use of phrases), user-originated keywords, shorter queries, and omitting key keywords from the queries. Searchers were easily driven away from the top-ranked target pages by lower-ranked pages with title tags matching the queries. Some searchers created consistently longer queries than other searchers, regardless of the task length. Several consistent behavior patterns that characterized the Hebrew language were uncovered, including the use of keyword modifications (replacing infinitive forms with nouns), omitting prefixes and articles, and preferences for the common language. The success self-assessment also depended on whether the wording of the answer matched the task phrasing. 
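The word-based author mapping described above compares the term vectors of two authors' "author spaces". A minimal sketch with hypothetical term frequencies follows; cosine similarity is the standard vector space comparison, though the study's exact term weighting (e.g. tf-idf) may differ:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical term frequencies aggregated over each author's works
author_a = {"citation": 4, "index": 3, "impact": 1}
author_b = {"citation": 2, "index": 1, "retrieval": 5}
print(round(cosine(author_a, author_b), 3))  # 0.394
```

A full pairwise similarity matrix over 50 authors, fed into multidimensional scaling or hierarchical clustering, would yield a map comparable to the cocitation-based one.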
Although chat-based reference services have been studied for over a decade and guidelines have been developed for effective communication in virtual reference service, little is known about the actual sources of miscommunication in these interactions. Our study uses a conversation analytic framework to investigate the types of potential or actual problems in communication that occur between librarians and patrons in chat reference interactions at a university library. Conversation analysis methodology, as developed by Harvey Sacks, Emanuel Schegloff, and Gail Jefferson, provides an empirical basis for claims about problems with communication by investigating what the participants themselves display as problematic or potentially problematic. Based on a corpus of archived chat reference transcripts, we show what types of problems in communication are displayed in the interaction, primarily through repair initiations, whether the problems are resolved, and, if so, how. Sources of problems that were targeted by both librarians and patrons included mistyping, typing in the wrong window, ambiguous terminology, differences in expertise between patrons and librarians, and the difficulty of giving and following instructions while not copresent. We conclude with implications for the training of future librarians in performing chat reference interactions. Online question-answering (Q&A) services are becoming increasingly popular among information seekers. We divide them into two categories, social Q&A (SQA) and virtual reference (VR), and examine how experts (librarians) and end users (students) evaluate information within both categories. To accomplish this, we first performed an extensive literature review and compiled a list of the aspects found to contribute to a good answer. These aspects were divided among three high-level concepts: relevance, quality, and satisfaction. 
We then interviewed both experts and users, asking them first to reflect on their online Q&A experiences and then to comment on our list of aspects. These interviews uncovered two main disparities. One disparity was between users' expectations of these services and how information was actually delivered by them; the other was between the perceptions of users and experts with regard to the aforementioned three characteristics of relevance, quality, and satisfaction. Using qualitative analyses of both the interviews and the relevant literature, we suggest ways to create better hybrid solutions for online Q&A and to bridge the gap between experts' and users' understandings of relevance, quality, and satisfaction, as well as the perceived importance of each in contributing to a good answer. Helen Brownson was a federal government employee from 1942 to 1970. At a time when scientific data were becoming exceedingly hard to manage, Brownson was instrumental in coordinating national and international efforts for more efficient, cost-effective, and universal information exchange. Her most significant contributions to documentation/information science were during her years at the National Science Foundation's Office of Scientific Information. From 1951 to 1966, Brownson played a key role in identifying and subsequently distributing government funds toward projects that sought to resolve information-handling problems of the time: information access, preservation, storage, classification, and retrieval. She is credited with communicating the need for information systems and indexing mechanisms to have stricter criteria, standards, and evaluation methods; laying the foundation for present-day NSF-funded computational linguistics projects; and founding several pertinent documentation/information science publications including the Annual Review of Information Science and Technology. 
It is shown that, under certain circumstances, in particular for small data sets, the recently proposed citation impact indicators I3(6PR) and R(6,k) behave inconsistently when additional papers or citations are taken into consideration. Three simple examples are presented, in which the indicators fluctuate strongly and the ranking of scientists in the evaluated group is sometimes completely mixed up by minor changes in the database. The erratic behavior is traced to the specific way in which weights are attributed to the six percentile rank classes, specifically for tied papers. For 100 percentile rank classes, the effects will be less serious. For the six classes, it is demonstrated that a different way of assigning weights avoids these problems, although the nonlinearity of the weights for the different percentile rank classes can still lead to (much less frequent) changes in the ranking. This behavior is not undesirable, because it can be used to correct for differences in citation behavior in different fields. Remaining deviations from the theoretical value R(6,k) = 1.91 can be avoided by a new scoring rule: fractional scoring. Previously proposed consistency criteria are supplemented with another property, strict independence, at which a performance indicator should aim. For measuring multilevel impact, we introduce the distributive h-indices, which balance two important components (breadth and strength) of multilevel impact at various citing levels. After exploring the theoretical properties of these indices, we studied two cases: 57 library and information science (LIS) journals and social science research in 38 European countries/territories. Results reveal that there are approximate power-law relations between distributive h-indices and some underlying citation indicators, such as total citations, total citing entities, and the h-index. 
Distributive h-indices provide comprehensive measures for multilevel impact, and lead to a potential tool for citation analysis, particularly at aggregative levels. A major stumbling block preventing machines from understanding text is the problem of entity disambiguation. While humans find it easy to determine that a person named in one story is the same person referenced in a second story, machines rely heavily on crude heuristics such as string matching and stemming to make guesses as to whether nouns are coreferent. A key advantage that humans have over machines is the ability to mentally make connections between ideas and, based on these connections, reason how likely two entities are to be the same. Mirroring this natural thought process, we have created a prototype framework for disambiguating entities that is based on connectedness. In this article, we demonstrate it in the practical application of disambiguating authors across a large set of bibliographic records. By representing knowledge from the records as edges in a graph between a subject and an object, we believe that the problem of disambiguating entities reduces to the problem of discovering the most strongly connected nodes in a graph. The knowledge from the records comes in many different forms, such as names of people, date of publication, and themes extracted from the text of the abstract. These different types of knowledge are fused to create the graph required for disambiguation. Furthermore, the resulting graph and framework can be used for more complex operations. Recommender systems can mitigate the information overload problem and help workers retrieve knowledge based on their preferences. In a knowledge-intensive environment, knowledge workers need to access task-related codified knowledge (documents) to perform tasks. A worker's document referencing behavior can be modeled as a knowledge flow (KF) to represent the evolution of his or her information needs over time. 
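The connectedness idea behind the author-disambiguation framework described above can be sketched minimally as follows. The mentions, attribute labels, and the use of Jaccard overlap as the connectedness measure are all illustrative assumptions; the actual prototype fuses many knowledge types (names, dates, abstract themes) into a far richer graph.

```python
# Hedged sketch: represent each record's knowledge as edges from an
# author mention to attribute nodes, then score candidate pairs by
# neighborhood overlap (Jaccard similarity, one simple choice).
from collections import defaultdict

graph = defaultdict(set)

def add_record(mention, attributes):
    for attr in attributes:
        graph[mention].add(attr)

# Invented records for three mentions of "J. Smith":
add_record("J. Smith#1", {"coauthor:A. Jones", "venue:JASIST", "topic:bibliometrics"})
add_record("J. Smith#2", {"coauthor:A. Jones", "venue:Scientometrics", "topic:bibliometrics"})
add_record("J. Smith#3", {"coauthor:B. Liu", "venue:Phys. Rev. B", "topic:superconductivity"})

def connectedness(a, b):
    """Jaccard overlap of the two mentions' attribute neighborhoods."""
    na, nb = graph[a], graph[b]
    return len(na & nb) / len(na | nb)

# Mentions 1 and 2 are far more strongly connected than 1 and 3:
print(connectedness("J. Smith#1", "J. Smith#2"))   # 0.5
print(connectedness("J. Smith#1", "J. Smith#3"))   # 0.0
```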
Document recommendation methods can proactively support knowledge workers in the performance of tasks by recommending appropriate documents to meet their information needs. However, most traditional recommendation methods do not consider workers' KFs or the information needs of the majority of a group of workers with similar KFs. A group's needs may partially reflect the needs of an individual worker that cannot be inferred from his or her past referencing behavior. In other words, the group's knowledge complements that of the individual worker. Thus, we leverage the group perspective to complement the personal perspective by using hybrid approaches, which combine the KF-based group recommendation method (KFGR) with traditional personalized-recommendation methods. The proposed hybrid methods achieve a trade-off between the group-based and personalized methods by exploiting the strengths of both. The results of our experiment show that the proposed methods can enhance the quality of recommendations made by traditional methods. The author presents a different view on the properties of impact measures than that given in the paper of De Visscher (2011). He argues that a good impact measure works better when citations are concentrated rather than spread out over articles. The author also presents theoretical evidence that the g-index and the R-index can be close to the square root of the total number of citations, whereas this is not the case for the A-index. Here the author confirms an assertion of De Visscher. An evaluation exercise was performed involving 313 papers of research staff (66 persons) of the Deutsches Rheuma-Forschungszentrum (DRFZ) published in 2004-2008. The records and citations to them were retrieved from the Web of Science (Thomson Reuters) in March 2010. The authors compared the productivity and citedness of "group leaders" vs. "regular scientists" and of "male scientists" vs. "female scientists" using citation-based indexes. 
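A score-level hybrid of a personalized method and a KF-based group method might, in the simplest case, look like the sketch below. The linear weighting with `alpha`, the document identifiers, and the toy scores are all assumptions for illustration, not the paper's exact KFGR combination.

```python
# Minimal sketch of a hybrid recommender: blend a worker's personal
# relevance scores with scores from a group of workers with similar
# knowledge flows.  alpha trades the personal profile off against the group.
def hybrid_score(personal, group, alpha=0.6):
    """personal, group: dicts mapping document id -> relevance score."""
    docs = set(personal) | set(group)
    return {d: alpha * personal.get(d, 0.0) + (1 - alpha) * group.get(d, 0.0)
            for d in docs}

personal = {"doc1": 0.9, "doc2": 0.2}
group    = {"doc2": 0.8, "doc3": 0.7}   # the group surfaces doc3, unseen by the worker
ranked = sorted(hybrid_score(personal, group).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)
```

The point of the combination is visible even in this toy: `doc3`, which the worker's own history could never suggest, enters the ranking through the group profile.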
It was found that "group leaders" are more prolific and cited more often than "regular scientists"; the same is true for "male" vs. "female scientists". The greatest contrast is observed between "female leaders" and "female regular scientists". The above-mentioned differences are significant in indexes related to the number of papers, while the values of indexes characterizing the quality of papers (average citation rate per paper and similar indexes) are not substantially different among the groups compared. The mean value of the percentile rank index for all 313 papers is 58.5, which is significantly higher than the global mean value of about 50. This fact is evidence of a higher citation status, on average, of the publications from the DRFZ. Productivity and citedness of the staff of a German medical research institution are analyzed. It was found in our previous study (Pudovkin et al.: Scientometrics, doi: 10.1007/s11192-012-0659-z, 2012) that male scientists are more prolific and cited more often than female scientists. We explain in our present study one of the possible causes of this result with reference to Abramo et al. (Scientometrics 84(3): 821-833, 2009), who found in the small subgroups of star scientists a higher performance of male star scientists with respect to female star scientists, but in the remaining complementary subpopulations the performance gap between the two sexes is marginal. In agreement with Abramo et al. (2009), in our small subgroup of star scientists a higher performance of male star scientists with respect to female star scientists could be found. In contrast, in the large complementary subgroup, even a slightly higher performance of female scientists with respect to male scientists was identified. The latter is even more strongly expressed in favor of women than Abramo's result that the performance gap between the two sexes is truly marginal. In addition to Abramo et al. 
(2009), we had already found in our previous study that special indexes characterizing the quality of papers (but not their quantity) are not substantially different between the sexes compared. As our fields have become more sophisticated, complex, and specialized, we deal with ever larger masses of data, and our quantitative results have become more detailed, esoteric, and difficult to interpret. Because our methods are predominantly quantitative, we tend to overlook or underemphasize the qualitative judgments that enter at every stage of our work, and to forget that quantity is only one of the qualities. As in our world today, where we face a flood of factoids and quantitative data stripped of context, and struggle to evaluate it, to give it meaning, and to make it into information, so ought we qualitatively to acknowledge and contextualize our research results, not only to make them more relevant, meaningful, and useful to the larger world, but to give our work greater impact and value. This study on research collaboration (RC) is an attempt to estimate the degree of internationalization of academic institutions and regions. Furthermore, potential influences of RC on excellence initiatives of modern universities are investigated, relying on source data obtained from SCImago Institutions Rankings. A positive correlation exists between the degree of collaboration and the normalized impact. However, in contrast to output, the increase in normalized impact is non-linear and fluctuating. Differences occur regarding output volume and normalized impact at the geographical region level for the leading universities. Different patterns of the Brute force distribution for each collaboration type were also observed at the region level as well as at the subject area level. 
A continuously reduced percentage of domestic (non-collaborative) academic output is a world trend, whereas a steady increase in "international + national" collaboration is observed globally, although it is less distinctive in Asia than in the other regions. The impact of Latin American papers originating from domestic production as well as from national collaboration remains considerably below world average values. Nanotechnology promises to be the 'transformative' technology of the 21st century, with its boundless potential to revolutionize a wide range of industries. Stakes are high, as the projected estimates of market value and economic and social benefits are immense for countries that can attain competency in this technology. This has stimulated OECD countries as well as emerging economies to channel huge resources toward developing core capabilities in this technology. Unlike other key technologies, recent influential reports highlight China in particular, and to some extent India, Brazil, and other emerging economies, competing with advanced OECD countries in nanotechnology. The present paper investigates, through bibliometric and innovation indicators, to what extent China and India have been able to assert their position on the global stage. The paper also underscores the importance of capturing indications from standards and products/processes, along with publications and patents, to capture more accurately the latent variable 'performance'. The study shows that China's progress is remarkable; it has already attained a leading position in publications and standards development. India is making its presence more visible, particularly in publications. China's research is more sophisticated and addresses nano-materials and their applications, whereas India's research shows a healthy trend towards addressing developmental problems. 
This paper develops a structured comparison among a sample of European researchers in the field of Production Technology and Manufacturing Systems, on the basis of scientific publications and patents. Researchers are evaluated and compared by a variegated set of indicators concerning (1) the output of individual researchers and (2) that of groups of researchers from the same country. While not claiming to be exhaustive, the results of this preliminary study provide a rough indication of the publishing and patenting activity of researchers in the field of interest, identifying (dis)similarities between different countries. Of particular interest is a proposal for aggregating analysis results by means of maps based on publication and patent indicators. A large amount of empirical data are presented and discussed. This study describes the results of a preliminary bibliometric analysis of 611 research items, published between 1996 and 2011 by researchers affiliated with Creative Research Institution (CRIS) and the Center for Advanced Science and Technology (CAST), Hokkaido University (HU), retrieved from the Web of Science (WoS) database. CRIS has a primary mission to promote cutting-edge, world-class, trans-departmental research within HU, and it conducts fundamental, commercialization-related, cross-disciplinary research and nurtures young in-house/recruited researchers through targeted, innovative tenure-track programs in multiple disciplines. Its research output derives from 3- to 7-year-long time-bound projects funded strategically by HU, external grants [e.g., MEXT Super-COE HU Research and Business Park Project (FY2003-7)], industry-university collaboration with regional businesses, and endowments (e.g., Meiji Dairies). 
Analyses using co-words, bibliographic coupling, overlay map aided with visualization, etc., lead to the following inferences: (i) The published items comprise a dozen well-defined (inter-)disciplinary clusters, dominated by 3 macro-disciplines (biomedical science, 33%; chemistry, 21%; agricultural science, ca. 10%) that constitute 18 clusters used for mapping; (ii) research conducted by externally funded or endowed projects in the biomedical, physical and environmental science and technology fields (3 broad areas of aggregation derived from the Science Overlay Map) is interdisciplinary; and (iii) there is an apparently low visibility of publications from projects jointly executed with industries to an almost complete absence of output from CRIS in the fields of social sciences in the WoS database. The notion of core documents and their application is discussed in the context of scientometric networks. An interesting solution of the problem of the arbitrariness of thresholds emerges from the application of Hirsch-type indices to dense networks as are typically observed in local clustering. Examples from several disciplines in the sciences and social sciences illustrate how these core vertices can be determined using this approach, and visualise how core documents are applied to represent the internal structure of the complete network or of parts of it. There is an increasing need both to understand the translation of biomedical research into improved healthcare and to assess the range of wider impacts from health research such as improved health policies, health practices and healthcare. Conducting such assessments is complex and new methods are being sought. Our new approach involves several steps. First, we developed a qualitative citation analysis technique to apply to biomedical research in order to assess the contribution that individual papers made to further research. 
Second, using this method, we then proposed to trace the citations to the original research through a series of generations of citing papers. Third, we aimed eventually to assess the wider impacts of the various generations. This article describes our comprehensive literature search to inform the new technique. We searched various databases, specific bibliometrics journals, and the bibliographies of key papers. After excluding irrelevant papers we reviewed those remaining for either general or specific details that could inform the development of our new technique. Various characteristics of citations were identified that had been found to predict their importance to the citing paper, including the citation's location, the number of citation occasions, and whether the author(s) of the cited paper were named within the citing paper. We combined these objective characteristics with subjective approaches, also identified from the literature search, to develop a citation categorisation technique that would allow us to achieve the first of the steps above, i.e., to be able routinely to assess the contribution that individual papers make to further research. The causes of gender bias favoring men in scientific and scholarly systems are complex and related to overall gender relationships in most of the countries of the world. An as yet unanswered question is whether, in research publication, gender bias is equally distributed over scientific disciplines and fields, or whether that bias reflects a closer relation to the subject matter. We expected less gender bias with respect to subject matter, and so analysed 14 journals of gender studies using several methods and indicators. The results confirm our expectation: the very high position of women in co-operation is striking; female scientists are relatively overrepresented as first authors of articles. Collaboration behaviour in gender studies differs from that of authors in PNAS. 
The pattern of gender studies reflects associations between authors of different productivity, or "masters" and "apprentices", whereas the PNAS pattern reflects associations between authors of roughly the same productivity, or "peers". It would be interesting to extend the analysis of these three-dimensional collaboration patterns further, to see whether a similar characterization holds, what it might imply about the patterns of authorship in different areas, what those patterns might imply about the role of collaboration, and whether there are differences between females and males in collaboration patterns. The objective of this paper is to propose a new unsupervised incremental approach for following the evolution of research themes in a given scientific discipline in terms of emergence or decline. Such behaviors are detectable by various methods of filtering. However, our choice fell on the exploitation of neural clustering methods in a multi-view context. This new approach makes it possible to take into account the incremental and chronological aspects of information, opening the way to the detection of convergences and divergences of research themes at a large scale. Cancer research outputs in India have expanded greatly in recent years, with some concomitant increase in their citation scores. Part of the increase in output is attributable to greater coverage in the Web of Science of Indian journals, which are more clinical than international ones, and much less often cited. Other measures of esteem have also increased, such as the percentage of reviews and the immediacy with which Indian cancer articles are cited. Most of the output came from just nine of the 35 Indian states and Union Territories, led by New Delhi and Maharashtra. 
The distribution of the amount of research by cancer site correlates moderately positively with the relative disease burden, with mouth (head and neck) cancer (often caused by the chewing of tobacco, areca, betel, or paan) causing the highest number of deaths and also being well researched. We also analysed the articles by type of research, with articles in genetics and chemotherapy being the most numerous. For articles published in 2009-2010, data were available on the funding acknowledgements, and we found, as expected, that articles in clinical subjects were less often supported by external funding than ones in basic research. The major source of support was the Government of India, with relatively small contributions from charities and industry, unlike the situation in the UK and other western European countries. This article proposes a conceptual framework to study the diffusion of knowledge via collaborative social interactions. The framework primes deliberation on (i) the nature of knowledge, (ii) the chain of knowledge processes, and (iii) the modes of knowledge transfer while examining mechanisms of knowledge diffusion and collaboration structure. Within such a differentiation scheme, while information is considered a form of filtered data within a context of relevancies, knowledge is considered systematically processed information that is bound to individual or collective actions and praxis. The framework is applied employing an empirical research method based on meta-network analysis. The exemplary case traces how management-sciences-related knowledge is diffused and what collaboration structures are exhibited by Turkish management academia from the 1920s until 2008. Results from the knowledge diffusion models devised and tested in this study hint that management knowledge within local publications follows patterns of information diffusion rather than the patterns of knowledge transfer found elsewhere. 
On the other hand, it is seen that the cognitive demand of publishing in citation-indexed global journals has given way to cohesive collaborating teams as a means of collaborative knowledge production and transfer. This paper examines the Web visibility of researchers in the field of communication. First, we measured the Web visibility of authors who have recently published their research in communication journals contained in the Social Science Citation Index (SSCI) provided by the Web of Science. Second, we identified a subset of authors based on their publication outlets and summarize those researchers with the highest Web presence. Lastly, we determined the factors affecting their Web visibility by using a set of national and linguistic variables of the individual researchers. Web data were collected by using a Bing.com advanced search tool based on the API. Web presence is defined as the number of Web (co-)mentions of each researcher. We identified the most solely-visible scholars in the entire communication webosphere and the scholars with the most networked visibility based on co-mentions. There is a weak but statistically significant correlation between researchers' Web visibility and their SSCI publication counts. Further, US-based and/or English-speaking scholars were more noticeable than others in cyberspace. The study explored the feasibility of using Web keyword analysis as an alternative to link analysis and tested that feasibility in a multi-industry environment. The keyword is the organization's name, in this case the company name. American companies from five industries were included in the study. The study found that the Web visibility of a company, as measured by the number of Webpages on which the company name appears, correlates with the company's business measures (revenue, profits, and assets). The correlation coefficients are similar to those between the inlink counts and the business measures. 
This suggests that the keyword count (searched by the company name) could replace the inlink count as an alternative indicator of some commonly used business measures. The co-word (the co-occurrence of the names of two companies on Webpages) count was used as a measure of the relatedness of the two companies. Multidimensional scaling (MDS) analysis was applied to the co-word matrices and generated MDS maps that showed relationships among companies in a multi-industry context. Keyword data were collected from three different types of Websites (general Websites, blog sites, and Web news sites) and the results were compared. The study found blog sites to be the better source from which to collect data for this type of study. The comparison of MDS maps generated from co-link data and the blog co-word data showed that co-word analysis is as effective as co-link analysis in mapping business relationships. The value of the study is not limited to the business sector, as the co-word method could be applied to analysing relationships among other types of organizations. This study assessed research patterns and trends of library and information science (LIS) in Korea by applying bibliometric analysis to 159 Korean LIS professors' 2,401 peer-reviewed publications published between 2001 and 2010. Bibliometric analysis of the publication data found an increasing trend toward collaboration, robust publication patterns, an increasing number of international publications, and the internationalization of LIS in Korea. The maturation and internationalization of LIS research were evidenced in an increased number of publications in high-impact journals (e.g., SSI, SSCI), growing participation in leading international conferences (e.g., ASIST, TREC), an increasing proportion of Korean LIS faculty with international degrees, and high publication rates by professors with international degrees. 
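The co-word counting at the heart of the keyword analysis described above can be sketched as follows. The company names and page contents are invented, and in practice the counts would come from a search engine API rather than a local list; the resulting matrix is what would then be fed to MDS.

```python
# Sketch of co-word relatedness: count pages on which two company
# names co-occur.  Pages are modeled as sets of names found on them.
from itertools import combinations

pages = [
    {"Ford", "GM"},
    {"Ford", "GM", "Toyota"},
    {"Apple", "Microsoft"},
    {"Ford"},
]
companies = ["Ford", "GM", "Toyota", "Apple", "Microsoft"]

# Co-occurrence counts for every unordered pair of companies:
cooc = {pair: 0 for pair in combinations(companies, 2)}
for page in pages:
    for pair in cooc:
        if pair[0] in page and pair[1] in page:
            cooc[pair] += 1

print(cooc[("Ford", "GM")])        # 2
print(cooc[("Ford", "Apple")])     # 0
```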
Though limited in its evaluative power without citation data, publication data can be a rich source for bibliometric analysis, as this study has shown. The analysis of publication patterns conducted by the study, which is a first step in our aim to establish a multi-faceted approach for assessing the impact of scholarly work, will be followed up in a future study, where the question of quantity versus quality will be examined by comparing publication counts with citation counts. To identify delayed recognition publications, or 'Sleeping Beauties' (SBs), that are scarcely cited in the years or decades following their publication but then go on to become highly cited, we screened the citation histories of 184,606 articles in 52 ophthalmology journals using the Science Citation Index-Expanded (Thomson Reuters). Nine articles were identified as SBs, accounting for 0.005% of the articles screened. The SBs were published in Archives of Ophthalmology (n = 3), American Journal of Ophthalmology (n = 3), Acta Ophthalmologica (n = 1), Investigative Ophthalmology and Visual Science (n = 1), and Japanese Journal of Clinical Ophthalmology (n = 1). Describing the citation histories by analogy with the fairy-tale Sleeping Beauty, the sleep duration ranged from 7 to 59 years with a mean of 19.7 years; the depth of sleep, evaluated as the average citations per year during the sleeping period, ranged from 0.09 to 0.82 with a mean of 0.45 citations; and the awake intensity, determined as the average citations per year during the first 5-year period following awakening, ranged from 3.60 to 17.80 with a mean of 8.51 citations. The number of total citations up to 2010 ranged from 109 to 375 with a mean of 176.3 citations. 
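The three citation-history statistics used in the Sleeping Beauty screening (sleep duration, depth of sleep, awake intensity) can be computed from a per-year citation series as sketched below. The citation history and the awakening year are invented for illustration.

```python
# Sketch of the three SB statistics described above, from a per-year
# citation history.  `awakening` is the index of the first year after
# the sleeping period.
def sb_stats(citations_per_year, awakening):
    sleep = citations_per_year[:awakening]
    awake = citations_per_year[awakening:awakening + 5]
    return {
        "sleep_duration": len(sleep),               # years asleep
        "depth_of_sleep": sum(sleep) / len(sleep),  # avg cites/yr while asleep
        "awake_intensity": sum(awake) / len(awake), # avg cites/yr, first 5 yrs awake
    }

history = [0, 1, 0, 0, 1, 0, 0, 2, 9, 14, 18, 22, 25]   # cites in years 1..13
print(sb_stats(history, awakening=8))
```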
Topics of the SBs covered descriptions of new clinical diseases, including acute retinal necrosis syndrome, cancer-associated retinopathy, and polypoidal choroidal vasculopathy, the correlation of central corneal thickness with intraocular pressure readings, inadvertent eyeball perforation in retrobulbar anesthesia, pharmacologic weakening of extraocular muscles, amniotic membrane grafts for ocular surface reconstruction, and refractive surgery. These data provide a perspective on rare but interesting delayed-citation articles in ophthalmology. A relevant factor in the growth of academic productivity in the second half of the 20th century is the implementation of the internet, particularly in developing countries. One of the first networks in Brazil is the Academic Network at Sao Paulo (ANSP), a regional network implemented in the state of Sao Paulo, which contains the largest concentration of researchers in the country. This study presents a unique metric for analyzing the impact of ANSP on academic productivity in the state of Sao Paulo. We correlate academic production and available bandwidth using the Fisher ideal price index with suitable variables to evaluate the impact of the internet on research centers and universities. We find that the members of ANSP show a steady growth in academic productivity compared with other institutions outside of the ANSP network. These results suggest that policies which increase available bandwidth can positively affect academic productivity. Information systems permeate every business function, thereby requiring holistic Information Systems (IS) approaches. Much academic research is still discipline specific. More interdisciplinary research is needed to inform both industry and academe. Interdisciplinary research has been positively associated with increased levels of innovation, productivity, and impact. IS research contributes to knowledge creation and innovation within IS and other College of Business (COB) disciplines. 
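The Fisher ideal index used in the ANSP study above is the geometric mean of the Laspeyres and Paasche indices. The worked example below uses invented two-good data; how the study actually maps bandwidth and publication variables onto "prices" and "quantities" is not specified in the abstract.

```python
# Fisher ideal index = sqrt(Laspeyres * Paasche), shown on invented data
# for two goods across a base period (0) and a current period (1).
from math import sqrt

def laspeyres(p0, p1, q0):
    """Price relative weighted by base-period quantities."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))

def paasche(p0, p1, q1):
    """Price relative weighted by current-period quantities."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

p0, p1 = [1.0, 2.0], [2.0, 3.0]     # "prices" in periods 0 and 1
q0, q1 = [10.0, 5.0], [12.0, 8.0]   # "quantities" in periods 0 and 1
print(round(fisher(p0, p1, q0, q1), 4))   # 1.7321
```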
This research defines the intellectual structures within IS and between IS and other COB disciplines. We use a large-scale, diachronic bibliometric analysis of COB journals to assess reciprocal knowledge exchange and also to identify potential intra- and interdisciplinary publication outlets. Our findings show an increase in IS knowledge contributions to other COB disciplines, which supports the discussion that IS is a reference discipline. Our research also visually depicts the intellectual structures within IS and between IS and other COB disciplines. Anyone exploring research in IS and allied COB disciplines can peruse the proximity maps to identify groups of similar journals. The findings from this research inform decisions related to which journals to read, target as publication outlets, and include on promotion and tenure lists. The journal impact factor (JIF) has been an accepted indicator for ranking journals. However, there have been increasing arguments against the fairness of using the JIF as the sole ranking criterion. This has resulted in the creation of many other quality metric indices, such as the h-index, g-index, immediacy index, citation half-life, and the SCImago journal rank (SJR), to name a few. All these metrics have their merits, but none includes any great degree of normalization in its computation. Every citation and every publication is taken as having the same importance and therefore the same weight. The wealth of available data means that multiple different rankings and indexes exist. This paper proposes the use of statistical standard scores, or z-scores. Z-scores can be calculated to normalize the impact factors given to different journals, and the average of the z-scores across various criteria can be used to create a unified relative measurement (RM) index score. 
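The RM-index idea just described (z-score each metric across journals, then average the z-scores per journal) can be sketched as follows; the journal names and metric values are invented, and the choice of sample standard deviation is an assumption.

```python
# Sketch of a z-score-based unified index: standardize each metric
# across journals, then average the standardized scores per journal.
from statistics import mean, stdev

journals = {
    "Journal A": {"JIF": 4.0, "h": 60, "SJR": 2.0},
    "Journal B": {"JIF": 2.0, "h": 40, "SJR": 1.0},
    "Journal C": {"JIF": 1.0, "h": 20, "SJR": 0.5},
}
metrics = ["JIF", "h", "SJR"]

# Per-metric mean and (sample) standard deviation across journals:
stats = {m: (mean(j[m] for j in journals.values()),
             stdev(j[m] for j in journals.values())) for m in metrics}

def rm_index(name):
    """Average of the journal's z-scores over all metrics."""
    vals = journals[name]
    return mean((vals[m] - stats[m][0]) / stats[m][1] for m in metrics)

ranking = sorted(journals, key=rm_index, reverse=True)
print(ranking)
```

Because z-scores are mean-centered, the RM-index values across the set of journals sum to (approximately) zero, which is what makes the measure "relative".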
We use the 2008 JCR provided by Thomson Reuters to demonstrate the differences in rankings that would result if the RM-index were adopted, and discuss the fairness that this index would bring to journal quality ranking. The aim of this paper is to present new ideas for evaluating Shanghai University's Academic Ranking of World Universities (ARWU). In particular, this paper tries to determine whether the normalization of data affects university ranks. In accordance with this, both the normalized and original (raw) data for each of the six variables were obtained. Based on a sample containing the 54 US universities placed in the ARWU top 100, the statistical I-distance method was performed. The results showed great inconsistencies between the university ranks obtained for the original and normalized data. These findings were then analyzed and the universities that had the greatest fluctuation in their ranks were noted. The aim of this article is to test the model of analysis conceived by Terry Shinn concerning the autonomy and unity of science. For him, the differentiation of the sciences can be explained in large part by the diffusion of generic instruments created by research-technologists moving in interstitial arenas between higher education, industry, statistics institutes, or the military. We have applied this analysis to research on depression, making the hypothesis that psychiatric rating scales could have played a similar role in the development of this scientific field. To that purpose, we proceeded to a lexicographic study of keywords mentioned in articles listed in the PsycINFO database on this subject between 1950 and 2000. In order to carry out an associated-words analysis, we constructed a co-occurrence matrix and used clustering analysis based on a grouping index, namely the equivalency index. We obtained significant aggregates of keywords associated with significant periods, or major moments, of the development of research on depression. 
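The equivalency index mentioned above is commonly defined in co-word analysis (following Callon and colleagues) as E(i, j) = c_ij² / (c_i · c_j), where c_ij is the number of documents in which keywords i and j co-occur and c_i, c_j are their individual occurrence counts. The keyword counts below are invented for illustration.

```python
# Equivalency (equivalence) index for a pair of keywords, the grouping
# index used for co-word clustering.  Counts are invented.
def equivalency(c_ij, c_i, c_j):
    """E(i, j) = c_ij**2 / (c_i * c_j), ranging from 0 to 1."""
    return c_ij ** 2 / (c_i * c_j)

# Suppose "depression" occurs in 40 papers, "rating scale" in 10,
# and the two co-occur in 8:
print(equivalency(8, 40, 10))   # 0.16
```

The index reaches 1 only when the two keywords always appear together, which is why thresholding it yields tightly associated clusters.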
This periodization confirmed the structural role played by psychiatric rating scales in the development of this scientific field, and led us to discuss and extend some elements of the model initiated by Shinn. Based on new comparison principles that take into account both the volume of scientific production and its impact, this paper proposes a method for defining reference classes of universities. Several tools are developed in order to enable university managers to define the value system according to which their university shall be compared to others. We apply this methodology to French universities and illustrate it using the reference classes of the best-ranked universities according to several value systems. An analysis of the changing publication patterns in the Social Sciences and Humanities (SSH) in the period 2000-2009 is presented on the basis of the VABB-SHW, a full-coverage database of peer-reviewed publication output in the SSH developed for the region of Flanders, Belgium. Data collection took place as part of the Flemish performance-based funding system for university research. The development of the database is described and an overview of its contents presented. In terms of the coverage of publications by the Web of Science, we observe considerable differences across disciplines in the SSH. The overall growth rate in the number of publications is 62.1%, but varies across disciplines between 7.5 and 172.9%. Publication output grew faster in the Social Sciences than in the Humanities. A steady increase in the number and proportion of publications in English is observed, going hand in hand with a decline in publishing in Dutch and other languages. However, no overall shift away from book publishing is observed. In the Humanities, the share of book publications even seems to be increasing. The study shows that additional full-coverage regional databases are needed to be able to characterise publication output in the SSH. 
This paper proposes a method for classifying true papers of a set of focal scientists and false papers of homonymous authors in bibliometric research processes. It directly addresses the issue of identifying papers that are not associated ("false") with a given author. The proposed method has four steps: name and affiliation filtering, similarity score construction, author screening, and boosted trees classification. In this methodological paper we calculate error rates for our technique. Therefore, we needed to ascertain the correct attribution of each paper. To do this we constructed a small dataset of 4,253 papers allegedly belonging to a random sample of 100 authors. We apply the boosted trees algorithm to classify papers of authors with a total false rate no higher than 30% (i.e. 3,862 papers of 91 authors). A one-run experiment achieves a testing misclassification error of 0.55%, testing recall of 99.84%, and testing precision of 99.60%. A 50-run experiment shows that the median testing misclassification error is 0.78% and the mean 0.75%. Among the 90 authors in the testing set (one author only appeared in the training set), the algorithm successfully reduces the false rate to zero for 86 authors and misclassifies just one or two papers for each of the remaining four authors. To study the behavior of Italian researchers living in Italy with a view to creating appropriate policies to tackle the brain drain and discourage academics from emigrating, we conducted a survey based on a sample of 4,700 Italian researchers (assistant professors) in several universities in Italy. The outlook is far from rosy: Italian researchers are generally dissatisfied with the economic and social situation of the country. Strong family ties represent the element keeping them at home in Italy. In this regard, no particular differences were noted between the North and South of the country. 
In analyzing the Italian academic system we identified factors that have greater weight in driving Italian intellectual talent to emigrate: the country's higher education system leaves everyone dissatisfied. Furthermore, we discovered other factors that, albeit weak, keep Italian researchers in Italy. However, one wonders how much longer family and national ties will be able to keep Italian skilled researchers in Italy, and whether such dissatisfaction may jeopardize the country's future economic development. Objective measures of research performance are necessary to facilitate academic advancement of trainee physicians. In this cross-sectional study, all anaesthetists (n = 98) in higher specialist training in Ireland were surveyed to determine bibliometrics of their scientific publications and individual and institutional characteristics that can influence research productivity. For trainees with publications, the median (range) h-index was 1 (0-4). There was a positive correlation between participation in a formal research program and increased research productivity using mean citations per publication (r² = 0.26, P = 0.006) and h-index (r² = 0.26, P = 0.006). There was a positive correlation between formal mentorship and mean citations per publication (r² = 0.15, P = 0.04) and h-index (r² = 0.17, P = 0.03). The share of nanotechnology publications involving authors from more than one country more than doubled in the 1990s, but then fell again until 2004, before recovering somewhat during the latter years of the decade. Meanwhile, the share of nanotechnology papers involving at least one Chinese author increased substantially over the last two decades. Papers involving Chinese authors are far less likely to be internationally co-authored than papers involving authors from other countries. Nonetheless, this appears to be changing as Chinese nanotechnology research becomes more advanced. 
An arithmetic decomposition confirms that China's growing share of such research accounts, in large part, for the observed stagnation of international collaboration. Thus two aspects of the globalization of science can work in opposing directions: diffusion to initially less scientifically advanced countries can depress international collaboration rates, while at the same time scientific advances in such countries can reverse this trend. We find that the growth of China's scientific community explains some, but not all, of the dynamics of China's international collaboration rate. We therefore provide an institutional account of these dynamics, drawing on Stichweh's [Social Science Information 35(2):327-340, 1996] original paper on international scientific collaboration, which, in examining the interrelated development of national and international scientific networks, predicts a transitional phase during which science becomes a more national enterprise, followed by a phase marked by accelerating international collaboration. Validating the application of this approach, we show that Stichweh's predictions, based on European scientific communities in the 18th and 19th centuries, seem to apply to the Chinese scientific community in the 21st century. The paper presents a methodology called hybrid document co-citation analysis for studying the interaction between science and technology in technology diffusion. Our approach rests mostly on patent citation, cluster analysis and network analysis. More specifically, taking the patents citing Smalley RE in the Derwent Innovations Index as the data set, the paper constructed the hybrid document co-citation network through two procedures. A spectral clustering algorithm was then used to reveal the knowledge structure in technology diffusion. 
After that, drawing on the correspondence between network properties and technology diffusion mechanisms, three indicators, degree, betweenness and citation half-life, were calculated to identify the basic documents occupying pivotal positions during technology diffusion. Finally, the paper summarized the hybrid document co-citation analysis in practice, concluding that science and technology performed different functions, each dominating in different periods of technology diffusion, though both remained active throughout. We find evidence for the universality of two relative bibliometric indicators of the quality of individual scientific publications taken from different data sets. One of these is a new index that considers both citation and reference counts. We demonstrate this universality for relatively well cited publications from a single institute, grouped by year of publication and by faculty or by department. We show similar behaviour in publications submitted to the arXiv e-print archive, grouped by year of submission and by sub-archive. We also find that for reasonably well cited papers this distribution is well fitted by a lognormal with a variance of around σ² = 1.3, which is consistent with the results of Radicchi et al. (Proc Natl Acad Sci USA 105:17268-17272, 2008). Our work demonstrates that comparisons can be made between publications from different disciplines and publication dates, regardless of their citation count and without expensive access to the whole world-wide citation graph. Further, it shows that averages of the logarithm of such relative bibliometric indices deal with the issue of long tails and avoid the need for statistics based on lengthy ranking procedures. 
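The relative indicator underlying the universality result above divides each paper's citation count by the mean count of its own group (field and year); the claim is that the logarithm of this ratio is approximately normal. A minimal sketch, with illustrative groups and counts:

```python
# Relative citation indicator after Radicchi et al.: c / c0, where c0 is
# the mean citation count of the paper's own group. The variance of the
# logged values can then be compared with the sigma^2 ~ 1.3 reported.
import math
from statistics import mean, variance

def relative_indicators(citations_by_group):
    """citations_by_group: {group: [citation counts]} -> list of c/c0."""
    rel = []
    for counts in citations_by_group.values():
        c0 = mean(counts)
        rel.extend(c / c0 for c in counts)
    return rel

# Illustrative groups; within each group the mean of c/c0 is 1 by design.
groups = {"physics-2005": [4, 10, 25, 61], "biology-2005": [12, 30, 75, 183]}
rel = relative_indicators(groups)
log_rel = [math.log(r) for r in rel if r > 0]
sigma2 = variance(log_rel)
```

Because the rescaling makes group means equal to 1, papers from different disciplines and years become directly comparable, which is the point made in the abstract above.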
Using a collection of papers gathered from the Web of Science, and defining disciplines by the JCR classification, this paper compares the disciplinary structure of the G7 countries (representing high S&T level countries) and the BRIC countries (representing fast-breaking countries in S&T) by using bibliometric methods. It discusses the similarity and the balance of their disciplinary structures. We found that: (1) high S&T level countries have a similar national disciplinary structure; (2) in recent years the disciplinary structure of the BRIC countries has become more and more similar to that of the G7 countries; (3) the disciplinary structure of the G7 countries is more balanced than that of the BRIC countries; (4) in the G7 countries more emphasis goes to the life sciences, while BRIC countries focus on physics, chemistry, mathematics and engineering. In the present work we analyze the Country Profiles, open access data from ISI Thomson Reuters' Science Watch. The country profiles are rankings of the output (indexed in Web of Science) in different knowledge fields during a determined time span for a given country. The analysis of these data permits defining a Country Profile Index, a tool for diagnosing the activity of the scientific community of a country and its possible strengths and weaknesses. Furthermore, such analysis also enables the search for similarities among research patterns of different countries, the time evolution of such patterns and the importance of adherence to the database's journal portfolio in evaluating productivity in a given knowledge field. The institutional environment of science differs across countries. Its particularities have an impact on outcomes of scientific enterprise in terms of authorship patterns and patterns of citations. 
The paper analyzes scholarly papers produced by faculty and graduate students affiliated with six universities, two of which operate in the Russian institutional environment of science and four in the Western European and North American environments. The citation analysis of papers included in two major databases, eLibrary (Russian) and Web of Knowledge (international), shows that the lists of predictors for the number of references to a scholarly article differ significantly in the Western and Russian cases. Empirical studies of obliteration by incorporation (OBI) may be conducted at the level of the database record or the full-text citation-in-context. To assess the difference between the two approaches, 1,040 articles with a variant of the phrase "evolutionarily stable strategies" (ESS) were identified by searching the Web of Science (Thomson Reuters, Philadelphia, PA) and discipline-level databases. The majority (72%) of all articles were published in life sciences journals. The ESS concept is associated with a small set of canonical publications by John Maynard Smith; OBI represents a decoupling of the use of the phrase from a citation to a John Maynard Smith publication. Across all articles at the record level, OBI is measured by the number of articles with the phrase in the database record but which lack a reference to a source article (implicit citations). At the citation-in-context level, articles that coupled a non-Maynard Smith citation with the ESS phrase (indirect citations) were counted along with those that cited relevant Maynard Smith publications (explicit citations), and OBI was counted based only on those articles that lacked any citation coupled with the ESS text phrase. The degree of OBI observed depended on the level of analysis. Record-level OBI trended upward, peaking in 2002 (62%), with a secondary drop and rebound to 53% (2008). Citation-in-context OBI percentages were lower, with no clear pattern. 
Several issues relating to the design of empirical OBI studies are discussed. Historically, papers have been physically bound to the journal in which they were published; but in the digital age papers are available individually, no longer tied to their respective journals. Hence, papers now can be read and cited based on their own merits, independently of the journal's physical availability, reputation, or impact factor (IF). We compare the strength of the relationship between journals' IFs and the actual citations received by their respective papers from 1902 to 2009. Throughout most of the 20th century, papers' citation rates were increasingly linked to their respective journals' IFs. However, since 1990, the advent of the digital age, the relation between IFs and paper citations has been weakening. This began first in physics, a field that was quick to make the transition into the electronic domain. Furthermore, since 1990 the overall proportion of highly cited papers coming from highly cited journals has been decreasing and, of these highly cited papers, the proportion not coming from highly cited journals has been increasing. Should this pattern continue, it might bring an end to the use of the IF as a way to evaluate the quality of journals, papers, and researchers. The federal government has encouraged open access to publicly funded federal science research results, but it is unclear what knowledge can be gleaned from them and how the knowledge can be used to improve scientific research and shape federal research policies. In this article, we present the results of a preliminary study of cyberlearning projects funded by the National Science Foundation (NSF) that address these issues. Our work demonstrates that text-mining tools can be used to partially automate the process of finding NSF's cyberlearning awards and characterizing the fine-grained topics implicit in award abstracts. 
The methodology we have established to assess NSF's cyberlearning investments should generalize to other areas of research and other repositories of public-access documents. The article provides an overview of the main ethical and associated political-economic aspects of the preservation of born-digital content and the digitization of analogue content for purposes of preservation. The term "heritage" is used broadly to include scientific and scholarly publications and data. Although the preservation of heritage is generally seen as inherently "good," this activity implies the exercise of difficult moral choices. The ethical complexity of the preservation of digital heritage is illustrated by means of two hypothetical cases. The first deals with the harvesting and preservation in a wealthy country of political websites originating in a less affluent country. The second deals with a project initiated by a wealthy country to digitize the cultural heritage of a less affluent country. The ethical reflection that follows is structured within the framework of social justice and a set of information rights that are identified as corollaries of generally recognized human rights. The main moral agents, that is, the parties that have an interest, and may be entitled to exercise rights, in relation to digital preservation, are identified. The responsibilities that those who preserve digital content have toward these parties, and the political-economic considerations that arise, are then analyzed. Author attribution studies have demonstrated remarkable success in applying orthographic and lexicographic features of text in a variety of discrimination problems. What might poetic features, such as syllabic stress and mood, contribute? We address this question in the context of two different attribution problems: (a) kindred: differentiate Langston Hughes' early poems from those of kindred poets and (b) diachronic: differentiate Hughes' early from his later poems. 
Using a diverse set of 535 generic text features, each categorized as poetic or nonpoetic, correlation-based greedy forward search ranked the features and a support vector machine classified the poems. A small subset of features (~10) achieved cross-validated precision and recall as high as 87%. Poetic features (rhyme patterns particularly) were nearly as effective as nonpoetic ones in kindred discrimination, but less effective diachronically. In other words, Hughes used both poetic and nonpoetic features in distinctive ways, and his use of nonpoetic features evolved systematically while he continued to experiment with poetic features. These findings affirm qualitative studies attesting to structural elements from Black oral tradition and Black folk music (blues) and to the internal consistency of Hughes' early poetry. The Internet has substantially increased the online accessibility of scholarly publications and allowed researchers to access relevant information efficiently across different journals and databases (Costa & Meadows, ). Because of online accessibility, academic researchers tend to read more, and reading has become more superficial (Olle & Borrego, ), such that information overload has become an important issue. Given this circumstance, how the Internet affects knowledge transfer, or, more specifically, the citation behavior of researchers, has become a recent focus of interest. This study assesses the effects of the Internet on citation patterns in terms of 4 characteristics of cited documents: topic relevance, author status, journal prestige, and age of references. This work hypothesizes that with online accessibility academic scholars cite more topically relevant articles, more articles written by lower-status authors, articles published in less prestigious journals, and older articles. The current study also hypothesizes that researcher knowledge level moderates such Internet effects. 
We chose the IT and Group subject area and collected 241 documents published in the pre-web period (1991-1995) and 867 documents published in the web-prevalent period (2006-2010) in the Web of Science database. The references of these documents were analyzed to test the proposed hypotheses, which are significantly supported by the empirical results. In many data sets, articles are classified into subfields through the journals in which they have been published. The problem is that while many journals are assigned to a single subfield, many others are assigned to several. This article discusses a multiplicative and a fractional strategy to deal with this situation. The empirical part studies different aspects of citation distributions under the two strategies, namely: the number of articles, the mean citation rate, the broad shape of the distribution, their characterization in terms of size- and scale-invariant indicators of high and low impact, and the presence of extreme distributions, that is, distributions that behave very differently from the rest. We found that, despite large differences in the number of articles according to both strategies, the similarity of the citation characteristics of articles published in journals assigned to one or several subfields guarantees that choosing one of the two strategies may not lead to a radically different picture in practical applications. Nevertheless, the characterization of citation excellence through a high-impact indicator may considerably differ depending on that choice. The growing complexity of challenges involved in scientific progress demands ever more frequent application of competencies and knowledge from different scientific fields. The present work analyzes the degree of collaboration among scientists from different disciplines to identify the most frequent combinations of knowledge in research activity. 
The methodology adopts an innovative bibliometric approach based on the disciplinary affiliation of publication coauthors. The field of observation includes all publications (167,179) indexed in the Science Citation Index Expanded for the years 2004-2008, authored by all scientists in the hard sciences (43,223) at Italian universities (68). The analysis examines 205 research fields grouped in 9 disciplines. Identifying the fields with the highest potential of interdisciplinary collaboration is useful to inform research policies at the national and regional levels, as well as management strategies at the institutional level. Bibliometric techniques and social network analysis are used to define the patterns of international medical research in Latin America and the Caribbean based on information available in the Scopus database. The objective was to ascertain countries' capacity to establish intra- and extraregional scientific collaboration. The results show that increased output and citations in medical research have heightened the region's presence and participation in the international scientific arena. These findings may be partly influenced by the inclusion of new journals in the database and regional initiatives that may have enhanced collaboration and knowledge transfer in science. The overall rise in partnering rates is slightly greater intra- than extraregionally. The possible effect of geographic, idiomatic, and cultural proximity is likewise identified. The "scientific dependence" of small or developing countries would explain their high collaboration rates and impact. The evidence shows that the most productive countries draw from knowledge generated domestically or by their neighbors, which would explain why impact is so highly concentrated in the regions with the greatest output. The need to incentivize intraregional relationships must be stressed, although international initiatives should also be supported. 
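The intra- versus extraregional collaboration rates discussed in the Latin American study above reduce to simple bookkeeping over country pairs per paper. A sketch with an illustrative region set and sample papers (not data from the study):

```python
# Count intra- vs. extraregional co-authorship links from each paper's
# list of author countries. Region membership and papers are illustrative.
from itertools import combinations

LATAM = {"BR", "MX", "AR", "CL", "CU"}

def collaboration_rates(papers):
    """papers: list of country-code lists, one per paper."""
    intra = extra = 0
    for countries in papers:
        for a, b in combinations(set(countries), 2):
            if a in LATAM and b in LATAM:
                intra += 1          # link within the region
            elif a in LATAM or b in LATAM:
                extra += 1          # link between region and outside
    total = intra + extra
    return intra / total, extra / total

papers = [["BR", "AR"], ["MX", "US"], ["CL", "BR", "ES"]]
intra_rate, extra_rate = collaboration_rates(papers)
```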
Multiple perspectives on the nonlinear processes of medical innovations can be distinguished and combined using the Medical Subject Headings (MeSH) of the MEDLINE database. Focusing on three main branches-"diseases," "drugs and chemicals," and "techniques and equipment"-we use base maps and overlay techniques to investigate the translations and interactions and thus to gain a bibliometric perspective on the dynamics of medical innovations. To this end, we first analyze the MEDLINE database, the MeSH index tree, and the various options for a static mapping from different perspectives and at different levels of aggregation. Following a specific innovation (RNA interference) over time, the notion of a trajectory which leaves a signature in the database is elaborated. Can the detailed index terms describing the dynamics of research be used to predict the diffusion dynamics of research results? Possibilities are specified for further integration between the MEDLINE database on one hand, and the Science Citation Index and Scopus (containing citation information) on the other. This project develops the concept of sustainable information practice within the field of information science. The inquiry is grounded by data from a study of 2 ecovillages, intentional communities striving to ground their daily activities in a set of core values related to sustainability. Ethnographic methods employed for over 2 years resulted in data from hundreds of hours of participant observation, semistructured interviews with 22 community members, and a diverse collection of community images and texts. Analysis of the data highlights the tensions that arose and remained as community members experienced breakdowns between community values related to sustainability and their daily information practices. 
Contributions to the field of information science include the development of the concept of sustainable information practice, an analysis of why community members felt unable to adapt their information practices to better match community concepts of sustainability, and an assessment of the methodological challenges of information practice inquiry within a communal, nonwork environment. Most broadly, this work contributes to our larger understanding of the challenges faced by those attempting to identify and develop more sustainable information practices. In addition, findings from this investigation call into question previous claims that groups of individuals with strong value commitments can adapt their use of information tools to better support their values. In contrast, this work suggests that information practices can be particularly resilient to local, value-based adaptation. Opinion retrieval is the task of finding documents that express an opinion about a given query. A key challenge in opinion retrieval is to capture the query-related opinion score of a document. Existing methods rely mainly on the proximity information between the opinion terms and the query terms to address the key challenge. In this study, we propose to incorporate the syntactic and semantic information of terms into a probabilistic model to capture the query-related opinion score more accurately. The syntactic tree structure of a sentence is used to evaluate the modifying probability between an opinion term and a noun within the sentence with a tree kernel method. Moreover, WordNet and the probabilistic topic model are used to evaluate the semantic relatedness between any noun and the given query. The experimental results over standard TREC baselines on the benchmark BLOG06 collection demonstrate the effectiveness of our proposed method, in comparison with the proximity-based method and other baselines. 
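A minimal form of the proximity-based scoring that the opinion retrieval study above uses as its baseline: each opinion-term occurrence contributes inversely to its token distance from the nearest query term. The discount function, lexicons and document here are illustrative assumptions, not the paper's actual model:

```python
# Proximity-based query-related opinion score: each opinion term in the
# document contributes 1 / (1 + d), where d is its token distance to the
# nearest occurrence of a query term.
def opinion_score(tokens, query_terms, opinion_terms):
    q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    if not q_pos:
        return 0.0
    score = 0.0
    for i, t in enumerate(tokens):
        if t in opinion_terms:
            d = min(abs(i - q) for q in q_pos)
            score += 1.0 / (1.0 + d)
    return score

doc = "the new camera is excellent but the manual is terrible".split()
s = opinion_score(doc, {"camera"}, {"excellent", "terrible"})
```

The paper's contribution is to replace this raw distance with syntactic (tree-kernel) and semantic (WordNet, topic-model) evidence of whether the opinion term actually modifies a query-related noun.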
This study examines how the goal orientation of individuals, rather than the goals of a search task, influences information search behavior for 2 discrete stages of the Internet search process. In an Internet-based experiment (N = 106) with temporarily activated motivational orientation (promotion vs. prevention) and message goal frames (gain vs. loss) as independent variables, it was demonstrated that participants selected information that was congruent with their motivational orientation although they did not necessarily spend more time attending to their selection. Participants with the promotion orientation exhibited more scanning behavior and viewed more web pages but spent less time on the Internet search. Congruency effects resulted in higher user engagement, which mediated the congruency effects on the perceived message quality of the health content. Search engines are essential tools for web users today. They rely on a large number of features to compute the rank of search results for each given query. The estimated reputation of pages is among the effective features available for search engine designers, probably being adopted by most current commercial search engines. Page reputation is estimated by analyzing the linkage relationships between pages. This information is used by link analysis algorithms as a query-independent feature, to be taken into account when computing the rank of the results. Unfortunately, several types of links found on the web may damage the estimated page reputation and thus cause a negative effect on the quality of search results. This work studies alternatives to reduce the negative impact of such noisy links. More specifically, the authors propose and evaluate new methods that deal with noisy links, considering scenarios where the reputation of pages is computed using the PageRank algorithm. 
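For reference, plain PageRank by power iteration; the noisy-link treatments proposed in the study above would alter the link graph before this computation runs. The toy graph is illustrative:

```python
# PageRank by power iteration over an adjacency list.
def pagerank(links, d=0.85, iters=50):
    """links: {page: [outgoing links]}; returns {page: score}."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:      # distribute rank along out-links
                    new[q] += share
            else:                   # dangling page: distribute evenly
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

r = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

A spam or navigational link simply adds an edge here, inflating the target's score, which is why pruning or down-weighting such links before ranking can change the results materially.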
They show, through experiments with real web content, that their methods achieve significant improvements when compared to previous solutions proposed in the literature. Studies have shown that natural language interfaces such as question answering and conversational systems allow information to be accessed and understood more easily by users who are unfamiliar with the nuances of the delivery mechanisms (e.g., keyword-based search engines) or have limited literacy in certain domains (e.g., unable to comprehend health-related content due to a terminology barrier). In particular, the increasing use of the web for health information prompts us to reexamine our existing delivery mechanisms. We present enquireMe, a contextual question answering system that allows lay users to obtain responses about a wide range of health topics by expressing their information needs vaguely at the start and gradually refining them over the course of an interaction session using natural language. enquireMe allows users to engage in "conversations" about their health concerns, a process that can be therapeutic in itself. The system uses community-driven question-answer pairs from the web together with a decay model to deliver the top-scoring answers as responses to the users' unrestricted inputs. We evaluated enquireMe using benchmark data from WebMD and TREC to assess the accuracy of system-generated answers. Despite the absence of complex knowledge acquisition and deep language processing, enquireMe is comparable to state-of-the-art question answering systems such as START as well as the interactive systems from TREC. We introduce a novel methodology for mapping academic institutions based on their journal publication profiles. We believe that journals in which researchers from academic institutions publish their works can be considered as useful identifiers for representing the relationships between these institutions and establishing comparisons. 
However, when academic journals are used for research output representation, distinctions must be introduced between them, based on their value as institution descriptors. This leads us to the use of journal weights attached to the institution identifiers. Since a journal in which researchers from a large proportion of institutions published their papers may be a bad indicator of similarity between two academic institutions, it seems reasonable to weight it in accordance with how frequently researchers from different institutions published their papers in this journal. Cluster analysis can then be applied to group the academic institutions, and dendrograms can be provided to illustrate groups of institutions following agglomerative hierarchical clustering. In order to test this methodology, we use a sample of Spanish universities as a case study. We first map the study sample according to an institution's overall research output, then we use it for two scientific fields (Information and Communication Technologies, as well as Medicine and Pharmacology) as a means to demonstrate how our methodology can be applied, not only for analyzing institutions as a whole, but also in different disciplinary contexts. The way in which scientific publications are picked up by the research community can vary. Some articles become instantly cited, whereas others go unnoticed for some time before they are discovered or rediscovered. Papers with delayed recognition have also been labeled sleeping beauties. I briefly discuss an extreme case of a sleeping beauty. Peirce's short note in Science in 1884 shows a remarkable increase in citations since around 2000. The note received less than 1 citation per year in the decades prior to 2000, 3.5 citations per year in the 2000s, and 10.4 in the 2010s. This increase was seen in several domains, most notably meteorology, medical prediction research, and economics. 
The paper outlines formulas to evaluate a binary prediction system for a binary outcome. This citation increase in various domains may be attributed to a widespread, growing research focus on mathematical prediction systems and the evaluation thereof. Several recently suggested evaluation measures essentially reinvented or extended Peirce's 120-year-old ideas. Research assessment carries important implications both at the individual and institutional levels. This paper examines the research outputs of scholars in business schools and shows how their performance assessment is significantly affected when using data extracted either from the Thomson ISI Web of Science (WoS) or from Google Scholar (GS). The statistical analyses of this paper are based on a large survey data of scholars of Canadian business schools, used jointly with data extracted from the WoS and GS databases. Firstly, the findings of this study reveal that the average performance of B scholars regarding the number of contributions, citations, and the h-index is much higher when performances are assessed using GS rather than WoS. Moreover, the results also show that the scholars who exhibit the highest performances when assessed in reference to articles published in ISI-listed journals also exhibit the highest performances in Google Scholar. Secondly, the absence of association between the strength of ties forged with companies, as well as between the customization of the knowledge transferred to companies and research performances of B scholars such as measured by indicators extracted from WoS and GS, provides some evidence suggesting that mode 1 and 2 knowledge productions might be compatible. Thirdly, the results also indicate that senior B scholars did not differ in a statistically significant manner from their junior colleagues with regard to the proportion of contributions compiled in WoS and GS. 
However, the results show that assistant professors have a higher proportion of citations in WoS than associate and full professors have. Fourthly, the results of this study suggest that B scholars in accounting tend to publish a smaller proportion of their work in GS than their colleagues in information management, finance and economics. Fifthly, the results of this study show that there is no significant difference between the contribution records of scholars located in English-language and French-language B schools when their performances are assessed with Google Scholar. However, scholars in English-language B schools exhibit higher citation performances and higher h-indices both in WoS and GS. Overall, B scholars may not be confronted with a choice between two incompatible knowledge production modes, but rather with the requirements of the evidence-based management approach. As a consequence, the various assessment exercises undertaken by university administrators, government agencies and associations of business schools should complement the data provided in WoS with those provided in GS. The evaluation of the work of a researcher and its impact on the research community has been deeply studied in the literature through the definition of several measures, foremost among them the h-index and its variations. Although these measures represent valuable tools for analyzing researchers' outputs, they usually assume co-authorship to be a proportional collaboration between the parties, overlooking their relationships and relative scientific influence. In this work, we propose the d-index, a novel measure that estimates the degree of dependence between authors and their research environment over their entire scientific publication history. We also present a web application that implements these ideas and provides a number of visualization tools for analyzing and comparing scientific dependence among all the scientists in the DBLP bibliographic database.
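The h-index discussed above has a simple operational definition: the largest h such that at least h of an author's papers have at least h citations each. A minimal sketch with hypothetical citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers.
print(h_index([10, 8, 5, 4, 3]))  # -> 4: four papers have >= 4 citations
```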
Finally, relying on this web environment, we present case and user studies that highlight both the validity and the reliability of the proposed evaluation measure. Despite the extensive studies conducted in the field of nanotechnology based on US patent data, the choice of a single database may impede a wider view of this technology frontier. Based on patent data from the Derwent Innovation Index database, which covers the data of 41 major patent offices, we review the development of nanotechnology patenting from the dimensions of patenting authority and technological classification. We find that the small number of countries dominating the technology have similar technological diversity in terms of nanotechnology patents. After discussing and summarizing the citation modes and the citation rate curve, we construct the patent citation networks at the patent document level and discuss the distinctive transnational citation patterns. We then use the Search Path Count method to extract the technological trajectory, where we find very high selectiveness. In the final section of this paper, we examine the small-world phenomenon in the citation networks, which is widely investigated in undirected networks such as co-authorship networks but rarely examined in citation networks owing to restrictive model assumptions. We propose the reachable path length and citation clustering in a revised small-world model for acyclic directed networks and provide the realistic meaning of the new measures. In this paper, machine learning tools were used to identify key features influencing citation impact. Both external and quality-related information about the papers was considered in constructing the feature space. Based on the feature space, the soft fuzzy rough set was used to generate a series of associated feature subsets. Then, the KNN classifier was used to find the feature subset with the best classification performance.
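The abstract above describes a wrapper-style search: candidate feature subsets (there generated by a soft fuzzy rough set) are scored with a KNN classifier and the best-performing subset is kept. A minimal sketch of the scoring step, with a hand-rolled 1-NN and leave-one-out accuracy (the toy data and function names are hypothetical stand-ins for the paper's actual pipeline):

```python
def knn_predict(train, labels, x, k=1):
    """Majority vote of the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(data, labels, feats, k=1):
    """Leave-one-out accuracy of k-NN restricted to the features in `feats`."""
    hits = 0
    for i in range(len(data)):
        train = [[row[f] for f in feats] for j, row in enumerate(data) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        hits += knn_predict(train, train_y, [data[i][f] for f in feats], k) == labels[i]
    return hits / len(data)

def best_subset(data, labels, candidates, k=1):
    """Return the candidate feature subset with the highest LOO accuracy."""
    return max(candidates, key=lambda s: loo_accuracy(data, labels, s, k))

# Toy data: feature 0 separates the classes, feature 1 is noise.
data = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.0], [1.1, 1.0]]
labels = [0, 0, 1, 1]
chosen = best_subset(data, labels, [(0,), (1,), (0, 1)])
```

On this toy data `chosen` is `(0,)`: the noisy feature degrades leave-one-out accuracy, so the wrapper discards it.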
The results show that citation impact can be predicted from objectively assessed factors. Both the papers' quality and external features, the latter mainly represented by the reputation of the first author, contribute to future citation impact. The intellectual structure of library and information science (LIS) in China and its evolution are analyzed with time series data from the Chinese Social Sciences Citation Index, which is at present the most appropriate database for author co-citation analysis (ACA) in the social sciences. The results indicate that the subfields of Library and Information Science in China kept changing from 1998 to 2007: some subfields emerged and developed considerably, e.g., webometrics and competitive intelligence; some remained stable, e.g., bibliometrics and intellectual property; and some began to decline, e.g., cataloging. Comparison with international LIS shows that there were some subfields unique to Chinese LIS from 1998 to 2007, such as competitive intelligence and intellectual property. At the same time, I also suggest that Chinese authors in LIS should pay more attention to applied research in the future. To the best of our knowledge, no works analyzing the participation of women as authors and editors in software engineering research publications currently exist. We have therefore followed a well-defined procedure in order to conduct an empirical study of female participation in 12 leading software engineering journals. We have analyzed the gender of the authors, editorial board members, associate editors and editors-in-chief over a two-year period in order to analyze, on the one hand, the rate of participation of women as authors and as editors in software engineering publications, and on the other, whether women are underrepresented. We have also analyzed the distribution of female authors and editors according to the geographical location of their institutions.
This was done by first selecting the journals to be used as the population for data collection, which then allowed us to identify female authors of papers and female editors, including the country in which their institutions are located. This eventually led to an analysis of female participation in order to understand representation rates. We analyzed 3,546 authors of 1,266 papers in 61 different countries, and 363 members of editorial boards in 30 different countries. The results of this analysis provide quantitative data concerning the participation of women as authors and editors in major software engineering journals, including their distribution per country, in which important differences were found. The results obtained were first used to compare the participation of women as authors and editors and were then used to carry out a series of simulations in order to statistically confirm whether women are underrepresented. The study shows, amongst other things, that women are not underrepresented as editorial board members and as editors-in-chief of the journals studied, although their representation as editors-in-chief is low. Here we show a comparison of top economics departments in the US and EU based on a summary measure of the multidimensional prestige of influential papers in 2010. The multidimensional prestige takes into account that several indicators should be used for a distinct analysis of structural changes in the score distribution of paper prestige. We argue that the prestige of influential articles should not consider only one indicator as a single dimension, but should in addition take into account further dimensions, since several different indicators have been developed to evaluate the impact of academic papers.
After having identified the multidimensionally influential articles from an economics department, their prestige scores can be aggregated to produce a summary measure of the multidimensional prestige of the research output of this department, which satisfies numerous properties. The paper introduces the use of blockmodeling in the micro-level study of the internal structure of co-authorship networks over time. Variations in scientific productivity and researcher or research group visibility were determined by observing authors' roles in the core-periphery structure and crossing this information with bibliometric data. Three techniques were applied to represent the structure of collaborative science: (1) blockmodeling; (2) the Kamada-Kawai layout algorithm, based on the similarities in co-authorship present in the documents analysed; and (3) bibliometrics, to determine output volume, impact and degree of collaboration from the bibliographic data drawn from publications. The goal was to determine the extent to which the use of these two complementary approaches, in conjunction with bibliometric data, provides greater insight into the structure and characteristics of a given field of scientific endeavour. The paper describes certain features of the Pajek software and how it can be used to study research group composition, structure and dynamics. The approach combines bibliometric and social network analysis to explore scientific collaboration networks and monitor individual and group careers from new perspectives. Its application to a small-scale case study is intended as an example and can be used in other disciplines. It may be very useful for the appraisal of scientific developments. The study of university-industry (U-I) relations has been the focus of growing interest in the literature. However, to date, a quantitative overview of the existing literature in this field has yet to be produced. This study intends to fill this gap through the use of bibliometric techniques.
By using three different yet interrelated databases: a database containing the articles published on U-I links, which encompasses 534 articles published between 1986 and 2011; a 'roots' database, which encompasses over 20,000 references contained in the articles published on U-I relations; and an 'influences' database, which includes more than 15,000 studies that cited the articles published on U-I relations, we obtained the following results: (1) 'Academic spin offs', 'Scientific and technological policies' and (to a greater extent) 'Knowledge Transfer Channels' are topics in decline; (2) 'Characteristics of universities, firms and scientists', along with 'Regional spillovers', show remarkable growth, and 'Measures and indicators' can be considered an emergent topic; (3) a clear tendency towards 'empirical' works is observed, although 'appreciative and empirical' papers constitute the bulk of this literature; (4) the intellectual roots of the U-I literature are multidisciplinary in nature, showing an interesting blending of neoclassical economics (focused on licensing, knowledge transfer and high-tech entrepreneurship) and heterodox approaches (mainly related to systems of innovation); (5) the influence of the U-I literature is largely concentrated on the industrialized world and on the research area of innovation and technology (i.e., some 'scientific endogamy' is observed). Most scientific research has some form of local geographical bias. This could be caused by researchers addressing a geographically localized issue, working within a nationally or regionally defined research network, or responding to research agendas that are influenced by national policy. These influences should be reflected in citation behavior, e.g., more citations than expected by chance for papers by scientists from institutions within the same country.
Thus, assessing adjusted levels of national self-citation may give insights into the extent to which national research agendas and scientific cultures influence the behavior of scientists. Here we develop a simple metric of scientific insularism based on rates of national self-citation corrected for total scientific output. Based on recent publications (1996-2010), higher than average levels of insularism are associated with geographically large, rapidly developing nations (Brazil, Russia, India, and China, the so-called BRIC nations), and countries with strongly ideological political regimes (Iran). Moreover, there is a significant negative correlation between insularism and the average number of citations at the national level. Based on these data we argue that insularism (higher than average levels of national self-citation) may reflect scientific cultures whose priorities and focus are less tightly linked to global scientific norms and agendas. We argue that reducing such insularity is an overlooked challenge that requires policy changes at multiple levels of science education and governance. Guidelines on authorship requirements are common in biomedical journals, but it is not known how authorship is defined by journals and scholarly professional organizations across research disciplines. The prevalence of authorship statements, their specificity and tone, and the contributions required for authorship were assessed in 185 journals from the Science Citation Index (SCI) and Social Science Citation Index (SSCI), 260 journals from the Arts & Humanities Citation Index (A&HCI) and 651 codes of ethics from professional organizations in the online database of the Center for the Study of Ethics in the Professions, USA. In SCI, 53 % of the top-ranked journals had an authorship statement, compared with 32 % in SSCI. In a random sample of A&HCI-indexed journals, only 6 % of the journals addressed authorship. Only 71 (11 %) codes of ethics carried a statement on authorship.
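One simple way to "correct" a national self-citation rate for total output, consistent with the description above though not necessarily the authors' exact formula (the abstract does not give it), is to divide the observed share of domestic citations by the share expected if citations were drawn in proportion to national output. A hedged sketch with hypothetical numbers:

```python
def insularism(domestic_cites, total_cites, national_papers, world_papers):
    """Observed national self-citation rate divided by the rate expected
    under proportional (random) citing; values above 1 suggest insularity.
    This is an illustrative reading, not the paper's published formula."""
    observed = domestic_cites / total_cites
    expected = national_papers / world_papers
    return observed / expected

# Hypothetical country: 40% of its citations are domestic,
# but it produces only 10% of world papers.
score = insularism(400, 1000, 10_000, 100_000)  # well above 1, i.e. insular
```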
Almost all journals had defined authorship criteria, compared with 33 % of the ethics codes (χ²(1) = 75.975; P < 0.001). The tone of the statements in the journals was aspirational, whereas ethics codes used normative language for defining authorship (χ²(1) = 51.709; P < 0.001). Journals mostly required both research and writing contributions for authorship, while two-thirds of the ethics codes defined only research as a mandatory contribution. In conclusion, the lack and variety of authorship definitions in journals and professional organizations across scientific disciplines may be confusing for researchers and lead to poor authorship practices. All stakeholders in research need to collaborate on building an environment where ethical behaviour in authorship is the norm. The complexity and variety of bibliographic data is growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this complex scenario, one of the most crucial issues is the quality of data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, according to different goals, can help analysis designers and bibliographic experts in working with large collections of bibliographic data.
Building on the ideas of Stirling (J R Soc Interface, 4(15), 707-719, 2007) and Rafols and Meyer (Scientometrics, 82(2), 263-287, 2010), we borrow models of genetic distance based on gene diversity and propose a general conceptual framework to investigate the diversity within and among systems and the similarity between systems. This framework can be used to reveal the relationship of systems weighted by the similarity of the corresponding categories. Application of the framework to scientometrics is explored to evaluate the balance of national disciplinary structures and the homogeneity of disciplinary structures between countries. Due to rapid environmental change, policymakers can no longer choose foresight issues based on their own experience alone. Instead, they need to consider all the possible factors that will influence new technological developments and formulate an appropriate future technological development strategy for the country through the technology foresight system. To gather more objective evidence to convince stakeholders to support the foresight issues, researchers can employ bibliometric analysis to describe current scientific development and forecast possible future development trends. Through this process, a consensus is reached about the direction of future technology development. However, we believe that bibliometric analysis can do more for technology policy formulation, such as (1) offer quantitative data as evidence to support the results of qualitative analysis; (2) review the state of literature publication in specific technological fields to capture the current stage of technology development; and (3) help us grasp the relative advantage of foresight issue development in Taiwan and the world and develop profound strategic planning in accordance with the concept of Revealed Comparative Advantage.
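The Stirling (2007) heuristic that the diversity framework above builds on combines variety, balance and disparity in a single quantity, D = Σ p_i p_j d_ij summed over pairs of distinct categories, where p_i is a category's share and d_ij a distance between categories. A minimal sketch with a hypothetical disciplinary profile:

```python
def stirling_diversity(shares, distance):
    """Stirling-type diversity: sum of p_i * p_j * d_ij over ordered pairs
    of distinct categories, with distances d_ij in [0, 1]."""
    cats = list(shares)
    return sum(shares[i] * shares[j] * distance(i, j)
               for i in cats for j in cats if i != j)

# Hypothetical profile: two close fields and one distant one.
p = {"phys": 0.5, "chem": 0.3, "hist": 0.2}
d = lambda i, j: 0.2 if {i, j} == {"phys", "chem"} else 1.0
score = stirling_diversity(p, d)  # 2 * (0.03 + 0.1 + 0.06) = 0.38
```

A profile concentrated in closely related fields scores low; spreading output over distant fields raises the score.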
For those reasons, our research revisits the role that bibliometric analysis plays for nations in choosing foresight issues. In addition, we analyze the development of technology policy in Taiwan based on bibliometric analysis, and complete the foresight issue selection by processing key issue integration, collection of key words related to the field, searching and confirmation of literature, exploration of development opportunities, comparative development advantage analysis, construction of the innovation-foresight matrix, etc. Recently there has been huge growth in the number of articles displayed in the Web of Science (WoS), but it is unclear whether this is linked to a growth of science or simply to additional coverage of already existing journals by the database provider. An analysis of the categories of journals in the period 2000-2008 shows that the number of basic journals covered by WoS steadily decreased, whereas the number of new, recently established journals increased. A rising number of older journals is also covered. These developments imply a growing number of articles, but a more significant effect is the enlargement of traditional, basic journals in terms of annual articles. All in all it becomes obvious that the data set is quite unstable due to high fluctuation caused by the annual selection criterion, the impact factor. In any case, it is important to look at the structures at the level of specific fields in order to differentiate between "real" and "artificial" growth. Our findings suggest that even though a growth of about 34 % can be measured in article numbers in the period 2000-2008, 17 % of this growth stems from the inclusion of old journals that have been published for a long time but were simply not included in the database before. Patents are used as an indicator to assess the growth of science and technology in a given country or area.
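Revealed Comparative Advantage, invoked in the foresight discussion above, is conventionally computed as a Balassa index: the share a field takes in a country's own output divided by the share it takes in world output. A minimal sketch with hypothetical publication counts:

```python
def rca(country_field, country_total, world_field, world_total):
    """Balassa-style Revealed Comparative Advantage. Values above 1
    indicate the country is relatively specialized in the field."""
    return (country_field / country_total) / (world_field / world_total)

# Hypothetical counts: the field is 20% of national output
# but only 5% of world output, so RCA is well above 1.
advantage = rca(200, 1_000, 50_000, 1_000_000)
```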
They are examined to determine the research potential of research centers, universities, and inventors. The aim of this study is to map past and current trends in patenting activity with a view to better understanding and tracking the changing nature of science and technology in Iran. Patenting activity in Iran was investigated for the period 1976-2011, based on the USPTO, WIPO, and EPO (Esp@cenet) databases. We analyzed the affiliations of inventors and collected patents with at least one Iranian inventor. The collected data were analyzed using Microsoft Excel. Analytical results demonstrate that between 1976 and 2011, 212 patents were registered by Iranian inventors in the three above-mentioned databases. The average number of Iranian patents registered per year has increased significantly, from 25 in the 1976-1980 period to 119 in 2006-2011. It was noted that the highest number of registered patents (27 %) was in the "chemistry, metallurgy" area of the International Patent Classification (IPC), followed by "human necessities" (18 %), "electricity" (17 %), and "performing operations; transporting" (15 %). Overall, it can be concluded that patenting activity is highly country-specific; the results indicate that Iran is focused on "chemistry, metallurgy" technology. There is substantial literature on research performance differences between male and female researchers, and their explanation. Using the publication records of 852 social scientists, we show that performance differences indeed exist. However, our case study suggests that in the younger generation of researchers these have disappeared. If performance differences exist at all in our case, young female researchers outperform young male researchers. The trend in developed societies, that women increasingly outperform men at all levels of education, is also becoming effective in the science system.
As a novel tool for evaluating the research competences of R&D actors, science overlay maps have recently been introduced in the scientometric literature, with associated measures for assessing the degree of diversification in research profiles. In this study, we continue the elaboration of this approach: based on science overlay maps (called here m-maps), a new type of map is introduced to reveal the competence structure of R&D institutions (i-maps). It is argued that while m-maps represent the multidisciplinarity of research profiles, i-maps convey the extent of interdisciplinarity realized in them. Based on i-maps, a set of new measures is also proposed to quantify this feature. With these measures in hand, and as a follow-up to our previous work, we apply them to a sample of Hungarian Research Institutions (HROs). Based on the obtained rankings, a principal component analysis is conducted to reveal the main structural dimensions of the research portfolios (of HROs) covered by these measures. The position of HROs along these dimensions then allows us to draw up a typology of organizations, according to the various combinations of inter- and multidisciplinarity characteristic of their performance. The recent literature on high-skilled labor migration has taken a turn from analyzing processes of 'brain drain' to processes of 'brain gain' and 'brain circulation'. Returning scientists, having been affiliated with foreign institutes, are able to facilitate knowledge exchanges between the two locations, and to facilitate the linkage of the national scientific community to international scientific cooperation projects. In this way, return scientists can have a disproportionate impact on the development of the scientific community in their country of origin. However, not all flows of return migrants have had such a positive impact. Returnees failed to affect developments in some localities, while producing ambiguous effects in others.
These studies typically argue that the impact of return migrants depends on the absorptive capacity and the local social, cultural, and institutional context in the country of origin. Using data on return migrants within the Taiwanese economics academic community, this paper seeks to add to this literature by arguing that the impact of return migrants is not only dependent on the circumstances in their country of origin, but is also contingent on the nature and quality of the context in which they acquired their international labor experience. Skills and access to knowledge networks are heterogeneously spread over geographical space, so the context in which a return migrant acquired his or her international labor experience matters. Whereas in traditional peer review a few selected researchers (peers) are included in the manuscript review process, public peer review includes both invited reviewers (who write 'reviewer comments') and interested members of the scientific community (who write 'short comments'). Available to us for this investigation are 390 reviewer comments and short comments assessing 119 manuscripts submitted to the journal Atmospheric Chemistry and Physics (ACP). We conducted a content analysis of these comments to determine differences in the main thematic areas considered by the scientists in their assessment comments. The results of the analysis show that in contrast to interested members of the scientific community, reviewers focus mainly on (1) the formal qualities of a manuscript, such as writing style, (2) the conclusions drawn in a manuscript, and (3) the future "gain" that could result from publication of a manuscript. All in all, it appears that 'reviewer comments' support the two main functions of peer review, selection and improvement of what is published, better than the 'short comments' of interested members of the scientific community.
Since 1968 the Croatian Mathematical Society has issued annual reports on the activities of its members in the scientific journal Glasnik Matematiki. Based on these data, the production of mathematical scientific articles published in national and international journals over a period of forty years was analysed. A rough estimate of the intensity and dynamics of publication shows that the reference period can be divided into two stages separated by the War in Croatia. After the uncertainty following the Second World War, the earlier stage was characterized by the establishment of new institutes, colleges and university departments. After the War in Croatia a gradual but large increase in the number of published articles was evident, especially in foreign journals. The war diminished technical writing almost to zero, while scientific production in 2008 was nine times greater than in 1968. This study examines the effect of the international collaboration of Slovenian authors, and of the status of the journals where papers are published (as determined by their impact factors), on the impact of papers as measured by the number of citations the papers receive. Research programme groups working in Slovenia in the 2004-2008 period in the fields of physics, chemistry, biology, biotechnology, and medical science were used for the analyses. The results of the analyses show that the effects of the two factors differ among the fields. We discuss possible reasons for this, including the possibility that the differences are the result of Slovenia's science policy. The world-wide popularity of university rankings has spurred the debate about the quality and performance of higher education systems and has had a considerable impact on global society in light of the internationalisation of higher education.
While useful for policy makers, such rankings also furnish information on an institution's "prestige", which may in turn contribute to more effective resource capture (students, funding, projects). Certain university profiles and missions may prevent many universities from climbing to higher positions, however. One important question in this regard is: how many of a country's universities can stand at the top of international rankings? The present article attempts to answer this question on the grounds of a study of the Spanish higher education system, and more specifically of an institutional alliance consisting of four high quality universities. A series of research activity indicators drawn from the IUNE Observatory are used to compare this alliance to leading Spanish and international universities and explore whether their visibility and consequently their position in international rankings would be enhanced if they were able to appear under a joint identity. This prospective study also addresses a series of strategies that the Spanish higher education system might implement to successfully rise to the challenges posed by future scenarios. This paper examines the influence of economic, linguistic, and political factors in the scientific productivity of countries across selected scientific disciplines. Using a negative binomial regression model, I show that the effect of these determinants is contingent upon the scientific field under analysis. The only variable that exerts a positive and significant effect across all disciplines is the size of the economy. The linguistic variable only has a positive influence in the social sciences as well as in medicine and agricultural sciences. In addition, it is also demonstrated that the degree of political authoritarianism has a negative and statistically significant effect in some of the selected fields. 
The nature of the empirical proportionality constant A in the relation L = Ah² between the total number of citations L of the publication output of an author and his/her Hirsch index h is analyzed using data on the publication output and citations of six scientists elected to the membership of the Royal Society in 2006 and 199 professors working in different institutions in Poland. The main problem with the h index of different authors calculated by using the above relation is that it underestimates the ranking of scientists publishing papers receiving very high citations and results in high values of A. It was found that the value of the Hirsch constant A for different scientists is associated with the discreteness of h and is related to the tapered Hirsch index h_T by A^(1/2) ≈ 1.21h_T. To overcome the drawback of a wide range of A associated with the discreteness of h for different authors, a simple index, the radius R of the circular citation area, defined as R = (L/π)^(1/2) ≈ h, is suggested. This circular citation area radius R is easy to calculate and improves the ranking of scientists publishing high-impact papers. Finally, after introducing the concept of citation acceleration a = L/t² = π(R/t)² (where t is the publication duration of a scientist), some general features of the citations of the publication output of Polish professors are described in terms of their citability. Analysis of the data of Polish professors in terms of citation acceleration a shows that: (1) the citability of the papers of a majority of physics and chemistry professors is much higher than that of technical sciences professors, and (2) an increasing fraction of conference papers as well as non-English papers and engagement in administrative functions of professors result in decreasing citability of their overall publication output. National research assessment exercises are conducted in different nations over varying periods.
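The relations in the abstract above can be checked numerically: with R = (L/π)^(1/2) we have πR² = L, so the acceleration a = L/t² is identical to π(R/t)². A short sketch with hypothetical values of L and t:

```python
import math

def citation_radius(total_cites):
    """Radius of the circular citation area: R = (L / pi) ** 0.5, so that
    pi * R**2 equals the total number of citations L."""
    return (total_cites / math.pi) ** 0.5

def citation_acceleration(total_cites, years):
    """Citation acceleration a = L / t**2 (equivalently pi * (R / t)**2)."""
    return total_cites / years ** 2

# Hypothetical scientist: L = 2000 citations over t = 25 years.
R = citation_radius(2000)            # about 25.2, close to a typical h
a = citation_acceleration(2000, 25)  # 2000 / 625 = 3.2
```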
The choice of the publication period to be observed has to address often contrasting needs: it has to ensure the reliability of the results issuing from the evaluation, but it must also permit frequent assessments. In this work we attempt to identify the most appropriate, or optimal, publication period to be observed. For this, we analyze the variation of individual researchers' productivity rankings with the length of the publication period within the period 2003-2008, for the over 30,000 Italian university scientists in the hard sciences. First we analyze the variation in rankings for pairs of contiguous and overlapping publication periods, and show that the variations reduce markedly for periods above 3 years. Then we show the strong randomness of performance rankings over publication periods under 3 years. We conclude that the choice of a 3-year publication period seems reliable, particularly for physics, chemistry, biology and medicine. The proportion of pathogenic microorganisms in the microbial world is relatively small, while their threat to human health, economic development and social stability is severe. The quantity and variation of Science Citation Index (SCI) literature related to pathogenic microorganisms may reflect the level of relevant research and the degree of attention it receives. Here we compare trends in the quantity and variety of SCI literature relating to certain important pathogenic microorganisms published by scientists from the United States and China from 1996 to 2010, retrieved by searching the Science Citation Index database.
The pathogenic microorganisms in this study comprise two categories of pathogens: Bacillus anthracis, Yersinia pestis, Francisella tularensis, Ebola virus and Burkholderia pseudomallei, which belong to the biodefense-associated pathogens (BDAPs); and the human immunodeficiency virus (HIV), SARS coronavirus, hepatitis B virus (HBV), Mycobacterium tuberculosis and influenza virus, which belong to the commonly encountered health-threatening pathogens (CEHTPs). Our results showed that the United States (US) published much more SCI literature on these pathogens than China. Furthermore, literature on BDAPs published by scientists from the US has increased sharply since 2002. However, the amount of literature relating to CEHTPs from China has shown a gradual increase from 1996 to 2010. Research into pathogenic microorganisms requires three balances to be achieved: between investment in BDAP and CEHTP studies; between basic and applied research; and between a faster pace of research into pathogens and the fulfilment of biosafety and biosecurity requirements. This study analyses the research output of Nepal in S&T during 2001-10 on several parameters, including its growth and the country's publication share in the world's research output, the country's publication share in various subjects in the national and global context, the pattern of research communication in core domestic and international journals, the geographical distribution of publications, the share of international collaborative publications at the national level as well as across subjects, and the characteristics of high-productivity institutions, authors and cited papers. The Scopus Citation Database has been used to retrieve the publication data for 10 years. An index system for evaluating academic papers is constructed and verified based on the empirical analysis of papers that gained the 6th Chinese Academy of Social Sciences Award for Outstanding Achievements.
Some new indices, such as the paper discipline impact factor, the discipline average citation rate per paper and the discipline average download rate per paper, are put forward in this paper. The empirical research results show that the ranking of papers calculated by this evaluation index system is generally in conformity with the awards determined by peer review, but it still needs to be verified and improved in practice. Acupuncture, the most important nonpharmacological therapy in traditional Chinese medicine, has attracted significant attention since its introduction to the Western world. This study employs bibliometric analysis to examine the profile of publication activity related to it. The data are retrieved from the Science Citation Index Expanded database for 1980-2009, and 7,592 papers are identified for analysis. This study finds that almost 20 % of the papers are published in CAM journals, and the average number of citations per acupuncture paper is 8.69. While the most cited article has been cited 2,109 times, 38.15 % of the publications have never been cited. Europe has the largest number of authored papers with high h-index values; by country distribution, the USA has the largest number of publications on and citations of acupuncture, and this has continued as a significant rising trend. The proportion of collaborative papers shows an upward trend worldwide, while the percentage share of national collaborations is the highest. The USA produces the most international collaborative documents, although South Korea occupies the highest percentage of international collaborative papers. International collaborative papers are the most frequently cited. The average number of authors per paper is 3.69 in the top eight countries/regions. Papers contributed by South Korea are authored by the most people. International collaborative papers are authored by more people, except in Taiwan. 
South Korea's Kyung Hee University is ranked first in terms of number of papers, while Harvard University in the USA accounts for the largest proportion of citations. The University of Exeter, Harvard University and the Karolinska Institute have the highest h-index values. A university may be considered as having dimension-specific prestige in a scientific field (e.g., physics) when a particular bibliometric research performance indicator exceeds a threshold value. But a university has multidimensional prestige in a field of study only if it is influential with respect to a number of dimensions. The multidimensional prestige of influential fields at a given university takes into account that several prestige indicators should be used for a distinct analysis of the influence of a university in a particular field of study. After the multidimensionally influential fields of study at a university have been identified, their prestige scores can be aggregated to produce a summary measure of the multidimensional prestige of influential fields at this university, which satisfies numerous properties. Here we use this summary measure of multidimensional prestige to assess the comparative performance of Spanish universities during the period 2006-2010. A bibliometric analysis was performed on solar power-related research between 1991 and 2010 in journals of all the subject categories of the Science Citation Index. "Solar cell", "solar energy", "solar power", "solar radiation" and "solar thermal" were selected as keywords to search for within the title, abstract or keywords. Trends were analyzed from the retrieved results in terms of publication type and language, characteristics of scientific output, publication distribution by country, subject category and journal, and the frequency of title-words and keywords used. Articles on solar power showed significant growth along with participation from more countries, while the percentage of international papers declined. 
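The h-index used above to compare institutions can be computed from a list of citation counts alone: h is the largest number such that h papers each have at least h citations. A minimal sketch (the citation counts are hypothetical):

```python
def h_index(citations):
    """Largest h such that the unit has h papers with >= h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i          # the i most-cited papers all have >= i citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Note how the second list, despite a higher top citation count, yields a lower h-index; the measure rewards sustained rather than single-paper impact.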
The USA was the country with the most related articles and the most frequent partner in international collaborative articles. Articles from Mainland China and South Korea grew much faster than those from other countries in the last 5 years. The chemistry and materials fields gradually became the mainstream of solar power research. A combined analysis of the three kinds of keywords showed that thin-film solar photovoltaic technology was a hot spot of solar power research in the past 20 years. "Dye-sensitized solar cell" and "organic" had extremely high growth rates, indicating that increasing attention was being paid to various kinds of organic solar cells. It could be concluded that solar cell materials would be the emphasis of solar power research in the twenty-first century. Due to the overall decrease in quality of Taiwanese universities in recent years and the resulting drastic loss of competitive advantage against foreign countries, improving the Taiwanese university system has become an urgent issue requiring immediate attention. Evidence suggests that focusing on total quality management (TQM) and innovation is the key way to effectively upgrade operational performance, and doing so is thus highly advised. Although there are a number of measurement models for TQM and innovation, early models evaluate the performance of each element separately, making evaluation inefficient and inappropriate for practice. A newer measurement system, the network hierarchical feedback system (NHFS), integrates the concept and characteristics of both elements; however, the major concern is that the NHFS does not take external organization-oriented improvement into account, such as service quality in higher education, especially in innovation orientation. Additionally, due to the above dilemmas faced by Taiwanese universities, attracting more students has now become a major priority. 
Thus, we argue that in order to successfully attract potential students, improving TQM and innovation cannot focus only on internal organization-oriented improvement, and we further extend the effectiveness and suitability of the NHFS into a novel and more usable performance measurement system, the solid Inno-Qual performance system (IQPS). A hybrid model based on a decision-making trial and evaluation laboratory, a fuzzy analytic network process (FANP), an importance-performance analysis along with in-depth interviews, a fuzzy analytic hierarchy process, and a technique for order preference by similarity to an ideal solution was adopted to complete the construction. The IQPS is the first measurement system with the most effective characteristics of TQM and innovation embedded for both new and traditional universities of different types. It is intended to enhance and evaluate performance on both external and internal organization-oriented levels, generating synergy and performance improvement. In this paper, we analyzed data on the language of papers written by winners of Nobel Prizes in physics before they won the prize and on their journals of publication, and we identified the change in scientific language corresponding with shifts of the center of the scientific world. Using the Science Citation Index as the main data source, we also collected information on the distribution of prize-winning scientists by country, by each scientist's number of published papers, and by language. We then analyzed their papers in terms of the different journals based in different countries. The results are presented in three parts: (1) The main languages used in the papers are English and German, with the proportion of papers in English gradually increasing while that of papers in German decreases. (2) The prize-winning scientists' papers have been published mainly in journals in their own nations and in the United States. 
(3) Journals based in their own countries are very helpful to these scientists early in their careers. Keeping up with rapidly growing research fields, especially when there are multiple interdisciplinary sources, requires substantial effort for researchers, program managers, or venture capital investors. Current theories and tools are directed at finding a paper or website, not at gaining an understanding of the key papers, authors, controversies, and hypotheses. This report presents an effort to integrate statistics, text analytics, and visualization in a multiple coordinated window environment that supports exploration. Our prototype system, Action Science Explorer (ASE), provides an environment for demonstrating principles of coordination and conducting iterative usability tests of them with interested and knowledgeable users. We developed an understanding of the value of reference management, statistics, citation text extraction, natural language summarization for single and multiple documents, filters to interactively select key papers, and network visualization to see citation patterns and identify clusters. A three-phase usability study guided our revisions to ASE and led us to improve the testing methods. This paper introduces a keyword map of the labels used by the scientists registered in the Google Scholar Citations (GSC) database as of December 2011. In all, 15,000 random queries were submitted to GSC to obtain a list of 26,682 registered users. From this list a network graph of 6,660 labels was built and classified according to the Scopus Subject Area classes. Results display a detailed label map of the most used (>15 times) tags. The structural analysis shows that the core of the network is occupied by computer science-related disciplines, which account for the most used and shared labels. This core is surrounded by clusters of disciplines related or close to computing, such as Information Sciences, Mathematics, or Bioinformatics. 
Classical areas such as Chemistry and Physics are marginalized in the graph. It is suggested that GSC could in the future be an accurate source for mapping science, because it is based on the labels that scientists themselves use to describe their own research activity. Classifying journals or publications into research areas is an essential element of many bibliometric analyses. Classification usually takes place at the level of journals, where the Web of Science subject categories are the most popular classification system. However, journal-level classification systems have two important limitations: they offer only a limited amount of detail, and they have difficulties with multidisciplinary journals. To avoid these limitations, we introduce a new methodology for constructing classification systems at the level of individual publications. In the proposed methodology, publications are clustered into research areas based on citation relations. The methodology is able to deal with very large numbers of publications. We present an application in which a classification system is produced that includes almost 10 million publications. Based on an extensive analysis of this classification system, we discuss the strengths and the limitations of the proposed methodology. Important strengths are the transparency and relative simplicity of the methodology and its fairly modest computing and memory requirements. The main limitation of the methodology is its exclusive reliance on direct citation relations between publications. The accuracy of the methodology could probably be increased by also taking into account other types of relations, for instance those based on bibliographic coupling. Knowledge creation and dissemination in science and technology systems are perceived as prerequisites for socioeconomic development. 
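The publication-level classification methodology above clusters almost 10 million publications by their direct citation relations, using a modularity-style clustering technique. As an intentionally simplified stand-in for that technique, the sketch below merely groups publications into connected components of the direct-citation graph via union-find (all paper identifiers and citation pairs are hypothetical):

```python
def citation_clusters(citations):
    """Group publications into connected components of the direct-citation
    graph. A deliberately crude proxy for modularity-based clustering."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # walk to the root, halving the path
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for citing, cited in citations:    # a citation merges two clusters
        parent[find(citing)] = find(cited)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())

# Hypothetical citation pairs (citing, cited): two unconnected literatures.
pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
for cluster in citation_clusters(pairs):
    print(sorted(cluster))  # ['p1', 'p2', 'p3'] and ['p4', 'p5']
```

Real citation graphs are far too dense for raw connected components to be useful, which is why the actual methodology optimizes a clustering objective instead; this sketch only illustrates the data structure involved.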
The efficiency of creating new knowledge is considered to have a geographical component, that is, some regions are more capable in terms of scientific knowledge production than others. This article presents a method of using a network representation of scientific interaction to assess the relative efficiency of regions with diverse boundaries in channeling knowledge through a science system. In a first step, a weighted aggregate of the betweenness centrality is produced from empirical data (aggregation). The subsequent randomization of this empirical network produces the necessary null model for significance testing and normalization (randomization). This step is repeated to provide greater confidence about the results (re-sampling). The results are robust estimates for the relative regional efficiency of brokering knowledge, which is discussed along with cross-sectional and longitudinal empirical examples. The network representation acts as a straightforward metaphor of conceptual ideas from economic geography and neighboring disciplines. However, the procedure is not limited to centrality measures, nor is it limited to geographical aggregates. Therefore, it offers a wide range of applications for scientometrics and beyond. The Library of Congress Subject Headings (LCSH) is a subject structure used to index large library collections throughout the world. Browsing a collection through LCSH is difficult using current online tools in part because users cannot explore the structure using their existing experience navigating file hierarchies on their hard drives. This is due to inconsistencies in the LCSH structure, which does not adhere to the specific rules defining tree structures. This article proposes a method to adapt the LCSH structure to reflect a real-world collection from the domain of science and engineering. This structure is transformed into a valid tree structure using an automatic process. 
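The brokerage measure described above is built on betweenness centrality, which the study then normalizes against a randomized null model and re-samples. A minimal stdlib sketch of the centrality computation itself (Brandes' algorithm for an unweighted, undirected graph; the aggregation, randomization and re-sampling steps are omitted, and the tiny graph is hypothetical):

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an undirected, unweighted graph
    given as {node: set(neighbours)}; pair counts are halved at the end."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0); sigma[s] = 1   # shortest-path counts
        dist = dict.fromkeys(adj, -1); dist[s] = 0
        queue = deque([s])
        while queue:                                   # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)                # dependency accumulation
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}           # each pair counted twice

# Path graph a - b - c: only 'b' brokers any shortest path.
print(betweenness({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}))
# {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

In the study's procedure, scores like these are aggregated per region and compared against the same measure on randomized versions of the network to obtain significance-tested, normalized brokerage estimates.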
The analysis of the resulting LCSH tree shows a large and complex structure. The analysis of the distribution of information within the LCSH tree reveals a power law distribution where the vast majority of subjects contain few information items and a few subjects contain the vast majority of the collection. The Leiden Ranking 2011/2012 is a ranking of universities based on bibliometric indicators of publication output, citation impact, and scientific collaboration. The ranking includes 500 major universities from 41 different countries. This paper provides an extensive discussion of the Leiden Ranking 2011/2012. The ranking is compared with other global university rankings, in particular the Academic Ranking of World Universities (commonly known as the Shanghai Ranking) and the Times Higher Education World University Rankings. The comparison focuses on the methodological choices underlying the different rankings. Also, a detailed description is offered of the data collection methodology of the Leiden Ranking 2011/2012 and of the indicators used in the ranking. Various innovations in the Leiden Ranking 2011/2012 are presented. These innovations include (1) an indicator based on counting a university's highly cited publications, (2) indicators based on fractional rather than full counting of collaborative publications, (3) the possibility of excluding non-English language publications, and (4) the use of stability intervals. Finally, some comments are made on the interpretation of the ranking and a number of limitations of the ranking are pointed out. This study presents an analysis of the use of bibliographic references by individual scientists in three different research areas. 
The number and type of references that scientists include in their papers are analyzed, the relationship between the number of references and different impact-based indicators is studied from a multivariable perspective, and the referencing patterns of scientists are related to individual factors such as their age and scientific performance. Our results show inter-area differences in the number, type, and age of references. Within each area, the number of references per document increases with journal impact factor and paper length. Top-performing scientists use a higher number of references in their papers, and these references are more recent and more frequently covered by the Web of Science. Veteran researchers tend to rely more on older literature and non-Web of Science sources. The longer reference lists of top scientists can be explained by their tendency to publish in high impact factor journals, which have stricter reference and reviewing requirements. Long reference lists suggest a broader knowledge of the current literature in a field, which is important to become a top scientist. From the perspective of handicap principle theory, the sustained use of a high number of references across an author's oeuvre is a costly behavior that may signal a serious, comprehensive, and solid research capacity, but one that only the best researchers can afford. Boosting papers' citations by artificially increasing the number of references does not seem a feasible strategy. The interactions of users with search engines can be seen as implicit relevance feedback by the user on the results offered to them. In particular, the selection of results by users can be interpreted as a confirmation of the relevance of those results, and used to reorder or prioritize subsequent search results. This collection of search/result pairings is called clickthrough data, and many uses for it have been proposed. 
However, the reliability of clickthrough data has been challenged, and it has been suggested that clickthrough data are not a completely accurate measure of relevance between search term and results. This paper reports on an experiment evaluating the reliability of clickthrough data as a measure of the mutual relevance of search term and result. The experiment comprised a user study involving over 67 participants and determines the reliability of image search clickthrough data, using factors identified in previous similar studies. A major difference of this work from previous work is that the clickthrough data come from image searches rather than traditional text page searches. Image search clickthrough data were rarely examined in prior works but have differences that impact the accuracy of clickthrough data. These differences include a more complete representation of the results in image search, allowing users to scrutinize the results more closely before selecting them, as well as the presentation of results in a less obviously ordered way. The experiment reported here demonstrates that image clickthrough data can be more reliable as a relevance feedback measure than has been the case with traditional text-based search. There is also evidence that the precision of the search system influences the accuracy of click data when users make searches in an information-seeking capacity. We present a theoretical framework to evaluate XML retrieval. XML retrieval deals with retrieving those document components, the XML elements, that specifically answer a query. In this article, theoretical evaluation is concerned with the formal representation of qualitative properties of retrieval models. It complements experimental methods by showing the properties of the underlying reasoning assumptions that decide when a document is about a query. We define a theoretical methodology based on the idea of aboutness and apply it to current XML retrieval models. 
This allows us to compare and analyze the reasoning behavior of the XML retrieval models tested within the INEX evaluation campaigns. For each model we derive functional and qualitative properties that qualify its formal behavior. We then use these properties to explain experimental results obtained with some of the XML retrieval models. Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language accessing and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how it can be used in practice, we explore two different applications of the mined CLAD to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that CLIR retrieval performance can be substantially improved with the above two applications of the CLAD, which indicates that the mined CLAD is of sound quality. A new collaborative approach to information organization and sharing has recently arisen, known as collaborative tagging or social indexing. A key element of collaborative tagging is the concept of collective intelligence (CI), which is a shared intelligence among all participants. This research investigates the phenomenon of social tagging in the context of CI, with the aim of serving as a stepping-stone towards the mining of truly valuable social tags for web resources. 
This study focuses on assessing and evaluating the degree of CI embedded in social tagging over time in terms of two parameter values: the number of participants and the top frequency ranking window. Five different metrics were adopted and utilized for assessing the similarity between ranking lists: overlapList, overlapRank, Footrule, Fagin's measure, and the Inverse Rank measure. The results of this study demonstrate that a substantial degree of CI is most likely to be achieved when somewhere between the first 200 and 400 people have participated in tagging, and that a target degree of CI can be projected by controlling the two factors along with the selection of a similarity metric. The study also tests some experimental conditions for detecting social tags with a high degree of CI. The results of this study can be applied to the filtering of social tags based on CI; filtered social tags may be utilized for the metadata creation of tagged resources and possibly for the retrieval of tagged resources. Twitter is a popular microblogging service that is used to read and write millions of short messages on any topic within a 140-character limit. Popular or influential users tweet their status and are retweeted, mentioned, or replied to by their audience. Sentiment analysis of the tweets by popular users and their audience reveals whether the audience is favorable to popular users. We analyzed over 3,000,000 tweets mentioning or replying to the 13 most influential users to determine audience sentiment. Twitter messages reflect the landscape of sentiment toward its most popular users. We used sentiment analysis as a valid popularity indicator or measure. First, we distinguished between the positive and negative audiences of popular users. Second, we found that the sentiments expressed in the tweets by popular users influenced the sentiment of their audience. Third, from the above two findings we developed a positive-negative measure of this influence. 
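The tagging study above compares ranked tag lists with several similarity metrics. Two of them can be sketched simply; these are plain interpretations of the overlapList idea and of Spearman's Footrule, not the study's exact operational definitions, and the tag rankings are hypothetical:

```python
def top_k_overlap(list_a, list_b, k):
    """Fraction of items shared by the top-k of two frequency-ranked lists
    (a plain reading of the 'overlapList' metric)."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k

def footrule(list_a, list_b):
    """Spearman's Footrule: total positional displacement of each item
    between two rankings of the same items (0 = identical orderings)."""
    pos_b = {item: i for i, item in enumerate(list_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(list_a))

# Hypothetical tag rankings after 200 vs. 400 taggers of the same resource.
after_200 = ["web", "design", "css", "blog"]
after_400 = ["web", "css", "design", "blog"]

print(top_k_overlap(after_200, after_400, 3))  # 1.0
print(footrule(after_200, after_400))          # 2
```

High overlap and low displacement between successive snapshots are the kind of convergence signal the study interprets as an emerging collective intelligence.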
Finally, using a Granger causality analysis, we found that the time-series-based positive-negative sentiment change of the audience was related to the real-world sentiment landscape of popular users. We believe that the positive-negative influence measure between popular users and their audience provides new insights into the influence of a user and is related to the real world. This study investigates the argument patterns in Yahoo! Answers, a major question and answer (Q&A) site. Mainly drawing on the ideas of Toulmin (), argument pattern is conceptualized as a set of 5 major elements: claim, counterclaim, rebuttal, support, and grounds. The combinations of these elements result in diverse argument patterns. A failed opening consists of an initial claim only, whereas a nonoppositional argument pattern also includes indications of support. An oppositional argument pattern contains the elements of counterclaim and rebuttal. A mixed argument pattern entails all 5 elements. The empirical data were gathered by downloading from Yahoo! Answers 100 discussion threads on global warming, a controversial topic providing fertile ground for arguments for and against. Of the argument patterns, failed openings were most frequent, followed by oppositional, nonoppositional, and mixed patterns. In most cases, the participants grounded their arguments by drawing on personal beliefs and facts. The findings suggest that oppositional and mixed argument patterns provide more opportunities for the assessment of the quality and credibility of answers, as compared to failed openings and nonoppositional argument patterns. The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. 
Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior. In this paper, we carry out an empirical analysis to address some questions concerning the flow of knowledge stemming from military patented technologies. Patented military technology consists of a set of inventions whose nature, uses and/or applications have defensive or offensive purposes. In this paper, we focus on the field of weapons and ammunition. Our objective is to identify why the knowledge embedded in a military technology diffuses into other patented technologies. The methodology relies on a patent citation analysis and involves the specification of several multilevel logit models to identify the individual and country characteristics that determine the citation of military patents in subsequent patents. The data contain 1,756 citations to 582 patents of military origin with simultaneous Europe-US protection, registered by companies/institutions from 1998 to 2003. The results reveal that military knowledge diffuses more intensively into civil patents when the original military patent includes diverse technologies (civil and military) and is progressively less specific in terms of weapons and ammunition. Military patents filed by British, French, US, Japanese and German companies are, in this order, more likely to have a larger number of citations in subsequent civil patents. The ownership of the original military patent is not a determining factor for explaining diffusion into civil patents, but it does influence diffusion across mixed and military technologies. Finally, the technological capacity of the citing company also affects the type and intensity of the diffusion of the military knowledge. Thomson Reuters' ISI Web of Knowledge (or ISI for short) is used in the majority of benchmarking analyses and bibliometric research projects. 
Therefore, it is important to be aware of the limitations of data provided by ISI. This article deals with a limitation that disproportionately affects the Social Sciences: ISI's misclassification of journal articles containing original research into the "review" or "proceedings paper" category. I report on a comprehensive 11-year analysis of document categories for 27 journals in nine Social Science and Science disciplines. I show that although ISI's "proceedings paper" and "review" classifications seem to work fairly well in the Sciences, they illustrate a profound misunderstanding of research and publication practices in the Social Sciences. One is inclined to conceptualize impact in terms of citations per publication, and thus as an average. However, citation distributions are skewed, and the average has the disadvantage that the number of publications is used in the denominator. Using hundred percentiles, one can integrate the normalized citation curve and develop an indicator that can be compared across document sets, because percentile ranks are defined at the article level. I apply this indicator to the set of 58 journals in the WoS Subject Category of "Nanoscience & nanotechnology," and rank journals, countries, cities, and institutes using non-parametric statistics. The significance levels of results can thus be indicated. The results are first compared with the ISI impact factors, but this Integrated Impact Indicator (I3) can be used with any set downloaded from the (Social) Science Citation Index. The software is made publicly available on the Internet. Visualization techniques are also specified for evaluation by positioning institutes on Google Maps overlays. A small number of studies have sought to establish that research papers with more funding acknowledgements achieve higher impact, and have claimed that such a link exists because research supported by more funding bodies undergoes more peer review. 
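The Integrated Impact Indicator described above replaces the citations-per-paper average with article-level percentile ranks. A minimal sketch, assuming a simple "share of papers cited less" percentile definition and entirely hypothetical citation counts (the published indicator uses a more refined percentile scheme):

```python
def percentile_rank(c, all_counts):
    """Percentile (0-100) of a paper within a reference set: the share of
    papers in the set with a strictly lower citation count."""
    below = sum(1 for x in all_counts if x < c)
    return 100.0 * below / len(all_counts)

def i3(papers, all_counts):
    """Sum of percentile ranks over a unit's papers (a simplified I3)."""
    return sum(percentile_rank(c, all_counts) for c in papers)

all_counts = [0, 1, 2, 5, 8, 13, 21, 40]   # hypothetical reference set
journal_a = [40, 21, 13]                    # highly cited papers
journal_b = [0, 1, 2]                       # rarely cited papers

print(i3(journal_a, all_counts))  # 87.5 + 75.0 + 62.5 = 225.0
print(i3(journal_b, all_counts))  # 0.0 + 12.5 + 25.0 = 37.5
```

Because each paper contributes its percentile rank rather than entering a denominator, adding poorly cited papers cannot lower a unit's score, which is the property the indicator exploits.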
In this paper, a test of this link is made using recently available data from the Web of Science, a source of bibliographic data that now includes funding acknowledgements. The analysis uses 3,596 papers from a single year, 2009, and a single journal, the Journal of Biological Chemistry. Analysis of these data using OLS regression and two rank tests reveals the link between the count of funding acknowledgements and high-impact papers to be statistically significant, but weak. It is concluded that the count of funding acknowledgements should not be considered a reliable indicator of research impact at this level. Relatedly, indicators based on assumptions that may hold true at one level of analysis may not be appropriate at other levels. Because some cited references are not relevant to the citing patent and not all relevant references are cited, this study attempts to use the bibliographic coupling (BC) approach to filter out irrelevant patent citations and supplement relevant uncited patent citations in constructing a patent citation network (PCN). The study selected the field of electric vehicle technology to explore this phenomenon and examined the characteristics of PCNs in terms of average BC strength and average citation time lag. Four PCNs were constructed in this study. The aggregated PCN (APCN), which excluded the irrelevant patent citations and added the relevant uncited patent citations, brought significant improvement: the APCN became more concentrated, and the information it retained was the most current. Additionally, some otherwise invisible technology clusters and relationships were manifested in the APCN. Knowledge management has attracted an increasing number of researchers since the concept was born. Its research scope is expanding constantly and its research depth is strengthening. 
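The funding-acknowledgement study above fits an OLS regression of impact on the count of funding acknowledgements. The closed form for simple one-predictor OLS can be sketched as follows, on hypothetical, perfectly linear data chosen so the fit is easy to check (the study's actual regression involved more variables and real citation data):

```python
def ols_fit(x, y):
    """Closed-form least-squares fit of y = a + b*x for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))   # slope = cov(x, y) / var(x)
    return my - b * mx, b                     # intercept, slope

# Hypothetical data: funding acknowledgements per paper vs. citations.
funders   = [0, 1, 2, 3, 4]
citations = [3, 5, 7, 9, 11]   # exactly y = 3 + 2x, so the fit is checkable

a, b = ols_fit(funders, citations)
print(a, b)  # 3.0 2.0
```

In the study, the slope of such a fit was significant but small, which is why a count of funders makes a poor stand-in for impact.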
Also, in our country, there are experts and scholars in different fields carrying out research into knowledge management theory and practice from their own points of view. In order to understand the present situation and trends of knowledge management, this paper collected knowledge management degree theses from 2006 to 2010 from the Chinese Selected Doctoral Dissertations and Master's Theses Full-Text Databases (CDMD) for analysis. A total of 173 Ph.D. theses and 1,243 Master's theses were found by searching for "knowledge management" in titles or keywords. The statistical analysis shows that: the number of knowledge management degree theses has decreased since 2008; the theses are distributed widely across disciplines, concentrating mainly on the management science, technology and education fields; the number of research institutions has increased, although the main research institutions remain relatively prominent; knowledge management research hot spots based on lexical frequency distribution are wide and deep; and hot spots based on keyword analysis are clear. This paper borrows Strogatz's dynamic model of the love affair between Romeo and Juliet and extends it into a model of nonlinear simultaneous differential equations, so that we can characterize the dynamic interaction mechanisms and styles between science and technology (S&T). We then apply the proposed model to the field of nanoscience and nanotechnology (N&N) for the purpose of analyzing the reciprocal dependence between S&T. The empirical results provide an understanding of the relationship between S&T and their dynamic potential for interdependence in the selected 20 leading universities in the field of N&N. We find that at present nanotechnology depends mainly on science-push rather than technology-pull, and that nanotechnology is a science-based field. In contrast, a parallel development of the technology is not visible. 
Finally, policy implications are put forward, based on several interesting findings about the interaction mechanisms between S&T in the field. In this study, we combine bibliometric techniques with a machine learning algorithm, the sequential information bottleneck, to assess the interdisciplinarity of research produced by the University of Hawaii NASA Astrobiology Institute (UHNAI). In particular, we cluster abstract data to evaluate Thomson Reuters Web of Knowledge subject categories as descriptive labels for astrobiology documents, assess individual researcher interdisciplinarity, and determine where collaboration opportunities might occur. We find that the majority of the UHNAI team is engaged in interdisciplinary research, and suggest that our method could be applied to other NASA Astrobiology Institute teams in particular, or to other interdisciplinary research teams more broadly, to identify and facilitate collaboration opportunities. We introduce a new quantitative measure of the international scholarly impact of countries, using bibliometric techniques based on publication and citation data. We present a case study illustrating the use of the proposed measure in the subject area of Energy during 1996-2009. We also present geographical maps to visualize knowledge flows among countries. Finally, using correlation analysis between publication output and international scholarly impact, we study the explanatory power of the applied measure. This paper proposes a citation rank based on spatial diversity (SDCR), in terms of cities and countries, focusing on measuring the "spatial" aspect of citation networks. Our main goal is to correct the citation bias caused by the different geographical locations of citations. 
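The Strogatz-style coupled model for S&T interaction borrowed above can be illustrated with a simple Euler integration of two linked growth equations; the coefficients and initial values below are hypothetical, chosen only so that technology growth is mainly science-pushed:

```python
# Minimal sketch of a coupled linear model for science (S) and
# technology (T) activity, integrated with Euler steps. All parameter
# values are invented for illustration.
def simulate(a, b, c, d, s0, t0, steps=100, dt=0.01):
    s, t = s0, t0
    for _ in range(steps):
        ds = (a * s + b * t) * dt      # science grows on its own, plus tech pull b
        dtech = (c * s + d * t) * dt   # technology driven mainly by science push c
        s, t = s + ds, t + dtech
    return s, t

# Large c, small d: a "science-based" field in the sense discussed above.
s, t = simulate(a=0.5, b=0.1, c=0.8, d=0.05, s0=1.0, t0=0.1)
print(round(s, 3), round(t, 3))
```

The relative sizes of b and c play the role of the technology-pull and science-push mechanisms discussed in the abstract.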
We empirically investigate the spatial properties of citing distances, citation patterns, and spatial diversity to understand geographical knowledge diffusion, based on data from the "Transportation Science and Technology" subject category of the Web of Science (1966-2009). We also compare the proposed ranking method with other bibliometric measures, and conduct a case study to determine the recent ranks of well-established authors in Transportation research. We find that the SDCR of a focal author is highly correlated with the sum of the spatial diversity weights ("strength") of all of his or her in-links, and that it is better to set the damping factor smaller than 0.75 when ranking authors with various initial academic years by SDCR. The case study shows that Hong Kong is becoming a cluster in Transportation research. Most biomedical journals accept original research articles in the form of "brief reports". We compared the citations to full papers versus brief reports in a sample of journals on Infectious Diseases, Clinical Microbiology, and Antimicrobial Agents. Brief reports were cited less often than full-size articles [regression coefficient: 10.94 (95% CI: 5.19, 16.69)], even after adjustment for the journal's impact factor. Our findings may influence the decisions of editors and authors regarding brief reports. A new semi-automatic method is presented to standardize, or codify, addresses in order to produce bibliometric indicators from bibliographic databases. The hypothesis is that this new method normalizes authors' addresses reliably, easily, and quickly. To test the method, a set of already hand-coded data was chosen to verify its reliability: 136,821 Spanish documents (2006-2008) downloaded previously from the Web of Science database. Unique addresses from this set were selected to produce a list of keywords representing the various institutional sectors. 
Once the list of terms is obtained, addresses are standardized with this information and the result is compared to the previous hand-coded data. Tests are carried out to analyze the possible association between the two systems (automatic and hand-coded), calculating recall and precision as well as some statistical directional and symmetric measures. The outcome shows good agreement between the two methods. Although these results are quite general, this overview of institutional sectors is a good basis for a second approach: the selection of particular centers. The system is novel in that it provides a method that does not depend on pre-existing master lists or tables, and it contributes to the automation of these tasks. The validity of the hypothesis is supported not only by the statistical measures but also by the fact that obtaining general and detailed scientific output becomes less time-consuming, and will become even less so as the master tables are fed back and reused for the same kind of data. The same method could be used with any country and/or database by creating a new master list that takes their specific characteristics into account. This study presents an innovative approach for identifying the knowledge diffusion path of a target research field. We take resource-based theory (RBT) as an example to demonstrate the usefulness of the methodology. Several survey studies have provided valuable summaries of and commentaries on RBT from different perspectives. These analyses are useful and pertinent for understanding the development of RBT. However, limited by the methodologies they used, previous scholars could select only part of the RBT literature for their survey work. To eliminate this limitation, this study develops an innovative approach that can handle thousands of articles. 
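The keyword-based sector coding of addresses, checked against hand-coded data, might be sketched as follows; the keyword table, sector names, and sample addresses are invented for illustration and are far simpler than a real master list:

```python
# Sketch of keyword-based institutional-sector coding of author addresses,
# with an agreement check against hand-coded labels.
SECTOR_KEYWORDS = {
    "university": ["UNIV", "FAC ", "COLL "],
    "hospital": ["HOSP", "CLIN"],
    "csic": ["CSIC"],
}

def code_address(address):
    """Return the first sector whose keyword appears in the address."""
    addr = address.upper()
    for sector, keys in SECTOR_KEYWORDS.items():
        if any(k in addr for k in keys):
            return sector
    return "unknown"

gold = [("Univ Granada, Spain", "university"),
        ("Hosp La Paz, Madrid", "hospital"),
        ("CSIC, Inst Quim, Madrid", "csic"),
        ("Acme Labs, Barcelona", "company")]   # no keyword: stays "unknown"
predicted = [code_address(a) for a, _ in gold]
hits = sum(p == g for p, (_, g) in zip(predicted, gold))
print(f"agreement: {hits}/{len(gold)}")
```

Precision and recall per sector would be computed the same way, by comparing predicted against hand-coded labels sector by sector.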
This study analyzes a dataset of 2,105 theoretical developments, empirical studies, and review papers to explore the knowledge diffusion path of RBT. Citation data are used to build the citation network. Main paths are then probed and visualized via social network analysis methodology. To obtain the full picture of the knowledge diffusion path, this study integrates several main path analyses to supplement the traditional approach: the traditional main path analysis investigates knowledge diffusion from a local view, the global analysis provides a main path from a macro view, and the key-route analysis helps explore and clarify a complete picture of the convergence-divergence phenomena. We believe that through this novel tool, new researchers can easily identify the papers that have made major contributions to RBT knowledge diffusion and uncover the interrelationships among them. There is a considerable amount of discussion, but still no consensus, about which indicator should be used to measure innovation. To participate in this debate, a unique innovation database, SFINNO, is introduced. Innovation counts from the database are used as the baseline against which individual proxy indicators of innovation (patent and research-and-development statistics) and innovation indexes, constructed here with principal component analysis, are compared. The local administrative units of Finland serve as the regional units benchmarked. The results show that innovation is a complex phenomenon that cannot be entirely explained through proxy statistics, as the linkages between innovation input and output indicators are fuzzy. We also show that the strength of these linkages varies by field of technology. Furthermore, different innovation measures produce highly divergent rankings when used as benchmarking tools of regional innovative performance. Although the constructed innovation indexes perform slightly better, their superiority is marginal. 
Therefore, caution should be taken before drawing drastic policy conclusions from a single measure of regional innovative performance. Drawing on the existing literature on risk and inequality measurement, we implement the notion of "certainty equivalent citation" in order (i) to generalize most of the h-type citation indexes (the h-, g-, t-, f-, and w-index), and (ii) to highlight the centrality of the decision-maker's preferences on distributive aspects (concentration aversion) for the ranking of citation profiles. To highlight the sensitivity of citation orderings with respect to concentration aversion, an application to both simulated and real citation profiles is presented. This paper evaluates the extent to which policy-makers have been able to promote the creation and consolidation of comprehensive research groups that contribute to the implementation of a successful innovation system. Malmquist productivity indices are applied to the case of the Spanish Food Technology Program, and we find that a large size and a comprehensive, multi-dimensional research output are the key features of the leading groups exhibiting high efficiency and productivity levels. While identifying these groups as benchmarks, we conclude that the financial grants allocated by the program, typically aimed at small and partially oriented research groups, have not succeeded in reorienting them in time to overcome their limitations. We suggest that this methodology offers relevant conclusions for policy evaluation, helping policy-makers to readapt and reorient policies and their associated means, most notably resource allocation (financial schemes), to better respond to the actual needs of research groups in their search for excellence (micro-level perspective), and to adapt future policy design to the achievement of medium- and long-term policy objectives (meso- and macro-level). 
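The h-type indexes that the certainty-equivalent approach above generalizes can be computed directly from a citation profile; the sketch below uses the standard definitions of the h- and g-index on an invented profile:

```python
# Standard h-index: the largest h such that h papers have at least h citations.
def h_index(citations):
    cs = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cs, start=1) if c >= i)

# Standard g-index: the largest g such that the top g papers together
# have at least g^2 citations.
def g_index(citations):
    cs = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cs, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

profile = [10, 8, 5, 4, 3, 0, 0]   # an invented citation profile
print(h_index(profile), g_index(profile))  # 4 5
```

A concentration-averse certainty equivalent would replace the raw counts with transformed values before applying such index definitions.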
The objective was to analyze the keywords used in articles published in eating disorder journals indexed in MEDLINE to determine their correspondence with MeSH terms or APA terms. This was a descriptive bibliometric study. We established three inclusion criteria: articles had to be original, to contain keywords, and to have been indexed in MEDLINE in the last 5 years. A total of 918 original articles with 1,868 different keywords were studied. Eight original articles (0.87%) presented complete correspondence between the keywords used and the indexing terms. Of the keywords studied, 300 (16.06%) coincided with MeSH terms and 366 (19.59%) with APA terms. Comparing keywords matching MeSH terms with those matching APA terms, we found significant differences indicating greater agreement with APA terms (p < 0.001). The weak agreement between keywords and indexing terms may hinder the cataloguing of eating disorder articles; the authors of these studies made greater use of keywords related to APA terms. Patent maps showing competition trends in technological development can provide valuable input for decision support on research and development (R&D) strategies. By introducing semantic patent analysis, with its advantages in representing technological objectives and structures, this paper constructs dynamic patent maps that show technological competition trends and describes the strategic functions of these dynamic maps. The proposed maps are based on subject-action-object (SAO) structures: syntactically ordered sentences extracted from the patent text using natural language processing, which encode the key findings of the invention and the expertise of its inventors. This paper therefore introduces a method of constructing dynamic patent maps using SAO-based content analysis of patents, and presents several types of dynamic patent maps by combining patent bibliographic information with patent mapping and clustering techniques. 
Building on the maps, this paper provides further analyses to identify technological areas in which patents have not been granted ("patent vacuums"), areas in which many patents have actively appeared ("technological hot spots"), R&D overlap among technological competitors, and the characteristics of patent clusters. The proposed analyses of dynamic patent maps are illustrated using patents related to the synthesis of carbon nanotubes. We expect that the proposed method will aid experts in understanding technological competition trends in the process of formulating R&D strategies. In this study we present an analysis of research trends in Pakistan in the field of nanoscience and nanotechnology. Starting with just seven publications in the year 2000, the number has steadily increased, reaching 542 for the year 2011. Among the top 15 institutions publishing in nanotechnology, 13 are universities and only two are R&D organizations. Almost 35% of the research publications are in the field of materials science, followed by chemistry and physics in that order. The growth in publications for the period 2000-2011 is studied through the relative growth rate and doubling time. The authorship pattern is measured by different collaboration parameters: the collaborative index, degree of collaboration, collaboration coefficient, and modified collaboration coefficient. Finally, the quality of the papers is assessed by means of the h-index, g-index, hg-index, and p-index. This paper explores the role of sectors in scientific research and development networks by drawing on bibliometric analyses and on the innovation systems and triple helix literatures. I conducted a bibliometric study of Vancouver, Canada's worldwide infection and immunity network and examined network structure through sociograms, social network metrics, and relational contingency table and ANOVA network analyses. Universities are the key network sector, followed by hospitals and government organisations. 
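The relative growth rate and doubling time used in the Pakistan study above follow the standard bibliometric formulas RGR = (ln W2 - ln W1)/(T2 - T1) and Dt = ln 2 / RGR; a minimal sketch, with invented cumulative publication counts:

```python
import math

# Relative growth rate (RGR) of cumulative publication counts w1 -> w2
# over a given number of years, and the corresponding doubling time.
def rgr(w1, w2, years=1.0):
    return (math.log(w2) - math.log(w1)) / years

def doubling_time(rate):
    return math.log(2) / rate

# Invented counts: 300 cumulative publications in 2009, 420 in 2010.
cumulative = {2009: 300, 2010: 420}
rate = rgr(cumulative[2009], cumulative[2010])
print(round(rate, 3), round(doubling_time(rate), 2))
```

With these numbers the output would be roughly an RGR of 0.336 per year and a doubling time of about 2.06 years.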
The private sector plays a weak role. Most sectors show a preference for collaborating within, as opposed to across, sectors; this trend is most pronounced in hospitals and least pronounced among firms. Hospitals and universities collaborate well above statistical expectations. I discuss the implications of these findings for future science policy and for studies of research and development networks. The generation of research involves producers (study authors and funders), products (studies and the arising publications), and consumption (measured through readership and citation). Bibliometric analyses of research producers, products, and consumption over time can be used to describe the evolution of the health professions as captured in professional journal publications. Numerous bibliometric studies have been conducted; however, few have sampled nursing and allied health professional journals, despite a growing health workforce and socioeconomic pressures. The aim of this study was to use bibliometric analyses to track change in the producers, products, and consumption of seven Australian nursing and allied health professional journals from 1985 through 2010. An analysis of all original research articles published in these journals was performed using a reliable bibliometric audit tool. Articles were sampled every 3 months and at 5-year intervals over a 25-year period. Information relating to authorship, the research methods used, and citation patterns was collected. Data were analysed descriptively. Over the study period, all journals shifted towards publishing research that used higher-level study designs, reported more quantitative data, and was authored by larger research teams. The rate at which this transition occurred (towards a greater evidence base, quantitation, and collaboration) differed among the journals sampled. 
The changes seen in the research published in these journals are likely to be a function of the strategic purpose of each publication (with respect to its professional readership) as well as a reflection of wider socioeconomic phenomena; these trends are therefore likely to continue in the future. Two kinds of bibliographic tools are used to retrieve scientific publications and make them available online. For one kind, access is free, as they store information made publicly available online. For the other kind, access fees are required, as they are compiled from information provided by the major publishers of scientific literature. The former can easily be interfered with, but it is generally assumed that the latter guarantee the integrity of the data they sell. Unfortunately, duplicate and fake publications are appearing in scientific conferences and, as a result, in the bibliographic services: both the free services (such as Google Scholar and DBLP) and the fee-based services (such as IEEE Xplore) accept and index these publications. We demonstrate a software method of detecting such duplicate and fake publications. To facilitate technology development, people rely on quick and intensive knowledge interactions without barriers. However, when people need to transfer knowledge from one place to another, geographical distance is a critical barrier to overcome, because tacit and invisible characteristics are embedded in certain knowledge and locations. This study explores how the social and scientific resources embedded within persons can motivate personal knowledge-diffusion behaviors, that is, the bridging of resources between locations. To explain cross-border diffusion, this work analyzes the knowledge dissemination of the data envelopment analysis (DEA) method. 
By collecting theoretical and application papers in DEA methodology from the Web of Science data set, this study analyzes the academic network consisting of 610 researchers and identifies author locations, research disciplines, and their mutual linkages to explain the importance of person-specific characteristics in cross-border diffusion. Regression models and network analysis show the advantages of personal research seniority and cross-disciplinary coordinating capabilities for researchers diffusing knowledge from one region to another. The corresponding brokering capabilities accumulated within the domestic area or adjacent nations are also helpful for brokering the resources of other, farther places. This study explores collaborations in the field of solar cell science and technology, focusing on the productivity and citations of papers and patents at the global and country levels. The study finds that most papers and patents are collaborative efforts; however, the rate of collaboration is higher for papers, and international collaboration in particular is not common in patents. In terms of performance, international collaborations show the best performance overall across the 30 years from 1980 to 2009, but the performance of single-authored papers has been better in the most recent ten-year period, 2000-2009. At the country level, we found that most countries have higher rates of international collaboration, in both papers and patents. Asian countries such as Japan, Taiwan and India have significant citation performance with high ratios for domestic collaboration; the rates are even greater than the average ratio for international collaboration. Our aim is to appraise the annual impact calculation of journals belonging to the JCR, in terms of the expected citations (with or without self-citations) per published paper over a range of k years. 
A Bayesian approach to the problem should reflect not only the current prestige of a journal but also its recent trajectory. In this context, credibility theory provides an adequate mechanism for deciding whether a journal's impact factor calculation is more or less plausible. Under the prior belief that journal quality is determined by its impact factor, we model the citation-quality process by choosing a conjugate family of the exponential class in order to obtain a net impact credibility formula. The proposed weighting scheme has the effect of smoothing out any sudden increases or decreases in the year-by-year impact factor. (C) 2012 Elsevier Ltd. All rights reserved. Scientific impact indexes like h are responsive to two parameters: the researcher's productivity, given by the number of published papers (an aspect of quantity), and citations (an aspect of quality). In this paper I prove that the two parameters can be treated separately: the index h can be axiomatized by appealing (1) only to axioms that allow for productivity changes, without taking into account distinct situations in which a researcher's papers received different numbers of citations, or (2) only to axioms that allow for changes in the number of citations received by the researcher's papers, without requiring changes in scientific productivity. The axioms used are weak; in particular, monotonicity is avoided. A communication network is a personal or professional set of relationships between individuals or organizations; in other words, it is a pattern of contacts created by the flow of information among the participating actors. The flow of information establishes various types of relationships among the participating entities, and these relationships eventually form an overall pattern that constitutes a gestalt of the total structure within an organizational context. 
In this paper, we analyze changing communication structures in order to investigate the patterns associated with the final stages of organizational crisis. Terms such as organizational mortality, organizational death, organizational exit, bankruptcy, decline, retrenchment, and failure have been used to characterize various forms of organizational crisis. We draw on theoretical perspectives on organizational crisis proposed by social network analysts and other sociologists to test five key propositions on the changes in network communication structure associated with organizational crisis: (1) a few actors, who are prominent or more active, will become central during the crisis period; (2) reciprocity within the organizational communication network will increase during the crisis period; (3) the communication network becomes less transitive as the organization experiences crisis; (4) the number of cliques in the communication network increases as the organization goes through crisis; and (5) the communication network becomes increasingly centralized as the organization goes through crisis. The citation distributions of the papers of selected individual authors were analyzed using five mathematical functions: power-law, stretched exponential, logarithmic, binomial, and Langmuir-type. The former two functions have previously been proposed in the literature, whereas the remaining three are novel and are derived following the concepts of the growth kinetics of crystals in the presence of additives which act as growth inhibitors. Analysis of the citation distributions revealed that the goodness-of-fit parameter R^2 was highest for the empirical binomial relation; high and comparable for the stretched exponential and Langmuir-type functions; relatively low for the power law; and lowest for the logarithmic function. 
In the Langmuir-type function, a parameter K, defined as the Langmuir constant, characterizing the citation behavior of the authors has been identified. Based on the Langmuir-type function, an expression is also proposed for the cumulative citations L, relating the extrapolated citation value l(0) at rank n = 0 for an author, his or her constant K, and the number N of papers receiving at least one citation (l >= 1). Using a dataset based on the Thomson Reuters Scientific "Web of Science", the distributions of some well-known indicators, such as the h-index and g-index, were investigated, and different citation behaviors across scientific fields, resulting from the indicators' field dependence, were found. To develop a field-independent index, two scaling methods, based on the average citations of the subject category and of the journal, were used to normalize the citations received by each paper of a given author. The distributions of the generalized h-indices in different fields were found to follow a lognormal function with mean and standard deviation of approximately -0.8 and 0.8, respectively. A field-independent index, the fi-index, was then proposed, and its distribution was found to satisfy a universal power-law function with scaling exponent alpha approaching 3.0. Both the power-law and the lognormal universality of the distributions verify the field independence of these indicators. However, deciding which of the scaling methods is the better one remains necessary for the validation of the field-independent index. The aim of this study is to explore the effects of an increase in the number of publications or citations on several impact indicators caused by a single journal paper or citation. The possible change of the h-index, A-index, R-index, pi-index, pi-rate, Journal Paper Citedness (JPC), and Citation Distribution Score (CDS) is followed through models. 
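The field-normalization step described above, dividing each paper's citations by its field's average before computing a generalized h-index, can be sketched as follows; the field means and citation counts are invented:

```python
# Generalized (field-normalized) h-index: citations are first scaled by the
# average citation count of each paper's subject category.
def h_index(values):
    vs = sorted(values, reverse=True)
    return sum(1 for i, v in enumerate(vs, start=1) if v >= i)

# Invented field averages and a small mixed-field citation profile.
field_mean = {"physics": 10.0, "math": 2.0}
papers = [("physics", 30), ("physics", 5), ("math", 6), ("math", 2)]

scaled = [c / field_mean[f] for f, c in papers]
print(scaled)             # field-normalized citation values
print(h_index(scaled))    # generalized h on the scaled values
```

The same machinery applies to journal-based scaling: only the denominator (journal average instead of subject-category average) changes.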
Particular attention is given to the increase of the indices caused by a single additional citation. The results obtained with the "successively built-up indicator" model show that, with an increasing number of citations or self-citations, the indices may increase substantially. A proposal is made in this paper for a broadening of perspective in evaluative bibliometrics by complementing the (standard) times-cited approach with a cited-reference analysis for field-specific citation impact measurement. The times-cited approach counts the citations of a given publication set. In contrast, we change the perspective and start by selecting all papers dealing with a specific research topic or field (the example in this study is research on Aspirin). We then extract all cited references from the papers of this field-specific publication set and analyse which papers, scientists, and journals have been cited most often. In this study, we use the Chemical Abstracts registry number to select the publications for a specific field; however, the cited-reference approach can be used with any other field classification system proposed to date. One of the main applications of citation analysis is to find articles that are relevant to a particular article. However, not all citations are equally relevant to the target article. This paper presents an approach to identifying the most relevant citation(s). To this end, the Normalized Similarity Index (NSI) is proposed to quantify the similarity between the source and target of a citation based on the co-citations and references they share. To validate the method, the NSI was calculated for five citation networks and compared with peer review grades for the relevance between the source and target articles. The results showed a significant correlation between the NSI ranks and those of peer review. 
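The overlap idea behind the NSI can be sketched as a Jaccard-style measure on the references shared by the source and target of a citation; the paper's actual normalisation may differ, so this is only an assumption made for illustration:

```python
# Jaccard-style overlap as a stand-in for the similarity component of the NSI:
# shared items divided by the size of the union. The same function could be
# applied to co-citation sets as well as reference sets.
def overlap_similarity(set_a, set_b):
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical reference lists of a citing (source) and cited (target) paper.
source_refs = {"P1", "P2", "P3", "P4"}
target_refs = {"P2", "P3", "P5"}
print(round(overlap_similarity(source_refs, target_refs), 3))
```

Ranking a target article's citations by such a score is one simple way to prioritize the most relevant ones.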
Combined linkage (CL) and weighted direct citation (WDC) were also calculated from the same data. Comparing the NSI with these other similarity measures, in most cases the NSI did better at reproducing the peer rankings. Our principal conclusion is that the NSI can be used to prioritize the citations of a given highly cited article and to represent knowledge flow from the target article. Using an application of scientometric methodology to the analysis of scientific communication, the relationships between the number of submissions of scientific articles and calendar events (e.g., festive seasons, weekend vacations, national public holidays, Chinese New Year, Christmas) are examined quantitatively. To understand the complexities of these relationships, the time series include weekly, monthly, and seasonal variations on the basis of the Received Date as reported in the Article History of the Elsevier paper format. Data records are collected for a twenty-year period (1990-2010) and, as a case study, for a one-year period ending 31 December 2008. The analysis shows that overall submission rates are strongly influenced by calendar events. Based on an idea by Kosmulski, Franceschini et al. (2012, Scientometrics 92(3), 621-641) propose to classify a publication as "successful" when it receives more citations than a specific comparison term (CT). In the authors' intention, CT should be a suitable estimate of the number of citations that a publication, in a certain scientific context and period of time, should potentially achieve. According to this definition, the success-index is defined as the number of successful papers among a group of publications examined, such as those associated with a scientist or a journal. In the first part of the paper, the success-index is recalled and its properties and limitations are discussed. 
Next, relying on the theory of Information Production Processes (IPPs), an informetric model of the index is formulated for a better comprehension of the index and its properties. Particular emphasis is given to a theoretical sensitivity analysis of the index. This paper proposes a framework to analyze interdisciplinary collaboration in a coauthorship network from a meso perspective using topic modeling: (1) a customized topic model is developed to capture and formalize the interdisciplinary feature; and (2) two algorithms, Diversity Subgraph Extraction (DSE) and Constraint-based Diversity Subgraph Extraction (CDSE), are designed and implemented to extract a meso view, i.e. a diversity subgraph of the interdisciplinary collaboration. The proposed framework is demonstrated using a coauthorship network in the field of computer science. A comparison between DSE and Breadth-First Search (BFS)-based subgraph extraction favors DSE in capturing the diversity of interdisciplinary collaboration. Potential possibilities for studying various research topics within the proposed framework of analysis are discussed. Using the concepts of h-core and h-tail, shape descriptors and shape centroids, and the k-index and k'-index, dynamic measures are probed with practical data in the fields of physics and sociology. It is revealed that there are obvious differences between the natural sciences (physics; particles & fields) and the social sciences (sociology) when the c-descriptor, h-core centroid, and k-index are applied as dynamic measures, while few differences exist when using the t-descriptor, h-tail centroid, and k'-index, over a time span from 1 to 10 years. The purpose of this study is to test for the presence of order-effect bias in journal ranking surveys. 
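The success-index recalled above simply counts the papers cited more often than the comparison term CT; a minimal sketch, with the citation profile and the CT value invented for illustration:

```python
# Success-index: the number of papers in a publication set whose citation
# count exceeds the comparison term CT.
def success_index(citations, ct):
    return sum(1 for c in citations if c > ct)

profile = [12, 7, 5, 3, 1, 0]   # invented citation counts for one scientist
field_ct = 4.0                  # CT, e.g. an estimate of expected citations
print(success_index(profile, field_ct))  # papers cited more than CT
```

In practice CT would be estimated per paper from its scientific context and time window, as the abstract notes, rather than taken as a single constant.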
Data were obtained from 379 active knowledge management and intellectual capital researchers who rated 25 journals on a 7-point scale. Five different versions of the survey instrument were utilized. Consistent with the cognitive elaboration model, satisficing theory, and the Gricean maxim of orderliness, order-effect bias was observed in journal ranking surveys. Journals that appear at the beginning of the ranking list delivered to survey respondents consistently receive higher scores than journals at the end of the list. Overall, the position of the journal in the list explains over 10% of its score. Authors of journal ranking studies are therefore recommended to use multiple versions of the survey instrument with randomized journal orders. This study investigates the availability, persistence, and half-life of URL citations in articles published in two Indian LIS journals between 2002 and 2010. It also investigates how researchers can resurrect lapsed URL citations cited in research articles using the Wayback Machine. A total of 1,290 URLs cited in 472 research articles published in Indian LIS journals over a period of 9 years (2002-2010) were extracted. The study found that URLs accounted for only 18.91% (1,290 out of 6,820) of the citations in these journal articles. Of the URL citations, 39.84% were no longer accessible, while the remaining 60.15% were still accessible. The HTTP 404 "page not found" error was the message encountered most often, representing 54.86% of all HTTP error messages; however, 51.06% of the URLs returning this error were recovered. The study also found that the half-life of URL citations increased from 6.33 years to 13.85 years after recovering missing URLs from the Wayback Machine. Percentiles have been established in bibliometrics as an important alternative to mean-based indicators for obtaining the normalized citation impact of publications. 
Percentiles have a number of advantages over frequently used standard bibliometric indicators: for example, their calculation is not based on the arithmetic mean, which should not be used for skewed bibliometric data. This study describes the opportunities and limits, and the advantages and disadvantages, of using percentiles in bibliometrics. We also address problems in the calculation of percentiles and percentile rank classes for which there is not (yet) a satisfactory solution. It will be hard to compare the results of different percentile-based studies with each other unless it is clear that the studies were done with the same choices for percentile calculation and rank assignment. (C) 2012 Elsevier Ltd. All rights reserved. Unlike competitive higher education systems, non-competitive systems show relatively uniform distributions of top researchers and low performers among universities. In this study, we examine the impact of unproductive and top faculty members on the overall research performance of the university they belong to. Furthermore, we analyze the potential relationship between the research productivity of a university and the indexes of concentration of unproductive and top researchers. Research performance is evaluated using a bibliometric approach, through publications indexed in the Web of Science between 2004 and 2008. The set analyzed consists of all Italian universities active in the hard sciences. (C) 2012 Elsevier Ltd. All rights reserved. The time dependence of the h-index is analyzed by considering the average behavior of h as a function of the academic age AA for about 1400 Italian physicists, with career lengths spanning from 3 to 46 years. The individual h-index is strongly correlated with the square root of the total citations Nc: h ≈ 0.53 √Nc. For academic ages ranging from 12 to 24 years, the distribution of the time-scaled index h/√AA is approximately time-independent and is well described by the Gompertz function.
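The square-root scaling reported above (h roughly 0.53 times the square root of the total citations Nc) can be illustrated with a short sketch; the citation record below is invented for illustration only.

```python
# Sketch: compute an h-index from a citation list and compare it with the
# square-root scaling h ~ 0.53 * sqrt(Nc) reported for Italian physicists.
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

papers = [45, 30, 22, 15, 12, 9, 7, 5, 3, 1, 0, 0]  # invented citation record
h = h_index(papers)
nc = sum(papers)  # total citations Nc = 149
print(h, round(0.53 * math.sqrt(nc), 2))  # h = 7 vs. predicted 6.47
```

For this invented record the actual h-index (7) lands close to the square-root prediction, which is the kind of agreement the correlation describes on average.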
The time-scaled index h/√AA has an average of approximately 3.8 and a standard deviation of approximately 1.6. Finally, the time-scaled index h/√AA appears to be strongly correlated with the contemporary h-index hc. (C) 2012 Elsevier Ltd. All rights reserved. We give a heuristic proof of the relation between the impact factor (IF) and the uncitedness factor (U), the fraction of papers that are uncited: U = 1/(1 + IF). This generalizes the proof of Hsu and Huang [Physica A 391, 2129-2134, 2012], who obtained the same result based on the assumption of the validity of the Matthew effect. This new informetric function opens the discussion on universal informetric laws, distribution-dependent laws and parameter-dependent laws, of which examples from the informetrics literature are given. (C) 2012 Elsevier Ltd. All rights reserved. Five ratios, RH, RT, SH, ST and SZ, derived from the three-part division of a set of sources into h-core, h-tail and uncited sources, are defined. Dynamic changes in the three independent ratios RH, SH and SZ are studied for six selected topics. Data about these topics are obtained from the Web of Science for scientific papers and from the Derwent Innovations Index for technical patents. It is observed that all RH- and SH-values decrease when the time span widens, while SZ stays the same or increases, and that all RH- and SH-values for papers are larger than the corresponding values for patents. The shifted Lotka distribution is used in a theoretical interpretation of these empirical phenomena. (C) 2012 Elsevier Ltd. All rights reserved. Accurate measurement of research productivity should take account of both the number of co-authors of every scientific work and the different contributions of the individuals. For researchers in the life sciences, common practice is to indicate such contributions through position in the authors list.
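The heuristic relation between impact factor and uncitedness factor discussed above, U = 1/(1 + IF), is simple enough to check numerically; the function name and the sample IF values below are ours, for illustration only.

```python
# Sketch: the heuristic relation U = 1 / (1 + IF) between a journal's
# impact factor (IF) and its uncitedness factor U (fraction of uncited papers).
def uncitedness(impact_factor):
    """Predicted fraction of uncited papers for a given impact factor."""
    return 1.0 / (1.0 + impact_factor)

for IF in (0.5, 1.0, 2.0, 4.0):
    print(IF, round(uncitedness(IF), 3))
```

The relation behaves as one would expect qualitatively: a journal with IF = 1 is predicted to leave half its papers uncited, and the uncited fraction shrinks as the impact factor grows.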
In this work, we measure the distortion introduced into bibliometric ranking lists for scientific productivity when the number of co-authors or their position in the list is ignored. The field of observation consists of all Italian university professors working in the life sciences, with scientific production examined over the period 2004-2008. The outcomes of the study lead to a recommendation against using indicators or evaluation methods that ignore the different authors' contributions to the research results. (C) 2012 Elsevier Ltd. All rights reserved. This study explores the characteristics of China's independent research articles published from 1980 to 2011, based on the Science Citation Index Expanded database. The publication outputs of seven major industrialized countries, including Canada, France, Japan, Germany, Italy, the UK, and the USA, were compared with China's. Annual production, field performance, research emphases and trends, top articles, as well as the main institutional and individual contributors by top cited articles were analyzed. Some newly developed indicators related to words in titles, author keywords, KeyWords Plus, first authors, corresponding authors, and the Y-index were employed to provide in-depth information on topic and author contributions. Results showed that China has been closing the gap with the USA, showing the greatest growth, and has ranked second since 2006. Most top cited articles were published in the 2000s, making up approximately seven tenths of the total. Pronounced activity was found in chemistry- and physics-related categories. The core categories included multidisciplinary chemistry, physical chemistry, multidisciplinary materials science, and applied physics. Moreover, China's performance in nanotechnology and nanoscience, especially carbon nanotubes, nanoparticles, nanowires, and nanostructures, showed dramatic growth.
Six top articles with at least 1000 citations were examined and were observed to concern medicine, nanotubes, and adsorption. In addition, the main contributing institutions and authors were revealed and evaluated. The Chinese Academy of Sciences played a dominant role, and Tsinghua University, Peking University and five universities in Hong Kong showed good scientific performance. (C) 2012 Elsevier Ltd. All rights reserved. This paper proposes a new node centrality measurement index (c-index) and its derivative indexes (iterative c-index and c(g)-index) to measure the collaboration competence of a node in a weighted network. We prove that the c-index follows a power-law distribution in weighted scale-free networks. A case study of a very large scientific collaboration network indicates that the indexes proposed in this paper differ from other common centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality and node strength) and other h-type indexes (lobby-index, w-lobby index and h-degree). The c-index and its derivative indexes comprehensively utilize the number of a node's neighbors, link strengths and centrality information of neighbor nodes to measure the centrality of a node, constituting a unique new centrality measure of collaborative competency. (C) 2012 Elsevier Ltd. All rights reserved. Several studies have assessed whether funding of disease-specific research is in line with disease burden. The authors of these studies concluded that the burden of a disease was a good predictor of its associated research funding. However, previous analyses did not take into account diseases that mainly affect people living in low income regions, i.e. so-called diseases of poverty. Moreover, the analyses were only performed for the burden diseases cause in high income countries.
We investigated whether the conclusions about the relationship between burden and funding still hold when (1) including diseases of poverty and (2) accounting for the burden of diseases in low income countries. We found that the relationship between burden and the level of disease-specific funding weakens for people living in low income countries. We find the best predictor for the level of funding to be mortality in high income countries. In contrast to previous studies, we were able to include more diseases in our analyses (74). This enabled us to discover differences in funding levels between and within groups of diseases. We found that research on cancers was overfunded with respect to the associated burden. In contrast, diseases of poverty systematically receive less funding than would be expected based on their burden. Other groups of diseases (cardiovascular diseases and mental illnesses) contained both over- and under-funded diseases. (C) 2012 Elsevier Ltd. All rights reserved. This article examines the extent to which specific features of interdisciplinary research are accurately reflected in selected bibliometric measures of scholarly publications over time. To test the validity of these measures, we compare knowledge of research processes and impact based on ethnographic studies of a well-established researcher's laboratory, together with personal interview data, against bibliometric indicators of cognitive integration, diffusion, and impact represented in the entire portfolio of papers produced by this researcher over time. Science is principally driven by the efforts of a vanishingly small fraction of researchers who publish the majority of scientific research and garner the majority of citations. Despite this well-established trend, knowledge of exactly how many articles these researchers publish, how highly they are cited, and how they achieved their distinctive accomplishments is meager.
This article examines the publication and citation patterns of the world's most highly cited environmental scientists and ecologists, inquiring into their levels of scientific productivity and visibility, examining relationships between scientific productivity and quality within their research programs, and considering how different publication strategies contribute to these distinctive successes. Generally speaking, highly cited researchers are also highly productive, publishing on average well over 100 articles each. Furthermore, articles published by this group are more highly cited on average than articles published in premier generalist journals such as Nature and Science, and their citation-to-publication ratios are more equitably distributed than is typical. Research specialization and primacy of authorship are important determinants of citation frequency, while geographic differences and collaborative propensity matter less. The article closes with a set of suggestions for those wishing to increase the use of their research by the scientific community. In the present study we examine 1997-2007 data on inventors, based upon country of residence, and on the process of co-invention, with the ultimate aim of analyzing the main partner countries currently collaborating with China in global technological production. Through our focus on China, we are able to demonstrate the evolving trend towards the establishment of collaborative patenting networks within an emerging market. In addition to exploring the pattern of joint international inventions, we link the patent data to other macro-economic factors for empirical analysis. Our results indicate that relative manufacturing strength, international trade exposure, and respective economic standing have a positive effect on the propensity to engage in such international co-invention activities.
This paper provides evidence on the mechanisms influencing the patent output of a sample of small and large, entrepreneurial and established biotechnology firms, based on inputs of indirect knowledge acquired through capital expenditures and direct knowledge from in-house R&D. Statistical count models are used to analyse the relationship between patent applications, R&D investment and capital expenditures. The paper focuses on biotechnology in the period 2002-2007 and is based on a unique data set drawn from various sources, including the EU Industrial R&D Investment Scoreboard, the European Patent Office (EPO), the US Patent and Trademark Office, and the World Intellectual Property Organisation. The statistical models employed in the paper are generalisations of the Poisson distribution, with the actual distribution of patent counts fitting the negative binomial and gamma distributions very well. Findings support the idea that capital expenditures, taken as equivalent to technical change embodied in new machinery and capital equipment, may also play a crucial role in the development of new patentable items by scientific companies. For EPO patents, this role appears even more important than that played by R&D investment. The overall picture emerging from our analysis of the determinants of patenting in biotechnology is that the innovation process involves a well-balanced combination of inputs from both R&D and new machinery and capital equipment. Purpose: To provide up-to-date bibliometric reference data describing the output and success of psychology researchers in the German-speaking countries, including lifetime publication and citation numbers, and to investigate associations of bibliometric measures with academic status and gender as well as the department characteristics of size and quota of senior researchers.
Method: We queried literature databases using an extensive online register of academic psychologists in the German-speaking countries, obtaining valid data for 85% (N = 1742) of the population of interest. Findings: Skewed distributions for publications and citations; the maximum number of German-language (i.e. native) publications is much higher than the maximum number of English-language publications; a relatively large part of the population publishes almost exclusively in German; publication count is predictable by academic status, gender, department size, and quota of senior researchers; citation count is predictable by publication count, status, department size, and quota of senior researchers; department characteristics interact with individual characteristics to produce specific conditions under which publication count and citation count are higher or lower than expected: the combination of female gender, small department size and a large quota of senior researchers is associated with a particularly increased publication count; female gender and large department size are associated with a decreased publication count; high publication count, large department size and a low quota of senior researchers are associated with an increased citation count; low publication count and a large quota of senior researchers are associated with a decreased citation count. Conclusions: The reference values for scientific output provided in this study offer an anchor for monitoring and international comparison; despite considerable noise in the data, we show that interactions of individual and organizational characteristics are relevant for scientific success and should be investigated further, e.g. by adopting various measures of organizational diversity and tracing a population longitudinally.
In this paper we argue that the emergence of the dominant model of university organization, characterized by a large agglomeration of many (often loosely affiliated) small research groups, might have an economic explanation related to the features of the scientific production process. In particular, we argue that there are decreasing returns to scale at the level of individual research groups, which prevent them from becoming too large, while there are positive agglomeration effects at the supra-research-group level inside the university. As a consequence, an efficient university organization would consist precisely of tying together many small individual research groups without merging them. Basing our empirical analysis on a multilevel dataset of German research institutes from four disciplines, we find strong support for the presence of these effects. This suggests that the emergence of the dominant model of university organization may also be the result of these particular features of the production process; the least we can say is that, under the given circumstances, this model is highly efficient. This paper discusses and copes with the difficulties that arise when trying to reproduce the results of the Shanghai academic ranking of world universities. In spite of the ambiguity of the ranking's methodology with regard to the computation of the scores on its six indicators, the paper presents a set of straightforward procedures to estimate raw results and final relative scores. Discrepancies between estimated scores and the results of the ranking are mostly associated with difficulties encountered in the identification of institutional affiliations, and are not significant. We can safely state that the results of the Shanghai academic ranking of world universities are in fact reproducible. In August 2011, Thomson Reuters launched version 5 of the Science and Social Science Citation Index in the Web of Science (WoS).
Among other things, the 222 ISI Subject Categories (SCs) for these two databases in version 4 of WoS were renamed and extended to 225 WoS Categories (WCs). A new set of 151 Subject Areas was added, but at a higher level of aggregation. Perhaps confusingly, these Subject Areas are now abbreviated "SC" in the download, whereas "WC" is used for WoS Categories. Since we previously used the ISI SCs as the baseline for a global map in Pajek (freely available at http://vlado.fmf.uni-lj.si/pub/networks/pajek/) (Rafols et al., Journal of the American Society for Information Science and Technology 61:1871-1887, 2010) and brought this facility online (at http://www.leydesdorff.net/overlaytoolkit), we recalibrated this map for the new WC categories using the Journal Citation Reports 2010. In the new installation, the base maps can also be made using VOSviewer (freely available at http://www.VOSviewer.com/) (Van Eck and Waltman, Scientometrics 84:523-538, 2010). How scientific progress functions in detail, and what the specific prerequisites for scientific breakthroughs in a given research area are, remain unclear today. According to philosopher of science Thomas S. Kuhn, scientific advancement takes place via paradigm shifts. As a principle supplementing Kuhn's theory, we proposed the Anna Karenina principle: a new paradigm can be successful only when several key prerequisites are fulfilled (e.g., verification by means of independent data and methods). If any one of these prerequisites is not fulfilled, the paradigm will not be successful. Aiming to investigate the schema of paradigm shift supplemented by the Anna Karenina principle with the aid of concrete examples from science, in this study we analyze one of the most important scientific revolutions: the shift from a fixed to a mobile worldview in geoscientific thinking.
This paradigm shift will be explained based on key papers that played a decisive role, selected carefully from reviews in the literature. The account of the development will be complemented by empirical findings produced from publication and citation data using the software HistCite. Based on the concept that scientific research is an important component of a country's knowledge-based economy, this study aims to answer the question "Are CIVETS the next BRICs" by comparing a series of scientometric indicators using data from the Essential Science Indicators database and the World Bank Report 2009. The main findings are that, at the country-group level, there is no significant difference between CIVETS and BRICs in knowledge-based economy performance, scientific research quality and scientific research structure, and that the number of scientific research papers is the clear gap between them. The results may be of use in answering the question "Are CIVETS the next BRICs", at least from the perspective of scientometrics. The recent trend of rapid growth in scientific and engineering activities in East Asian Newly Industrializing Economies (NIEs) has resulted in a change in the structure of world knowledge production. In South Korea particularly, not only have the numbers of publications increased, but there has also been a noticeable change in the composition of scientific and engineering activities. This paper notes that most research on knowledge production concerns advanced countries, with only a handful of studies on the knowledge production of latecomers. Recent changes in the patterns of knowledge production in latecomer countries call for a deeper understanding of the underlying mechanisms of this ongoing change. Therefore, this paper explores the patterns of knowledge production activities in latecomers by analyzing scientific and engineering capabilities using empirical evidence from Korea.
The results suggest that the patterns of accumulation of knowledge production in Korea gradually evolved from engineering to scientific activities. Important policy implications can be drawn from the findings for supporting scientific and engineering research activity in latecomers in general and NIEs in particular. Since machine-readable documents have become widespread, some recent studies have proposed retrieval methods using a combination of citation linkage and its context. In the case of co-citation linkage, there have been attempts to discern 'strong' co-citations from 'weak' ones by examining the positions of citations in a document. However, this promising concept has not yet been sufficiently evaluated, and it remains unclear whether search performance is significantly improved. Therefore, this paper explores the effects of using co-citation context more deeply and more widely by comparing the search performance of six retrieval methods, which differ as to whether co-citation context and normalization using cited frequency are used. To empirically evaluate the effects, a special test collection was created from CiteSeer metadata, and the search performance of the six retrieval methods was compared using two IR metrics (AP and nDCG). The main conclusions of this paper are: (1) co-citation context has a positive effect on co-citation searching; (2) the normalization technique using cited frequency is useful for context-based co-citation searching; (3) approaches using co-citation context tend to affect the characteristics of search performance. Metrics of success or impact in academia may do more harm than good. To explore the value of citations, the reported efficacy of treatments in ecology and evolution from close to 1,500 publications was examined. If citation behavior is rational, i.e.
studies that successfully applied a treatment and detected greater biological effects are cited more frequently, then we would predict that larger effect sizes increase a study's relative citation rate. This prediction was not supported. Citations are thus likely a poor proxy for the quantitative merit of a given treatment in ecology and evolutionary biology, unlike evidence-based medicine, wherein the success of a drug or treatment on human health is one of the critical attributes. The impact factor of the journal is a broader metric, as one would expect, but it is also unrelated to the mean effect sizes for the respective populations of publications. The interpretation by the authors of the treatment effects within each study differed depending on whether the hypothesis was supported or rejected. Significantly larger effect sizes were associated with rejection of a hypothesis. This suggests that only the most rigorous studies reporting negative results are published or that authors set a higher burden of proof in rejecting a hypothesis. The former is likely true to a major extent, since only 29% of the studies rejected the hypotheses tested. These findings indicate that the use of citations to identify important papers in this specific discipline, at least in terms of designing a new experiment or contrasting treatments, is of limited value. The article introduces a relational input-output model for the productivity analysis of university research. The comparative analyses focus on top university research in the hard sciences from 4 East Asian countries (Hong Kong, Singapore, South Korea, Taiwan) and 4 North European countries (Denmark, Finland, Norway, Sweden), whose universities together receive 95 recognitions in the HEEACT Top 300 rankings in the Natural Sciences (Sci), Technology (Tec) or Clinical Medicine (Med).
According to the productivity ratings (A(0), A, A(+), A(++)), Taiwan receives 10 A(++) ratings (Sci 5, Tec 5), Sweden 9 (Sci 4, Med 4, Tec 1) and Hong Kong 9 (Tec 4, Med 2, Sci 1). The smallest numbers of A(++) ratings are found in Norway, with 1 (Med), and Finland, with 3 (all in Med). The only university with an A(++) rating at the top of all three fields is the National University of Singapore. The Pohang University of Science and Technology (South Korea) and the National Tsing Hua University (Taiwan) are exceptionally productive in Sci and Tec; Karolinska Institutet (Sweden) and the University of Helsinki (Finland) belong to the top in Med. Even though the Northern European countries are ranked higher on 'knowledge economy indicators', the East Asian countries fare better by indicators of learning outcomes and by productivity of university research in Natural Sciences and Technology; the North European countries are stronger in Clinical Medicine. Negative results are commonly assumed to attract fewer readers and citations, which would explain why journals in most disciplines tend to publish too many positive and statistically significant findings. This study tested this assumption by counting the citation frequencies of papers that, having declared to "test" a hypothesis, reported "positive" (full or partial) or "negative" (null or negative) support. Controlling for various confounders, positive results were cited on average 32% more often. The citation advantage, however, was unequally distributed across disciplines (classified as in the Essential Science Indicators database). Using Space Science as the reference category, the citation differential was positive and formally statistically significant only in Neuroscience & Behaviour, Molecular Biology & Genetics, Clinical Medicine, and Plant and Animal Science. Overall, the effect was significantly higher amongst applied disciplines, and in the biological compared to the physical and the social sciences.
The citation differential was not a significant predictor of the actual frequency of positive results amongst the 20 broad disciplines considered. Although future studies should attempt more fine-grained assessments, these results suggest that publication bias may have different causes and require different solutions depending on the field considered. This study analyses the growth and development of pheromone biology research productivity in India in terms of publication output as reflected in the Science Citation Index (SCI) for the period 1978-2008. It covers 330 publications from India, comprising 285 articles, 22 notes, 18 reviews, 4 letters and 1 conference paper, from 200 institutions. About 9.4% of publications are contributed by the Indian Institute of Technology, Kanpur, followed by the Bhabha Atomic Research Centre, Bombay (7.27%). All the papers published by Indian researchers appeared in journals with impact factors between 0.20 and 4.14. About 24.24% of authors contributed single articles. The growth rate of publications varied from 0.30 to 9.09% per year, with the annual growth rate peaking in 2006 at 9.09%. The study reveals that the output of pheromone biology research in India has gradually increased over the years. The study compares the coverage, ranking, impact and subject categorization of Library and Information Science journals, specifically 79 titles based on data from Web of Science (WoS) and 128 titles from Scopus. Comparisons were made based on prestige factor scores reported in the 2010 Journal Citation Reports and SCImago Journal Rank 2010, noting the change in ranking when the differences were calculated. The rank normalized impact factor and the Library of Congress Classification System were used to compare impact rankings and subject categorization. There was a high degree of similarity in the rank normalized impact factor of titles in both the WoS and Scopus databases. The searches found 162 journals, with 45 journals appearing in both databases.
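The rank normalized impact factor used in this comparison is not spelled out here, but one common formulation (an assumption on our part) is rnIF = (K - R + 1)/K, where R is the journal's impact-factor rank within its category and K is the number of journals in that category. A minimal sketch, with illustrative ranks:

```python
# Sketch: rank normalized impact factor, assuming rnIF = (K - R + 1) / K,
# where R is the journal's IF rank in its category and K the category size.
def rank_normalized_if(rank, category_size):
    """Map an in-category rank onto a 0-1 scale comparable across databases."""
    return (category_size - rank + 1) / category_size

# A hypothetical journal ranked 5th of the 79 LIS titles in WoS and
# 8th of the 128 LIS titles in Scopus:
wos = rank_normalized_if(5, 79)      # ~0.949
scopus = rank_normalized_if(8, 128)  # ~0.945
print(round(wos, 3), round(scopus, 3))
```

Because the score depends only on relative rank within a category, it allows titles from databases with very different coverage sizes (here 79 vs. 128) to be compared on a common scale.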
The rankings obtained for normalized impact scores confirm higher impact scores for titles covered in Scopus because of its larger coverage of titles. There was a mismatch in subject categorization among 34 journal titles in both databases, and 22 of the titles were not classified under Z subject headings in the Library of Congress catalogue. The results revealed the changes in journal title rankings when normalized, and suggested that the categorization of some journal titles in these databases might be incorrect. The study examines India's performance on antioxidants using several quantitative measures, such as India's global publication share, rank, growth rate and citation quality, and its publication share in various sub-fields in terms of national share, utilising the last 10 years' (2001-10) publication data obtained from the Scopus database. We have also determined the Indian share of international collaborative papers at the national level as well as its major international collaborative partners, besides analysing the characteristics of its high productivity institutions, authors and highly cited papers. This research examines the association of co-authorship network centrality (degree, closeness and betweenness) with the academic research performance of chemistry researchers in Pakistan. Higher centrality in the co-authorship network is hypothesized to be positively related to performance, in terms of academic publication, with gender having a positive moderating effect for female researchers. Using social network analysis, this study examines the bibliometric data (2002-2009) from ISI Web of Science for the co-authorship network of 2,027 Pakistani authors publishing in the field of Chemistry. A non-temporal analysis using node-level regression reports a positive impact of degree and closeness centrality and a negative impact of betweenness centrality on research performance.
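The centrality measures named in this study can be sketched in plain Python on a toy co-authorship graph; the author labels and edges below are made up, and betweenness is omitted for brevity. Closeness here is (n - 1) divided by the sum of shortest-path distances to all other reachable nodes.

```python
# Sketch: degree and closeness centrality on a toy co-authorship graph.
from collections import deque

graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def closeness(graph, source):
    """BFS shortest paths, then (n - 1) / sum of distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    total = sum(d for n, d in dist.items() if n != source)
    return (len(dist) - 1) / total

degree = {n: len(nbrs) for n, nbrs in graph.items()}
print(degree["B"], round(closeness(graph, "B"), 3))
```

Author "B", who bridges the two sides of the toy network, gets both the highest degree (3) and the highest closeness (0.8), which is the kind of structural advantage the regression analyses above relate to publication performance.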
Temporal analysis using node-level regression (time 1: 2002-2005; time 2: 2006-2009) confirms the direction of causality and demonstrates the positive association of degree and closeness centrality with research performance. Findings indicate a moderating role of gender in the relationship of both degree and closeness centrality with research performance for Pakistani female authors. The definition assigned to self-citations is nontrivial. This decision can affect research outputs in a number of ways. The current paper considers the self-citation definition used by the Web of Science, and compares this with an alternative definition, advanced in the present study, within the context of the work of an individual researcher. A discussion follows. The aim of this paper is to map the intellectual structure of research in doctoral dissertations of Library and Information Science in China. Using co-word analysis, including cluster analysis, strategic diagrams and social network analysis, we studied the internal and external structure and relationships of research fields in doctoral dissertations of Library and Information Science in China. Data were collected, for the period 1994-2011, from six public dissertation databases and ten degree databases provided by the universities/institutes authorized to grant doctoral degrees in Library and Information Science in China. The results show that Wuhan University is the most important institution of doctoral education in LIS in China. The research foci include information resources, ontology, the semantic web, semantic search, electronic government, information resource management, knowledge management, knowledge innovation, knowledge sharing, knowledge organization, networks, information services, information needs and digital libraries. The research fields of LIS doctoral dissertations in China are varied.
Many of these research fields are still immature; accordingly, the well-developed and core research fields are fewer. Do the best Italian academics move abroad? What is the academic productivity of an Italian researcher working in Italy compared with one working abroad? Does academic productivity depend on their well-being at work? The aim of this study is to answer these questions and to demonstrate the relationship that exists between academic productivity and organizational well-being at work, both for researchers who are Italian emigrants abroad (IRA project) and for those who remain in Italy (IRI project). This goal was achieved through two surveys. Where there is an atmosphere of organizational wellness, a productive work environment is created (abroad); conversely, a poor working environment associated with an organizational system below the average level negatively affects overall academic productivity (in Italy). We can confirm that working environments with a better organizational climate produce more productive academics. International collaboration enhances citation impact. Collaborating with a country increments the citations received from it. But some collaborating countries provide greater increments in this sense than others, and likewise some countries receive greater increments from their partner countries than others. We observed a certain tendency for these increments to be lower in countries with greater impacts. Also, all the countries studied had higher Domestic Impacts as a result of collaborating, although this increment was less than that obtained from other countries. Finally, there were differences in the behaviour of the countries between the various scientific disciplines, with the effects being greatest in the Social Sciences, followed by Engineering. 
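The citation increment from international collaboration described above can be illustrated with a toy computation (a minimal sketch; the records, country codes and numbers below are hypothetical, not drawn from the study):

```python
# Hypothetical records: (paper id, author countries, citations received).
records = [
    ("p1", {"ES"}, 4),
    ("p2", {"ES"}, 6),
    ("p3", {"ES", "UK"}, 12),
    ("p4", {"ES", "DE"}, 9),
    ("p5", {"ES", "UK", "US"}, 15),
]

def mean_citations(papers):
    """Average citations over a list of (id, countries, cites) records."""
    return sum(c for _, _, c in papers) / len(papers)

# Split the focal country's output into domestic-only and internationally
# co-authored papers, then compare their mean impacts.
domestic = [r for r in records if len(r[1]) == 1]
international = [r for r in records if len(r[1]) > 1]

domestic_impact = mean_citations(domestic)            # 5.0
international_impact = mean_citations(international)  # 12.0
increment = international_impact - domestic_impact    # 7.0
print(domestic_impact, international_impact, increment)
```

A full analysis in the spirit of the study would additionally attribute each citation to its citing country, which this toy data does not encode.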
This paper compares R&D productivity change across countries, considering the fact that national R&D expenditure may produce multiple outputs, including patents and journal articles. Based on the concept of the directional distance function and the Luenberger productivity index, this paper develops a Luenberger R&D productivity change (LRC) index and then decomposes it into R&D efficiency change (catch-up effect) and R&D technical change (innovation effect). Utilizing a panel dataset of 29 countries over the 1998-2005 period for the empirical estimation, the results show that R&D productivity growth is mainly attributable to the innovation effect; meanwhile, non-OECD countries perform better on both efficiency change and technical change than their OECD counterparts. Moreover, patent-oriented R&D productivity growth is a greater source of national R&D productivity growth than journal-article-oriented growth. This paper aims to inform the choice of citation time windows for research evaluation by answering three questions: (1) How accurate is it to use citation counts in short time windows to approximate total citations? (2) How does citation ageing vary by research field, document type, publication month, and total citations? (3) Can field normalization improve the accuracy of using short citation time windows? We investigate the 31-year lifetime non-self-citation processes of all Thomson Reuters Web of Science journal papers published in 1980. The correlation between non-self-citation counts in each time window and total non-self-citations in all 31 years is calculated, and it is lower for more highly cited papers than for less highly cited ones. There are significant differences in citation ageing between different research fields, document types, total citation counts, and publication months. 
However, the within-group differences are more striking; many papers in the slowest-ageing field may still age faster than many papers in the fastest-ageing field. Furthermore, field normalization cannot improve the accuracy of using short citation time windows. Implications and recommendations for choosing adequate citation time windows are discussed. The citer h-index of a researcher (introduced by Ajiferuke and Wolfram) was found by Franceschini, Maisano, Perotti and Proto to have a strong linear relationship with the h-index of that researcher; their finding also revealed, experimentally, that the slope of this straight line (passing through the origin) is strictly larger than one. In this paper we present a rationale for this empirical result, based on the relation between the h-index before and after a transformation of the citation data. Using a keyword mining approach, this paper explores the interdisciplinary and integrative dynamics in five nano research fields. We argue that the general trend of integration in nano research fields is converging in the long run, although the degree of this convergence depends greatly on the indicators one chooses. Our results show that nano technologies applied in the five studied nano fields become more diverse over time. One field learns more and more related technologies from others. The publication and citation analysis also shows that nano technology has developed to a relatively mature stage and has become a standardized and codified technology. In this study, differences between Spanish social sciences and humanities journals are examined using a quantitative approach. Firstly, using a set of 144 psychology journals and 69 philosophy journals, statistically significant differences have been identified in 11 characteristics/indicators. 
Secondly, a logistic regression was carried out on the dichotomous response variable "belonging to the social sciences" or "belonging to the humanities", on 777 Spanish social sciences journals and 563 humanities journals that had been previously classified, using 17 predictor variables. The regression model reached an overall correct classification rate of 78.8 %. The explanatory variables considered in the model are analyzed and interpreted taking into account the change in the odds ratio and the indication of their contribution to the correct classification rate for the two response values. Finally, the average associated probability of belonging to the social sciences group is calculated for each discipline and reflected in a spectrum of the probability of belonging to the social sciences or the humanities. 2,215 publications covering the period from 1959 to 2011, with at least one author affiliated with Benin, were retrieved from Scopus and analyzed. These publications were co-authored by 10,225 scientists, corresponding to 5,122 unique authors in several disciplines, the most prolific of which are Agricultural and Biological Sciences, and Medicine. None of the Benin-based journals were indexed in Scopus; approximately 5 % of the publications appeared in African journals covered by Scopus. Researchers' home institutions are mainly the University of Abomey-Calavi, its laboratories and some international organizations or cooperation agencies. The private universities were not mentioned in the affiliations list. The yearly percentage of international collaboration is over 80 %; France, the former colonial power, is the main research partner, whereas the West African region is the main partner at the African continental level; other partners are from Europe and the Americas. This study suggests the setting up of a national database to index the domestic scientific literature; it should contribute to the improvement of the national research output. 
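The collaboration indicators of the kind reported for Benin can be sketched in a few lines (a minimal sketch with hypothetical affiliation records; "BJ" and the partner country codes are illustrative placeholders):

```python
from collections import Counter

# Hypothetical records: the countries appearing in each paper's affiliations.
papers = [
    ["BJ", "FR"],
    ["BJ"],
    ["BJ", "FR", "NL"],
    ["BJ", "NG"],
    ["BJ", "FR"],
]

# A paper counts as internationally co-authored if more than one country appears.
international = [p for p in papers if len(set(p)) > 1]
share = 100 * len(international) / len(papers)

# Count how often each partner country co-occurs with the focal country.
partners = Counter(c for p in international for c in set(p) if c != "BJ")

print(f"international collaboration: {share:.0f}%")  # 80%
print(partners.most_common(1))  # [('FR', 3)]
```

In a real study the same counting would be done per year to obtain the yearly collaboration percentage.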
An exploration is presented of Scopus as a data source for the study of international scientific migration or mobility for five study countries: Germany, Italy, the Netherlands, the UK and the USA. It is argued that Scopus author-affiliation linking and author profiling are valuable, crucial tools in the study of this phenomenon. It was found that the UK has the largest degree of outward international migration, followed by the Netherlands, with the USA having the lowest. Language similarity between countries is a more important factor in international migration than it is in international co-authorship. During 1999-2010 the Netherlands showed a positive "migration balance" with the UK and a negative one with Germany, suggesting that in the Netherlands there were more Ph.D. students from Germany than from the UK, or that for Dutch postdocs research stays in the UK were more attractive than those in Germany. Comparison of bibliometric indicators with OECD statistics provided evidence that differences exist in the way the various study countries measured their number of researchers. The authors conclude that a bibliometric study of scientific migration using Scopus is feasible and provides significant outcomes. They make suggestions for further research. Meta-analysis refers to the statistical methods used in research synthesis for combining and integrating results from individual studies. The present study draws on the strengths of bibliometric methods in order to offer an overview of meta-analytic research activity in psychology, as well as to characterize its most important aspects and their evolution over time. A total of 2,874 articles published in scientific journals were identified and standard bibliometric indicators (e.g., number of articles, productivity by country, and national and international collaborations) and laws (e.g., Price's and Lotka's law) were applied to these data. 
The results suggest a clear upward trend not only in the number of articles published since the 1970s (with a peak of productivity in 2010), but also in both the number of authors per article (SD = 1.53) and internationalization, especially since the 1990s. The interest in meta-analysis extends to many authors (n = 5,445), countries (n = 44) and scientific journals (n = 394), as well as to several areas of psychology that mostly fit a growing exponential model. In future studies it would be interesting to explore citing behaviour and patterns in the meta-analysis literature. The number of published medical/dental articles is growing at an exponential rate; this makes it difficult to collect all these resources and provide an organized and useful document. Systematic reviews and meta-analyses, as high-level evidence, are considered remedies for this concern. Continuous changes in all fields of dental science necessitate more such high-level evidence. This study focused on the quantity of endodontic systematic reviews and meta-analyses published so far. The study began with targeted electronic searches of the PubMed and Cochrane Library databases for systematic review and meta-analysis articles in endodontics published between 2001 and January 2012. Overall, 49 studies were systematic reviews or meta-analyses: the former comprised 34 articles and the latter seven articles; the remaining eight studies had utilized both. Sorting by topic, 22 articles were about materials and techniques, 12 about pre- and post-treatment considerations, four about single/multiple visits, six had perio-prostho themes, and the remaining five covered other topics. This limited number of 49 high-level evidence publications does not meet expectations for endodontics as a boundless and progressive field of science. Therefore, more comprehensive and all-inclusive systematic reviews and meta-analyses are needed in endodontics. 
The more evidence-based endodontic practice becomes, the more high-level publications with good systematic reviews and sound meta-analyses will be needed. The study of international technology spillover focuses on the bilateral relationship between host and home countries, while the study of transnational technology networks focuses on the position and power of an individual country in a network. Introducing the network model and information theory into the research framework, we propose a new measure of international technology spillover in this article, in which the bilateral relationship is transformed into a multilateral relationship between countries, and network features are investigated at the whole-network level instead of at the individual level. Using data from the CHELEM International Trade Database, we measure three transnational spillover networks in the high-tech field from 1979 to 2009, and analyze the results of the various technology networks at different time points. We propose a model to analyze citation growth and the influence of fitness (competitiveness) factors in an evolving citation network. Applying the proposed method to modeling citations to papers and scholars in the InfoVis 2004 data, a benchmark collection covering a 31-year history of information visualization, leads to findings consistent with citation distributions in general and observations of the domain in particular. Fitness variables based on prior impacts and the time factor have significant influences on citation outcomes. We find considerable effect sizes in the fitness modeling, which suggest inevitable bias in citation analysis due to these factors. While raw citation scores offer little insight into the growth of InfoVis, normalization of the scores by the influences of time and prior fitness offers a reasonable depiction of the field's development. 
The analysis demonstrates the proposed model's ability to produce results consistent with observed data and to support meaningful comparison of citation scores over time. There are fewer female than male professors in the world (a 21-79 distribution in the country examined). The unequal distribution of male and female professors has usually been taken to indicate that men and women have not had equal opportunities to achieve professorship. At the same time, the increase in the proportion of female professors has been taken as evidence that academia is becoming more gender equal. It is possible that both of these assumptions are flawed, and that the gender distribution among professors is the result of demographic inertia, i.e., affected by the previous distribution of men and women within the system, and how fast the distribution has changed. This study examines whether the chances, for men and women, of becoming a full professor change over time, and whether gender differences may depend on early career events. It concludes that women are significantly less likely than men to become professors and that this situation is not improving over time. In spite of policies that have tried to increase the proportion of female professors, the chances of a woman becoming a professor do not change over time. We also show that these gender differences in promotion rate can be attributed to early career events. The diaspora of a less developed country, residing outside their country of origin, can contribute to the parent country through financial or knowledge transfers, connections, or the return of talented persons. The knowledge base of the diaspora is therefore of interest to the parent country. Scientific publications of the Indian diaspora are an indicator of the existing knowledge base of Indians overseas. 
Samples drawn from Web of Science (1986-2010), using a selected list of unique Indian names, are analyzed with the objective of comparing and identifying distinguishing features of the diaspora. While both the Indian and diaspora samples have increased over time, publication output from Indians overseas has increased more rapidly. English was by far the most frequently used language. A major difference was found in the type of publication, with many more proceedings papers and meeting abstracts by the diaspora, showing the increasing importance of rapid publication of novel results. The number of articles was about the same in both samples, but a more detailed look at the top 100 journals characterizes the nature of the journal space used, which again shows major differences. Articles in Nature and Science confirm the differences in the high-impact range. We end with a discussion of limitations, including the effects of changing database coverage over time. The global development of solar photovoltaic power is seen as a potentially major technology in the pursuit of alternative energy sources. Given its evolutionary nature, in terms of both technology and the market, there is some discernible divergence between the innovative capability and production capacity of certain countries. We set out in the present study to explore this issue by examining the productive and innovative performance of six countries covering the period from 1996 to 2006. Our empirical analyses, at both the country level and the firm level, provide a strong indication that this tendency toward incongruence possibly results from differences in the business strategies adopted by the various countries, as well as the extent of their technological advantage. Terahertz technology is one of the most promising research areas of the 21st century. In this work, we compare the research status of terahertz technology between 1990 and 2010 using knowledge domain visualization techniques. 
Our data consist of 633 patents retrieved from the Aureka management platform and 10,344 journal articles indexed in the ISI Web of Knowledge. Our analysis combines two information visualization tools, Aureka and CiteSpace. Aureka allows for the analysis of patents filed/granted each year, priority countries, inventors, assignees, citation counts, and clusters, while networks of co-authors, countries and institutions, document co-citation networks and document co-citation clusters are generated by CiteSpace. This research provides a comprehensive domain visualization map of innovation and knowledge in the area of terahertz technology. Our results show that Aureka and CiteSpace are two promising visualization approaches for analyzing patents and papers in any given field. Most governmental research assessment exercises do not use citation data for the Social Sciences and Humanities, as Web of Science or Scopus coverage in these disciplines is considered to be insufficient. We therefore assess to what extent Google Scholar can be used as an alternative source of citation data. In order to provide a credible alternative, Google Scholar needs to be stable over time, display comprehensive coverage, and provide non-biased comparisons across disciplines. This article assesses these conditions through a longitudinal study of 20 Nobel Prize winners in Chemistry, Economics, Medicine and Physics. Our results indicate that Google Scholar displays considerable stability over time. However, coverage for disciplines that have traditionally been poorly represented in Google Scholar (Chemistry and Physics) is increasing rapidly. Google Scholar's coverage is also comprehensive; all of the 800 most cited publications by our Nobelists can be located in Google Scholar, although in four cases there are some problems with the results. Finally, we argue that Google Scholar might provide a less biased comparison across disciplines than the Web of Science. 
The use of Google Scholar might therefore redress the traditionally disadvantaged position of the Social Sciences in citation analysis. This study focuses on analyzing the driving factors of government and industry funding and the effects of such funding on academic innovation performance in Taiwan's university-industry-government (UIG) collaboration system. This research defines the relationships of the triple helix in the UIG collaboration system as a complex intertwined combination that covers demography, financial support, and innovation performance. These relationships are simultaneously modeled by a multivariate technique, structural equation modeling, to investigate the causal relationships among the antecedent factors and the subsequent ones. This model enables us to investigate three questions: (1) Is government funding or industry funding tied to university demography, to university innovation performance, or to both? (2) Does government funding lead industry funding? (3) Is government funding or industry funding conducive to more university innovation performance? In addition to verifying the model against all participating universities in the UIG collaboration, we also categorize them into two tiers in terms of whether or not universities have been selected for the incentive programs of UIG collaboration, so as to explore group differences. It is shown that the generalized Pareto distribution gives a good fit to citable documents, to citations above a threshold, and also to the h-index of countries. The h-index has a finite second moment, while the citable documents and citations are extremely heavy-tailed, with the estimated index for citations less than one. The derived relationship between the h-index, citations and the number of publications is also investigated, and the model proposed by Glanzel is confirmed empirically. Discovering and assessing fields of expertise in emerging technologies from patent data is not straightforward. 
First, because patent classification in an emerging technology is far from complete, the definitions of the various applications of its inventions are embedded within communities of practice. Because patents must contain a full record of prior art, co-citation networks can, in theory, be used to identify and delineate the inventive effort of these communities of practice. However, the use of patent citations for the purpose of measuring technological relatedness is not obvious, because citations can be added by examiners. Second, the assessment of the development stage of emerging industries has mostly been done through simple patent counts. Because patents are not all valuable, a better way of evaluating an industry's stage of development would be to use multiple patent quality metrics as well as economic activity agglomeration indicators. The purpose of this article is to validate the use of (1) patent citations as indicators of technological relatedness, and (2) multiple indicators for assessing an industry's development stage. Greedy modularity optimization of the 'Canadian-made' nanotechnology patent co-citation network shows that patent citations can effectively be used as indicators of technological relatedness. Furthermore, the use of multiple patent quality and economic agglomeration indicators offers better assessment and forecasting potential than simple patent counts. Understanding how individual scientists build a personal portfolio of research is key to understanding outcomes at the level of scientific fields, institutions, and systems. We lack the scientometric and statistical instruments to examine the development over time of the involvement of researchers in different problem areas. In this paper we present a scientometric method to map, measure, and compare the entire corpora of individual scientists. We use this method to analyse the search strategies of 43 condensed matter physicists along their academic lifecycle. 
We formulate six propositions that summarise our theoretical expectations and are empirically testable: (1) a scientist's work consists of multiple finite research trails; (2) a scientist will work in several parallel research trails; (3) a scientist's role in research trail selection changes along the lifecycle; (4) a scientist's portfolio will converge before it diverges; (5) the rise and fall of research trails is associated with career changes; and (6) the rise and fall of research trails is associated with the potential for reputational gain. Four propositions are confirmed, the fifth is rejected, and the sixth could be neither confirmed nor rejected. In combination, the results of the four confirmed propositions reveal specific search strategies along the academic lifecycle. In the PhD phase, scientists work in one problem area that is often unconnected to the later portfolio. The postdoctoral phase is where scientists diversify their portfolio and their social network, entering various problem areas and abandoning low-yielding ones. A professor has a much more stable portfolio, leading the work of PhDs and postdoctoral researchers. We present an agenda for future research and discuss theoretical and policy implications. This paper extends Borgman's (Communication Research 16: 583, 1989) three-facet framework (artifacts, producers, concepts) for bibliometric analyses of scholarly communication by adding a fourth: gatekeepers. The four-facet framework was applied to the field of Library and Information Science to test for variations in the networks produced using operationalizations of each of these four facets independently. 
Fifty-eight journals from the Information Science and Library Science category in the 2008 Journal Citation Report were studied, and the network proximity of these journals based on Venue-Author-Coupling (producer), journal co-citation analysis (artifact), topic analysis (concept) and interlocking editorial board membership (gatekeeper) was measured. The resulting networks were examined for potential correlation using the Quadratic Assignment Procedure. The results indicate some consensus regarding core journals, but significant differences among some networks. Holistic measures of scholarly communication that take multiple facets into account are proposed. This work is relevant in an assessment-conscious and metrics-driven age. The general aim of this paper is to come to terms with the organization and with organization-level research in scientometrics. Most of the debate on the issues that revolve around organization-level research in scientometrics is technical. As such, most contributions presume a clear understanding of what constitutes the organization in the first place. In our opinion, however, such "a-priorism" is awkward at best, given that even in specialist fields there is no clear understanding of what constitutes the organization. The main argument of this paper holds that organization-level research in scientometrics can only proceed by taking a pragmatic stance on the constitution of the organization. As such, we argue that performing organization-level research in scientometrics (i) requires both authoritative "objective" and non-authoritative "subjective" background knowledge, (ii) involves non-logical practices that can be more or less theoretically informed, and (iii) depends crucially upon the general aim of the research endeavor in which the organization is taken as a basic unit of analysis. 
In our opinion, a pragmatic stance on organization-level research in scientometrics is a viable alternative to both overly positivist and overly relativist approaches, and it might render the relation between scientometrics and science policy more productive. In this paper, we apply small-world complex network theory to analyze scientific research in the field of service innovation and discover its research foci. Our study treats the keywords and subject categories of the publications as actors to map a keyword co-occurrence network and a subject category co-occurrence network, and compares them with their corresponding random binary networks to judge whether these complex networks have the characteristics of small-world networks, in order to identify hot issues in the field through small-world network analysis. We discuss the knowledge structure of the field by analyzing 437 papers retrieved from the Web of Science database over the period 1992-2011. We find that case study, service industry, service quality, market orientation, new product development, and knowledge management were the most popular keywords in the field, and also show the dynamic development of the research foci over the last 10 years. The researchers who contributed most to the field are also identified. It is concluded that more researchers investigated service innovation in the categories of Business and Economics, Engineering, Public Administration, Operations Research and Management Science, and Computer Science than in other categories. The study suggests a quantitative method for analyzing trends of scientific research in a given field, and presents mainstream research directions to researchers who may be interested in service innovation. Patenting is often done in collaboration with other inventors to integrate complementary and additional knowledge. 
The paper takes a spatial view of this issue and analyses the distances between inventors of German patents. We compare the distances between invention teams of German patent applications from 1993-2006 and distinguish between academic and corporate teams and those consisting of researchers from both domains ('mixed teams'). Due to their different institutional backgrounds, different types of proximity guide their spatial search for partners. The basic finding is that regional collaboration clearly prevails. However, the distance between collaborating inventors of corporate patents exceeds that of inventors of academic patents, while the largest distances are found in science-industry collaborative patents. When excluding directly neighbouring collaboration, which is likely to be in-house collaboration, the differences between academic and corporate teams vanish, but mixed teams still span longer distances. Access to bibliographic and citation databases allows the evaluation of scientific performance and provides useful means of general characterisation. In this paper we investigate the clustering of Iberian universities, resulting from the similarity in the number and specific nature of the scientific disciplines given by the Essential Science Indicators database. A further refinement of the analysis, provided by PCA, clearly reveals the relationship between the universities and the scientific disciplines in the main groups. Similarity between universities is not dictated only by the number of areas in the ranking, but also stems from the nature of the ranked scientific areas and their specific combination in each university. This study investigates the effects of large-scale research funding from the Japanese government on the research outcomes of university researchers. To evaluate the effects, we use the difference-in-differences estimator and measure research outcomes in terms of the number of papers and citation counts per paper. 
Our analysis shows that the funding program led to an increase in the number of papers in some fields and an increase in citation counts in other fields. A comparison of our estimation results with assessment data obtained from peer reviews showed important differences. Since the characteristics of research vary according to the field, bibliometric analysis should be used along with the peer review method for a more accurate analysis of research impact. This paper attempts to highlight, quantitatively and qualitatively, the growth and development of the world literature on materials science in terms of publication output and citations as per the Web of Science (2006-2010). The objective of the study was to perform a scientometric analysis of all materials science research publications in the world. The parameters studied include growth of publications and citations, continent-wise distribution of publications and citations, country-wise distribution of publications, domain-wise distribution of publications and citations, publication efficiency index, distribution of publications and citations according to the number of collaborating countries, variation of mean impact factor across materials science domains, identification of highly cited publications and highly preferred journals, quality of research output and application of Bradford's law. This study aimed to identify and analyze the characteristics of the top-cited articles published in the Science Citation Index Expanded from 1991 to 2010. Articles cited more than 1,000 times from publication to 2010 were assessed regarding their distribution across indexed journals and categories of the Web of Science. Five bibliometric indicators were used to evaluate source institutions and countries. A new indicator, the Y-index, is proposed to assess publication quantity and the character of contributions to articles. We identify 3,652 top-cited articles, with 71 % originating from the US. 
The fourteen most productive institutions were all located in the US. Science, Nature, the New England Journal of Medicine, and Cell hosted the most cited publications. In addition, the Y-index was successfully applied to evaluate the publication character of authors, institutions, and countries. The purpose of this paper is to explore the core and emerging knowledge of electronic commerce (e-commerce) research. Data were collected from the top six e-commerce journals from 2006 to 2010. A total of 1,064 electronic commerce related articles and 33,173 references were identified. There were 48 high-value research articles identified using citation and co-citation analysis. Using statistical analysis including factor analysis, multidimensional scaling, and cluster analysis, we identified five research areas: trust, technology acceptance and technology application, e-commerce task-related application, e-markets, and identity and evaluation. We also identified an area of emerging core knowledge: information systems success. The findings of this study provide core knowledge and directions for researchers and practitioners interested in the electronic commerce field. This study aims to identify possible gender inequalities in the scholarly output of researchers in the field of psychology in Spain. A sample of 522 papers and reviews published in 2007 was extracted from the Thomson ISI Web of Science. The presence of women, the collaboration pattern and the impact of these scientific publications were analyzed. The results show that the average number of female researchers per paper was 0.42 (SD 0.33) and that 42.3 % of the papers had a female researcher as the first author. Moreover, the proportion of female authors of a paper was statistically significantly higher when the first author was female. Studies carried out in cooperation with other Spanish or international institutions had fewer female authors than studies conducted at a single center.
The impact of the papers, measured by the journal impact factor and the number of citations, was independent of the authors' gender and the proportion of female authors. In summary, the study highlights a gender imbalance in Spanish scientific output in Psychology, and a higher proportion of male researchers in international networks. The central area indices and the central interval indices, as introduced in Dorta-Gonzalez and Dorta-Gonzalez (Scientometrics 88(3):729-745, 2011), are studied from a theoretical point of view. They are defined so as to yield higher impact values for "selective" authors (i.e., authors whose citations are concentrated over a small number of publications). We show that this property is not valid for every citation distribution. However, if Zipf's law is adopted for the citation distribution, we can show that the central area indices and the central interval indices do indeed have higher values for more selective authors. We performed a bibliometric analysis of published research on the Global Positioning System (GPS) for the period 1991-2010, based on the Science Citation Index and Social Sciences Citation Index databases. Our search identified a total of 15,759 GPS-related publications in the period. We analyzed the patterns of publication outputs, subject categories and major journals, international productivity and collaboration, geographic distribution of authors, and author keywords. The annual number of publications in GPS research increased from 98 in 1991 to 1,934 in 2010. "Geochemistry & Geophysics", "Geosciences, Multidisciplinary", and "Engineering, Electrical & Electronic" were the top 3 most popular subject categories. As the flagship journal in the field, Geophysical Research Letters had the highest publication count. The USA, the UK and Germany were the top 3 most productive countries.
The most productive institution was the California Institute of Technology (Caltech), followed by the Chinese Academy of Sciences and the University of Colorado. The USA was the most frequent partner in international collaborations. Caltech took the central position in the collaboration network. The major spatial clusters of authors were in the USA, the European Union, and East Asia (including China, Japan and South Korea). "Ionosphere", "Remote Sensing" and "Monitoring" are growing research subjects in the field of GPS, while "Deformation", "Geoid" and "Tectonics" are becoming gradually less significant. Our study revealed underlying patterns in scientific outputs and academic collaborations and may serve as an alternative and innovative way of revealing global research trends in GPS. We continue the investigation, for more than 2,150 astrophysics papers published from July 2007 to June 2008, of various possible correlations among time from submission to acceptance; nationalities of lead authors; numbers of citations to the papers in the three years after publication; subdisciplines; and numbers of authors. Paper I found that submissions from American authors were accepted faster than others, but by only about 3.8 days out of a median of 105 days. Here we report the following additional relationships: (1) the correlation of citation rate with lag time is weak, the most cited papers having intermediate lag times, (2) citation rates are highest for papers with European and American authors and much smaller for papers from less-developed (etc.)
countries, with other prosperous countries in between, (3) citation rates are much larger for currently hot topics (exoplanets, cosmology) than for less hot ones (binary stars, for instance), (4) papers with many authors (seven to more than 100) are more often cited than 1-2 author ones, but this relationship is not linear, and author numbers are not much correlated with lag times, and (5) the lag time for hot topics is about the same as that for less hot topics, which surprised us. Of specific subfields, solar papers are, on average, accepted fastest, quite often within less than 2 months. We don't know why. The visibility of an article depends to a large extent on its authors. We study the question of how each co-author's relative contribution to the visibility of the article can be determined and quantified using an indicator, referring to such an indicator as a CAV-indicator. A two-step procedure is elaborated, whereby one first chooses an indicator (e.g. total number of citations, h-index, etc.) and subsequently one of two possible approaches. The case where the indicator is an h-type index is elaborated in a Lotkaian framework. Different examples illustrate the procedure and the choices involved in determining a CAV-indicator. Publication productivity during 2009-2011 was studied for physicists who teach in South African universities, using data from departmental websites and Thomson Reuters' Web of Science. The objective was to find typical ranges of two measures of individual productivity: number of papers and sum of author share, where the author share per n-author paper is 1/n author units (AU). All values given below are average output per year. Median productivity was 1.33 papers (inter-quartile range 0.33-2.33) and 0.3 AU (inter-quartile range 0.1-0.5 AU). The lowest 10 % did not publish, and the top 10 % produced more than four papers and more than 1 AU.
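The author-share measure defined above, 1/n author units per n-author paper, can be computed directly; a minimal sketch with a hypothetical publication list:

```python
def author_units(paper_author_counts):
    """Sum of 1/n over an author's papers, where n is the number of
    authors on each paper (author share in author units, AU)."""
    return sum(1.0 / n for n in paper_author_counts)

# Hypothetical researcher with a solo paper, a 2-author paper,
# and a 4-author paper in one year:
share = author_units([1, 2, 4])
print(share)  # 1.75
```

Fractional counting of this kind keeps a highly collaborative author from accumulating the same credit per paper as a solo author.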
Productivity varied with rank, ranging from medians of 0.67 papers and 0.2 AU for lecturers to 1.67 papers and 0.4 AU for full professors. Productivity of South African professors was similar to that of a sample of USA professors in a comparable mid-ranked bracket of the Shanghai Jiao Tong world ranking of universities, and about half that of professors in the six top-ranked departments in the world, which had medians of four papers and 1 AU. The study of journal authorship and editorial board membership from a gender perspective is addressed in this paper, following international recommendations about the need to obtain science and technology indicators by gender. Authorship informs us about active scientists who contribute to the production and dissemination of new knowledge through journal articles, while editorial board membership tells us about leading scientists who have obtained scientific recognition within the scientific community. This study analyses by gender the composition of the editorial boards of 131 high-quality Spanish journals in all fields of science, the presence of men and women as authors in a selection of 36 journals, and the evolution of these aspects from 1998 to 2009. Female presence is lower than male presence in authorship, editorial board membership and editorship. The presence of female authors is slightly lower than the presence of women in the Spanish Higher Education sector, and is double the female presence on editorial boards, which mirrors female presence in the highest academic rank. The gender gap tends to diminish over the years in most areas, especially in authorship and very slightly in editorial board membership. Large editorial boards and having a female editor-in-chief are positively correlated with female presence on editorial boards. The situation of women in Spanish science is further assessed in an international context by analysing a selection of international reference journals.
The usefulness of journal-based indicators to monitor the situation of men and women in science and to assess the success of policies oriented to enhance gender equality in science is finally discussed. A desirable goal of scientific management is to introduce, if it exists, a simple and reliable way to measure the scientific excellence of publicly funded research institutions and universities to serve as a basis for their ranking and financing. While citation-based indicators and metrics are easily accessible, they are far from being universally accepted as a way to automate or inform evaluation processes or to replace evaluations based on peer review. Here we consider absolute measurements of research excellence at an amalgamated, institutional level and specific measures of research excellence as performance per head. Using biology research institutions in the UK as a test case, we examine the correlations between peer review-based and citation-based measures of research excellence on these two scales. We find that citation-based indicators are very highly correlated with peer-evaluated measures of group strength, but are poorly correlated with group quality. Thus, and almost paradoxically, our analysis indicates that citation counts could possibly form a basis for deciding how to fund research institutions, but they should not be used as a basis for ranking them in terms of quality. An increasing number of researchers have recently shown interest in the relationship between the economic growth of a country and its research output, measured by scientometric indicators. The answer is not only of theoretical interest; it can also inform specific policies aimed at improving a country's research performance. Our paper focuses on this relationship. We argue that research output is a manifestation of the improvement of human capital in the economy. We examine this relationship specifically in South Africa for the period 1980-2008.
Using the autoregressive distributed lag method, we investigate the relationship between GDP and the comparative research performance of the country (the share of South African papers relative to the rest of the world). The relationship is confirmed for individual fields of science (biology and biochemistry, chemistry, materials sciences, physics, psychiatry and psychology). The results of this study indicate that in South Africa for the period 1980-2008 the comparative performance of the research output can be considered a factor affecting the economic growth of the country. These results also confirm those of Vinkler (2008) and Lee et al. (2011). In contrast, economic growth did not influence the research output of the country over the same period. Policy implications are also discussed. Bibliometrics, "scientometrics", "informetrics", and "webometrics" can all be considered manifestations of a single research area with similar objectives and methods, which we call "information metrics" or iMetrics. This study explores the cognitive and social distinctness of iMetrics with respect to general information science (IS), focusing on a core of researchers, a shared vocabulary and a literature/knowledge base. Our analysis investigates the similarities and differences between four document sets. The document sets are drawn from three core journals for iMetrics research (Scientometrics, Journal of the American Society for Information Science and Technology, and Journal of Informetrics). We split JASIST into document sets containing iMetrics and general IS articles. The volume of publications in this representation of the specialty has increased rapidly during the last decade. A core of researchers that predominantly focus on iMetrics topics can thus be identified. This core group has developed a shared vocabulary, as exhibited in the high similarity of title words, and shares a knowledge base.
The research front of this field moves faster than the research front of information science in general, bringing it closer to Price's dream. This article presents a review of social media-based systems, an emerging area of information system research, design, and practice shaped by the social media phenomenon. A social media-based system (SMS) is the application of a wide range of social software and the social media phenomenon in organizational and non-organizational contexts to facilitate everyday interactions. To characterize SMS, a total of 274 articles (published during 2003-2011) were analyzed; these were classified as computer science/information systems related in the Web of Science database and had at least one social media-related keyword: social media, social network analysis, social network, social network site, or social network system. As a result, we found four main research streams in SMS research dealing with: (1) the organizational aspect of SMS, (2) the non-organizational aspect of SMS, (3) the technical aspect of SMS, and (4) social media as a tool. The results indicate that SMS research is fragmented and has not yet found its way into the core IS journals; however, it is diverse and interdisciplinary in nature. We also propose that, unlike conventional and socio-technical IS, where information is bureaucratic, formal, bounded within the intranet, and tightly controlled by organizations, in the SMS context information is social, informal, boundary-less (i.e. bounded only by the internet) and less controlled, and more sharing of information may lead to higher value/impact. With the rapid rise of the Chinese economy, which became the second largest in the world in 2010, many Chinese firms have started taking a technological lead in the global market. Nevertheless, whether Chinese firms have learned from their prior in-licensed technologies and accumulated technological capabilities to sustain their economic growth remains underexplored.
This paper aims to fill this void. Using a unique dataset containing information on licensing for 83 large Chinese firms in the electronics sector during 2000-2004, we find that these firms have successfully learned from the international technologies they previously licensed in, using subsequent patent citations made by these Chinese licensee firms to their licensed patents to identify the successful learners. The paper reports the developments and citation patterns, over three time periods, of research on Renewable Energy generation and Wind Power from 1995 to 2011 in the EU, Spain, Germany and Denmark. Analyses are based on the Web of Science and incorporate journal articles as well as conference proceeding papers. Scientometric indicators include publication collaboration ratios, top-player distribution as well as citedness and correspondence analyses of citing publications, relative citation impact, distributions of top-cited as well as top-citing institutions and publication sources, and cluster analysis of citing title terms to map knowledge export areas. Findings show an increase in citation impact for Renewable Energy and Wind Power research, albeit hampered by scarcely cited conference papers. Although the EU maintains its global top position in producing Renewable Energy and Wind Power research, the development of the EU's and Germany's world shares, as well as their citation impact, is negative during the most recent 7-year period. During the same period the citation impact of Spain and Denmark increased, placing both nations among the top-ranking countries in Wind Power research. Spain is the only EU country that has increased its world production share since 2000. China is currently ranked third after the EU and the USA in research output, but with a very low citation impact. Spain, Denmark and Germany each demonstrate distinct collaboration patterns and publication source and citation distribution profiles.
More than half the citations to EU Wind Power research are EU self-citations. The expected intensification of EU collaboration in the Wind Energy field has not come about. The most productive research institutions in Denmark and Spain are also the most cited ones. The network of patents connected by citations is an evolving graph, which provides a representation of the innovation process. A patent citing another implies that the cited patent reflects a piece of previously existing knowledge that the citing patent builds upon. The methodology presented here (1) identifies actual clusters of patents, i.e., technological branches, and (2) predicts temporal changes in the structure of the clusters. A predictor, called the citation vector, is defined for characterizing technological development to show how a patent cited by other patents belongs to various industrial fields. The clustering technique adopted is able to detect new emerging recombinations, and predicts emerging new technology clusters. The predictive ability of our new method is illustrated with the example of USPTO subcategory 11, Agriculture, Food, Textiles. A cluster of patents is determined based on citation data up to 1991, and it shows significant overlap with class 442, formed at the beginning of 1997. These new tools of predictive analytics could support policy decision-making processes in science and technology, and help formulate recommendations for action. National Research Assessment Exercises (NRAEs) aim to improve returns from public funding of research. Critics argue that they undervalue publications that influence practice rather than citations, implying that the journals valued least by NRAEs are disproportionately useful to practitioners. Conservation biology can evaluate this criticism because it uses species recovery plans, which are practitioner-authored blueprints for recovering threatened species. The literature cited in them indicates what is important to practitioners' work.
We profiled journals cited in 50 randomly selected recovery plans from each of the USA, Australia and New Zealand, using ranking criteria from the Australian Research Council and the SCImago Institute. Citations showed no consistent pattern: sometimes higher ranked publications were represented more frequently, sometimes lower ranked publications. Recovery plans in all countries also contained 37 % or more citations to 'grey literature', which is discounted in NRAEs. If NRAEs discourage peer-reviewed publication at any level, they could exacerbate the trend not to publish information useful for applied conservation, possibly harming conservation efforts. While indicating the potential for an impact does not establish that it occurs, it does suggest preventive steps. NRAEs considering the proportion of papers in top journals may discourage publication in lower-ranked journals, because one way to increase the proportion of outputs in top journals is not to publish in lower ones. Instead, perhaps only a user-nominated subset of publications could be evaluated, a department's or an individual's share of the top publications in a field could be noted, or innovative new multivariate assessments of research productivity could be applied, including social impact. The purpose of this study is to integrate the method of chance discovery with visualization tools (KeyGraph) for presenting important and latent research topics in the e-commerce (EC) field. This study collects keywords and abstracts from 995 articles in four primary EC journals. To establish the professional terms of EC, this work divides EC development into three periods: the development of the Internet, the growth of information technology, and the extension of commerce applications. To explore significant and latent EC topics, this study analyzes the differences and similarities between international and Taiwanese sources. Pursuing this approach yields three findings.
First, the study finds that KeyGraph, as a computing process and visualization tool, is an effective method for exploring future research topics. Second, international EC topics have different thematic characteristics at different phases and are more diverse and extensive than Taiwanese sources. Third, a professional thesaurus is very helpful in identifying EC research topics. All these findings suggest that Taiwanese scholars should pay more attention to research issues from international journals when studying EC. Negative results are unpopular to disseminate. However, their publication would help to save resources and foster scientific communication. This study analysed the bibliometric and semantic nature of negative results publications. The Journal of Negative Results in Biomedicine (JNRBM) was used as a role model. Its complete articles from 2002 to 2009 were extracted from SCOPUS and supplemented by related records. Complementary negative results records were retrieved from the Web of Science in "Biochemistry" and "Telecommunications". The applied bibliometrics comprised co-author and co-affiliation analysis and a citation impact profile. Bibliometrics showed that authorship is widely spread. A specific community for the publication of negative results in devoted literature is non-existent. Neither co-author nor co-affiliation analysis indicated strong interconnectivities. JNRBM articles are cited by a broad spectrum of journals rather than by specific titles. Devoted negative results journals like JNRBM have a rather low impact as measured by the number of received citations. On the other hand, only one-third of the publications remain uncited, corroborating their importance for the scientific community. The semantic analysis relies on negative expressions manually identified in JNRBM article titles and abstracts and translated into syntactic patterns.
Using a Natural Language Processing tool, these patterns are then employed to detect their occurrences in the multidisciplinary bibliographic database PASCAL. The translation of manually identified negation patterns into syntactic patterns and their application to multidisciplinary bibliographic databases (PASCAL, Web of Science) proved to be a successful method to retrieve even hidden negative results. There is proof that negative results are not restricted to the biomedical domain. Interestingly, a high percentage of the negative results papers identified so far were funded and therefore needed to be published. Thus policies that explicitly encourage or even mandate the publication of negative results could probably bring about a shift in current scientific communication behaviour. In order to explore the regularities of knowledge creation activities at both temporal and spatial scales, this paper statistically analyses the time intervals and spatial displacements of consecutive knowledge creation activities for high-yield scientists, low-yield scientists, and ASFP (all the scientists who published at least four papers), respectively. The research shows that, for high-yield scientists, the time interval of knowledge creation activities obeys a heavy-tailed distribution and embodies bursting features, with both long periods of silence and intensive bursts of creation activities. The time interval distribution of low-yield scientists is approximately exponential, with activities distributed randomly and occasionally. For ASFP, the spatial distribution of creation activities also embodies heavy-tailed features: their activities are intensively confined to a certain knowledge field, although long-distance exploration across knowledge fields has also been made in knowledge creation activities. The Antarctic continent is the most untouched region of the world but is also among the most vulnerable to global environmental change.
Alterations to the Antarctic environment can have cascading effects, many of which are unpredictable. Our objective was to investigate the contribution of Brazilian scientists to Antarctic research and to characterize the actions taken by the country to improve its scientific output and its international impact in this area. Scientific publications related to Antarctica, released from 1981 to 2011, were searched using three major science databases. The data were used to determine the absolute increase and the relative growth rate of publications in order to characterize the contribution of Brazil to the world's scientific understanding of Antarctica. The number of publications revealed an undersized contribution of Brazilian science to the world's publications about Antarctica. However, over the last 30 years there has been a substantial increase in the number of publications, associated with governmental financial policies. As in other countries, Brazil's most significant scientific contributions regarding the Antarctic continent are in the biological sciences. Therefore, public policies should maintain the current official support, while research groups should pay attention to strategic scientific and technological areas still uncovered in Antarctic research. This study investigates, at the journal as well as the article level, whether there is a difference in citations between English-language and non-English publications. The Web of Knowledge is used as the data source. The investigation focuses on the fields of physics and chemistry. Using a precise definition of a "non-English journal", we filter out nine physics and thirty-four chemistry non-English journals, scattered over six physics and seven chemistry subfields. The average received citations per paper (CpP) of the non-English journal(s) are compared with the CpP of pure English journals within the same subfield.
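The CpP comparison just described reduces to a ratio of total citations to paper counts per journal group; a minimal sketch with hypothetical subfield totals:

```python
def cpp(total_citations, paper_count):
    """Average received citations per paper (CpP) for a journal group."""
    return total_citations / paper_count

# Hypothetical totals in one subfield: a non-English journal group
# vs. the pure English journals of the same subfield.
non_english = cpp(total_citations=420, paper_count=300)
english = cpp(total_citations=5400, paper_count=1800)
print(non_english, english)  # 1.4 3.0
```

Comparing CpP within the same subfield, as the study does, avoids conflating language effects with field-dependent citation norms.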
We clearly observe that non-English journals are inferior, in the number of citations received, to pure English journals in all physics and chemistry subfields studied. Further, twelve physics journals and ten chemistry journals were chosen as sample journals to compare the CpP of non-English papers with that of English-language papers in the same journal. The result of this comparison is that, for the majority of these journals and for most publication years, the CpP of non-English papers is lower than that of the English-language papers. Finally, analyzing linguistic characteristics of the citing literature confirms the own-language preference in non-English physics and chemistry journals. Bibliometric analysis of publication metadata is an important tool for investigating emerging fields of technology. However, the application of field definitions to define an emerging technology is complicated by ongoing and at times rapid change in the underlying technology itself. There is limited prior work on adapting the bibliometric definitions of emerging technologies as these technologies change over time. This paper addresses that gap. We draw on the example of the modular keyword nanotechnology search strategy developed at the Georgia Institute of Technology in 2006. This search approach has seen extensive use in analyzing emerging trends in nanotechnology research and innovation. Yet with the growth of the nanotechnology field, novel materials, particles, technologies, and tools have appeared. We report on the process and results of reviewing and updating this nanotechnology search strategy. By employing structured text-mining software to profile keyword terms, and by soliciting input from domain experts, we identify new nanotechnology-related keywords. We retroactively apply the revised evolutionary lexical query to 20 years of publication data and analyze the results.
Our findings indicate that the updated search approach offers an incremental improvement over the original strategy in terms of recall and precision. Additionally, the updated strategy reveals the importance for nanotechnology of several emerging cited-subject categories, particularly in the biomedical sciences, suggesting a further extension of the nanotechnology knowledge domain. The implications of the work for applying bibliometric definitions to emerging technologies are discussed. Today, university ranking has become a critical issue worldwide. Each university is identified with a surface form under which the whole performance of that university is assessed. This article intends to provide a clear picture of the inconsistencies observed in the recording of Iranian university titles by their affiliated authors and to clarify the negative impact of such inconsistencies on the positioning of Iranian universities in global university ranking systems. To collect the various surface forms of Iranian university names, the ISI Web of Science was searched using the query CU = Iran and PY = 2000-2009. Only MSRT universities were considered. Two M.A. experts listed all variant forms of a single university under that name. The form publicized on a university's website was considered as its entry name. The major sources of variation identified were as follows: acronyms, misspellings, abbreviations, space variations, syntactic permutation, application of vowels/consonants and vowel/consonant combinations, /a/ vs. /aa/, Tashdid, Kasra ezafe, redundancy, downcasing, the voiceless glottal stop sound /?/, shortening and deletion of titles. It was found that, at present, Iranian universities are not receiving the rank they really deserve, simply because authors affiliated with a university use its title forms inconsistently.
It was recommended that authors follow the surface form publicized by universities on their websites, use the help of an editor in their work, and not be credited for their articles in case the forms deviate from those publicized through the websites. A spell checker, as add-in software, is highly needed to homogenize Iranian university surface forms by replacing the variants with the proposed dominant form. What are the factors which render an article more likely to be cited? Using social network analysis of citations between published scholarly works, the nascent field around Social Studies of Science (SSS) is examined from its incipience in 1971 until 2008. To gauge intellectual positioning, closeness centrality and orthodoxy rates are derived from bibliographic networks. Bibliographic orthodoxy is defined as the propensity of an article to cite other highly popular works. Orthodoxy and closeness centrality have differing effects on citation rates, varying across historical periods of development in the field. Effects were modest, but significant. In early time periods, articles with higher orthodoxy rates were cited more, but this effect dissipated over time. In contrast, citations associated with closeness centrality increased over time. Early SSS citation networks were smaller, less structurally cohesive and less modular than later networks. In contrast, later networks were larger, more structurally cohesive, more modular and less dense. These changes to the global SSS knowledge networks are linked to changes in the scientific reward structure ensconced in the network, particularly regarding orthodoxy and closeness centrality. This paper examines the possible home bias in the citation of the 300 most-cited articles in selected management journals between 2005 and 2009. The management journals chosen for the study were the ten with the greatest average impact over the last 5 years.
The theoretical framework was built on the theory of asymmetric information furnished by Financial Economics; contributions in the bibliometric field that indicate geographical bias in the scientific community's citation patterns; and the notion of paradigm, employed in the Sociology of Science. The data from the sample provide empirical evidence of a home bias in the citation pattern of the papers analysed. Here, home bias is defined as the positive difference between the percentage of a country's self-citations and the average number of citations of the same nation's work by the remaining countries surveyed. This paper presents a new method for comparing universities based on information-theoretic measures. The research output of each academic institution is represented statistically by an impact-factor histogram. To this end, for each academic institution we compute the probability of occurrence of a publication with an impact factor in each of several intervals. Given the probability distributions associated with a pair of academic institutions, our objective is to measure the Information Gain between them. To do so, we develop an axiomatic characterization of relative information for predicting institution-institution dissimilarity. We use the Spanish university system as our scenario to test the proposed methodology, benchmarking three universities against the rest as a case study. For each case we use different scientific fields, such as Information and Communication Technologies, Medicine and Pharmacy, and Economics and Business, as we believe comparisons must take their disciplinary context into account. Finally, we validate the Information Gain values obtained for each case against previous studies. 
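The paper's axiomatic relative-information measure is not spelled out above, but the pipeline it describes (bin each institution's publications by impact factor, then score the dissimilarity between the two resulting probability distributions) can be sketched with the standard Kullback-Leibler divergence standing in as the information-gain measure. The interval edges and the impact-factor lists below are hypothetical, and KL divergence is an illustrative substitute, not the authors' exact characterization.

```python
import math

def impact_histogram(impact_factors, edges):
    """Probability of a publication falling in each impact-factor interval."""
    counts = [0] * (len(edges) - 1)
    for x in impact_factors:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def information_gain(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in bits; 0 iff p == q."""
    return sum(pi * math.log2((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

edges = [0, 1, 2, 4, 8]                 # hypothetical impact-factor intervals
uni_a = [0.3, 0.9, 1.5, 2.5, 3.0, 5.0]  # hypothetical publication IFs, univ. A
uni_b = [0.2, 0.4, 0.8, 1.1, 1.9, 2.2]  # hypothetical publication IFs, univ. B
p, q = impact_histogram(uni_a, edges), impact_histogram(uni_b, edges)
print(information_gain(p, q))  # dissimilarity between the two institutions
```

A real benchmarking exercise would also need a symmetrized or axiomatically derived variant, since KL divergence is not symmetric in its two arguments.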
This paper evaluates the European Paradox, according to which Europe plays a leading world role in terms of scientific excellence, measured in terms of the number of publications, but lacks the entrepreneurial capacity of the US to transform this excellent performance into innovation, growth, and jobs. Citation distributions for the US, the European Union (EU), and the Rest of the World are evaluated using a pair of high- and low-impact indicators, as well as the mean citation rate (MCR). The dataset consists of 3.6 million articles published in 1998-2002 with a common 5-year citation window. The analysis is carried out at a low aggregation level, namely, the 219 sub-fields identified with the Web of Science categories distinguished by Thomson Scientific. The problems posed by international co-authorship and the multiple assignment of articles to sub-fields are solved following a multiplicative strategy. We find that, although the EU has more publications than the US in 113 out of 219 sub-fields, the US is ahead of the EU in 189 sub-fields in terms of the high-impact indicator, and in 163 sub-fields in terms of the low-impact indicator. Finally, we verify that, using the high-impact indicator, the US/EU gap is usually greater than when using the MCR. Multimedia has taken on a very important role in our daily life, which has led to rapid growth in research on this topic. Multimedia research covers a variety of problem domains, so one must examine many currently popular research areas to obtain a basic understanding of current multimedia research. This allows us to understand what has been done recently and to consider what will be more important in the future. In this study, we collect and analyze data from ACM Multimedia conferences from 2007 to 2011. In particular, the organized sessions (or areas) and the citation counts of popular areas are examined using the Web of Science and Google Scholar. 
Then, the self-organizing map method is used as a visualization tool for keyword analysis in order to identify popular areas and research topics in multimedia. In addition, we also examine the consistency of the identified popular research areas and topics between the ACM Multimedia conferences and two recent journal special issues. Universities currently need to satisfy the demands of different audiences. In light of the increasing policy emphasis on "third mission" activities, universities are attempting to incorporate these into their traditional missions of teaching and research. University strategies for accomplishing the traditional missions are well honed and routinized, but the incorporation of the third mission is posing important strategic and managerial challenges for universities. This study explores the relationship between university-business collaborations and academic excellence in order to examine the extent to which academic institutions can balance these objectives. Based on data from the UK Research Assessment Exercise 2001 at the level of the university department, we find no systematic positive or negative relationship between scientific excellence and engagement with industry. Across the disciplinary fields reported in the 2001 Research Assessment Exercise (i.e. engineering, hard sciences, biomedicine, social sciences and the humanities), the relationship between academic excellence and engagement with business is largely contingent on the institutional context of the university department. This paper adds to the growing body of literature on university engagement with business by examining this activity for the social sciences and the humanities. Our findings have important implications for the strategic management of university departments and for higher education policy related to measuring the performance of higher education research institutions. 
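As a rough illustration of the self-organizing map step mentioned above (mapping keyword-frequency vectors onto a small grid so that papers on similar topics land on nearby units), here is a minimal 1-D SOM in plain Python. The toy "papers", their four keyword dimensions, and all hyperparameters are invented for the example; real keyword analyses use 2-D grids, far larger vocabularies, and a dedicated SOM toolkit.

```python
import math

def best_matching_unit(units, x):
    """Index of the map unit closest (squared Euclidean) to vector x."""
    return min(range(len(units)),
               key=lambda j: sum((u - v) ** 2 for u, v in zip(units[j], x)))

def train_som(data, n_units=4, epochs=200, lr0=0.5, sigma0=2.0):
    """Train a tiny 1-D self-organizing map on a list of feature vectors."""
    dim = len(data[0])
    # deterministic sample-based init: seed units with evenly spaced data points
    units = [list(data[i * (len(data) - 1) // (n_units - 1)])
             for i in range(n_units)]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)               # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5   # decaying neighbourhood width
        for x in data:
            b = best_matching_unit(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood: nearby units follow the winner
                h = math.exp(-((j - b) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units

# Toy keyword-frequency vectors: two papers on "retrieval", two on "video coding"
papers = [[5, 4, 0, 0], [4, 5, 1, 0], [0, 1, 5, 4], [0, 0, 4, 5]]
som = train_som(papers)
print([best_matching_unit(som, p) for p in papers])  # similar papers map nearby
```

The topology-preserving update (the Gaussian neighbourhood factor `h`) is what makes the final unit indices readable as a map: papers from the same topical cluster end up on the same or adjacent units.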
A series of techniques based on bibliometric clustering and mapping for scientometric analysis was implemented in a software toolkit called CATAR for free use. The application of the toolkit to the field of library and information science (LIS), based on journal clustering for subfield identification and analysis, to suggest a proper set of LIS journals for research evaluation, is described. Two sets of data from Web of Science in the Information Science & Library Science (IS&LS) subject category of Journal Citation Reports were analyzed: one ranging from 2000 to 2004, the other from 2005 to 2009. The clustering results, in graphic dendrograms and multi-dimensional scaling maps from both datasets, consistently show that some IS&LS journals clustered in the management information systems subfield are distant from the other journals in terms of their intellectual base. Additionally, the cluster characteristics, analyzed using a diversity index, reveal regional characteristics for some identified subfields. Since journal classification has become a high-stakes issue that affects the evaluation of scholars and universities in some East Asian countries, both cases (isolation in intellectual base and regionalism in national interest) should be taken into consideration when developing research evaluation in LIS based on journal classification and ranking, for the evaluation to be fairly implemented without biasing future LIS research. In this study we present an analysis of the research trends in Pakistan in the field of biotechnology for the period 1980-2011. Starting with just 15 publications in 1980 and a negligible annual growth rate for the initial 15 years, the number of publications reached 3,273 in 2011 with an annual growth rate of 22 % for the last 15 years. This growth in publications is studied through factors such as Relative Growth Rate and Doubling Time. 
A comparison of organizations actively engaged in biotechnology research is made through factors such as their total publications, total citations, and average citations per paper, and through indices that gauge the quality of publications, such as the h-index, g-index, hg-index and p-index. The University of Karachi shows the highest number of publications (2,698), while the National Institute of Biotechnology and Genetic Engineering, with fewer publications, shows the highest average citations per paper (8.07). Agha Khan University, however, shows the highest h, g, hg and p indices. Conferences play a major role in the development of scientific domains. While journal and article contributions in the field of international business (IB) are a general and well-researched area of scientometric studies, conferences are not. The absence of a systematic assessment of international business conferences as a reference to the collective status of the Academy of International Business (AIB) community is astonishing. Whatever the reasons for this, this paper starts to fill that gap. It establishes a knowledge network composed of the last six years of AIB conferences. We collected all the contributions in full text, with their abstracts and keywords, from 2006 to 2011. All the data have been organized in a data system, and we used an information-theoretic clustering method which allows different analytical views through the entire knowledge corpus. The results indicate significant statistical differences between topic modules and keyword threads of the yearly conferences. There are three keywords which dominate as a leitmotif between 2006 and 2011, but the detailed structure changes significantly from conference to conference. We propose an indicator to "measure" the extent to which co-publication through international collaboration enhances the value of the scientific output of an organisation or agency performing academic research. 
A second-order approach is used which combines a quality proxy (impact) and a quantity or size proxy (number of papers published) to yield a trinity of energy-like scalar proxies. From these it is possible to define an index of foreign collaboration and an evenness indicator that shows the size and unevenness of the role foreign collaboration plays in the total academic output of the organization. The objective of this study is to investigate scientific collaboration in biotechnology in the northeast region of Brazil. The data presented refer to the 1980-2010 period and were collected from the Brazilian National Council for Scientific and Technological Development platform database known as Lattes (a compilation of curricula vitae of researchers in Brazil, including a record of their scientific production) and from the Institute for Scientific Information Web of Science database. Our analysis involved the use of bibliometric indicators, specifically co-authorship between or among institutions, as well as the evaluation of social networks and multivariate statistics. Overall, we verified that collaboration takes place mostly at the intra-institutional level. At the intra-regional scale, we observed the development of four clusters in the collaboration dynamics, in which geographic proximity stands out as a grouping factor. At the interregional level, the partnerships revolve around institutions with laboratory infrastructure and a research tradition in the field of biotechnology. International collaboration remains connected to national scientific cooperation programs. Mention indicators have frequently been used in Webometric studies because they provide a powerful tool for determining the degree of visibility and impact of web resources. Among mention indicators, hypertextual links were a central part of many studies until Yahoo! discontinued the 'linkdomain' command in 2011. 
Selective links constitute a variant of external links where both the source and target of the link can be selected. This paper studies the influence of social platforms (measured through the number of selective external links) on academic environments, in order to ascertain both the percentage of links that they constitute and whether some of them can be used as substitutes for total external links. For this purpose, 141 URLs belonging to 76 Spanish universities were compiled in 2010 (before Yahoo! stopped its link services), and the number of links from 13 selected social platforms to these universities was calculated. Results confirm a good correlation between total external links and links that come from social platforms, with the exception of some applications (such as Digg and Technorati). For those universities with a higher number of total external links, the high correlation is only maintained on Delicious and Wikipedia, which can be utilized as substitutes for total external links in the context analyzed. Nonetheless, links from social platforms constitute only a small fraction of total links, although a positive trend is detected, especially in services such as Twitter, YouTube, and Facebook. The purpose was to undertake a descriptive, quantitative comparative analysis of the production, visibility and online access to public health research results in the field of health public policy among Mexico, Chile and Argentina. A literature search in the field was conducted in MEDLINE (1966-2010) and LILACS (1980-2010) through BIREME's virtual health library. A bibliometric analysis was conducted to identify the type of documents produced, authorship, language of publication, check-tags, major subject content, journals used, and main participating institutions. Visibility was obtained through the identification of the document type used and the subject content, per database. Accessibility was limited to online full-text access. 
Only 6 (out of 30) health science descriptors under health public policy emerged as relevant for all three countries in both databases; namely, health services accessibility; health care reform; decentralization; health systems; consumer participation; and financing, health. References retrieved from MEDLINE corresponded to journal articles in all three countries. In LILACS, monographs accounted for over 40 %. Overall, health public policy documents addressed adult female and male studies, with the exception of Argentina, which addressed female and male children. Full-text accessibility was less than 25 % of total production. Health public policy research is in its infancy in Spanish-speaking Latin America. While health care reforms have been implemented regionally in the last three decades, few (20 %) subject contents have been explored. Further research is needed to fill existing gaps, as are greater efforts to increase the online full-text accessibility and dissemination of research results. Bibliometric analysis techniques are increasingly being used to analyze and evaluate scientific research produced by institutions and grant funding agencies. This article uses bibliometric methods to analyze journal articles funded by NOAA's Office of Ocean Exploration and Research (OER), an extramural grant-funding agency focused on the scientific exploration of the world's oceans. OER-supported articles in this analysis were identified through grant reports, personal communication, and acknowledgement of OER support or grant numbers. The articles identified were analyzed to determine the number of publications and citations received per year, subject, and institution. The productivity and citation impact of institutions in the US receiving OER grant funding were mapped geographically. Word co-occurrence and bibliographic coupling networks were created and visualized to identify the research topics of OER-supported articles. 
Finally, article citation counts were evaluated by means of percentile ranks. This article demonstrates that bibliometric analysis can be useful for summarizing and evaluating the research performance of a grant funding agency. The journal Impact Factor (IF) is not comparable among fields of science and social science because of systematic differences in publication and citation behaviour across disciplines. In this work, a decomposition of the field aggregate impact factor into five normally distributed variables is presented. Considering these factors, a principal component analysis is employed to find the sources of the variance in the Journal Citation Reports (JCR) subject categories of science and social science. Although publication and citation behaviour differ largely across disciplines, the principal components explain more than 78 % of the total variance, and the average number of references per paper is not the primary factor explaining the variance in impact factors across categories. A category-normalized impact factor based on the JCR subject category list is proposed and compared with the IF. This normalization is achieved by considering all the indexing categories of each journal. An empirical application, with one hundred journals in two or more subject categories of economics and business, shows that the gap between rankings is reduced by around 32 % in the journals analyzed. This gap is obtained as the maximum distance among the ranking percentiles from all categories in which each journal is included. This paper aimed to present the profile of the researchers, the pattern of scientific collaboration and the knowledge organization in the area of information science in Brazil. The study covered sex differences, skills by region and type of institution, academic formation, indicators of productivity, relations of co-authorship, interactions with other fields of knowledge, and sectors of application of the research developed in the area. 
The survey, covering the period 2000-2010, was based on information from the curricula vitae of researchers holding a Research Productivity Grant funded by a government agency and from the Directory of Research Groups of the National Council for Scientific and Technological Development. The results revealed that the majority of the researchers are women, both in research and in postgraduate programs; there is a significant regional asymmetry; the studies are concentrated in public universities; the papers are published mainly in national journals with open access; the scientific production follows the same pattern as the areas of humanities, social sciences, and linguistics, literature and arts; there is a tendency of increasing incidence and extent of co-authored papers; there is interaction with 20 other areas of knowledge, which are directly or indirectly connected, forming a single component that comprises all of them; and 'information and S&T management', followed by 'education', are the main sectors of application of the studies developed by the Brazilian researchers. The study therefore provides an overview of this scientific community, seeking to contribute to a better understanding of its characteristics and specificities. This paper analyzes the relationship among research collaboration, number of documents and number of citations in computer science research activity. It analyzes the number of documents and citations and how they vary by number of authors. They are also analyzed (according to author set cardinality) under different circumstances, that is, when documents are written in different types of collaboration, when documents are published in different document types, when documents are published in different computer science subdisciplines, and, finally, when documents are published by journals with different impact factor quartiles. 
To investigate the above relationships, this paper analyzes the publications listed in the Web of Science and produced by active Spanish university professors working in the computer science field between 2000 and 2009. Analyzing all documents, we show that the highest percentage of documents is published by three authors, whereas single-authored documents account for the lowest percentage. By number of citations, there is no positive association between author cardinality and citation impact. Statistical tests show that documents written by two authors receive more citations per document and year than documents published by more authors. In contrast, results do not show statistically significant differences between documents published by two authors and by one author. The research findings suggest that international collaboration results, on average, in publications with higher citation rates than national and institutional collaborations. We also find differences regarding citation rates between journals and conferences, across different computer science subdisciplines, and across journal quartiles, as expected. Finally, our impression is that the collaborative level (number of authors per document) will increase in the coming years, and documents published by three or four authors will be the trend in computer science literature. In this research, we propose a method to trace scientists' research trends in real time. By monitoring the downloads of scientific articles in the journal Scientometrics for 744 h, namely one month, we investigate the download statistics. Then we aggregate the keywords in these downloaded research papers, and analyze the trends of article downloading and keyword downloading. Furthermore, taking both the downloads of keywords and articles into consideration, we design a method to detect emerging research trends. 
We find that, in the scientometrics field, social media, new indices to quantify scientific productivity (e.g., the g-index), webometrics, semantics, text mining, and open access are emerging areas that scientometrics researchers are focusing on. In this study, we empirically investigate the role of references in patents in a firm's technological learning and innovation when the patents are transferred (i.e., technology licensing activities) to these firms. This study is based on a sample of 68 Chinese high-tech firms that engaged in patent technology licensing, together with a matching sample of non-licensee firms, and it examines the patents covered in license agreements that were originally registered with the European Patent Office between 2000 and 2005. Empirical results indicate that the reference scope (defined as the number of different patent classes in the backward citations, excluding classes the examined patent belongs to) and the time lag of the backward citations have a positive effect and a negative effect, respectively, on the licensee firms' innovation outcomes, measured as the number of Chinese patent applications during the 5 years after the licensing year. However, the analysis failed to find the positive effect of science-based citations (defined as backward citations to journal articles) that we predicted. This research aims at performing a comparative study between the Brazilian scientific production in Dentistry from 2000 to 2009 and that of countries that contribute at least 2 % of the world's scientific production indexed in the Scopus database. 
More specifically, we intend to assess the annual Brazilian scientific production by comparing it to the other countries', to analyze the Brazilian and other countries' publications in journals with higher impact factors, to highlight the scientific production of these countries and its international visibility, measured by total citations, average citations and a normalized citation index per year, comparing the countries, and to compare the h-index of these countries. As a working procedure, the SCImago Journal and Country Rank was used as the source, identifying the group of producing countries in the Dentistry area from 1996 to 2009. From a total of 136 countries, 13 were highlighted as the most productive, each one of them accounting for at least 2 % of the worldwide scientific production in the area. The following indicators were collected for each country: number of documents produced, total citations, self-citations, average citations per document and h-index. We verified that Brazil is the only country in Latin America that figures among the most productive countries in the Dentistry area. We observed that Brazil presents growing visibility and impact on the international scene, which suggests that its production is steadily consolidating, with Brazilian scientific recognition in the main vehicles of dissemination in the area. Scientists collaborate increasingly on a global scale. Does this trend also hold for other bibliometric relations such as direct citations, co-citations and shared references? This study examines citation-based relations in publications published in the journal Scientometrics from 1981 to 2010. Different measures of Mean Geographical Distance (MGD) are tested. If we take all citation links into consideration, there is no indication of MGD increase, but when we look at the maximum distances of each relation, a weak tendency of increasing MGD can be observed. 
One major factor behind the lack of growth of mean distances is the form of the distribution of citation links over distances. Our data suggest that the interactions might grow simultaneously for both short and long distances. We formulate the problem of how to climb in multi-attribute rankings with known weights using mathematical optimization. A model is derived based on familiar practices used in higher-education rankings, where several attributes are combined using known weights to obtain a score. The method applies in any situation where multiple attributes are used to rank entities. We invoke several assumptions, such as independence among attributes and that administrators can affect the values of some of the attributes and know the cost of doing so. Our results suggest that a strategy to advance in the rankings is to focus on modifying the value of fewer rather than more attributes. The model is generalized to allow for synergies and antagonisms among the attributes. Because, in terms of research productivity, women's performance is weaker than men's, and since little is known about the factors inhibiting academic women's productivity in Iran, the present article aims to study the factors inhibiting the research productivity of Iranian women in ISI. To do this, women who already had published documents indexed in ISI were identified through the Web of Science (WoS). Afterwards, in order to collect their views regarding factors inhibiting women's research productivity, a researcher-made questionnaire was used. To analyze the collected data, the statistical software SPSS (version 17) was used. Both descriptive (percentage and frequency) and inferential (ANOVA) statistics were employed to reach valid findings. 
The findings indicate that the factors most strongly inhibiting the publication of scholarly articles by Iranian women are 'Shortcomings in the existing laws', 'Stereotypes and beliefs concerning women', 'Family work', 'Social and cultural contingencies', 'Child care', and 'Low collaboration with male colleagues'. Finally, some remarks for improving the current situation are highlighted. In this study, the possibilities of extending the basis for research performance exercises with editorial material are explored. While this document type has traditionally not been considered an important type of scientific communication in research performance assessment procedures, there is a perception among researchers that editorial materials should be considered relevant document types and important sources for the dissemination of scientific knowledge. In a number of cases, some of the mentioned editorial materials are actually 'highly cited'. This led to a thorough scrutiny of editorials and editorial material over the period 1992-2001, for all citation indexes of Thomson Scientific. The relevance of editorial materials is thoroughly analyzed through three quantitative bibliometric characteristics of scientific publications, namely page length, number of references, and number of received citations. We use a trading metaphor to study knowledge transfer in the sciences as well as the social sciences. The metaphor comprises four dimensions: (a) Discipline Self-dependence, (b) Knowledge Exports/Imports, (c) Scientific Trading Dynamics, and (d) Scientific Trading Impact. This framework is applied to a dataset of 221 Web of Science subject categories. 
We find that: (i) the Scientific Trading Impact and Dynamics of materials science and transportation science have increased; (ii) biomedical disciplines, physics, and mathematics are significant knowledge exporters, as is statistics and probability; (iii) in the social sciences, economics, business, psychology, management, and sociology are important knowledge exporters; and (iv) Discipline Self-dependence is associated with specialized domains which have ties to professional practice (e.g., law, ophthalmology, dentistry, oral surgery and medicine, psychology, psychoanalysis, veterinary sciences, and nursing). (c) 2012 Elsevier Ltd. All rights reserved. Analysis of 131 publications during 2006-2007 by staff of the School of Environmental Science and Management at Southern Cross University reveals that the journal impact factor, article length and type (i.e., article or review), and journal self-citations affect the citations accrued to 2012. Authors seeking to be well cited should aim to write comprehensive and substantial review articles, and submit them to high-impact-factor journals that have previously carried articles on the topic. Nonetheless, strategic placement of articles is complementary to, and no substitute for, the careful crafting of good-quality research. Evidence remains equivocal regarding the contribution of an author's prior publication success (h-index) and of open-access journals. Crown Copyright (c) 2012 Published by Elsevier Ltd. All rights reserved. The SNIP (source normalized impact per paper) indicator is an indicator of the citation impact of scientific journals. The indicator, introduced by Henk Moed in 2010, is included in Elsevier's Scopus database. The SNIP indicator uses a source normalized approach to correct for differences in citation practices between scientific fields. The strength of this approach is that it does not require a field classification system in which the boundaries of fields are explicitly defined. 
In this paper, a number of modifications that were recently made to the SNIP indicator are explained, and the advantages of the resulting revised SNIP indicator are pointed out. It is argued that the original SNIP indicator has some counterintuitive properties, and it is shown mathematically that the revised SNIP indicator does not have these properties. Empirically, the differences between the original SNIP indicator and the revised one turn out to be relatively small, although some systematic differences can be observed. Relations with other source normalized indicators proposed in the literature are discussed as well. (c) 2012 Elsevier Ltd. All rights reserved. The data of F1000 and InCites provide us with a unique opportunity to investigate the relationship between peers' ratings and bibliometric metrics on a broad and comprehensive data set with high-quality ratings. F1000 is a post-publication peer review system for the biomedical literature. The comparison of metrics with peer evaluation has been widely acknowledged as a way of validating metrics. Based on the seven indicators offered by InCites, we analyzed the validity of raw citation counts (Times Cited, 2nd Generation Citations, and 2nd Generation Citations per Citing Document), normalized indicators (Journal Actual/Expected Citations, Category Actual/Expected Citations, and Percentile in Subject Area), and a journal-based indicator (Journal Impact Factor). The data set consists of 125 papers published in 2008 and belonging to the subject categories cell biology or immunology. As the results show, Percentile in Subject Area achieves the highest correlation with F1000 ratings; we can assert that for three further indicators (Times Cited, 2nd Generation Citations, and Category Actual/Expected Citations) the "true" correlation with the ratings reaches at least a medium effect size. (c) 2012 Elsevier Ltd. All rights reserved. Hypes occur in every domain of human behavior, including scientific research. 
We show in this contribution that journals and authors who studied the h-index benefited in terms of short-term citations. As, moreover, the introduction of the h-index is more a 'clever find' than a first-rate intellectual achievement, its rise can be compared to a stock market bubble. (c) 2012 Elsevier Ltd. All rights reserved. Standard bibliometric indices were re-defined using a generalized concept of the "successful paper". A family tree based upon the new definitions provides new insights into the relationships between the standard indices, and empty boxes in the family tree may inspire the design of new indices. (c) 2012 Elsevier Ltd. All rights reserved. Academic productivity and research funding have been hot topics in biomedical research. While publications and their citations are popular indicators of academic productivity, there has been no rigorous way to quantify co-authors' relative contributions. This has seriously compromised quantitative studies on the relationship between academic productivity and research funding. Here we apply an axiomatic approach and associated bibliometric measures to revisit a recent study by Ginther et al. (Ginther et al., 2011a,b) in which the probability of receiving a U.S. National Institutes of Health (NIH) R01 award was analyzed with respect to the applicant's race/ethnicity. Our results provide new insight and suggest that there is no significant racial bias in the NIH review process, in contrast to the conclusion of the study by D. K. Ginther et al. Our axiomatic approach has the potential to be widely used for scientific assessment and management. (c) 2012 Elsevier Ltd. All rights reserved. The h-index has been shown to have predictive power. Here I report the results of an empirical study showing that the increase of the h-index with time often depends for a long time on citations to rather old publications. 
This inert behavior of the h-index means that it is difficult to use it as a measure for predicting future scientific output. (c) 2013 Elsevier Ltd. All rights reserved. China's status as a scientific power, particularly in the emerging area of nanotechnology, has become widely accepted in the global scientific community. The role of knowledge spillover in China's nanotechnology development is generally assumed, albeit without much convincing evidence. Very little has been investigated about the different mechanisms of knowledge spillover. Utilizing both cross-sectional data and longitudinal data of 77 Chinese nanoscientists' publications, this study aims to differentiate individual effects from the effect of international collaboration on the research performance of Chinese researchers. The study finds evidence in support of the "birds of a feather flock together" argument - that China's best scientists collaborate at the international level. It also finds that collaboration across national boundaries has a consistently positive effect on China's nano research quality with a time-decaying pattern. Language turns out to be the most influential factor impacting the quality or visibility of Chinese nano research. Policy implications for research evaluation, human capital management, and public research and development allocation are discussed at the end. (c) 2012 Elsevier Ltd. All rights reserved. We introduce archetypal analysis as a tool to describe and categorize scientists. This approach identifies typical characteristics of extreme ('archetypal') values in a multivariate data set. These positive or negative contextual attributes can be allocated to each scientist under investigation. In our application, we use a sample of seven bibliometric indicators for 29,083 economists obtained from the RePEc database and identify six archetypes. These are mainly characterized by ratios of published work and citations. We discuss applications and limitations of this approach. 
Finally, we assign relative shares of the identified archetypes to each economist in our sample. (c) 2012 Elsevier Ltd. All rights reserved. Since its introduction, the Journal Impact Factor has probably been the most extensively adopted bibliometric indicator. Notwithstanding its well-known strengths and limits, it is still widely misused as a tool for evaluation, well beyond the purposes it was intended for. In order to shed further light on its nature, the present work studies how the correlation between the Journal Impact Factor and the (time-weighted) article Mean Received Citations (intended as a measure of journal performance) has evolved through time. It focuses on a sample of hard sciences and social sciences journals from the 1999 to 2010 time period. Correlation coefficients (Pearson's coefficients as well as Spearman's coefficients and Kendall's tau_a) are calculated and then tested against several null hypotheses. The results show that in most cases Journal Impact Factors and their yearly variations do not display a strong correlation with citedness. Differences also exist among scientific areas. (c) 2012 Elsevier Ltd. All rights reserved. The patterns of scientific collaboration have been frequently investigated in terms of complex networks without reference to time evolution. In the present work, we derive collaborative networks (from the arXiv repository) parameterized along time. By defining the concept of affine group, we identify several interesting trends in scientific collaboration, including the fact that the average size of the affine groups grows exponentially, while the number of authors increases as a power law. We were therefore able to identify, through extrapolation, the possible date when a single affine group is expected to emerge. Characteristic collaboration patterns were identified for each researcher, and their analysis revealed that larger affine groups tend to be less stable. (c) 2012 Elsevier Ltd. All rights reserved. 
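The three correlation coefficients named in the Journal Impact Factor study above (Pearson's r, Spearman's rho, Kendall's tau_a) can be computed directly. A minimal self-contained sketch follows; the journal-level data here are purely hypothetical illustrations, not figures from the study.

```python
# Sketch: Pearson's r, Spearman's rho, and Kendall's tau_a
# applied to hypothetical (impact factor, mean received citations) pairs.

def pearson(x, y):
    # Pearson's r: covariance divided by the product of standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    # average ranks (ties share the mean of their rank positions)
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks
    return pearson(ranks(x), ranks(y))

def sign(v):
    return (v > 0) - (v < 0)

def kendall_tau_a(x, y):
    # tau_a: (concordant pairs - discordant pairs) / C(n, 2)
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)

# hypothetical journal data: impact factor vs. mean received citations
jif = [1.2, 2.5, 0.8, 4.1, 3.3]
mrc = [1.0, 2.9, 1.1, 3.8, 2.7]
print(pearson(jif, mrc), spearman(jif, mrc), kendall_tau_a(jif, mrc))
```

In practice one would use library routines (e.g. scipy.stats) and the accompanying significance tests rather than these hand-rolled versions; the sketch only makes the three definitions concrete.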
The arbitrariness of the h-index becomes evident when one requires q x h instead of h citations as the threshold for the definition of the index, thus changing the size of the core of the most influential publications of a dataset. I analyze the citation records of 26 physicists in order to determine how much the prefactor q influences the ranking. Likewise, the arbitrariness of the highly-cited-publications indicator is due to the threshold value, given either as an absolute number of citations or as a percentage of highly cited papers. The analysis of the 26 citation records shows that the changes in the rankings in dependence on these thresholds are rather large and comparable with the respective changes for the h-index. (c) 2013 Elsevier Ltd. All rights reserved. The minimum configuration to have an h-index equal to h is h papers each having h citations, hence h^2 citations in total. To increase the h-index to h + 1 we minimally need (h + 1)^2 citations, an increment of I_1(h) = 2h + 1. The latter number increases by 2 per unit increase of h. This increment of the second order is denoted I_2(h) = 2. If we define I_1 and I_2 for a general Hirsch configuration (say n papers each having f(n) citations) we calculate I_1(f) and I_2(f) similarly as for the h-index. We characterize all functions f for which I_2(f) = 2 and show that this can be obtained for functions f(n) different from the h-index. We show that f(n) = n (i.e. the h-index) if and only if I_2(f) = 2, f(1) = 1 and f(2) = 2. We give a similar characterization for the threshold index (where n papers have a constant number C of citations). Here we deal with second order increments I_2(f) = 0. (c) 2013 Elsevier Ltd. All rights reserved. The purpose of this paper is to analyse and describe the topological properties of the institutional and national collaboration network from the profiles extracted from Google Scholar Citations (GSC). 
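The increment identities for the h-index and the threshold index stated in the abstract above (I_1(h) = 2h + 1 and I_2(h) = 2 for the h-index; I_2 = 0 for the threshold index) can be checked numerically; this short sketch is an illustration of the stated identities, not code from the paper.

```python
# The minimal Hirsch configuration for index h is h papers with
# h citations each, i.e. h^2 citations in total.

def min_citations_h(h):
    return h * h

def I1_h(h):
    # first-order increment: extra citations needed to move from h to h + 1
    return min_citations_h(h + 1) - min_citations_h(h)

def I2_h(h):
    # second-order increment of the h-index
    return I1_h(h + 1) - I1_h(h)

# Threshold index: n papers, each with a constant C citations,
# give C * n citations in total.
def min_citations_threshold(n, C):
    return C * n

def I1_threshold(n, C):
    return min_citations_threshold(n + 1, C) - min_citations_threshold(n, C)

def I2_threshold(n, C):
    return I1_threshold(n + 1, C) - I1_threshold(n, C)

print(I1_h(10))             # 2h + 1 = 21
print(I2_h(10))             # constant 2
print(I2_threshold(10, 5))  # constant 0
```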
19,912 unique profiles with "co-authors" were obtained from a web crawl performed in March 2012. Several statistical and network analysis techniques were used to map and analyse these collaboration relationships at the country and institution level. Results show that the United States dominates the world scientific map and that every research institution is grouped by national, geographical and cultural criteria. A clustering phenomenon based on the self-similarity and fractal properties of scale-free networks is also observed. We conclude that GSC is a suitable tool for collaboration studies only at the macro level between countries and institutions. (c) 2013 Elsevier Ltd. All rights reserved. This study describes the meaning of and the formula for the S-index, which is a novel evaluation index based on the number of citations of each article in a particular journal and the rank of the article according to the number of citations. This study compares the S-index with the Impact Factor (IF), which is the most well-known evaluation index, using the Korea Citation Index data. It is shown that the S-index is positively correlated with the number of articles published in a journal. The tapered h-index (h_T-index), which like the S-index is based on all articles of a journal, is compared with the S-index. It is shown that there is a very strong positive correlation between the S-index and the h_T-index. Although the S-index is similar to the h_T-index, the S-index has a slightly better differentiating power and ranks journals with evenly cited articles higher. (c) 2013 Elsevier Ltd. All rights reserved. In this paper we provide the reader with a visual representation of relationships among the impact of book chapters indexed in the Book Citation Index using information gain values and published by different academic publishers in specific disciplines. The impact of book chapters can be characterized statistically by citation histograms. 
For instance, we can compute the probability of occurrence of book chapters with a given number of citations in different intervals for each academic publisher. We predict the similarity between two citation histograms based on the amount of relative information between such characterizations. We observe that the citation patterns of book chapters follow a Lotkaian distribution. This paper describes the structure of the Book Citation Index using 'heliocentric clockwise maps' which allow the reader not only to determine the degree of similarity of a given academic publisher indexed in the Book Citation Index with a specific discipline according to their citation distribution, but also to easily observe the general structure of a discipline, identifying the publishers with higher impact and output. (c) 2013 Elsevier Ltd. All rights reserved. This paper addresses emerging trends in the collective dynamics found in knowledge networks, those networks composed of the relationships among knowledge sources, such as citation networks and keyword networks. In studying the formation and detection of new trends in the process of knowledge evolution, we use the collective dynamics approach to construct a network of knowledge clusters based on citation clustering. This approach explores the processes and rules of new trends emerging in knowledge clusters by examining the continuous changes in keyword vectors found in the interaction and coordination between evolving knowledge clusters. In direct citation networks, the collective dynamics approach is found to be superior to the baseline method, especially in predicting small knowledge fields with less data and more uncertainties. (c) 2013 Elsevier Ltd. All rights reserved. The analysis of research collaboration by field is traditionally conducted beginning with the classification of the publications from the context of interest. In this work we propose an alternative approach based on the classification of the authors by field. 
The proposed method is more precise if the intended use is to provide a benchmark for the evaluation of individual propensity to collaborate. In the current study we apply the new methodology to all Italian university researchers in the hard sciences, measuring the propensity to collaborate for the various fields: in general, and specifically with intramural colleagues, extramural domestic and extramural foreign organizations. Using a simulation, we show that the results present substantial differences from those obtained through application of traditional approaches. (c) 2013 Elsevier Ltd. All rights reserved. We propose a cross-field evaluation method for the publications of research institutes. With this approach, we first determine a set of the most visible publications (MVPs) for each field from the publications of all assessed institutes according to the field's h-index. Then, we measure an institute's production in each field by its percentage share (i.e., contribution) to the field's MVPs. Finally, we obtain an institute's cross-field production measure as the average of its contributions to all fields. The proposed approach is proven empirically to be reasonable, intuitive to understand, and uniformly applicable to various sets of institutes and fields of different publication and citation patterns. The field and cross-field production measures obtained by the proposed approach not only allow linear ranking of institutes, but also reveal the degree of their production difference. (c) 2013 Elsevier Ltd. All rights reserved. There is an overall perception of increased interdisciplinarity in science, but this is difficult to confirm quantitatively owing to the lack of adequate methods to evaluate subjective phenomena. This is no different from the difficulties in establishing quantitative relationships in human and social sciences. 
In this paper we quantified the interdisciplinarity of scientific journals and science fields by using an entropy measurement based on the diversity of the subject categories of journals citing a specific journal. The methodology consisted in building citation networks using the Journal Citation Reports database, in which the nodes were journals and edges were established based on citations among journals. The overall network for the 11-year period (1999-2009) studied was small-world and followed a power-law with exponential cutoff distribution with regard to the in-strength. Upon visualizing the network topology an overall structure of the various science fields could be inferred, especially their interconnections. We confirmed quantitatively that science fields are becoming increasingly interdisciplinary, with the degree of interdisciplinarity (i.e. entropy) correlating strongly with the in-strength of journals and with the impact factor. (c) 2013 Elsevier Ltd. All rights reserved. Meta-analysis refers to the statistical methods used in research synthesis for combining and integrating results from individual studies. In this regard meta-analytical studies share with narrative reviews the goal of synthesizing the scientific literature on a particular topic, while as in the case of standard articles they present new results. This study aims to identify the potential similarities and differences between meta-analytical studies, reviews and standard articles as regards their impact and structural features in the field of psychology. To this end a random sample of 335 examples of each type of document was selected from the Thomson Reuters Web of Science database. The results showed that meta-analytical studies receive more citations than do both reviews and standard articles. All three types of documents showed a similar pattern in terms of institutional collaboration, while reviews and meta-analytical studies had a similar number of authors per document. 
However, reviews had a greater number of references and pages than did meta-analytical studies. The implications of these results for the scientific community are discussed. (c) 2013 Elsevier Ltd. All rights reserved. The distributions of citations L, two-year (IF2) and five-year (IF5) impact factors, and citation half-lives lambda of journals published in different selected countries are analyzed using a Langmuir-type relation: y(n) = y(0){1 - alpha Kn/(1 + Kn)}, where y(n) denotes the L_n, IF2(n) or IF5(n) of the n-ranked journal, y(0) is the value of y(n) when journal rank n = 0, alpha is an empirical effectiveness parameter, and K is the Langmuir constant. It was found that: (1) the general features of the distribution of L_n, IF2(n) or IF5(n) of the journals published in different individual countries are similar to the results obtained before by the author from the analysis of the citation distribution data of papers of individual authors (K. Sangwal, Journal of Informetrics 7 (2013) 36-49), (2) in contrast to the theoretically expected value of the effectiveness parameter alpha = 1, the calculated values of alpha > 1 for journals published in different countries, (3) the trends of the distribution of cited half-lives lambda_n of journals differ from those of the L_n, IF2(n) and IF5(n) data for different countries, and show one, two or three linear regions; the longest linear regions with low slopes are observed in the case of countries publishing a relatively high number of journals, and (4) the product of the Langmuir constant K and the number N of journals for the processes of citations and two- and five-year impact factors of journals published in different countries is constant for a process. The results suggest that: (1) the values of alpha > 1 are associated with a process that retards the generation of items (i.e. 
citations or impact factors), the difference (alpha - 1) being related to the dissemination of the contents of the journals published by a country, and (2) the constancy of KN is related to the publication potential of a country. (c) 2013 Elsevier Ltd. All rights reserved. In this paper we attempt to assess the impact of journals in the field of forestry, in terms of bibliometric data, by providing an evaluation of forestry journals based on data envelopment analysis (DEA). In addition, based on the results of the conducted analysis, we provide suggestions for improving the impact of the journals in terms of widely accepted measures of journal citation impact, such as the journal impact factor (IF) and the journal h-index. More specifically, by modifying certain inputs associated with the productivity of forestry journals, we have illustrated how this method could be utilized to raise their efficiency, which in terms of research impact can then be translated into an increase of their bibliometric indices, such as the h-index, IF or eigenfactor score. (c) 2013 Elsevier Ltd. All rights reserved. This paper applies the Ijiri-Simon test for systematic deviations from Gibrat's law to citation numbers of economists. It is found that often-cited researchers attract new citation numbers that are disproportionate to the quality of their work. It is also found that this Matthew effect is stronger for economists who started their academic career earlier. (c) 2013 Elsevier Ltd. All rights reserved. The evaluation of performance at the individual level is of fundamental importance in informing management decisions. The literature provides various indicators and types of measures; however, a problem that is still unresolved and little addressed is how to compare the performance of researchers working in different fields (apples to oranges). In this work we propose a solution, testing various scaling factors for the distributions of research productivity in 174 scientific fields. 
The analysis is based on the observation of scientific production by all Italian university researchers active in the hard sciences over the period 2004-2008, as indexed by the Web of Science. The most effective scaling factor is the average of the productivity distribution of researchers with productivity above zero. (c) 2013 Elsevier Ltd. All rights reserved. Variables subject to an order restriction, for instance Y <= X, have a bivariate distribution over a non-rectangular joint domain that entails a non-null and potentially large structural relation even if the variables show no association (in the sense that particular ranges of values of X do not co-occur with particular ranges of values of Y). Order restrictions affect a number of scientometric indices (including the h index and its variants) that are routinely subjected to correlational analyses to assess whether they provide redundant information, but these correlations are contaminated by the structural relation. This paper proposes an alternative definition of association between variables subject to an order restriction that eliminates their structural relation and reverts to the conventional definition when applied to variables that are not subject to order restrictions. This alternative definition is illustrated in a number of theoretical cases and it is also applied to empirical data involving scientometric indices subject to an order restriction. A test statistic is also derived which allows testing for the significance of an association between variables subject to an order restriction. (c) 2013 Elsevier Ltd. All rights reserved. The definition of the g-index is as arbitrary as that of the h-index, because the threshold number g^2 of citations to the g most cited papers can be modified by a prefactor at one's discretion, thus taking into account more or less of the highly cited publications within a dataset. 
In a case study I investigate the citation records of 26 physicists and show that the prefactor influences the ranking in terms of the generalized g-index less than for the generalized h-index. I propose specifically a prefactor of 2 for the g-index, because then the resulting values are of the same order of magnitude as for the common h-index. In this way one can avoid the disadvantage of the original g-index, namely that its values are usually substantially larger than for the h-index and thus the precision problem is larger, while the advantages of the g-index over the h-index are kept. As for the generalized h-index, different prefactors for the generalized g-index might be more useful for investigations that concentrate only on top scientists with high citation frequencies or on junior researchers with small numbers of citations. (c) 2013 Elsevier Ltd. All rights reserved. Evaluative bibliometrics is concerned with comparing research units by using statistical procedures. According to Williams (2012), an empirical study should be concerned with the substantive and practical significance of the findings as well as the sign and statistical significance of effects. In this study we will explain what adjusted predictions and marginal effects are and how useful they are for institutional evaluative bibliometrics. As an illustration, we will calculate a regression model using publications (and citation data) produced by four universities in German-speaking countries from 1980 to 2010. We will show how these predictions and effects can be estimated and plotted, and how this makes it far easier to get a practical feel for the substantive meaning of results in evaluative bibliometric studies. An added benefit of this approach is that it makes it far easier to explain results obtained via sophisticated statistical techniques to a broader and sometimes non-technical audience. 
We will focus particularly on Average Adjusted Predictions (AAPs), Average Marginal Effects (AMEs), Adjusted Predictions at Representative Values (APRVs) and Marginal Effects at Representative Values (MERVs). (c) 2013 Elsevier Ltd. All rights reserved. This paper puts forward a quantitative approach aimed at understanding the evolutionary paths of change of emerging nanotechnological innovation systems. The empirical case of the newly emerging zinc oxide one-dimensional nanostructures is used. In line with other authors, 'problems' are visualized as those aspects guiding the dynamics of innovation systems. It is argued that the types of problems confronted by an innovation system, and in turn its dynamics of change, are imprinted on the nature of the underlying knowledge bases. The latter is operationalized through the construction of co-citation networks from scientific publications. We endow these co-citation networks with directionality through the allocation of a particular problem, drawn from a 'problem space' for nanomaterials, to each network node. By analyzing the longitudinal, structural and cognitive changes undergone by these problem-attached networks, we attempt to infer the nature of the paths of change of emerging nanotechnological innovation systems. Overall, our results stress the evolutionary mechanisms underlying change in a specific N&N subfield. It is observed that the latter may exert significant influence on the innovative potentials of nanomaterials. Universities' online seats have gradually become complex systems of dynamic information where all their institutions and services are linked and potentially accessible. These online seats now constitute a central node around which universities construct and document their main activities and services. This information can be quantitatively measured by cybermetric techniques in order to design university web rankings, taking the university as a global reference unit. 
However, previous research into web subunits shows that it is possible to carry out systemic web analyses, which open up the possibility of carrying out studies which address university diversity, necessary both for describing the university in greater detail and for establishing comparable ranking units. To address this issue, a multilevel university cybermetric analysis model is proposed, based on parts (core and satellite), levels (institutional and external) and sublevels (contour and internal), providing a deeper analysis of institutions. Finally the model is integrated into another which is independent of the technique used, and applied by analysing Harvard University as an example of use. Using bibliometric methods, we investigate China's international scientific collaboration at three levels: collaborating countries, institutions and individuals. We design a database in SQL Server, and analyse Chinese SCI papers based on the corresponding-author field. We find that China's international scientific collaboration is focused on a handful of countries. Nearly 95 % of internationally co-authored papers are written with only 20 countries, among which the USA accounts for more than 40 % of the total. Results also show that Chinese lineage in international co-authorship is evident, meaning that Chinese immigrant scientists are playing an important role in China's international scientific collaboration, especially in English-speaking countries. Rather than "measuring" a scientist's impact through the number of citations which his/her published work has generated, isn't it more appropriate to consider his/her value through his/her scientific network performance, as illustrated by his/her co-author role, thus focussing on his/her joint publications and their impact through citations? 
Whence, on one hand, this paper very briefly examines bibliometric laws, like the h-index and the subsequent debate about co-authorship effects, but on the other hand, proposes a measure of collaborative work through a new index. Based on data about the publication output of a specific research group, a new bibliometric law is found. Let a co-author C have written J (joint) publications with one or several colleagues. Rank all the co-authors of that individual according to their number of joint publications, giving a rank r to each co-author, starting with r = 1 for the most prolific. It is empirically found that a very simple relationship holds between the number of joint publications J by co-authors and their rank of importance, i.e., J ∝ 1/r. Thereafter, in the same spirit as for the Hirsch core, one can define a "co-author core", and introduce indices operating on an author. It is emphasized that the new index has a quite different (philosophical) perspective than the h-index: in the present case, one focusses on "relevant" persons rather than on "relevant" publications. Although the numerical discussion is based on one "main author" case, and two "control" cases, there is little doubt that the law can be verified in many other situations. Therefore, variants and generalizations could later be produced in order to quantify co-author roles, in temporary or long-lasting stable teams, and lead to criteria about funding, career measurements or even induce career strategies. We investigate the impact of collaborative research in the academic Finance literature to find out whether and to what extent collaboration leads to higher impact articles (6,667 articles across 2001-2007 extracted from the Web of Science). 
Using the top 5 % as ranked by the 4-year citation counts following publication, we also examine related secondary research questions such as the relationships between article impact and author impact; collaboration and average author impact of an article; and the nature of geographic collaboration. Key findings indicate: collaboration does lead to articles of higher impact but there is no significant marginal value for collaboration beyond three authors; high impact articles are not monopolized by high impact authors; collaboration and the average author impact of high-impact articles are positively associated, where collaborative articles have a higher mean author impact in comparison to single-author articles; and collaboration among the authors of high impact articles is mostly cross-institutional. Science studies have not yet provided a conceptual scheme that distinguishes creative accomplishments from other research contributions. Likewise, there is no commonly agreed typology capturing all important manifestations of innovative science. This article takes up these two desiderata. We argue that scientific creativity springs from the fundamental tension between originality and scientific relevance. Based on this consideration, we introduce a conceptual scheme that singles out creative research accomplishments from other contributions in science. Furthermore, this paper shows that creative contributions are not only advances in theory but also new methods, new empirical phenomena, and the development of new research instrumentation. For illustrative purposes, the article introduces examples from science history and presents results from bibliometric studies. We analyzed the productivity and visibility of publications on the subject category of Clinical Neurology by countries in the period 2000-2009. We used the Science Citation Index Expanded database of the ISI Web of Knowledge. The analysis was restricted to the citable documents. 
Bibliometric indicators included the number of publications, the number of citations, the median and interquartile range of the citations, and the h-index. We identified 170,483 publications (84.9 % original articles) with a relative increase of 28.5 % throughout the decade. Fourteen countries published over 2,000 documents in the decade and received more than 50,000 citations. The average of citations received per publication was 8 (interquartile range: 3-20) and the h-index was 261. The USA was the country with the highest number of publications, followed by Germany, Japan, the UK and Italy. Moreover, USA publications had the largest number of citations received (44.5 % of the total), followed by the UK, Germany, Canada, and Italy. On the other hand, Sweden, the Netherlands and the UK had the highest median citations for their total publications. During the period 2000-2009 there was a significant increase in Clinical Neurology publications. Most of the publications and citations originated from only 14 countries, with the USA in the first position; interestingly, European countries with relatively small populations, such as Switzerland, Austria, Sweden, Belgium, and the Netherlands, were also in this top group. The correct attribution of scientific publications to their true owners is extremely important, considering the detailed evaluation processes and the future investments based upon them. This attribution is a hard job for bibliometricians because of the increasing number of documents and the rise of collaboration. Nevertheless, there is no published work with a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Thanks to our long experience in the manual normalisation of addresses, we have created and maintained various master lists. 
We have already developed an application to detect institutional sectors (issued in a previous paper) and now we analyse the details of particular institutions, taking advantage of our master tables. To test our methodology we implemented it on a Spanish data set already manually codified (95,314 unique addresses included in the year 2008 in the Web of Science databases). These data were analysed with a full-text search against our master lists, assigning candidate codes to each address and deciding which could be automatically encoded and which should be reviewed manually. The results of the implementation, comparing the automatic versus the manual codes, showed that 87 % of records were automatically codified with a 1.9 % error rate; only 13 % needed manual review. Finally, we applied the Wilcoxon non-parametric test to show the validity of the methodology, comparing detailed codes of centres already encoded with the automatically encoded ones, and concluding that their distributions were similar with a significance of 0.078. This study adopts a bibliometric approach to analyze the progress in global parallel computing research from the related literature in the Science Citation Index Expanded database from 1958 to 2011. By investigating the characteristics of annual publication outputs, we find that parallel computing has recently experienced increasing attention again after its first rapid development in the 1990s, and that research in this field is entering a new phase. The distribution of publications indicates that the seven major industrial countries (G7), with the USA ranking top, are identified as the most productive and influential countries in this domain. 
Author keywords were analyzed by comparison, and we conclude that the focus of parallel computing research has shifted from hardware to software, with parallel applications and programming based on MPI, GPUs and multicores being the current research tendencies; grid computing and cloud computing dominate the distributed computing area due to their heterogeneous and scalable structures; and, furthermore, the processors of parallel machines are heading for diverse development. The citing-cited matrix brings to light the intense interactions among the disciplines of computer science, engineering, mathematics and physics. The mutual interactions between the four disciplines have increased gradually and reflect each subject's characteristic patterns of influence. This research explores the structure and status of theories used in Communication as an alternative approach to research on the Communication discipline's identity and characteristics. This research assumes that communication theories are not only ongoing practices of intellectual communities, but also discourse about how theory can address a range of channels, transcend specific technologies and bridge levels of analysis. It examines widely-cited theoretical contentions among academic articles and the connections among these theories. Network analysis suggests that framing theory is the most influential of the identified theories (ranking first in frequency and in degree, closeness, betweenness and eigenvector centrality) and serves to link other communication theories and theory groups. While mass communication and technology theories exhibited the highest centrality, interpersonal, persuasion and organization communication theories were grouped together, integrating sub-theories of each group. 
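The centrality measures cited in the network analysis above (degree, closeness, betweenness, eigenvector) are standard graph statistics. As a minimal sketch, degree and closeness centrality can be computed with plain breadth-first search on a toy undirected theory graph; betweenness and eigenvector centrality follow the same pattern but need more machinery (in practice a library such as networkx). The graph below is invented for illustration:

```python
from collections import deque

# toy undirected theory co-citation graph (edge lists are symmetric)
GRAPH = {
    "framing": ["agenda_setting", "cultivation", "uses_grat"],
    "agenda_setting": ["framing", "cultivation"],
    "cultivation": ["framing", "agenda_setting"],
    "uses_grat": ["framing"],
}

def degree_centrality(graph, node):
    """Fraction of the other nodes this node is directly tied to."""
    return len(graph[node]) / (len(graph) - 1)

def closeness_centrality(graph, node):
    """(n - 1) divided by the sum of shortest-path distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

In this toy graph "framing" is tied to every other node, so it maximises both measures, which is the shape of the finding reported for framing theory above.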
Framing theory was the most popular and influential communication theory, bridging not only mass communication theories, but also interpersonal, technology, information system, health, gender, inter-cultural and organizational communication theories. Traditional bibliometric indicators are considered too limited for some research areas, such as the humanities and social sciences, because they mostly reveal a specific aspect of academic performance (quantity of publications) and tend to ignore a significant part of research production. The frequent misuses (e.g. improper generalizations) of bibliometric measures result in a substantial part of the research community failing to consider the exact nature of bibliometric measures. This study investigates the links between practices for assessing academic performance, the use of bibliometric methods and underlying values of research quality within the scientific community of the University of Lausanne, Switzerland. Findings reveal four researcher profiles depending on research orientations and goals, ranging from those using "pure" quantitative tools to those using more subjective and personal techniques. Each profile is characterized according to disciplinary affiliation, tenure, and academic function, as well as commitment to quality values. This study investigates the citation patterns of theoretical and empirical papers published in a top economics journal, namely the American Economic Review, over a period of almost 30 years, while also exploring the determinants of citation success. The results indicate that empirical papers achieve greater citation success than theoretical studies. However, the pattern over time is very similar. Moreover, among empirical papers it appears that cross-country studies are more successful than single-country studies focusing on North American data or other regions. China's economic emergence has been accompanied by a controversy over the quality and international visibility of citation index publications. 
This study uses bibliometric statistics to shed further light on the global landscape of citation index publications with special focus on China and the USA. The analysis explores 31 years of the TRS (Thomson Reuters Scientific) database, spanning the 1980-2010 period. Based on this study, the USA maintains global dominance for both WOK (Web of Knowledge) and WOS (Web of Science) TRS publications. Although China ranks a distant second for WOK, it lags behind five other nations for WOS publications. China's scientific base needs further restructuring for greater global visibility. Emerging economies such as China, India, Brazil and South Africa are fast rising in the global ranks for WOK/WOS publications. China may already be leading the world in some publication attributes, although it could take several more decades to catch up with the USA in others. Normalizations of the publications with population, PTE (population with tertiary education) and GDP (gross domestic product) put small/low-population countries in the global lead. However, countries such as Canada, Greenland, Iceland and Sweden still rank high for most of these publication attributes. Furthermore, WOS per WOK analysis shows that small and/or economically weak countries place greater emphasis on WOS publications. This is particularly visible for countries in Africa and South America. Despite the addition of a large number of indigenous Chinese journals to the TRS database, prediction analysis suggests that China's desire to surpass the USA could be delayed for several decades. In the race for the next-generation scientific superpower, however, China not only needs to sustain substantial investments in research and development, but also requires restructuring of its research industry. This is especially critical for data readiness, availability and accessibility to the scientific community, and radical implementations of research recommendations. 
This paper reports first results on the interplay of different levels of the science system. Specifically, we would like to understand if and how collaborations at the author (micro) level impact collaboration patterns among institutions (meso) and countries (macro). All 2,541 papers (articles, proceedings papers, and reviews) published in the international journal Scientometrics from 1978-2010 are analyzed and visualized across the different levels and the evolving collaboration networks are animated over time. Studying the three levels in isolation we gain a number of insights: (1) USA, Belgium, and England dominated the publications in Scientometrics throughout the 33-year period, while the Netherlands and Spain were the subdominant countries; (2) the number of institutions and authors increased over time, yet the average number of papers per institution grew slowly and the average number of papers per author decreased in recent years; (3) a few key institutions, including Univ Sussex, KHBO, Katholieke Univ Leuven, Hungarian Acad Sci, and Leiden Univ, have a high centrality and betweenness, acting as gatekeepers in the collaboration network; (4) early key authors (Lancaster FW, Braun T, Courtial JP, Narin F, or VanRaan AFJ) have been replaced by current prolific authors (such as Rousseau R or Moed HF). Comparing results across the three levels reveals that results from one level might propagate to the next level, e.g., top rankings of a few key single authors can not only have a major impact on the ranking of their institution but also lead to a dominance of their country at the country level; movement of prolific authors among institutions can lead to major structural changes in the institution networks. To our knowledge, this is the most comprehensive and the only multi-level study of Scientometrics conducted to date. 
We examine the international scientific productivity on information literacy from its inception in 1974 until late 2011, based on a bibliometric analysis of scientific articles included in the Web of Science and Scopus databases. The sample comprised two macro-domains: the most productive and the least productive. The former was the area of social sciences (SoS), covering such disciplines as information and documentation, communication, education, management, etc. The latter was the area of health sciences (HeS), covering such disciplines as medicine, nursing, etc. The objective of the study was to analyse the evolution of research activity during this period, taking into account the authors' production, the distribution and co-authorship of the works, the affiliation, and the most frequently used journals. A quantitative and qualitative methodological approach was taken, based on statistical, mathematical, and content analyses. The results showed exponential growth of the scientific publications in both domains (R^2 = 0.9544 for SoS and R^2 = 0.9393 for HeS), with a predominance of Anglo-Saxon authors. Author productivity was low (1.29 and 1.12 papers/author), while the dispersion of articles by journal averaged 4.96 in SoS and 1.86 in HeS. Scientific collaboration exceeded 53 % in the SoS domain and 69 % in HeS. There was a major dispersion of the places of the authors' affiliation. In both domains, the author distributions fitted Lotka's law, and the journal distributions fitted Bradford's law. The citation analysis of the research output of the German economic research institutes presented here is based on publications in peer-reviewed journals listed in the Social Science Citation Index for the 2000-2009 period. The novel feature of the paper is that a count data model quantifies the determinants of citation success and simulates the institutes' citation potential. 
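The R^2 values for exponential publication growth reported above are typically obtained by fitting counts ≈ a·e^(b·year) via ordinary least squares on the logarithm of the annual counts. A self-contained sketch, with invented yearly counts rather than the study's data:

```python
import math

def exp_growth_r2(years, counts):
    """Fit counts ~ a * exp(b * year) by least squares on ln(counts);
    return the coefficient of determination R^2 of the log-linear fit."""
    ys = [math.log(c) for c in counts]
    n = len(years)
    mx = sum(years) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(years, ys))
         / sum((x - mx) ** 2 for x in years))
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(years, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# invented annual publication counts growing roughly exponentially
r2 = exp_growth_r2([2005, 2006, 2007, 2008, 2009], [12, 18, 25, 40, 61])
```

An R^2 near 1 on the log scale, as in both macro-domains above, indicates that the yearly output is well described by exponential growth.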
Among the determinants of the number of citations, the quality of the publication outlet exhibits a strong positive effect. The number of published pages has the same effect, but journals with size limits also yield more citations. Field journals receive fewer citations than general journals. Controlling for journal quality, the number of co-authors of a paper has no effect, but the effect is positive when co-authors are located outside the author's own institution. We find that the potential citations predicted by our best model lead to different rankings across the institutes than current citations, indicating structural change. We compared scientific indicators for Benin, Senegal and Ghana. We collected data from the Web of Science and used bibliometric indicators such as annual production, language and type of publication, citable and cited documents, citations, h-index, field share, specialization index, and international collaboration rate. Results show that Benin performs well regarding the percentage of citable and cited documents, the share of production and the specialization index in the fields of Natural sciences and Agricultural sciences; it occupies the median position with respect to production and the specialization index in the fields of Engineering and technology on the one hand and Medical and health sciences on the other, behind Ghana and ahead of Senegal. It lags, however, behind Ghana and Senegal with respect to total output, citations per citable or cited document, h-index, and the share of production and specialization index in the fields of Social sciences and Humanities; it has the highest international collaboration rate. The study revealed that the three countries cooperated little with one another, and only when a third Western country intervened. It pointed out the role of Western countries in driving collaboration among developing countries. 
The seeking of evidence to reveal the research performance of Education in Taiwan, in response to the stimulus of national research projects, is presented and interpreted. More than 70,000 publication records over the years 1990-2011 from the Web of Science were downloaded and analyzed. The overview analysis by data aggregation and country ranking shows that Taiwan has significantly improved its publication productivity and citation impact over the last decade. The drill-down analysis, based on journal bibliographic coupling, information visualization, and diversity and trend indexes, reveals that e-Learning and Science Education are two fast-growing subfields that attract global interest and that Taiwan is among the top-ranked countries in these two fields in terms of research productivity. Implications of the analysis are discussed with an emphasis on the subfield characteristics from which more insightful interpretations can be obtained, such as the regional or cultural characteristics that may affect the performance ranking. A citation advantage for research covered by the mass media is a plausible, but poorly studied, phenomenon. Two previous studies, both conducted in the United States, found a positive correlation between media reporting and citations. Only one of these studies was able to conclude that the correlation was caused by a real "publicity effect" rather than by the media highlighting papers that are intrinsically destined to have greater scientific impact (called the 'earmark' hypothesis). This study assessed the relative importance of the publicity effect outside the US by comparing studies published in 2008 and 2009 in the Proceedings of the National Academy of Sciences that had been featured in newspapers in Italy and the United Kingdom. Newspapers in the two countries covered a similar range of topics, and tended to over-represent local (national) research. 
Compared to studies not appearing in any of the newspapers considered, those featured in British newspapers had around 63 % more citations, whilst those featured in Italian newspapers had around 16 % more. The proportion of citations from Italian authors, however, was significantly increased by newspapers, particularly by those in Italian. The equivalent effect on citations from the UK was smaller and only marginally significant. Studies accompanied by a press release did not receive, overall, significantly more citations. In sum, the results suggest that the publicity effect is strongest for English-speaking media, whilst non-English reporting has mostly a local influence. These effects might represent a confounding factor in citation-based research assessment and might contribute to the many biases known to affect the scientific literature. Like all databases, bibliometric ones (e.g. Scopus, Web of Knowledge and Google Scholar) are not exempt from errors, such as missing or wrong records, which may obviously affect publication/citation statistics and, more generally, the resulting bibliometric indicators. This paper tries to answer the question "What is the effect of database uncertainty on the evaluation of the h-index?", breaking the paradigm of deterministic database analysis and treating responses to database queries as random variables. Specifically, an informetric model of the h-index is used to quantify the variability of this indicator with respect to the variability stemming from errors in database records. Some preliminary results are presented and discussed. We explore pilot web-based methods to probe the strategies followed by new small and medium-sized technology-based firms as they seek to commercialize emerging technologies. Tracking and understanding the behavior of such early commercial entrants is not straightforward because smaller firms with limited resources do not always widely engage in readily visible and accessible activities such as publishing and patenting. 
However, many new firms, even if small, present information about themselves that is available online. Focusing on the early commercialization of novel graphene technologies, we introduce a "web scraping" approach to systematically capture information contained in the online web pages of a sample of small and medium-sized high-technology graphene firms in the US, UK, and China. We analyze this information and devise measures that gauge how firm specialization in the target technology impacts overall market orientation. Three groups of graphene enterprises are identified, which vary by their focus on product development, materials development, and integration into existing product portfolios. Country-level factors are important in understanding these early diverging commercial approaches in the nascent graphene market. We consider management and policy implications of our findings, and discuss the value, including strengths and weaknesses, of web scraping as an additional information source on enterprise strategies in emerging technologies. Schubert introduced the partnership ability phi-index, relying on a researcher's number of co-authors and collaboration rate. As a Hirsch-type index, phi was expected to be consistent with the Schubert-Glanzel model of the h-index. Schubert demonstrated this relationship with the 34 awardees of the Hevesy medal in the field of nuclear and radiochemistry (r^2 = 0.8484). In this paper, we upscale this study by testing the phi-index on a million researchers in computer science. We found that the Schubert-Glanzel model correlates with the million empirical phi values (r^2 = 0.8695). In addition, machine learning through symbolic regression produces models whose accuracy does not exceed a 6.1 % gain (r^2 = 0.9227). These results suggest that the Schubert-Glanzel model of the phi-index is accurate and robust on the domain-wide bibliographic dataset of computer science. 
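One kind of measure derivable from scraped firm pages, in the spirit of the web-scraping study above, is a keyword profile of a page's visible text as a rough proxy for market orientation. The sketch below uses only the standard-library HTML parser; the HTML snippet and keyword set are invented, and a real pipeline would also need crawling, politeness and deduplication:

```python
from collections import Counter
from html.parser import HTMLParser
import re

class TextGrabber(HTMLParser):
    """Collect the visible text chunks of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def keyword_profile(html, keywords):
    """Count how often each keyword occurs in the page's visible text."""
    parser = TextGrabber()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w in keywords)

# invented page for a hypothetical graphene firm
page = ("<html><body><h1>Graphene inks</h1>"
        "<p>Product development with graphene materials.</p></body></html>")
profile = keyword_profile(page, {"graphene", "product", "materials"})
```

Comparing such profiles across firms is one simple way to separate product-focused from materials-focused enterprises, as in the three groups identified above.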
In the era of the fast-paced knowledge economy, patent data may be analyzed to measure technological competitiveness. This paper aims to explore patent performance by indicators and technology interactions based on patent citations of assignee types. This study involved four types of patent assignees (i.e. universities, industries, governments, and individuals) in five technological fields (i.e. computers and communications; drugs and medical; electrical and electronics; chemical; and mechanical) over three periods (i.e. 1997-2001, 2002-2006, and 2007-2011). Four indicators were chosen for the analysis of patent performance: patent share, science linkage, current impact index, and citation density. The findings of this study show that among all four assignee types, industries had the highest patent productivity in all fields, and universities had the highest impact in all fields except drugs and medical. Other interesting phenomena were also observed. Examples include reciprocal technology interactions between universities and governments; low technology interactions of industries in each field; and individuals' higher patent performance and technology interactions in the field of drugs and medical. This paper introduces a methodology for the construction of a country-level patent value indicator based on the family size of a country's patent profile at the level of technology fields. Because individual family members target different markets and technologies have different propensities to internationalization, family size has been shown to have limited power to assess the quality of the patent profiles of countries. We address this gap by weighting the members of patent families filed at different patent offices before calculating the family size indicators, to account for the market potential in which the patents of these families were filed. 
We apply different weighting schemes and test which scheme is best able to explain the export performance of countries. In order to conduct our analyses, a panel dataset, consisting of annual data (1990-2002) on international trade from the UN-COMTRADE database and patent data from the "EPO Worldwide Patent Statistical Database" (PATSTAT), was compiled. Several bivariate analyses reveal that weighted and unweighted family counts are highly correlated, meaning that statistics based on absolute (weighted or unweighted) family counts are barely affected by the chosen weighting factor. This is, however, different when using the average family size, where weighting the family members by imports, as well as by GDP, can be shown to have a robust positive effect in explaining export performance. The imports- and GDP-weighted average family sizes are thus able to act as consistent indicators of patent value at the country and technology field level. The web is not only the main scholarly communication tool but also an important source of additional information about individual researchers, their scientific and academic activities and their formally and informally published results. The aim of this study is to investigate whether successful scientists use their personal websites to disseminate their work and career details, and to determine which specific contents are provided on those sites, in order to check whether they could be used in research evaluation. The presence of the highly cited researchers working at European institutions was analysed, a group clearly biased towards senior male researchers working in large countries (the United Kingdom and Germany). Results show that about two thirds of them have a personal website, especially the scientists from Denmark, Israel and the United Kingdom. 
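The weighting idea behind the family-size indicator above can be sketched as follows: each family member contributes the market weight of the office where it was filed, rather than a flat count of one. The office weights below are purely illustrative stand-ins for import- or GDP-based weights, not values from the paper:

```python
# illustrative market-size weights per patent office (not the paper's values)
OFFICE_WEIGHT = {"US": 1.00, "EP": 0.90, "JP": 0.55, "CN": 0.50, "KR": 0.30}

def weighted_family_size(offices, default=0.05):
    """Sum of the market weights of the offices where family members were filed."""
    return sum(OFFICE_WEIGHT.get(office, default) for office in offices)

def average_family_size(families):
    """Weighted average family size over a set of patent families."""
    return sum(weighted_family_size(f) for f in families) / len(families)

# two hypothetical families: one US-only, one filed at US, EP and JP
avg = average_family_size([["US"], ["US", "EP", "JP"]])
```

With unweighted counts the two families would average (1 + 3) / 2 = 2 members; weighting shrinks the contribution of small-market filings, which is what lets the average family size track export performance in the analyses above.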
The most frequent disciplines on those websites are economics, mathematics, computer sciences and space sciences, which probably reflects the success of open access subject repositories like RePEc, arXiv or CiteSeerX. Other pieces of information analysed from the websites include personal and contact data, past experience and descriptions of expertise, current activities and lists of the author's scientific papers. Indicators derived from most of these items can be used for developing a portfolio for evaluation purposes, but their overall availability in the population analysed is not yet representative enough to achieve that objective. Reasons for this insufficient coverage and suggestions for improvement are discussed. This paper presents a relatively simple, objective and repeatable method for selecting sets of patents that are representative of a specific technological domain. The methodology consists of using search terms to locate the most representative international and US patent classes and then determining the overlap of those classes to arrive at the final set of patents. Five different technological fields (computed tomography, solar photovoltaics, wind turbines, electric capacitors, electrochemical batteries) are used to test and demonstrate the proposed method. Comparison against traditional keyword searches and individual patent class searches shows that the method presented in this paper can find a set of patents with greater relevance and completeness than the other two methods, with no more effort. Follow-on procedures to potentially improve the relevance and completeness for specific domains are also defined and demonstrated. The method is compared to an expertly selected set of patents for an economic domain, and is shown not to be a suitable replacement for that particular use case. The paper also considers potential uses for this methodology and the underlying techniques, as well as limitations of the methodology. 
Like the h-index and other indicators, the success-index is a recent indicator that makes it possible to identify, among a general group of papers, those of greater citation impact. This indicator implements field-normalization at the level of the single paper and can therefore be applied to multidisciplinary groups of articles. It is also very practical for normalizations aimed at achieving so-called size-independence. Thanks to these (and other) properties, this indicator is particularly versatile when evaluating the publication output of entire research institutions. This paper exemplifies the potential of the success-index by means of several practical applications, respectively: (i) comparison of groups of researchers within the same scientific field but affiliated with different universities, (ii) comparison of different departments of the same university, and (iii) comparison of entire research institutions. A sensitivity analysis highlights the success-index's robustness. Empirical results suggest that the success-index may be conveniently extended to large-scale assessments, i.e., those involving a large number of researchers and research institutions. This study primarily aims to reveal the worldwide patterns of authors' information scattering by illustrating the possible differences among authors based on subject, country, geographic region, institution, and economic and scientific level factors. Second, changes in patterns of information scattering during the past 21 years are examined. Finally, a hypothesis concerning a probable relationship among three research domains (information scattering, scholarly information-seeking behavior and scholarly journal usage) is presented. 176,943 authors who had more than ten papers in WoS from 1990 to 2010 were examined. 
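Field-normalization at the level of the single paper, as used by the success-index above, can be sketched as follows: each paper is compared with a field-specific citation threshold (for example, the average citations of same-field, same-year papers), and the index counts the papers at or above their own threshold. This is a simplified reading of the indicator, and the thresholds below are invented:

```python
def success_index(papers, thresholds):
    """Count papers whose citations reach their field's comparison threshold.

    papers: iterable of (citations, field) pairs; thresholds: field ->
    comparison value, e.g. mean citations of same-field, same-year papers.
    """
    return sum(1 for citations, field in papers
               if citations >= thresholds[field])

# invented portfolio: (citations, field)
portfolio = [(12, "biology"), (2, "biology"),
             (3, "mathematics"), (1, "mathematics")]
thresholds = {"biology": 8.0, "mathematics": 2.5}
score = success_index(portfolio, thresholds)
```

Because every paper is judged against its own field's threshold, portfolios from different disciplines become directly comparable, which is what makes the indicator usable for whole multidisciplinary institutions.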
The findings revealed that patterns of information scattering have changed during the past 21 years, and the number of journals in the core and middle zones has almost doubled. It was also found that authors tend to use a small number of journals to retrieve the majority of their required information, while a small portion of their information needs is met by a wide variety of journals. However, with regard to patterns of information scattering, some differences exist among authors based on factors including institution, country and subject field. In addition, this study shows that information-scattering patterns might be affected by scholars' information-seeking behaviors. A causal explanation of information scattering through scholarly information-seeking behavior has, without a doubt, the potential to provide practical solutions to better meet scholars' information needs and requirements. This study explores a bibliometric approach to quantitatively assessing current research trends on solid waste, using the related literature published between 1997 and 2011 in journals of all the subject categories of the Science Citation Index. The acquired articles were examined through a general analysis by publication type and language, characteristics of article outputs, country, subject categories and journals, and the frequency of title-words and keywords used. Over the past 15 years, there had been a notable growth trend in publication outputs, along with greater participation of countries/territories. The seven major industrialized countries (G7) published the majority of the world's articles, while their article share was gradually being taken over by other countries, represented by the BRIC countries. 
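The core and middle journal zones mentioned above follow Bradford's classic partition: rank journals by article count and cut the ranking into zones that each carry roughly an equal share of the total articles. A minimal sketch with invented counts:

```python
def bradford_zones(journal_counts, zones=3):
    """Partition ranked journal article counts into `zones` groups, each
    holding roughly an equal share of the total articles."""
    ranked = sorted(journal_counts, reverse=True)
    total = sum(ranked)
    result, acc, zone = [[]], 0, 1
    for count in ranked:
        # open a new zone once the running total passes the next cut point
        if zone < zones and acc >= zone * total / zones:
            result.append([])
            zone += 1
        result[-1].append(count)
        acc += count
    return result

# invented article counts for nine journals
zones = bradford_zones([30, 20, 10, 10, 10, 5, 5, 5, 5])
```

Bradford's law predicts that the number of journals per zone grows steeply from core to periphery, which is why a doubling of the core and middle zones, as reported above, signals a real change in scattering.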
An analysis of the title-words, author keywords and KeyWords Plus showed that municipal solid waste and sludge were the major types of solid waste studied, and that "anaerobic digestion", "wastewater" and "heavy metals" were recent major topics of solid waste research. Meanwhile, the analysis indicated that analytical technologies, represented by solid-phase extraction and tandem mass spectrometry, were more and more widely used in solid waste research. Besides, life cycle assessment and health risk assessment were the two most frequently used environmental assessment tools in solid waste research over the 15-year period. This study aims to map the content and structure of the knowledge base of research on intercultural relations as revealed in co-citation networks of 30 years of scholarly publications. Source records for extracting co-citation information were retrieved from the Web of Science (1980-2010) through a comprehensive keyword search and filtered by manual semantic coding. Exploratory network and content analysis is conducted (1) to discover the development of major research themes and the relations between them over time; and (2) to locate representative core publications (the stars) that are highly co-cited with others, and those (the bridges) connecting between rather than within subfields or disciplines. Structural analysis of the co-citation networks identifies a core cluster that contains the foundational knowledge of this domain. It is well connected to almost all the other clusters and covers a wide range of subject categories. The evolutionary path of research themes shows trends moving towards (e.g. psychology and business and economics) and away from (e.g. language education and communication) the core cluster over time. Based on the results, a structural framework of the knowledge domain of intercultural relations research is proposed to represent the thematic relatedness between topical groups. 
Certain key questions in Scientometrics can only be answered by following a statistical approach. This paper illustrates this point for the following question: how similar are citation distributions with a fixed, common citation window for every science in a static context, and how similar are they when the citation process of a given cohort of papers is modeled in a dynamic context? Bioinformatics is a fast-growing, diverse research field that has recently gained much public attention. Even though there have been several attempts to understand the field of bioinformatics by bibliometric analysis, the approach proposed in this paper is the first attempt at applying text mining techniques to a large set of full-text articles to detect the knowledge structure of the field. To this end, we use PubMed Central full-text articles for bibliometric analysis instead of relying on citation data provided in Web of Science. In particular, we develop text mining routines to build a custom-made citation database as a result of mining full text. We present several interesting findings in this study. First, the majority of the papers published in the field of bioinformatics are not cited by others (63 % of papers received fewer than two citations). Second, there is a linear, consistent increase in the number of publications; in particular, 2003 is the turning point in terms of publication growth. Third, most bioinformatics research is driven by USA-based institutes, followed by European institutes. Fourth, the results of topic modeling and word co-occurrence analysis reveal that major topics focus more on biological aspects than on computational aspects of bioinformatics. However, the top 10 ranked articles identified by PageRank are more related to computational aspects. Fifth, visualization of author co-citation analysis indicates that researchers in molecular biology or genomics play a key role in connecting sub-disciplines of bioinformatics. 
Based on the articles related to remote sensing in the SCI and SSCI databases during 1991-2010, this study evaluated the geographical influence of authors using a new index (the geographical impact factor), and revealed the authorial, institutional, national, and spatiotemporal patterns in remote sensing research. Remote sensing research grew significantly in the past two decades. Imaging science & photographic technology was the most important subject category. International Journal of Remote Sensing was the most active journal. Authors were mainly concentrated in North America, Western Europe, and East Asia. Jackson TJ from USDA ARS was the most productive author, Coops NC from the University of British Columbia had more high-quality articles, and Running SW from the University of Montana carried the greatest geographical influence. The USA was the largest contributor to global remote sensing research, with the most single-country and internationally collaborative articles, and NASA was the most powerful research institute. International cooperation in remote sensing research increased distinctly. Co-word analysis found the common remote sensing platforms and sensors, revealed the widespread adoption of major technologies, and demonstrated keen interest in land cover/land use, vegetation, and climate change. Moreover, remote sensing research was closely correlated with satellite development. Gold Open Access (=Open Access publishing) is for many the preferred route to achieve unrestricted and immediate access to research output. However, true Gold Open Access journals are still outnumbered by traditional journals. Moreover, the availability of Gold OA journals differs from discipline to discipline and often leaves scientists concerned about the impact of the existing titles. 
This study identified the current set of Gold Open Access journals featuring a Journal Impact Factor (JIF) by means of Ulrichsweb, the Directory of Open Access Journals, and Journal Citation Reports (JCR). The results were analyzed with regard to disciplines, countries, quartiles of the JIF distribution in JCR, and publishers. Furthermore, the temporal evolution of impact was studied for a list of the top 50 titles (by JIF) by means of the Journal Impact Factor, SJR, and SNIP over the interval 2000-2010. The identified top Gold Open Access journals proved to be well established, and their impact is generally increasing for all the analyzed indicators. The majority of JCR-indexed OA journals can be assigned to the Life Sciences and Medicine. The success rate for JCR inclusion differs from country to country and is often inversely proportional to the number of national OA journal titles. Compiling a list of JCR-indexed OA journals is a cumbersome task that can only be achieved with non-Thomson Reuters data sources. A corresponding automated feature to produce current lists "on the fly" would be a desirable addition to JCR in order to conveniently track the impact evolution of Gold OA journals. The World Wide Web has become an important source of academic information. The linking feature of the Web has been used to study the structure of the academic web, as well as the presence of academic and research institutes on the Web. In this paper, we propose an integrated model for exploring the subject macrostructure of a specific academic topic on the Web and automatically depicting a knowledge map closer to what a domain expert would expect. The model integrates a hyperlink-induced topic search (HITS)-based link network extending strategy and a semantics-based clustering algorithm with the aid of co-link analysis and social network analysis (SNA) to discover subject-based communities in the academic web space. 
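The HITS algorithm underlying the link network extending strategy alternates between two mutually reinforcing scores: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. A minimal iteration over a toy link graph (site names hypothetical):

```python
# Minimal HITS iteration: authority and hub scores are updated in turn
# and normalized so they stay bounded.
def hits(edges, iters=50):
    nodes = {n for e in edges for n in e}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iters):
        auth = {n: sum(hub[a] for a, b in edges if b == n) for n in nodes}
        hub = {n: sum(auth[b] for a, b in edges if a == n) for n in nodes}
        na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / na for n, v in auth.items()}
        hub = {n: v / nh for n, v in hub.items()}
    return auth, hub

auth, hub = hits([("h1", "s1"), ("h1", "s2"), ("h2", "s1")])
# "s1" is linked by both hubs, so it receives the top authority score;
# "h1" links to both authorities, so it receives the top hub score.
```

In a link network extending strategy, high-authority sites found this way are the natural candidates for expanding the crawl frontier.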
We chose websites rather than web pages as analytical units because of the subject stability of a website. Compared with traditional techniques in webometrics and SNA that have been used for such analyses, our model has the advantages of working on open web space (the capability to explore unknown web resources and identify important ones) and of automatically building an extendable, hierarchical web knowledge map. An experiment in the area of information retrieval shows the effectiveness of the integrated model in analyzing and portraying the subject-clustering phenomenon in academic web space. A recent critique of the use of journal impact factors (IF) by Vanclay noted imprecision and misuses of the IF. However, the substantial alternatives he suggested offer no clear improvement over the IF as a single measure of the scholarly impact of a journal, leaving the IF not yet replaceable. The institutionally independent publications of Tsinghua University and Peking University were compared using two main indicators, namely peak-year citations per publication and the h-index, based on data extracted from the Science Citation Index Expanded, Web of Science, from 1974 to 2011. The analysis covered total publication output, annual production, impact, authorships, Web of Science categories, journals, and most cited articles. Results show that the two universities were on the same scale in terms of peak-year citations per publication, the h-index, and top-cited articles with no fewer than 100 citations. The top three most productive Web of Science categories differed between the two universities: Tsinghua University published more articles in applied science and engineering fields, while Peking University had more basic science articles. In addition, article life was applied to compare the impact of the most cited articles and single-author articles of the two universities. 
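The h-index used in the comparison above is the largest h such that an author (or institution) has h publications each cited at least h times. A minimal computation (citation counts hypothetical):

```python
# h-index: sort citation counts in descending order and find the last
# position i (1-based) where the i-th paper still has >= i citations.
def h_index(citations):
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

assert h_index([10, 8, 5, 4, 3]) == 4   # 4 papers with >= 4 citations each
assert h_index([0, 0]) == 0             # uncited papers contribute nothing
```

The index is robust to a few very highly cited outliers, which is why studies such as this one pair it with citations-per-publication measures.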
Current research performance assessment criteria contribute to some extent to author inflation per publication. Among various indicators for evaluating the quality of research with multiple authors, harmonic counting is relatively superior in terms of calculation, scientific ethics, and application. However, two important factors in harmonic counting are not yet clearly understood. These factors are the perceptions of scientists regarding the (1) corresponding author and (2) equally credited authors (ECAs). We carry out a survey investigation on different perceptions of author position versus contribution among medical researchers with different subfields and professional ranks in China, in order to provide several pieces of evidence on the aforementioned factors. We are surprised to find that researchers with different professional ranks tend to largely acknowledge their own contribution in collaborative research. Next, we conduct an empirical study to measure individual's citation impact using inflated counts versus harmonic counts. The results indicate that harmonic h-index cannot reflect the high peak of harmonic citations. Therefore, we use (1) harmonic R-index to differentiate authors based on the harmonic citations of each paper belonging to their respective h-cores; and (2) Normalization harmonic (h, R) index as a meaningful indicator in ranking scientists. Using a sample of 40 Ph.D. mentors in the field of cardiac and cardiovascular diseases, harmonic counts can distinguish between scientists who are often listed as major contributors and those regularly listed as co-authors. This method may also discourage unethical publication practices such as ghost authorship and gift authorship. Citation frequencies and journal impact factors (JIFs) are being used more and more to assess the quality of research and allocate research resources. 
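Harmonic counting, the credit-allocation scheme discussed above, gives the i-th of N listed authors credit proportional to 1/i, normalized so the credits for one paper sum to 1. A minimal sketch:

```python
# Harmonic counting: the i-th of n listed authors receives credit
# (1/i) / (1/1 + 1/2 + ... + 1/n), so credits sum to 1 per paper.
def harmonic_credit(n_authors):
    weights = [1.0 / i for i in range(1, n_authors + 1)]
    total = sum(weights)
    return [w / total for w in weights]

credits = harmonic_credit(3)
# First author: 6/11, second: 3/11, third: 2/11.
```

An author's harmonic citation count for a paper is then the paper's citations multiplied by that author's credit, which is the quantity the harmonic h- and R-indices aggregate. Adjustments for corresponding authors or equally credited authors, the two factors the survey investigates, would modify these weights.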
If these bibliometric indicators are not an adequate predictor of research quality, there could be severe negative consequences for research. We analysed to what extent citation frequencies and journal impact factors correlate with the methodological quality of clinical research articles included in an SBU systematic review of antibiotic prophylaxis in surgery. All 212 eligible original articles were extracted from the SBU systematic review "Antibiotic Prophylaxis in Surgery" and categorized according to their methodological rigour as high-, moderate-, or low-quality articles. Median citation frequencies and JIFs were compared across the methodological quality groups using the Kruskal-Wallis non-parametric test. An in-depth study of low-quality studies with higher citation frequencies/JIFs was also conducted. No significant differences were found in median citation frequencies (p = 0.453) or JIFs (p = 0.185) between the three quality groups. Studies that had high citation frequencies/JIFs but were assessed as low quality lacked control groups, had high dropout rates, or had low internal validity. This study of antibiotic prophylaxis in surgery does not support the hypothesis that bibliometric indicators are a valid instrument for assessing methodological quality in clinical trials. This is a worrying observation, since bibliometric indicators have a major influence on research funding. However, further studies in other areas are needed. The "Jaccardized Czekanowski index" (JCz), an indicator measuring the similarity between the cited and citing journal lists of a given journal, is proposed in this paper. It is shown that the indicator characterizes the network properties of individual journals and, in aggregated form, also those of subject categories or countries. For subject categories, JCz appears to be related to the multidisciplinarity of the category. For countries, the multinational or local character of the publishers seems to play the determining role. 
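The Czekanowski component of the JCz indicator compares two frequency profiles, here a journal's cited and citing journal lists, as twice the shared mass over the total mass. The full "Jaccardized" construction is the paper's own, but the core overlap measure can be sketched as follows (journal names and counts hypothetical):

```python
# Czekanowski similarity between two frequency profiles:
# 2 * sum(min(p_j, q_j)) / (sum(p) + sum(q)), in [0, 1].
def czekanowski(p, q):
    journals = set(p) | set(q)
    overlap = sum(min(p.get(j, 0), q.get(j, 0)) for j in journals)
    total = sum(p.values()) + sum(q.values())
    return 2.0 * overlap / total if total else 0.0

cited = {"J1": 10, "J2": 5, "J3": 5}    # journals cited by the journal
citing = {"J1": 10, "J2": 5, "J4": 5}   # journals citing the journal
sim = czekanowski(cited, citing)        # 2 * 15 / 40 = 0.75
```

A journal citing and cited by largely the same journals scores near 1, consistent with the indicator's use as a measure of how locally embedded a journal's citation network is.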
The external examiners of 166 Swedish and 168 Danish biomedical PhD theses, the candidates, and their supervisors were assessed bibliometrically. In Sweden, 43% of examiners were from abroad, most commonly the USA and UK, while in Denmark 39% of theses were examined by an examiner from abroad, mostly from neighbouring Sweden. As many examiners came from Canada as from Denmark to examine Swedish theses. Foreign examiners were more merited (based on number of publications) than domestic examiners. Foreign examiners examined significantly more men in Denmark and more women in Sweden. Considering co-publication only after the examination, one Swedish and three Danish PhD candidates published with their foreign external examiners 4-10 years after the examination, and two Swedish and seven Danish supervisors co-published with foreign examiners, suggesting that although inviting foreign external examiners only rarely leads to collaborative work, it may be more common in Denmark than in Sweden. Co-publication between external examiners and the candidates or their supervisors in the period surrounding the examination was rarer in Sweden than in Denmark, although the numbers are small. The use of foreign external examiners stimulates academic exchange in general and sets a useful benchmark for common PhD standards, but it does not markedly increase international collaboration after the examination. This study attempts to examine systematically the growth trajectories of science, technology, and science-based technologies in Japan and South Korea. Drawing upon empirical materials and findings, this paper provides a detailed description of the evolution and pathways taken by Japan and South Korea to achieve growth in science and technology. Both quantity measures (numbers of papers and patents) and impact measures (citations) of research activity are used to provide a coherent depiction of progress and development trajectories. 
Japan and South Korea achieved significant progress in the production of science and technology. However, both economies have experienced a sharp contraction in the number of citations per new patent since the mid-2000s. To address this structural systemic failure, Japan and South Korea have invested heavily in scientific areas that accord with the next wave of technological innovations. This effort has had positive effects on science-based technological growth trajectories. Existing university rankings apply fixed, exogenous weights based on a theoretical framework or on stakeholder or expert opinions. Fixed weights cannot satisfy all requirements of a 'good ranking' according to the Berlin Principles: as the strengths of universities differ, the weights in the ranking should differ as well. This paper proposes a fully nonparametric methodology for ranking universities that is in line with the Berlin Principles. It assigns to each university the weights that maximize (minimize) the impact of the criteria on which the university performs relatively well (poorly). The method accounts for background characteristics among universities and evaluates which characteristics have an impact on the ranking; in particular, it accounts for the level of tuition fees, an English-speaking environment, size, and research or teaching orientation. In general, medium-sized universities in English-speaking countries benefit from the benevolent ranking. By contrast, we observe that rankings with fixed weighting schemes reward large, research-oriented universities. Swiss and German universities, in particular, significantly improve their positions in a more benevolent ranking. Although many studies have been conducted to clarify the factors that affect the citation frequency of "academic papers," there are few studies in which the citation frequency of "patents" has been predicted on the basis of statistical analysis, such as regression analysis. 
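The benevolent weighting idea, evaluating each university with the weights most favourable to it, can be illustrated with a simplified "benefit of the doubt" sketch over a discrete set of candidate weight vectors (the criteria, scores, and weight grid below are hypothetical; the paper's actual method is fully nonparametric and conditions on background characteristics):

```python
# Benefit-of-the-doubt scoring: each university is scored with the
# candidate weight vector that maximizes its own weighted score.
candidate_weights = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]  # (research, teaching)

universities = {            # hypothetical normalized criterion scores
    "U1": (0.9, 0.4),       # strong in research
    "U2": (0.3, 0.95),      # strong in teaching
}

def benevolent_score(scores):
    return max(w1 * scores[0] + w2 * scores[1] for w1, w2 in candidate_weights)

scores = {u: benevolent_score(s) for u, s in universities.items()}
# U1 is scored with research-heavy weights, U2 with teaching-heavy ones,
# so neither is penalized for its weaker dimension.
```

With fixed weights, one of the two universities would necessarily be disadvantaged; letting the weights vary per institution is exactly what makes the ranking "benevolent."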
Assuming that a patent drawing on a variety of technological bases tends to be an important patent that is cited more often, this study examines the influence of the number of classifications of cited patents and compares it with other factors, such as the numbers of inventors, classifications, pages, and claims. Multiple linear, logistic, and zero-inflated negative binomial regression analyses using these factors are performed. Significant positive correlations between the number of classifications of cited patents and citation frequency are observed for all the models. Moreover, the multiple regression analyses demonstrate that the number of classifications of cited patents contributes more to the regression than the other factors do. This implies that, when confounding between factors is taken into account, it is the diversity of classifications assigned to backward citations that more strongly influences the number of forward citations. This paper aims to contribute to the ongoing discussion about building and applying bibliometric indicators. It sheds light on their properties and requirements with respect to six different aspects: deterministic versus probabilistic approaches, application-related properties, time dependence, normalization issues, size dependence, and network indicators. The purpose of this review is to examine the shaping of librarianship in the academic context through the literature on career specialties, with Abbott's (1988) system of professions providing the analytic framework. The specialties investigated are systems librarian, electronic resource librarian, digital librarian, institutional repository manager, clinical librarian and informationist, digital curator/research data manager, teaching librarian/information literacy educator, and information and knowledge manager. Piecemeal literature based on job advertisements, surveys, and individual case studies is consolidated to offer a novel perspective on the evolution of the profession. 
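A stripped-down version of one such analysis is simple linear regression of forward citations on the number of classifications of cited patents. The closed-form least-squares fit can be sketched as follows (the data points are hypothetical; the study itself used multiple linear, logistic, and zero-inflated negative binomial models with several covariates):

```python
# Ordinary least squares with one predictor, closed form:
# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x).
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# x: classifications of cited patents; y: forward citations (hypothetical).
slope, intercept = ols([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Perfectly linear toy data: slope 2.0, intercept 0.0.
```

A positive slope corresponds to the study's finding; the zero-inflated negative binomial variant is needed in practice because citation counts are overdispersed and many patents are never cited.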
The resilience of the profession's core jurisdiction is apparent despite pressures to erode it. Forays into teaching, and more recently into open access and data management, can be understood as responses to such pressure. The attractions, but also the risks, of embedded roles and overextended claims become apparent when comparing past and prospective specialties. When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (the User Engagement, Cognitive Absorption, and System Usability scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electromyogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings, and revealed that increased time on task, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. 
We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences. User interest, a highly dynamic information need, is ignored by most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify dynamic, changing query-level interests in social media outputs. Unlike most existing ranking methods, our approach targets the calculation of the probability that the user is interested in the content of a document, a probability subject to very dynamic change. We describe 2 formulations of the model (a real-time interest vector space model and a real-time interest language model) stemming from classical relevance ranking methods, and we develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research. The digital environment provides an abundance of images and multimedia and offers new potential for using resources in multiple modes of representation for teaching and learning. This article reports the findings of a case study that investigated the use of image and multimedia resources in an undergraduate classroom. The study assumed a contextual approach and focused on different class contexts and students' literacy practices. The class, which took place in a resource-rich, multimodal environment, was perceived by students as a positive learning experience. 
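The real-time interest vector space formulation builds on classical vector-space ranking, whose cosine-similarity core can be sketched as follows (the query and documents are hypothetical; an interest model would additionally reweight terms by the user's current interests):

```python
from collections import Counter
import math

# Cosine similarity between raw term-frequency vectors.
def cosine(q, d):
    qv, dv = Counter(q), Counter(d)
    dot = sum(qv[t] * dv[t] for t in qv)
    nq = math.sqrt(sum(v * v for v in qv.values()))
    nd = math.sqrt(sum(v * v for v in dv.values()))
    return dot / (nq * nd) if nq and nd else 0.0

query = "election results".split()
docs = {
    "d1": "election results announced today".split(),
    "d2": "sports results yesterday".split(),
}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# "d1" shares both query terms with the query, so it ranks first.
```

In the RIM setting, the static query vector would be replaced by an interest vector updated in real time, so the same document can rank differently from day to day.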
The distribution of resources and their role in teaching and learning varied and depended on the context of use. The findings indicate that images fulfilled important descriptive and mnemonic functions when students were introduced to new concepts, but their role was limited in practices that required students to analyze and synthesize knowledge. Young undergraduate college students are often described as digital natives, presumed to prefer living and working in completely digital information environments. In reality, their world is part-paper/part-digital, in constant transition among successive forms of digital storage and communication devices. Studying for a degree is the daily work of these young people, and effective management of paper and digital academic materials and resources contributes crucially to their success in life. Students must also constantly manage their work against deadlines to meet their course and university requirements. This study, following the Personal Information Management (PIM) paradigm, examines student academic information management under these various constraints and pressures. A total of 41 18- to 22-year-old students were interviewed and observed regarding the content, structure, and uses of their immediate working environment within their dormitory rooms. Students exhibited remarkable creativity and variety in the mixture of automated and manual resources and devices used to support their academic work. The demands of a yearlong procession of assignments, papers, projects, and examinations increase the importance of time management activities and influence much of their behavior. Results provide insights on student use of various kinds of information technology and their overall planning and management of information associated with their studies. 
General sentiment analysis for the social web has become increasingly useful for shedding light on the role of emotion in online communication and offline events in both academic research and data journalism. Nevertheless, existing general-purpose social web sentiment analysis algorithms may not be optimal for texts focussed around specific topics. This article introduces 2 new methods, mood setting and lexicon extension, to improve the accuracy of topic-specific lexical sentiment strength detection for the social web. Mood setting allows the topic mood to determine the default polarity for ostensibly neutral expressive text. Topic-specific lexicon extension involves adding topic-specific words to the default general sentiment lexicon. Experiments with 8 data sets show that both methods can improve sentiment analysis performance in corpora and are recommended when the topic focus is tightest. Negation, intensifiers, and modality are common linguistic constructions that may modify the emotional meaning of the text and therefore need to be taken into consideration in sentiment analysis. Negation is usually considered as a polarity shifter, whereas intensifiers are regarded as amplifiers or diminishers of the strength of such polarity. Modality, in turn, has only been addressed in a very naive fashion, so that modal forms are treated as polarity blockers. However, processing these constructions as mere polarity modifiers may be adequate for polarity classification, but it is not enough for more complex tasks (e.g., intensity classification), for which a more fine-grained model based on emotions is needed. In this work, we study the effect of modifiers on the emotions affected by them and propose a model of negation, intensifiers, and modality especially conceived for sentiment analysis tasks. 
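The two adaptations introduced above, lexicon extension and mood setting, can be sketched on top of a plain lexicon-based sentiment scorer (all words, scores, and the topic mood below are illustrative, not the article's actual lexicon):

```python
# Lexicon-based sentiment with two topic-specific adaptations:
# (1) lexicon extension: topic-specific words are added to the base lexicon;
# (2) mood setting: texts with no lexicon matches default to the topic mood
#     instead of neutral.
base_lexicon = {"good": 2, "bad": -2, "terrible": -3}
topic_extension = {"goal": 2, "foul": -2}   # e.g., a football topic
topic_mood = -1                             # a topic with a negative mood

def sentiment(text, lexicon, mood):
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not scores:
        return mood    # mood setting: ostensibly neutral text takes topic polarity
    return sum(scores)

lexicon = {**base_lexicon, **topic_extension}
s1 = sentiment("What a goal", lexicon, topic_mood)     # +2, via the extension
s2 = sentiment("They played on", lexicon, topic_mood)  # no matches, so -1
```

Without the extension, "goal" would be invisible to the base lexicon, and without mood setting, the second text would be scored neutral rather than inheriting the topic's polarity, which is exactly the gap the two methods address.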
We compare our emotion-based strategy with two traditional approaches based on polar expressions and find that representing the text as a set of emotions increases accuracy in different classification tasks and that this representation allows for a more accurate modeling of modifiers that results in further classification improvements. We also study the most common uses of modifiers in opinionated texts and quantify their impact in polarity and intensity classification. Finally, we analyze the joint effect of emotional modifiers and find that interesting synergies exist between them. In Wikipedia, volunteers collaboratively author encyclopedic entries, and therefore managing conflict is a key factor in group success. Behavioral research describes 3 conflict types: task-related, affective, and process. Affective and process conflicts have been consistently found to impede group performance; however, the effect of task conflict is inconsistent. We propose that these inconclusive results are due to underspecification of the task conflict construct, and focus on the transition phase where task-related disagreements escalate into affective and process conflict. We define these transitional phases as distinct constructs, task-affective and task-process conflict, and develop a theoretical model that explains how the various task-related conflict constructs, together with the composition of the wiki editor group, determine the quality of the collaboratively authored wiki article. Our empirical study of 96 Wikipedia articles involved multiple data-collection methods, including analysis of Wikipedia system logs, manual content analysis of articles' discussion pages, and a comprehensive assessment of articles' quality using the Delphi method. Our results show that when group members' disagreements, originally task-related, escalate into personal attacks or hinge on procedure, these disagreements impede group performance. Implications for research and practice are discussed. 
Since the 1990s, with heightened competition and the strong growth of the international higher education market, an increasing number of rankings have been created that measure the scientific performance of institutions on the basis of data. The Leiden Ranking 2011/2012 (LR) was published early in 2012. Starting from Goldstein and Spiegelhalter's (1996) recommendations for conducting quantitative comparisons among institutions, in this study we undertook a reformulation of the LR by means of multilevel regression models. First, with our models we replicated the ranking results; second, the reanalysis of the LR data showed that only 5% of the PPtop10% total variation is attributable to differences between universities. Beyond that, about 80% of the variation between universities can be explained by differences among countries. If covariates are included in the model, the differences among most of the universities become meaningless. Our findings have implications for conducting university rankings in general and for the LR in particular. For example, with Goldstein-adjusted confidence intervals, it is possible to interpret the significance of differences among universities meaningfully: rank differences among universities should be interpreted as meaningful only if their confidence intervals do not overlap. Previous research has revealed the following three challenges for knowledge sharing: awareness of expertise distribution, motivation for sharing, and network ties. In this case study, we examine how different generations of information and communication technologies (ICTs), ranging from e-mail to micro-blogging, can help address these challenges. Twenty-one interviews with employees from a multinational company revealed that although people think social media can better address these challenges than older tools, the full potential of social media for supporting knowledge sharing has yet to be achieved. 
When examining the interconnections among different ICTs, we found that employees' choice of a combination of ICTs, as affected by their functional backgrounds, could create technological divides among them and separate their resources. This finding indicates that having more ICTs is not necessarily better. ICT integration, as well as support for easy navigation, is crucial for effective knowledge search and sharing. Adaptation to local culture is also needed to ensure worldwide participation in knowledge sharing. Social media communities have recently emerged as open and free communication platforms that support real-time information sharing among members. Drawing on social capital theories, we develop a theoretical model to investigate how two types of social capital (bonding and bridging) contribute to the individual and collective well-being of virtual communities through information exchange. Research hypotheses were tested using survey instruments and computer archive data from 475 members of a large social network site during the Wenchuan earthquake (2008) in China. We find that bonding has a positive and significant impact on bridging. Both bonding and bridging have positive and significant impacts on information quality, but not on information quantity. Results also suggest that, after a disaster, information quality is more critical to individual and collective well-being than information quantity. Looking for ontology in a search engine, one can find so many different approaches that it can be difficult to understand which field of research the subject belongs to and how it can be useful. The term ontology is employed within philosophy, computer science, and information science with different meanings. To take advantage of what ontology theories have to offer, one should understand what they address and where they come from. 
In information science, except for a few papers, there has been no initiative toward clarifying what ontology really is and the connections that it fosters among different research fields. This article provides such a clarification. We begin by revisiting the meaning of the term in its original field, philosophy, and trace its current use in other research fields. We advocate that ontology is a genuine and relevant subject of research in information science. Finally, we conclude by offering our view of the opportunities for interdisciplinary research. Compelled Nonuse of Information (CNI) is a model of information behavior developed by Houston (2009, 2011a). CNI posits the existence of nonvolitional mechanisms that force information behaviors beyond the control of the individual. The CNI model consists of six primary CNI types: intrinsic somatic barriers, socio-environmental barriers, authoritarian barriers, threshold knowledge shortfall barriers, attention shortfall barriers, and filtering barriers. This typology of information interaction limitations functions across a full range of socio-economic contexts and thus lends itself to analysis of intractable power-based inequities such as intimate partner violence (IPV). IPV includes physical, mental, financial, and social attacks that, if known, generate socially sanctioned responses, both formal (e.g., law enforcement) and informal (e.g., pastoral). Using the CNI framework to analyze information factors in distinct facets of the IPV experience, as identified in the cross-disciplinary research on this phenomenon, this article provides a practical application of CNI to a complicated, high-risk phenomenon. In this study, we design an innovative method for answering students' or scholars' academic questions (for a specific scientific publication) by automatically recommending e-learning resources in a cyber-infrastructure-enabled learning environment to enhance the learning experiences of students and scholars. 
By using information retrieval and metasearch methodologies, different types of referential metadata (related Wikipedia pages, data sets, source code, video lectures, presentation slides, and online tutorials) for an assortment of publications and scientific topics will be automatically retrieved, associated, and ranked (via the language model and the inference network model) to provide easily understandable cyberlearning resources to answer students' questions. We also designed an experimental system to automatically answer students' questions for a specific academic publication and then evaluated the quality of the answers (the recommended resources) using mean reciprocal rank and normalized discounted cumulative gain. After examining preliminary evaluation results and student feedback, we found that cyberlearning resources can provide high-quality and straightforward answers for students' and scholars' questions concerning the content of academic publications. Digital possessions are digital items that individuals distinguish from other digital items by specific qualities that individuals perceive the digital items to possess. Twenty-three participants were interviewed about their definitions of and relationships with digital possessions to identify the most salient characteristics of digital possessions and to inform preservation. Findings indicate that digital possessions are characterized as (a) providing evidence of the individual, (b) representing the individual's identity, (c) being recognized as having value, and (d) exhibiting a sense of bounded control. Furthermore, archival concepts of primary, secondary, and intrinsic values provide the frame for the defining characteristics. Although several findings from this study are consistent with former studies of material possessions and digital possessions, this study expands research in the area using the concept of digital possessions to inform preservation and by applying archival principles of value. 
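The two evaluation measures named above, mean reciprocal rank (MRR) and normalized discounted cumulative gain (nDCG), have standard formulations that can be computed directly (the example ranks and relevance grades are hypothetical):

```python
import math

# Mean reciprocal rank: average of 1/rank of the first relevant answer,
# over a set of questions.
def mrr(first_relevant_ranks):
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# Normalized DCG over graded relevance scores in ranked order:
# DCG discounts gains logarithmically by position; nDCG divides by the
# DCG of the ideal (descending) ordering.
def ndcg(relevances):
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Hypothetical evaluation: first relevant answer at ranks 1, 2, and 4.
assert abs(mrr([1, 2, 4]) - (1 + 0.5 + 0.25) / 3) < 1e-9
assert ndcg([3, 2, 1]) == 1.0    # already ideally ordered
assert ndcg([1, 2, 3]) < 1.0     # best answer buried lowers the score
```

MRR rewards putting one good resource at the top of the recommendation list, while nDCG credits the graded quality of the whole ranked list, which is why evaluations like this one typically report both.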
Understanding the nature of the relationship between the individual and the digital item provides the potential to explore new areas of reference and outreach services in libraries and archives. As the nature of archival and library reference services evolves, some scholars have predicted that archives and libraries will play a part in helping individuals manage their personal collections. An exploration of individuals' relationships with their digital possessions can serve as a starting point from which scholars can explore the potential of personal information management consulting as a new area of reference and information services, specifically for the preservation of personal digital material. We present our semeiotic-inspired concept of information as 1 of 3 important elements in meaning creation, the 2 other elements being emotion and cognition. We have the inner world (emotion); we have the outer world (information); and cognition mediates between the two. We analyze the 3 elements in relation to communication and discuss the semeiotics-inspired communication model, the Dynacom; then we discuss our semeiotic perspective on the meaning-creation process and communication with regard to a few, but central, elements in library and information science, namely the systems-oriented perspective, the user-oriented perspective, and a domain-oriented perspective. This article provides an overview of studies that have used citation analysis in the field of humanities in the period 1951 to 2010. The work is based on an exhaustive search in databases, particularly those in library and information science, and on citation chaining from papers on citation analysis. The results confirm that use of this technique in the humanities is limited, and although there was some growth in the 1970s and 1980s, it has stagnated in the past 2 decades. Most of the work has been done by research staff, but almost one third involves library staff, and 15% has been done by students. 
The study also showed that less than one fourth of the works used a citation database such as the Arts & Humanities Citation Index and that 21% of the works were in publications other than library and information science journals. The United States has the greatest output, and although English is by far the most frequently used language, 13.9% of the studies are in other languages. This paper correlates the peer evaluations performed in late 2009 by the disciplinary committees of CNPq (a Brazilian funding agency) with some standard bibliometric measures for 55 scientific areas. We compared the decisions to increase, maintain, or decrease a scientist's research scholarship funded by CNPq. We analyzed these decisions for 2,663 Brazilian scientists and computed their correlations (Spearman rho) with 21 different measures, among them: total production, production in the last 5 years, production indexed in Web of Science and Scopus, total citations received (according to WoS, Scopus, and Google Scholar), h-index and m-quotient (according to the three citation services). The highest correlations for each area range from 0.95 to 0.29, although there are areas with no significantly positive correlation with any of the metrics. This study examines the relationship between academic seniority and research productivity through a study of a sample of academics at Australian law schools. To measure research productivity, we use both publications in top law journals, variously defined, and citation metrics. A feature of the study is that we pay particular attention to addressing the endogeneity of academic rank. To do so, we use a novel identification strategy, proposed by Lewbel (Journal of Business and Economic Statistics 30:67-80, 2012), which utilises a heteroscedastic covariance restriction to construct an internal instrumental variable. 
Our main finding is that once endogeneity of academic rank is addressed, more senior academics at Australian law schools do not publish more articles in top law journals (irrespective of how top law journals are defined) than their less senior colleagues. However, Professors continue to have greater impact than Lecturers when research productivity is measured in terms of total citations and common citation indices, such as the h-index and g-index. This study examines technological collaboration in the solar cell industry using the information of patent assignees and inventors as defined by the United States Patent and Trademark Office. Three different collaborative types, namely local (same city), domestic (different cities of the same country), and international collaboration, are discussed. The general status of solar cell patent collaborations, transforming trends of collaborative patterns, average numbers of assignees and inventors for three collaborative types, and international collaboration countries are studied. It is found that co-invented patents and co-assigned patents have both increased in number during the four decades studied, and that collaboration between technology owners is very low while collaboration between inventors is active. Domestic collaboration is the main collaborative pattern for both assignee collaboration and inventor collaboration. The other two collaborative types show contrary trends: international collaboration has slowly risen in the past decades while local collaboration has dwindled. The US has the largest number of internationally collaborative patents worldwide, though such patents account for a low portion of total US patents. In contrast, China has a small total number of patents and internationally collaborative patents; however, its share of international collaboration is higher. Patents produced through international collaboration among countries remain few. 
A co-assigned patent analysis indicates that the main international cooperation partner of the United States is Japan. Based on an international co-invented patent analysis, the main international collaboration partners of the United States are Britain, Japan, and Germany; and the United States is also the most important collaboration partner of China. This paper analyses existing trends in the collaborative structure of the Pharmacology and Pharmacy field in Spain and explores its relationship with research impact. The evolution in terms of size of the research community, the typology of collaborative links (national, international) and the scope of the collaboration (size of links, type of partners) are studied by means of different measures based on co-authorship. Growing heterogeneity of collaboration and impact of research are observed over the years. Average journal impact (MNJS) and citation score (MNCS) normalised to world average tend to grow with the number of authors, the number of institutions and collaboration type. Both national and international collaboration show MNJS values above the country's average, but only internationally co-authored publications attain citation rates above the world's average. This holds at country and institutional sector levels, although not all institutional sectors obtain the same benefit from collaboration. Multilateral collaboration with high-level R&D countries yields the highest values of research impact, although the impact of collaboration with low-level R&D countries has been optimised over the years. Although scientific collaboration is frequently based on individual initiative, policy actions are required to promote the more heterogeneous types of collaboration. The objective of this work was to examine the characteristics of an author's network of co-authors to identify which ones enhance the h-index. 
We randomly selected a sample of 238 authors from the Web of Science, calculated their h-index as well as the h-index of all co-authors from their h-index articles, and calculated an adjacency matrix where the relation between co-authors is the number of articles they published together. Our model was highly predictive of the variability in the h-index (R² = 0.69). Most of the variance was explained by the number of co-authors. Other significant variables were those associated with highly productive co-authors. Contrary to our hypothesis, network structure as measured by components was not predictive. This analysis suggests that the highest h-index will be achieved by working with many co-authors, at least some with high h-indexes themselves. Little improvement in h-index is to be gained by structuring a co-author network to maintain separate research communities. This study analyzes the editorials in Science and Nature published between 2000 and 2012 about careers in science. Of the total body of documents, 8.8 % dealt with science careers. The editorials were manually classified by topics and then mapped using the VOSviewer. This revealed six easily distinguishable clusters: career conditions in science, the attractiveness of science as a career, merit-based career policies, the effect of research funding on careers, specific groups underrepresented in science, and mobility of scientists. The paper summarizes the main thrust of the arguments in these editorials. There is strong agreement about the problems in scientific careers, but less consensus on the solutions to these problems. The paper also explores whether mapping on the basis of automatically identified terms could have provided adequate results, but concludes that manual classification is needed. 
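The basic quantities behind the co-authorship analysis above, an author's h-index and the size of the co-author neighborhood, can be sketched as follows. This is a minimal illustration with hypothetical citation counts and a toy adjacency structure; it is not the authors' implementation.

```python
def h_index(citations):
    # h is the largest value such that at least h papers
    # have received at least h citations each
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def coauthor_count(co_pubs, author):
    # co_pubs[a][b] = number of articles co-authored by a and b
    # (a sparse stand-in for the adjacency matrix described above)
    return sum(1 for n in co_pubs[author].values() if n > 0)
```

For example, an author whose papers received 10, 8, 5, 4, and 3 citations has h = 4: four papers have at least 4 citations each, but not five papers with at least 5.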
Big Science accelerator complexes are no longer mere tools for nuclear and particle physics, but modern-day experimental resources for a wide range of natural sciences, often deemed instrumental to scientific and technological development for innovation and economic growth. Facilities compete on a global market to attract the best users and facilitate the best science, and advertise the achievements of their users as markers of quality and productivity. Thus a need has arisen for (quantitative) quality assessment of science at the level of facilities. In this article, we examine some quantitative performance measurements frequently used by facilities to display quality: technical reliability, competition for access, and publication records. We report data from the world's three largest synchrotron radiation facilities for the years 2004-2010, and discuss their meaning and significance by placing them in proper context. While we argue that quality cannot be completely captured by these quantitative metrics, we acknowledge their apparent importance and, hence, we introduce and propose facilitymetrics as a new feature of the study of modern big science, and as a new empirical focus for scientometric study, in the hope that future studies can contribute to a deeper, much-needed analysis of the topic. Gender and racial disparities have greatly diminished in academia over the last 30 years, but attrition rates among women and minority faculty still remain high. In this paper we examine gender and racial disparities in publishing, an activity that is important for career advancement, but has not been incorporated adequately into the debate on faculty attrition. 
We surveyed a random sample of 1,065 authors who contributed a peer-reviewed journal article indexed in the Web of Science (WoS) in 2005 and at least one other article during the period of 2001-2004 in four academic disciplines representing natural sciences (biochemistry and water resources) and social sciences (anthropology and economics). We then report on the relationships between demographic variables (gender and race/ethnicity) and career-related variables (academic rank, discipline, and h-index) of these authors. Our findings show that at every career level and within each discipline, women were under-represented in academic positions compared to men and an even lower percentage of women published at each academic level than were employed at that level. Further, we found that women had lower h-indices than men in all four disciplines surveyed. Societal and biological constraints may reduce women's ability to obtain research-intensive positions and contribute to these gender disparities. Hispanics and blacks were underrepresented among individuals awarded doctoral degrees, doctorate recipients employed in academia, and academics publishing in WoS as compared to their representation in the population. Whites, Asians, and Native Americans and Pacific Islanders were adequately or over-represented in each category. Additionally, blacks had lower h-indices than the other ethnic groups across the disciplines surveyed. Compared to women, attrition among blacks and Hispanics appears to occur earlier in their career development. Cumulative experiences with discrimination and stereotypes may partly explain higher attrition and lower publication productivity among blacks and Hispanics. 
This study uses the method of citation context analysis to compare differences in citation contexts, including cited concepts and citation functions, between natural sciences (NS) and social sciences and humanities (SSH), based on articles citing Little Science, Big Science (LSBS) published between 1963 and 2010. The findings indicate that NS and SSH researchers frequently cite LSBS as a source that is related to a specific topic and as evidence to support a claim. No significant differences were identified in the distribution of cited concepts included in LSBS, but significant differences were observed in the reasons for citing LSBS between NS and SSH citing articles. However, reverse trends were observed in the percentage of some cited concepts and citation functions between NS and SSH, which implies that subtle differences in citation behavior exist between NS and SSH researchers. In addition, each concept category has a different half-life. Concepts related to characteristics of big science and scientific collaboration have the longest half-lives. Retraction is a self-cleansing activity of the global science community. In this study, the retraction of global scientific publications from 2001 to 2010 was quantitatively analyzed by using the Science Citation Index Expanded. The results indicated that the number of retractions increased faster than the number of global scientific publications. Three very different patterns of retraction existed in each field. In the multi-disciplinary category and in the life sciences, retraction was relatively active. The impact factor strongly correlated with the number of retractions, but did not significantly correlate with the rate of retraction. Although the number of publications from China, India, and South Korea increased faster, their retraction activities were also higher than the worldwide average level. 
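The concept half-lives mentioned above are commonly computed as the median age of the citing items, i.e., the age by which half of all citations to a concept have occurred. A minimal sketch under that assumption, with hypothetical citation ages:

```python
def citation_half_life(citation_ages):
    # citation_ages: age in years (citing year minus cited year) of each
    # citation to a concept; the half-life is the median of these ages
    ages = sorted(citation_ages)
    n = len(ages)
    mid = n // 2
    if n % 2 == 1:
        return ages[mid]
    return (ages[mid - 1] + ages[mid]) / 2
```

A concept whose citations cluster at large ages (a long half-life) keeps being cited long after publication, as reported above for concepts related to big science and scientific collaboration.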
We describe mathematically the age-independent version of the h-index, defined by Abt (Scientometrics 91(3):863-868, 2012), and explain when this indicator is constant with age. We compare this index with a variant in which, instead of dividing the h-index by career length, all citation counts are divided by career length and the h-index is then recalculated. Both mathematical models are compared. A further variant of this second method calculates the h-index of the citation counts divided by article age. Examples are given. Better understanding of research and publishing misconduct can improve strategies to mitigate their occurrence. In this study, we examine various trends among 2,375 articles retracted due to misconduct in all scholarly fields. Proportions of articles retracted due to "publication misconduct" (primarily plagiarism and duplicate publication) or "distrust data or interpretations" (primarily research artifacts and unexplained irreproducibility of data) differ significantly between PubMed (35 and 59 %, respectively) and non-PubMed (56 and 27 %) articles and between English- and non-English-speaking author affiliation countries. Retraction rates due to any form of misconduct, adjusted for the size of the literature in different disciplines, vary from 0.22 per 100,000 articles in the Humanities to 7.58 in Medicine and 7.69 in Chemistry. The annual rate of article retractions due to misconduct has increased exponentially since 2001, and the percentage of all retractions involving misconduct allegations has grown from 18.5-29.2 % for each year from 1990-1993 to 55.8-71.9 % for each year from 2007-2010. Despite these increases, the prominence of research integrity in the news media has not changed appreciably over the past 20 years. Articles retracted due to misconduct are found in all major scholarly disciplines. 
The higher rate of plagiarism among authors from non-English speaking countries may diminish if institutions improve their support for the writing of English manuscripts by their scholars. The training of junior scholars in proper codes of research (and publishing) conduct should be embraced by all disciplines, not just by biomedical fields where the perception of misconduct is high. Small and medium-sized enterprises (SMEs) are more important today than in the past, due to their capabilities of creating jobs and boosting the economy. SMEs need continual innovation to survive in a competitive market and to continue growing. But SMEs suffer from a lack of information to generate innovative ideas. The objectives of this study are to suggest a new method to recommend promising technologies to SMEs that need "knowledge arbitrage" and to help SMEs come up with ideas on new R&D. To this end, this study used three analytic techniques: co-word analysis, collaborative filtering, and regression analysis. The suggested method is tested for usefulness on the real case of knowledge arbitrage from LCD to solar cell technology. The main contribution of this study is that it is the first to suggest a method using a recommendation algorithm (collaborative filtering) for SMEs' knowledge arbitrage. Quantitative evaluation of scientists has become very important in many respects, at the national and even global level. Among the indices used for quantitative evaluation, h-type indicators are currently the most popular. However, because mastering more than 40 variants is difficult and time-consuming, an intuitive and quick method is needed by which these indicators can be presented to evaluators, even those with little knowledge of h-type indicators. In this paper, we introduce the paper-citation histogram in which most h-type indicators can be illustrated with their geometrical interpretation. 
With the help of these plots, evaluators can better understand the indices in a relatively short time. Meanwhile, the geometrical interpretation can provide an insight into the research achievements of scientists. The paper proposes two simple new indexes, k and w, to assess a scientist's publications record based on citations. The two indexes are superior to the widely used h index (Hirsch, 2005), as they preserve all its valuable characteristics and try to overcome one of its shortcomings, i.e. that it uses only a fraction of the information contained in a scientist's citations profile and, as a result, it is defined over the set of positive integers and does not show a sufficiently fine 'granularity' to allow a fully satisfactory ranking of scientists. This problem is particularly acute in many areas of Social Sciences and Humanities, where scientific productivity and citation practices typically yield fewer citations per paper and, as a consequence, are characterized by 'structurally' lower values of the h index. Both the indexes proposed are defined over R+, their integer part is equal to the scientist's h index, and they fall in the right-open interval [h, h+1). While the h index is influenced only by part of the citations received by a scientist's most-cited publications, the k index takes into account all the citations received by her most-cited publications and the w index accounts for the citations received by the entire set of her publications. Variants of the k and w indexes are proposed which consider co-authorship. To show the extent to which the h index and the new indexes proposed may yield different results, they are calculated for 332 professors of economics in Italian universities and the results obtained are used to rank Italian university departments. In this study, new centrality (collaborative) measures are proposed for a node in weighted networks, in three different categories. 
The bibliometric indicators' concepts (e.g., h-index and g-index) are applied to the network analysis measures in order to introduce the new centrality measures. The first category of measures (i.e., l-index, al-index and gl-index) only considers a node's neighbors' degree. The second category (i.e., h-Degree, a-Degree and g-Degree) takes into account the weights of a node's links in a weighted network. The third category (i.e., Hw-Degree, Aw-Degree and Gw-Degree) combines both neighbors' degree and their links' weight. Using a co-authorship network, the association between these new measures and the existing measures with scholars' performance is examined to show the applicability of the new centrality measures. The analysis shows that the scholars' citation-based performance measures are significantly associated with all the proposed centrality measures, but the correlation coefficient for the ones based on average indicators (i.e., a-Degree and Aw-Degree) is the highest. Municipal solid waste (MSW) management in China draws particular attention as China has become the largest MSW generator in the world. The paper analyzed the growth and development of MSW research productivity in China in terms of publication output as reflected in the Science Citation Index for the period 1997-2011. The study revealed that the output of MSW research in China has rapidly increased over the 15 years, in contrast with the USA. Chinese authors contributed 730 publications, of which 708 were journal articles, 17 reviews, 3 editorial materials, 1 correction and 1 meeting abstract, from 421 institutions. About 13.70 % of publications were contributed by the Chinese Academy of Sciences, followed by Tongji University, Shanghai (13.15 %) and Tsinghua University, Beijing (11.10 %). The impact factors of the top 20 journals publishing the most papers were between 0.30 and 4.63. The leading 20 authors in the area of MSW research published at least 13 articles per person. 
The annual share of publications varied from 0.27 % to 20.96 % per year, with the highest share in 2009. An analysis of the title-words showed that "landfill", "incineration" and "management" were recent major topics of municipal solid waste research in China. The results could help researchers understand the characteristics of research output and the research hot spots of the MSW field in China. In recent years, several national and community-driven conference rankings have been compiled. These rankings are often taken as indicators of reputation and used for a variety of purposes, such as evaluating the performance of academic institutions and individual scientists, or selecting target conferences for paper submissions. Current rankings are based on a combination of objective criteria and subjective opinions that are collated and reviewed through largely manual processes. In this setting, the aim of this paper is to shed light on the following question: to what extent do existing conference rankings reflect objective criteria, specifically submission and acceptance statistics and bibliometric indicators? The paper specifically considers three conference rankings in the field of Computer Science: an Australian national ranking, a Brazilian national ranking and an informal community-built ranking. It is found that in all cases bibliometric indicators are the most important determinants of rank. It is also found that in all rankings, top-tier conferences can be identified with relatively high accuracy through acceptance rates and bibliometric indicators. On the other hand, acceptance rates and bibliometric indicators fail to discriminate between mid-tier and bottom-tier conferences. Historically, co-citation models have been based only on bibliographic information. Full-text analysis offers the opportunity to significantly improve the quality of the signals upon which these co-citation models are based. 
In this work we study the effect of reference proximity on the accuracy of co-citation clusters. Using a corpus of 270,521 full-text documents from 2007, we compare the results of traditional co-citation clustering using only the bibliographic information to results from co-citation clustering where proximity between reference pairs is factored into the pairwise relationships. We find that accounting for reference proximity from full text can increase the textual coherence (a measure of accuracy) of a co-citation cluster solution by up to 30% over the traditional approach based on bibliographic information. Given the importance of cross-disciplinary research (CDR), facilitating CDR effectiveness is a priority for many institutions and funding agencies. There are a number of CDR types, however, and the effectiveness of facilitation efforts will require sensitivity to that diversity. This article presents a method characterizing a spectrum of CDR designed to inform facilitation efforts that relies on bibliometric techniques and citation data. We illustrate its use by the Toolbox Project, an ongoing effort to enhance cross-disciplinary communication in CDR teams through structured, philosophical dialogue about research assumptions in a workshop setting. Toolbox Project workshops have been conducted with more than 85 research teams, but the project's extensibility to an objectively characterized range of CDR collaborations has not been examined. To guide wider application of the Toolbox Project, we have developed a method that uses multivariate statistical analyses of transformed citation proportions from published manuscripts to identify candidate areas of CDR, and then overlays information from previous Toolbox participant groups on these areas to determine candidate areas for future application. The approach supplies three results of general interest: (1) a way to employ small data sets and familiar statistical techniques to characterize CDR spectra as a guide to scholarship on CDR patterns and trends; (2) a model for using bibliometric techniques to guide broadly applicable interventions similar to the Toolbox; and (3) a method for identifying the location of collaborative CDR teams on a map of scientific activity, of use to research administrators, research teams, and other efforts to enhance CDR projects. 
Today's organizations spend a great deal of time and effort on e-mail leakage prevention. However, there are still no satisfactory solutions; addressing mistakes are not detected and in some cases correct recipients are wrongly marked as potential mistakes. In this article we present a new approach for preventing e-mail addressing mistakes in organizations. The approach is based on an analysis of e-mail exchanges among members of an organization and the identification of groups based on common topics. When a new e-mail is about to be sent, each recipient is analyzed. A recipient is approved if the e-mail's content belongs to at least one topic common to both the sender and the recipient. This can be applied even if the sender and recipient have never communicated directly before. The new approach was evaluated using the Enron e-mail data set and was compared with a well-known method for the detection of e-mail addressing mistakes. The results show that the proposed approach is capable of detecting 87% of nonlegitimate recipients while incorrectly classifying only 0.5% of the legitimate recipients. These results outperform previous work, which reports a detection rate of 82% without reference to the false positive rate. The impact of international collaboration on research performance has been extensively explored over the past two decades. Most research, however, focuses on quantity and citation-based indicators. 
Using the turnover of keywords, this study develops an integrative approach, tracking and visualizing the shift of the research stream, and tests it within the context of U.S.-China collaboration in nanotechnology. The results show evidence that new research streams among Chinese researchers emerge when there is U.S.-China collaboration. We also find that the triggered research stream diffused further via extended coauthorship. Policy implications for science and technology development and resource allocation in the United States and China are discussed. Censuses are one of the most relevant types of statistical data, allowing analyses of the population in terms of demography, economy, sociology, and culture. For fine-grained analysis, census agencies publish census microdata that consist of a sample of individual records of the census containing detailed anonymous individual information. Working with microdata from different censuses and doing comparative studies are currently difficult tasks due to the diversity of formats and granularities. In this article, we show that novel data processing techniques can be applied to make census microdata interoperable and easy to access and combine. In fact, we demonstrate how Linked Open Data principles, a set of techniques to publish and make connections of (semi-)structured data on the web, can be fruitfully applied to census microdata. We present a step-by-step process to achieve this goal and we study, in theory and practice, two real case studies: the 2001 Spanish census and a general framework for Integrated Public Use Microdata Series (IPUMS-I). Recent studies of authorship attribution have used machine-learning methods including regularized multinomial logistic regression, neural nets, support vector machines, and the nearest shrunken centroid classifier to identify likely authors of disputed texts. 
These methods are all limited by an inability to perform open-set classification and to account for text and corpus size. We propose a customized Bayesian logit-normal-beta-binomial classification model for supervised authorship attribution. The model is based on the beta-binomial distribution with an explicit inverse relationship between extra-binomial variation and text size. The model internally estimates the relationship of extra-binomial variation to text size, and uses Markov chain Monte Carlo (MCMC) to produce distributions of posterior authorship probabilities instead of point estimates. We illustrate the method by training the machine-learning methods as well as the open-set Bayesian classifier on undisputed papers of The Federalist, and testing the method on documents historically attributed to Alexander Hamilton, John Jay, and James Madison. The Bayesian classifier was the best classifier of these texts. This study explores the ways adolescents create information collaboratively in the digital environment. In spite of the current widespread practice of information creation by young people, little research exists to illuminate how youth are engaged in creative information behavior or how they make participatory contributions to the changing information world. The purposefully selected sample includes teenagers who actively produce and share information projects, such as online school magazines, an information-sharing website in Wiki, and a digital media library, using Scratch, a graphical programming language developed by the MIT Media Lab. Qualitative data were collected through group and individual interviews informed by Dervin's Sense-Making Methodology. The data analysis technique included directed qualitative content analysis with Atlas.ti. 
Findings reveal the process of information creation, including content development, organization, and presentation of information, as well as noticeable patterns by youth such as visualizing, remixing, tinkering, and gaining a sense of empowerment. This study extends our knowledge of the creative aspects of information behavior. In organizations, the amount of attention that user-generated knowledge receives in knowledge management systems (KMSs) may not reflect its potential for benefiting organizational activities in terms of accelerating innovation and product development. To optimize the utilization of knowledge in organizations, it is crucial to identify factors that influence knowledge popularity. From a network perspective, this study proposes a model to evaluate knowledge popularity by investigating 2 attributes of contextual information (i.e., authors and tags) that are embedded in a heterogeneous knowledge network, and how they interact to impact knowledge popularity. Objective data obtained through the interaction history of a KMS in a global telecommunication company were applied to test the hypotheses. This paper contributes to the extant literature on knowledge popularity by identifying contextual attributes of knowledge, and empirically tests the impact of their interactions on knowledge popularity. In this article, we use innovative full-text citation analysis along with supervised topic modeling and network-analysis algorithms to enhance classical bibliometric analysis and publication/author/venue ranking. By utilizing citation contexts extracted from a large number of full-text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author-contributed keyword. We then used publication/citation topic distributions to generate a citation graph with vertex prior and edge transition probability distributions. 
The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. To evaluate this work, we sampled 104 topics (labeled with keywords) in review papers. The cited publications of each review paper are assumed to be "important publications" for the target topic (keyword), and we use these cited publications to validate our topic-ranking result and to compare different publication-ranking lists. Evaluation results show that full-text citation and publication content prior topic distribution, along with the classical PageRank algorithm, can significantly enhance bibliometric analysis and scientific publication ranking performance, compared with term frequency-inverse document frequency (tf-idf), language model, BM25, PageRank, and PageRank + language model (p < .001), for academic information retrieval (IR) systems. Polarity classification is one of the main tasks related to the opinion mining and sentiment analysis fields. The aim of this task is to classify opinions as positive or negative. There are two main approaches to carrying out polarity classification: machine learning and semantic orientation based on the integration of knowledge resources. In this study, we propose to combine both approaches using a voting system based on the majority rule. In this way, we attempt to improve the polarity classification of two parallel corpora: the opinion corpus for Arabic (OCA) and the English version of the OCA (EVOCA). Several experiments have been performed to check the feasibility of the proposed method. The results show that the experiment that took into account both approaches in the voting system obtained the best performance. Moreover, it is also shown that the proposed method slightly improves on the best results obtained using machine learning approaches alone over the OCA and the EVOCA separately. 
Therefore, we can conclude that the approach proposed here might be considered a good strategy for polarity detection when we work with bilingual parallel corpora. This study aims to identify the categorical characteristics and usage patterns of the most popular image tags in Flickr. The "metadata usage ratio" is introduced as a means of assessing the usage of a popular tag as metadata. We also compare how popular tags are used as image tags or metadata in the Flickr general collection and the Library of Congress's photostream (LCP), also in Flickr. The Flickr popular tags in the list overall are categorically stable, and the changes that do appear reflect Flickr users' evolving technology-driven cultural experience. The popular tags in Flickr had a high number of generic objects and specific locations-related tags and were rarely at the abstract level. Conversely, the popular tags in the LCP fall more into the specific objects and time categories. Flickr users copied the Library of Congress-supplied metadata that related to specific objects or events and standard bibliographic information (e.g., author, format, time references) as popular tags in the LCP. Those popular tags related to generic objects and events showed a high metadata usage ratio, while those related to specific locations and objects showed a low image metadata usage ratio. Popular tags in Flickr appeared less frequently as image metadata when describing specific objects than specific times and locations for historical images in Flickr LCP collections. Understanding how people contribute image tags or image metadata in Flickr helps determine what users need to describe and query images, and could help improve image browsing and retrieval. Based on the complete set of firm data for Sweden (N = 1,187,421; November 2011), we analyze the mutual information among the geographical, technological, and organizational distributions in terms of synergies at regional and national levels.
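The mutual information among three distributions can be computed from the entropies of the marginal, pairwise, and triple distributions, T = H_G + H_T + H_O - H_GT - H_GO - H_TO + H_GTO, where a negative value indicates synergy. The tiny firm table below is invented for illustration; this is a sketch of the measure, not the authors' pipeline.

```python
# Sketch: triple mutual information (triple-helix synergy indicator)
# computed from co-occurrence counts of (geography, technology, organization).
from collections import Counter
from math import log2

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def triple_mutual_information(records):
    """records: iterable of (geo, tech, org) tuples, one per firm."""
    records = list(records)
    g = Counter(r[0] for r in records)
    t = Counter(r[1] for r in records)
    o = Counter(r[2] for r in records)
    gt = Counter((r[0], r[1]) for r in records)
    go = Counter((r[0], r[2]) for r in records)
    to = Counter((r[1], r[2]) for r in records)
    gto = Counter(records)
    return (entropy(g) + entropy(t) + entropy(o)
            - entropy(gt) - entropy(go) - entropy(to) + entropy(gto))

# Invented firm records: (region, technology class, organization size)
firms = [("Stockholm", "ICT", "large"), ("Stockholm", "ICT", "small"),
         ("Malmo", "bio", "small"), ("Malmo", "bio", "large")]
T = triple_mutual_information(firms)
```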
Using this measure, the interaction among three dimensions can become negative and thus indicate a net export of uncertainty by a system or, in other words, synergy in how knowledge functions are distributed over the carriers. Aggregation at the regional level (NUTS3) of the data organized at the municipal level (NUTS5) shows that 48.5% of the regional synergy is provided by the 3 metropolitan regions of Stockholm, Gothenburg, and Malmo/Lund. Sweden can be considered a centralized and hierarchically organized system. Our results accord with other statistics, but this triple helix indicator measures synergy more specifically and quantitatively. The analysis also provides us with validation for using this measure in previous studies of more regionalized systems of innovation (such as Hungary and Norway). This article analyzes the context of citations within the full text of research articles. It studies articles published in a single journal, the Journal of Informetrics (JOI), in the first year the journal was published, 2007. The analysis classified the citations into in- and out-disciplinary content and looked at their use within the articles' sections such as introduction, literature review, methodology, findings, discussion, and conclusions. In addition, it took into account the age of cited articles. A thematic analysis of these citations was performed in order to identify the evolution of topics within the articles' sections and the journal's content. A matrix describing the relationships between the citations' use and their in- and out-disciplinary focus within the articles' sections is presented. The findings show that an analysis of citations based on their in- and out-disciplinary orientation within the context of the articles' sections can be an indication of the manner by which cross-disciplinary science works, and reveals the connections between the issues, methods, analysis, and conclusions coming from different research disciplines.
Evaluation is a vital research area in the digital library domain, with a growing literature of conference and journal articles. We explore the directions and the evolution of evaluation research for the period 2001-2011 by studying the evaluation initiatives presented at 2 main conferences of the digital library domain, namely the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers (ACM/IEEE) Joint Conference on Digital Libraries (JCDL), and the European Conference on Digital Libraries (ECDL; since 2011 renamed to the International Conference on Theory and Practice of Digital Libraries [TPDL]). The literature is annotated using a domain ontology, named DiLEO, which defines explicitly the main concepts of the digital library evaluation domain and their correlations. The ontology instances constitute a semantic network that enables the uniform and formal representation of the critical evaluation constructs in both conferences, untangles their associations, and supports the study of their evolution. We discuss interesting patterns in the evaluation practices as well as in the research foci of the 2 venues, and outline current research trends and areas for further research. This article presents a user study showing the effectiveness of a link-based, virtual integration infrastructure that gives users access to relevant online resources, empowering them to design an information-seeking path that is specifically relevant to their context. IntegraL provides a lightweight approach to improve and augment search functionality by dynamically generating context-focused "anchors" for recognized elements of interest generated by library services. This article includes a description of how IntegraL's design supports users' information-seeking behavior.
A full user study with both objective and subjective measures of IntegraL, together with hypothesis testing regarding IntegraL's effect on the user's information-seeking experience, is described along with data analysis, implications arising from this kind of virtual integration, and possible future directions. In this article, we report on the status of graphs in 21 scientific agricultural journals indexed in Thomson Reuters' Web of Knowledge. We analyze the authors' use of graphs in this context in relation to the quality of these journals as measured by their 2-year impact factors. We note a substantial variability in the use of graphs in this context: For one journal, 100% of the papers include graphs, whereas for others only about 50% of them include graphs. We also show that higher impact agricultural journals publish more papers with graphs and that there are more graphs in these papers than in those in journals with lower impact factors (r = +0.40). This article presents a new Parsimonious Citer-Based Measure for assessing the quality of academic papers. This new measure is parsimonious as it looks for the smallest set of citing authors (citers) who have read a certain paper. The Parsimonious Citer-Based Measure aims to address potential distortion in the values of existing citer-based measures. These distortions occur because of various factors, such as the practice of hyperauthorship. This new measure is empirically compared with existing measures, such as the number of citers and the number of citations, in the field of artificial intelligence (AI). The results show that the new measure is highly correlated with those two measures. However, the new measure is more robust against citation manipulations and better differentiates between prominent and nonprominent AI researchers than the above-mentioned measures. Are computer science papers extended after they are published?
We have surveyed 200 computer science publications (100 journal articles and 100 conference papers), using self-citations to identify potential and actual continuations. We are interested in determining the proportion of papers that do indeed continue, how and when the continuation takes place, and whether any distinctions are found between the journal and conference populations. Despite the implicit assumption of a research line behind each paper, manifest in the ubiquitous future research notes that close many of them, we find that more than 70% of the papers are never continued. Personal information behavior has been studied within a large number of different contexts; however, individuals show different information-related competencies in their professional, academic, or daily life contexts. The literature suggests that if a person is information competent in one context, then he or she will be competent in the rest of the contexts of action. But this is only true for a basic level of information competencies. This article reports results from 24 interviews conducted with mature e-learning students and suggests that at a more advanced level of information competencies, some competencies that appear in one context are not manifested in another context. Several factors have been found to be related to information competencies transfer between contexts. Attitude is a key factor, and feelings regarding Internet use are another critical factor. Specifically in learning environments, the results suggest that canned content and planned learning strategies can discourage a proactive attitude and enthusiasm for information and communication technologies, and therefore the acquisition of information-related competencies. Understanding the differences and common patterns between these contexts may be useful for designing better information systems, services, and instruction.
Communication and coordination are considered essential components of successful collaborations, and provision of awareness is a highly valuable feature of a collaborative information seeking (CIS) system. In this article, we investigate how providing different kinds of awareness support affects people's coordination behavior in a CIS task, as reflected by the communication that took place between them. We describe a laboratory study with 84 participants in 42 pairs using an experimental CIS system. These participants were brought to the laboratory for two separate sessions and given two exploratory search tasks. They were randomly assigned to one of the three systems, defined by the kind of awareness support provided. We analyzed the messages exchanged between the participants of each team by coding them for their coordination attributes. With this coding, we show how the participants employed different kinds of coordination during the study. Using qualitative and quantitative analyses, we demonstrate that the teams with no awareness, or with only personal awareness support, needed to spend more time and effort on coordination than those with proper group awareness support. We argue that appropriate and adequate awareness support is essential in a CIS system for reducing coordination costs and keeping the collaborators well coordinated for a productive collaboration. The findings have implications for system designers as well as cognitive scientists and CIS researchers in general. Knowledge contributing has long been identified as a bottleneck in knowledge management, since individuals tend to believe that their contributions would not be worth the effort, given high levels of expectation to receive some value in return. Self-perception theory posits that individuals come to know their own internal beliefs by inferring them partially from observations of their own overt behavior.
Building on self-perception theory and adhering to the principle that the relationship between behavior and beliefs is one of mutual influence, we develop a research model to explore the behavioral transfer from knowledge seeking to knowledge contributing and consider the mediating effect of intrinsic motivation. Data collected from 430 users of Web 2.0 applications were used to test the model. The mediating role of intrinsic motivation between knowledge seeking and knowledge contributing is confirmed. These findings and their implications for theory and practice are discussed. Designing effective consumer health information systems requires deep understanding of the context in which the systems are being used. However, due to the elusive nature of the concept of context, few studies have made it a focus of examination. To fill this gap, we studied the context of consumer health information searching by analyzing questions posted on a social question and answer site: Yahoo! Answers. Based on the analysis, a model of context was developed. The model consists of 5 layers: demographic, cognitive, affective, situational, and social and environmental. The demographic layer contains demographic factors of the person of concern; the cognitive layer contains factors related to the current search task (specifically, topics of interest and information goals) and users' cognitive ability to articulate their needs. The affective layer contains different affective motivations and intentions behind the search. The situational layer contains users' perceptions of the current health condition and where the person is in the illness trajectory. The social and environmental layer contains users' social roles, social norms, and various information channels. Several novel system functions, including faceted search and layered presentation of results, are proposed based on the model to help contextualize and improve users' interactions with health information systems. 
This paper develops an understanding of creativity to meet the requirements of the decision of the Supreme Court of the United States in Feist v. Rural (1991). The inclusion of creativity in originality, in a minimal degree of creativity, and in a creative spark below the level required for originality, is first established. Conditions for creativity are simultaneously derived. Clauses negatively implying creativity are then identified and considered. The clauses that imply creativity can be extensively correlated with conceptions of computability. The negative of creativity is then understood as an automatic mechanical or computational procedure, or a process so routine that it results in a highly routine product. Conversely, creativity invariantly involves a not-mechanical procedure. The not-mechanical is then populated by meaning, in accord with accepted distinctions, drawing on a range of discourses. Meaning is understood as a different level of analysis to the syntactic or mechanical and also as involving direct human engagement with meaning. As direct engagement with meaning, it can be connected to classic concepts of creativity, through the association of dissimilars. Creativity is finally understood as not-mechanical human activity above a certain level of routinicity. Creativity is then integrated with a minimal degree of creativity and with originality. The level of creativity required for a minimal degree is identified as intellectual. The combination of an intellectual level with a sufficient amount of creativity can be read from the exchange values connected with the product of creative activity. Humanly created bibliographic records and indexes are then possible correlates to, or constituents of, a minimal degree of creativity. A four-stage discriminatory process for determining originality is then specified. Finally, the strength and value of the argument are considered. Wikipedia is a repository of information freely available to those with Internet access.
Given its radical departure from previous encyclopedias, it is not surprising that it is controversial. Wikipedia is freely editable, leading to debates over accuracy and writing style. It has also included topics, especially in the area of popular culture, which some believe are not appropriate for a serious, comprehensive encyclopedia. In this article, I do not examine these arguments, but, rather, a different problem confronting Wikipedia. Through a case study of Cambodian history articles, I demonstrate how Wikipedia limits itself through a largely unconscious appropriation of the dominant discourse of representation surrounding its objects of inquiry. When article quality is examined, a distinct pattern emerges that can readily be matched to the dominant historiographical tradition of Cambodian history. As well as presenting this case study as a demonstration of the influence of dominant discursive narratives, I wish to contextualize this privileging of particular discourses within debates about the New World Information and Communication Order (NWICO) that emerged in the 1970s. I argue that the NWICO can be useful in our thinking about the relationship of information professionals to Wikipedia today. The semantic knowledge of Wikipedia has proved to be useful for many tasks, for example, named entity disambiguation. Among these applications, the task of identifying the word sense based on Wikipedia is a crucial component because the output of this component is often used in subsequent tasks. In this article, we present a two-stage framework (called TSDW) for word sense disambiguation using knowledge latent in Wikipedia. 
The disambiguation of a given phrase is applied through a two-stage disambiguation process: (a) The first-stage disambiguation explores the contextual semantic information, where the noisy information is pruned for better effectiveness and efficiency; and (b) the second-stage disambiguation explores the disambiguated phrases of high confidence from the first stage to achieve better redisambiguation decisions for the phrases that are difficult to disambiguate in the first stage. Moreover, existing studies have addressed the disambiguation problem for English text only. Considering the popular usage of Wikipedia in different languages, we study the performance of TSDW and the existing state-of-the-art approaches over both English and Traditional Chinese articles. The experimental results show that TSDW generalizes well to different semantic relatedness measures and text in different languages. More important, TSDW significantly outperforms the state-of-the-art approaches with both better effectiveness and efficiency. Bibliometric indicators are increasingly used in support of decisions about recruitment, career advancement, rewards, and selective funding for scientists. Given the importance of the applications, bibliometricians are obligated to carry out empirical testing of the robustness of the indicators, in simulations of real contexts. In this work, we compare the results of national-scale research assessments at the individual level, based on the following three different indexes: the h-index, the g-index, and fractional scientific strength (FSS), an indicator previously proposed by the authors. For each index, we construct and compare ranking lists of all Italian academic researchers working in the hard sciences during the period 2001-2005. The analysis quantifies the shifts in ranks that occur when researchers' productivity rankings by simple indicators such as the h- or g-indexes are compared with those by the more accurate FSS.
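For reference, the two simple indexes compared against FSS can be computed directly from a researcher's citation counts. The counts below are invented; FSS itself also requires cost and co-authorship data, so it is not sketched here.

```python
# Sketch: h-index and g-index from a list of per-paper citation counts.

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cites, start=1) if c >= i)

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

papers = [10, 8, 5, 4, 3, 0]       # invented citation counts
assert h_index(papers) == 4        # 4 papers with at least 4 citations
assert g_index(papers) == 5        # top 5 papers: 30 citations >= 25
```

Because the g-index credits the full citation mass of the top papers, two researchers with the same h can differ in g, which is one source of the rank shifts the study quantifies.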
A knowledge map of digital library (DL) research shows the semantic organization of DL research topics and also the evolution of the field. The research reported in this article aims to find the core topics and subtopics of DL research in order to build a knowledge map of the DL domain. The methodology comprises a four-step research process, and two knowledge organization methods (classification and thesaurus building) were used. A knowledge map covering 21 core topics and 1,015 subtopics of DL research was created and provides a systematic overview of DL research during the last two decades (1990-2010). We argue that the map can work as a knowledge platform to guide, evaluate, and improve the activities of DL research, education, and practices. Moreover, it can be transformed into a DL ontology for various applications. The research methodology can be used to map any human knowledge domain; it is a novel and scientific method for producing comprehensive and systematic knowledge maps based on literary warrant. Multisession successive information searches are common, but little research has focused on quantitative analysis. This article enhances our understanding of successive information searches by employing an experimental method to observe whether and how the behavioral characteristics of searchers statistically significantly changed over sessions. It focuses on a specific type of successive search called transmuting successive searches, in which searchers learn about and gradually refine their information problems during the course of the information search. The results show that searchers' behavioral characteristics indeed exhibit different patterns in different sessions. The identification of the behavioral characteristics can help information retrieval systems to detect stages or sessions of the information search process.
The findings also help validate a theoretical framework to explain successive searches and suggest system requirements for supporting the associated search behavior. The study is one of the first to not only test for statistical significance among research propositions concerning successive searches but to also apply the research principles of implicit relevance feedback to successive searches. Understanding the behavior and characteristics of web users is valuable when improving information dissemination, designing recommendation systems, and so on. In this work, we explore various methods of predicting the ratio of male viewers to female viewers on YouTube. First, we propose and examine two hypotheses relating to audience consistency and topic consistency. The former means that videos made by the same authors tend to have similar male-to-female audience ratios, whereas the latter means that videos with similar topics tend to have similar audience gender ratios. To predict the audience gender ratio before video publication, two features based on these two hypotheses and other features are used in multiple linear regression (MLR) and support vector regression (SVR). We find that these two features are the key indicators of audience gender, whereas other features, such as gender of the user and duration of the video, have limited relationships. Second, another method is explored to predict the audience gender ratio. Specifically, we use the early comments collected after video publication to predict the ratio via simple linear regression (SLR). The experiments indicate that this model can achieve better performance by using a few early comments. We also observe that the correlation between the number of early comments (cost) and the predictive accuracy (gain) follows the law of diminishing marginal utility.
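The SLR step can be sketched as ordinary least squares on a single predictor; the (x, y) pairs below are invented stand-ins for an early-comment feature and the final audience gender ratio.

```python
# Sketch: simple linear regression (one predictor) by least squares,
# e.g., predicting a video's audience gender ratio from early comments.

def fit_slr(xs, ys):
    """Return (slope, intercept) minimizing squared prediction error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# x: fraction of early commenters inferred male; y: final male audience ratio
x = [0.2, 0.4, 0.6, 0.8]
y = [0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_slr(x, y)
assert abs(slope - 1.0) < 1e-9 and abs(intercept - 0.05) < 1e-9
```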
We build the functions of these elements via curve fitting to find the appropriate number of early comments (approximately 250) that can achieve maximum gain at minimum cost. One way of evaluating individual scientists is the determination of the number of highly cited publications, where the threshold is given by a large reference set. It is shown that this indicator behaves in a counterintuitive way, leading to inconsistencies in the ranking of different scientists. Research demonstrates that information disseminated and circulated in online forums may have a significant impact on investors and on the securities market, so an understanding of that environment is critical. This article reports on an analysis of information sharing and use in three investment discussion forums. Threads containing 1,787 posts were coded using previously developed typologies for Internet-based discussion. Citations were studied in their context and sources were categorized into types. A high degree of collaborative information behavior was identified, but the study also reveals some areas of information use that may compromise investors' decision making, including heavy reliance on personal sources of information and other sources that vary greatly in trustworthiness, including commercially sponsored information, blogs, and investor guru sites. These challenges are discussed and recommendations are made for improving services to investors. Questions for additional research are also identified. Increasingly, e-books are becoming alternatives to print books in academic libraries, thus providing opportunities to assess how well the use of e-books meets the requirements of academics. This study uses the task-technology fit (TTF) model to explore the interrelationships of e-books, the affordances offered by smart readers, the information needs of academics, and the "fit" of technology to tasks as well as performance.
We propose that the adoption of e-books will be dependent on how academics perceive the fit of this new medium to the tasks they undertake as well as what added-value functionality is delivered by the information technology that delivers the content. The study used content analysis and an online survey, administered to the faculty in Medicine, Science and Engineering at the University of New South Wales, to identify the attributes of a TTF construct of e-books in academic settings. Using exploratory factor analysis, preliminary findings confirmed annotation, navigation, and output as the core dimensions of the TTF construct. The results of confirmatory factor analysis using partial least squares path modeling supported the overall TTF model in reflecting significant positive impact of task, technology, and individual characteristics on TTF for e-books in academic settings; it also confirmed significant positive impact of TTF on individuals' performance and use, and impact of using e-books on individual performance. Our research makes two contributions: the development of an e-book TTF construct and the testing of that construct in a model validating the efficacy of the TTF framework in measuring perceived fit of e-books to academic tasks. This study investigates the manifestation and utility of a generic function-based topical relevance typology adapted to the subject domain of clinical medicine. By specifying the functional role of a given piece of relevant information in the overall structure of a topic, the proposed typology provides a generic framework for integrating different pieces of clinical evidence and a multifaceted view of a clinical problem. In medical problem solving structured knowledge plays a key role. 
The typology provides the conceptual basis for integrating and structuring knowledge; it incorporates and goes beyond existing clinical schemes (such as PICO and illness script) and offers extra assistance for physicians as well as lay users (such as patients and caregivers) to manage the vast amount of diversified evidence, to maintain a structured view of the patient problem at hand, and ultimately to make well-grounded clinical choices. Developed as a generic topical framework across topics and domains, the typology proved useful for clinical medicine once extended with domain-specific definitions and relationships. This article reports the findings of using the adapted and extended typology in the analysis of 26 clinical questions and their evidence-based answers. The article concludes with potential applications of the typology to improve clinical information seeking, organizing, and processing. Korea, along with Asia at large, is producing more and more valuable academic materials. Furthermore, the demand for academic materials produced in non-Western societies is increasing among English-speaking users. In order to search among such material, users rely on keywords such as author names. However, Asian nations such as Korea and China have markedly different methods of writing personal names from Western naming traditions. Among these differences are name components, structure, writing customs, and distribution of surnames. These differences influence the Anglicization of Asian academic researchers' names, often leading to them being written in various fashions, unlike Western personal names. These inconsistent formats can often lead to difficulties in searching and finding academic materials for Western users unfamiliar with Korean and Asian personal names. This article presents methods for precisely understanding and categorizing Korean personal names in order to make academic materials by Korean authors easier to find for Westerners.
As such, this article discusses characteristics particular to Korean personal names and furthermore analyzes how the personal names of Korean academic researchers are currently being written in English. Using the CD-ROM version of the Science Citation Index 2010 (N = 3,705 journals), we study the (combined) effects of (a) fractional counting on the impact factor (IF) and (b) transformation of the skewed citation distributions into a distribution of 100 percentiles and six percentile rank classes (top-1%, top-5%, etc.). Do these approaches lead to field-normalized impact measures for journals? In addition to the 2-year IF (IF2), we consider the 5-year IF (IF5), the respective numerators of these IFs, and the number of Total Cites, counted both as integers and fractionally. These various indicators are tested against the hypothesis that the classification of journals into 11 broad fields by PatentBoard/NSF (National Science Foundation) provides statistically significant between-field effects. Using fractional counting, the between-field variance is reduced by 91.7% in the case of IF5, and by 79.2% in the case of IF2. However, the differences in citation counts are not significantly affected by fractional counting. These results accord with previous studies, but the longer citation window of a fractionally counted IF5 can lead to significant improvement in the normalization across fields. Primary health care practitioners routinely search for information within electronic knowledge resources. We proposed four levels of outcomes of information-seeking: situational relevance, cognitive impact, information use, and patient health outcomes. Our objective was to produce clinical vignettes for describing and testing these levels. We conducted a mixed methods study combining a quantitative longitudinal study and a qualitative multiple case study. Participants were 10 nurses, 10 medical residents, and 10 pharmacists.
They had access to an online resource, and did 793 searches for treatment recommendations. Using the Information Assessment Method (IAM), participants rated their searches for each of the four levels. Rated searches were examined in interviews guided by log reports and a think-aloud protocol. Cases were defined as clearly described searches where clinical information was used for a specific patient. For each case, interviewees described the four levels of outcomes. Quantitative and qualitative data were merged into clinical vignettes. We produced 130 clinical vignettes. Specifically, 46 vignettes (35.4%) corresponded to clinical situations where information use was associated with one or more than one type of positive patient health outcome: increased patient knowledge (n = 28), avoidance of unnecessary or inappropriate intervention (n = 25), prevention of disease or health deterioration (n = 9), health improvement (n = 6), and increased patient satisfaction (n = 3). Results suggested information use was associated with perceived benefits for patients. This may encourage clinicians to search for information more often when they feel the need. Results supported the four proposed levels of outcomes, which can be transferable to other information-seeking contexts. We present a model that describes which fraction of the literature on a certain topic we will find when we use n (n = 1, 2,...) databases. It is a generalization of the theory of discovering usability problems. We prove that, in all practical cases, this fraction is a concave function of n, the number of databases used, thereby explaining some graphs that exist in the literature. We also study limiting features of this fraction for very large n, and we characterize the case in which we find all the literature on a certain topic for n high enough. This large-scale international study measures the attitudes of more than 4,000 researchers toward peer review.
In 2009, 40,000 authors of research papers from across the globe were invited to complete an online survey. Researchers were asked to rate a number of general statements about peer review, and then a subset of respondents, who had themselves peer reviewed, rated a series of statements concerning their experience of peer review. The study found that the peer review process is highly regarded by the vast majority of researchers and considered by most to be essential to the communication of scholarly research. Nine out of 10 authors believe that peer review improved the last paper they published. Double-blind peer review is considered the most effective form of peer review. Nearly three-quarters of researchers think that technological advances are making peer review more effective. Most researchers believe that although peer review should identify fraud, it is very difficult for it to do so. Reviewers are committed to conducting peer review in the future and believe that simple practical steps, such as training new reviewers, would further improve peer review.

Team science and collaboration have become crucial to addressing key research questions confronting society. Institutions that are spread across multiple geographic locations face additional challenges. To better understand the nature of cross-campus collaboration within a single institution and the effects of institutional efforts to spark collaboration, we conducted a case study of collaboration at Cornell University using scientometric and network analyses. Results suggest that cross-campus collaboration is increasingly common, but is accounted for primarily by a relatively small number of departments and individual researchers. Specific researchers involved in many collaborative projects are identified, and their unique characteristics are described.
Institutional efforts, such as seed grants and topical retreats, have some effect for researchers who are central in the collaboration network, but were less clearly effective for others.

In this article, we present an in-home observation and in-context research study investigating how 38 adolescents aged 14-17 search on the Internet. We present the search trends adolescents display and develop a framework of search roles that these trends help define. We compare these trends and roles to similar trends and roles found in prior work with children ages 7, 9, and 11. We use these comparisons to make recommendations to adult stakeholders such as researchers, designers, and information literacy educators about the best ways to design search tools for children and adolescents, as well as how to use the framework of searching roles to find better methods of educating youth searchers. Major findings include the seven roles of adolescent searchers, and evidence that adolescents are social in their computer use, have a greater knowledge of sources than younger children, and that adolescents are less frustrated by searching tasks than younger children.

Traditional information retrieval (IR) systems show significant limitations on returning relevant documents that satisfy the user's information needs. In particular, to answer geographic and temporal user queries, the IR task becomes a nonstraightforward process where the available geographic and temporal information is often unstructured. In this article, we propose a geotemporal search approach that consists of modeling and exploiting geographic and temporal query context evidence that refers to implicit multivarying geographic and temporal intents behind the query. Modeling geographic and temporal query contexts is based on extracting and ranking geographic and temporal keywords found in pseudo-relevant feedback (PRF) documents for a given query.
Our geotemporal search approach is based on exploiting the geographic and temporal query contexts separately into a probabilistic ranking model and jointly into a proximity ranking model. Our hypothesis is based on the concept that geographic and temporal expressions tend to co-occur within the document where the closer they are in the document, the more relevant the document is. The geographic, temporal, and proximity scores are then combined according to a linear combination formula. An extensive experimental evaluation conducted on a portion of the New York Times news collection and the TREC 2004 robust retrieval track collection shows that our geotemporal approach significantly outperforms a well-known baseline search and the best-known geotemporal search approaches in the domain. Finally, an in-depth analysis shows a positive correlation between the geographic and temporal query sensitivity and the retrieval performance. Also, we find that geotemporal distance generally has a positive impact on retrieval performance.

Online videos provide a novel, and often interactive, platform for the popularization of science. One successful collection is hosted on the TED (Technology, Entertainment, Design) website. This study uses a range of bibliometric (citation) and webometric (usage and bookmarking) indicators to examine TED videos in order to provide insights into the type and scope of their impact. The results suggest that TED Talks impact primarily the public sphere, with about three-quarters of a billion total views, rather than the academic realm. Differences were found among broad disciplinary areas, with art and design videos having generally lower levels of impact but science and technology videos generating otherwise average impact for TED. Many of the metrics were only loosely related, but there was a general consensus about the most popular videos as measured through views or comments on YouTube and the TED site.
Moreover, most videos were found in at least one online syllabus and videos in online syllabi tended to be more viewed, discussed, and blogged. Less-liked videos generated more discussion, although this may be because they are more controversial. Science and technology videos presented by academics were more liked than those by nonacademics, showing that academics are not disadvantaged in this new media environment.

The paper discusses and analyzes the notion of information quality in terms of a pragmatic philosophy of language. It is argued that the notion of information quality is of great importance, and needs to be situated better within a sound philosophy of information to help frame information quality in a broader conceptual light. It is found that much research on information quality conceptualizes information quality as either an inherent property of the information itself, or as an individual mental construct of the users. The notion of information quality is often not situated within a philosophy of information. This paper outlines a conceptual framework in which information is regarded as a semiotic sign, and extends that notion with Paul Grice's pragmatic philosophy of language to provide a conversational notion of information quality that is contextual and tied to the notion of meaning.

In this article, we investigate what sorts of information humans request about geographical objects of the same type. For example, Edinburgh Castle and Bodiam Castle are two objects of the same type: castle. The question is whether specific information is requested for the object type castle and how this information differs for objects of other types (e.g., church, museum, or lake). We aim to answer this question using an online survey. In the survey, we showed 184 participants 200 images pertaining to urban and rural objects and asked them to write questions for which they would like to know the answers when seeing those objects.
Our analysis of the 6,169 questions collected in the survey shows that humans have shared ideas of what to ask about geographical objects. When the object types resemble each other (e.g., church and temple), the requested information is similar for the objects of these types. Otherwise, the information is specific to an object type. Our results may be very useful in guiding Natural Language Processing tasks involving automatic generation of templates for image descriptions and their assessment, as well as image indexing and organization.

The Open Access (OA) movement, which postulates gratis and unrestricted online access to publicly funded research findings, has significantly gained momentum in recent years. The two ways of achieving OA are self-archiving of scientific work by the authors (Green OA) and publishing in OA journals (Gold OA). But there is still no consensus on which model in particular should be supported. The aim of this simulation study is to discover mechanisms and predict developments that may lead to specific outcomes of possible market transformation scenarios. It contributes to theories related to OA by substantiating the argument of a citation advantage of OA articles and by visualizing the mechanisms of a journal system collapsing in the long term due to the continuation of the serials crisis. The practical contribution of this research stems from the integration of all market players: Decisions regarding potential financial support of OA models can be aligned with our findings, as well as the decision of a publisher to migrate his/her journals to Gold OA. Our results indicate that for scholarly communication in general, a transition to Green OA combined with a certain level of subscription-based publishing and a migration of a few top journals is the most beneficial development.
Specialized medical ontologies and terminologies, such as SNOMED CT and the Unified Medical Language System (UMLS), have been successfully leveraged in medical information systems to provide a standard web-accessible medium for interoperability, access, and reuse. However, these clinically oriented terminologies and ontologies cannot provide sufficient support when integrated into consumer-oriented applications, because these applications must understand both technical and lay vocabulary, and the latter is not part of these specialized terminologies and ontologies. In this article, we propose a two-step approach for building consumer health terminologies from text: (1) automatic extraction of definitions from consumer-oriented articles and web documents, which reflects language in use, rather than relying solely on dictionaries, and (2) learning to map definitions expressed in natural language to terminological knowledge by inducing a syntactic-semantic grammar rather than using hand-written patterns or grammars. We present quantitative and qualitative evaluations of our two-step approach, which show that our framework could be used to induce consumer health terminologies from text.

With the increasing number and diversity of search tools available, interest in the evaluation of search systems, particularly from a user perspective, has grown among researchers. More researchers are designing and evaluating interactive information retrieval (IIR) systems and beginning to innovate in evaluation methods. Maturation of a research specialty relies on the ability to replicate research, provide standards for measurement and analysis, and understand past endeavors. This article presents a historical overview of 40 years of IIR evaluation studies using the method of systematic review. A total of 2,791 journal and conference units were manually examined and 127 articles were selected for analysis in this study, based on predefined inclusion and exclusion criteria.
These articles were systematically coded using features such as author, publication date, sources and references, and properties of the research method used in the articles, such as number of subjects, tasks, corpora, and measures. Results include data describing the growth of IIR studies over time, the most frequently occurring and cited authors and sources, and the most common types of corpora and measures used. An additional product of this research is a bibliography of IIR evaluation research that can be used by students, teachers, and those new to the area. To the authors' knowledge, this is the first historical, systematic characterization of the IIR evaluation literature, including the documentation of methods and measures used by researchers in this specialty.

The goal of this study was to propose novel cyberlearning resource-based scientific referential metadata for an assortment of publications and scientific topics, in order to enhance the learning experiences of students and scholars in a cyberinfrastructure-enabled learning environment. By using information retrieval and meta-search approaches, different types of referential metadata, such as related Wikipedia pages, data sets, source code, video lectures, presentation slides, and (online) tutorials for scientific publications and scientific topics will be automatically retrieved, associated, and ranked. In order to test our method of automatic cyberlearning referential metadata generation, we designed a user experiment to validate the quality of the metadata for each scientific keyword and publication, as well as the resource-ranking algorithm. Evaluation results show that the cyberlearning referential metadata retrieved via meta-search and statistical relevance ranking can help students better understand the essence of scientific keywords and publications.
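Statistical relevance ranking of meta-search results, as described in the abstract above, can be illustrated with a minimal sketch. The function names (`rank_resources`, `cosine`) and the bag-of-words cosine scoring are illustrative assumptions, not the study's actual algorithm:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_resources(topic_text, resources):
    """Rank candidate referential resources (e.g., snippets returned by a
    meta-search for a scientific keyword) by similarity to the topic text.
    `resources` is a list of (name, description) pairs."""
    topic = Counter(topic_text.lower().split())
    scored = [(cosine(topic, Counter(text.lower().split())), name)
              for name, text in resources]
    return [name for score, name in sorted(scored, reverse=True)]
```

In practice, the retrieved candidates (Wikipedia pages, slides, tutorials) would each contribute a description field, and the top-ranked items would be attached to the publication or keyword as referential metadata.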
In this article, we show how the Eigenfactor score, originally designed for ranking scholarly journals, can be adapted to rank the scholarly output of authors, institutions, and countries based on author-level citation data. Using the methods described in this article, we provide Eigenfactor rankings for 84,808 disambiguated authors of 240,804 papers in the Social Science Research Network (SSRN), a preprint and postprint archive devoted to the rapid dissemination of scholarly research in the social sciences and humanities. As an additive metric, the Eigenfactor scores are readily computed for collectives such as departments or institutions as well. We show that a collective's Eigenfactor score can be computed either by summing the Eigenfactor scores of its members or by working directly with a collective-level cross-citation matrix. We provide Eigenfactor rankings for institutions and countries in the SSRN repository. With a network-wide comparison of Eigenfactor scores and download tallies, we demonstrate that Eigenfactor scores provide information that is both different from and complementary to that provided by download counts. We see author-level ranking as one filter for navigating the scholarly literature, and note that such rankings generate incentives for more open scholarship, because authors are rewarded for making their work available to the community as early as possible and before formal publication.

Journals in the Information Science & Library Science category of Journal Citation Reports (JCR) were compared using both bibliometric and bibliographic features. Data collected covered journal impact factor (JIF), number of issues per year, number of authors per article, longevity, editorial board membership, frequency of publication, number of databases indexing the journal, number of aggregators providing full-text access, country of publication, JCR categories, Dewey decimal classification, and journal statement of scope.
Three features significantly correlated with JIF: the number of editorial board members and the number of JCR categories in which a journal is listed correlated positively, while journal longevity correlated negatively with JIF. Coword analysis of journal descriptions provided a proximity clustering of journals, which differed considerably from the clusters based on editorial board membership. Finally, a multiple linear regression model was built to predict the JIF based on all the collected bibliographic features.

The central issue in language model estimation is smoothing, a technique for avoiding the zero-probability estimation problem and overcoming data sparsity. There are three representative smoothing methods: the Jelinek-Mercer (JM) method, Bayesian smoothing using Dirichlet priors (Dir), and absolute discounting (Dis), whose parameters are usually estimated empirically. Previous research in information retrieval (IR) on smoothing parameter estimation tends to select a single value from the candidate values for the whole collection, but this may not be appropriate for all the queries. The effectiveness of all the candidate values should be considered to improve the ranking performance. Recently, learning to rank has become an effective approach to optimizing ranking accuracy by merging existing retrieval methods. In this article, the smoothing methods for language modeling in information retrieval (LMIR) with different parameters are treated as different retrieval methods, and a learning-to-rank approach is presented that learns a ranking model based on the features extracted by the smoothing methods. In the process of learning, the effectiveness of all the optional smoothing parameters is taken into account for all queries. The experimental results on the Learning to Rank for Information Retrieval (LETOR) 3.0 and 4.0 data sets show that our approach is effective in improving the performance of LMIR.
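The three smoothing schemes named above have standard closed forms. As a minimal sketch (the parameter defaults are common conventions, not the article's learned settings):

```python
from collections import Counter

def smoothed_prob(word, doc_tokens, collection_tokens, method="dirichlet",
                  lam=0.5, mu=2000.0, delta=0.7):
    """Illustrative document language-model smoothing:
    Jelinek-Mercer (jm), Dirichlet prior (dirichlet), absolute discounting (dis)."""
    d = Counter(doc_tokens)              # term counts in the document
    c = Counter(collection_tokens)       # term counts in the collection
    p_c = c[word] / len(collection_tokens)   # collection (background) model
    dlen = len(doc_tokens)
    if method == "jm":
        # linear interpolation of the document model with the background model
        return (1 - lam) * d[word] / dlen + lam * p_c
    if method == "dirichlet":
        # Bayesian smoothing with a Dirichlet prior of mass mu
        return (d[word] + mu * p_c) / (dlen + mu)
    if method == "dis":
        # subtract delta from each seen count; redistribute mass to background
        unique = len(d)                  # number of distinct terms in the doc
        return max(d[word] - delta, 0) / dlen + delta * unique / dlen * p_c
    raise ValueError(method)
```

Each variant yields a proper probability distribution over the collection vocabulary; in the learning-to-rank setup described above, scores computed with different parameter values (several `lam`, `mu`, or `delta` settings) would each become one feature of a query-document pair.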
This article describes patterns of scientific growth that emerge in response to major research accomplishments in instrumentation and the discovery of new matter. Using two Nobel Prize-winning contributions, the scanning tunneling microscope (STM) and the discovery of Buckminsterfullerenes (BUF), we examine the growth of follow-up research via citation networks at the author and subdiscipline level. A longitudinal network analysis suggests that structure, cohesiveness, and interdisciplinarity vary considerably with the type of breakthrough and over time. Scientific progress appears to be multifaceted, including not only theoretical advances but also the discovery of new instrumentation and new matter. In addition, we argue that scientific growth does not necessarily lead to the formation of new specialties or new subdisciplines. Rather, we observe the emergence of a research community formed at the intersection of subdisciplinary boundaries.

We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model.
The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.

Fractional scoring has been proposed to avoid inconsistencies in the attribution of publications to percentile rank classes. Uncertainties and ambiguities in the evaluation of percentile ranks can be demonstrated most easily with small data sets. But for larger data sets, the often large number of papers with the same citation count leads to the same uncertainties and ambiguities. That these can be avoided by fractional scoring is demonstrated with four different empirical data sets of several thousand publications each, which are assigned to six percentile rank classes. Only by utilizing fractional scoring does the total score of all papers exactly reproduce the theoretical value in each case.

Retracting published scientific articles is increasingly common. Retraction is a self-correction mechanism of the scientific community to maintain and safeguard the integrity of scientific literature. However, a retracted article may pose a profound and long-lasting threat to the credibility of the literature. New articles may unknowingly build their work on false claims made in retracted articles. Such dependencies on retracted articles may become implicit and indirect. Consequently, it becomes increasingly important to detect implicit and indirect threats. In this article, our aim is to raise the awareness of the potential threats of retracted articles even after their retraction and demonstrate a visual analytic study of retracted articles with reference to the rest of the literature and how their citations are influenced by their retraction.
The context of highly cited retracted articles is visualized in terms of a co-citation network as well as the distribution of articles that have high-order citation dependencies on retracted articles. Survival analyses of time to retraction and postretraction citation are included. Sentences that explicitly cite retracted articles are extracted from full-text articles. Transitions of topics over time are depicted in topic-flow visualizations. We recommend that new visual analytic and science mapping tools should take retracted articles into account and facilitate tasks specifically related to the detection and monitoring of retracted articles.

With the rise of user-generated content, evaluating the credibility of information has become increasingly important. It is already known that various user characteristics influence the way credibility evaluation is performed. Domain experts on the topic at hand primarily focus on semantic features of information (e.g., factual accuracy), whereas novices focus more on surface features (e.g., length of a text). In this study, we further explore two key influences on credibility evaluation: topic familiarity and information skills. Participants with varying expected levels of information skills (i.e., high school students, undergraduates, and postgraduates) evaluated Wikipedia articles of varying quality on familiar and unfamiliar topics while thinking aloud. When familiar with the topic, participants indeed focused primarily on semantic features of the information, whereas participants unfamiliar with the topic paid more attention to surface features. The utilization of surface features increased with information skills. Moreover, participants with better information skills calibrated their trust against the quality of the information, whereas trust of participants with poorer information skills did not.
This study confirms the enabling character of domain expertise and information skills in credibility evaluation as predicted by the updated 3S-model of credibility evaluation.

Conceptualizations of disciplinarity often focus on the social aspects of disciplines; that is, disciplines are defined by the set of individuals who participate in their activities and communications. However, operationalizations of disciplinarity often demarcate the boundaries of disciplines by standard classification schemes, which may be inflexible to changes in the participation profile of that discipline. To address this limitation, a metric called venue-author-coupling (VAC) is proposed and illustrated using journals from the Journal Citation Report's (JCR) library science and information science category. As JCR categories are among the most frequently used in bibliometric analyses, this allows for an examination of the extent to which the journals in JCR categories can be considered as proxies for disciplines. By extending the idea of bibliographic coupling, VAC identifies similarities among journals based on the similarities of their author profiles. The employment of this method using information science and library science journals provides evidence of four distinct subfields, that is, management information systems, specialized information and library science, library science-focused, and information science-focused research. The proposed VAC method provides a novel way to examine disciplinarity from the perspective of author communities.

Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms.
Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.

While it is easy to identify whether someone has found a piece of information during a search task, it is much harder to measure how much someone has learned during the search process. Searchers who are learning often exhibit exploratory behaviors, and so current research is often focused on improving support for exploratory search. Consequently, we need effective measures of learning to demonstrate better support for exploratory search. Some approaches, such as quizzes, measure recall when learning from a fixed source of information. This research, however, focuses on techniques for measuring open-ended learning, which often involve analyzing handwritten summaries produced by participants after a task. There are two common techniques for analyzing such summaries: (a) counting facts and statements and (b) judging topic coverage. Both of these techniques, however, can be easily confounded by simple variables such as summary length. This article presents a new technique that measures depth of learning within written summaries based on Bloom's taxonomy (B.S. Bloom & M.D. Engelhart, 1956). This technique was generated using grounded theory and is designed to be less susceptible to such confounding variables. Together, these three categories of measure were compared by applying them to a large collection of written summaries produced in a task-based study, and our results provide insights into each of their strengths and weaknesses.
Both the fact-to-statement ratio and our own measure of depth of learning were effective while being less affected by confounding variables. Recommendations and clear areas of future work are provided to help continued research into supporting sensemaking and learning.

While many studies have compared research productivity across scientific fields, they have mostly focused on the hard sciences, in many cases due to limited publication data for the softer disciplines; these studies have also typically been based on a small sample of researchers. In this study we use complete publication data for all researchers employed at Norwegian universities over a 4-year period, linked to biographic data for each researcher. Using this detailed and complete data set, we compare research productivity between five main scientific domains (and subfields within them), across academic positions, and in terms of age and gender. The study's key finding is that researchers from medicine, natural sciences, and technology are most productive when whole counts of publications are used, while researchers from the humanities and social sciences are most productive when article counts are fractionalized according to the total number of authors. The strong differences between these fields in publishing forms and patterns of coauthorship raise questions as to whether publication indicators can justifiably be used for comparison of productivity across scientific disciplines.

With the development of Web 2.0, social tagging systems in which users can freely choose tags to annotate resources according to their interests have attracted much attention. In particular, literature on the emergence of collective intelligence in social tagging systems has increased. In this article, we propose a probabilistic generative model to detect latent topical communities among users. Social tags and resource contents are leveraged to model user interest in two similar and correlated ways.
Our primary goal is to capture user tagging behavior and interest and discover the emergent topical community structure. The communities should be groups of users with frequent social interactions as well as similar topical interests, which would have important research implications for personalized information services. Experimental results on two real social tagging data sets with different genres have shown that the proposed generative model more accurately models user interest and detects high-quality and meaningful topical communities.

Previous analyses identified research on environmental tobacco smoke to be subject to strong fluctuations as measured by both quantitative and qualitative indicators. The evolution of search algorithms (based on the Web of Science and Web of Knowledge database platforms) was used to show the impact of errors of omission and commission in the outcomes of scientometric research. Optimization of the search algorithm led to the complete reassessment of previously published findings on the performance of environmental tobacco smoke research. Instead of strong continuous growth, the field of environmental tobacco smoke research was shown to experience stagnation or slow growth since the mid-1990s when evaluated quantitatively. Qualitative analysis revealed a steady but slow increase in the citation rate and a decrease in uncitedness. Country analysis revealed the North European countries as leaders in environmental tobacco smoke research (when the normalized results were evaluated both quantitatively and qualitatively), whereas the United States ranked first only when assessing the total number of papers produced. Scientometric research artifacts, including both errors of omission and commission, were shown to be capable of completely obscuring the real output of the chosen research field.
There is an abundance of evidence to suggest that online behavior differs from behaviors in the offline world, and that there are a number of important factors which may affect the communication strategies of people within an online space. This article examines some of these, namely, whether the sex, age, and identifiability of blog authors, as well as the genre of communication, affect communication strategies. Findings suggest that the level of identifiability of the blog author has a limited effect upon their communication strategies. However, sex appeared to influence online behavior insofar as men were more likely to swear and attack others in their blogs. Genre had an important influence on disclosure, with more self-disclosure taking place in the diary genre (i.e., blogs in which people talk about their own lives) relative to the filter genre (i.e., blogs in which people talk about events external to their lives). Age affected both self-disclosure and language use. For example, younger bloggers tended to use more swearing, express more negative emotions, and disclose more personal information about others. These findings suggest that age, sex, genre, and identifiability form a cluster of variables that influence the language style and self-disclosure patterns of bloggers; however, the level of identifiability of the blogger may be less important in this respect. Implications of these findings are discussed.

This study investigates the motivational factors affecting the quantity and quality of voluntary knowledge contribution in online Q&A communities. Although previous studies focus on knowledge contribution quantity, this study regards quantity and quality as two important, yet distinct, aspects of knowledge contribution.
Drawing on self-determination theory, this study proposes that five motivational factors, categorized along the extrinsic-intrinsic spectrum of motivation, have differential effects on knowledge contribution quantity versus quality in the context of online Q&A communities. An online survey with 367 participants was conducted in a leading online Q&A community to test the research model. Results show that rewards in the reputation system, learning, knowledge self-efficacy, and enjoy helping stand out as important motivations. Furthermore, rewards in the reputation system, as a manifestation of external regulation, are more effective in facilitating knowledge contribution quantity than quality. Knowledge self-efficacy, as a manifestation of intrinsic motivation, is more strongly related to knowledge contribution quality, whereas the other intrinsic motivation, enjoy helping, is more strongly associated with knowledge contribution quantity. Both theoretical and practical implications are discussed. A percentile-based bibliometric indicator is an indicator that values publications based on their position within the citation distribution of their field. The most straightforward percentile-based indicator is the proportion of frequently cited publications, for instance, the proportion of publications that belong to the top 10% most frequently cited of their field. Recently, more complex percentile-based indicators have been proposed. A difficulty in the calculation of percentile-based indicators is caused by the discrete nature of citation distributions combined with the presence of many publications with the same number of citations. We introduce an approach to calculating percentile-based indicators that deals with this difficulty in a more satisfactory way than earlier approaches suggested in the literature. 
We show in a formal mathematical framework that our approach leads to indicators that do not suffer from biases in favor of or against particular fields of science. This study undertook an exploratory analysis of the relationship between the body of scientific literature associated with HIV/AIDS and the trajectory of the epidemic, measured by the rate of new cases diagnosed annually in the United States for the period covering 1981 to 2009. The body of scientific literature examined in this investigation was constituted from scientific research that developed alongside the epidemic and was extracted from MEDLINE, a bibliographic database of the United States National Library of Medicine. Content analysis methods were employed for qualitative data reduction, and regression analysis was used to assess whether variation in the trajectory of the epidemic co-occurred with variation in the publication of specific genres of content within the scientific literature relating to HIV/AIDS. The regression model confirmed a statistically significant association between the representative body of HIV/AIDS scientific literature and the epidemic trajectory, and identified three research categories, namely, ameliorative drug treatments, other clinical protocols, and health education, as being most significantly associated with the epidemic trajectory. Implicit in the findings of this study are areas of scientific research that are of functional and practical interest to clinicians, policy makers, the lay public, and contributors to the body of scientific literature. We analyze the benefits in terms of scientific impact deriving from international collaboration, examining both those for a country when it collaborates and also those for the other countries when they are collaborating with the former. The data show that the more countries are involved in a collaboration, the greater the gain in impact. 
Contrary to what we expected, the scientific impact of a country does not significantly influence the benefit it derives from collaboration, but does seem to positively influence the benefit obtained by the other countries collaborating with it. Although there was a weak correlation between these two classes of benefit, the countries with the highest impact were clear outliers from this correlation, tending to provide proportionally more benefit to their collaborating countries than they themselves obtained. Two surprising findings were the null benefit resulting from collaboration with Iran, and the small benefit resulting from collaboration with the United States despite its high impact. Peer review supports scientific conferences in selecting high-quality papers for publication. Referees are expected to evaluate submissions equitably according to objective criteria (e.g., originality of the contribution, soundness of the theory, validity of the experiments). We argue that the submission date of papers is a subjective factor playing a role in the way they are evaluated. Indeed, program committee (PC) chairs and referees process submission lists that are usually sorted by paperIDs. This order conveys chronological information, as papers are numbered sequentially upon reception. We show that order effects lead to unconscious favoring of early-submitted papers to the detriment of later-submitted papers. Our point is supported by a study of 42 peer-reviewed conferences in Computer Science showing a decrease in the number of bids placed on submissions with higher paperIDs. It is advised to counterbalance order effects during the bidding phase of peer review by promoting the submissions with fewer bids to potential referees. This manipulation aims to distribute bids more evenly among submissions in order to attract qualified referees for all of them. 
This would secure reviews from confident referees, who are keen on voicing sharp opinions and recommendations (acceptance or rejection) about submissions. This work contributes to the integrity of peer review, which is mandatory to maintain public trust in science. Semantic similarity is vital to many areas, such as information retrieval. Various methods have been proposed with a focus on comparing unstructured text documents. Several of these have been enhanced with ontology; however, they have not been applied to ontology instances. With the growth in ontology instance data published online through, for example, Linked Open Data, there is an increasing need to apply semantic similarity to ontology instances. Drawing on ontology-supported polarity mining (OSPM), we propose an algorithm that enhances the computation of semantic similarity with polarity mining techniques. The algorithm is evaluated with online customer review data. The experimental results show that the proposed algorithm outperforms the baseline algorithm in multiple settings. The Publishers Association of Flanders, Belgium, has created a label for peer-reviewed books: the Guaranteed Peer Reviewed Content (GPRC) label (www.gprc.be/en). We introduce the label and the logic behind it. A label for peer-reviewed books encourages transparency in academic book publishing. It is especially relevant for the social sciences and humanities and in the context of performance-based funding of university research. This paper explores the possible citation chain reactions of a Nobel Prize using the mathematician Robert J. Aumann as a case example. The results show that the award of the Nobel Prize in 2005 affected not only the citations to his work, but also affected the citations to the references in his scientific oeuvre. The results indicate that the spillover effect is almost as powerful as the effect itself. 
We are consequently able to document a ripple effect in which the awarding of the Nobel Prize ignites a citation chain reaction to Aumann's scientific oeuvre and to the references in its nearest citation network. The effect is discussed using innovation decision process theory as a point of departure to identify the factors that created a bandwagon effect leading to the reported observations. Several independent authors reported a high share of uncited publications, which include those produced by top scientists. This share was repeatedly reported to exceed 10% of the total papers produced, without any explanation of this phenomenon or of the lack of difference in uncitedness between average and successful researchers. In this report, we analyze the uncitedness among two independent groups of highly visible scientists (mathematicians represented by Fields medalists, and researchers in physiology or medicine represented by Nobel Prize laureates in the respective field). Analysis of both groups led to the identical conclusion: over 90% of the uncited database records of highly visible scientists can be explained by the inclusion of editorial materials, progress reports presented at international meetings (meeting abstracts), discussion items (letters to the editor, discussion), personalia (biographic items), and by errors of omission and commission of the Web of Science (WoS) database and of the citing documents. Only a marginal share of original articles and reviews was found to be uncited (0.9% and 0.3%, respectively), which is in strong contrast with the previously reported data, which never addressed the document types among the uncited records. Explicit definition of the limits of citation analysis demands additional tests for the validity of citation analysis. The stability of citation rankings over time can be regarded as confirming the validity of evaluative citation analysis. 
This stability over time was investigated for two sets of citation records from the Web of Science (Thomson Reuters, Philadelphia, PA) for articles published in journals classified in Journal Citation Reports as Mathematics. These sets comprise all such articles for the 1960s and for the 1970s. This study employs only descriptive statistics and draws no inferences to any larger population. The study found a high correlation from one decade to the next of rankings among sets of most highly cited articles. However, the study found a low correlation for rankings among articles whose ranks were the 500 directly below those of the 500 most cited. This perhaps expected result is discussed in terms of the Glänzel-Schubert-Schoepflin stochastic model for citation processes and also in connection with an account of the purposes of evaluative citation analysis. This interpretative context suggests why the limitations of citation analysis may be inherent to citation analysis even when it is done well. The concept of Linked Data has made its entrance in the cultural heritage sector due to its potential use for the integration of heterogeneous collections and deriving additional value out of existing metadata. However, practitioners and researchers alike need a better understanding of what outcome they can reasonably expect of the reconciliation process between their local metadata and established controlled vocabularies which are already a part of the Linked Data cloud. This paper offers an in-depth analysis of how a locally developed vocabulary can be successfully reconciled with the Library of Congress Subject Headings (LCSH) and the Arts and Architecture Thesaurus (AAT) through the help of a general-purpose tool for interactive data transformation (OpenRefine). Issues negatively affecting the reconciliation process are identified and solutions are proposed in order to derive maximum value from existing metadata and controlled vocabularies in an automated manner. 
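A minimal sketch of such a reconciliation step, assuming a toy local vocabulary and target vocabulary (the sample labels and the `difflib`-based matcher are illustrative only, not the OpenRefine workflow used in the paper): normalize both sets of labels, prefer exact matches on the normalized form, and fall back to fuzzy matching above a similarity cutoff.

```python
import difflib
import string

def normalize(label):
    """Lowercase, strip punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(label.lower().translate(table).split())

def reconcile(local_terms, vocabulary, cutoff=0.85):
    """Match each local term to the closest controlled-vocabulary label.

    Returns {local term: matched label or None}; exact matches on the
    normalized form win, otherwise a fuzzy match above `cutoff` is used.
    """
    norm_to_label = {normalize(v): v for v in vocabulary}
    result = {}
    for term in local_terms:
        key = normalize(term)
        if key in norm_to_label:  # exact match after normalization
            result[term] = norm_to_label[key]
            continue
        close = difflib.get_close_matches(key, list(norm_to_label),
                                          n=1, cutoff=cutoff)
        result[term] = norm_to_label[close[0]] if close else None
    return result

# Hypothetical local terms reconciled against hypothetical target labels.
matches = reconcile(
    ["Mural painting", "sculpture (visual work)", "Daguerrotypes"],
    ["Mural paintings", "Sculpture", "Daguerreotypes"],
)
```

Note how a singular/plural variant and a spelling variant still reconcile, while the qualifier-laden label falls below the cutoff and stays unmatched, the kind of issue the paper identifies as limiting fully automated reconciliation.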
This article presents the results of a user evaluation of automatically generated concept keywords and place names (toponyms) for geo-referenced images. Automatically annotating images is becoming indispensable for effective information retrieval, since the number of geo-referenced images available online is growing, yet many images are insufficiently tagged or captioned to be efficiently searchable by standard information retrieval procedures. The Tripod project developed original methods for automatically annotating geo-referenced images by generating representations of the likely visible footprint of a geo-referenced image, and using this footprint to query spatial databases and web resources. These queries return raw lists of potential keywords and toponyms, which are subsequently filtered and ranked. This article reports on user experiments designed to evaluate the quality of the generated annotations. The experiments combined quantitative and qualitative approaches: To obtain a large number of responses, the annotations were rated by participants in standardized online questionnaires that showed an image and its corresponding keywords. In addition, several focus groups provided rich qualitative information in open discussions. The results of the evaluation show that currently the annotation method performs better on rural images than on urban ones. Further, for each image at least one suitable keyword could be generated. The integration of heterogeneous data sources resulted in some images having a high level of noise in the form of obviously wrong or spurious keywords. The article discusses the evaluation itself and methods to improve the automatic generation of annotations. A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. 
In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand; on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the automatic generation and evaluation of topical queries. We show that a statistically significant improvement over closely related ranking models is achieved. We present a methodological approach, called Group Informatics, for understanding the social connections that are created between members of technologically mediated groups. Our methodological approach supports focused thinking about how online groups differ from each other, and diverge from their face-to-face counterparts. Group Informatics is grounded in 5 years of empirical studies of technologically mediated groups in online learning, software engineering, online political discourse, crisis informatics, and other domains. We describe the Group Informatics model and the related, 2-phase methodological approach in detail. 
Phase 1 of the methodological approach centers on a set of guiding research questions aimed at directing the application of Group Informatics to new corpora of integrated electronic trace data and qualitative research data. Phase 2 of the methodological approach is a systematic set of steps for transforming electronic trace data into weighted social networks. The proliferation of digital knowledge repositories (DKRs) used for distributed and collocated work raises important questions about how to manage these technologies. This study investigates why individuals contribute information to DKRs by applying and extending transactive memory theory. Data from knowledge workers (N=208) nested in work groups (J=17) located in Europe and the United States revealed, consistent with transactive memory theory, that perceptions of experts' retrieval of information were positively related to the likelihood of information provision to DKRs. The relationship between experts' perceptions of retrieval and information provision varied from group to group, and cross-level interactions indicated that trust in how the information would be used and the interdependence of tasks within groups could explain that variation. Furthermore, information provision to DKRs was related to communication networks in ways consistent with theorizing regarding the formation of transactive memory systems. Implications for theory and practice are discussed, emphasizing the utility of multilevel approaches for conceptualizing and modeling why individuals provide information to DKRs. The article reports the findings of a content analysis study of 16 student-group proposals for a grade eight history project. The students listed their topic and thesis in the proposal, and information in support of their thesis. The study's focus is this topic-to-thesis transition. 
The study's conceptual framework is Kuhlthau's six-stage ISP model, specifically the transition from exploring information in Stage 3 to formulating a focus, or personal perspective on the assignment topic, in Stage 4. Our study coding scheme identifies elements of the students' implicit knowledge in the 16 proposals. To validate implicit knowledge as a predictor of successful student performance, implicit knowledge was coded and scored, and then the correlation coefficient was established between the score and the students' instructors' marks. In Part 2 of the study we found a strong and significant association between the McGill coding scores and the instructors' marks for the 16 proposals. This study is a first step in identifying, operationalizing, and testing user-centered implicit knowledge elements for future implementation in interactive information systems designed for middle school students researching a thesis-objective history assignment. The 20th-century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least, the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. 
On the basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and therefore should be analyzed as part of a broader study of the increasingly globalized academic system. According to current research in bibliometrics, percentiles (or percentile rank classes) are the most suitable method for normalizing the citation counts of individual publications in terms of the subject area, the document type, and the publication year. Up to now, bibliometric research has concerned itself primarily with the calculation of percentiles. This study suggests how percentiles (and percentile rank classes) can be analyzed meaningfully for an evaluation study. Publication sets from four universities are compared with each other to provide sample data. These suggestions take into account, on the one hand, the distribution of percentiles over the publications in the sets (universities here) and, on the other hand, concentrate on the range of publications with the highest citation impact, that is, the range that is usually of most interest in the evaluation of scientific performance. This paper deals with the role of a journal's publisher country in determining the expected citation rates of the articles published in it. We analyze whether a paper has a higher citation rate when it is published in one of the large publisher nations, the U.S., U.K., or the Netherlands, compared to a hypothetical situation when the same paper is published in journals of different origin. This would constitute a free lunch, which could be explained by a Matthew effect visible at the country level, similar to the well-documented Matthew effect at the author level. We first use a simulation model that highlights increasing citation returns to quality as the central key condition on which such a Matthew effect may emerge. 
Then we use an international bibliometric panel data set of forty-nine countries for the years 2000-2010 and show that such a free lunch implied by this Matthew effect can be observed for top journals from the U.S. and, depending on the specification, also from the U.K. and the Netherlands, while there is no effect for lower-ranked American journals and negative effects for lower-ranked British journals as well as those from the Netherlands. Since the invention of sound reproduction in the late 19th century, studio practices in musical recording evolved in parallel with technological improvements. Recently, digital technology and Internet file sharing led to the delocalization of professional recording studios and the decline of traditional record companies. A direct consequence of this new paradigm is that studio professions found themselves in a transitional phase, needing to be reinvented. To understand the scope of these recent technological advances, we first offer an overview of musical recording culture and history and show how studio recordings became a sophisticated form of musical artwork that differed from concert representations. We then trace the economic evolution of the recording industry through technological advances and present positive and negative impacts of the decline of the traditional business model on studio practices and professions. Finally, we report findings from interviews with six world-renowned record producers reflecting on their recording approaches, the impact of recent technological advances on their careers, and the future of their profession. Interviewees appreciate working on a wider variety of projects than they have in the past, but they all discuss trade-offs between artistic expectations and budget constraints in the current paradigm. 
Our investigations converge to show that studio professionals have adjusted their working settings to the new economic situation, although they still rely on the same aesthetic approaches as in the traditional business model to produce musical recordings. Although it is commonly expected that the citation context of a reference is likely to provide more detailed and direct information about the nature of a citation, few studies in the literature have specifically addressed the extent to which the information in different parts of a scientific publication differs. Do abstracts tend to use conceptually broader terms than sentences in a citation context in the body of a publication? In this article, we propose a method to analyze and compare latent topics in scientific publications, in particular, from abstracts of papers that cited a target reference and from sentences that cited the target reference. We conducted an experiment and applied topic modeling techniques to full-text papers in eight biomedicine journals. Topics derived from the two sources are compared in terms of their similarities and broad-narrow relationships defined based on information entropy. The results show that abstracts and citation contexts are characterized by distinct sets of topics with moderate overlaps. Furthermore, the results confirm that topics from abstracts of citing papers have broader terms than topics from citation contexts formed by citing sentences. The method and the findings could be used to enhance and extend the current methodologies for research evaluation and citation evaluation. The recently proposed fractional scoring scheme is used to attribute publications to percentile rank classes. It is shown that in this way uncertainties and ambiguities in the evaluation of specific quantile values and percentile ranks do not occur. Using fractional scoring, the total score of all papers exactly reproduces the theoretical value. 
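A hedged sketch of how such a fractional attribution can work (illustrative only, not necessarily the authors' exact scheme): when a group of publications tied on the same citation count straddles the boundary of, say, the top-10% class, the remaining class credit is split evenly over the tie group, so the scores always sum to the theoretical value of 0.1 times the number of papers.

```python
from itertools import groupby

def fractional_top_scores(citations, top=0.10):
    """Fractional membership of each paper in the top-`top` percentile class.

    Papers tied on a citation count that straddles the class boundary share
    the remaining credit evenly, so scores always sum to top * len(citations).
    """
    remaining = top * len(citations)  # total credit to distribute
    credit = {}                       # citation count -> per-paper credit
    for value, group in groupby(sorted(citations, reverse=True)):
        size = len(list(group))
        credit[value] = max(0.0, min(size, remaining)) / size
        remaining -= size * credit[value]
    return [credit[c] for c in citations]

# Ten papers; the two tied at 5 citations straddle the top-10% boundary,
# so they share the single available unit of credit.
scores = fractional_top_scores([5, 5, 3, 2, 2, 1, 1, 1, 0, 0])
```

With ten papers the class budget is exactly 1.0: the two papers tied at five citations each receive 0.5, every other paper receives 0, and the total reproduces the theoretical value with no ambiguity about which tied paper "belongs" to the class.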
Although modes of interaction between the two continue to evolve, society and science are inextricably linked. Preserving the integrity of science, and by extension society, in the era of Twitter and Facebook represents a significant challenge. The concept of open communication in science is not a new one. Sociologist and scientific historian Robert Merton elegantly chronicled the qualities, or norms, of science as Communism, Universalism, Disinterestedness, and Organized Scepticism, referred to by the acronym CUDOS. Is social networking providing us with more efficient ways of upholding deep-rooted principles, or are we at risk of compromising the integrity of science by bypassing traditional gatekeepers? According to estimates, the mobile device will soon be the main platform for searching the web, and yet our knowledge of how mobile consumers use information, and how that differs from desktop/laptop users, is imperfect. The paper sets out to correct this through an analysis of the logs of a major cultural website, Europeana. The behavior of nearly 70,000 mobile users was examined over a period of more than a year and compared with that for PC users of the same site and for the same period. The analyses conducted include: size and growth of use; time patterns of use; geographical location of users; digital collections used; comparative information-seeking behavior using dashboard metrics; clustering of users according to their information seeking; and user satisfaction. The main findings were that mobile users were the fastest-growing group and will rise rapidly to a million by December 2012, and that their visits were very different in the aggregate from those arising from fixed platforms. Mobile visits could be described as being information "lite": typically shorter, less interactive, and less content viewed per visit. Use took a social rather than office pattern, with mobile use peaking at nights and weekends. 
The variation between different mobile devices was large, with information seeking on the iPad similar to that for PCs and laptops and that for smartphones very different indeed. The research further confirms that information-seeking behavior is platform-specific and the latest platforms are changing it all again. Websites will have to adapt. Delayed open access (OA) refers to scholarly articles in subscription journals made available openly on the web directly through the publisher at the expiry of a set embargo period. Although a substantial number of journals have practiced delayed OA since they started publishing e-versions, empirical studies concerning OA have often overlooked this body of literature. This study provides comprehensive quantitative measurements by identifying delayed OA journals and collecting data concerning their publication volumes, embargo lengths, and citation rates. Altogether, 492 journals were identified, publishing a combined total of 111,312 articles in 2011; 77.8% of these articles were made OA within 12 months from publication, with 85.4% becoming available within 24 months. A journal impact factor analysis revealed that delayed OA journals have citation rates on average twice as high as those of closed subscription journals and three times as high as those of immediate OA journals. Overall, the results demonstrate that delayed OA journals constitute an important segment of the openly available scholarly journal literature, both by their sheer article volume and by including a substantial proportion of high-impact journals. Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. 
The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with four different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the four thesauri, possibly due to the vocabulary used in each thesaurus, the structural relationships between its terms, and the manual indexing associated with it. A framework is developed that supports the theoretical design of an organizational memory information system (OMIS). The framework provides guidance for managing the processing capabilities of an organization by matching knowledge location, flexibility, and processing requirements with data architecture. This framework is tested using three different sets of data attributes and data architectures from 147 business professionals who have experience in IS development. We find that trade-offs exist between the amount of knowledge embedded in the data architecture and the flexibility of data architectures. This trade-off is contingent on the characteristics of the set of tasks that the data architecture is being designed to support. 
Further, the match is important to consider in the design of OMIS database architecture. Web 2.0 creates a new world of collaboration. Many online communities of practice have provided a virtual Internet platform for members to create, collaborate, and contribute their expertise and knowledge. To date, we still do not fully understand how members evaluate their knowledge-sharing experiences, and how these evaluations affect their decisions to continue sharing knowledge in online communities of practice. In this study, we examined why members continue to share knowledge in online communities of practice, through theorizing and empirically validating the factors and emergent mechanisms (post-knowledge-sharing evaluation processes) that drive continuance. Specifically, we theorized that members make judgments about their knowledge-sharing behaviors by comparing their normative expectations of reciprocity and capability of helping other members with their actual experiences. We empirically tested our research model using an online survey of members of an online community of practice. Our results showed that when members find that they receive the reciprocity they expected, they feel satisfied. Likewise, when they find that they can help other members as they expected, they feel satisfied and their knowledge self-efficacy is also enhanced. Both satisfaction and knowledge self-efficacy further affect their intention to continue sharing knowledge in an online community of practice. We expect this study will generate interest among researchers in this important area of research, and that the model proposed in this article will serve as a starting point for furthering our limited understanding of continuance behaviors in online communities of practice. Even if knowledge is a commodity that a museum offers, as Hooper-Greenhill (1992) has argued, the mechanisms of how a museum comes to know what it mediates are not well understood. 
Using a case study approach, the aim of this study is to investigate what types of sources and channels, with a special emphasis on social processes and structures of information, support collaborative information work and the emergence of knowledge in a museum environment. The empirical study was conducted using a combination of ethnographic observation of, and interviews with, staff members at a medium-sized museum in a Nordic country. The study shows that much of the daily information work is routinized and infrastructuralized in social information exchange and in the reproduction of documented information and museum collections. This article offers important background information about a new product, the Book Citation Index (BKCI), launched in 2011 by Thomson Reuters. This information is illustrated with new facts concerning the BKCI's use in bibliometrics, an analysis of its coverage, and a series of idiosyncrasies worthy of further discussion. The BKCI was launched primarily to help researchers identify useful and relevant research that was previously invisible to them, owing to the lack of significant book content in citation indexes such as the Web of Science. So far, the content of 33,000 books has been added to the desktops of the global research community, the majority in the arts, humanities, and social sciences. Initial analyses of the data from the BKCI have indicated that the BKCI, in its current version, should not be used for bibliometric or evaluative purposes. The most significant limitations to this potential application are the high share of publications without address information, the inflation of publication counts, the lack of cumulative citation counts from different hierarchical levels, and inconsistency in citation counts between the cited reference search and the Book Citation Index. 
However, the BKCI is a first step toward creating a reliable and necessary citation data source for monographs, a very challenging task: unlike journals and conference proceedings, books have specific requirements, and several problems emerge not only in the context of subject classification, but also in books' roles as cited and as citing publications. Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naive Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag, and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features. Subjective language contains information about private states. The goal of subjective language identification is to determine that a private state is expressed, without considering its polarity or specific emotion. 
A component of word meaning, Personal Sense, has clear potential in the field of subjective language identification, as it reflects the meaning of words in terms of unique personal experience and carries personal characteristics. In this paper we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity, and emotion. The Personal Sense framework yields high performance in fine-grained subsentence subjectivity classification. Our experiments demonstrate that lexico-syntactic features are useful for the identification of subjectivity indicators and of the targets that receive the subjective Personal Sense. This study investigated query modification patterns and semantic attributes in queries executed during user searches for images on the web. Its purpose was to identify whether query modification patterns were related to users' contextual factors and content sources, as well as whether the patterns characterize the use of semantic attributes expressed in users' search queries in an interactive web-searching process. To this end, a collection of 970 image search queries executed by undergraduate students in a naturalistic setting was analyzed. The study's findings showed that query modification patterns were significantly associated with content sources. Among the types of query modification employed, "Reformulation" and "New" were the most frequently used. Terms related to the format, object, or place associated with an image also were found to be frequently used in search queries. Terms referring to type or genre were the most frequent attributes in web image search queries, which suggests a change from previous findings of search queries in a professional context. Implications are discussed in terms of search assistants for web image searches and semantic annotation to improve image indexing. 
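The significant association reported above between query modification patterns and content sources is the kind of relationship a chi-square test of independence can check. A minimal sketch with SciPy, using an illustrative contingency table (the counts below are hypothetical, not the study's data):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative contingency table: rows = query modification patterns,
# columns = content sources (counts are hypothetical, not from the study).
table = np.array([
    [120, 40],   # "Reformulation"
    [100, 60],   # "New"
    [30,  50],   # other modification types
])

chi2, p, dof, expected = chi2_contingency(table)
# A small p-value indicates that modification pattern and content source
# are not independent, i.e., the pattern is associated with the source.
```

The `expected` matrix returned alongside the statistic shows the counts that would be expected under independence, which is useful for seeing which pattern/source cells drive the association.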
The increasing employment of bibliometric measures for assessing, describing, and mapping science inevitably leads to an increasing need for a citation theory constituting a theoretical frame for both citation analysis and the description of citers' behavior. In this article a theoretical model, encompassing both normative and constructivist approaches, is suggested. The conceptualization of scientific communities as autopoietic systems, the components of which are communicative events, allows us to observe the reproductive function of citations conceived as codes and media of scientific communication. Citations, thanks to their constraining and enabling properties, constitute the engine of the structuration process ensuring the reproduction of scientific communities. By referring to Giddens' structuration theory, Luhmann's theory of social systems as communicative networks, Merton's "sociology of science" and his conceptualizations of the functions of citations, as well as Small's proposal of citations as concept-symbols, a sociologically integrated approach to scientometrics is proposed. This research studied the role of knowledge organization in the process of decision making in the field of energy efficiency and rational use of energy (EERUE). Theoretical contributions to knowledge organization and decision making are stressed. We chose to work with a multiple-criteria decision-making methodology, Saaty's analytic hierarchy process (AHP). This made it possible to develop a detailed analysis of the decision-making process and arrive at a hierarchical model. The model provided a structure representing the studied field, in which an order of priority could be given to the decision-making process. The knowledge derived may be used in other fields of study such as information retrieval and knowledge representation. Bibliometric mapping and visualization techniques represent one of the main pillars in the field of scientometrics. 
Traditionally, the main methodologies employed for representing data are multidimensional scaling, principal component analysis, and correspondence analysis. In this paper we present a visualization methodology known as biplot analysis for representing bibliometric and science and technology indicators. A biplot is a graphical representation of multivariate data in which the elements of a data matrix are represented as points and vectors associated with the rows and columns of the matrix. In this paper, we explore the possibilities of applying biplot analysis in the research policy area. More specifically, we first introduce the reader to this methodology and then analyze its strengths and weaknesses through 3 different case studies: countries, universities, and scientific fields. For this, we use a variant of biplot analysis known as the JK-biplot. Finally, we compare the biplot representation with other multivariate analysis techniques. We conclude that biplot analysis could be a useful technique in scientometrics when studying multivariate data, as well as an easy-to-read tool for research decision makers. A network consists of nodes and links; thus node degree and link strength are fundamental quantities in network analysis. While the power-law distribution of node degrees has been verified as a basic feature of numerous real networks, we investigate whether link strengths follow a power-law distribution in weighted networks. After testing 12 different paper cocitation networks with 2 methods, fitting in double-log scales and the Kolmogorov-Smirnov test (K-S test), we observe that, in most cases, the link strengths also follow an approximate power-law distribution. The results suggest that power-law-type distributions could emerge not only in nodes and informational entities, but also in links and informational connections. 
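The two methods named above for testing link-strength distributions can be sketched as follows. The data here are synthetic stand-ins for co-citation link strengths, and treating the discrete strengths with a continuous Pareto model is a simplifying assumption:

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed "link strengths" standing in for co-citation counts.
rng = np.random.default_rng(42)
strengths = rng.pareto(2.0, 5000) + 1.0  # classical Pareto on [1, inf)

# Method 1: fit in double-log scale. For a power law p(x) ~ x**(-alpha),
# log-binned densities fall on a straight line with slope -alpha.
density, edges = np.histogram(strengths, bins=np.logspace(0, 2, 20), density=True)
mids = np.sqrt(edges[:-1] * edges[1:])          # geometric bin midpoints
mask = density > 0
fit = stats.linregress(np.log(mids[mask]), np.log(density[mask]))
alpha = -fit.slope                              # estimated power-law exponent

# Method 2: Kolmogorov-Smirnov test against a Pareto distribution whose
# shape is the maximum-likelihood (Hill) estimate from the same data.
b_mle = len(strengths) / np.log(strengths).sum()
ks_stat, ks_p = stats.kstest(strengths, "pareto", args=(b_mle,))
# A small ks_stat (large ks_p) means the power-law fit is not rejected.
```

Note that the density exponent `alpha` and the Pareto shape `b` describe the same tail (`alpha = b + 1`), so the two methods should roughly agree on well-behaved data.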
This study proposes a new framework for citation content analysis (CCA), enabling syntactic and semantic analysis of citation content that can be used to better analyze the rich sociocultural context of research behavior. This framework could be considered the next generation of citation analysis. The authors briefly review the history and features of content analysis in the traditional social sciences and its previous application in library and information science (LIS). Based on a critical discussion of the theoretical necessity of a new method as well as the limits of citation analysis, the nature and purposes of CCA are discussed, and potential procedures to conduct CCA are provided and described, including principles to identify the reference scope and a two-dimensional (citing and cited), two-module (syntactic and semantic) codebook. Future work and implications are also suggested. In one of the first attempts at providing a mathematical framework for the Hirsch index, Egghe and Rousseau (2006) assumed the standard Lotka model for an author's citation distribution to derive a delightfully simple closed formula for his/her h-index. More recently, the same authors (Egghe & Rousseau, 2012b) have presented a new (implicit) formula based on the so-called shifted Lotka function to allow for the objection that the original model makes no allowance for papers receiving zero citations. Here it is shown, through a small empirical study, that the formulae actually give very similar results whether or not the uncited papers are included. However, and more important, it is found that they both seriously underestimate the true h-index, and we suggest that the reason is that this is a context (the citation distribution of an author) in which straightforward Lotkaian informetrics is inappropriate. Indeed, the analysis suggests that even if we restrict attention to the upper tail of the citation distribution, a simple Lotka/Pareto-like model can give misleading results. 
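Egghe and Rousseau's closed formula referenced above states that, under a Lotka model with exponent alpha, an author with T papers has h = T^(1/alpha). A minimal sketch comparing this model value with the empirical h-index (the citation vector below is illustrative):

```python
def h_index(citations):
    """Empirical Hirsch index: the largest h such that at least h papers
    have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Egghe & Rousseau (2006): under a Lotka citation distribution with
# exponent alpha, h = T**(1/alpha), where T is the total number of papers.
T, alpha = 100, 2.0
h_model = T ** (1 / alpha)  # predicts h = 10 for this T and alpha

# Illustrative author: 10 papers with >= 10 citations, the rest uncited.
print(h_index([25, 20, 18, 15, 14, 13, 12, 11, 10, 10] + [0] * 90))
```

Comparing `h_index(...)` with `h_model` on real author data is exactly the kind of small empirical check the abstract describes, and it is where the reported underestimation of the true h-index shows up.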
In this article, the authors answer the question of whether the field of scientometrics/bibliometrics shares essential characteristics of the metrics sciences. To achieve this objective, the citation network of seven selected metrics and their information environment is analyzed. Recent studies on first- and second-order similarities have shown that the latter outperforms the former as input for document clustering or partitioning applications. First-order similarities based on bibliographic coupling or on lexical approaches come with specific methodological issues, such as sparse matrices and sensitivity to spelling variants or context differences. Second-order similarities were proposed to tackle these problems and take the lexical context into account. A hybrid combination of both types of similarity has also proved an important improvement, integrating the strengths of the two approaches while diminishing their weaknesses. In this paper we extend the notion of second-order similarity by applying it in the context of the hybrid approach. We conclude that there is no added value for the clearly defined clusters, but that second-order similarity can provide an additional viewpoint on the more general clusters. Scientific writing is about communicating ideas. Today, simplicity is more important than ever. Scientists are overwhelmed with new information. The overall growth rate of scientific publication over the last few decades has been at least 4.7 % per year, which means publication volume doubles every 15 years. I measure simplicity/readability by the proportion of adjectives and adverbs in a paper, and find natural science to be the most readable and social science the least readable. Scientists generally collaborate with one another and sometimes change their affiliations, which leads to scientific mobility. 
This paper proposes a recursive, reinforced name disambiguation method that integrates both coauthorship and affiliation information, especially in cases of scientific collaboration and mobility. The proposed method is evaluated using a dataset from the Thomson Reuters Scientific "Web of Science". The recall and precision of the algorithm are then analyzed. To understand the effect of name ambiguity on the h-index and g-index before and after name disambiguation, calculations of their distributions are also presented. Evaluation experiments show that using only the affiliation information in name disambiguation achieves better performance than using only the coauthorship information; however, our proposed method, which integrates both the coauthorship and affiliation information, controls the bias from name ambiguity to a greater extent. Different scientific fields have different citation practices. Citation-based bibliometric indicators need to normalize for such differences between fields in order to allow for meaningful between-field comparisons of citation impact. Traditionally, normalization for field differences has usually been done based on a field classification system. In this approach, each publication belongs to one or more fields and the citation impact of a publication is calculated relative to the other publications in the same field. Recently, the idea of source normalization was introduced, which offers an alternative approach to normalizing for field differences. In this approach, normalization is done by looking at the referencing behavior of citing publications or citing journals. In this paper, we provide an overview of a number of source normalization approaches and empirically compare them with a traditional normalization approach based on a field classification system. We also pay attention to the issue of the selection of the journals to be included in a normalization for field differences. 
Our analysis indicates a number of problems with the traditional classification-system-based normalization approach, suggesting that source normalization approaches may yield more accurate results. In this study, we aim to evaluate the global scientific output of laparoscopy research and to find an alternative statistical approach to quantitatively and qualitatively assess the current global research trend in laparoscopy. Data were based on the Science Citation Index Expanded (SCI-E), from the Institute for Scientific Information's Web of Science database. Articles on laparoscopy published during 1997-2011 were analyzed in terms of scientific output characteristics, international collaboration, and the frequency of author keywords used. Globally, 59,264 papers were published during the 15-year study period, spanning 15 document types. Among them, there were 40,318 articles, to which a two-phase model was applied to simulate the high correlation between the cumulative number of articles and the year. International collaborative publications were more prevalent in recent years, and were more influential owing to the sharing of ideas and workloads. Japan, Sweden, Poland, Canada, the UK, India, France and Spain benefited considerably from international cooperation. From a comprehensive analysis of the distribution and change of article titles, author keywords and abstracts, it can be concluded that research related to 'morbid obesity', 'robotic surgery', 'prostatectomy' and 'NOTES (natural orifice transluminal endoscopic surgery)' represents the main orientations of laparoscopy research in the 21st century. The gap in statistics between multivariate and time-series analysis can be bridged by using entropy statistics and recent developments in multi-dimensional scaling. For explaining the evolution of the sciences as non-linear dynamics, the configurations among variables can be important in addition to the statistics of individual variables and trend lines. 
Animations enable us to combine multiple perspectives (based on configurations of variables) and to visualize path-dependencies in terms of trajectories and regimes. Path-dependent transitions and systems formation can be tested using entropy statistics. Scientific importance ranking has long been an important research topic in scientometrics. Many indices based on citation counts have been proposed. In recent years, several graph-based ranking algorithms have been studied and claimed to be reasonable and effective. However, most current research falls short of a concrete view of what these graph-based ranking algorithms bring to bibliometric analysis. In this paper, we make a comparative study of state-of-the-art graph-based algorithms using the APS (American Physical Society) dataset, focusing on ranking researchers. Several interesting findings emerge. Firstly, simple citation-based indices like citation count can return surprisingly better results than many cutting-edge graph-based ranking algorithms. Secondly, how we define researcher importance may have a tremendous impact on ranking performance. Thirdly, some ranking methods that at first glance are totally different have high rank correlations. Finally, which time period's data are chosen for ranking greatly influences ranking performance, but this question remains open for further study. We also offer explanations for a large part of the above findings. The results of this study offer a fresh perspective on the current state of research in bibliometric analysis. Bibliographic data of publications indexed in the Web of Science with at least one (co-)author affiliated to any of the 15 West African countries and published from 2001 to 2010 inclusive were downloaded. Analyses focused on collaboration indicators, especially intra-regional collaboration, intra-African collaboration, and collaboration with the rest of the world. 
Results showed that the share of papers with only one author is diminishing, whereas the share of papers with six or more authors is increasing. Nigeria is responsible for more than half the region's total scientific output. The main African partner countries are South Africa (in Southern Africa), Cameroon (in Central Africa), and Kenya and Tanzania (in Eastern Africa). The main non-African partner countries are France, the USA and the United Kingdom, which together contributed to over 63 % of the papers with a non-West African address. Individual countries have high international collaboration rates, except Nigeria. West African countries cooperated less with each other, and less with African and developing countries, than they did with developed ones. The study suggests that national authorities act on their commitment to allot at least 1 % of GDP to science and technology funding. It also suggests that regional integration institutions encourage and fund research activities involving several institutions from different West African countries in order to increase intra-regional scientific cooperation. We propose new variations of the standard and the real-valued (or interpolated) h-index. More precisely, we propose two different types. For the first type, sources are years, and items are either publications, citations received, or citations per publication. The second type makes use of the speed with which citations are received: it is a diffusion speed index. This paper provides scientometricians with a brief overview of the history of economic statistics and its international standards. Part of the latter is the Frascati family of standards in science and technology input statistics. Some recommendations are given for improvements in these standards. Proposals are developed to relate research inputs as defined in the Frascati manual to bibliometrically measured outputs. 
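The abstract above does not spell out how the year-based h-index variants are constructed; on one plausible reading, taking years as sources and publications (or citations) as items simply reuses the Hirsch construction over per-year counts. A sketch under that assumption, with hypothetical data:

```python
def h_type(item_counts):
    """Generic Hirsch-type index: the largest h such that at least h
    sources contribute at least h items each. Here the sources are
    years and the items are per-year publication or citation counts."""
    ranked = sorted(item_counts, reverse=True)
    return sum(1 for rank, n in enumerate(ranked, start=1) if n >= rank)

# Hypothetical per-year publication counts for one author:
pubs_per_year = [1, 3, 5, 2, 7, 4, 6]
print(h_type(pubs_per_year))  # 4: there are 4 years with >= 4 publications
```

Swapping `pubs_per_year` for per-year citation counts, or citations per publication, gives the other item choices the abstract lists for the first type of variant.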
Larger agglomerations of individuals create a social environment that can sustain a larger repertoire of intellectual capabilities, thereby facilitating the creation and recombination of ideas and increasing the likelihood of interactions among individuals through which new ideas are generated and shared. Relatedly, cities have long been the privileged setting for invention and innovation. These two phenomena are brought together in the superlinear scaling relationship whereby urban inventive output (measured through patenting) increases more than proportionally with increasing population size. We revisit the relationship between urban population size and patenting using data for a global set of metropolitan areas in the OECD and show, for the first time, that the superlinear scaling between patenting and population size observed for US metropolitan areas holds for urban areas across a variety of urban and economic systems. In fact, the scaling relationships established for the US metropolitan system and for the global metropolitan system are remarkably similar. Pharmacology/pharmacy is an important scientific field and plays a pivotal role in new drug research and development. China has steadily increased its investment in drug development. This study aimed to evaluate the productivity of China in the field of pharmacology/pharmacy in the past decade in relation to ten representative countries. The publications in the field of pharmacology/pharmacy of China and ten representative countries in the past decade (2001-2010) were retrieved from the Web of Science database, and studies were conducted on the immediacy index of articles published in 2011. Multiple bibliometric indicators were obtained from the "InCites" analysis. Most of the bibliometric indicators for the developed countries, including the USA and the European countries, remained stable in the past decade. 
The number of publications by the Asian countries, especially China, increased dramatically year by year over the past decade; however, the Asian countries improved little in the indicators assessing the scientific quality of publications, including citation behavior and impact relative to country and subject area. It may take a long time to close the gap in scientific quality between the developing and the developed countries. In view of the dramatic increase in financial investment, our findings suggest that the development of the field of pharmacology/pharmacy worldwide is not encouraging, which may partially explain the decreased R&D productivity of the pharmaceutical industry over the last decade. In this paper we show that bibliographic data can be transformed into a collection of compatible networks. Using network multiplication, various interesting derived networks can be obtained; in defining them, an appropriate normalization should be considered. The proposed approach can also be applied to other collections of compatible networks. The networks obtained from bibliographic databases can be large (hundreds of thousands of vertices). Fortunately they are sparse and can still be processed relatively quickly. We answer the question of when the multiplication of sparse networks preserves sparseness. The proposed approaches are illustrated with analyses of a collection of networks on the topic "social network" obtained from the Web of Science. Works with a large number of co-authors add large complete subgraphs to the standard collaboration network, thus blurring the collaboration structure. We show that, using an appropriate normalization, their effect can be neutralized. Among other things, we propose a measure of the collaborativeness of authors with respect to a given bibliography, and show how to compute the network of citations between authors and identify citation communities. 
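The derived networks mentioned above come from multiplying compatible sparse matrices: for example, a works-by-authors incidence matrix combined with a works-by-works citation matrix yields a network of citations between authors. A minimal sketch with SciPy sparse matrices (the toy matrices are illustrative, and the fractional row normalization shown is one standard way to neutralize works with many co-authors):

```python
import numpy as np
from scipy import sparse

# Toy works x authors incidence matrix WA (WA[w, a] = 1 if author a wrote work w).
WA = sparse.csr_matrix(np.array([
    [1, 1, 0],   # work 0 by authors 0 and 1
    [0, 1, 1],   # work 1 by authors 1 and 2
    [1, 0, 0],   # work 2 by author 0
]))
# Works x works citation matrix Ci (Ci[u, v] = 1 if work u cites work v).
Ci = sparse.csr_matrix(np.array([
    [0, 0, 0],
    [1, 0, 0],   # work 1 cites work 0
    [0, 1, 0],   # work 2 cites work 1
]))

# Fractional normalization: each work distributes total weight 1 over its
# authors, so works with many co-authors do not dominate the result.
authors_per_work = np.asarray(WA.sum(axis=1)).ravel()
N = sparse.diags(1.0 / authors_per_work) @ WA

# Author-to-author citation network: Ca[i, j] = weight of author i citing j.
Ca = (N.T @ Ci @ N).toarray()
```

Because all three factors are sparse, the product can be computed efficiently even for networks with hundreds of thousands of vertices, which matches the scale the abstract mentions.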
There is a worldwide trend toward Research Performance Evaluation (RPE), the development of new scientometric indices, and the examination of their application. Consequently, concerns and anomalies arise about the convergent validity and reliability of these indices for decision-making purposes. This is especially prevalent in regions, countries, and disciplines with smaller or emerging traditions of publishing and receiving citations. The present scientometric study addresses the usefulness of the most noted metric, the h-index, along with other selected indicators, in the field of engineering in Malaysian universities. To understand the role of this metric, if any, we examined its functional correlation, predictive value, and relationship with national assessment criteria. Results indicate that this indicator has good potential to stand alone, is easy to use, and is robust for obtaining a broad snapshot for positioning and performance evaluation. However, for better decision making, it should be used alongside other indicators in a broader contextual peer-assessment process. Its validity is further checked against two size-independent institutional h-indices: h(G-H) and h(m). The purpose of this study is to determine the principal parameters which affect R&D exploitation and to explore R&D activities in closed science that positively affect those in open science. Based on 486 nanotechnology projects from five national R&D programs in South Korea, canonical correlation analysis is used to analyze the relationships among R&D parameters of inputs, outputs and outcomes and to determine the principal parameters. As a result, this study concludes that the principal parameters are publications with high impact, patents, and academic degrees. This study also shows a positive correlation between activities in open science and closed science. 
The conclusions suggest that the Korean government should endorse research results with high impact value and should try to keep a balance between R&D exploitation in open science and in closed science. This study could inform the establishment of a South Korean R&D policy effective for faster commercialization of nanotechnology-related research. Regression equations to predict h-index trajectories up to 10 years ahead have recently been derived from the analysis of data from a large calibration sample of neuroscientists. These equations were regarded by their proponents as potentially useful decision aids for funding agencies, peer reviewers, and hiring committees. This paper presents the results of a validation study in a sample of Spanish psychologists, including neuroscience psychologists, for whom the regression equations would be expected to apply, but also psychologists in other areas of the social/behavioral sciences, for whom the applicability of the regression equations might be questionable. The results do not support the equations for either of the two groups: errors of prediction were generally large and mostly positive, the more so the larger the value of the h-index used to make the prediction. Although the validity of these regression equations could still be investigated in additional cross-validation studies, an alternative approach to predicting future h-indices is outlined and illustrated in this paper. Resilience thinking is a rising topic in the environmental sciences and sustainability discourse. In this paper, a bibliometric method is used to analyse the trends in resilience research in the contexts of ecological, economic, social, and integrated socio-ecological systems. 
Based on 919 cited publications in English which appeared between 1973 and 2011, the analysis covers the following issues: general statistical description, influential journal outlets and top cited articles, the geographic distribution of resilience publications and covered case studies, and the national importance of resilience researchers and leading research organisations by country. The findings show that resilience thinking continues to dominate the environmental sciences and has experienced a dramatic increase since its introduction in 1973. More recently, new interest has emerged in broadening the scope and applying the concept to socio-economic systems and sustainability science. The paper also shows that resilience research overall is dominated by the USA, Australia, the UK and Sweden, and makes the case for expanding this work further, given the urgent need for practically oriented solutions that would help arrest further ecological deterioration. Price argued that the average scientific specialty consists of about 100 scientists, publishing an average of 100 articles each during their careers. Wray recently attempted to revise the number of scientists in a specialty based on the information that the average scientist publishes only 3.5 papers during their career. However, his final estimate, between 250 and 600 scientists, does not support Price's idea that a specialty fills about 10,000 articles, unless the ad hoc assumption is made that nearly 80 % of the articles circulating in a field are from other fields. This article shows that by distinguishing between graduate students, who spend only a couple of years in a specialty, and professors, who spend their entire career in a field, the ad hoc assumption becomes unnecessary, and Wray's number of 600 scientists turns out to be a remarkable intuitive insight that is consistent with Price's 10,000 articles. A figure of 520 scientists, or somewhat larger, is suggested for Price's estimate. 
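The reconciliation described above is essentially arithmetic: a few long-career professors account for most of the 10,000 articles, while many short-stay students keep the headcount near 600. The split below is purely illustrative (the abstract gives only the endpoints, not the article's exact figures):

```python
# Hypothetical split: professors spend a full career in the specialty
# (Price's ~100 papers each); students pass through with a couple of papers.
professors, papers_per_professor = 90, 100
students, papers_per_student = 500, 2

total_articles = professors * papers_per_professor + students * papers_per_student
total_scientists = professors + students
print(total_articles, total_scientists)  # 10000 articles, 590 scientists
```

With such a split, the overall mean output per scientist is low (close to Wray's 3.5 papers) even though the specialty as a whole fills Price's 10,000 articles with roughly 600 scientists.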
The practice of listing co-author surnames in alphabetical order, irrespective of their contribution, can make it difficult to allocate research credit to authors effectively. This article compares the percentages of articles with co-authors in alphabetical order (alphabetization) for two-author, three-author and four-author articles in eighteen social sciences in 1995 and 2010 to assess how widespread this practice is. There is some degree of alphabetization in all disciplines except one, but the level varies substantially between disciplines. This level is increasing slightly over time, on average, but it has increased substantially in a few disciplines and decreased in others, showing that the practice of alphabetization is not fading away. A high correlation between alphabetical order and the proportion of first authors near the beginning of the alphabet confirms that high percentages of alphabetical order could affect the appropriate allocation of research credit. Similar patterns were found for the sciences and the humanities. Finally, since some degree of alphabetization is almost universal in the social science disciplines, this practice may be affecting careers throughout the social sciences and hence seems indefensible. (C) 2013 Elsevier Ltd. All rights reserved. In citation network analysis, complex behavior is reduced to a simple edge, namely, node A cites node B. The implicit assumption is that A is giving credit to, or acknowledging, B. It is also the case that the contributions of all citations are treated equally, even though some citations appear multiple times in a text and others appear only once. In this study, we apply text-mining algorithms to a relatively large dataset (866 information science articles containing 32,496 bibliographic references) to demonstrate the differential contributions made by references. 
We (1) look at the placement of citations across the different sections of a journal article, and (2) identify highly cited works using two different counting methods (CountOne and CountX). We find that (1) the most highly cited works appear in the Introduction and Literature Review sections of citing papers, and (2) the citation rankings produced by CountOne and CountX differ. That is to say, counting the number of times a bibliographic reference is cited in a paper, rather than treating all references the same no matter how many times they are invoked in the citing article, reveals the differential contributions made by the cited works to the citing paper. Published by Elsevier Ltd. Journal metrics are employed for the assessment of scholarly journals from a general bibliometric perspective. In this context, the Thomson Reuters journal impact factors (JIFs) are the citation-based indicators most used. The 2-year journal impact factor (2-JIF) counts citations to one- and two-year-old articles, while the 5-year journal impact factor (5-JIF) counts citations to one- to five-year-old articles. Nevertheless, these indicators are not comparable among fields of science for two reasons: (i) each field has a different impact maturity time, and (ii) because of systematic differences in publication and citation behavior across disciplines. In fact, the 5-JIF first appeared in the Journal Citation Reports (JCR) in 2007 with the purpose of making impacts more comparable in fields in which impact matures slowly. However, there is no optimal fixed impact maturity time valid for all fields. In some of them two years provides a good performance whereas in others three or more years are necessary. Therefore, there is a problem when comparing a journal from a field in which impact matures slowly with a journal from a field in which impact matures rapidly.
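The CountOne/CountX distinction described above lends itself to a short sketch. The data and function names below are hypothetical; CountOne credits each cited reference once per citing paper, while CountX credits it once per in-text mention:

```python
# Sketch of the CountOne vs. CountX counting methods described above.
# `mentions` maps each bibliographic reference to the number of times
# it is invoked in the text of a single citing article (hypothetical data).

def count_one(mentions):
    """CountOne: every cited reference contributes exactly 1."""
    return {ref: 1 for ref, n in mentions.items() if n > 0}

def count_x(mentions):
    """CountX: a reference contributes once per in-text mention."""
    return {ref: n for ref, n in mentions.items() if n > 0}

article = {"Smith 2001": 4, "Jones 1998": 1, "Lee 2005": 1}
print(count_one(article))  # every reference weighted equally
print(count_x(article))    # "Smith 2001" weighted four times as heavily
```

Aggregating either score over many citing articles yields the two citation rankings the study compares.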
In this work, we propose the 2-year maximum journal impact factor (2M-JIF), a new impact indicator that considers the 2-year rolling citation time window of maximum impact instead of the previous 2-year time window. Finally, an empirical application comparing 2-JIF, 5-JIF, and 2M-JIF shows that the maximum rolling target window reduces the between-group variance with respect to the within-group variance in a random sample of about six hundred journals from eight different fields. (C) 2013 Elsevier Ltd. All rights reserved. Introducing and studying two types of time series, referred to as R1 and R2, we try to enrich the set of time series available for time dependent informetric studies. In a first part we focus on mathematical properties, while in a second part we check if these properties are visible in real data. This practical application uses data in the social sciences related to top Chinese universities. R1 sequences always increase over time, tending relatively fast to one, while R2 sequences have a decreasing tendency, tending to zero in practical cases. They can best be used over relatively short periods of time. R1 sequences can be used to detect the rate with which cumulative data increase, while R2 sequences detect the relative rate of development. The article ends by pointing out that these time series can be used to compare innovative activities in firms. Clearly, this investigation is just a first attempt. More studies are needed, including comparisons with other related sequences. (C) 2013 Elsevier Ltd. All rights reserved. In an age of intensifying scientific collaboration, the counting of papers by multiple authors has become an important methodological issue in scientometrics-based research evaluation. In particular, how counting methods influence institutional-level research evaluation has not been studied in the existing literature. In this study, we selected the top 300 universities in physics in the 2011 HEEACT Ranking as our study subjects.
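The 2M-JIF proposed above replaces the fixed ages-1-to-2 citation window of the standard 2-JIF with the best-performing rolling 2-year window. A minimal sketch, with hypothetical per-age citation and publication counts (the real indicator is computed from JCR data):

```python
# Sketch of the fixed 2-JIF versus the rolling 2M-JIF described above.
# cites[a] = citations received this year by papers that are a years old;
# pubs[a]  = number of papers published a years ago (hypothetical data).

def jif_window(cites, pubs, start):
    """Impact factor over the 2-year window starting at age `start`."""
    return (cites[start] + cites[start + 1]) / (pubs[start] + pubs[start + 1])

def two_jif(cites, pubs):
    """Standard 2-year JIF: always uses ages 1 and 2."""
    return jif_window(cites, pubs, 1)

def two_m_jif(cites, pubs, max_age=5):
    """2M-JIF: the maximum over all rolling 2-year windows."""
    return max(jif_window(cites, pubs, a) for a in range(1, max_age))

cites = {1: 40, 2: 60, 3: 90, 4: 80, 5: 30}
pubs = {1: 50, 2: 50, 3: 50, 4: 50, 5: 50}
print(two_jif(cites, pubs))    # 1.0
print(two_m_jif(cites, pubs))  # 1.7 -- impact peaks at ages 3-4
```

For a slow-maturing field like this hypothetical one, the rolling maximum captures the citation peak that the fixed window misses.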
We compared the university rankings generated from four different counting methods (i.e. whole counting, straight counting using first author, straight counting using corresponding author, and fractional counting) to show how paper counts and citation counts and the subsequent university ranks were affected by counting method selection. The counting was based on the 1988-2008 physics papers records indexed in ISI WoS. We also observed how paper and citation counts were inflated by whole counting. The results show that counting methods affected the universities in the middle range more than those in the upper or lower ranges. Citation counts were also more affected than paper counts. The correlations between the rankings generated from whole counting and those from the other methods were low or negative in the middle ranges. Based on the findings, this study concluded that straight counting and fractional counting were better choices for paper count and citation count in institutional-level research evaluation. (C) 2013 Elsevier Ltd. All rights reserved. This paper presents a comparative analysis of the structure of national higher education networks in six European countries using interlinking data. We show that national HE systems display a common core-periphery structure, which we explain by the lasting reputational differences in science, as well as the process of expansion and integration of HE systems. Furthermore, we demonstrate that centrality in national networks (coreness) is associated with organizational characteristics, reflecting that interlinking is motivated by access to resources and the status of the organizations concerned, and that national policies impact network structures by influencing the level of inequality in the distribution of resources and status.
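The four counting methods compared in the study above can be sketched for a single paper as follows. The author and institution data, and the `credit` function itself, are hypothetical illustrations, not the study's code:

```python
# Sketch of whole, straight (first author), straight (corresponding
# author), and fractional counting for one multi-authored paper.
from collections import defaultdict

def credit(authors, corresponding, method):
    """Return per-institution credit for a single paper.
    authors: (author, institution) pairs in byline order."""
    scores = defaultdict(float)
    if method == "whole":            # every institution gets full credit
        for _, inst in authors:
            scores[inst] = 1.0
    elif method == "straight_first": # only the first author's institution
        scores[authors[0][1]] = 1.0
    elif method == "straight_corr":  # only the corresponding author's
        scores[corresponding[1]] = 1.0
    elif method == "fractional":     # credit split equally among authors
        for _, inst in authors:
            scores[inst] += 1.0 / len(authors)
    return dict(scores)

paper = [("Ann", "MIT"), ("Bo", "ETH"), ("Cy", "MIT")]
print(credit(paper, ("Bo", "ETH"), "whole"))       # MIT 1.0, ETH 1.0
print(credit(paper, ("Bo", "ETH"), "fractional"))  # MIT ~0.67, ETH ~0.33
```

Summed over all papers, whole counting inflates totals (credit per paper exceeds 1), which is the inflation effect the study observes.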
Finally, we show that, as an outcome of the core-periphery structure, the strength of ties between two HEIs is largely determined by their individual coreness, while the impact of distance is too small-scale to alter the network structure generated by organizational attributes. (C) 2013 Elsevier Ltd. All rights reserved. Is more always better? We address this question in the context of bibliometric indices that aim to assess the scientific impact of individual researchers by counting their number of highly cited publications. We propose a simple model in which the number of citations of a publication depends not only on the scientific impact of the publication but also on other 'random' factors. Our model indicates that more need not always be better. It turns out that the most influential researchers may have a systematically lower performance, in terms of highly cited publications, than some of their less influential colleagues. The model also suggests an improved way of counting highly cited publications. (C) 2013 Elsevier Ltd. All rights reserved. The study documents the growth in the number of journals and articles along with the increase in normalized citation rates of open access (OA) journals listed in the Scopus bibliographic database between 1999 and 2010. Longitudinal statistics on growth in journals/articles and citation rates are broken down by funding model, discipline, and whether the journal was launched or had converted to OA. The data were retrieved from the websites of SCImago Journal and Country Rank (journal/article counts), Journal Metrics (SNIP2 values), Scopus (journal discipline) and the Directory of Open Access Journals (DOAJ) (OA and funding status). OA journals/articles have grown much faster than subscription journals but still make up less than 12% of the journals in Scopus. Two-year citation averages for journals funded by Article Processing Charges (APCs) have reached the same level as subscription journals.
Citation averages of OA journals funded by other means continue to lag well behind OA journals funded by APCs and subscription journals. We hypothesize this is less an issue of quality than due to the fact that such journals are commonly published in languages other than English and tend to be located outside the four major publishing countries. (C) 2013 Elsevier Ltd. All rights reserved. Collaboration can be described using layered systems such as the article-author-institute-country structure. These structures can be considered 'cascades' or 'chains' of bipartite networks. We introduce a framework for characterizing and studying the intensity of collaboration between entities at a given level (e.g., between institutions). Specifically, we define the notions of significant, essential and vital nodes, and significant, essential and vital sub-paths to describe the spread of knowledge through collaboration in such systems. Based on these notions, we introduce relative and absolute proper essential node (PEN) centrality as indicators of a node's importance for diffusion of knowledge through collaboration. We illustrate these concepts with a toy example and show how they can be applied to a small real-world example. Since collaboration implies knowledge sharing, it can be considered a special form of knowledge diffusion. (C) 2013 Elsevier Ltd. All rights reserved. In our previous study (Wang et al., 2012), we analyzed scientists' working timetables in 3 countries, using real-time download data of scientific literature. In this paper, we present a thorough analysis of global scientists' working habits. Top 30 countries/territories from Europe, Asia, Australia, North America, Latin America and Africa are selected as representatives and analyzed in detail. Regional differences in scientists' working habits exist across countries. Besides different working cultures, social factors could affect scientists' research activities and working patterns.
Nevertheless, a common conclusion is that scientists today are often working overtime. Although scientists may feel engaged and fulfilled by their hard work, such overwork should prompt us to reconsider work-life balance. (C) 2013 Elsevier Ltd. All rights reserved. Empirical analysis of the relationship between the impact factor - as measured by the average number of citations - and the proportion of uncited material in a collection dates back at least to van Leeuwen and Moed (2005), where graphical presentations revealed striking patterns. Recently Hsu and Huang (2012) have proposed a simple functional relationship. Here it is shown that the general features of these observed regularities are predicted by a well-established informetric model which enables us to derive a theoretical van Leeuwen-Moed lower bound. We also question some of the arguments of Hsu and Huang (2012) and Egghe (2013), while various issues raised by Egghe (2008, 2013) are also addressed. (C) 2013 Elsevier Ltd. All rights reserved. In recent years there has been a sharp increase in collaborations among scholars, and there are studies on the effects of scientific collaboration on scholars' performance. This study examines the hypothesis that geographically diverse scientific collaboration is associated with research impact. Here, the approach is differentiated from other studies by: (a) focusing on publications rather than researchers or institutes; (b) considering the geographical diversity of authors of each publication; (c) considering the average number of citations a publication receives per year (time-based normalization of citations) as a surrogate for its impact; and (d) not focusing on a specific country (developed or developing) or region. Analysis of the collected bibliometric data shows that publication impact is significantly and positively associated with all related geographical collaboration indicators.
But publication impact has a stronger association with the number of external collaborations at department and institution levels (inter-departmental and inter-institutional collaborations) compared to internal collaborations. Conversely, national collaboration correlates better with impact than international collaboration. (C) 2013 Elsevier Ltd. All rights reserved. A new method of assessment of scientific papers, scientists, and scientific institutions was defined. The significance of a paper was assessed by identifying the largest (most prestigious) set that includes that paper in its h-core. The sets of papers were defined by affiliation (country, city, university, department) or by subject (branches and sub-branches of science, journal). The inclusion of a paper in the h-core of certain set(s) was used as an indicator of the significance of that paper, and of the scientific output of its author(s), of their scientific institution(s), etc. An analogous procedure was used to assess the contribution of an individual to the scientific output of his/her scientific institution, branch of science, etc. (C) 2013 Elsevier Ltd. All rights reserved. Unlike Web hyperlink data, Web traffic data have not yet been the focus of considerable study in Webometrics research. The relationships between Web traffic data and academic/business performance measures have not been as firmly established as the relationships between Web hyperlink data and such performance measures. Although various traffic data sources exist, few studies have examined and compared their relative merits. We carried out a study that aimed to address this lack. We selected groups of universities and businesses from the U.S. and China and collected their Web traffic data from three sources: Alexa Internet, Google Trends for Websites, and Compete.
We found significant correlations between Web traffic data and organizational performance measures, specifically academic quality for universities and financial variables for businesses. We also examined the characteristics of the three data sources and compared their usefulness. We found that Alexa Internet outperformed the others. (C) 2013 Elsevier Ltd. All rights reserved. To take into account the impact of the different bibliometric features of scientific fields and the different sizes of both the publication set evaluated and the set used as reference standard, two new impact indicators are introduced. The Percentage Rank Position (PRP) indicator relates the ordinal rank position of the article assessed to the total number of papers in the publishing journal. The publications in the publishing journal are ranked by decreasing citation frequency. The Relative Elite Rate (RER) indicator relates the number of citations obtained by the article assessed to the mean citation rate of the papers in the elite set of the publishing journal. The indices can preferably be calculated from the data of the publications in the elite set of journal papers of individuals, teams, institutes or countries. The number of papers in the elite set is calculated by the equation P(π_ν) = (10 log P) - 10, where P is the total number of papers. The mean of the PRP and RER indicators of the journal papers assessed may be applied for comparing the eminence of publication sets across fields. (C) 2013 Elsevier Ltd. All rights reserved. A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely used example which aims to improve over simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions through a collaborative team.
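The elite-set size formula quoted above, P(π_ν) = (10 log P) - 10, can be evaluated directly; a minimal sketch, assuming the logarithm is base 10 (function name is our own):

```python
import math

# Elite-set size formula from the abstract above:
# P(pi_v) = (10 log P) - 10, with the logarithm taken as base 10.

def elite_set_size(total_papers):
    """Number of papers in the elite set of a set of `total_papers` papers."""
    return (10 * math.log10(total_papers)) - 10

print(elite_set_size(100))     # 10.0
print(elite_set_size(10_000))  # 30.0
```

So a set of 100 papers has a 10-paper elite set, and the elite set grows only logarithmically with the size of the full publication set.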
To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast it with other measures. (C) 2013 Elsevier Ltd. All rights reserved. Bibliometrics has become an indispensable tool in the evaluation of institutions (in the natural and life sciences). An evaluation report without bibliometric data has become a rarity. However, evaluations are often required to measure the citation impact of publications in very recent years in particular. As a citation analysis is only meaningful for publications for which a citation window of at least three years is guaranteed, very recent years cannot (should not) be included in the analysis. This study presents various options for dealing with this problem in statistical analysis. The publications from two universities from 2000 to 2011 are used as a sample dataset (n = 2652, univ 1 = 1484 and univ 2 = 1168). One option is to show the citation impact data (percentiles) in a graphic and to use a line for percentiles regressed on 'distant' publication years (with confidence interval) showing the trend for the 'very recent' publication years. Another way of dealing with the problem is to work with the concept of samples and populations. The third option (closely related to the second) is the application of the counterfactual concept of causality. (C) 2013 Elsevier Ltd. All rights reserved. Scientific collaboration commonly takes place in a global and competitive environment. Coalitions and consortia are formed among universities, companies and research institutes to apply for research grants and to perform joint projects. In such a competitive environment, individual institutes may be strategic partners or competitors.
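For contrast with the Social h-index variant proposed above, the underlying h-index itself is straightforward to compute. This is the standard definition (not the authors' code): the largest h such that h of a researcher's papers have at least h citations each:

```python
# Standard h-index: largest h such that h papers have >= h citations each.

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([0, 0]))            # 0
```

The Social h-index then redistributes this score across collaborators; the abstract does not give the exact redistribution rule, so it is not sketched here.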
Measures to determine partner importance have practical applications such as comparison and rating of competitors, reputation evaluation or performance evaluation of companies and institutes. Many network-centric metrics exist to measure the importance of individuals or companies in social and collaborative networks. Here we present a novel context-based metric to measure the importance of partners in scientific collaboration networks. Well-established graph models such as the notion of hubs and authorities provide the basis for this work and are systematically extended to a flexible, context-aware network importance measure. (C) 2013 Elsevier Ltd. All rights reserved. Q-measures are network indicators that gauge a node's brokerage role between different groups in the network. Previous studies have focused on their definition for different network types and their practical application. Little attention has, however, been paid to their theoretical and mathematical characterization. In this article we contribute to a better understanding of Q-measures by studying some of their mathematical properties in the context of unweighted, undirected networks. An external Q-measure complementing the previously defined local and global Q-measures is introduced. We prove a number of relations between the values of the global, the local and the external Q-measure and betweenness centrality, and show how the global Q-measure can be rewritten as a convex decomposition of the local and external Q-measures. Furthermore, we formally characterize when Q-measures obtain their maximal value. It turns out that this is only possible in a limited number of very specific circumstances. (C) 2013 Elsevier Ltd. All rights reserved. Wide differences in publication and citation practices make the direct comparison of raw citation counts across scientific disciplines impossible.
Recent research has studied new and traditional normalization procedures aimed at suppressing these disproportions in citation numbers among scientific domains as much as possible. Using the recently introduced IDCP (Inequality due to Differences in Citation Practices) method, this paper rigorously tests the performance of six cited-side normalization procedures based on the Thomson Reuters classification system consisting of 172 sub-fields. We use six yearly datasets from 1980 to 2004, with widely varying citation windows from the publication year to May 2011. The main findings are the following three. Firstly, as observed in previous research, within each year the shapes of sub-field citation distributions are strikingly similar. This paves the way for several normalization procedures to perform reasonably well in reducing the effect on citation inequality of differences in citation practices. Secondly, independently of the year of publication and the length of the citation window, the effect of such differences represents about 13% of total citation inequality. Thirdly, a recently introduced two-parameter normalization scheme outperforms the other normalization procedures over the entire period, reducing citation disproportions to a level very close to the minimum achievable given the data and the classification system. However, the traditional procedure of using sub-field mean citations as normalization factors also yields good results. (C) 2013 Elsevier Ltd. All rights reserved. This paper examines the structural patterns of networks of internationally co-authored SCI papers in the domain of research driven by big data and provides an empirical analysis of semantic patterns of paper titles. The results based on data collected from the DVD version of the 2011 SCI database identify the U.S. as the most central country, followed by the U.K., Germany, France, Italy, Australia, the Netherlands, Canada, and Spain, in that order.
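The traditional mean-based cited-side normalization mentioned above divides each paper's citation count by the mean citation rate of its sub-field. A minimal sketch with hypothetical data (sub-field labels and counts are invented for illustration):

```python
# Sketch of mean-based cited-side normalization: each paper's citation
# count is divided by the mean citation rate of its sub-field.
from collections import defaultdict
from statistics import mean

def normalize(papers):
    """papers: list of (sub_field, citations) pairs."""
    by_field = defaultdict(list)
    for field, cites in papers:
        by_field[field].append(cites)
    field_mean = {f: mean(cs) for f, cs in by_field.items()}
    return [(f, c / field_mean[f]) for f, c in papers]

data = [("math", 2), ("math", 4), ("cell biology", 20), ("cell biology", 40)]
print(normalize(data))  # both fields map onto the same normalized scale
```

After normalization, a math paper with 2 citations and a cell biology paper with 20 receive the same score, since each sits at two-thirds of its own field's mean.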
However, some countries (e.g., Portugal) with low degree centrality occupied relatively central positions in terms of betweenness centrality. The results of the semantic network analysis suggest that internationally co-authored papers tend to focus on primary technologies, particularly in terms of programming and related database issues. The results show that a combination of words and locations can provide a richer representation of an emerging field of science than the sum of the two separate representations. (C) 2013 Elsevier Ltd. All rights reserved. We describe the use of a domain-independent method to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS (R); Humphreys, Lindberg, Schoolman, & Barnett, 1998) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information-savvy workforce. Natural language processing and semantic techniques hold promise to help public health professionals navigate the growing ocean of information by organizing and structuring this knowledge into a focused public health framework paired with a user-friendly visualization application as a way to summarize results of PubMed (R) searches in this field of knowledge. Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. 
Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. Users' preferences for folders versus tags were studied in 2 working environments where both options were available to them. In the Gmail study, we informed 75 participants about both folder-labeling and tag-labeling, observed their storage behavior after 1 month, and asked them to estimate the proportions of different retrieval options in their behavior. In the Windows 7 study, we informed 23 participants about tags and asked them to tag all their files for 2 weeks, followed by a period of 5 weeks of free choice between the 2 methods. Their storage and retrieval habits were tested prior to the learning session and, after 7 weeks, using special classification recording software and a retrieval-habits questionnaire. A controlled retrieval task and an in-depth interview were conducted. Results of both studies show a strong preference for folders over tags for both storage and retrieval. In the minority of cases where tags were used for storage, participants typically used a single tag per information item.
Moreover, when multiple classification was used for storage, it was only marginally used for retrieval. The controlled retrieval task showed lower success rates and slower retrieval speeds for tag use. Possible reasons for participants' preferences are discussed. Knowledge sharing is a difficult task for most organizations, and there are many reasons for this. In this article, we propose that the nature of the knowledge shared and an individual's social network influence employees to find more value in person-to-person knowledge sharing, which could lead them to bypass the codified knowledge provided by a knowledge management system (KMS). We surveyed employees of a workers' compensation board in Canada and used social network analysis and hierarchical linear modeling to analyze the data. The results show that knowledge complexity and knowledge teachability increased the likelihood of finding value in person-to-person knowledge transfer, but knowledge observability did not. Contrary to expectations, whether the knowledge was available in the KMS had no impact on the value of person-to-person knowledge transfer. In terms of the social network, individuals with larger networks tended to perceive more value in the person-to-person transfer of knowledge than those with smaller networks. Expertise retrieval has attracted significant interest in the field of information retrieval. Expert finding has been studied extensively, with less attention going to the complementary task of expert profiling, that is, automatically identifying topics about which a person is knowledgeable. We describe a test collection for expert profiling in which expert users have self-selected their knowledge areas.
Motivated by the sparseness of this set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state-of-the-art expert-profiling algorithms; optionally, experts can indicate a level of expertise for relevant areas. Experts may also give feedback on the quality of the system-generated knowledge areas. We report on a content analysis of these comments and gain insights into what aspects of profiles matter to experts. We provide an error analysis of the system-generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze the impact on evaluating expert-profiling systems of using self-selected versus judged system-generated knowledge areas as ground truth; they rank systems somewhat differently but detect about the same number of pairwise significant differences despite the fact that the judged system-generated assessments are more sparse. The number of research studies on social tagging has increased rapidly in recent years, but few of them highlight the characteristics and research trends in social tagging. A set of 862 academic documents relating to social tagging and published from 2005 to 2011 was thus examined using bibliometric analysis as well as the social network analysis technique. The results show that social tagging, as a research area, develops rapidly and attracts an increasing number of new entrants. There are no key authors, publication sources, or research groups that dominate the research domain of social tagging. Research on social tagging appears to focus mainly on the following three aspects: (a) components and functions of social tagging (e.g., tags, tagging objects, and tagging network), (b) taggers' behaviors and interface design, and (c) tags' organization and usage in social tagging.
These trends suggest that more researchers are turning to the latter two, integrated with human-computer interfaces and information retrieval, although the first aspect is the fundamental one in social tagging. Also, more studies relating to social tagging pay attention to multimedia tagging objects, not only text. Previous research on social tagging was limited to a few subject domains such as information science and computer science. As an interdisciplinary research area, social tagging is anticipated to attract more researchers from different disciplines. More practical applications, especially in high-tech companies, are an encouraging research trend in social tagging. The authors investigate the interplay between answer quality and answer speed across question types in community question-answering sites (CQAs). The research questions addressed are the following: (a) How do answer quality and answer speed vary across question types? (b) How do the relationships between answer quality and answer speed vary across question types? (c) How do the best quality answers and the fastest answers differ in terms of answer quality and answer speed across question types? (d) How do trends in answer quality vary over time across question types? From the posting of 3,000 questions in six CQAs, 5,356 answers were harvested and analyzed. There was a significant difference in answer quality and answer speed across question types, and there were generally no significant relationships between answer quality and answer speed. The best quality answers had better overall answer quality than the fastest answers but generally took longer to arrive. In addition, although the trend in answer quality had been mostly random across all question types, the quality of answers appeared to improve gradually when given time.
By highlighting the subtle nuances in answer quality and answer speed across question types, this study is an attempt to explore a territory of CQA research that has hitherto been relatively uncharted. Taking a structuration perspective and integrating reciprocity research in economics, this study examines the dynamics of reciprocal interactions in social question & answer communities. We postulate that individual users of social Q&A constantly adjust their kindness in the direction of the observed benefit and effort of others. Collective reciprocity emerges from this pattern of conditional strategy of reciprocation and helps form a structure that guides the very interactions that give birth to the structure. Based on a large sample of data from Yahoo! Answers, our empirical analysis supports the collective reciprocity premise, showing that the more effort (relative to benefit) an asker contributes to the community, the more likely the community will return the favor. On the other hand, the more benefit (relative to effort) the asker takes from the community, the less likely the community will cooperate in terms of providing answers. We conclude that a structuration view of reciprocity sheds light on the duality of social norms in online communities. As a part of a research project aiming to connect library data to the unfamiliar data sets available in the Linked Data (LD) community's CKAN Data Hub (thedatahub.org), this project collected, analyzed, and mapped properties used in describing and accessing music recordings, scores, and music-related information used by selected music LD data sets, library catalogs, and various digital collections created by libraries and other cultural institutions. 
This article reviews current efforts to connect music data through the Semantic Web, with an emphasis on the Music Ontology (MO) and ontology alignment approaches; it also presents a framework for understanding the life cycle of a musical work, focusing on the central activities of composition, performance, and use. The project studied metadata structures and properties of 11 music-related LD data sets and mapped them to the descriptions commonly used in the library cataloging records for sound recordings and musical scores (including MARC records and their extended schema.org markup), and records from 20 collections of digitized music recordings and scores (featuring a variety of metadata structures). The analysis resulted in a set of crosswalks and a unified crosswalk that aligns these properties. The paper reports on detailed methodologies used and discusses research findings and issues. Topics of particular concern include (a) the challenges of mapping between the overgeneralized descriptions found in library data and the specialized, music-oriented properties present in the LD data sets; (b) the hidden information and access points in library data; and (c) the potential benefits of enriching library data through the mapping of properties found in library catalogs to similar properties used by LD data sets. The aggregation of web performance data (page count and visibility) of internal university units could constitute a more precise indicator than the overall web performance of the universities and, therefore, be of use in the design of university web rankings. In order to test this hypothesis, a longitudinal analysis of the internal units of the Spanish university system was conducted over the course of 2010. For the 13,800 URLs identified, page count and visibility were calculated using the Yahoo! API. The internal values obtained were aggregated by university and compared with the values obtained from the analysis of the universities' general URLs. 
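The property crosswalk described in the music Linked Data study above can be pictured as a simple mapping table from schema-specific fields to a unified vocabulary. A toy sketch follows; the field pairings and the `align` helper are invented for illustration and are not the study's actual crosswalk.

```python
# Toy metadata crosswalk: (source schema, field) -> unified property.
# MARC 245$a (title) and 100$a (main entry name) are real MARC fields;
# the unified property names here are illustrative assumptions.
CROSSWALK = {
    ("marc", "245a"): "title",
    ("marc", "100a"): "composer",
    ("mo", "mo:performer"): "performer",
    ("schema.org", "byArtist"): "performer",
}

def align(record, schema):
    """Translate a raw record into the unified property vocabulary."""
    unified = {}
    for field, value in record.items():
        prop = CROSSWALK.get((schema, field))
        if prop is not None:        # unmapped fields are simply dropped
            unified.setdefault(prop, []).append(value)
    return unified

aligned = align({"245a": "Symphony No. 5", "100a": "Beethoven"}, "marc")
```

A unified crosswalk like the study's would union many such per-schema tables, so that records from MARC, schema.org markup, and MO-based data sets all land in one comparable property space.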
The results indicate that, although the correlations between general and internal values are high, internal performance is low in comparison to general performance, and that they give rise to different performance rankings. The conclusion is that the aggregation of unit performance is of limited use due to the low levels of internal development of the websites, and so its use is not recommended for the design of rankings. Despite this, the internal analysis enabled the detection of, among other things, a low correlation between page count and visibility due to the widespread use of subdirectories and problems accessing certain content. The goal of this research is to evaluate the effect of ad rank on the performance of keyword advertising campaigns. We examined a large-scale data file comprised of nearly 7,000,000 records spanning 33 consecutive months of a major US retailer's search engine marketing campaign. The theoretical foundation is serial position effect to explain searcher behavior when interacting with ranked ad listings. We control for temporal effects and use one-way analysis of variance (ANOVA) with Tamhane's T2 tests to examine the effect of ad rank on critical keyword advertising metrics, including clicks, cost-per-click, sales revenue, orders, items sold, and advertising return on investment. Our findings show significant ad rank effect on most of those metrics, although less effect on conversion rates. A primacy effect was found on both clicks and sales, indicating a general compelling performance of top-ranked ads listed on the first results page. Conversion rates, on the other hand, follow a relatively stable distribution except for the top 2 ads, which had significantly higher conversion rates. However, examining conversion potential (the effect of both clicks and conversion rate), we show that ad rank has a significant effect on the performance of keyword advertising campaigns. 
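The one-way ANOVA step in the ad-rank study above tests whether mean performance differs across rank positions before any post-hoc comparison (such as Tamhane's T2) is run. A minimal sketch of the F statistic, using invented click counts rather than the retailer's data:

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of sample groups."""
    k = len(groups)                      # number of ad-rank groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: variation explained by ad rank
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: residual variation inside each rank
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical daily click counts for ads at three rank positions
clicks_by_rank = [
    [120, 130, 125, 118],   # rank 1: strong primacy effect
    [80, 85, 90, 78],       # rank 2
    [20, 25, 18, 22],       # rank 5
]
f_stat = one_way_anova_f(clicks_by_rank)
```

A large F relative to the F(k-1, n-k) distribution indicates a significant rank effect; pairwise post-hoc tests then locate which ranks differ.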
Conversion potential is a more accurate measure of the impact of an ad's position. In fact, the first ad position generates about 80% of the total profits, after controlling for advertising costs. In addition to providing theoretical grounding, the research results reported in this paper are beneficial to companies using search engine marketing as they strive to design more effective advertising campaigns. Disambiguation of ambiguous initialisms and acronyms is critical to the proper understanding of various types of texts. A model that attempts to solve this problem has previously been presented. This model contained various baseline features, including contextual relationship features, statistical features, and language-specific features. The domain of Jewish law documents written in Hebrew and Aramaic is known to be rich in ambiguous abbreviations, and therefore this model was implemented and applied over two separate corpora within this domain. Several common machine-learning (ML) methods were tested with the intent of finding a successful integration of the baseline feature variants. When the features were evaluated individually, the best averaged results were achieved by a library for support vector machines (LIBSVM); 98.07% of the ambiguous abbreviations studied in the domain were disambiguated correctly. When all the features were evaluated together, the J48 ML method achieved the best result, with 96.95% accuracy. In this paper, we examine the system's degree of success and level of expertise by comparing its results with those achieved by 39 participants highly fluent in the research domain. Despite the fact that all the participants had backgrounds in religious scriptures and continue to study these texts, the system's accuracy rate, 98.07%, was significantly higher than the average accuracy of the participants, 91.65%. 
Further analysis of the results for each corpus implies that participants overcomplicate the required task, as well as exclude vital information needed to properly examine the context of a given initialism. One of the most significant inaccuracies of bibliometric databases is that of omitted citations, namely, missing electronic links between a paper of interest and some citing papers, which are (or should be) covered by the database. This paper proposes a novel approach for estimating a database's omitted-citation rate, based on the combined use of 2 or more bibliometric databases. A statistical model is also presented for (a) estimating the true number of citations received by individual papers or sets of papers, and (b) defining an appropriate confidence interval. The proposed approach could represent a first step towards the definition of a standard for evaluating the accuracy level of databases. The aim of this paper is to visualize the history of evidence-based medicine (EBM) and to examine the characteristics of EBM development in China and the West. We searched the Web of Science and the Chinese National Knowledge Infrastructure database for papers related to EBM. We applied information visualization techniques, citation analysis, cocitation analysis, cocitation cluster analysis, and network analysis to construct historiographies, themes networks, and chronological theme maps regarding EBM in China and the West. EBM appeared to develop in 4 stages: incubation (1972-1992 in the West vs. 1982-1999 in China), initiation (1992-1993 vs. 1999-2000), rapid development (1993-2000 vs. 2000-2004), and stable distribution (2000 onwards vs. 2004 onwards). Although there was a lag in EBM initiation in China compared with the West, the pace of development appeared similar. Our study shows that important differences exist in research themes, domain structures, and development depth, and in the speed of adoption between China and the West. 
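The combined-database idea behind the omitted-citations abstract above can be illustrated with a capture-recapture (Lincoln-Petersen) estimator: treat two bibliometric databases as two independent "captures" of the same underlying citation population. This is only a hedged sketch of the general idea with invented counts; the paper defines its own statistical model and confidence interval.

```python
import math

def estimate_true_citations(n1, n2, overlap):
    """Capture-recapture estimate of the true citation count.

    n1, n2  : citations to a paper found in database 1 and database 2
    overlap : citations found in both databases
    """
    if overlap == 0:
        raise ValueError("need at least one citation common to both databases")
    n_true = n1 * n2 / overlap                    # Lincoln-Petersen estimate
    # Normal-approximation standard error of the estimate
    se = math.sqrt(n1 * n2 * (n1 - overlap) * (n2 - overlap) / overlap ** 3)
    return n_true, se

# Invented example: 80 citations in db 1, 70 in db 2, 56 in both
n_true, se = estimate_true_citations(n1=80, n2=70, overlap=56)
omitted_rate_db1 = 1 - 80 / n_true   # share of true citations missed by db 1
```

The estimated omitted-citation rate of each database then follows directly from comparing its observed count to the estimated true count.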
In the West, efforts in EBM have shifted from education to practice, and from the quality of evidence to its translation. In China, there was a similar shift from education to practice, and from the production of evidence to its translation. In addition, the concept has diffused to other healthcare areas, leading to the development of evidence-based traditional Chinese medicine, evidence-based nursing, and evidence-based policy making. The Semantic Web has been criticized for not being semantic. This article examines the questions of why and how the Web of Data, expressed in the Resource Description Framework (RDF), has come to be known as the Semantic Web. Contrary to previous papers, we deliberately take a descriptive stance and do not start from preconceived ideas about the nature of semantics. Instead, we mainly base our analysis on early design documents of the (Semantic) Web. The main determining factor is shown to be link typing, coupled with the influence of online metadata. Both factors were already present in early web standards and drafts. Our findings indicate that the Semantic Web is directly linked to older artificial intelligence work, despite occasional claims to the contrary. Because of link typing, the Semantic Web can be considered an example of a semantic network. Semantic networks, originally network representations of the meaning of natural-language utterances, have eventually come to refer to any networks with typed (usually directed) links. We discuss possible causes for this shift and suggest that it may be due to confounding paradigmatic and syntagmatic semantic relations. Many dedicated scientists reject the concept of maintaining a work-life balance. They argue that work is actually a huge part of life. In the mind-set of these scientists, weekdays and weekends are equally appropriate for working on their research. Although we have all encountered such people, we may wonder how widespread this attitude is among other scientists in our field. 
This brief communication probes work-life balance issues among JASIST authors and editors. We collected and examined the publication histories for 1,533 of the 2,402 articles published in JASIST between 2001 and 2012. Although there is no rush to submit, revise, or accept papers, we found that 11% of these events happened during weekends and that this trend has been increasing since 2005. Our findings suggest that working during the weekend may be one of the ways that scientists cope with the highly demanding era of publish or perish. We hope that our findings will raise awareness of the steady increase in work among scientists before it affects our work-life balance even more. The Democratic People's Republic of Korea (North Korea) is one of the world's most secretive and reclusive states. In scientometrics, even the United Nations, which compiles data from every country of the world, has been able to do little beyond counting the few scientific papers made publicly available (UNESCO 2010). The world could benefit from knowing more about North Korean science, which is quite well developed, as witnessed by the concern about its nuclear energy and rocket launches. Here an analysis is presented of the North Korean presence in the world's scientific literature, and of the possibilities for collaboration, which offer a mechanism for positive development for North Korea's citizens and also for its neighbours. The present paper attempts to shed light on outstanding research performance using the example of citation distributions, in order to answer the question of how the analysis of outstanding performance in general, and highly cited papers in particular, could be integrated into standard techniques of evaluative scientometrics. 
Two general methods are proposed: one solution aims at quantifying the performance represented by the tail of citation distributions independently of the "mainstream"; the second, a parameter-free solution, provides performance classes for any level. Advantages and shortcomings of both methods are discussed. In the present paper, four myths of gender differences in scientific performance are presented and discussed. The persistence of these myths in different forms of evaluation contributes to the discrimination against women in research careers, in combination with effects described in other explanatory models for the existence of the unseen barrier (glass ceiling) that keeps women from rising to the upper levels of the corporate ladder. In this research, through a complete investigation of 100 American national universities, a list of 3,776 Chinese-American faculty members was collected. The analysis covers five aspects: regional statistics, institution statistics, gender statistics, position statistics, and discipline statistics. New York, California, and Pennsylvania have the most Chinese-American scholars, while the top three universities are The Ohio State University-Columbus, Emory University, and Texas A&M University. The number of male faculty members is much greater than that of female, with a ratio of roughly 7:3. For the position statistics, the ratio of Professors, Associate Professors, and Assistant Professors is 2.7:3:4.3. Biology, Medicine, and Computer Science are the top three disciplines with the most Chinese-American faculty members. The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the Semantic Web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult, because a single name may appear in many different forms. 
This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases: the first assumes that a training dataset is available and uses a Naive Bayes model; the second assumes that no learning resource exists and uses a semi-supervised approach mixing soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also highlight a limitation of the approach: it cannot efficiently process highly unbalanced data. Alternative solutions are possible for future developments, particularly the use of a recent clustering algorithm relying on feature maximization. This study deals primarily with the effect of certain European Framework Programmes on EU-27 member states' publication output in nanotechnology, with a focus on their scientific collaboration over the last ten years. The study was conducted at three levels (category, journal, and publication). The aim was to verify whether the newly launched category is sufficiently complete, as well as to identify the most prominent journals and compare the EU-27 member states' output to world production. Snapshots of European networking are also provided for three key dates (2001, 2006, and 2011) to ascertain the positions of emerging and central countries and analyse their variations over time. The results confirm the speedy development of the field and the importance of the EU-27's world role. They corroborate the close correlation between funding and increased output and the intensification of collaboration among member states. Finally, the information contained in the "Funding Agency" field of the Web of Science database was also compiled, with a view to substantiating the validity of the estimated impact of EU funding programmes on member states' scientific output. 
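The supervised variant of the affiliation-disambiguation work above uses a Naive Bayes model over affiliation strings. A minimal sketch of that idea follows; the training pairs, institution labels, and tokenizer are invented examples, not the authors' data or system, and the semi-supervised variant is not shown.

```python
import math
from collections import Counter, defaultdict

def tokenize(affiliation):
    return affiliation.lower().replace(".", " ").replace(",", " ").split()

def train(labeled):
    """labeled: list of (raw affiliation string, canonical institution)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + Laplace-smoothed log likelihood of each token
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

training = [
    ("Univ. of Lorraine, Nancy", "University of Lorraine"),
    ("Universite de Lorraine", "University of Lorraine"),
    ("CNRS, Paris", "CNRS"),
    ("Centre National de la Recherche Scientifique", "CNRS"),
]
model = train(training)
predicted = classify("univ lorraine nancy cedex", model)
```

Laplace smoothing lets the classifier score affiliation variants containing tokens never seen for a given institution, which is exactly the misspelling/abbreviation noise the abstract describes.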
We introduce a novel set of metrics for triadic closure among individuals or groups to model how co-authorship networks become more integrated over time. We call this process of triadic, third-party mediated integration, research facilitation. We apply our research facilitation or RF-metrics to the development of the Pan-Asian SNP (PASNP) Consortium, the first inter-Asian genomics network. Our aim was to examine if the consortium catalyzed research facilitation or integration among the members and the wider region. The PASNP Consortium is an ideal case study of an emerging Asian Research Area because its members themselves asserted a regional Asian identity. To validate our model, we developed data mining software to extract and match full author and institutional information from the PDFs of scientific papers. Many of the novel ideas that lead to scientific publications or yield technological advances are the result of collaborations among scientists or inventors. Although various aspects of collaboration networks have been examined, the impact of many network characteristics on knowledge creation and innovation production remains unclear due to the inconsistency of the conclusions from various research studies. One such network structure, called small world, has recently attracted much theoretical attention as it has been suggested that it can enhance the information transmission efficiency among the network actors. However, the existing empirical studies have failed to provide consistent results regarding the effect of small-world network properties on network performance in terms of its scientific and technological productivity. In this paper, using the data on 29 years of journal publications and patents in the field of biotechnology in Canada, the network of scientists' collaboration activities has been constructed based on their co-authorships in scientific articles. 
Various structural properties of this network have been measured, and the relationships between network structure and knowledge creation, as well as the quantity and quality of technological performance, have been examined. We found that the structure of the co-authorship network of Canadian biotechnology scientists has a significant effect on knowledge and innovation production, but no impact on the quality of patents generated by these scientists. This paper discusses a concept for inferring attributes of 'frontier research' in peer-reviewed research proposals under the popular scheme of the European Research Council (ERC). The concept serves two purposes: first, to conceptualize, define, and operationalize in scientometric terms the attributes of frontier research; and second, to build and compare the outcomes of a statistical model against the review decision in order to obtain further insight into, and reflect upon, the influence of frontier research in the peer-review process. To this end, indicators across scientific disciplines and in accord with the ERC's strategic definition of frontier research are elaborated, exploiting textual proposal information and other scientometric data on grant applicants. Subsequently, a suitable model is formulated to measure ex post the influence of attributes of frontier research on the probability of a proposal being accepted. We present first empirical data as proof of concept for inferring frontier research in grant proposals. Ultimately, the concept aims at advancing the methodology to deliver signals for monitoring the effectiveness of peer-review processes. Quantifying the scientific performance of investigators has become an integral part of decision-making in research policy. 
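One of the small-world structural properties discussed in the co-authorship study above is the local clustering coefficient: the share of a scientist's co-author pairs who have also co-authored with each other. A hedged sketch on a toy adjacency list (not the Canadian biotechnology data):

```python
def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, u in enumerate(neighbours)
        for v in neighbours[i + 1:]
        if v in graph[u]
    )
    return 2 * links / (k * (k - 1))

# Toy co-authorship graph: A-B, A-C, B-C form a triangle; D hangs off A
coauthors = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}
avg_cc = sum(clustering_coefficient(coauthors, n) for n in coauthors) / len(coauthors)
```

A network is "small world" when this average clustering is high while the average shortest-path distance stays short, which is the combination such studies test for.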
The aim of the present study was to evaluate whether there is a correlation between journal impact factor (IF) and researchers' influence among a selected group of Brazilian investigators in the fields of clinical nephrology and neurosciences. This study was based on 94 senior investigators (36 in clinical nephrology and 58 in clinical neurosciences) receiving productivity scholarships from the Brazilian Council for Scientific and Technological Development (CNPq), according to a list provided by the agency in February 2009. The scientific performance indicators included in the analysis were: number of papers indexed by the Web of Science and Scopus databases, number of citations, and h- and m-index. IFs were analyzed as (1) cumulative IF (Sigma IF), (2) IF adjusted by time (IF/t), and (3) average IF. There was a moderate positive correlation only between the cumulative IF and two indicators: total number of citations (P < 0.001) and h-index (P < 0.001). There was also a positive correlation between IF/t and m-index (P < 0.001). These correlations agreed between the two groups (clinical nephrology and neurosciences). No significant correlation between the average IF and any of the scientific indicators was detected. A cut-off of 10.53 for IF/t showed the best performance in predicting researchers with an m-index equal to or greater than 1. According to our findings, qualitative and quantitative instruments other than the IF are clearly needed for identifying researchers with outstanding scientific output. This paper applies a panel threshold regression model to verify that there is a triple threshold effect of patent citations/sales on the relationship between patent counts/sales and market value/sales in the American pharmaceutical industry. The results demonstrate that patent citations/sales moderates the relationship between patent counts/sales and market value/sales (i.e., the relationship between patent counts and market value). 
When patent citations/sales is less than or equal to the lowest threshold, 4.68, there is no significant relationship between patent counts and market value. Once patent citations/sales exceeds the lowest threshold, there is a positive relationship between patent counts and market value. This study points out that the third regime is optimal because the positive relationship between patent counts and market value is strongest there. Have Chinese universities, after enormous investment over the past decade, embraced the university's third mission: contributing to industrial and technological progress? The literature has not sufficiently addressed this question. This study intends to advance understanding of the issue by empirically addressing it from a business perspective in a bold and unconventional way. Unlike prior studies that simply used contingent and institutional factors to describe the link between Chinese universities and industrial firms by measuring such aspects as patent licensing, co-patenting, and co-authoring, our work goes further and applies longitudinal analysis to examine the ways firms access university-level knowledge and the impact of such knowledge on firm innovation outputs. We propose that if Chinese universities embraced their third mission, then we would observe a positive effect of university-industry collaborations on firms' subsequent innovation outputs. Empirical results based on a sample of the top 100 Chinese electronic firms in terms of output value support our hypothesis. Specifically, university patent licensing and co-patenting between universities and firms were found to positively affect firm innovation outputs. Moreover, we found that geographical distance and collaboration dominance moderate the co-patenting-innovation output relationship. Most academic rankings attempt to measure the quality of university education and research. 
However, previous studies examining the most influential rankings conclude that the variables they use could be an epiphenomenon of an X factor that has little to do with quality. The aim of this study is to investigate the existence of this hidden factor or profile in the two most influential global university rankings in the world: the Academic Ranking of World Universities (ARWU) of Shanghai Jiao Tong University, and the Times Higher Education (THE) ranking. Results support the existence of an underlying entity profile, characterized by institutions, normally from the US, that enjoy a high reputation. Results also support the idea that rankings lack the capacity to assess university quality in all its complexity, and two strategies are suggested in relation to the vicious circle created between institutional reputation and rankings. Using the university-industry co-publication (UICP) propensity indicators developed by Tijssen (CWTS Working Paper Series, CWTS-WP-2012-009, 2009), this paper examines the impact of university-industry R&D collaboration on university technology commercialization output for leading US and Canadian universities. Our analysis suggests that UICPs do have a significant positive influence on universities' technology commercialization outputs, after controlling for the quantity and quality of their research and for their commercialization resources. The results are robust for all three common measures of university technology commercialization: patenting (both in terms of simple patent counts and citation-weighted counts), spin-off formation, and technology licensing. To supplement the aggregate regression findings, five case studies are provided that offer further insights on the causal mechanisms involved. Implications of these findings and possible future research directions are discussed. 
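The UICP analysis above is, at its core, a regression of commercialization output on co-publication intensity with controls for research quantity, quality, and resources. A rough sketch of that setup via ordinary least squares follows; all variable names and numbers are synthetic illustrations, and the study's actual specification and controls differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
uicp = rng.uniform(0, 1, n)              # university-industry co-publication share
research_quality = rng.uniform(0, 1, n)  # one stand-in control variable
# Synthetic outcome: patent output depends positively on UICP and the control
patents = 5 + 3.0 * uicp + 2.0 * research_quality + rng.normal(0, 0.5, n)

# OLS: patents ~ intercept + UICP + control
X = np.column_stack([np.ones(n), uicp, research_quality])
coef, *_ = np.linalg.lstsq(X, patents, rcond=None)
```

With the control included, the UICP coefficient estimates the partial association between co-publication intensity and commercialization output, which is the quantity the study interprets.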
Based on data from the Web of Science, international collaboration between China and the UK in food and agriculture has been investigated from various perspectives. A new method for classifying cross- or multi-disciplinary fields has been created. The comparative study focuses on China's collaboration with selected countries, including the USA, the UK, Germany and Japan. The newly proposed Integrated Impact Indicator (I3) is applied to evaluate publication impact. Although China's total publications dropped in 2010, its research productivity in food and agriculture nevertheless kept growing, and international collaboration, reflected in the number of publications, also increased exponentially. The growth rate of China's internationally collaborated publications was lower than that of China's total publications. The USA, Japan, Canada, Australia, the UK and Germany are the top partners for Chinese researchers in this field. China-UK joint publications increased overall, although their share of China's total internationally collaborated publications decreased. For China, collaborating with the USA, the UK and Germany, rather than with Japan, seems to offer an option for raising impact. The rapidly growing number of international publications and the impact of Chinese research in food and agriculture offer great collaboration potential for the country. The fact that the average impact of China-UK collaborative publications is higher than that of the domestic publications of either country implies that collaboration benefits both sides, as has been found in several other studies. It was found that the surname-based representation of Jewish authors in the top US biomedical journals corresponds to the representation of Jewish Nobel Laureates in Medicine among US laureates: both are almost equally disproportionately high, with the ratio of actual to expected numbers close to 20 (Kissin, Scientometrics 89:273-280, 2011). 
The main aim of this study was to determine whether the contribution of Jewish inventors is also disproportionately high. The number of patents (US Patent and Trademark Office database) per thousand persons with the same surname (2000 Census) was determined (index P). Index P was compared with index A, which represents the number of articles in the top US biomedical journals, and with index G, which is based on the representation of a surname in Google's "Discussions" option, reflecting a combination of various business and leisure activities (designed as a negative control). The collective contributions of the 96 Jewish surname groups for each of the above indices were calculated. The ratio of actual to expected number of US patents was found to be disproportionately high, at 6.1 (p < 0.0001). At the same time, this disproportionality was roughly four-fold lower than that related to biomedical articles (ratio of 6.1 vs. 23.3, p < 0.0001). There was some degree of correlation between index P and index A (r = 0.407, p < 0.0001), but no significant correlation was found between index P and index G. The role of various factors in the observed disproportionalities is discussed. The greater degree of disproportionality for biomedical research articles might be a consequence of a traditional Jewish inclination towards occupations in medicine. This study aims to reveal the intellectual structure of Library and Information Science (LIS) in China during the period 2008-2012 using co-word analysis. The status and trends of LIS in China are determined by measuring the correlation coefficients of selected keywords extracted from relevant journals in the Chinese Journal Full-Text Database. In the co-word analysis, multivariate statistical analysis and social network analysis are applied to obtain 13 clusters of keywords, a two-dimensional map, the centrality and density of clusters, a strategic diagram and a relation network. 
Based on these results, the following conclusions can be drawn: (i) LIS in China has some established and well-developed research topics; (ii) a few emerging topics have great potential for development; and (iii) the research topics in the field are largely decentralized as a whole, with many marginal and immature topics. Faculty of 1000 (F1000) is a post-publication peer review web site where experts evaluate and rate biomedical publications. F1000 reviewers also assign labels to each paper from a standard list of article types. This research examines the relationship between article types, citation counts, and F1000 article factors (FFa). For this purpose, a random sample of F1000 medical articles from the years 2007 and 2008 was studied. In seven out of the nine cases, there were no significant differences between the article types in terms of citation counts and FFa scores. Nevertheless, citation counts and FFa scores were significantly different for two article types, "New finding" and "Changes clinical practice": FFa scores value the appropriateness of medical research for clinical practice, whereas "New finding" articles are more highly cited. It seems that highlighting key features of medical articles alongside ratings by Faculty members of F1000 could help to reveal the hidden value of some medical papers. Since Schumpeter's (The Theory of Economic Development, 1934) seminal work on economic development, innovation has been considered one of the main drivers of firm performance and economic growth. At the same time, technological innovations vary considerably in terms of impact, with only a minority of new inventions contributing significantly to technological progress and economic growth. More recently, a number of indicators derived from patent documents have been advanced to capture the nature and impact of technological inventions. In this paper, we compare and validate these indicators within the field of biotechnology. 
An extensive analysis of the recent history of biotechnology allows us to identify the most important inventions (n = 214) that shaped the field of biotechnology in the time period 1976-2001. A considerable number of these inventions have been patented between 1976 and 2001 (n = 117, 55 %). For all USPTO biotech patents filed between 1976 and 2001 (n = 84,119), relevant indicators have been calculated. In a subsequent step, we assess which indicators allow us to distinguish between the most important patented inventions and their less influential counterparts by means of logistic regression models. Our findings show that the use of multiple, complementary indicators provides the most comprehensive picture. In addition, it is clear that ex-post indicators reflecting impact and value outperform ex-ante indicators reflecting the nature and novelty of the invention in terms of precision and recall. We propose a method for selecting the research guarantor when papers are co-authored. The method is simply based on identifying the corresponding author. The method is here applied to global scientific output based on the SCOPUS database in order to build a new output distribution by country. This new distribution is then compared with previous output distributions by country but which were based on whole or fractional counting, not only for the total output but also for the excellence output (papers belonging to the 10 % most cited papers). The comparison allows one to examine the effect of the different methodological approaches on the scientific performance indicators assigned to countries. In some cases, there was a very large variation in scientific performance between the total output (whole counting) and output as research guarantor. The research guarantor approach is especially interesting when used with the excellence output where the quantity of excellent papers is also a quality indicator. 
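The research-guarantor proposal above contrasts three ways of crediting countries for a co-authored paper: whole counting, fractional counting, and corresponding-author (guarantor) counting. The three schemes can be sketched directly; the papers below are invented examples.

```python
from collections import defaultdict

papers = [
    # (countries of all authors, country of the corresponding author)
    (["ES", "US"], "ES"),
    (["ES", "US", "US"], "US"),
    (["US"], "US"),
]

whole = defaultdict(float)
fractional = defaultdict(float)
guarantor = defaultdict(float)
for countries, corresponding in papers:
    for c in set(countries):
        whole[c] += 1                        # every participating country gets 1
    for c in countries:
        fractional[c] += 1 / len(countries)  # credit split across author slots
    guarantor[corresponding] += 1            # full credit to the guarantor only
```

Because each scheme distributes credit differently, the resulting country rankings can diverge, which is exactly the methodological effect the abstract examines, especially for the top-cited "excellence" subset.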
The impact of excellent papers naturally has less variability, as they are all top-cited papers. Based on a network of 111,444 library and information science keywords extracted from Scopus, and taking into consideration the major properties of average distance and clustering coefficient, the present authors apply complex-network analysis to reveal the small-world effect of the keyword network. On the basis of the keyword network, betweenness centrality is used to carry out a preliminary study on how to detect the research hotspots of a discipline. This method is also compared with detecting research hotspots by word frequency. The study explores the international collaboration network consisting of 606 astronomical institutions through the analysis of internationally coauthored papers published in six journals in astronomy and astrophysics from 2001 to 2009. It shows that the Istituto Nazionale di Astrofisica (INAF) and the European Southern Observatory (ESO) are the most notable actors, with the highest values of centrality in the network, while the Japan Meteorological Agency (JMA) is the only institution that is completely separated from the others. It is observed that national academies in major countries, international organizations, and large observatories are more likely to be central actors. Although some world-famous astronomical institutions, such as CfA, NASA, and Caltech, are identified as remarkable actors in the network, they show no strikingly high scores on the centrality measures. Overall, astronomical institutions' network positions vary with time; nevertheless, not all institutions present considerable changes over the investigated periods. While some institutions moved from central to relatively peripheral positions, or in the opposite direction, the institutions positioned at the very center of the network tend to be stable over time. 
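The hotspot-detection idea above (ranking keywords by betweenness centrality in a co-occurrence network rather than by raw frequency) can be sketched in a few lines. The keyword network and the stdlib-only centrality routine below are illustrative assumptions, not the studies' data or implementations.

```python
from collections import defaultdict, deque
from itertools import combinations

# Toy keyword co-occurrence network (keywords linked when they appear
# in the same paper); the edges are invented for illustration.
edges = [
    ("bibliometrics", "citation analysis"),
    ("bibliometrics", "h-index"),
    ("citation analysis", "h-index"),
    ("citation analysis", "open access"),
    ("open access", "institutional repository"),
    ("h-index", "research evaluation"),
    ("research evaluation", "peer review"),
]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = {k: sorted(vs) for k, vs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness: for each node, the summed fraction of
    shortest paths between other node pairs that pass through it."""
    bc = dict.fromkeys(adj, 0.0)
    for s, t in combinations(adj, 2):
        # BFS from s, counting shortest paths and recording predecessors.
        dist, npaths, preds = {s: 0}, {s: 1.0}, defaultdict(list)
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    npaths[v] = 0.0
                    q.append(v)
                if dist[v] == dist[u] + 1:
                    npaths[v] += npaths[u]
                    preds[v].append(u)
        if t not in dist:
            continue
        # Walk back from t, crediting each node with its share of the
        # s-t shortest paths that pass through it.
        credit = defaultdict(float)
        credit[t] = 1.0
        for v in sorted(dist, key=dist.get, reverse=True):
            for p in preds[v]:
                credit[p] += credit[v] * npaths[p] / npaths[v]
            if v not in (s, t):
                bc[v] += credit[v]
    return bc

bc = betweenness(adj)
hotspots = sorted(bc, key=bc.get, reverse=True)
```

In this toy graph, "citation analysis" and "h-index" bridge otherwise separate clusters and score highest, while frequently used but peripheral terms score zero, which is exactly the contrast with frequency-based hotspot detection.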
Exploring and measuring technology-relatedness and its collateral technology divergence and convergence has far-reaching theoretical significance and academic value for the chained mode of technology development, and also for mastering the laws of technology evolution and progress. Taking a worldwide patentometric analysis of solar energy technology as a case study, and employing the methodology of technology co-classification analysis with two indicators, namely mean technology co-classification partners (MTCP) and mean technology co-classification index (MTCI), we have analyzed and measured the evolving process of technology-relatedness. The results not only demonstrate directly the continuously advancing character of solar energy technology in the tension between technology divergence and convergence, but also reveal quantitatively that, owing to the chain reaction of technology-relatedness, technology divergence and technology convergence tend to evolve in parallel. These results indicate that technology divergence and technology convergence are two trends that develop separately, react mutually, and serve as causes for each other, thus making chained progress and continuously pushing forward the innovation, creation and upgrading of technologies. This is a regular phenomenon on the condition that the specific technology area is in a state of sustainable development. Further research is needed to verify and reveal the general principles of the interaction between technology divergence and convergence by conducting empirical studies combined with patent analysis. We investigated the development of astronomy and astrophysics research productivity in Turkey in terms of publication output and its impact as reflected in the Science Citation Index for the period 1980-2010. Our study involves 838 refereed publications, including 801 articles, 16 letters, 15 reviews, and six research notes. 
The number of papers increased markedly after 2000, and the average number of papers per researcher is calculated as 0.89. The total number of citations received by the 838 papers is 6,938, or approximately 8.3 citations per paper over 30 years. The publication performance of Turkish astronomers and astrophysicists was compared with that of seven countries that have similar gross domestic expenditures on research and development and are members of the Organisation for Economic Co-operation and Development. Our study reveals that the output of astronomy and astrophysics research in Turkey has gradually increased over the years. Many recent studies on MEDLINE-based information seeking have shed light on scientists' behaviors and associated tool innovations that may improve efficiency and effectiveness. Few studies, however, examine scientists' problem-solving uses of PubMed in actual contexts of work and the corresponding needs for better tool support. Addressing this gap, we conducted a field study of novice scientists (14 upper-level undergraduate majors in molecular biology) as they engaged in a problem-solving activity with PubMed in a laboratory setting. Findings reveal many common stages and patterns of information seeking across users as well as variations, especially variations in cognitive search styles. Based on these findings, we suggest tool improvements that both confirm and qualify many results found in other recent studies. Our findings highlight the need to use results from context-rich studies to inform decisions in tool design about when to offer improved features to users. Images contained in scientific publications are widely considered useful for educational and research purposes, and their accurate indexing is critical for efficient and effective retrieval. Such image retrieval is complicated by the fact that figures in the scientific literature often combine multiple individual subfigures (panels). 
Multipanel figures are in fact the predominant pattern in certain types of scientific publications. The goal of this work is to automatically segment multipanel figures, a necessary step for automatic semantic indexing and for the development of image retrieval systems targeting the scientific literature. We have developed a method that uses the image content as well as the associated figure caption to: (1) automatically detect panel boundaries; (2) detect panel labels in the images and convert them to text; and (3) detect the labels and textual descriptions of each panel within the captions. Our approach combines the output of image-content and text-based processing steps to split multipanel figures into individual subfigures and assign to each subfigure its corresponding section of the caption. The developed system achieved a precision of 81% and a recall of 73% on the task of automatic segmentation of multipanel figures. Delays have become one of the most often cited complaints of web users. Long delays often cause users to abandon their searches, but how do tolerable delays affect information search behavior? Intuitively, we would expect that tolerable delays should induce decreased information search. We conducted two experiments and found that as delay increased, a point occurs at which within-page information search increases; that is, search behavior remained the same until a tipping point occurred where delay increased the depth of search. We argue that situation normality explains this phenomenon; users have become accustomed to tolerable delays up to a point (our research suggests between 7 and 11 s), after which search behavior changes. That is, some delay is expected, but as delay becomes noticeable yet not long enough to cause the abandonment of search, an increase occurs in the stickiness of webpages such that users examine more information on each page before moving to new pages. 
The net impact of tolerable delays was counterintuitive: tolerable delays had no impact on the total amount of data searched in the first experiment, but induced users to examine more data points in the second experiment. We perform session analysis for our domain of people search within a professional social network. We find that the content-based method is appropriate to serve as a basis for session identification in our domain. However, there remain some problems reported in previous research that degrade the identification performance (such as accuracy) of the content-based method. Therefore, in this article, we propose two important refinements to address these problems. We describe the underlying rationale of our refinements and then empirically show that the content-based method equipped with our refinements is able to achieve excellent identification performance in our domain (99.820% accuracy and 99.707% F-measure in our experiments). Next, because the time-based method has extremely low computation costs, which makes it suitable for many real-world applications, we investigate the feasibility of the time-based method in our domain by evaluating its identification performance against our refined content-based method. Our experiments demonstrate that the performance of the time-based method is potentially acceptable for many real applications in our domain. Finally, we analyze several features of the identified sessions in our domain and compare them with the corresponding ones in general web search. The results illustrate the profession-oriented characteristics of our domain. English is by far the most used language on the web. In some domains, the existence of less content in the users' native language may not be problematic and may even help users cope with information overload. Yet, in domains such as health, where information quality is critical, a larger quantity of information may mean easier access to higher quality content. 
Query translation may be a good strategy to access content in other languages, but the presence of medical terms in health queries makes the translation process more difficult, even for users with very good language proficiency. In this study, we evaluate how translating a health query affects users with different language proficiencies. We chose English as the non-native language because it is widely spoken and is the most used language on the web. Our findings suggest that non-English-speaking users with at least elementary English proficiency can benefit from a system that suggests English alternatives for their queries, or automatically retrieves English content from a non-English query. This awareness of the user profile results in higher precision, more accurate medical knowledge, and better access to high-quality content. Moreover, the suggestion of English-translated queries may also trigger new health search strategies. With the increasing popularity of social tagging systems, the potential for using social tags as a source of metadata is being explored. Social tagging systems can simplify the involvement of a large number of users and improve the metadata-generation process. Current research is exploring social tagging systems as a mechanism to allow nonprofessional catalogers to participate in metadata generation. Because social tags are not drawn from controlled vocabularies, there are issues that have to be addressed in finding quality terms to represent the content of a resource. This research explores ways to obtain a set of tags representing a resource from the tags provided by users. Two metrics are introduced. Annotation Dominance (AD) is a measure of the extent to which a tag term is agreed to by users. Cross Resources Annotation Discrimination (CRAD) is a measure of a tag's potential to classify a collection. It is designed to remove tags that are used too broadly or narrowly. 
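One plausible, simplified operationalization of these two metrics is sketched below (the paper's exact formulas may differ): AD as the share of a resource's annotators who applied a term, and CRAD as an IDF-style score that zeroes out terms attached to almost all, or almost none, of the resources. The tag counts and thresholds are invented for illustration.

```python
import math

# Hypothetical tag assignments: resource -> {tag: users who applied it}
tags = {
    "r1": {"python": 40, "programming": 35, "misc": 2},
    "r2": {"python": 10, "snakes": 30},
    "r3": {"cooking": 25, "misc": 3},
}
users_per_resource = {"r1": 50, "r2": 35, "r3": 30}

def annotation_dominance(resource, tag):
    # Fraction of the resource's annotators who agreed on this tag.
    return tags[resource].get(tag, 0) / users_per_resource[resource]

def crad(tag, low=0.1, high=0.9):
    # Share of resources carrying the tag; terms that label almost every
    # resource (too broad) or almost none (too narrow) are poor
    # classifiers, so they get a zero score; otherwise an IDF-like value.
    n = len(tags)
    df = sum(1 for r in tags if tag in tags[r])
    coverage = df / n
    if coverage <= low or coverage >= high:
        return 0.0
    return math.log(n / df)
```

A tag like "misc" survives the CRAD filter here but has very low AD on every resource, so combining both scores, as the abstract describes, is what separates meta-terms from tag noise.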
Using the proposed measurements, the research selects important tags (meta-terms) and removes meaningless ones (tag noise) from the tags provided by users. To evaluate the proposed approach to finding classificatory metadata candidates, we rely on expert users' relevance judgments comparing suggested tag terms and expert metadata terms. The results suggest that processing user tags with the two measurements successfully identifies the terms that represent the topic categories of web resource content. The suggested tag terms can be further examined in various usages as semantic metadata for the resources. When information practices are understood to be shaped by social context, privilege and marginalization alternately affect not only access to, but also use of, information resources. In the context of information, privilege, and community, politics of marginalization drive stigmatized groups to develop collective norms for locating, sharing, and hiding information. In this paper, we investigate the information practices of a subcultural community whose activities are both stigmatized and of uncertain legal status: the extreme body modification community. We use the construct of information poverty to analyze the experiences of 18 people who had obtained, were interested in obtaining, or had performed extreme body modification procedures. With a holistic understanding of how members of this community use information, we complicate information poverty by working through concepts of stigma and community norms. Our research contributes to human information behavior scholarship on marginalized groups and to Internet studies research on how communities negotiate collective norms of information sharing online. The literature to date has treated two issues as distinct: (a) the influence of pornography on young people and (b) the growth of Internet child pornography, also called child exploitation material (CEM). 
This article discusses how young people might interact with, and be affected by, CEM. The article first considers the effect of CEM on young victims abused to generate the material. It then explains the paucity of data regarding the prevalence with which young people view CEM online, inadvertently or deliberately. New analyses are presented from a 2010 study of search terms entered on an internationally popular peer-to-peer website, isoHunt. Over 91 days, 162 persistent search terms were recorded. Most of these related to file sharing of popular movies, music, and so forth. Thirty-six search terms were categorized as specific to a youth market and perhaps a child market. Additionally, 4 deviant and persistent search terms were found: 3 relating to CEM and the fourth to bestiality. The article discusses whether the existence of CEM on a mainstream website, combined with online subcultural influences, may normalize the material for some youth and increase the risk of onset (first deliberate viewing). Among other things, the article proposes that future research examine the relationship between onset and sex offending by youth. Groupthink behavior is always a risk in online groups and group decision support systems (GDSS), especially when not all potential alternatives for problem resolution are considered. It becomes a reality when individuals simply conform to the majority opinion and hesitate to suggest their own solutions to a problem. Anonymity has long been established to have an effect on conformity, but no previous research has explored the effects of different anonymity states in relation to an individual's likelihood to conform. Through a survey of randomly chosen participants from the English-language Wikipedia community, I explored the effects of anonymity on the likelihood of conforming to group opinion. In addition, I differentiated between actual states of anonymity and individuals' perceptions of anonymity. 
My findings indicate that although people perceive anonymity differently depending on their anonymity state, different states of anonymity do not have a strong effect on the likelihood of conforming to group opinion. Based on this evidence, I make recommendations for software engineers who have a direct hand in the design of online community platforms. This article aims to understand the adoption of e-books by academic historians for the purposes of teaching and research. This includes an investigation into their knowledge about and perceived characteristics of this evolving research tool. The study relied on Rogers's model of the innovation-decision process to guide the development of an interview guide. Ten semistructured interviews were conducted with history faculty between October 2010 and December 2011. A grounded theory approach was employed to code and analyze the data. Findings about tradition, cost, teaching innovations, and the historical research process provide the background for designing learning opportunities for the professional development of historians and the academic librarians who work with them. While historians are open to experimenting with e-books, they are also concerned about the loss of serendipity in digital environments, the lack of availability of key resources, and the need for technological transparency. The findings show that Rogers's knowledge and persuasion stages are cyclical in nature, with scholars moving back and forth between these two stages. Participants interviewed were already weighing the five characteristics of the persuasion stage without having much knowledge about e-books. The study findings have implications for our understanding of the diffusion of innovations in academia: both print and digital collections are being used in parallel without one replacing the other. In this article, we present findings from a survey of nearly 600 university employees' e-mail use. 
The study provides a detailed comparison of use patterns between work and personal e-mail accounts. Our results suggest that users engage in more keeping behaviors with work e-mail than with personal e-mail: respondents reported more frequent use of keeping actions and larger inbox sizes for their work accounts. However, we found correlations between individual respondents' e-mail behaviors in the two contexts, indicating that personal preferences can play a role. We also report results pointing to e-mail as an important boundary management artifact. We show evidence that the use of multiple e-mail accounts may be a work-personal boundary placement strategy, but also observe that a fair amount of boundary permeation occurs through e-mail. To our knowledge, this study is one of the first to compare e-mail use in both work and personal contexts across the same sample. Our findings extend prior research on personal information management regarding e-mail use, and help clarify the role of e-mail in managing work-personal boundaries. The results have implications for the design of e-mail systems, organizational e-mail policies, user training, and understanding the impacts of technology on daily life. This article compares doctoral students' and faculty members' referencing behavior through the analysis of a large corpus of scientific articles. It shows that doctoral students tend to cite more documents per article than faculty members, and that the literature they cite is, on average, more recent. It also demonstrates that doctoral students cite a larger proportion of conference proceedings and journal articles than faculty members, and that faculty members are more likely to self-cite and to cite theses than doctoral students. Analysis of the impact of cited journals indicates that in health research, faculty members tend to cite journals with slightly lower impact factors, whereas in the social sciences and humanities, faculty members cite journals with higher impact factors. 
Finally, it provides evidence that, in every discipline, faculty members tend to cite a higher proportion of clinical/applied research journals than doctoral students. This study contributes to the understanding of referencing patterns and age stratification in academia. Implications for understanding the information-seeking behavior of academics are discussed. This study investigates a range of metrics available when a nanoscience and nanotechnology article is published to see which metrics correlate best with the number of citations to the article. It also introduces the degree of internationality of journals and references as new metrics for this purpose. The journal impact factor; the impact of references; the internationality of authors, journals, and references; and the number of authors, institutions, and references were all calculated for papers published in nanoscience and nanotechnology journals in the Web of Science from 2007 to 2009. Using a zero-inflated negative binomial regression model on the data set, the impact factor of the publishing journal and the citation impact of the cited references were found to be the most effective determinants of citation counts in all four time periods. In the entire 2007 to 2009 period, apart from journal internationality and author numbers and internationality, all other predictor variables had significant effects on citation counts. Interdisciplinary research has been attracting more attention in recent decades. In this article, we compare the similarity between scientific research domains and quantify how these similarities change over time. We narrowed our study to three research domains: information retrieval (IR), database (DB), and World Wide Web (W3), because the rapid development of the W3 domain substantially attracted research efforts from both the IR and DB domains and introduced new research questions to these two areas. 
Most existing approaches employed either a content-based technique or a cocitation or coauthorship network-based technique to study the development trend of a research area. In this work, we proposed an effective way to quantify the similarities among different research domains by incorporating both content similarity and coauthorship network similarity. Experimental results on DBLP (DataBase systems and Logic Programming) data related to the IR, DB, and W3 domains showed that the W3 domain was getting closer to both IR and DB, whereas the distance between IR and DB remained relatively constant. In addition, compared with IR and W3, the DB domain was more conservative and evolved relatively slowly. Citation analysis of documents retrieved from the Medline database (at the Web of Knowledge) has been possible only on a case-by-case basis. A technique is presented here for citation analysis in batch mode using both Medical Subject Headings (MeSH) at the Web of Knowledge and the Science Citation Index at the Web of Science (WoS). This freeware routine is applied to the case of Brugada Syndrome, a specific disease and field of research (since 1992). The journals containing these publications, for example, are attributed to WoS categories other than cardiac and cardiovascular systems, perhaps because of the possibility of genetic testing for this syndrome in the clinic. With this routine, all the instruments available for citation analysis can now be used on the basis of MeSH terms. Other options for crossing between Medline, WoS, and Scopus are also reviewed. This paper examines research collaborations in the field of business and management in Malaysia, a fast-developing economy in Southeast Asia. The country aims to become a developed nation by the year 2020, guided by its well-charted Wawasan 2020 or Vision 2020 program. Research and development are important agenda items within this program. 
Rarely, however, have studies investigated the research collaborations of researchers based in Malaysia from a network perspective. After a manual author disambiguation process, we examined the network of 285 business and management researchers at the individual, institutional, and international levels. The number of collaborating authors per paper almost doubled in 2001-2010 compared with the period 1980-1990. The popularity of researchers and the strength and diversity of their ties with other researchers had significant effects on their research performance. Furthermore, geographical proximity still mattered in intra-national collaborations. Malaysian institutions more often collaborated intra-institutionally or with foreign partners than with other institutions within Malaysia. The country's five research universities are among the most productive institutions in Malaysia. Malaysia's top international partners are all developed countries, including the US, Australia, Japan, the UK, and Canada. Surprisingly, Malaysia has had relatively little collaboration with ASEAN nations, of which it is a prominent member and which has an important agenda of educational cooperation among its member states. Internationally co-authored articles have been cited almost three times more than locally co-authored articles. Based on these results, we suggest an effective co-authorship strategy. We survey tenure-track faculty members employed in three fields in colleges of agriculture at land-grant universities (agricultural economics, agronomy, and food science) to evaluate the effects of different employment structures and incentives on research productivity. These evaluations include conducting statistical tests to assess any effects of different academic appointments and developing a regression model to measure the effects of these and other attributes on individual research productivity, as defined by the number of publications in the Thomson ISI Web of Science. 
We find that faculty who hold larger teaching and extension appointments produce fewer publications; we also find positive effects on the number of publications from grants and university funding, multi-institutional research collaboration, and the number of graduate students advised. The number of internationally co-authored articles has increased significantly in recent years, and such articles now receive more citations than domestic works. Abramo et al. (Scientometrics 86:629-643, 2011b) investigated scholars in Italian universities and found a positive correlation between their research performance and degree of internationalization. This study uses a data set in chemistry to examine the robustness of the results presented by Abramo et al. (Scientometrics 86:629-643, 2011b) and the relationship between international collaboration and mobility among researchers. The results confirmed the robustness of the previous study and raised the possibility that the higher citation rate of international papers is not solely explained by the higher performance of researchers. Therefore, international research collaboration seems to exert some kind of "bonus" effect because of internationalization. The results also indicate that researchers who collaborate internationally accumulate science and technology human capital through collaboration. A positive relationship between the international mobility of researchers and their performance is also shown, although the direction of cause and effect is not yet clear. It is becoming ever more common to use bibliometric indicators to evaluate the performance of research institutions; however, there is often a failure to recognize the limits and drawbacks of such indicators. Since performance measurement is aimed at supporting critical decisions by research administrators and policy makers, it is essential to carry out empirical testing of the robustness of the indicators used. 
In this work we examine the accuracy of the popular "h" and "g" indexes for measuring university research performance by comparing the ranking lists derived from their application with the ranking list from a third indicator that better meets the requirements for robust and reliable assessment of institutional productivity. The test population is all Italian universities in the hard sciences, observed over the period 2001-2005. The analysis quantifies the correlations between the three university rankings (by discipline) and the shifts that occur under changing indicators, in order to measure the distortion inherent in the use of the h and g indexes and their comparative accuracy for assessing institutions. The objective of this paper is to propose a cluster analysis methodology for measuring the performance of research activities in terms of productivity, visibility, quality, prestige and international collaboration. The proposed methodology is based on bibliometric techniques and permits a robust multi-dimensional cluster analysis at different levels. The main goal is to form distinct clusters, maximizing within-cluster homogeneity and between-cluster heterogeneity. The cluster analysis methodology has been applied to the Spanish public universities and their academic staff in the computer science area. Results show that Spanish public universities fall into four different clusters, whereas academic staff fall into six different clusters. Each cluster is interpreted as providing a characterization of research activity by universities and academic staff, identifying both their strengths and weaknesses. The resulting clusters could have implications for research policy, proposing collaborations and alliances among universities, supporting institutions in processes of strategic planning, and verifying the effectiveness of research policies, among others. 
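The clustering step described above (grouping universities by vectors of bibliometric indicators so that within-cluster homogeneity is maximized) can be sketched with a minimal k-means. The indicator values and the choice of k below are illustrative assumptions, not the study's data or method details.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two indicator vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(group):
    return tuple(sum(c) / len(group) for c in zip(*group))

def kmeans(points, k, iters=50, seed=1):
    """Plain k-means: alternate point assignment and centroid update."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: dist2(p, centers[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # keep the old center if a cluster empties out
                centers[i] = centroid(members)
    return labels

# Hypothetical (papers per staff, field-normalized citation rate) pairs.
universities = {
    "U1": (0.9, 2.1), "U2": (1.0, 1.9), "U3": (0.2, 0.4),
    "U4": (0.3, 0.5), "U5": (1.1, 2.3), "U6": (0.25, 0.45),
}
labels = kmeans(list(universities.values()), k=2)
```

With well-separated indicator profiles like these, the procedure recovers a high-performing and a low-performing group; the study's multi-dimensional version does the same over more indicators and at both the university and the staff level.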
In this paper, we propose a 'scaling' approach to compare the scientific performance of heterogeneous Italian academic disciplines. The method is based on the idea that, after eliminating the percentage of 'silent' researchers, the distributions of bibliometric parameters of the different academic fields can be superimposed, collapsing onto a unique master curve through a single scaling parameter. Using data on the scientific production of around 2,500 scholars of the University of Rome 'La Sapienza' from the Web of Science from 2004 to 2008, we (i) demonstrate the existence of a master curve, (ii) determine the scaling factors that work like rates of substitution to compare scientific production across different academic fields on a common ground, (iii) show that the master bibliometric distribution follows a log-normal law, and (iv) illustrate the relevance of the proposed approach for research assessment and the allocation of competitive funding at the university level. Citation numbers and other quantities derived from bibliographic databases are becoming standard tools for the assessment of the productivity and impact of research activities. Though widely used, their statistical properties have not yet been well established. This is especially true in the case of bibliometric indicators aimed at the evaluation of individual scholars, because large-scale data sets are typically difficult to retrieve. Here, we take advantage of a recently introduced large bibliographic data set, Google Scholar Citations, which collects the entire publication record of individual scholars. We analyze the scientific profiles of more than 30,000 researchers, and study the relation between the h-index, the number of publications, and the number of citations of individual scientists. 
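As a reminder of the quantity being correlated here: the h-index is the largest h such that a scholar has h papers with at least h citations each. A short sketch follows; the citation profiles are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Invented profiles: similar paper counts, very different citation totals.
profiles = {
    "A": [25, 18, 12, 7, 5, 3, 1],   # 71 citations in total
    "B": [4, 3, 3, 2, 1],            # 13 citations in total
}
```

Hirsch's original paper already noted that h grows roughly as the square root of total citations, which is consistent with the strong h-index/citation correlation reported in the study above.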
While the number of publications of a scientist has a rather weak relation with his/her h-index, we find that the h-index of a scientist is strongly correlated with the number of citations that he/she has received, so that the number of citations can effectively be used as a proxy for the h-index. Allowing the h-index to depend on both the number of citations and the number of publications yields only a minor improvement. In this study, doctoral dissertations conducted at Turkish universities in the period 1990-2011 and the scientifically indexed publications of the professors supervising these dissertations were investigated. The evaluations included the publication scores from Google Scholar, Web of Science, and Scopus as well as the citations to the publications in these indexes. During the relevant period, 617 professors supervised all 1,906 doctoral dissertations in the field of economics. The three universities with the highest number of doctoral dissertations were Istanbul University, Marmara University, and Dokuz Eylul University, whereas the three universities with the highest scientific indexes were Ihsan Dogramaci Bilkent University, Middle East Technical University, and Bogazici University. The academicians of the latter three universities also outperformed those at the other universities regarding the scientific publications they produced and the citations they received. Overall, there is a low correlation between the dissertations conducted and scientific citations. Finally, academicians with publications and/or citations in Web of Science or Scopus were examined further via logistic regressions. The results indicate a positive and significant relation between PhD degrees earned abroad and the number of publications in the Web of Science. Shrimp aquaculture constitutes a major economic activity of some middle- and low-income economies of the world. 
Though it is practiced by around 70 countries, it is primarily dominated by China, Thailand, Indonesia, Vietnam, Ecuador and India. These six countries account for 80% of the global shrimp production. The study highlights the role of research in the development of the industry by taking the examples of Penaeus vannamei and P. monodon. In the case of the former, a sevenfold rise in the quantum of research (measured by the number of publications as a proxy) induced a fivefold increase in production, whereas no similar pattern was observed in the latter case. The study observes that, based on shrimp production and research contribution, the 30 major countries associated with shrimp aquaculture can be categorized as: (i) high production, high research contribution, (ii) low production, high research contribution and (iii) high production, low research contribution. The countries in the third category are at great risk and may suffer huge economic losses in the event of an outbreak of any disease. By generating a network map of research linkages across different countries, the study highlights the potential countries for strengthening existing linkages and fostering new ones for knowledge consolidation. The study offers some suggestions for policy formulation aimed at achieving rapid growth of shrimp aquaculture in the world. The study aims to assess journals' structural influence in Internet research and to uncover the impacts of network structures on journals' structural influence, drawing on theories of network closure and structural holes. The data of the study are the citation exchanges among 1,210 journals in Communication and seven other social-scientific fields (i.e., Business, Economics/Finance, Education, Information Science, Political Science, Psychology, and Sociology) in Internet research. The top two most influential journals in Internet research are American Economic Review and Journal of Personality and Social Psychology. 
Journals in the "Communication" field emerge as an important source of influence in Internet research; their mean structural influence ranks third among the eight fields, below "Business" and "Economics/Finance" but above the other five fields. Journals' structural influences are found to grow over time, and the growth rates vary across journals. Network brokerage is found to exert a significant impact on journals' structural influence, while the impact of network closure on journals' structural influence is not significant. The impact of network brokerage on journals' structural influence increases over time. The aging of scientific literature has generally been studied using synchronous approaches, i.e., based on references made by papers. This paper uses a diachronous model based on citations received by papers to study the changes in the life expectancy of three corpora of papers: papers from G6 and BRICS countries; papers published in Science, Nature, Physical Review and the Lancet; and all papers divided into four broad fields: medical sciences, natural sciences and engineering, social sciences, and arts and humanities. It shows that: (i) life expectancy differs considerably from one corpus to another and may be either finite or infinite, meaning that the corpus would never become obsolete from a mathematical perspective; (ii) life expectancy for scientific literature has lengthened over the 1980-2000 period; (iii) life expectancy of developed countries' (G6) literature is on average shorter than that of emerging countries (BRICS). Relationships between publication language, impact factors and self-citations of journals published in individual countries, eight from Europe and one from South America (Brazil), are analyzed using bibliometric data from the Thomson Reuters JCR Science Edition databases of ISI Web of Knowledge. 
It was found that: (1) English-language journals, as a rule, have higher impact factors than non-English-language journals; (2) all countries investigated in this study have journals with very high self-citations, but the proportion of journals with high self-citations relative to the total number of journals published in different countries varies enormously; (3) there are relatively high percentages of journals with low self-citations among the top journals in their subject categories, published in English as well as in other languages, but national-language journals have higher self-citations than English-language journals; and (4) irrespective of the publication language, journals devoted to very specialized scientific disciplines, such as electrical and electronic engineering, metallurgy, environmental engineering, surgery, general and internal medicine, pharmacology and pharmacy, gynecology, entomology and multidisciplinary engineering, have high self-citations. This article describes an analysis of keywords aimed at revealing publication patterns in the field of renewable energy, including the temporal evolution of its different research lines over the last two decades. To this end, we first retrieved the records of the sample, then processed the keywords to resolve their obvious problems of synonymy and to limit the study to those most used. The final results showed a clear increase in scientific production related to alternative energies, and a structure corresponding to five major clusters which, at a finer level of resolution, decomposed into 22 smaller clusters. We analyzed the structure of the clusters and their temporal evolution, paying particular attention to uncovering the bursty periods of the different lines of research. Many different measures are used to assess academic research excellence and these are subject to ongoing discussion and debate within the scientometric, university-management and policy-making communities internationally. 
One topic of continued importance is the extent to which citation-based indicators compare with peer-review-based evaluation. Here we analyse the correlations between values of a particular citation-based impact indicator and peer-review scores in several academic disciplines, from natural to social sciences and humanities. We perform the comparison for research groups rather than for individuals. We make comparisons on two levels. At an absolute level, we compare total impact and overall strength of the group as a whole. At a specific level, we compare academic impact and quality, normalised by the size of the group. We find very high correlations at the former level for some disciplines and poor correlations at the latter level for all disciplines. This means that, although the citation-based scores could help to describe research-group strength, in particular for the so-called hard sciences, they should not be used as a proxy for ranking or comparison of research groups. Moreover, the correlation between peer-evaluated and citation-based scores is weaker for soft sciences. Accurate measurement of institutional research productivity should account for the real contribution of the research staff to the output produced in collaboration with other organizations. In the framework of bibliometric measurement, this implies accounting for both the number of co-authors and each individual's real contribution to scientific publications. Common practice in the life sciences is to indicate such contribution through the order of author names in the byline. In this work, we measure the distortion introduced to university-level bibliometric productivity rankings when the number of co-authors or their position in the byline is ignored. The field of observation consists of all Italian universities active in the life sciences (Biology and Medicine). The analysis is based on the research output of the university staff over the period 2004-2008. 
Based on the results, we recommend against the use of bibliometric indicators that ignore co-authorship and the real contribution of each author to research outputs. Expert finding is of vital importance for exploring scientific collaborations to increase productivity by sharing and transferring knowledge within and across different research areas. Expert finding methods, including content-based methods, link structure-based methods, and a combination of content-based and link structure-based methods, have been studied in recent years. However, most state-of-the-art expert finding approaches have usually studied candidates' personal information (e.g. topic relevance and citation counts) and network information (e.g. citation relationships) separately, causing some potential experts to be ignored. In this paper, we propose a topical and weighted factor graph model that simultaneously combines all the possible information in a unified way. In addition, we also design the Loopy Max-Product algorithm and related message-passing schedules to perform approximate inference on our cycle-containing factor graph model. Information Retrieval is chosen as the test field to identify representative authors for different topics within this area. Finally, we compare our approach with three baseline methods in terms of topic sensitivity, coverage rate of SIGIR PC (e.g. Program Committees or Program Chairs) members, and Normalized Discounted Cumulated Gain scores for different rankings on each topic. The experimental results demonstrate that our factor graph-based model can definitely enhance the expert-finding performance. Bibliometric indicators can be determined by comparing specific citation records with the percentiles of a reference set. However, there exists an ambiguity in the computation of percentiles because usually a significant number of papers with the same citation count are found at the border between percentile rank classes. 
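The tie ambiguity described here is easy to reproduce: depending on whether papers tied at the citation threshold are all counted as highly cited or all excluded, the estimated share of "top 10%" papers can swing wildly. A sketch with fabricated citation counts (not the EPL data analyzed in the study):

```python
def share_highly_cited(citations, top_share=0.10, include_ties=True):
    """Share of papers counted as 'highly cited' for a nominal top-X% cutoff,
    with papers tied at the threshold either all included or all excluded."""
    ranked = sorted(citations, reverse=True)
    k = max(1, int(round(top_share * len(ranked))))
    threshold = ranked[k - 1]  # citation count at the nominal cutoff
    if include_ties:
        n = sum(1 for c in citations if c >= threshold)
    else:
        n = sum(1 for c in citations if c > threshold)
    return n / len(citations)

# Fabricated set of 30 papers where the cutoff falls inside a large tie at 10
cites = [50, 40] + [10] * 28
print(share_highly_cited(cites, include_ties=True))   # -> 1.0 (everything counted)
print(share_highly_cited(cites, include_ties=False))  # -> about 0.067 (2 of 30)
```

With a nominal 10% cutoff one would expect a share of 0.10; here the two tie conventions yield 100% and about 6.7%, which is exactly the kind of strong bias the case study reports.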
The present case study of the citations to the journal Europhysics Letters (EPL), in comparison with all physics papers from the Web of Science, shows the deviations which occur due to the different ways of treating the tied papers in the evaluation of the percentage of highly cited publications. A strong bias can occur if the papers tied at the threshold number of citations are all considered as highly cited or all considered as not highly cited. Thanks to a unique individual dataset of French academics in economics, we explain individual publication and citation records by gender and age, co-authorship patterns (average number of authors per article and size of the co-author network) and specialisation choices (percentage of output in each JEL code). The analysis is performed on both EconLit publication scores (adjusted for journal quality) and Google Scholar citation indexes, which allows us to present a broad picture of knowledge diffusion in economics. Citations are largely driven by publication records, which means that these two measures are partly substitutes, but citations are also substantially increased by larger research team size and co-author networks. Based on co-citation cluster analysis, we propose a knowledge-transfer analysis model for any technology field. In this model, patent data with backward citations to non-patent literature and forward citations from later patents are analyzed. Co-citation clustering of the cited articles defines scientific knowledge sources, while that of the patents themselves defines technology fronts. According to the citations between the article and patent clusters, the landscape of knowledge transfer, including routes and strengths, between scientific knowledge sources and technology fronts can be mapped out. The model has been applied to the field of transgenic rice. 
As a result of the analysis, ten scientific knowledge sources and eight technology fronts have emerged, and reasonable links between them have been established, which clearly show how knowledge has been transferred in this field. The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. This assumption is not always true, since citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, this would readily improve the performance of existing citation-based metrics by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models for automatically classifying citations as instrumental or non-instrumental. Instrumental citations were manually labeled, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features are able to automatically classify instrumental citations with high predictivity (AUC = 0.86). Additional experiments using independent held-out data and prospective validation show that the models are generalizable and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations. This paper proposes a framework to identify and evaluate companies from the technological perspective to support merger and acquisition (M&A) target selection decision-making. 
This study employed a text mining-based patent map approach to identify companies which can fulfill a specific strategic purpose of M&A for enhancing technological capabilities. The patent map visualizes the technological landscape of a technology industry using technological proximities among patents, so companies closely related to the strategic purpose can be identified. To evaluate the technological aspects of the identified companies, we provide patent indexes that evaluate both current and future technological capabilities and potential technology synergies between acquiring and acquired companies. Furthermore, because the proposed method evaluates potential targets from the overall corporate perspective and the specific strategic perspectives simultaneously, a more robust and meaningful result can be obtained than when only one perspective is considered. Thus, the proposed framework can suggest the appropriate target companies that fulfill the strategic purpose of M&A for enhancing technological capabilities. For the verification of the framework, we provide an empirical study using patent data related to flexible display technology. There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common coauthorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines, we find that the first-initial method already correctly identifies 97% of authors. 
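The contrast between the first-initial and all-initials methods can be illustrated with a toy grouping function; the records below are hypothetical and the keys are crude simplifications of the simulated datasets used in the study:

```python
from collections import defaultdict

def group_authors(records, key):
    """Group papers under name keys. Over-merging occurs when distinct people
    share a key; splitting occurs when one person appears under several keys."""
    groups = defaultdict(list)
    for last, initials, paper in records:
        groups[key(last, initials)].append(paper)
    return dict(groups)

first_initial = lambda last, initials: (last, initials[0])
all_initials = lambda last, initials: (last, initials)

# Hypothetical records: (last name, initials, paper id)
records = [
    ("smith", "JA", 1), ("smith", "J", 2),  # the same person, initials vary
    ("smith", "JB", 3),                     # a different J. Smith
    ("wu", "X", 4),
]
print(sorted(group_authors(records, first_initial)))  # merges both J. Smiths
print(sorted(group_authors(records, all_initials)))   # splits the "JA"/"J" person
```

The first-initial key wrongly merges the two J. Smiths but correctly unifies the inconsistent "JA"/"J" records, while the all-initials key does the opposite; which error dominates depends on last-name frequency and dataset size, as the hybrid method in the abstract exploits.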
An alternative simple method, which takes all initials into account, is typically about twice as error-prone, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a new name-based method that combines the features of the first-initial and all-initials methods by implicitly taking into account the last name frequency and the size of the dataset. This hybrid method reduces the fraction of incorrectly identified authors by 10-30% over the first-initial method. (C) 2013 Elsevier Ltd. All rights reserved. From the way that it was initially defined (Hirsch, 2005), the h-index naturally encourages focus on the most highly cited publications of an author, and this in turn has led to (predominantly) a rank-based approach to its investigation. However, Hirsch (2005) and Burrell (2007a) both adopted a frequency-based approach leading to general conjectures regarding the relationship between the h-index and the author's publication and citation rates as well as his/her career length. Here we apply the distributional results of Burrell (2007a, 2013b) to three published data sets to show that a good estimate of the h-index can often be obtained knowing only the number of publications and the number of citations. (Exceptions can occur when an author has one or more "outliers" in the upper tail of the citation distribution.) In other words, maybe the main body of the distribution determines the h-index, not the wild wagging of the tail. Furthermore, the simple geometric distribution turns out to be the key. (C) 2013 Elsevier Ltd. All rights reserved. In this paper the accuracy of five current approaches to quantifying the byline hierarchy of a scientific paper is assessed by measuring the ability of each to explain the variation in a composite empirical dataset. Harmonic credit explained 97% of the variation by including information about the number of coauthors and their position in the byline. 
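Harmonic credit, as used in this comparison, has a standard closed form: the i-th of n coauthors receives (1/i) divided by the n-th harmonic number 1 + 1/2 + ... + 1/n. A sketch contrasting it with equal (fractional) allocation; the three-author example is illustrative:

```python
def harmonic_credit(n_authors):
    """Harmonic allocation: the i-th author in the byline receives
    (1/i) / (1 + 1/2 + ... + 1/n) of one paper's worth of credit."""
    denom = sum(1.0 / j for j in range(1, n_authors + 1))
    return [(1.0 / i) / denom for i in range(1, n_authors + 1)]

def fractional_credit(n_authors):
    """Fractional allocation: equal credit 1/n for every coauthor."""
    return [1.0 / n_authors] * n_authors

print([round(c, 3) for c in harmonic_credit(3)])    # [0.545, 0.273, 0.182]
print([round(c, 3) for c in fractional_credit(3)])  # [0.333, 0.333, 0.333]
```

Both schemes distribute exactly one unit of credit per paper; harmonic credit simply redistributes it down the byline, which is the extra information the fractional formula discards.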
In contrast, fractional credit, which ignored the byline hierarchy by allocating equal credit to all coauthors, explained less than 40% of the variation in the empirical dataset. The nearly 60% discrepancy in explanatory power between fractional and harmonic credit was accounted for by equalizing bias associated with the omission of relevant information about differential coauthor contribution. Including an additional parameter to describe a continuum of intermediate formulas between fractional and harmonic provided a negligible or negative gain in predictive accuracy. By comparison, two parametric models from the bibliometric literature both had an explanatory capacity of approximately 80%. In conclusion, the results indicate that the harmonic formula provides a parsimonious solution to the problem of quantifying the byline hierarchy. Harmonic credit allocation also accommodates specific indications of departures from the basic byline hierarchy, such as footnoted information stating that some or all coauthors have contributed equally or indicating the presence of a senior author. (C) 2013 The Author. Published by Elsevier Ltd. All rights reserved. In this paper we deal with the problem of aggregating numeric sequences of arbitrary length that represent, e.g., citation records of scientists. Impact functions are the aggregation operators that express as a single number not only the quality of individual publications, but also their author's productivity. We examine some fundamental properties of these aggregation tools. It turns out that each impact function which always gives indisputable valuations must necessarily be trivial. Moreover, it is shown that for any set of citation records in which none is dominated by another, we may construct an impact function that gives any a priori established ordering of the authors. Theoretically then, there is considerable room for manipulation in the hands of decision makers. 
We also discuss the differences between the impact function-based and the multicriteria decision making-based approaches to scientific quality management, and study how the introduction of new properties of impact functions affects the assessment process. We argue that simple mathematical tools like the h- or g-index (as well as other bibliometric impact indices) may not necessarily be a good choice when it comes to assessing scientific achievements. (C) 2013 Elsevier Ltd. All rights reserved. Quantile kernel regression is a flexible way to estimate the percentile of a scholar's quality stratified by a measurable characteristic, without imposing inappropriate assumptions about functional form or population distribution. Quantile kernel regression is here applied to identifying the one-in-a-hundred economist per age cohort according to the Hirsch index. (C) 2013 Elsevier Ltd. All rights reserved. The debate on the role of women in the academic world has focused on various phenomena that could be at the root of the gender gap seen in many nations. However, in spite of the ever more collaborative character of scientific research, the issue of gender aspects in research collaborations has been treated in a marginal manner. In this article we apply an innovative bibliometric approach based on the propensity for collaboration by individual academics, which permits measurement of gender differences in the propensity to collaborate by fields, disciplines and forms of collaboration: intramural, extramural domestic and international. The analysis of the scientific production of Italian academics shows that women researchers register a greater capacity to collaborate in all the forms analyzed, with the exception of international collaboration, where there is still a gap in comparison to male colleagues. (C) 2013 Elsevier Ltd. All rights reserved. We present a simple generalization of Hirsch's h-index, Z = sqrt(h^2 + C)/sqrt(5), where C is the total number of citations. 
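The Z formula just quoted can be checked numerically: when C = 4h^2, so that the excess citation fraction (C - h^2)/C equals exactly 0.75, Z reduces to h. A sketch with a fabricated citation profile chosen to hit that case:

```python
import math

def h_index(citations):
    """Hirsch's h: the number of papers with at least as many citations
    as their rank in the descending-sorted list."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, 1) if c >= rank)

def z_index(citations):
    """Z = sqrt(h^2 + C) / sqrt(5), with C the total citation count."""
    h = h_index(citations)
    C = sum(citations)
    return math.sqrt(h * h + C) / math.sqrt(5)

# Fabricated profile with h = 2 and C = 16 = 4*h^2, so Z should equal h
cites = [9, 5, 1, 1]
print(h_index(cites), round(z_index(cites), 6))  # -> 2 2.0
```

For this profile the excess citation fraction is (16 - 4)/16 = 0.75, matching the typical value reported in the abstract, and Z = sqrt(4 + 16)/sqrt(5) = 2 = h, which is why sqrt(5) appears in the denominator.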
Z is aimed at correcting the potentially excessive penalty imposed by h on a scientist's highly cited papers: for the majority of scientists analyzed, we find the excess citation fraction (C - h^2)/C to be distributed closely around the value 0.75, meaning that 75% of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicists and 488 biologists, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists. (C) 2013 Elsevier Ltd. All rights reserved. We address the question of how citation-based bibliometric indicators can best be normalized to ensure fair comparisons between publications from different scientific fields and different years. In a systematic large-scale empirical analysis, we compare a traditional normalization approach based on a field classification system with three source normalization approaches. We pay special attention to the selection of the publications included in the analysis. Publications in national scientific journals, popular scientific magazines, and trade magazines are not included. 
Unlike earlier studies, we use algorithmically constructed classification systems to evaluate the different normalization approaches. Our analysis shows that a source normalization approach based on the recently introduced idea of fractional citation counting does not perform well. Two other source normalization approaches generally outperform the classification-system-based normalization approach that we study. Our analysis therefore offers considerable support for the use of source-normalized bibliometric indicators. (C) 2013 Elsevier Ltd. All rights reserved. The objective of this work was to examine the relationship between attitudes about publishing across disciplines and the scientific impact of authors. We conducted a web survey of 1066 authors randomly selected from four disciplines in the Web of Knowledge: economics, anthropology, water resources and biochemistry (approximately 250 from each discipline). Authors were asked questions about publishing norms within their discipline. The h-index of authors was subsequently calculated from data available from the Web of Knowledge. Authors in biochemistry had on average twice the h-index of those in economics, anthropology and water resources. Biochemists had higher expectations about the number of articles published for hiring and promotion, more strongly valued interdisciplinary publishing, felt the cutting edge of their science was clearer, and had more defined patterns of author credit assignment than the other disciplines. Anthropologists exhibited a weaker relationship between h-index and the number of years since their first publication. We conclude that attitudinal differences between disciplines may lead to differences in the recognition of scientific findings and therefore the establishment of normal science. (C) 2013 Elsevier Ltd. All rights reserved. 
This study assesses whether eleven factors associate with higher-impact research: individual, institutional and international collaboration; journal and reference impacts; abstract readability; reference and keyword totals; and paper, abstract and title lengths. Authors may have some control over these factors, and hence this information may help them to conduct and publish higher-impact research. These factors have been previously researched, but with partially conflicting findings. A simultaneous assessment of these eleven factors for Biology and Biochemistry, Chemistry and Social Sciences used a single negative binomial-logit hurdle model estimating the percentage change in the mean citation counts per unit of increase or decrease in the predictor variables. The journal Impact Factor was found to significantly associate with increased citations in all three areas. The impact and the number of cited references and their average citation impact also significantly associate with higher article citation impact. Individual and international teamwork give a citation advantage in Biology and Biochemistry and Chemistry, but inter-institutional teamwork is not important in any of the three subject areas. Abstract readability is either not significant or of no practical significance. Among the article size features, abstract length significantly associates with increased citations, but the number of keywords, title length and paper length are insignificant or of no practical significance. In summary, at least some aspects of collaboration, journal and document properties significantly associate with higher citations. The results provide new and particularly strong statistical evidence that authors should consider publishing in high-impact journals, ensure that they do not omit relevant references, engage in the widest possible team working, when appropriate, and write extensive abstracts. 
A new finding is that whilst it seems to be useful to collaborate, and to collaborate internationally, there seems to be no particular need to collaborate with other institutions within the same country. (C) 2013 Elsevier Ltd. All rights reserved. We apply the knowledge discovery process to the mapping of current topics in a particular field of science. We are interested in how articles form clusters and what the contents of the resulting clusters are. A framework involving web scraping, keyword extraction, dimensionality reduction and clustering using the diffusion map algorithm is presented. We use publicly available information about articles in high-impact journals. The method should be of use to practitioners or scientists who want an overview of recent research in a field of science. As a case study, we map the topics in the data mining literature in the year 2011. (C) 2013 Elsevier Ltd. All rights reserved. We address issues concerning what one may learn from how citation instances are distributed in scientific articles. We visualize and analyze patterns of citation distributions in the full text of 350 articles published in the Journal of Informetrics. In particular, we visualize and analyze the distributions of citations in articles that are organized in a commonly seen four-section structure, namely introduction, method, results, and conclusions (IMRC). We examine the locations of citations to the groundbreaking h-index paper by Hirsch in 2005 and how patterns associated with citation locations evolve over time. The results show that citations are highly concentrated in the first section of an article. The density of citations in the first section is about three times higher than that in subsequent sections. The distributions of citations to highly cited papers are even more uneven. (C) 2013 Elsevier Ltd. All rights reserved. There are many indicators of journal quality and prestige. 
Although acceptance rates are discussed anecdotally, there has been little systematic exploration of the relationship between acceptance rates and other journal measures. This study examines the variability of acceptance rates for a set of 5094 journals in five disciplines and the relationship between acceptance rates and JCR measures for 1301 journals. The results show statistically significant differences in acceptance rates by discipline, country affiliation of the editor, and number of reviewers per article. Negative correlations are found between acceptance rates and citation-based indicators. Positive correlations are found with journal age. These relationships are most pronounced in the most selective journals and vary by discipline. Open access journals were found to have statistically significantly higher acceptance rates than non-open access journals. Implications in light of changes in the scholarly communication system are discussed. (C) 2013 Elsevier Ltd. All rights reserved. For a number of researchers, the number of publications per author is simulated using the zeta distribution, and then a number of citations is simulated for each publication. Bootstrap confidence intervals indicate that the difference between the average of ratios and the ratio of averages is not significant. It was found that the log-logistic distribution, which is a general form for the ratio of two correlated Pareto random variables, gives a good fit to the estimated ratios. (C) 2013 Elsevier Ltd. All rights reserved. Publishing in scholarly peer reviewed journals usually entails long delays from submission to publication. In part this is due to the length of the peer review process, and in part to the dominant tradition of publication in issues, formerly a necessity of paper-based publishing, which creates backlogs of manuscripts waiting in line. 
The delays slow the dissemination of scholarship and can place a significant burden on the academic careers of authors. Using a stratified random sample, we studied average publishing delays in 2700 papers published in 135 journals sampled from the Scopus citation index. The shortest overall delays occur in the science, technology and medical (STM) fields and the longest in social science, arts/humanities and business/economics. Business/economics, with a delay of 18 months, took twice as long as chemistry, with a 9-month average delay. Analysis of the variance indicated that by far the largest amount of variance in the time between submission and acceptance was among articles within a journal, as compared with journals, disciplines or the size of the journal. For the time between acceptance and publication, most of the variation in delay can be accounted for by differences between specific journals. (C) 2013 Elsevier Ltd. All rights reserved. Given the growing use of impact metrics in the evaluation of scholars, journals, academic institutions, and even countries, there is a critical need for means to compare scientific impact across disciplinary boundaries. Unfortunately, citation-based metrics are strongly biased by diverse field sizes and publication and citation practices. As a result, we have witnessed an explosion in the number of newly proposed metrics that claim to be "universal." However, there is currently no way to objectively assess whether a normalized metric can actually compensate for disciplinary bias. We introduce a new method to assess the universality of any scholarly impact metric, and apply it to evaluate a number of established metrics. We also define a very simple new metric h(s), which proves to be universal, thus allowing the impact of scholars to be compared across scientific disciplines. These results move us closer to a formal methodology in the measure of scholarly impact. Published by Elsevier Ltd. 
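The exact definition of h(s) is not reproduced here, but the general idea of removing disciplinary bias can be illustrated by the simplest possible normalization: dividing a scholar's indicator by the average for their field. This is a deliberately naive sketch with hypothetical values, not the metric proposed in the abstract:

```python
def field_normalized(scores_by_field):
    """Rescale each scholar's raw indicator by the mean of their field,
    so that 1.0 means 'average for the field' in every discipline.
    (Illustrative normalization only; not the h(s) definition.)"""
    normalized = {}
    for field, scores in scores_by_field.items():
        mean = sum(scores) / len(scores)
        normalized[field] = [s / mean for s in scores]
    return normalized

# Hypothetical h-index samples from two fields with different citation customs
raw = {"biochemistry": [20, 30, 40], "anthropology": [10, 15, 20]}
print(field_normalized(raw))
# both fields map onto the same normalized profile, roughly [0.67, 1.0, 1.33]
```

After rescaling, a biochemist at 40 and an anthropologist at 20 both sit one third above their field average; whether such a simple shift actually removes disciplinary bias is precisely what the universality test in the abstract is designed to check.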
For comparisons of citation impacts across fields and over time, bibliometricians normalize the observed citation counts with reference to an expected citation value. Percentile-based approaches have been proposed as a non-parametric alternative to parametric central-tendency statistics. Percentiles are based on an ordered set of citation counts in a reference set, whereby the fraction of papers at or below the citation counts of a focal paper is used as an indicator for its relative citation impact in the set. In this study, we pursue two related objectives: (1) although different percentile-based approaches have been developed, an approach has hitherto been missing that satisfies a number of criteria, such as scaling of the percentile ranks from zero (all other papers perform better) to 100 (all other papers perform worse) and resolving tied citation ranks unambiguously. We introduce a new citation-rank approach with these properties, namely P100; (2) we compare the reliability of P100 empirically with other percentile-based approaches, such as those developed by the SCImago group, the Centre for Science and Technology Studies (CWTS), and Thomson Reuters (InCites), using all papers published in 1980 in Thomson Reuters Web of Science (WoS). How accurately can the different approaches predict the long-term citation impact in 2010 (in year 31) using citation impact measured in previous time windows (years 1-30)? The comparison of the approaches shows that the method used by InCites overestimates citation impact (because it uses the highest percentile rank when papers are assigned to more than a single subject category), whereas the SCImago indicator shows higher power in predicting long-term citation impact on the basis of citation rates in early years. Since the results show a disadvantage for P100 in this predictive ability relative to the other approaches, there is still room for further improvement. (C) 2013 Elsevier Ltd. All rights reserved. 
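The basic percentile idea described above — the fraction of papers in a reference set at or below a focal paper's citation count, scaled from 0 to 100 — can be sketched in a few lines. This is a minimal illustration only, not the exact P100 algorithm; in particular, the unambiguous tie-handling that motivates P100 is deliberately omitted, and the function name is ours:

```python
def percentile_rank(focal_citations, other_citations):
    """Share of the other papers in the reference set whose citation
    counts the focal paper matches or exceeds, scaled to 0-100.
    Returns 0 when all other papers perform better and 100 when all
    perform worse; ties count in the focal paper's favor here."""
    if not other_citations:
        return 100.0
    matched = sum(1 for c in other_citations if c <= focal_citations)
    return 100.0 * matched / len(other_citations)
```

For example, a paper with 3 citations in a reference set whose other papers have 1, 2, 3 and 4 citations receives a rank of 75.0. The percentile-based approaches compared in the study (SCImago, CWTS, InCites, P100) differ precisely in how such ranks are scaled and how ties are resolved.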
In this paper, we develop a novel methodology within the IDCP measuring framework for comparing normalization procedures based on different classification systems of articles into scientific disciplines. Firstly, we discuss the properties of two rankings, based on a graphical and a numerical approach, for the comparison of any pair of normalization procedures using a single classification system for evaluation purposes. Secondly, when the normalization procedures are based on two different classification systems, we introduce two new rankings following the graphical and the numerical approaches. Each ranking is based on a double test that assesses the two normalization procedures in terms of the two classification systems on which they depend. Thirdly, we also compare the two normalization procedures using a third, independent classification system for evaluation purposes. In the empirical part of the paper we use: (i) a classification system consisting of 219 sub-fields identified with the Web of Science subject-categories; an aggregate classification system consisting of 19 broad fields, as well as a systematic and a random assignment of articles to sub-fields with the aim of maximizing or minimizing differences across sub-fields; (ii) four normalization procedures that use the field or sub-field mean citations of the above four classification systems as normalization factors; and (iii) a large dataset, indexed by Thomson Reuters, in which 4.4 million articles published in 1998-2003 with a five-year citation window are assigned to sub-fields using a fractional approach. The substantive results concerning the comparison of the four normalization procedures indicate that the methodology can be useful in practice. (C) 2013 Elsevier Ltd. All rights reserved. The importance of a scientific journal is usually established by considering the number of citations received by the papers that the journal publishes. 
In this way, the number of citations received by a scientific journal can be considered as a measure of the total production of the journal. In this paper, in order to obtain measures of the efficiency in the production process, the approach provided by stochastic frontier analysis (SFA) is considered, and econometric models are proposed. These models estimate a production frontier, which is the maximum achievable number of citations to the journal based on its resources. The efficiency can then be measured by considering the difference between the actual production and the estimated frontier. This approach is applied to the measurement of the productive efficiency of the journals of the JCR social sciences edition database, which belong simultaneously to the areas of "economics" and "social sciences, mathematical methods". (C) 2013 Elsevier Ltd. All rights reserved. Internationally co-authored papers are known to have more citation impact than nationally co-authored papers, on average. However, the question of whether there are systematic differences between pairs of collaborating countries in terms of the citation impact of their joint output has remained unanswered. On the basis of all scientific papers published in 2000 and co-authored by two or more European countries, we show that citation impact increases with the geographical distance between the collaborating countries. (C) 2013 Elsevier Ltd. All rights reserved. Many large digital collections are currently organized by subject; although useful, these information organization structures are large and complex and thus difficult to browse. Current online tools and visualization prototypes show small, localized subsets and do not provide the ability to explore the predominant patterns of the overall subject structure. This study describes subject tree modifications that facilitate browsing for documents by capitalizing on the highly uneven distribution of real-world collections. 
The approach is demonstrated on two large collections organized by the Library of Congress Subject Headings (LCSH) and Medical Subject Headings (MeSH). Results show that the LCSH subject tree can be reduced to 49% of its initial complexity while maintaining access to 83% of the collection, and the MeSH tree can be reduced to 45% of its initial complexity while maintaining access to 97% of the collection. A simple solution to negate the loss of access is discussed. The visual impact is demonstrated by using traditional outline views and a slider control allowing searchers to change the subject structure dynamically according to their needs. This study has implications for the development of information organization theory and human-information interaction techniques for subject trees. In most intent recognition studies, annotations of query intent are created post hoc by external assessors who are not the searchers themselves. It is important for the field to get a better understanding of the quality of this process as an approximation for determining the searcher's actual intent. Some studies have investigated the reliability of the query intent annotation process by measuring the interassessor agreement. However, these studies did not measure the validity of the judgments, that is, to what extent the annotations match the searcher's actual intent. In this study, we asked both the searchers themselves and external assessors to classify queries using the same intent classification scheme. We show that of the seven dimensions in our intent classification scheme, four can reliably be used for query annotation. Of these four, only the annotations on the topic and spatial sensitivity dimension are valid when compared with the searcher's annotations. 
The difference between the interassessor agreement and the assessor-searcher agreement was significant on all dimensions, showing that the agreement between external assessors is not a good estimator of the validity of the intent classifications. Therefore, we encourage the research community to consider using query intent classifications by the searchers themselves as test data. Topic ontologies or web directories consist of large collections of links to websites, arranged by topic in different categories. The structure of these ontologies is typically not flat because there are hierarchical and nonhierarchical relationships among topics. As a consequence, websites classified under a certain topic may be relevant to other topics. Although some of these relevance relations are explicit, most of them must be discovered by an analysis of the structure of the ontologies. This article proposes a family of models of relevance propagation in topic ontologies. An efficient computational framework is described and used to compute nine different models for a portion of the Open Directory Project graph consisting of more than half a million nodes and approximately 1.5 million edges of different types. After performing a quantitative analysis, a user study was carried out to compare the most promising models. It was found that some general difficulties rule out the possibility of defining flawless models of relevance propagation that only take into account structural aspects of an ontology. However, there is a clear indication that including transitive relations induced by the nonhierarchical components of the ontology results in relevance propagation models that are superior to more basic approaches. Given an unsegmented multi-author text, we wish to automatically separate out distinct authorial threads. We present a novel, entirely unsupervised, method that achieves strong results on multiple testbeds, including those for which authorial threads are topically identical. 
Unlike previous work, our method requires no specialized linguistic tools and can be easily applied to any text. Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories. Term proximity is effective for many information retrieval (IR) research fields yet remains unexplored in blogosphere IR. The blogosphere is characterized by large amounts of noise, including incohesive, off-topic content and spam. 
Consequently, the classical bag-of-words unigram IR models are not reliable enough to provide robust and effective retrieval performance. In this article, we propose to boost blog post retrieval performance by employing term proximity information. We investigate a variety of popular and state-of-the-art proximity-based statistical IR models, including a proximity-based counting model, the Markov random field (MRF) model, and the divergence from randomness (DFR) multinomial model. Extensive experimentation on the standard TREC Blog06 test dataset demonstrates that the introduction of term proximity information is indeed beneficial to retrieval from the blogosphere. Results also indicate the superiority of the unordered bi-gram model with the sequential-dependence phrases over other variants of the proximity-based models. Finally, inspired by the effectiveness of proximity models, we extend our study by exploring the proximity evidence between query terms and opinionated terms. The consequent opinionated proximity model shows promising performance in the experiments. Two methods for comparing impact factors and citation rates across fields of science are tested against each other using citations to the 3,705 journals in the Science Citation Index 2010 (CD-Rom version of SCI) and the 13 field categories used for the Science and Engineering Indicators of the U. S. National Science Board. We compare (a) normalization by counting citations in proportion to the length of the reference list (1/N of references) with (b) rescaling by dividing citation scores by the arithmetic mean of the citation rate of the cluster. Rescaling is analytical and therefore independent of the quality of the attribution to the sets, whereas fractional counting provides an empirical strategy for normalization among sets (by evaluating the between-group variance). 
By the fairness test of Radicchi and Castellano (2012a), rescaling outperforms fractional counting of citations, for reasons that we consider. Using data compiled for the SCImago Institutions Ranking, we look at whether the subject area type to which an institution (university or research-focused institution) belongs, in terms of the fields researched, has an influence on its ranking position. We used latent class analysis to categorize institutions based on their publications in certain subject areas. Even though this categorization does not relate directly to scientific performance, our results show that it exercises an important influence on the outcome of a performance measurement: certain subject area types of institutions have an advantage in the ranking positions when compared with others. This advantage manifests itself not only when performance is measured with an indicator that is not field-normalized but also for indicators that are field-normalized. Using data from the Web of Science (WoS), we analyze the mutual information among university, industry, and government addresses (U-I-G) at the country level for a number of countries. The dynamic evolution of the Triple Helix can thus be compared among developed and developing nations in terms of cross-sectional coauthorship relations. The results show that the Triple Helix interactions among the three subsystems U-I-G have become less intensive over time, but unequally so for different countries. We suggest that globalization erodes local Triple Helix relations and thus can be expected to have increased differentiation in national systems since the mid-1990s. This effect of globalization is more pronounced in developed countries than in developing ones. In the dynamic analysis, we focus on a more detailed comparison between China and the United States. 
Specifically, the Chinese Academy of the (Social) Sciences is changing increasingly from a public research institute to an academic one, and this has a measurable effect on China's position in globalization. The h-index can be a useful metric for evaluating a person's output of Internet media. Here I advocate and demonstrate adaptation of the h-index and the g-index to the top video content creators on YouTube. The h-index for Internet video media is based on videos and their view counts. The h-index is defined as the number of videos with >= h x 10^5 views. The g-index is defined as the number of videos with >= g x 10^5 views on average. When compared with a video creator's total view count, the h-index and g-index better capture both productivity and impact in a single metric. In this article, we propose a measure to assess scientific impact that discounts self-citations and does not require any prior knowledge of their distribution among publications. This index can be applied to both researchers and journals. In particular, we show that it fills a gap left by the h-index and similar measures, which do not take into account the effect of self-citations when evaluating the impact of authors or journals. We provide 2 real-world examples: first, we evaluate the research impact of the most productive scholars in computer science (according to DBLP Computer Science Bibliography, Universitat Trier, Trier, Germany); then we revisit the impact of the journals ranked in the Computer Science Applications section of the SCImago Journal & Country Rank ranking service (Consejo Superior de Investigaciones Cientificas, University of Granada, Extremadura, Madrid, Spain). We observe how self-citations, in many cases, affect the rankings obtained according to different measures (including h-index and ch-index), and show how the proposed measure mitigates this effect. 
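The video-based h- and g-index definitions above translate directly into code. The sketch below is a minimal illustration assuming the stated unit of 10^5 views; the function names are ours, not from the article:

```python
def video_h_index(view_counts, unit=100_000):
    """Largest h such that at least h videos each have >= h * unit views."""
    views = sorted(view_counts, reverse=True)
    h = 0
    for rank, v in enumerate(views, start=1):
        if v >= rank * unit:
            h = rank
        else:
            break  # views are sorted descending, so no later rank can qualify
    return h

def video_g_index(view_counts, unit=100_000):
    """Largest g such that the top g videos average >= g * unit views,
    i.e., their cumulative views reach g * g * unit."""
    views = sorted(view_counts, reverse=True)
    g, total = 0, 0
    for rank, v in enumerate(views, start=1):
        total += v
        if total >= rank * rank * unit:
            g = rank
    return g
```

For instance, a creator whose videos have 1.0M, 350K, 210K, 90K and 15K views has h = 2 (two videos with at least 2 x 10^5 views) but g = 4, since the g-index credits the very large view count of the top video.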
For 3 years, the authors of this article and several other colleagues have worked, through a research project, with 11 nonprofit community groups to help them take greater control of their information technology in terms of technology acceptance, adoption, and literacy. As part of this project, the authors explored informal learning methods that the groups could benefit from and practiced them with the community representatives who played key roles in the daily life of the organizations. In the present article, the authors reflect on the developmental trajectories observed for two individuals, each from a different nonprofit organization, with respect to information technology efficacy and ability. The authors analyze these trajectories as a sequence of four technology-related roles: technology consumers, technology planners, technology doers, and technology sustainers. The authors describe these roles, the methods used to promote informal learning, and implications for other researchers studying informal learning in communities. With the increasing use of information systems (IS) in our everyday lives, people may feel an attachment to their software applications beyond simply perceiving them as a tool for enhancing task performance. However, attachment is still a largely unexplored concept in both IS research and practice. Drawing from the literature on attachment in consumer behavior research and auxiliary theories in IS use and community participation research, this study theoretically identifies and empirically explores the concept of attachment and its antecedents (i.e., relative visual aesthetics, personalization, relative performance) and outcome (i.e., community participation intention) in the IS context. Using web browsers as the target IS, an online survey was conducted. 
Results show that relative expressive visual aesthetics is the strongest antecedent of IS attachment, and that personalization is the second strongest antecedent of IS attachment, followed by relative performance. Furthermore, this study reveals that IS attachment has a strong positive impact on community participation intention. This study contributes theoretically and empirically to the body of IS use research and has managerial implications, suggesting that although superior performance is a necessary condition for attachment formation, improving users' experience through expressive visual aesthetics and personalization is critical to building strong attachment relationships with users. Information technology (IT) outsourcing has become a widely accepted management strategy. As a consequence, a great deal of research on the IT outsourcing domain, covering a wide range of issues, has been conducted. This study investigates the IT outsourcing knowledge infrastructure from a network point of view. Triple Helix indicators and social network analysis techniques are employed on 288 scholarly papers obtained from the Web of Science database using keywords related to IT outsourcing. The results reveal the key players in IT outsourcing research collaborations; their network characteristics, such as degree centrality; and the relationship of academia, industry, and government in terms of IT outsourcing knowledge production. This article also provides results-based implications. Using a very small sample of 8 data sets, it was recently shown by De Visscher (2011) that the g-index is very close to the square root of the total number of citations. It was argued that there is no bibliometrically meaningful difference. Using another, somewhat larger empirical sample of 26 data sets, I show that the difference may be larger, and I argue in favor of the g-index. In this position paper we discuss the current status of the core scientific journals in China. 
Based on discussions of journals' relations to a small group of full-text database providers, open access publishing, and copyright problems, we conclude that China's digital publishing industry is not yet in a healthy state and that some key issues related to revenue, digital piracy and copyright must be solved. In science mapping, bibliographic coupling (BC) has been a standard tool for discovering the cognitive structure of research areas, such as constituent subareas, directions, schools of thought, or paradigms. Research areas, modelled as sets of documents, are often sorted via BC into document clusters, each representing a thematic unit. In this paper we propose an alternative method called age-sensitive bibliographic coupling: the aim is to enable the standard method to produce historically valid thematic units, that is, to yield document clusters that also represent the historical development of the thematic structure of the subject. As such, the method is expected to be especially beneficial for investigations of science dynamics and the history of science. We apply the method within a bibliometric study in the modern history of bioscience, addressing the development of a complex, interdisciplinary discourse called the Species Problem. As a result, a quantitative and qualitative comparison of the standard and the proposed method of bibliographic coupling is reported, together with a pilot study on the cognitive-historical structure of the Species Problem, regarding an important fragment of the discourse. In order to gain a deeper understanding of the international collaboration of global library and information science (LIS), the present paper investigated the trends, networks and core groups of international collaboration in LIS at the country and institution levels by combining bibliometric analysis and social network analysis. In this study, a total of 8,570 papers from 15 core journals during the period of 2000-2011 were collected. 
The results indicate that 66% of papers are joint publications in global LIS. Two-country papers and two-institution papers are the two primary collaboration patterns in international collaboration at the country and institution levels, respectively. Through social network analysis, it is observed that the country collaboration network has reached a certain degree of maturity over the past 12 years in global LIS, while the international institution collaboration network has not yet matured and is made up of dozens of components. In the country collaboration network, the positions of the USA and the UK are remarkable. Although the USA is positioned at the center of the network, institutions located in the USA are more inclined to collaborate domestically, suggesting that institutions in the USA have a low tendency towards international collaboration. In the institution collaboration network, two groups are found, located in the USA and Europe respectively. The results of the institution collaboration network also reveal that Katholieke Univ Leuven has not only the largest collaboration breadth but also strong capabilities to control communication within the international institution collaboration network. Research and development of rice, a major crop, has been promoted on an interdisciplinary basis with the involvement of various research fields ranging from the natural sciences to socioeconomics in Japan. This paper focuses on the structure of interdisciplinarity in Japanese rice research and technology development by analyzing the relationship among all relevant disciplines with the use of a compiled bibliography of Japanese rice research with 19,389 articles in 1,611 journals in the publishing years 1990-2000. The relationship among the disciplines was characterized by the frequency distribution of articles among journals classified into 24 categories based on the law of scattering originally identified by Bradford (Engineering 13:785-786, 1934). 
The 24 journal categories, ranked in decreasing order of productivity of articles, were divided into 3 zones: the first nuclear zone with a small number of highly productive journal disciplines; the second zone with a large number of less productive disciplines; and the last zone with a larger number of the least productive disciplines, which characterized the structure of interdisciplinarity in Japanese rice research and technology development. Other aspects of the interdisciplinarity were further explored with reference to peripheral journals with a minimal number of papers on a certain subject, and to the Groos droop phenomenon at the end of Bradford's S-shaped curve, that is, the region of the least productive journals with only one paper on a certain subject, by analyzing the frequency distribution of articles in journal categories. This paper analyses the scientific cooperation between German and Chinese institutions in the field of the life sciences on the basis of co-publications published between 2007 and 2011 in Web of Science covered sources. After analyzing the global output of publications in the life sciences and identifying China's most important international partners at the country level, this study focuses on a network and cluster analysis of German-Chinese co-publications at the institutional level. After cleaning and standardizing all German and Chinese addresses, a total of 531 German and 700 Chinese institutions were identified that co-published in the period under analysis. Disaggregating the institutes of the Chinese Academy of Sciences made it possible to obtain more meaningful information on existing co-publication structures. Using VOSviewer, the German-Chinese collaboration network in the life sciences is visualized and clusters of similar institutions are identified. 
The seven most important clusters of German-Chinese co-publication partners are briefly described, providing background information for funding agencies, such as the German Federal Ministry of Education and Research, or researchers in the life sciences who wish to establish collaborations with German or Chinese institutions. This study was designed to evaluate China's scientific output in chemical engineering in the Science Citation Index Expanded in the Web of Science from 1992 to 2011. The document type, language, trend and collaboration patterns were analyzed, as well as the output of different journals. Distributions of article titles and abstracts, author keywords, and KeyWords Plus in different periods, together with the most cited articles, were studied to identify research foci and trends. Chinese Journal of Catalysis, Industrial & Engineering Chemistry Research, and Chinese Journal of Chemical Engineering published most of the Chinese articles in the area of chemical engineering. The Chemical Engineering Department of Tsinghua University, Zhejiang University, Tianjin University, and East China University of Science and Technology were the top four institutions publishing the most articles in China. This study showed that adsorption, photocatalysis and synthesis have been hot topics of research in the past two decades, while ionic liquids appear to be a new area of special interest for the future. The pseudo-second-order model for sorption processes has become increasingly popular and influential since its publication. In addition, the ratio of institutionally independent articles to nationally collaborative articles to internationally collaborative articles has been developed to compare different institutions' publication characteristics. This study aimed to identify and analyze characteristics of classic articles published in the Web of Science social work subject category from 1856 to 2011. 
Articles that had been cited at least 50 times were assessed regarding publication outputs, distribution of outputs in journals, publications of authors, institutions, and countries, as well as the citation life cycles of the articles with the highest total citations since publication up to 2011 and the highest citations in 2011. Five bibliometric indicators were used to evaluate source countries, institutions, and authors. Results showed that 721 of the most highly referenced articles, published between 1957 and 2008, had been cited at least 50 times. Child Abuse & Neglect and American Journal of Community Psychology published the most classic articles. The USA produced 89% of the classic articles and also published the largest numbers of single-country, internationally collaborative, first-author, and corresponding-author classic articles. The top 38 productive institutions were all located in the US. The University of Illinois was the most productive institution in total classic articles, while the University of California, Los Angeles produced the most inter-institutionally collaborative articles and Arizona State University published the most single-institution articles. Furthermore, a new indicator, the Y-index, was successfully applied to evaluate publication characteristics of authors and institutions. A high percentage of authors had the same numbers of first-author and corresponding-author classic articles in the social work field. Date palm (Phoenix dactylifera) is one of the commonly consumed polyphenol-rich fruits, attributed with various therapeutic effects in different diseases and disorders. We aimed to study and analyse the global research output related to date palm, given its large consumption and production in the Middle East. We analysed 1,376 papers obtained from the SCOPUS database for the period 2000-2011. The study examines major productive countries and their citation impact. 
We also analysed collaborative linkages and national priorities of date palm research, besides the characteristics of its highly productive institutions, authors and journals. Benford's Law is a logarithmic probability distribution function used to predict the distribution of the first significant digits in numerical data. This paper presents the results of a study of the distribution of the first significant digits of the numbers of articles published in journals indexed in the JCR Sciences and Social Sciences Editions from 2007 to 2011. The data for these journals were also analyzed by the country of origin and the journal's category. Results considering the number of articles published as reported by Scopus are also presented. Comparing the results, we observe that there is a significant difference in the data reported in the two databases. Citation time series are not easy to compile from the most popular databases. The Data for Research service of the JSTOR journal database is a large and high-quality sample of citations, weighted towards the humanities and social sciences. It provides time-series citation data over many decades, back to the origins of the constituent journals. The citation trajectories of Nobel Prize winners in economics are analyzed here from 1930 to 2005. They are described mathematically by means of the Bass model of the diffusion of innovations. A bell-shaped curve provides a good fit to most prize winners' citation trajectories, and suggests that economic knowledge follows the typical innovation cycle of adoption, peak, and decline within scholarly careers and shortly afterwards. Several variant trajectories are described. The number of LA-C indexed journals in WoS increased from 69 to 248 titles in a period of just four years (2006-2009). This unprecedented growth is related to a change in the editorial policy of WoS rather than to a change in the LA-C scientific community. 
We find that in the LA-C region, Brazil had the largest increase in its WoS production, which also corresponded to a large increase in its production in its indexed local journals. As a consequence, Portuguese has become the second scientific language, after English, in LA-C production in WoS. However, while the Brazilian production in its local journals represents about one quarter of its whole WoS production, it has rather little effect on the respective number of citations. The rest of the LA-C countries represented in WoS still show very low levels of production and impact. Scopus has also considerably enlarged its coverage of LA-C journals, but with a steady growth over the period considered in this study. Since the 1990s, the scope of research evaluation has widened to encompass the societal products (outputs), societal use (societal references) and societal benefits (changes in society) of research. Research evaluation has been extended to include measures of the (1) social, (2) cultural, (3) environmental and (4) economic returns from publicly funded research, even though no robust or reliable methods for measuring societal impact have yet been developed. In this study, we would like to introduce an approach which, unlike the currently common case study approach (and others), is relatively simple, can be used in almost every subject area and delivers results regarding societal impact which can be compared between disciplines. Our approach to societal impact starts with the actual function of science in society: to generate reliable knowledge. That is why a study (which we would like to refer to as an assessment report) summarising the status of the research on a certain subject represents knowledge which is available for society to access. Societal impact is given when the content of a report is addressed outside of science (in a government document, for example). 
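The Bass diffusion model used earlier for Nobelists' citation trajectories produces exactly such a bell-shaped curve. Its noncumulative adoption rate is f(t) = ((p+q)²/p) · e^(−(p+q)t) / (1 + (q/p)e^(−(p+q)t))², where p is the coefficient of innovation and q of imitation. A small sketch; the parameter values are illustrative assumptions, not fitted values from the study:

```python
import math

def bass_adoption_rate(t, p, q):
    # Bass model noncumulative adoption:
    # f(t) = ((p+q)^2 / p) * exp(-(p+q)t) / (1 + (q/p)*exp(-(p+q)t))^2
    e = math.exp(-(p + q) * t)
    return ((p + q) ** 2 / p) * e / (1 + (q / p) * e) ** 2

# Illustrative parameters: small innovation, larger imitation coefficient
p, q = 0.03, 0.4
rates = [bass_adoption_rate(t, p, q) for t in range(0, 40)]
peak_year = rates.index(max(rates))
# The peak of the bell occurs analytically at t* = ln(q/p) / (p + q)
t_star = math.log(q / p) / (p + q)
print(peak_year, round(t_star, 2))
```

Fitting p and q to an author's annual citation counts yields the adoption, peak, and decline cycle described in the abstract; note that f(0) = p exactly.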
The translational medical research literature has grown rapidly in recent decades, yet there have been few attempts to map the global context of translational medical research. The main purpose of this study is to evaluate the global progress and to assess current quantitative trends in translational medical research, using a scientometric approach to survey the translational medicine related literature in the Science Citation Index Expanded (SCI-E), Social Science Citation Index and PubMed databases from 1992 to 2012. Scientometric methods and knowledge visualization technologies were employed in this paper. The document types, languages, publication patterns, subject categories, journals, geographic and institutional distributions, top cited papers, and the distribution of keywords as well as MeSH terms were thoroughly examined. Translational medicine research has increased rapidly over the past 20 years, most notably in the last 4 years. In total, there are currently 3,627 research articles in 1,062 journals listed in 91 SCI-E subject categories. The top 20 productive countries and institutes were analyzed herein, and 11 key papers in translational medical research as well as the research foci were identified. Research output descriptors suggest a solid development in translational medical research, with research in this field mainly focused on experimental medicine, general internal medicine, and medical laboratory technologies. These outputs have been concentrated in several journals such as Translational Research, Translational Oncology, Translational Stroke Research, and Translational Neuroscience. The G7 countries make up the leading nations for translational medical research, with the center located in the USA. American institutions have made great advances in paper production, citations, and cooperation, with overall great strengths and good development prospects. 
Moreover, the evolution pathway of translational medical research can be summarized as follows: problems emerged; causes were analyzed; challenges were faced and solutions proposed; translational medical research programs were formally established; and theoretical and applied research got into full swing. During this process, neoplasms and genomics, interdisciplinary communication between academic medical centers/institutes, drug design and development, cardiovascular and brain diseases, and biomedical research in general have been identified as mainstream topics in translational medical research fields. Few comprehensive and long time-span studies have examined the Information and Communication Technologies (ICT) sector in China and its implications for China's national and regional innovation system. Taking advantage of the patents granted by the State Intellectual Property Office of the People's Republic of China from 1985 to 2010, this paper examined innovation performance in the Chinese ICT industry with the help of bibliometric techniques. The analysis has been conducted from several perspectives, including the trend and character of patent outputs, the most prolific Chinese regions and their changes, the primary innovators and their types of institutions, and the collaboration among university (U)-industry (I)-research institutes (R). The results show that the great importance that the government and domestic enterprises attach to technology R&D and patent protection has brought significant improvements in the Chinese ICT sector, and enterprises have thus gradually become the main body of technological innovation in recent years. In terms of U-I-R collaborations, I-I collaborations are the most popular pattern, followed by U-I and I-R collaborations. In the last 20 years or so, U-I-R collaborations have improved, but they are still weak. 
In the future, U-I-R collaborations should be further reinforced, and more universities and research institutes should be encouraged to become involved in U-I-R collaborations to help enterprises enhance their innovative capabilities. This paper reports on a bibliometric analysis of environmental sciences research in northern Australia between 2000 and 2011. It draws on publications data for Charles Darwin University (CDU) and James Cook University (JCU) researchers to present a bibliometric profile of the journals in which they publish, the citations to their research outputs, and the key research topics discussed in the publications. Framing this analysis, the study explored the relationship between the two universities' publications and their 'fit' with the environmental sciences field as defined by the Australian research assessment model, Excellence in Research for Australia (ERA). The Scopus database retrieved more records than Web of Science, although only minor differences were seen in the journals in which researchers published most frequently and in the most highly cited articles. Strong growth in publications is evident over the 12-year period, but the journals in which the researchers publish most frequently differ from the journals in which the most highly cited articles are published. Many of the articles by CDU- and JCU-affiliated researchers are published in journals outside the environmental sciences category as defined by the Scopus and Web of Science categories and the ERA; however, the research conducted at each university aligns closely with that institution's research priorities. This paper analyzes the positions of institutions from the private domain in bibliometric rankings of as many as 27,000 research institutions and highlights factors that are crucial for a proper interpretation of such positions. 
It was found that among the institutions with the largest output in terms of published research articles, private firms are underrepresented, whereas in the top quartile of institutions with the largest citation impact firms are overrepresented. A firm's publication output is not a good indicator of its R&D investment: big firms in Pharmaceutics are both heavy investors in R&D and frequent publishers of scientific articles, whereas in Automobiles firms tend to invest heavily in R&D but their publication output is low. This is ascribed to the fact that the former need validation of their results by the scientific community, while the latter need this much less. Private institutions generating the largest citation impact tend to collaborate with the best public research institutions. This reflects the crucial importance of publicly funded research for the private sector. In addition to the impact factor and other bibliometric indices, generating a net profit year on year plays a central role in measuring overall journal publishing performance. However, some business models do not allow academic journals to continue to thrive because they are not financially sustainable. This raises a number of questions which have to be answered: how does a journal's wealth grow given a particular allocation strategy for journal resources? What is the optimal allocation strategy for journal resources that maximizes the growth rate of the journal's wealth? What is the value of side information for the selection of high-quality manuscripts? And what is the effective growth rate of the journal's wealth if there is dependence among successive selections of high-quality manuscripts? This paper proves that information-theoretic quantities like entropy and mutual information arise as the answers to these fundamental questions in the selection of high-quality manuscripts and the allocation of the journal's wealth. 
Based on the uncovered relationships between the growth rate of the journal's wealth and the selection of high-quality manuscripts, we propose a number of basic guidelines for improving journal publishing performance (e.g., match the probabilities of high quality when allocating journal resources among the submitted manuscripts, and focus on management practices that promote selection processes with less uncertainty in the outcome). The main goal of this research is to analyze the web structure and performance of units and services belonging to U.S. academic libraries in order to check their suitability for webometric studies. Our objectives include studying their possible correlation with economic data and assessing their use for complementary evaluation purposes. We conducted a survey of library homepages, institutional repositories, digital collections, and online catalogs (a total of 374 URLs) belonging to the 100 U.S. universities with the highest total expenditures on academic libraries according to data provided by the National Center for Education Statistics. Several data points were taken and analyzed, including web variables (page count, external links, and visits) and economic variables (total expenditures, expenditures on printed and electronic books, and physical visits). The results indicate that the variety of URL syntaxes is wide, diverse and complex, which produces a misrepresentation of academic libraries' web resources and reduces the accuracy of web analysis. On the other hand, institutional and web data indicators are not highly correlated. The best results are obtained by correlating total library expenditures with URL mentions measured by Google (r = 0.546) and with visits measured by Compete (r = 0.573). 
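The link drawn above between entropy and the growth rate of a journal's wealth parallels the classic log-optimal (Kelly) betting result from information theory: with proportional allocation b = p over m equally rewarded outcomes, the doubling rate is W* = log2(m) − H(p), so less uncertainty in the selection outcome means faster growth. A hedged sketch of that standard result; the probabilities and odds are illustrative, not taken from the paper:

```python
import math

def entropy_bits(p):
    # Shannon entropy H(p) in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

def doubling_rate(b, p, odds):
    # Expected log-growth W(b, p) = sum_i p_i * log2(b_i * o_i)
    return sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, odds))

# Illustrative: 4 possible selection outcomes with uniform odds o_i = 4
p = [0.5, 0.25, 0.15, 0.10]
odds = [4.0] * 4
# The log-optimal allocation is proportional: b = p
w_opt = doubling_rate(p, p, odds)
# With uniform odds, the optimum equals W* = log2(m) - H(p)
w_formula = math.log2(4) - entropy_bits(p)
print(round(w_opt, 4), round(w_formula, 4))
```

The "match probabilities of high quality" guideline in the abstract corresponds to choosing b = p here; side information about manuscript quality raises the achievable growth rate by the mutual information it carries.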
Because the correlation values obtained are not highly significant, we estimate that these correlations will increase if users can avoid linkage problems (due to the complexity of URLs) and gain direct access to log files (for more accurate data about visits). To find out whether replication of the methods section in biosciences papers is a kind of plagiarism, the authors first surveyed the behavior of authors when writing the methods sections of their published papers. Then the descriptions of one well-established method in randomly selected papers published in eight top journals were analyzed using CrossCheck to identify the extent of duplication. Finally, suggestions on preparing methods sections are given. The survey results show that an author may employ different approaches to writing the methods section within a single paper; authors are more likely to repeat the description of a published method than simply to provide a citation or to rewrite it completely in their own words. Judging from the samples of the eight leading journals, plagiarism of this kind is very rare in such journals; following the example of Science, supplying common methods as an attachment may be a reasonable choice. The academic effectiveness of universities is measured by the number of publications and citations. However, accessing all the publications of a university poses a challenge related to the mistakes and standardization problems in citation indexes. The main aim of this study is to seek a solution for the unstandardized addresses and the resulting publication loss of universities. To achieve this, all Turkey-addressed publications published between 1928 and 2009 were analyzed and evaluated in depth. The results show that the main mistakes are character or spelling, indexing and translation errors. 
These errors negatively affect the international visibility of universities, make bibliometric studies based on affiliations unreliable, and produce incorrect university rankings. To inhibit these negative effects, an algorithm was created with a finite-state technique using the Nooj Transducer. Forty-seven frequently used affiliation variations for Hacettepe University, apart from "Hacettepe Univ" and "Univ Hacettepe", were identified with the help of finite-state grammar graphs. In conclusion, this study presents some reasons for the inconsistencies in university rankings. It is suggested that mistakes and standardization issues should be considered by librarians, authors, editors, policy makers and managers in order to solve these problems. It has been widely discussed how individuals change the way they act and react in studies just because they are under observation. In this paper, we analyse how this so-called Hawthorne effect applies to researchers who are the subject of bibliometric investigations. This encompasses individual assessments as well as international performance comparisons. We test various bibliometric indicators for notable changes in the last decade from a world-wide perspective and deduce explanations for the changes from the observations. We then concentrate on the behaviour of German authors in particular, to show national trends. German publication behaviour is evaluated with regard to citation rates, collaboration in publications, and the size, publisher country and impact of the journals chosen for publication. We can conclude that authors adapt their publication behaviour to aim for journals that are more internationally known and have a US publisher. Also, a trend from more specialized journals to journals with a broader scope can be observed, which raises the question whether the implicit penalization of specialized fields in bibliometrics leads to undesired shifts in the research conducted. 
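Affiliation-variant matching of the kind described for Hacettepe University can be approximated with an ordinary regular expression, since regexes and finite-state grammars describe the same class of languages. The sketch below is a loose illustration only: the variant pattern and normalization rules are my assumptions, not the study's actual Nooj grammar or its 47 variants:

```python
import re

# Canonical form to which recognized variants are mapped
CANONICAL = "HACETTEPE UNIV"

# Tolerate reordering ("Univ Hacettepe"), longer forms ("University"),
# and a few common character/spelling slips ("Hacetepe", "Hacettep")
PATTERN = re.compile(
    r"\b(?:UNIV(?:ERSITY)?\s+HACC?ETT?EP?E?|HACC?ETT?EP?E?\s+UNIV(?:ERSITY)?)\b"
)

def normalize_affiliation(raw):
    # Uppercase, strip punctuation, collapse whitespace before matching
    cleaned = re.sub(r"[^\w\s]", " ", raw).upper()
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return CANONICAL if PATTERN.search(cleaned) else cleaned

print(normalize_affiliation("Univ. Hacettepe, Ankara"))
```

A production-grade grammar would enumerate the observed variants explicitly rather than rely on optional-character patterns, which is precisely what the finite-state grammar graphs in the study do.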
With the world in the midst of an energy crisis, recent research has placed considerable emphasis on harnessing renewable and sustainable energy while using fossil fuels efficiently. Researchers create and sustain academic societies as a result of social interactions. This study takes a social network perspective to understand researchers' associations in two Organisation of Islamic Co-operation nations, Turkey and Malaysia, in the fast-developing field of 'Energy Fuels'. The study found both similarities and differences in the scholarly networks of the two countries. The mean distance between the authors in the Turkey and Malaysia networks was 8.4 and 6.5, respectively, confirming the small-world nature of these networks. The popularity, position, and prestige of the authors in the network, as determined through centrality measures, had a statistically significant effect on research performance. These measures, however, were far more correlated with the research performance of the authors in the Malaysia network than in the Turkey network. PageRank centrality was found to be the topological measure that correlates most strongly with research performance. We used authors' 'degree' to reach the 'core' ('Deg-Core') of the network (in contrast to the K-Core method), which was found to capture more productive authors. A method to detect academic communities of productive authors by extracting motifs (large cliques) from the network is suggested. Finally, we visualize the cognitive structure of both countries using a 2-mode network representing research focus areas (RFAs) and the prominent authors working in these RFAs. Knowing and comparing the Brazilian scientific production of researchers who completed their full PhD in Brazil or abroad may be important for evaluating the development of science in the country. 
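The PageRank centrality highlighted above scores an author by the stationary probability that a damped random walk over the co-authorship network visits them. A minimal power-iteration sketch on a toy undirected network; the graph and the damping factor 0.85 are illustrative assumptions:

```python
def pagerank(adj, damping=0.85, iters=100):
    # adj: dict node -> list of neighbours (undirected co-authorship graph)
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Rank mass flowing in: each neighbour u splits its rank
            # evenly among its own neighbours
            inflow = sum(rank[u] / len(adj[u]) for u in adj if v in adj[u])
            new[v] = (1 - damping) / n + damping * inflow
        rank = new
    return rank

# Toy network: author A collaborates with everyone; B and C also collaborate
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
print(best, round(ranks[best], 3))
```

The best-connected author A ends up with the highest PageRank; in the study this score is then correlated with each author's research performance.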
In this context, the current study was planned to verify the evolution of the scientific production of researchers who completed a PhD in Brazil or abroad between 1997 and 2002. The evaluation specifically covered the scientific production of PhDs in the areas of biochemistry, physiology and pharmacology during the 9 years after PhD completion. The data were obtained from the databases of CAPES (Foundation for Higher Education Development in Brazil), CNPq (National Council of Technological and Scientific Development), Lattes, Web of Science (Institute for Scientific Information (ISI)) and SciVal-Scopus. In terms of quantity, researchers who did their full PhD in Brazil published more articles than researchers who did it abroad. However, articles from researchers who did their PhD in Brazil were published in journals with lower impact factors and received fewer citations than the articles published by researchers who did their PhD abroad. The results indicate that the qualitative performance of researchers who did their PhD abroad was better than that of those who did their PhD in Brazil. Consequently, the policies of the Brazilian government need to be devoted to enhancing the relevance of Brazilian articles in terms of scientific quality and international insertion. This paper investigates the technological innovation capabilities of three Asian latecomers, namely Taiwan, China, and Korea, in the emergent solar photovoltaic industry. For this study, I deploy a new dataset of 75,540 solar photovoltaic patents taken out by Taiwan, China and Korea over a period of 31 years (1978-2008) and analyse the evolving technological innovation capabilities revealed in these patents using a set of four technology platforms that I constructed. 
This study presents the patent portfolios of the three latecomers and explores to what extent the Taiwanese, Chinese, and Korean followers have developed their technological innovation capabilities so as to surpass the US, Germany, and Japan and acquire the leading production positions, and how the variations in technological innovation capabilities among the major producers influence their business activities in the global solar photovoltaic industry. The results show that the various strategies adopted by Taiwan, China, and Korea to develop their solar photovoltaic industries reflect the different national innovation systems involved and respond to the current trends of technology development in the global solar photovoltaic industry. Our aim was to evaluate the impact of anatomy as a multidisciplinary area and to identify trends in research by anatomists over time. Data from three main sources were analyzed: SCImago Journal & Country Rank (SJR), using the number of total documents as the indicator; the MEDLINE (PubMed) database (1898 through October 2012), using the keyword "anatomy" in the "affiliation" field; and the Journal Citation Reports (JCR), gathering impact factor and quartile data. The number of publications by anatomists increased between 1898 and 1941, followed by a reduction until 1961 and then by a marked rise to reach 36,686 between 2002 and 2012. After 1941, anatomists began to publish in journals from JCR categories other than "Anatomy & Morphology", especially after 1962. Between 2007 and 2012, only 22.23% of articles by anatomists in JCR-indexed journals were in the "Anatomy & Morphology" area and 77.77% were in journals from other categories; 58% of their articles were in journals in the first and second quartiles. The contribution of anatomists to scientific knowledge is of high quality and considerably greater than indicated by the SJR database. This input is especially relevant in the Neurosciences, Cell Biology, and Biology categories. 
In addition, more than two-thirds of manuscripts by anatomists appear in JCR-ranked publications, and more than half in the top two quartiles of the impact factor ranking. Our results show that the scientific production of anatomists has improved the quantity and quality of multidisciplinary scientific activity in different knowledge areas. Although women's contribution to science is crucial to social development, gender differences have long affected the quantity and quality of scholarly activity. In spite of some improvements, women still suffer from a gender gap and biases in the science world. Using a scientometric method with a comparative approach, the present communication aims to study women's performance in Nano Science & Technology in terms of their scientific productivity and impact and to contrast it with that of their male counterparts. The significance of the study lies in the importance of a balanced development of human society in general and in different scientific milieus in particular. According to the research results, although female Nano-researchers are scarce in number, they perform equally in terms of scientific production and impact. This may imply gender egalitarianism in the field. This paper seeks to map out the emergence and evolution of entrepreneurship as an independent field in the social science literature from the early 1990s to 2009. Our analysis indicates that entrepreneurship grew steadily during the 1990s but truly emerged as a legitimate academic discipline in the latter part of the 2000s. The field has been dominated by researchers from Anglo-Saxon countries over the past 20 years, with particularly strong representation from the US, UK, and Canada. The results of our structural analysis, which is based on a core document approach, point to five large knowledge clusters and a further 16 sub-clusters. 
We characterize the clusters by their cognitive structure and assess the strength of the relationships between them. In addition, a list of the most cited articles is presented and discussed. Although bibliometrics has been a separate research field for many years, there is still no uniformity in the way bibliometric analyses are applied to individual researchers. This study therefore aims to set out proposals for how to evaluate individual researchers working in the natural and life sciences. 2005 saw the introduction of the h index, which gives information about a researcher's productivity and the impact of his or her publications in a single number (h is the number of publications with at least h citations); however, it is not possible to cover the multidimensional complexity of research performance or to undertake inter-personal comparisons with this number. This study therefore includes recommendations for a set of indicators to be used for evaluating researchers. Our proposals relate to the selection of the data on which an evaluation is based, the analysis of the data and the presentation of the results. An effective bibliometric analysis based on the Science Citation Index Expanded database was conducted to evaluate earth science sediment-related research from different perspectives from 1992 to 2011. The geographical influences of the authors were subsequently visualized. Sediment-related research experienced notable growth in the past two decades. Multidisciplinary geosciences and environmental sciences were the two major categories, and Environmental Science and Technology was the most active journal. Damsté JSS and Schouten S were the two most prolific authors, with the most high-quality articles and the greatest geographic influence. The major spatial clusters of authors overlapped quite well with regions of high economic growth in the USA, Western Europe, and Eastern Asia. 
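The h index defined above (h publications with at least h citations each) can be computed directly from an author's citation list. A minimal sketch, with illustrative citation counts:

```python
def h_index(citations):
    # Sort citation counts in descending order; h is the largest rank r
    # such that the r-th paper has at least r citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Illustrative citation counts for one researcher's papers
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # 3: the papers with 25, 8 and 5
                                        # citations each have at least 3
```

Its compression of a whole distribution into one number is exactly the limitation the abstract notes: very different citation profiles can share the same h.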
The USA was the largest contributor to global sediment research, with the most independent and collaborative papers, and the dominance of the USA was also confirmed in the national collaboration network. National academic output was positively associated with economic capability. The Chinese Academy of Sciences, the US Geological Survey and the Russian Academy of Sciences were the three major contributing institutions. A keyword analysis determined that "evolution", "water", "soil(s)", and "model" were consistent hotspots in sediment research. Several keywords such as "organic-matter", "Holocene", "dynamics", "erosion", "sediment transport", "climate", and "heavy-metal" received dramatically increased attention during the study period. Through co-word analysis, significant differences were observed between environmental and multidisciplinary geosciences in terms of the most frequently used keywords, and the prevalent research topic patterns were ascertained. Apart from a few bibliometric studies, the South African scientific system is a scantly researched area calling for more empirical evidence. This empirical study of academics and researchers (n = 204) from a selected province of South Africa examines the interrelationship between publication productivity and collaboration, and the sectoral differences between higher education institutions and research institutes. The study highlights the specific context of the scientific system in South Africa with its characteristic features of productivity and collaboration and shows how these are structurally facilitated and hindered. As South Africa is a prominent contributor to the development of science in Africa, the study offers some interesting findings. 
Recent discussion about the increase in international research collaboration suggests a comprehensive global network centred around a group of core countries and driven by generic socio-economic factors, in which the global system influences all national and institutional outcomes. In counterpoint, we demonstrate that the collaboration pattern for countries in Africa is far from universal. Instead, it exhibits layers of internal clusters and external links that are explained not by monotypic global influences but by regional geography and, perhaps even more strongly, by history, culture and language. Analysis of these bottom-up, subjective, human factors is required in order to provide the fuller explanation useful for policy and management purposes. Writing academic books is one of the core expressions of academic research. In the process of writing, authors cite many types of publications, such as journals, journal articles, reports, web sources and books. Collecting and analyzing these citations in selected academic books leads to the creation of book citation indexes. Based on this concept and design, the Chinese Book Citation Index (CBkCI) has been produced. The value of the CBkCI lies not only in filling the domestic vacuum in book citation indexing, but also in promoting the quality of academic book publishing and in contributing to better library collection development. More importantly, it helps to lay a solid foundation in the area of academic evaluation. Harzing (Scientometrics, 2013) showed that between April 2011 and January 2012, Google Scholar very significantly expanded its coverage in Chemistry and Physics, with a more modest expansion for Medicine and a natural increase in citations only for Economics. However, we do not yet know whether this expansion of coverage was temporary or permanent, nor whether a further expansion of coverage has occurred. It is these questions we set out to answer in this research note. 
We use a sample of 20 Nobelists in Chemistry, Economics, Medicine and Physics and track their h-index, g-index and total citations in Google Scholar on a monthly basis. Our data suggest that, after a period of significant expansion for Chemistry and Physics, Google Scholar coverage is now increasing at a stable rate. Google Scholar also appears to provide comprehensive coverage for the four disciplines we studied. The increased stability and coverage might make Google Scholar much more suitable for research evaluation and bibliometric research purposes than it has been in the past. Fractional calculus generalizes integer-order derivatives and integrals. During the last half century considerable progress has taken place in this scientific area. This paper traces this evolution and establishes a quantitative measure of the research development. This study compares the research productivity and impact of inbred and non-inbred faculty employed at Australian law schools. The sample consists of 429 academics employed at 21 law schools. To measure research productivity and impact we use articles published in top law journals, defined in six different ways, as well as total citations and two different citation indices. We report results including, and excluding, publications in the academic's home law review. We find evidence that silver-corded faculty outperform other faculty on one of the measures of publications in top journals, once the endogeneity of academic seniority, grant history and the status of the law school at which the individual is employed is addressed, but this finding is not robust across alternative measures of articles published in top journals. We find that there is no statistically significant difference between the research productivity and impact of inbred and non-inbred faculty. 
This finding is robust to a range of different ways of measuring research productivity and impact and to alternative econometric approaches, including using two-stage least squares to address the endogeneity of academic seniority, grant history and the status of the law school at which the legal academic is employed. Interdisciplinarity is as trendy as it is difficult to define. Instead of trying to capture a multidimensional object with a single indicator, we propose six indicators, combining three different operationalizations of a discipline, two levels (article or laboratory) of integration of these disciplines and two measures of interdisciplinary diversity. This leads to a more meaningful characterization of the interdisciplinarity of laboratories' publication practices. Through a statistical analysis of these indicators for 600 CNRS laboratories, we suggest that, besides an average value of interdisciplinarity, different laboratories can mainly be distinguished by the "distance" between the disciplines in which they publish and by the scale at which interdisciplinary integration is achieved (article or laboratory). This paper suggests a new method to search for the main path, as a knowledge trajectory, in a citation network. To enhance performance and remedy the problems other researchers have noted in main path analysis (Hummon and Doreian, Social Networks 11(1): 39-63, 1989), we applied two techniques, an aggregative approach and a stochastic approach. The first technique offers an improvement on link count methods, such as SPC, SPLC, SPNP, and NPPC, which risk painting a mistaken picture since they calculate link weights based on the individual topology of a citation link; the second technique, second-order Markov chains, is used for path-dependent search to improve Hummon and Doreian's priority-first search method. 
The case study on graphene that tested the performance of our new method showed promising results, assuring us that our new method can be an improved alternative to main path analysis. Our method's beneficial effects are summed up in eight aspects: (1) path-dependent search, (2) search for basic research rather than applied research, (3) path merge and split, (4) multiple main paths, (5) backward search for knowledge origin identification, (6) robustness to indiscriminately selected citations, (7) availability in an acyclic network, and (8) completely automated search. We have studied the effects of the performance-based research funding introduced into the Czech (CZ) R&D system in 2008 on the output of R&D results. We have analyzed annual changes in the number of various types of publications and applications, including patents, before and after this change. The growth rate of almost all types of results accelerated in 2005 or 2006, and the increase continued until 2010. The growth in result quantity in the CZ has been faster than in seven other European countries selected for comparison. Because the accelerated growth started before 2008, implementation of the performance-based funding could not have been its cause. The likely cause of the growth could be either the evaluation of R&D institutions introduced in 2004 itself and/or the growth of public R&D funding in the past decade. Because the increase in the citation impact of publications lagged behind the increase in their quantity, we conclude that the evaluation is not based on optimal indicators. International collaboration has played an important role in the development of nanotechnology. Patents encompass valuable technological information and reflect collaborative efforts. Thus, this paper examines the development of international collaboration in nanotechnology using patent network analysis. 
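The link-count weights mentioned above can be illustrated with SPC (search path count), which weights each citation edge u→v by the number of source-to-sink paths traversing it: SPC(u→v) = (paths from any source to u) × (paths from v to any sink). A minimal sketch on a toy citation DAG; the graph is an illustrative assumption:

```python
from functools import lru_cache

# Toy citation DAG: edge u -> v means knowledge flows from u to v
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
succ, pred = {}, {}
for u, v in edges:
    succ.setdefault(u, []).append(v)
    pred.setdefault(v, []).append(u)
nodes = {n for e in edges for n in e}
sources = [n for n in nodes if n not in pred]
sinks = [n for n in nodes if n not in succ]

@lru_cache(maxsize=None)
def paths_from_source(v):
    # Number of paths from any source down to v
    return 1 if v in sources else sum(paths_from_source(u) for u in pred[v])

@lru_cache(maxsize=None)
def paths_to_sink(v):
    # Number of paths from v down to any sink
    return 1 if v in sinks else sum(paths_to_sink(w) for w in succ[v])

# SPC weight of each edge: source-to-sink paths passing through it
spc = {(u, v): paths_from_source(u) * paths_to_sink(v) for u, v in edges}
print(spc)
```

Here the bottleneck edge D→E carries both paths (weight 2), so a greedy main path search would route through it; the abstract's criticism is that such weights are computed from each link's local topology, which the proposed aggregative and Markov-chain techniques aim to remedy.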
The results show that the number of international collaboration nanotechnology patents has increased steadily and that their proportion of total nanotechnology patents has likewise exhibited an upward trend. The USA has always been the most influential participant, with the largest number of international collaboration patents. Asian countries/regions have shown an obvious increase in the number of international collaboration patents; by contrast, European countries have shown a general decline. More and more countries have become actively engaged in international collaboration in nanotechnology, with increasingly close relationships. Two styles of international collaboration exist: while the USA, Germany, the UK and Japan collaborate with a wide range of countries/regions, Spain, Israel, Russia, Singapore and Taiwan are more selective in their collaboration partners. Though international collaboration has yet to find global significance in terms of patent citation impacts, it has nevertheless helped improve patent citation impacts for most of the top 20 countries/regions since 2004. This paper examines the impact of gender both on publication productivity and on patterns of scientific collaboration in the social sciences in Turkey. The research is based on bibliographic data on national-level publications in Turkey, consisting of 7,835 papers written by 6,738 scientists. The findings suggest that (1) there are gender differences in publication productivity, participation, presence and contribution; (2) there are significantly different tendencies in keeping established co-authorship ties for inter-gender and intra-gender pairs; (3) there are significant regularities exhibited by co-author pairs based on each partner author's publication productivity; and (4) these regularities differ for inter-gender and intra-gender co-authorships. 
This study contributes to the literature by exemplifying an integrated approach to better examine the role of gender in scientific collaborations. In addition to descriptive social network analysis methods, it exploits and adopts parametric models from the literature: (1) Social Gestalt theory, a model based on bivariate distributions of co-author pairs' frequencies; (2) Lotka's power law distribution on the publication productivity of single authors; (3) power law distributions of co-author pairs' frequencies. The model proposed by Burrell (Information Processing and Management 28:637-645, 1992; Journal of Informetrics 1:16-25, 2007a) to describe the way an individual author's publication/citation career develops in time is investigated further, the aim being to describe in more detail the form of the citation distribution and the way it evolves over time. Both relative and actual frequency distributions are considered. Theoretical aspects are developed analytically and graphically and then illustrated using small empirical data sets relating to some well-known informetrics scholars. Perhaps surprisingly, it is found that the distribution may well be approximated in some cases by a simple geometric distribution. There is a wealth of research on technological learning in developing countries, but few scholars have clearly addressed the issue of learning time in an empirical way. This paper aims to fill this void by presenting an empirical investigation of the time needed by Chinese firms to learn from the technologies that they have in-licensed. Furthermore, we analyzed in detail the antecedents leading to an acceleration or deceleration of the learning process among Chinese licensees. The results of an event history analysis indicate that recipient firms take on average 5.8 years to learn from their in-licensed technologies. 
The absorptive capacity and firm age of the licensees, the technology licensing scale, the age of the licensed technology, and the desorptive capability of the licensor firm all play a role in shortening the learning time. The scientific community of researchers in a research specialty is an important unit of analysis for understanding the field-specific shaping of scientific communication practices. These scientific communities are, however, a challenging unit of analysis to capture and compare because they overlap, have fuzzy boundaries, and evolve over time. We describe a network analytic approach that reveals the complexities of these communities through the examination of their publication networks in combination with insights from ethnographic field studies. We suggest that the structures revealed indicate overlapping subcommunities within a research specialty, and we provide evidence that they differ in disciplinary orientation and research practices. By mapping the community structures of scientific fields we increase confidence about the domain of validity of ethnographic observations as well as of collaborative patterns extracted from publication networks thereby enabling the systematic study of field differences. The network analytic methods presented include methods to optimize the delineation of a bibliographic data set to adequately represent a research specialty and methods to extract community structures from this data. We demonstrate the application of these methods in a case study of two research specialties in the physical and chemical sciences. The author analyzes retracted biomedical literature to determine if open access and fee-for-access works differ in terms of the practice and effectiveness of retraction. Citation and content analysis were applied to articles grouped by accessibility (libre, gratis, and fee for access) for various bibliometric attributes. 
Open access literature does not differ from fee-for-access literature in terms of impact factor, detection of error, or change in postretraction citation rates. Literature found in the PubMed Central Open Access subset provides detailed information about the nature of the anomaly more often than less accessible works. Open access literature appears to be of similar reliability and integrity to the population of biomedical literature in general, with the added value of being more forthcoming about the nature of errors when they are identified. There is increasing interest in topics at the nexus of collaboration and information behavior. A variety of studies conducted in organizational settings have provided us with key insights about the collaborative aspects of seeking, retrieving, and using information. Researchers have used a range of terms, including collaborative information seeking (CIS), collaborative information retrieval (CIR), collaborative search, collaborative sensemaking, and others to describe various pertinent activities. Consequently, we lack conceptual clarity concerning these activities, leading to a tendency to use terms interchangeably when in fact they may be referring to different issues. Here, we offer collaborative information behavior (CIB) as an umbrella term to connote the collaborative aspects of information seeking, retrieval, and use. We provide the contours of a model of CIB synthesized from findings of past studies conducted by our research team as well as other researchers. By reanalyzing and synthesizing the data from those studies, we conceptualize CIB as comprising a set of constitutive activities organized into three broad phases: problem formulation, collaborative information seeking, and information use. Some of the activities are specific to a particular phase, whereas others are common to all phases. We explain how those constitutive activities are related to one another. 
Finally, we discuss the limitations of our model as well as its potential usefulness in advancing CIB research. Social web content such as blogs, videos, and other user-generated content presents a vast source of rich digital traces of individuals' experiences. The use of digital traces to provide insight into human behavior remains underdeveloped. Recently, ontological approaches have been exploited for tagging and linking digital traces, with progress made in ontology models for well-defined domains. However, the process of conceptualization for ill-defined domains remains challenging, requiring interdisciplinary efforts to understand the main aspects and capture them in a computer-processable form. The primary contribution of this article is a theory-driven approach to ontology development that supports semantic augmentation of digital traces. Specifically, we argue that (a) activity theory can be used to develop more insightful conceptual models of ill-defined activities, which (b) can be used to inform the development of an ontology, and (c) this ontology can be used to guide the semantic augmentation of digital traces for making sense of phenomena. A case study of interpersonal communication is chosen to illustrate the applicability of the proposed multidisciplinary approach. The benefits of the approach are illustrated through an example application, demonstrating how it may be used to assemble and make sense of digital traces. As the information field (IField) becomes more recognized by different constituencies for education and research, the need to better understand its intellectual characteristics becomes more compelling. Although there are various conceptualizations of the IField, to date, in-depth studies based on empirical evidence are scarce. This article reports a study that fills this gap. We focus on the first five ISchools in the ICaucus as a proxy to represent the IField. 
The intellectual characteristics are depicted by two independent sets of data on tenure track faculty as knowledge contributors: their intellectual heritages and the intellectual substance in their journal publications. We use a critical analysis method to examine doctoral training areas and 3 years of journal publications. Our results indicate that (a) the IField can be better conceptualized with empirical support by a four-component model that includes People, Information, Technology, and Management, as predicted by the I-Model (Zhang & Benjamin, 2007); (b) the ISchools' faculty members are diverse, interdisciplinary, and multidisciplinary as shown by their intellectual heritages, by their research foci, by journals in which they publish, by the contexts within which they conduct research, and by the levels of analysis in research investigations; (c) the five ISchools share similarities while evincing differences in both faculty heritages and intellectual substances; (d) ISchool tenure track faculty members do not collaborate much with each other within or across schools although there is great potential; and (e) intellectual heritages are not good predictors of scholars' intellectual substance. We conclude by discussing the implications of the findings on IField identity, IField development, new ISchool formation and existing ISchool evolution, faculty career development, and collaboration within the IField. In this paper we present results from an investigation of religious information searching based on analyzing log files from a large general-purpose search engine. From approximately 15 million queries, we identified 124,422 that were part of 60,759 user sessions. We present a method for categorizing queries based on related terms and show differences in search patterns between religious searches and web searching more generally. 
We also investigate the search patterns found in queries related to 5 religions: Christianity, Hinduism, Islam, Buddhism, and Judaism. Different search patterns are found to emerge. Results from this study complement existing studies of religious information searching and provide a level of detailed analysis not reported to date. We show, for example, that sessions involving religion-related queries tend to last longer, that the lengths of religion-related queries are greater, and that the number of unique URLs clicked is higher when compared to all queries. The results of the study can serve to provide information on what this large population of users is actually searching for. A new method for visualizing the relatedness of scientific areas has been developed that is based on measuring the overlap of researchers between areas. It is found that closely related areas have a high propensity to share a larger number of common authors. A method for comparing areas of vastly different sizes and for handling name homonymy is constructed, allowing for the robust deployment of this method on real data sets. A statistical analysis of the probability distributions of the common author overlap that accounts for noise is carried out, along with the production of network maps with weighted links proportional to the overlap strength. This is demonstrated on 2 case studies, complexity science and neutrino physics, where the level of relatedness of the areas within each is expected to vary greatly. It is found that the results returned by this method closely match the intuitive expectation that the broad, multidisciplinary area of complexity science possesses areas that are weakly related to each other, whereas the much narrower area of neutrino physics shows very strongly related areas. 
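The common-author overlap underlying the relatedness maps described above can be sketched minimally. One plausible normalization, an assumption here rather than necessarily the paper's exact statistic, divides the shared-author count by the size of the smaller area, so that areas of vastly different sizes remain comparable:

```python
def author_overlap(area_a, area_b):
    """Fraction of shared authors between two research areas,
    normalized by the smaller area so that a small area nested in a
    large one can still score highly (one plausible choice of
    normalization, not necessarily the statistic used in the study)."""
    a, b = set(area_a), set(area_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

In a real deployment the author sets would come from publication records after name disambiguation, since homonymy (two researchers sharing a name) otherwise inflates the overlap, which is precisely the issue the method above addresses.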
STEM, a set of fields that includes science, technology, engineering, and mathematics; allied disciplines ranging from environmental, agricultural, and earth sciences to life science and computer science; and education and training in these fields, is clearly at the top of the list of priority funding areas for governments, including the United States government. The U.S. has 11 federal agencies dedicated to supporting programs and providing funding for research and curriculum development. The domain of STEM education has significant implications in preparing the desired workforce with the requisite knowledge, developing appropriate curricula, providing teachers the necessary professional development, focusing research dollars on areas that have maximum impact, and developing national educational policy and standards. A complex undertaking such as STEM education, which attracts interest and valuable resources from a number of stakeholders, needs to be well understood. In light of this, we attempt to describe the underlying structure of STEM education, its core areas, and their relationships through co-word analyses of the titles, keywords, and abstracts of the relevant literature using visualization and bibliometric mapping tools. Implications are drawn with respect to the nature of STEM education as well as curriculum and policy development. In this article, we propose a new method to analyze structural changes in networks over time and examine how the representation of the world in two leading newspapers, the New York Times and Der Spiegel, has changed during the past 50 years. We construct international networks based on the co-occurrences of country names in news items and trace changes in their distribution of centrality over time. 
Supporting previous studies, our findings indicate a consistent gap between the most central and the least central countries over the years, with the United States remaining at the center of the network and African countries at its peripheries. Surprisingly, the most dynamic changes in the past 50 years occurred in what we call the middle range. In both outlets, we identified a trend of convergence, in other words, a more equal centrality of European, Middle Eastern, and Asian countries in the news. The implications of these findings are discussed. Recommendation systems often compute fixed-length lists of recommended items to users. Forcing the system to predict a fixed-length list for each user may result in different confidence levels for the computed recommendations. Reporting the system's confidence in its predictions (the recommendation strength) can provide valuable information to users in making their decisions. In this article, we investigate several different displays of a system's confidence to users and conclude that some displays are easier to understand and are favored by most users. We continue to investigate the effect confidence has on users in terms of their perception of the recommendation quality and the user experience with the system. Our studies show that it is not easier for users to identify relevant items when confidence is displayed. Still, users appreciate the displays and trust them when the relevance of items is difficult to establish. Based on real-world user demands, we demonstrate how animated visualization of evolving text corpora displays the underlying dynamics of semantic content. To interpret the results, one needs a dynamic theory of word meaning. We suggest that conceptual dynamics as the interaction between kinds of intellectual and emotional content and language is key for such a theory. 
We demonstrate our method by two-way seriation, which is a popular technique to analyze groups of similar instances and their features as well as the connections between the groups themselves. The two-way seriated data may be visualized as a two-dimensional heat map or as a three-dimensional landscape in which color codes or height correspond to the values in the matrix. In this article, we focus on two-way seriation of sparse data in the Reuters-21578 test collection. To achieve a meaningful visualization, we introduce a compactly supported convolution kernel similar to filter kernels used in image reconstruction and geostatistics. This filter populates the high-dimensional sparse space with values that interpolate nearby elements and provides insight into the clustering structure. We also extend two-way seriation to deal with online updates of both the row and column spaces and, combined with the convolution kernel, demonstrate a three-dimensional visualization of dynamics. Using the "Analyze Results" option in the Web of Science, one can directly generate overlays onto global journal maps of science. The maps are based on the 10,000+ journals contained in the Journal Citation Reports (JCR) of the Science and Social Sciences Citation Indices (2011). The disciplinary diversity of the retrieval is measured in terms of Rao-Stirling's quadratic entropy (Izsak & Papp, 1995). Since this indicator of interdisciplinarity is normalized between 0 and 1, interdisciplinarity can be compared among document sets and across years, cited or citing. The colors used for the overlays are based on Blondel, Guillaume, Lambiotte, and Lefebvre's (2008) community-finding algorithms operating on the relations among journals included in the JCR. The results can be exported from VOSviewer with different options such as proportional labels, heat maps, or cluster density maps. The maps can also be web-started or animated (e.g., using PowerPoint). 
The citing dimension of the aggregated journal-journal citation matrix was found to provide a more comprehensive description than the matrix based on the cited archive. The relations between local and global maps and their different functions in studying the sciences in terms of journal literatures are further discussed: Local and global maps are based on different assumptions and can be expected to serve different purposes for the explanation. Many authors appear to think that most open access (OA) journals charge authors for their publications. This brief communication examines the basis for such beliefs and finds it wanting. Indeed, in this study of over 9,000 OA journals included in the Directory of Open Access Journals, only 28% charged authors for publishing in their journals. This figure, however, was highest in various disciplines in medicine (47%) and the sciences (43%) and lowest in the humanities (4%) and the arts (0%). Many nations are adopting higher education strategies that emphasize the development of elite universities able to compete at the international level in the attraction of skills and resources. Elite universities pursue excellence in all their disciplines and fields of action. The impression is that this does not occur in "non-competitive" education systems, and that instead, within single universities excellent disciplines will coexist with mediocre ones. To test this, the authors measure research productivity in the hard sciences for all Italian universities over the period 2004-2008 at the levels of the institution, their individual disciplines and fields within them. The results show that the distribution of excellent disciplines is not concentrated in a few universities: top universities show disciplines and fields that are often mediocre, while generally mediocre universities will often include top disciplines. 
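The Rao-Stirling quadratic entropy used earlier in this section as the interdisciplinarity indicator has a simple closed form: the sum over pairs of subject categories of p_i p_j d_ij, where p_i is a category's share of the document set and d_ij a distance between categories in [0, 1]. A minimal sketch follows, with a hypothetical distance table standing in for the JCR-derived distances:

```python
def rao_stirling(proportions, distance):
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * d_ij.
    `proportions` maps a subject category to its share of the document
    set; `distance` maps an (unordered) category pair to a distance in
    [0, 1]. The factor 2 below counts both ordered pairs (i, j) and
    (j, i), matching the i != j convention; with distances in [0, 1]
    the result stays in [0, 1]. Category names here are illustrative."""
    cats = list(proportions)
    total = 0.0
    for i, ci in enumerate(cats):
        for cj in cats[i + 1:]:
            d = distance.get((ci, cj), distance.get((cj, ci), 0.0))
            total += 2 * proportions[ci] * proportions[cj] * d
    return total
```

A retrieval split evenly between two maximally distant categories scores 0.5, while a retrieval confined to one category scores 0, which is what makes the indicator comparable across document sets and years as described above.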
Individuals and organisations producing information or knowledge for others sometimes need to be able to provide evidence of the value of their work, in the same way that scientists may use journal impact factors and citations to indicate the value of their papers. There are many cases, however, when organisations are charged with producing reports but have no real way of measuring their impact, including when the reports are distributed free, do not attract academic citations and their sales cannot be tracked. Here, the web impact report (WIRe) is proposed as a novel solution to this problem. A WIRe consists of a range of web-derived statistics about the frequency and geographic location of online mentions of an organisation's reports. WIRe data is typically derived from commercial search engines. This article defines the component parts of a WIRe and describes how to collect and analyse the necessary data. The process is illustrated with a comparison of the web impact of the reports of a large UK organisation. Although a formal evaluation was not conducted, the results suggest that WIRes can indicate different levels of web impact between reports and can reveal the type of online impact that the reports have. Although integrative and complementary medicine (ICM) is a growing scientific field, it is also a highly contested area in terms of scientific legitimacy. The aim of this article is to analyze the reception of ICM research in scientific journals. Is this kind of research acknowledged outside the ICM context, for example, in general or specialized medicine? What is the impact of ICM research? And is it possible to identify any shift in content from the original ICM research to the documents where it is acknowledged? The material consisted of two sets: documents published in 12 ICM journals in 2007, and all documents citing these documents during the years 2007-2012. These sets were analyzed using citation and co-word analysis. 
When analyzing the citation pattern, it was clear that a majority of the cited documents were acknowledged in journals and documents that could be related to research areas outside the ICM context, such as pharmacology & pharmacy and plant science, even if the most frequent individual journals and subject categories were connected to ICM. However, after analyzing the content of cited and citing documents, it was striking how similar the content was. It was also evident that much of this research was related to basic preclinical research in fields such as cell biology, plant pharmacology, and animal experiments. This paper studies disciplinary differences in the citation impacts of different types of co-publishing. The citation impacts of international, domestic inter-organizational and domestic intra-organizational co-publications, and of single-authored publications, are compared. In particular, we examine the extent to which the number of authors explains the potential differences in citation impacts when compared to the influence of different types of international and domestic collaboration. The analysis is based on Finland's publications in the Thomson Reuters Web of Science database in 1990-2008. Finland is a small country and thus has fewer opportunities to find collaborators inside its own country than larger countries have. Finland's science policy has underlined internationalization and research collaboration as key means to increase the quality and impact of Finnish research. This study indicates that both international and domestic co-publishing have steadily increased during the past two decades in all disciplinary groups. International co-publications gain on average more citations than domestic co-publications. In the natural sciences and engineering, co-authorship explains only a small proportion of the variability in publications' citation rates. 
When the effect of the number of authors is taken into account, there are no large differences in citation impacts between international and domestic co-publications. However, international co-publications with ten authors or more gather significantly more citations than other publications. In the humanities, the difference in citation impacts between co-authored and single-authored publications is significant. However, international co-publications are not on average more highly cited than domestic co-publications in the humanities. Both citations to an academic work and post-publication reviews of it are indicators that the work has had some impact on the research community. The Thomson Reuters evaluation and selection process for Web of Knowledge journals includes citation analysis, but this is not systematically practised in the evaluation of books for the Book Citation Index (BKCI) because of the inconsistent methods of citing books, the volume of books and the variants of their titles, especially in non-English languages. Despite the fact that correlations between citations to a book and the number of corresponding book reviews differ from research area to research area and are overall weak or non-existent, this study confirms that books with book reviews do not remain uncited and accrue a remarkable mean number of citations. Therefore, book reviews can be considered a suitable selection criterion for the BKCI. The approach suggested in this study is feasible and allows easy detection of corresponding books via their book reviews, which is particularly true for research areas where books play a more important role, such as the social sciences, the arts and humanities. Rapid technological advancements and increasing research and development (R&D) costs are making it necessary for national R&D plans to identify the coreness and intermediarity of technologies in selecting projects and allocating budgets. 
Studies on the coreness or intermediarity of technology sectors have used patent citations, but there are limitations to dealing with patent data. The limitations arise from the most recent patents and from patents that do not require citations, e.g. Korean patents. Further, few or no studies have simultaneously considered both coreness and intermediarity. Therefore, we propose a patent co-classification based method to measure the coreness and intermediarity of technology sectors by incorporating the analytic network process and social network analysis. Using IPC co-classifications of patents as technological knowledge flows, this method constructs a network of directed knowledge flows among technology sectors and measures the long-term importance and the intermediating potential of each technology sector, despite the limitations of patent-based analyses. Considering both coreness and intermediarity, this method can provide more detailed and essential knowledge for decision making in planning national R&D. We demonstrated this method using Korean national R&D patents from 2008 to 2011. We expect that this method will help in planning national R&D in a rapidly evolving technological environment. Many governments have placed priority on excellence in higher education as part of their policy agendas. Processes for recruitment and career advancement in universities thus have a critical role. The efficiency of faculty selection processes can be evaluated by comparing the subsequent performance of competition winners against that of the losers and of the pre-existing staff of equal academic rank. Our study presents an empirical analysis of the recruitment procedures for associate professors in the Italian university system. The results of a bibliometric analysis of the hard science areas reveal that new associate professors are on average more productive than the incumbents. 
However, a number of crucial concerns emerge, in particular the occurrence of non-winning candidates who are more productive than the winners over the subsequent triennium, and cases of winners who are completely unproductive. Beyond the implications for the Italian case, the analysis offers considerations for all decision-makers regarding the ex post evaluation of the efficiency of the recruitment process and the desirability of providing selection committees with bibliometric indicators in support of evaluation (i.e. informed peer review). Leaders are important for scientific groups. Authors of a research paper whose names are listed first or last in the byline, or as the corresponding author, are often considered particularly important to that paper. The authorship preferences of scientific group leaders are examined for seven research fields and 11 geographic locations. There are some similarities and differences among research fields and geographic locations in listing group leaders. In the fields of "Mathematics" and "Physics, Particles & Fields", although the custom is for papers to list authors alphabetically, scientific group leaders from Egypt and Shanghai typically list their names first or last in the byline, the same as group leaders in other research fields. Unlike group leaders from other locations, leaders from Egypt often appear as first authors. Scientific group leaders who are listed first in the byline typically also serve as the corresponding authors. For group leaders who are listed last in the byline, the proportion also serving as corresponding authors varies significantly. Accordingly, the proportion of papers in which group leaders are corresponding authors varies considerably among different research fields and geographic locations. The meaning of authorship for research group leaders is discussed at the end from the perspective of their roles in paper production. 
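Tallies like those reported above, of how often group leaders appear first, last, or as corresponding author, reduce to a single pass over byline records. A sketch with an illustrative record schema (the field names and the author labels are assumptions, not the authors' data format):

```python
def leader_positions(papers, leaders):
    """Count how often group leaders occupy each authorship position.
    `papers` is a list of dicts with an ordered 'authors' list and a
    'corresponding' author name (illustrative schema); `leaders` is a
    set of group-leader names. A leader first in a one-author byline
    would count as both first and last, by construction."""
    counts = {"first": 0, "last": 0, "corresponding": 0}
    for paper in papers:
        authors = paper["authors"]
        for name in authors:
            if name not in leaders:
                continue
            if name == authors[0]:
                counts["first"] += 1
            if name == authors[-1]:
                counts["last"] += 1
            if name == paper.get("corresponding"):
                counts["corresponding"] += 1
    return counts
```

Dividing each count by the number of papers with a leader in the byline gives the proportions compared across fields and locations in the study.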
A growing number of research information systems use a semantic linkage technique to represent explicitly information about relationships between elements of their content. This practice is now reaching a maturity at which existing data on semantically linked research objects, and the scientific relationships these linkages express, can be recognized as a new data source for scientometric studies. Recent activities to provide scientists with tools for expressing, in the form of semantic linkages, their knowledge, hypotheses and opinions about relationships between available information objects also support this trend. The study presents one such activity performed within the Socionet research information system, with a special focus on (a) a taxonomy of the scientific relationships that can exist between research objects, especially between research outputs; and (b) a semantic segment of a research e-infrastructure that includes semantic interoperability support, monitoring of changes in linkages and linked objects, notifications, a new model of scientific communication and, finally, scientometric indicators built by processing semantic linkage data. Based on what semantic linkage data are and how they are stored in a research information system, we propose an abstract computing model of this new data source. This model helps in understanding what new indicators can be designed for scientometric studies. Using current semantic linkage data collected in Socionet, we present some statistical experiments, including examples of indicators based on two data sets: (a) which objects are linked and (b) which scientific relationships (semantics) are expressed by the linkages. Based on patent co-authorship data from the State Intellectual Property Office of China, this paper examines the evolution of the small world network and its impact on patent productivity in China. 
Compared with western countries, the small-world phenomenon of the innovation network in China is becoming more obvious. Empirical results show that the small world network may only have a significant impact on patent productivity in patent-productive provinces, e.g., Beijing and Guangdong, which filed larger numbers of patents. Although collaborations in the network are more durable in China than those in western countries, the network may be less efficient in transmitting knowledge because of the large ratio of administration-oriented state owned enterprises (SOEs). With a larger ratio of SOEs, the small world network has a longer path length, and knowledge thus flows less efficiently in Beijing than in Guangdong. The policy implication of these findings is that the Chinese government should let the market, rather than the administration, determine the collaboration of technological innovation, in order to encourage innovation and establish an effective small world network for speeding up the flow of knowledge among different types of firms during the innovative process. Citations are regarded as measures of quality, yet citation rates vary widely within each of the top finance journals. Since article ordering is at the discretion of editors, lead articles can be interpreted as signals of quality that academics can use to allocate their attention and assert the value of their publications. Advances in electronic journal access allow researchers to access articles directly, suggesting article ordering may be less relevant today. We confirm the past importance of lead articles by examining citation rates from published papers as well as the wider source of papers listed in Google Scholar. 
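The small-world indicators invoked in the patent-network abstract above are characteristic path length and clustering. A minimal sketch of both, using breadth-first search on a toy adjacency mapping, is shown below; the function names and the dictionary-of-sets graph representation are illustrative assumptions, not the study's own code:

```python
from collections import deque

def avg_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for t, d in dist.items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def clustering_coefficient(adj):
    """Average local clustering: fraction of a node's neighbour pairs that are linked."""
    cc = []
    for u, nbrs in adj.items():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            cc.append(0.0)
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        cc.append(2 * links / (k * (k - 1)))
    return sum(cc) / len(cc)
```

A network is "small-world" in the usual sense when its path length stays short while clustering stays high relative to a random graph of the same size; the abstract's claim about SOEs lengthening path length corresponds directly to the first quantity.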
Our findings also confirm that using Google Scholar as a citation source provides results congruent with using citations from articles published in ISI-listed journals, with the additional benefit of potentially being more timely, since it includes wider citation sources, inclusive of working and conference papers. The importance of the convergent approach to technology development has increased recently. Therefore, understanding the characteristics of technology convergence, which refers to the combination of two or more technological elements to create a new system with new functions, is an important issue not only for researchers in technology development, but also for company directors seeking successful management of product competitiveness. Therefore, in order to investigate the patterns and mechanism of technological convergence, we examine printed electronics technology, which has the typical characteristics of technology convergence. Based on printed electronics-related patents registered between 1976 and 2012, we perform network analysis of the technology components in order to identify key technologies that played a central role among the groups of convergence technologies, and to examine their dynamic role corresponding to the development of technology convergence. The results show that control technologies, which control the role of other technologies over the technology convergence process, play a significant role. The centrality value is highest for control technology, while device-related technologies have the largest number of patents quantitatively, confirming the results. In addition, the trajectory analysis of the centrality value reveals a co-evolution pattern in technology convergence. 
This study presents an in-depth survey of the research and citation performance of the 39-member faculty of the School of Biological Sciences (SBS) at Seoul National University (SNU), the most prestigious university in South Korea, for the years 2004-2009. The 39 faculty members published a total of 640 publications during the period, representing an average of 16.4 publications per scientist. Among the 640 publications, 521 (81.4 %) were cited 9,204 times, an average of 14.4 citations per publication. More publications co-authored by the SBS faculty with foreign researchers (mostly from the U.S.A.) were published in mainstream journals than publications by the three other co-authorship types. Accordingly, publications by international co-authorships received more citations than the three other co-authorship types in terms of average citations per publication. The study found a concentration effect, whereby a small number of publications received approximately one-third of the citations generated by the SBS faculty at SNU. The results demonstrate that the citation performance of the SBS at SNU can be influenced considerably by the presence and productivity of 'star' scientists. This paper proposes the h(l)-index as an improvement of the h-index, a popular measurement of the research quality of academic researchers. Although the h-index integrates the number of publications and the academic impact of each publication to evaluate the productivity of a researcher, it assumes that all papers citing an academic article contribute equally to the academic impact of that article. This assumption, of course, does not hold in most cases. A citation from a well-cited paper certainly brings more attention to the article than a citation from a paper that people do not pay attention to. 
It therefore becomes important to integrate the impact of papers that cite a researcher's work into the evaluation of the productivity of that researcher. Constructing a citation network among academic papers, this paper therefore proposes the h(l)-index, which integrates the h-index with the concept of the lobby index, a measure that has been used to evaluate the impact of a node in a complex network based on the impact of the other nodes with which the focal node has a direct link. This paper also explores the characteristics of the proposed h(l)-index by comparing it with citation counts, the h-index and its variant the g-index. The development of science is accompanied by growth of scholarly publications, primarily in the form of articles in peer-reviewed journals. Scientific work is often evaluated through the number of scientific publications in international journals and their citations. This article discusses the impact of open access (OA) on the number of citations for an institution from the field of civil engineering. We analyzed articles published in 2007 in 14 international journals with impact factors that are included in the Journal Citation Reports subject category "Civil Engineering". The influence of open access on the number of citations was analyzed. The aim of our research was to determine whether open access articles from the field of civil engineering receive more citations than non-open access articles. Based on the value of the impact factor and ranking in quartiles, we also examined the influence of journal rank on the number of citations, separately for OA and non-OA articles, in the databases Web of Science (WOS), Scopus and Google Scholar. For the 2,026 studied articles, we found that 22 % were published as OA articles. They received 29 % of all citations in the observed period. 
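The h(l)-index described above combines the h-index with the lobby index. The abstract does not spell out the exact combination rule, so the sketch below only shows the two ingredients, which have standard definitions: h is the largest value such that h papers have at least h citations each, and the lobby index of a node is the largest l such that it has at least l neighbours of degree at least l.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    # Condition holds for a prefix of the descending list; count its length.
    return sum(1 for i, c in enumerate(cites, start=1) if c >= i)

def lobby_index(neighbor_degrees):
    """Largest l such that the node has at least l neighbours of degree >= l."""
    degs = sorted(neighbor_degrees, reverse=True)
    return sum(1 for i, d in enumerate(degs, start=1) if d >= i)
```

Both are the same "largest prefix" construction applied to different quantities, which is why the lobby index transfers naturally from network nodes to cited papers.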
We can conclude at the 5 % significance level or less that, in the databases WOS and Scopus, OA articles from top-ranked journals (first quartile) achieved more citations than non-OA articles. This could be confirmed only partly for journals from the second quartile, and could not be confirmed for journals ranked in the third quartile. This shows that open access is not a sufficient condition for citation, but increases the number of citations for articles published in journals with high impact. It is examined whether the relation that the number (J) of (joint) publications of a "main scientist" with her/his coauthors ranked according to rank (r) importance, i.e. J proportional to 1/r, as found by Ausloos (Scientometrics 95:895-909, 2013), still holds for subfields, i.e. when the "main scientist" has worked on different, sometimes overlapping, subfields. Two cases are studied. It is shown that the law holds for large subfields. As shown in an Appendix, it is also useful to combine small topics into large ones for better statistics. It is observed that the sub-cores are much smaller than the overall coauthor core measure. Nevertheless, the smallness of the core and sub-cores may imply further considerations for the evaluation of team research purposes and activities. This paper proposes a new taxonomy for the internationalization patterns of innovation of the BRIC countries within the global innovation landscape during the period 1990-2009. 
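The Ausloos-type law J ∝ 1/r mentioned above can be checked on a ranked list of joint-publication counts. The sketch below measures the mean relative deviation of observed counts from a 1/r curve scaled to the top coauthor; it is an illustrative diagnostic under that scaling assumption, not the fitting procedure used in the cited paper:

```python
def hyperbolic_fit_quality(joint_pubs):
    """Mean relative deviation of J(r) from the law J(r) = J(1)/r,
    where joint_pubs is sorted in decreasing order (rank 1 first).
    Returns 0.0 for an exact 1/r law; larger values mean a worse fit."""
    j1 = joint_pubs[0]
    devs = [abs(j - j1 / r) / (j1 / r)
            for r, j in enumerate(joint_pubs, start=1)]
    return sum(devs) / len(devs)
```

On data following the law exactly, e.g. 60, 30, 20, 15 joint papers for ranks 1 through 4, the deviation is zero; applying the same diagnostic per subfield mirrors the paper's question of whether the law survives the split into sub-cores.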
Based on the BRICs' patents granted by the USPTO, we find that (1) the BRICs gradually increased their roles in the global innovation arena with various degrees of internationalization; (2) the domestic-dominant pattern has widely countered the foreign dominance of innovation, while the collaborative multi-dominant pattern has increased; (3) a divergence of the BRICs' global innovation output growth emerged, while their internationalization pattern portfolios evolved towards greater similarity; and (4) China has differentiated itself by increasing its global innovation influence. Biotechnology is an expanding interdisciplinary field in which the interactions of science and technology (S&T) are increasingly intensified. Questions raised regarding the dynamic interactions between S&T encourage us to propose a series of methodologies for their examination. Using high-impact publications and patents as proxy measures, the two document sets are transformed into scientific and technical front trajectories respectively, and each subject is then categorized as basic science, applied technology, or co-existence. The results show that, in the biotechnology field, the subjects of embryonic or mesenchymal stem cells, RNA interference, microRNA, and microbial fuel cells are in the basic science phase; those of plant breeding, seed diversity, and taste receptors have been applied in practice. There also exist interactions between S&T in the subjects of disease treatment and gene analysis platforms, in which technology precedes science, science precedes technology, or synchronous development can be observed. Papers that have received 1,000 or more citations, referred to here as champion works, pertaining to China and India have been studied. China had its first champion work 4 years after India had its first in 1983. 
While India was ahead of China in the initial years, China increased its tally of champion works during 2001-2010 and raced ahead of India during that decade. All the champion works of both countries have been published in foreign journals, except for one Indian paper published in an Indian journal. Most champion works of India have been in physics, whereas for China they have been in the biological/biomedical sciences. The USA, Japan, Germany, England and France are some of the leading countries that India and China have collaborated with for their champion works. Leading institutions of both countries are also listed. We use a new approach to study the ranking of journals in JCR categories. The objectives of this study were to empirically evaluate the effect of increases in citations on the computation of the journal impact factor (JIF) for a large set of journals, as measured by changes in JIF, and to ascertain the influence of additional citations on the rank order of journals according to their new JIFs within JCR groups. To do so, modified JIFs were computed by adding citations to the number used by Thomson-Reuters to compute the JIF of journals listed in the JCR for 2008. We considered the effect on the rank order of a given journal of adding 1, 2, 3 or more citations to the number used to compute the JIF, keeping everything else equal (i.e., without changing the JIFs of the other journals in a given group). The effect of additional citations on the internal structure of rankings in JCR groups increased with the number of citations added. In about one third of JCR groups, about half the journals changed their rank order when 1-5 citations were added. However, in general the rank order tended to be relatively stable after small increases in citations. The rise of the social web and its uptake by scholars has led to the creation of altmetrics, which are social web metrics for academic publications. 
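The ceteris-paribus experiment in the JIF abstract above, adding a few citations to one journal while freezing the rest of its JCR group, can be sketched directly. The two-year JIF is citations received divided by citable items; the journal names and input layout below are illustrative assumptions:

```python
def jif(cites, citable_items):
    """Two-year journal impact factor: citations in year y to items
    published in years y-1 and y-2, divided by those citable items."""
    return cites / citable_items

def rank_after_extra_cites(journals, target, extra):
    """Re-rank one JCR group after adding `extra` citations to `target`
    only, keeping every other journal's JIF fixed (the study's design).
    `journals` maps name -> (citations, citable_items); returns 1-based rank."""
    scores = {}
    for name, (cites, items) in journals.items():
        bonus = extra if name == target else 0
        scores[name] = jif(cites + bonus, items)
    ranked = sorted(scores, key=lambda n: -scores[n])
    return ranked.index(target) + 1
```

Because a JIF denominator is often small, a handful of extra citations can move a journal several positions, which is exactly the rank-instability effect the study quantifies across JCR groups.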
These new metrics can, in theory, be used in an evaluative role, to give early estimates of the impact of publications or to give estimates of non-traditional types of impact. They can also be used as an information seeking aid: to help draw a digital library user's attention to papers that have attracted social web mentions. If altmetrics are to be trusted then they must be evaluated to see whether the claims made about them are reasonable. Drawing upon previous citation analysis debates and web citation analysis research, this article discusses altmetric evaluation strategies, including correlation tests, content analyses, interviews and pragmatic analyses. It recommends that a range of methods is needed for altmetric evaluations, that the methods should focus on identifying the relative strengths of influences on altmetric creation, and that such evaluations should be prioritised in a logical order. This paper is the first research applying a new approach, the panel smooth transition regression (PSTR) model, in the field of patent analysis. This study uses the PSTR model to verify whether there is a single threshold effect of the Herfindahl-Hirschman Index of patents (HHI of patents) on the relationship between patents and market value in the American pharmaceutical industry. The results demonstrate that the HHI of patents moderates the relationship between market value and patent performance, patent counts/assets and patent citations/assets. When the HHI of patents is less than or equal to the threshold value, 0.3220, the positive relationship between patent performance and market value is weaker. Once the HHI of patents exceeds the threshold value, 0.3220, the positive relationship between patent performance and market value is stronger. This study points out that the second regime is optimal because the extent of the positive relationship between patent performance and market value is higher there. In psychological research there is a huge literature on differences between the sexes. 
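The two-regime reading of the PSTR result above can be illustrated with a crude piecewise version: split observations at the reported HHI threshold of 0.3220 and fit the patents-to-market-value slope in each regime. A real PSTR uses a smooth transition function rather than a hard split, so this is only a sketch of the interpretation, with illustrative variable names:

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def regime_slopes(hhi, patents, value, threshold=0.3220):
    """Split firm-year observations at the HHI threshold and fit the
    patents -> market-value slope separately in each regime."""
    low = [(p, v) for h, p, v in zip(hhi, patents, value) if h <= threshold]
    high = [(p, v) for h, p, v in zip(hhi, patents, value) if h > threshold]
    return slope(*zip(*low)), slope(*zip(*high))
```

Under the study's finding, the second (high-HHI) slope would come out larger than the first, i.e. patent concentration strengthens the payoff of patent performance.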
Typically it used to be thought that women were more verbally and men more spatially oriented. These differences now seem to be waning. In this article we present three studies on sex differences in the use of tables and graphs in academic articles. These studies are based on data mining from approximately 2,000 articles published in over 200 peer-reviewed journals in the sciences and social sciences. In Study 1 we found that, in the sciences, men used 26 % more graphs and figures than women, but that there were no significant differences between them in their use of tables. In Study 2 we found no significant differences between men and women in their use of graphs and figures or tables in social science articles. In Study 3 we found no significant differences between men and women in their use of what we termed 'data' and 'text' tables in social science articles. It is possible that these findings indicate that academic writing is now becoming a genre that is equally undertaken by men and women. In this paper we focus on proximity as one of the main determinants of international collaboration in pharmaceutical research. We use various count data specifications of the gravity model to estimate the intensity of collaboration between pairs of countries as explained by the geographical, cognitive, institutional, social, and cultural dimensions of proximity. Our results suggest that geographical distance has a significant negative relation to the collaboration intensity between countries. The amount of previous collaboration, as a proxy for social proximity, is positively related to the number of cross-country collaborations. We do not find robust significant associations between cognitive proximity or institutional proximity and the intensity of international research collaboration. Our findings for cultural proximity do not allow unambiguous conclusions concerning its influence on the collaboration intensity between countries. 
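A count-data gravity specification of the kind described above models the expected number of collaborations between a country pair as a log-linear function of proximity variables. The sketch below only evaluates such a specification for assumed coefficients; the covariates chosen (distance and past collaborations as the social-proximity proxy) and the coefficient values are illustrative, not estimates from the paper:

```python
import math

def expected_collaborations(dist_km, past_collabs, beta):
    """Expected pairwise collaboration count under a log-linear
    (Poisson-style) gravity specification:
    E[C] = exp(b0 + b1*ln(distance) + b2*ln(1 + past_collaborations))."""
    b0, b1, b2 = beta
    return math.exp(b0 + b1 * math.log(dist_km) + b2 * math.log(1 + past_collabs))
```

With a negative distance coefficient and a positive past-collaboration coefficient, the model reproduces the paper's qualitative findings: farther country pairs collaborate less, and pairs with a collaboration history collaborate more.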
Aquatic ecosystems are ecologically important, but continuously threatened by a growing number of human-induced changes. This study evaluates the research trends of "aquatic ecosystem" between 1992 and 2011 in journals of all subject categories of the Science Citation Index and Social Sciences Citation Index. The analyzed parameters include publication output, cited publications, document type, language, distributions of journals, authors, countries and institutes, and analysis of author keywords and KeyWords Plus. The results showed that over the past two decades there was consistent growth in publication output, with the involvement of an increasing number of countries and institutions, and North America was still the leading region in the subject. Classification of the top 30 author keywords indicated that more research attention was paid to the study of aquatic organisms, the water environment and aquatic ecosystem condition. Aquatic ecosystem, water quality, and fish were the top three most frequently used author keywords. In addition, owing to its significant impact on aquatic ecosystems, climate change has received crucial emphasis recently. The aquatic ecosystem research trend was shifting from the water environment to ecosystem-wide issues. The paper studied 211,946 articles indexed in Thomson Reuters's Web of Science from January 1st 2002 to December 31st 2011, in order to describe the growth and distribution of Chinese international research collaboration (IRC) from the perspective of amount, authors, countries, discipline fields and journals. By applying bibliometric and social network methods, this study provides the collaboration network of countries and fields. 
The main results were as follows: the number of articles increased faster than the stable growth in the average annual IRC degree; articles collaborated with SAC are 80 % more than all IRC's; as to the fields, collaboration in social science is at a disadvantage, while the largest field is physics and the fastest-growing field is molecular biology and genetics; mathematics, physics, multidisciplinary and space science had more influence than others in their corresponding journals; as to the network, the USA, as the largest and most important partner, had 30 % of IRC articles and collaborated with China in all 22 ESI fields. This paper draws on the findings of previous research to present the UNIWEEES tool, designed to evaluate the quality of university websites that provide information about the European Higher Education Area (EHEA), already a reality, and the way they disseminate this information. This tool includes seven criteria (visibility, authority, updatedness, accessibility, dissemination of information, quality assessment, and navigability), further divided into 29 subcriteria that include 60 indicators. A peer-to-peer expert unified evaluation methodology was followed. Findings are presented here, focusing on the strengths and weaknesses of the information provided about the EHEA by the websites of Spanish universities and their dissemination strategies, in particular through their evolution over the last 5 years. Conclusions highlight a number of best practices identified and provide some guidelines to improve the evaluated aspects and dimensions, thus strengthening the role played by university websites as quality information sources for the scholarly community and society. This paper analyzes more than 30 years of rankings of the best 40 Dutch economists, and examines whether performance in terms of weighted publications increased. 
One of the findings is that over time the differences between top performers and those lower on the charts decrease, but also that the group of top performers is small and persistent over the years. Further, the average scores of ranked economists also increase over time. At the same time, new entries usually decrease in the subsequent years. Finally, after 20 years the charts contain 95 % new names and, in general, inclusion in the rankings usually lasts only about 5 years. A scientometric analysis was applied in this work to evaluate the status and trends of electric vehicle papers published between 1993 and 2012 in any journal of all the subject categories of the Web of Science. Electric vehicle was used as a keyword to search parts of titles, abstracts, or keywords. Publication trends were analyzed from the retrieved results in terms of publication outputs, subject categories and publication pattern, and international productivity. The document co-citation analysis was done in CiteSpace II to find out the intellectual base and research fronts of electric vehicles. The articles about electric vehicles increased fast in the last 20 years. 11 document types were found in all electric vehicle-related papers, and proceedings paper was the most frequently used document type. Language analysis showed that English was the most dominant language. "Engineering electrical electronic", "Energy fuels" and "Transportation science technology" were the top three most popular subject categories. Journal of Power Sources, IEEE Transactions on Vehicular Technology and IEEE Transactions on Industrial Electronics were the representative journals in the field of electric vehicles. The USA, China and Japan were the most productive countries. University of Michigan, Harbin Institute of Technology and Ohio State University were the most productive institutions. 
Vehicle-to-grid technology, control strategy, combination of power management and traffic information from GPS, plug-in electric vehicles, architectures and modeling, batteries and policy about electric vehicles are the research fronts of electric vehicle research. Are institutional repositories mere warehouses for digital documents, or are they in fact establishing themselves as a rigorous option for the spread of scientific knowledge? This study analyses the competitive environment of the Top 100 university repositories, defined as leaders in terms of market participation and penetration. The study also analyses the basic functionalities of preservation and diffusion of academic production through factors related to the prestige of the repositories and of the institutions that operate them. The results show that repositories with a larger digital academic supply are associated with the production of demonstrably rigorous science. This paper concerns the development and use of a new interdisciplinary graphical approach in the statistical analysis of the complexity of sentence structure for scientometric purposes. A scheme in three-dimensional space (barycentric plot) is used for a graphical representation of correlations in scientific research text between the number of characters, the number of words, and the number of complex-syllable words for sentences of several monolingual corpuses. The barycentric plots not only drastically increase the visual information content of a given corpus, but at equal conditions of text-based corpus, they also contribute to the comparative analysis of different kinds of subject, section, author style, journal, field, etc. As illustrated in the present study, the proposed graphical approach can have broad implications and practical applications not only in the scientometric field, but also in statistical linguistics, stylistic text research, and informetric research. 
This article explores interdisciplinary research and applications across different areas of knowledge. Articles published between January 1, 2006 and December 31, 2010 in 42 forestry journals (N = 16,258) were collected and, depending on their content and keywords, classified into one of 22 sub-disciplines. Among the forestry sub-disciplines, the following are currently dominant: Mensuration and inventories, Forest management, Plant ecophysiology and Wood science. PCA ordination was used to visualize grouping tendencies and data separation. For each component, a number of characteristics contributed to the total variation, and significant importance was attached to those with the highest loading factors. The first component included Mensuration and inventories, Plant ecophysiology, Vegetation ecology and Forest management as the highest loading factors. The second component comprised Sociological aspects, Plant ecophysiology, Wood science and Forest management. The most pronounced increasing trend over the five-year period is noted for Genetics and breeding, Vegetation ecology, and Fuels and energy, while the most pronounced decreasing trend is visible in Forest health, Forest fire, Sociological aspects and Forest products. PCA suggests the existence of three groups of journals: the first group comprises Forest Ecology and Management and Canadian Journal of Forest Research, the two dominating journals; the second group comprises Annals of Forest Science, Plant Ecology, Tree Physiology and Trees-Structure and Function; the rest of the journals belong to the third group. The Canadian Journal of Forest Research is the most diversified, while Tree Genetics and Genomes, Silvae Genetica and Tree-Ring Research are narrowly specialized. The Academic Ranking of World Universities (ARWU) published by researchers at Shanghai Jiao Tong University has become a major source of information for university administrators, country officials, students and the public at large. 
Recent discoveries regarding its internal dynamics allow the inversion of published ARWU indicator scores to reconstruct the raw scores of 500 world-class universities. This paper explores raw scores in the ARWU and in other contests to contrast the dynamics of rank-driven and score-driven tables, and to explain why the ARWU ranking is a score-driven procedure. We show that the ARWU indicators constitute sub-scales of a single factor accounting for research performance, and provide an account of the system of gains and non-linearities used by the ARWU. The paper discusses the non-linearities selected by the ARWU, concluding that they are designed to represent the regressive character of indicators measuring research performance. We propose that the utility and usability of the ARWU could be greatly improved by removing the unwanted dynamical effects of the annual re-scaling based on the raw scores of the best performers. Quality in Higher Education Institutions is the subject of several debates in the academic community worldwide, and various efforts are made towards identifying ways to quantify it. In this respect, the use of bibliometrics is gaining significant ground as an effective tool for the evaluation of universities' research output. In the present study, the research performance of the seven Greek medical schools is assessed by means of widely accepted and advanced bibliometric indices, such as total and average publications and citations, and average and median h- and g-index with and without self-citations, for all 1,803 academics, while statistical analysis of the data was also performed in order to compare the observed differences in the mean values of the calculated indices. Considerable effort was exerted to overcome all inherent limitations of a bibliometric analysis through meticulous data collection. 
This large-scale work was conducted at both the school and academic rank level, leading to interesting results concerning the scientific activity of the medical schools studied as units and of the various academic ranks separately, which can be partially explained by geographic and socioeconomic criteria. In general, the bibliometrics demonstrate a statistically significant difference in favour of the Crete University medical school, while it was also found that self-citations have only a marginal effect on an individual's research profile and on the average indices. Finally, the useful findings of the present study render the methodology adopted highly viable for assessing the research performance of Higher Education Institutions even in a broader context. S-curve analysis allows one to study evolution and trends in specific technological fields; its theoretical background establishes that, in order to achieve the best results, the analysis must be done using an independent variable that shows the effort invested in R&D activities and a dependent variable that shows the cumulative performance in that field. In practice, S-curves are built using time as the independent variable because of the constraints associated with the search for investment data. This paper examines the use of patent applications as a measure of effort; using the geothermal field as a case study, it was possible to test, first, the relationship between patent applications and investment (R-squared, 0.86), and second, the construction of S-curves using patent application counts against performance (R-squared, 0.947). Results show a high correspondence value and the potential of using patent counts for technological performance studies. It is shown that the "Jaccardized Czekanowski index" is actually a reinterpretation of the Ruzicka index. 
Thereby, it is proved that its one-complement is a true distance function, which makes it particularly suitable for use in similarity studies, even with multidimensional statistical techniques. Science has become progressively more complex, requiring greater integration and collaboration between individuals, institutions and areas. Networking research establishes common rules and offers a suitable framework for this cooperation. Therefore, it is a good choice for both scientists and policy-makers. The objective of this study is to determine whether scientists perform better within these structures than outside them. As an example, we analysed the Biomedical Research Networking Centres in Spain and, for the exploratory investigation, we selected two disciplines (Psychiatry and Gastroenterology/Hepatology). The results showed that in every situation of networking research there were higher collaboration and impact rates. Furthermore, the main differences found between disciplines were related to the scope of cooperation, carried out at a more local level in Gastroenterology/Hepatology. Moreover, the HJ-Biplot technique allowed us to conclude that the outcomes may vary somewhat depending on the types of centres where the scientists work. Although further investigation is needed, the findings of this study might anticipate possible scenarios in which networking research could be the most natural way of collaboration. This study examines China's performance in tissue engineering using scientometric measures such as China's global publication share, rank, growth rate and citation impact, its publications in various sub-fields, and top journals in terms of national share, based on the last 5 years' (2008-2012) publication data obtained from the ISI Science Citation Index Expanded database. We have also determined the Chinese share of internationally collaborative papers at the national level, as well as h-core papers and highly cited papers, etc. 
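The Ruzicka index discussed above has a simple closed form: the component-wise sum of minima over the sum of maxima (a weighted Jaccard similarity), whose one-complement is the distance in question. A minimal sketch, assuming non-negative abundance vectors of equal length:

```python
def ruzicka_similarity(x, y):
    """Ruzicka (weighted Jaccard) similarity: sum(min)/sum(max) over
    components of two non-negative vectors of equal length."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den

def ruzicka_distance(x, y):
    """One-complement of the Ruzicka similarity; per the abstract, a
    true distance function (satisfying the triangle inequality)."""
    return 1.0 - ruzicka_similarity(x, y)
```

On binary (0/1) vectors the expression reduces to the ordinary Jaccard index, which is what makes the "Jaccardized Czekanowski" reading and the Ruzicka reading coincide.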
Taking articles written by mainland China scholars in 258 management-related journals indexed by the Web of Science database as the data set, this paper analyses the scientific research output of Chinese scholars. It studies the structure, characteristics and development trend of the collaboration network of Chinese scholars in the management research area through scientometric and social network analysis approaches. We found that the accumulated number of Chinese authors and the accumulated number of articles published by Chinese authors in the 258 journals increase exponentially, most of which focus on Operations Research & Management Science. About half of the articles arise through international collaboration, and the accumulated number of articles written through collaboration between Chinese and overseas scholars displays an exponential increase. The evolution studies of the collaboration network indicate that the collaboration of Chinese scholars in the field of management is on a sharp rise. However, the collaboration network has not yet stepped into a mature and steady stage. Nonetheless, a tendency towards the stable stage is unveiled. This study investigates the contribution of Iranian women in high-priority fields of science and technology based on their scientific production and citations according to the records of the Web of Science (WoS) during 2000-2010. The methodology relies on scientometric techniques. The statistical population of this study was composed of 7,138 records extracted from WoS, in 2,275 of which women had contributed. The gender data of Iranian authors was obtained via the WoS Excel output, author profiles in Scopus, browsing the homepages of authors' affiliated organizations, searching the internet and sending emails to the corresponding authors of papers. The descriptive results show that women in the basic and applied sciences cooperate more than in technology, and that most scientific products have been produced in the environmental field. 
Results show that 99 % of Iranian women's research is done as joint publications, and the average number of participants is three, four and two respectively. Most of the international cooperation is done with USA scientists, and the main Iranian participating organization is Tehran University. The results indicated that there is a significant difference in the scientific productivity of Iranian women across the eight high-priority fields of science and technology, but no significant difference between pure and applied fields of science. Also, there is a positive, direct and significant relationship between the number of authors and the citation score of women's scientific products in high-priority fields of science and technology. A bibliometric analysis was conducted to evaluate the global scientific output of proteomics research in the Science Citation Index Expanded from 1995 to 2010. The document types, languages, journals, categories, countries, and institutions were analyzed to obtain publication patterns. Research focuses and trends were revealed by a word cluster method applied to author keywords, titles, abstracts, and KeyWords Plus. Bradford's Law and the correlation between keywords and institutions were examined to look deeper into the nature of the works. Proteomics and the Journal of Proteome Research published the most articles in proteomics research. The researchers focused on the categories of biochemical research methods, and biochemistry and molecular biology. The USA and Harvard University were the most productive country and institution, respectively, while China was the fastest-growing country thanks to support from the Chinese government. The distribution of author keywords provided important clues to hot issues. Results showed that mass spectrometry and two-dimensional gel electrophoresis had been the most frequently used research methods in the past 16 years, and that cancer proteomics had strong potential for the near future.
Furthermore, biologists contributed significantly to proteomics research, and were more likely to co-operate with medical scientists. Web of Science (WoS) and Scopus have often been compared with regard to user interface, countries, institutions, author sets, etc., but rarely with a more systematic assessment of major research fields and national production. The aim of this study was to appraise the differences among major research fields in Scopus and WoS based on a standardized classification of fields, assessed for the case of an entire country (Slovenia). We analyzed all documents and citations received by authors who were actively engaged in research in Slovenia between 1996 and 2011 (50,000 unique documents by 10,000 researchers). Documents were tracked and linked to Scopus and WoS using complex algorithms in the Slovenian COBISS bibliographic system and the SICRIS research system, where the subject areas or research fields of all documents are harmonized by the Frascati/OECD classification, thus offsetting some major differences between WoS and Scopus in database-specific subject schemes as well as limitations of deriving data directly from the databases. Scopus leads over WoS in indexed documents as well as citations in all research fields. This is especially evident in the social sciences, humanities, and engineering & technology. The fewest citations per document were received in the humanities, and the most in the medical and natural sciences, which exhibit similar counts. Engineering & technology reveals only half the citations per document compared to the previous two fields; agriculture falls in the middle. The established differences between databases and research fields provide the Slovenian research funding agency with additional criteria for a more balanced evaluation of research. We study the evolution of scientific collaboration at Atapuerca's archaeological complex during its emergence as a large-scale research infrastructure (LSRI).
Using bibliometric and fieldwork data, we build and analyze co-authorship networks corresponding to the period 1992-2011. The analysis of such structures reveals a stable core of scholars with long experience in Atapuerca's fieldwork, which would control coauthorship-related information flows, and a tree-like periphery mostly populated by 'external' researchers. Interestingly, this scenario corresponds to the idea of an Equipo de Investigación de Atapuerca, originally envisioned by Atapuerca's first director 30 years ago. These results have important systemic implications, both in terms of the resilience of co-authorship structures and of 'oriented' or 'guided' self-organized network growth. Taking into account the scientific relevance of LSRIs, we expect a growing number of quantitative studies addressing collaboration among scholars in this sort of facility in general and, particularly, emergent phenomena like the Atapuerca case. Research collaboration is seen as the way forward to improve the quality and impact of research findings. International research collaboration has resulted in international co-authorship in scientific communications and publications. This study highlights the collaborative research and authorship trend in clinical medicine in Malaysia from 2001 to 2010. Records with Malaysian-based author affiliations in the Web of Science (Science Citation Index Expanded), covering clinical medicine journals (n = 999) and articles (n = 3951), were downloaded as of 30th Oct 2011. The document types analyzed were articles and reviews, and impact factors (IF) from the 2010 Journal Citation Report Science Edition were used to assess the quality of the articles. The number of publications in clinical medicine increased from 4.5 % (n = 178) in 2001 to 23.9 % (n = 944) in 2010. The top three contributors among the subject categories are Pharmacology and Pharmacy (13.9 %), General and Internal Medicine (13.6 %) and Tropical Medicine (7.3 %).
By journal tier system, the distribution was: Tier 1 (18.7 %, n = 738), Tier 2 (22.5 %, n = 888), Tier 3 (29.6 %, n = 1170), Tier 4 (27.2 %, n = 1074), and journals without IF (2.1 %, n = 81). The University of Malaya was the most productive institution. Local collaborations accounted for 60.3 % and international collaborations for 39.7 %. Articles with international collaboration appeared in journals with higher IFs than those without, and were also cited significantly more. Citations, impact factor and journal tier were significantly associated with international collaboration in Malaysia's clinical medicine publications. Malaysia has achieved a significant number of ISI publications in clinical medicine through participation in international collaboration. A new quantitative method is introduced to analyze the collaboration among different organizations. The method defines a collaboration score based on the number of people involved in a collaboration; the collaboration strength is then obtained by summing these scores. We choose the "Project 985" universities, which represent the top universities in China, as an example to study the collaboration network, strength in leading collaborations and strength in participating in collaborations. Results based on Scopus show some characteristics of such collaboration and verify the feasibility of the new approach. Web of Science (WoS) and Google Scholar (GS) are prominent citation services with distinct indexing mechanisms. Comprehensive knowledge about the growth patterns of these two citation services is lacking.
We analyzed the development of citation counts in WoS and GS for two classic articles and 56 articles from diverse research fields, making a distinction between retroactive growth (i.e., the relative difference between citation counts up to mid-2005 measured in mid-2005 and citation counts up to mid-2005 measured in April 2013) and actual growth (i.e., the relative difference between citation counts up to mid-2005 measured in April 2013 and citation counts up to April 2013 measured in April 2013). One of the classic articles was used for a citation-by-citation analysis. Results showed that GS has grown substantially in a retroactive manner (median of 170 % across articles), especially for articles that initially had low citation counts in GS as compared to WoS. Retroactive growth of WoS was small, with a median of 2 % across articles. Actual growth percentages were moderately higher for GS than for WoS (medians of 54 vs. 41 %). The citation-by-citation analysis showed that the percentage of citations unique to WoS was lower for more recent citations (6.8 % for citations from 1995 and later vs. 41 % for citations from before 1995), whereas the opposite was noted for GS (57 vs. 33 %). It is concluded that, since its inception, GS has shown substantial expansion, and that the majority of recent works indexed in WoS are now also retrievable via GS. A discussion is provided on quantity versus quality of citations, threats to WoS, weaknesses of GS, and implications for literature research and research evaluation. From the application point of view, this article introduces the framework of the Chinese Social Science Citation Index (CSSCI). It expounds the design rationale of the CSSCI system, and its major functions and features in particular. The data organization and data encoding methods of the CSSCI system are also explained.
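The retroactive and actual growth measures used in the WoS/GS comparison above are simple relative differences between citation counts. A minimal sketch with hypothetical counts for one article (not data from the study):

```python
def relative_growth(earlier, later):
    """Relative difference between two citation counts, in percent."""
    return 100.0 * (later - earlier) / earlier

# Hypothetical counts for one article (illustrative values only):
upto_2005_at_2005 = 50    # citations up to mid-2005, measured in mid-2005
upto_2005_at_2013 = 135   # citations up to mid-2005, measured in April 2013
upto_2013_at_2013 = 208   # citations up to April 2013, measured in April 2013

retroactive = relative_growth(upto_2005_at_2005, upto_2005_at_2013)
actual = relative_growth(upto_2005_at_2013, upto_2013_at_2013)
print(round(retroactive), round(actual))  # 170 54
```

Retroactive growth isolates how much a database's index of *old* citations has expanded over time, while actual growth measures the accumulation of new citations.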
Moreover, this article elaborates on how citation index data can be used to analyze discipline features, explore research hotspots and development trends, identify important academic works, and construct academic networks. These efforts should help readers better understand the application value of a citation index system, and provide the academic community with a new understanding of citation index systems. We report on the development of an interface to the US Patent and Trademark Office (USPTO) that allows for the mapping of patent portfolios as overlays to basemaps constructed from citation relations among all patents contained in this database during the period 1976-2011. Both the interface and the data are in the public domain; the freeware programs VOSviewer and/or Pajek can be used for the visualization. These basemaps and overlays can be generated at both the 3-digit and 4-digit levels of the International Patent Classification (IPC) of the World Intellectual Property Organization (WIPO). The basemaps can provide a stable mental framework for analysts to follow developments over searches for different years, which can be animated. The full flexibility of the advanced search engines of the USPTO is available for generating sets of patents and/or patent applications, which can thus be visualized and compared. This instrument allows one to address questions about technological distance and diversity in portfolios, and to animate the development of both technologies and the technological capacities of organizations over time. Getting cited is important for scholars and for the institutions in which they work. Whether because of the influence on scientific progress or because of the reputation of scholars and their institutions, understanding why some articles are cited more often than others can help scholars write more highly cited articles.
This article builds upon earlier literature that identifies seemingly superficial factors influencing the citation rate of articles. Three Journal Citation Report subject categories are analyzed to identify these effects. From a set of 2,016 articles in Sociology, 6,957 articles in General & Internal Medicine, and 23,676 articles in Applied Physics, metadata from the Web of Knowledge was downloaded in addition to PDFs of the full articles. The number of words in the title, the number of pages, the number of references, the number of sentences in the abstract and in the paper, the number of authors, and readability were identified as factors for analysis. Scientific references in patent documents can be used as indicators signaling science-technology interactions. Whether they reflect a direct 'knowledge flow' from science to technology is a subject of debate. Based on 33 interviews with inventors at Belgian firms and knowledge-generating institutes active in nanotechnology, biotechnology and the life sciences, we analyze the extent to which scientific references in patents reflect sources of inspiration. Our results indicate that scientific knowledge acts as a source of inspiration for about 50 % of the inventions. At the same time, the scientific references cited in patent documents and available in patent databases do not provide an accurate picture in this respect: 30 % of patents that were inspired by scientific knowledge do not contain any scientific references. Moreover, when scientific references are present, half of them are evaluated as unimportant or as background information by the inventor. Overall, these observations provide evidence that scientific references in patent documents signal relatedness to the implied inventions without necessarily implying a direct, inspirational knowledge flow between the two activity realms. Eponyms are known to praise leading scientists for their contributions to science.
Some are so widespread that they are even known by laypeople (e.g., Alzheimer's disease, Darwinism). However, there is no systematic way to discover the distribution of eponyms in scientific domains. Prior work has tackled this issue but has failed to address it completely. Early attempts involved the manual labelling of all eponyms found in a few textbooks of given domains, such as chemistry. Others relied on search engines to probe bibliographic records for a single eponym at a time, such as the Nash Equilibrium. Nonetheless, we failed to find any attempt at eponym quantification in a large volume of full-text publications. This article introduces a semi-automatic text mining approach to extracting eponyms and quantifying their use in such datasets. Candidate eponyms are matched programmatically by regular expressions, and then validated manually. As a case study, the processing of 821 recent Scientometrics articles reveals a mixture of established and emerging eponyms. The results stress the value of text mining for the rapid extraction and quantification of eponyms, which may have substantial implications for research evaluation. The Author Affiliation Index (AAI) for ranking a set of academic journals was first presented by Gorman and Kanet (Manuf Serv Oper Manag 7:3-19, 2005). Since that time, it has become a popular method for assessing journal quality in a myriad of academic disciplines. However, a recent paper by Agrawal et al. (Prod Oper Manag 20:280-300, 2011) pointed out several potential problems with the AAI. In this paper, we present a modified AAI that incorporates several improvements to the original AAI and addresses the three concerns expressed by Agrawal et al.
The modified AAI allows for international institutions, introduces a weighting factor to allow for a greatly expanded set of prestigious institutions, considers the entire population of articles published in a journal during a specified time period, and utilizes a batch-means approach to data collection to allow for proper statistical inference. We illustrate the modified AAI using a set of ten well-known journals that publish Operations Research and Operations Management research. The primary intent of this paper, however, is not to rank these ten journals; rather, they are simply used to illustrate the newly developed modified AAI. Using an exhaustive database on academic publications in mathematics all over the world, we study the patterns of productivity of mathematicians over the period 1984-2006. We uncover some surprising facts, such as the weakness of age-related decline in productivity and the relative symmetry of international movements, rejecting the presumption of a massive "brain drain" towards the US. We also analyze the determinants of success at top US departments. In conformity with recent studies in other fields, we find that selection effects are much stronger than local interaction effects: the best departments are most successful in hiring the most promising mathematicians, but not necessarily at stimulating positive externalities among them. Finally, we analyze the impact of career choices by mathematicians: mobility almost always pays, but early specialization does not. Using the possible synergy among the geographic, size, and technological distributions of firms in the Orbis database, we find the greatest reduction of uncertainty at the level of the 31 provinces of China, and an additional 18.0 % at the national level.
Some of the coastal provinces stand out as expected, but the metropolitan areas of Beijing and Shanghai are (with Tianjin and Chongqing) most pronounced at the next-lower administrative level of the (339) prefectures, since these four "municipalities" are administratively defined at both levels. Focusing on high- and medium-tech manufacturing, a shift toward Beijing, Shanghai, and Tianjin (near Beijing) is indicated, but the synergy is on average not enhanced. High- and medium-tech manufacturing is less embedded in China than in Western Europe. Knowledge-intensive services "uncouple" the knowledge base from the regional economies, most markedly in Chongqing and Beijing. Unfortunately, the Orbis data is incomplete, since it was collected for commercial rather than administrative or governmental purposes. However, we provide a methodology that can be used by others who may have access to higher-quality statistical data for the measurement. Research fronts represent cutting-edge studies in specific fields. Keeping up with trends in research fronts allows one to better understand current and future developments in the relevant field. This study uses bibliographic coupling and a sliding window to explore the organic light-emitting diode (OLED) research fronts from 2000 to 2009, and identifies eighteen research fronts that match those predicted by subject experts in OLED materials. Closer observation of the evolution shows that among the eighteen research fronts there are four emerging fronts, two growing fronts, eleven stable fronts, and one shrinking front. Bibliographic coupling with a sliding window is an effective tool to track the generation, growth, decline, and disappearance of research fronts; this analytical method therefore has great potential for discovering the evolution of research fronts. Nowadays, the development of emerging technology has become a double-edged sword in the scientific world.
It can bring a great deal of innovation to society, but may also cause serious consequences due to its unknown factors. International collaboration may be able to reduce these risks, which matters greatly for the exploration of emerging technologies. Taking dye-sensitized solar cells (DSSCs) as an example, this paper examines the rapid growth of Chinese DSSCs research and the rise of collaboration between China and other countries/regions. We use bibliometric and social network analysis methods to explore the patterns of scientific collaboration at the country, institution and individual levels using data from the Science Citation Index. Examining overall trends shows that China has strengthened its position in DSSCs research around the world. Furthermore, by focusing on the individual level, we find that the most influential authors tend to have fixed co-author networks and author name order, which is worth considering. We use independently developed co-author analysis software to examine three kinds of fixed co-author networks, exploring author contributions, influence, and Author Activity Index rank in collaboration networks, and use the calculated rank to further explain author contributions in the networks. Results show that Chinese-X (e.g., Chinese-American) authors have driven country-to-country collaboration, and that almost every kind of small network has a top author who gathers the others together. The modified Author Activity Index rank list may reflect actual research standing. Author collaboration patterns are affected to some degree by the types of institutions involved. These results can promote international collaboration and the innovation process in similar emerging technology fields. I examine the degree of specialization in various sub-fields of philosophy, drawing on data from the PhilPapers Survey.
The following three sub-fields are highly specialized: ancient philosophy, seventeenth/eighteenth-century philosophy, and philosophy of physics. The following sub-fields have a low level of specialization: metaphilosophy, philosophy of religion, philosophy of probability, philosophy of the social sciences, decision theory, and philosophy of race and gender. Highly specialized sub-fields tend to require extensive knowledge in some area beyond the typical training of a philosopher, and outside of philosophy proper. In addition, there is a correlation between sub-field size and degree of specialization: larger sub-fields tend to be more specialized. An analysis of 9,957 papers published by Indian scientists and indexed by WoS in 12 sub-disciplines of the life sciences during 2008-2009 indicates that academic institutions produced the highest number of papers. Of these, 340 (3.4 %) were contributed exclusively by female scientists and 4,671 (47 %) were written jointly by male and female scientists. Women scientists produced about 0.36 papers per author, while their male counterparts produced 0.50 papers per author. A significant number of women scientists were first authors, and about 23 % were corresponding authors, in papers written jointly by both sexes. Women scientists emphasized the sub-disciplines of cell biology and reproductive biology, while male scientists emphasized zoology. Women scientists work in small teams and have very few international collaborative papers. They publish in low-impact-factor and domestic journals, and are also cited less than their male counterparts. Although there is increasing interest in policy issues on university patents, studies hitherto have focused on certain limited factors or on case studies.
By using a two-mode network analysis, this study identifies idiosyncratic patterns and differences in technology-industry networks between two groups of Korean university patents: commercialized and non-commercialized. We collected patent data, including bibliographic information, from Korean universities that have run a patent management advisor dispatch program since 2005. Network analysis and analysis of variance for the two groups were then conducted to investigate the group differences. We found that the structure of the technology-industry network was significantly more direct and simpler for commercialized than for non-commercialized patents. Specifically, we found that both direct and indirect linkages between technology and related industry were more complex for the non-commercialized group than for the commercialized one, while the direct linkage was stronger for the commercialized than for the non-commercialized group. Our study highlights an important aspect of technology commercialization from the perspective of the inherent characteristics of patents, which is at variance with the evolutionary approaches of previous studies. In today's competitive business environment, the timely identification of potential technology opportunities is becoming increasingly important for the strategic management of technology and innovation. Existing studies in the field of technology opportunity discovery (TOD) focus exclusively on patent textual information. In this article, we introduce a new method that tackles TOD via technology convergence, using both patent textual data and patent citation networks. We identify technology groups with high convergence potential by measuring connectivity between clusters of patents. From such technology groups we select pairs of core patents based on their technological relatedness, their past involvement in convergence, and the impact of their potential new convergence.
We finally carry out TOD by extracting representative keywords from the text of the selected patent pairs and organizing them into a basic description of a new invention that the potential convergence of the patent pair might produce. We illustrate the proposed method using a data set of U.S. patents in the field of digital information and security. Many scientists were respected by the public, and science developed greatly in the twentieth century. What role do scientists play in the process of scientific development? Does scientific development turn more researchers into scientists? This paper analyzes these two questions and suggests that: (a) not all researchers' output contributes to scientific knowledge; only innovative output promotes scientific development. On average, scientists play a more significant role in scientific development than other researchers, because scientists' innovative consciousness is far higher than that of non-scientists. (b) Scientists can be distinguished from researchers according to a fixed basic contribution of innovative output to scientific development. In the initial stage of scientific development, researchers' innovative work becomes easier as accumulated scientific knowledge grows gradually; thus, scientific development produces more and more scientists. On the contrary, once science develops to a certain stage, researchers' innovative work becomes more challenging as accumulated scientific knowledge keeps increasing; as a result, scientific development makes it increasingly difficult for researchers to become scientists. The goal of this study was to identify common mistakes made in research study manuscripts submitted to journals of Education and the effects of these mistakes on rejection by the journal editors and referees. An online questionnaire was developed for this purpose with 43 items and five open-ended questions.
Common mistakes were identified by administering the 43 questions, which were answered in two stages: first using 5-point Likert scale responses, and then using responses arranged according to a semantic differential scale (for the effects of the mistakes on rejections). The online questionnaire was sent to the editors and referees of Turkish journals of Education indexed in SSCI and ULAKBIM. Data were then collected from 232 participants and examined. The quantitative data obtained from the questionnaire items were analyzed, and the mean and standard deviation scores were presented in tables. The qualitative data gathered from the open-ended questions were analyzed descriptively. The results show that researchers mostly make mistakes in the discussion, conclusion, and suggestions parts of their manuscripts. However, mistakes made in the methods part are the most significant causes of manuscript rejection. The number of references per paper, perhaps the best single index of a journal's scholarliness, has been studied in different disciplines and periods. In this paper we present a four-decade study of eight engineering journals. A data set of over 70,000 references was generated after automatic data gathering and manual inspection for errors. Results show a significant increase in the number of references per paper: the average rises from 8 in 1972 to 25 in 2013. This growth accelerates around the year 2000, consistent with the much easier access to search engines and documents produced by the generalization of the Internet. In this study, a bibliometric analysis of cholinesterase inhibitors was used to identify trends in Alzheimer's disease (AD) research and to rank the drugs best tolerated or most effective in AD treatment. 4,982 articles and reviews from the Science Citation Index Expanded during 1993-2012 were analyzed. The main results were as follows. The publication of cholinesterase inhibitor research increased overall during 1993-2012.
The Chinese Academy of Sciences had the most publications, while the University of California, San Diego and the Hebrew University of Jerusalem took first place for the highest average citations per paper and the highest h-index, respectively. Neurosciences, pharmacology and chemistry were "rising" subject categories in cholinesterase inhibitor research. From a comprehensive analysis of the distribution and change of author keywords across two 10-year periods, the following can be concluded: (i) the ranking of drugs best tolerated or most effective in AD treatment might be donepezil, galantamine, rivastigmine, tacrine, memantine and huperzine A; memantine has attracted increasing interest recently and might now be used more frequently, especially for moderate to severe dementia. (ii) The oxidative stress hypothesis of pathogenesis attracted extensive attention. Interest in the beta-amyloid cascade hypothesis increased slightly, while interest in the cholinergic hypothesis decreased during the past decade. (iii) "Oxidative stress", "beta-amyloid", "neuroprotection", "memory" and "cognition" are the main future orientations of AD research. While there is a large body of research analyzing the overall structure of citation relations for patents, there has been very little research seeking to clarify the characteristics of fields related to the diffusion of technology by observing the citation network surrounding each patent individually and tracing its growth.
This study focused on the classifications assigned to patents and examined the diversity of the fields of patents citing each patent from the following two perspectives: (1) expected values for growth in the number of citing fields, when regarding the observation period as being in a synchronic state and assuming that the strength of connections between each patent and citing fields is constant; and (2) empirical values for growth in the number of citing fields according to the increase in the cumulative number of citations over time. The results confirmed that the strength of potential connections between each patent and citing fields changes over time. Especially in the fields of "chemistry; metallurgy" and "physics," the following change is considerable: a patent tends to receive citations repeatedly from a limited range of fields for a while, but later comes to be cited by various fields. Transaction costs theory (TCT) has long been an important conceptual lens for examining International Business (IB) phenomena, and is perhaps especially relevant for the study of multinational corporations, entry mode choices and location selection. In this paper we examine the extent to which TCT has been used in, and has impacted, IB research. Methodologically, we conduct a bibliometric study of the articles published in nine top journals for IB-related research. We use Jean-François Hennart's research as the key marker for TCT in IB research, given that Hennart's work has been a hallmark in the discipline. On a sample of 377 articles published between 1982 and 2010, and using the works rather than the authors as the unit of analysis, we analyze citations and co-citations, and provide a spatial visualization of the intellectual research themes addressed. Our analyses provide insights on the influence of Hennart, and more broadly of TCT, on IB research over the past three decades.
We conclude that TCT has a pervasive influence on a large array of IB research and that Hennart's work is boundary-spanning, connecting several research themes. In this study, we analyze the dynamic usage history of Nature publications over time using Nature metrics data. We conduct the analysis from two perspectives: on the one hand, we examine how long it takes before an article's downloads reach 50 %/80 % of the total; on the other hand, we compare the percentage of total downloads in the 7, 30, and 100 days after publication. In general, papers are downloaded most frequently within a short period right after their publication, and we find that, compared with non-Open Access papers, readers' attention to Open Access publications is more enduring. Based on the usage data of a newly published paper, regression analysis can predict the expected future total usage counts. To explore the rules of knowledge transfer and application activities in knowledge space, defined at both temporal and spatial scales, the present study employs a unique dataset of Chinese patent licensing during the period 2000-2012, with a total of 91,551 patents. Our results indicate that 70 % of patents were licensed out in the first 3 years. As time elapses, the annual average technology age decreases. There is a moderate difference among different types of licensors and patent types, but not among technology domains. With regard to the spatial dimension, 86 % of patents were licensed out within 1,000 km. The annual average geographical distance exhibits the same trend as technology age; except for technology domains, a moderate difference among licensors and patent types is observed. Moreover, the interaction between geographical distance and technology age shows that as technology age increases, the technology appears to be transferred and applied over greater distances. University rankings by fields are usually based on the research output of universities.
However, research managers and rankings consumers expect to see in such fields a reflection of the structure of their own organizational institution. In this study we address such misinterpretation by developing the research profiles of the organizational units of two Spanish universities: the University of Granada and Pompeu Fabra University. We use two classification systems: the subject categories offered by Thomson Scientific, which are commonly used in bibliometric studies, and the 37 disciplines displayed by the Spanish I-UGR Rankings, which are constructed from an aggregation of the former. We also describe in detail problems encountered when working with address data from a top-down approach, and we show differences between university structures derived from the interdisciplinary organizational forms of new managerialism at universities. We conclude by highlighting that rankings by fields should clearly state the methodology for the construction of such fields. We suggest that the construction of research profiles may be a good way for universities to discover levels of discrepancy between organizational units and subject fields. Citation classics identify those highly cited papers which are an important reference point in a research field. To identify a paper as a citation classic we have to fix a citation threshold value. Usually, this threshold value should not be the same for all research fields, because each field presents its own citation pattern. Studies of citation classics in the literature define particular criteria and methods to set citation thresholds, which are often set arbitrarily and designed ad hoc, and do not allow the scientific community to validate and compare their results. In this paper we introduce the concept of H-Classics to overcome this problem and provide the scientific community with a standardization of key constructs. We present a new and systematic method to identify citation classics. 
This method of identifying highly cited papers is based on the H-index; thanks to the properties of the H-index, it is sensitive to the particular characteristics of any research discipline and to its evolution. The concept of H-Classics therefore makes it possible to systematize the search for citation classics in any field of research. This review study is a first attempt to map the state of entrepreneurship research in China by focusing on the contributions of Chinese researchers. Leading contributors, research collaboration and theoretical underpinnings in both domestic-oriented and international-oriented research are discussed. The review comprises 508 articles published in domestic Chinese journals indexed by the Chinese Social Science Citation Index and 189 articles published in international journals indexed by the Social Science Citation Index between 2000 and 2011. Two bibliometric approaches, co-authorship analysis and co-citation analysis, were utilized. The results indicate that entrepreneurship research in China is characterized by a clear division, not only in terms of the researchers in each community and their collaboration networks, but also with regard to theoretical foundations. Domestic-oriented research is still in its infancy. This research community has attracted a majority of Chinese researchers, who focus on inter-institutional collaboration based on mentorship and directing relationships. Scholars involved in international-oriented research engage in more open communication, collaborating not only with researchers from other Chinese institutions but also with those from foreign countries. At the same time, they contribute to the understanding of Chinese entrepreneurship by linking the entrepreneurship phenomenon in the Chinese context to theoretical frameworks. This study aimed to assess the association between certain features of article titles and the number of citations in a volume of the journal Addictive Behaviors. 
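The H-Classics rule described in the citation-classics study above derives the citation threshold from the field's own H-index. A minimal sketch (the function names and the exact selection rule are our reading of the abstract, not the authors' code):

```python
# Sketch of the H-Classics selection rule: compute the H-index of a field
# from its papers' citation counts, then treat papers with at least that
# many citations as the field's citation classics.

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def h_classics(papers):
    """papers: list of (paper_id, citation_count). Return the H-Classics."""
    h = h_index([c for _, c in papers])
    return [pid for pid, c in papers if c >= h]
```

Because the threshold is the field's own H-index, it automatically differs between, say, oncology and mathematics, which is the standardization the abstract argues for.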
All research articles published in volume 32 (2007) of Addictive Behaviors (n = 302) were analyzed by two independent authors. For each article, the following information was extracted: the number of citations up to June 2013 in the Scopus citation database, the type and characteristics of the title, the presence of keywords differing from the words in the title, reference to a place, and the presence of an acronym. The summary statistics showed that the mean number of citations was 16.36 ± 19.55. Articles with combinational titles (using a hyphen or a colon to separate different ideas within a sentence) and articles with different words in the keywords (at least two different keywords) had higher numbers of citations. The number of citations was not correlated with the number of words in the title (r = 0.05, P = 0.325). Our results suggest that some features of a paper, such as the type of title and keywords differing from the words included in the title, can help to predict citation counts. These findings can be used by authors and reviewers in order to maximize the impact of articles. The length of the title is not associated with citation counts; therefore, journals' guides for authors can be more flexible regarding title length. The study aimed to analyse the global research output related to microRNA (miRNA), based on the fact that it has diverse expression patterns and might regulate various developmental and physiological processes. The first miRNA was identified as a small RNA in 1993, but its function as a biological regulator was unknown until 2000. Since then, research on miRNAs has gained momentum. To understand and visualize the research dynamics and structure of the field, publications appearing in the Science Citation Index Expanded database for 2002-2012 under the miRNA category, retrieved using a specific search string, were analysed. More than 14,000 documents were found in the Web of Science database for this period. 
This study identified the major productive countries, highly productive institutions, authors, research areas, journals and document types, along with their individual citation impacts. The inter-collaborative linkages of countries, organizations and authors were also analysed. The study observed that the number of publications increased from 8 in 2002 to 4,186 in 2012, a compound annual growth rate of 87 %. The compound annual growth rates of countries, institutions, number of journals, research areas, and authors are 36.60, 76.64, 64.80, 30.5, and 88.09 % respectively. A survey of 170 Swedish mentors of PhD students found that expertise in the research field and avoidance of conflicts of interest were major motivators for finding an examiner from abroad for PhD theses. The survey also identified that supervisors' concern for facilitating the career paths of younger scientists, in terms of introductions to potential labs for post-doctoral work and obtaining high-quality neutral review of one's research, was also important, as was the desire to set up collaborations. An expectation from the management of one's university of the PR value of a foreign senior person as examiner also played a part, although few were willing to admit that PR for one's own group was a motivating factor. A small fraction of respondents expressed concern that, as some of the costs of the PhD examination were being shifted onto the research groups themselves, this might affect the current situation. Language played only a subordinate role. To get the best out of the visiting examiner, it was important to educate and instruct them in their role in the Swedish PhD examination protocol. Male supervisors had had more PhD candidates than female supervisors, but they had also used more Sweden-based examiners than their female colleagues. We conclude that using a foreign examiner was motivated by factors that are likely to prevail for the foreseeable future. 
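The compound annual growth rates reported in the miRNA study above follow the standard formula CAGR = (end / start)^(1/years) - 1; the headline 87 % figure can be reproduced from the publication counts the abstract gives:

```python
# Compound annual growth rate between two values, as a fraction.
# Reproducing the headline figure: publications grew from 8 (2002)
# to 4,186 (2012), i.e. over 10 years.

def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

growth = cagr(8, 4186, 10)  # approximately 0.87, i.e. 87 % per year
```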
This Swedish practice may also provide a template for a common standard. This study examines the knowledge capital accumulated by the public research institutes (PRIs) of South Korea and Taiwan to facilitate the configuration of new industrial structures. The patenting trends of two PRIs, ETRI of South Korea and ITRI of Taiwan, are assessed to highlight the knowledge structures established for the emergence of a multi-agent structure since the 1990s. To examine the dynamics and variations of their knowledge capital, the data series are separated into two phases (a catching-up phase from the 1970s to the 1990s, and a post catching-up phase since the 2000s) according to (1) number of patents, (2) number of solely owned and co-owned patents, (3) backward and forward citations, (4) science-linked patents, and (5) fields of patents. As the role of PRIs in latecomer countries evolves from facilitator in the catching-up phase to mediator in the post catching-up phase, this study demonstrates their influence and dynamic effect in reinforcing industrial strategies and national approaches to attain endogenous structural change in the national innovation system. Our results signal that telecommunications is the promising technology targeted by Korea's chaebols, while Taiwan's small and medium-sized enterprises are utilizing the aggregate knowledge capital accumulated and derived from semiconductor technologies to develop their niches across a diverse range of product innovations. One of the critical issues in national research assessment exercises concerns the choice of whether to evaluate the entire scientific portfolio of the institutions or a subset composed of the best products. 
Under the second option, the capacity of the institutions to select the appropriate researchers and their best products (the UK case), or simply the best products of every researcher (the Italian case), becomes critical, both for correct assessment of the real quality of research in the institutions evaluated and for the selective funding that follows. In this work, through case studies of three Italian universities, we analyze the efficiency of the product selection intended to maximize the universities' scores in the current national research assessment exercise, the results of which will be the basis for assigning an important share of public financing over the coming years. The counting of patents and citations is commonly used to evaluate technological innovation and its impact. However, in an age of increasing international collaboration, the counting of international collaboration patents has become a methodological issue. This study compared country rankings using four different counting methods (whole counting, straight counting, whole-normalized counting, and complete-normalized counting) in patent, citation and citation-patent ratio (CP ratio) counts. It also observed the inflation produced by each method. The counting was based on the complete 1992-2011 patent and citation data issued by the United States Patent and Trademark Office. The results show that counting methods have only minor effects on country rankings in patent count, citation count and CP ratio count. All four counting methods yield reliable country ranks in technology innovation capability and impact. While the influences of counting methods vary between patent count, citation count and CP ratio count, counting methods may exert slightly greater effects on CP ratio counts than on patent and citation counts. As for inflation, the distributions of higher and lower inflation under the four counting methods differ between patent, citation and CP ratio counts. 
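Three of the four counting methods named in the patent-counting study above can be sketched directly; the exact definition of "whole-normalized" counting varies in the literature, so it is omitted here. The patent data and function name are illustrative assumptions, not the study's data.

```python
# Sketch of patent counting methods. Each patent is a list of assignee
# countries, with the first-listed country first.
from collections import defaultdict

def count_patents(patents, method):
    counts = defaultdict(float)
    for countries in patents:
        uniq = set(countries)
        if method == "whole":                  # every listed country gets 1
            for c in uniq:
                counts[c] += 1
        elif method == "straight":             # only the first country gets 1
            counts[countries[0]] += 1
        elif method == "complete_normalized":  # each country gets 1/n
            for c in uniq:
                counts[c] += 1 / len(uniq)
    return dict(counts)

patents = [["US", "KR"], ["US"], ["KR", "TW", "US"]]
```

Note that under whole counting a collaborative patent is counted once per country, so the totals across countries exceed the number of patents; complete-normalized counting preserves the total, which is the "inflation" contrast the study examines.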
It is commonly accepted that scientific research or, more precisely, the number of scientific publications in computer science has greatly increased over the last few years. The reason would appear to be the pressure to publish, captured by the expression "Publish or perish", which is, among other things, necessary for promotions and applications for grants or projects. In this paper we conduct a study that covers computer science publications from 1936 to 2010 in order to quantify this increase in computing research publications. We consider the computing conferences and journals available in the DBLP computer science bibliography (DBLP 2013) database, including more than 1.5 million papers and more than 4 million authors (more than 900,000 different people), corresponding to about 1,000 different journals and 3,000 different conferences and workshops. Our study confirms and quantifies these increases with regard to the number of papers, number of authors, number of papers per author, etc. However, it also reaches a surprising conclusion: the real productivity of researchers has decreased throughout history. The reason for this decrease is the average number of authors per paper, which has grown significantly and is currently three. This paper presents a first approach to analyzing the factors that determine the citation characteristics of books. For this we use Thomson Reuters' Book Citation Index, a novel multidisciplinary database launched in 2011 which offers bibliometric data on books. We analyze three possible factors considered to affect the citation impact of books: the presence of editors, inclusion in series, and the type of publisher. We also focus on highly cited books to see if these factors may affect them as well. We considered as highly cited books those in the top 5 % most highly cited in the database. 
We define these three aspects and present results for four major scientific areas in order to identify differences by area (science; engineering and technology; social sciences; and arts and humanities). Finally, we report differences for edited books and by publisher type; books included in series, moreover, showed higher impact in two areas. This study introduces nation diffusion breadth and nation diffusion intensity by adapting the notions of field diffusion breadth and field diffusion intensity as defined by Liu and Rousseau, along with a variation on the total cited influence indicator introduced by Hu et al. Knowledge diffusion across countries in the field of management is then analyzed as a case study. The main countries in the field of management studies are considered as centers in their own ego-centered citation networks. The three indicators mentioned above are then calculated for these ego-centered citation networks. They measure the scientific impact each of these countries has on other nations. A general picture of the knowledge diffusion process is given by the three indicators at the country level over four periods: 1992-1996, 1997-2001, 2002-2006, and 2007-2011. The validity of the proposed indicators is verified by the calculated results. Co-citation analysis is a form of content analysis that can be applied in the context of scholarly publications with the purpose of identifying prominent articles, authors and journals being referenced by the citing authors. It identifies co-cited references that occur together in the reference lists of two or more citing articles, with the resultant co-citation network providing insights into the constituents of a knowledge domain (e.g., significant authors and papers). 
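The co-citation counting just described can be sketched directly: two references are co-cited once for each citing article whose reference list contains both. The reference lists below are illustrative, not from any of the studies discussed.

```python
# Sketch of co-citation counting: count, over all citing articles, how
# often each unordered pair of references appears together in a
# reference list. The resulting pair weights form the co-citation network.
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Map unordered reference pairs to their co-citation frequency."""
    pairs = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Three hypothetical citing articles and their reference lists.
citing = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
net = cocitation_counts(citing)
```

Pairs with high weights (here A-B and B-C) are the candidates for "prominent" co-cited works; clustering this weighted network is what surfaces the knowledge-domain structure the abstract mentions.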
The contribution of the paper is twofold: (a) it demonstrates the added value of using co-citation analysis, for which purpose the chosen underlying dataset is the peer-reviewed publication of the Society for Modeling and Simulation International (SCS), SIMULATION; and (b) with 2012 being the 60th anniversary of the SCS, the authors hope that this paper will lead to further acknowledgement and appreciation of the Society in charting the growth of Modeling and Simulation (M&S) as a discipline. In the last few decades, multi-authored articles have increased across disciplines, with increasing instances of authorship abuse, although multi-authorship is not always due to undeserved authorship (McDonald et al. in Mayo Clin Proc 85(10):920-927, 2010). It may be necessitated by interdisciplinary research, the evolution of a discipline, or the intention of quality improvement. This article studies the relationship between authorship and the quality of articles (publication in better impact factor journals or core journals) in the field of Oceanography. The results show an approximately 75 % increase in the number of authors per article from 1990 to 2009 in the discipline. The increase in authorship correlates not only with the percentage of articles in core journals but also with the mean impact factor (IF) of the journals in which the articles were published. The ANOVA study shows that although multi-authorship had no influence on the preference to publish in core journals during the 1990s or 2000s, it did have a significant influence on the preference to publish in high-IF journals in both decades. These findings establish that in the field of Oceanography, the increase in collaboration resulted in more publications in core journals (without any influence of authorship increase) and in better impact factor journals (due to the influence of authorship increase). 
Scientific co-authorship among African researchers has become a fashionable topic in the recent scientometric literature. Researchers are investigating the effects, modes, dynamics and motives of collaboration in a continental research system which is in an embryonic stage and at different stages of development from country to country. In this article we attempt to provide some additional evidence by examining both the patterns of collaboration at country and continental levels and the scientific disciplines emphasised. Our findings indicate that the continent's research emphasises medical and natural-resources disciplines to the detriment of disciplines supporting knowledge-based economies and societies. Furthermore, we find that collaboration rates in Africa are substantially higher than in the rest of the world. A number of questions related to research collaboration and its effects are raised. Internationalization of universities has become a worldwide phenomenon as global economic integration continues to make its way forcefully into higher education. The objective of this study is to develop a model for the internationalization of universities from the transformation of some promising macroeconomic variables, i.e., educational reforms and economic growth, in the seven largest regions of the world [namely, East Asia and Pacific (a sample of 25 countries); Europe and Central Asia (40 countries); Latin America and Caribbean (27 countries); Middle East and North Africa (17 countries); North America (22 countries); South Asia (7 countries) and Sub-Saharan Africa (21 countries)]. The data were analyzed by panel fixed-effects regression for the period 1990-2011. To transform inputs into outputs, the study employed eleven indicators of education and five indicators of growth, where the resulting vector is internationalization. The results show the dynamic linkages between educational indicators and economic factors in the selected regions of the world. 
In the East Asia and Pacific region, tertiary and higher education expenditures per student increase the economic factors. Higher education is a powerful driver of long-term growth in Europe and Central Asia. In the Latin America and Caribbean region, governments should focus on higher education enrolment, as it does not make any significant contribution to increasing GDP, gross capital formation or FDI. Higher education enrolment in the MENA region significantly increases growth factors, at the cost of increased gross national expenditures. Investment in general education and other generic human capital is of the utmost importance in creating an enabling environment for FDI in North America. It is imperative for South Asia to encourage skill levels and education opportunities for females, in order to maximize the effects of FDI on the female human capital stock and therefore on economic growth. Tertiary school enrolment and tertiary expenditures per student indicate the importance of tertiary education in Sub-Saharan Africa. The results conclude that educational indicators improve economic gains, which ultimately yield the benefits of internationalization. We performed an analysis of published literature related to fruit and vegetables and indexed in the Web of Science®, covering the period 2000-2009. The EU27 and the USA are the two leading actors in terms of the number of fruit and vegetable articles published. This paper compares their publication outputs using bibliometric methods. We assessed the fruit and vegetable species, topics (from Web of Science® categories), countries and institutions involved. The top species, topics and institutions are ranked according to their number of publications. Collaboration networks between countries were mapped to visualize the intensity of the relationships involved in international fruit and vegetable research and to obtain an overall picture of the fruit and vegetable research landscape. 
These results can be useful for policy makers. In recent years there have been few bibliometric evaluations in the dental sciences with an international approach. The aim of this study is to describe the scientific production of original and review articles published in ISI dental journals for the period 2007-2011, considering qualitative and quantitative measures across countries. Documents indexed in the Science Citation Index Expanded of Web of Science were reviewed between January 2007 and December 2011. All "Article" and "Review" document types in the "Dentistry, Oral Medicine and Surgery" category were included, and quantitative and qualitative analyses were performed. A total of 37,571 documents were found for the entire period, with productivity growing 24.3 % annually from 2007 to 2011. The publication language was mostly English (98.6 %), and 54.5 % of productivity was concentrated in five countries. A total of 44 countries had at least 100 documents and were included in the analysis, representing 36,532 (97.23 %) documents. Increasing productivity was observed in some countries, such as Brazil, China, India, and Turkey. High levels and stability in terms of impact were observed in the Nordic countries. The USA continues to lead in terms of overall productivity. Publishing histories can reveal changes in ornithological effort, focus or direction through time. This study presents a bibliometric content analysis of Emu (1901-2011) which revealed 115 trends (long-term changes in publication over time) and 18 fads (temporary increases in publication activity) from the classification of 9,039 articles using 128 codes organised into eight categories (author gender, author affiliation, article type, subject, main focus, main method, geographical scale and geographical location). Across 110 years, private authorship declined, while publications involving universities and multiple institutions increased; from 1960, female authorship increased. 
Over time, question-driven studies increased in frequency while incidental observations decreased. Single-species and 'taxonomic group' subjects increased while studies of birds at specific places decreased. The focus of articles shifted from species distribution and the activities of the host organisation to breeding, foraging and other biological/ecological topics. Site- and Australian-continental scales slightly decreased over time; non-Australian studies increased from the 1970s. A wide variety of fads occurred (e.g. articles on bird distribution, 1942-1951, and using museum specimens, 1906-1913), though the occurrence of fads decreased over time. Changes over time are correlated with technological, theoretical, social and institutional changes, and suggest that ornithological priorities, like those of other scientific disciplines, are temporally labile. In many databases, science bibliography databases for example, the name attribute is the most commonly chosen identifier for entities. However, names are often ambiguous and not always unique, which causes problems in many fields. Name disambiguation is a non-trivial task in data management that aims to properly distinguish different entities sharing the same name, particularly in large databases such as digital libraries, where only limited information is available to identify authors. In digital libraries, ambiguous author names occur due to the existence of multiple authors with the same name or different name variations for the same person. Most previous work on name disambiguation employs hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose a robust hybrid name disambiguation framework that is not only applicable to digital libraries but can also be easily extended to other applications based on different data sources. 
We propose a web page genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve the performance of name disambiguation. We evaluated our approach on known corpora, and the favorable experimental results indicated that our proposed framework is feasible. The ability to initiate and manage effective collaborations is becoming an increasingly important criterion in policies on academic career advancement. The rise of such policies calls for the development of indicators that permit measurement of the propensity to collaborate among academics of different ranks, and examination of the role of several variables in collaboration, first among these being the researchers' disciplines. In this work we apply an innovative bibliometric approach based on individual propensity for collaboration to measure the differences in propensity across academic ranks, by discipline and by choice of collaboration form: intramural, extramural domestic and international. The analysis is based on the scientific production of Italian academics for the period 2006-2010, totaling over 200,000 publications indexed in Web of Science. It shows that assistant professors register a propensity for intramural collaboration that is clearly greater than that of professors of higher ranks. Vice versa, the higher ranks register a greater, though less pronounced, propensity to collaborate at the international level. In recent decades, the topic of internationalization has emerged as one of the defining issues of higher education globally. Different approaches to the internationalization process have emerged according to university structures and strategic plans; however, universities still face problems in identifying the basic steps through which the transformation of higher education towards internationalization is possible. 
This study proposes a framework for higher education in Pakistan. In order to energize the whole process towards internationalization, a three-step framework is utilized for the internationalization of higher education in Pakistan. The study identifies the basic dimensions for the improvement of services and structure which lead to the internationalization of higher education in Pakistan, and proposes the use of the define, measure, analyze, improve and control cycle for continuous improvement in Pakistan's higher education institutions. An integrative approach is taken to mapping the field of research on information literacy in the health sciences and social sciences. The objective was to identify the conceptual structure of these areas, to determine their main research fronts and descriptors and the relationships between them, and, further, to determine whether information literacy is a consistent area. The basis of the study is the use of the program VOSviewer to analyse the co-occurrence of the areas' descriptors, grouping them into clusters and generating a map of their connections. Information retrieval was carried out through retrospective searches of the Web of Science (Thomson Reuters) and Scopus (Elsevier). The results for the health sciences area yielded four clusters. The most central descriptor was "Education" (with a total link strength of 1,470), which was strongly linked to the descriptor "Information retrieval", and weakly linked to "Information skills", "Information seeking", and "Information Science". In the social sciences, there were six clusters. "Information literacy" was now the descriptor with the most occurrences (812) as well as having the greatest weight, with a total link strength of 2,340, followed by "Education" with 839 occurrences. 
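The "total link strength" values quoted for descriptors such as "Education" are, in VOSviewer's terms, the sum of a node's co-occurrence link weights. A minimal sketch over illustrative descriptor sets (not the study's data; the function name is ours):

```python
# Sketch of total link strength from descriptor co-occurrence: each pair
# of descriptors appearing in the same document adds 1 to the link
# between them, and a node's total link strength sums its link weights.
from collections import Counter
from itertools import combinations

def total_link_strength(documents):
    """documents: list of descriptor sets. Return strength per descriptor."""
    strength = Counter()
    for descriptors in documents:
        for a, b in combinations(sorted(set(descriptors)), 2):
            strength[a] += 1  # each co-occurrence counts toward
            strength[b] += 1  # both endpoints of the link
    return strength

docs = [{"Education", "Information retrieval"},
        {"Education", "Information literacy"},
        {"Education", "Information retrieval", "Information literacy"}]
strength = total_link_strength(docs)
```

On these toy documents "Education" accumulates the highest total link strength, mirroring its central position in the health sciences map described above.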
The resulting maps provide a graphical identification of the main research issues and trends in information literacy in these two areas of expertise, which, according to the data of the present study, correspond to lesser (health sciences) and greater (social sciences) scientific production. Information literacy was seen to be conceptually more consistent in the health sciences than in the social sciences. However, at least for the moment, it is a still-growing conceptual space in need of more solid indices of consistency and specificity. In this study we analyse a corpus of 300 randomly selected research paper titles written in English and published between 1998 and 2012 in the most prestigious journals in the field of Astrophysics, an under-researched discipline from a linguistic standpoint. We specifically address issues related to the evolution of titles, their length, their lexical density, their type distribution and their semantic content. Our findings reveal a trend towards relatively long titles with a high lexical density, a preference for nominal and simple titles over verbal and compound ones, a very low occurrence of question constructions, and a prevalence of purpose and results over methods as key research concepts expressed in titles. We compare our findings with the results of previous studies on titles in other scientific disciplines and provide explanations for the differences and similarities observed. In November 2012 the Google Scholar Metrics (GSM) journal rankings were updated, making it possible to compare the bibliometric indicators in the ten languages indexed, and their stability, with the April 2012 version. The h-index and h5-median of 1,000 journals were analysed, comparing their averages, maximum and minimum values and the correlation coefficients within rankings. The bibliometric figures grew significantly. In just seven and a half months the h-index of the journals increased by 15 % and the median h-index by 17 %. 
This growth was observed for all the bibliometric indicators analysed and for practically every journal. However, we found significant differences in growth rates depending on the language in which a journal is published. Moreover, the journal rankings seem to be stable between April and November, reinforcing the credibility of the data held by Google Scholar and the reliability of the GSM journal rankings, despite the uncontrolled growth of Google Scholar. Based on the findings of this study we suggest, firstly, that Google should update its rankings at least semi-annually and, secondly, that the results should be displayed in each ranking proportionally to the number of journals indexed per language. A system of four research levels, designed to classify scientific journals from most applied to most basic, was introduced by Francis Narin and colleagues in the 1970s. Research levels have been used since that time to characterize research at institutional and departmental levels. Currently, less than half of all articles published are in journals that have been classified by research level. There is thus a need for the notion of research level to be extended so that all articles can be classified. This article reports on a new model, trained on title and abstract words and cited references, that classifies individual articles by research level. The model covers all of science, and has been used to classify over 25 million articles from Scopus by research level. The final model and the set of classified articles are further characterized. (C) 2013 Elsevier Ltd. All rights reserved. The time evolution of mean received citations is calculated on a sample of journals from two ISI subject categories ("Chemistry, multidisciplinary", ISI Science Edition, and "Management", ISI Social Science Edition) with the use of an original methodology. 
Mean received citations are plotted against the time gap in years between publication of the cited article and the received citations. For most Chemistry journals in the sample the maximum number of average received citations occurs two years after publication, after which a decrease is observed. Some peculiar cases present a different trend. Management journals, conversely, in most cases do not present a peak of citations: average received citations instead grow from the year of publication to the age of 10 years (the maximum time gap studied). A sub-sample of journals shows similar results for longer time series (up to 23 years). Medians of average received citations per year partly show a similar behavior. Results suggest that citedness follows very different trends in different fields, and partly suggest why differences in journal Impact Factor exist between categories. At the end of the work conclusions are drawn, together with suggestions for future research. (C) 2013 Elsevier Ltd. All rights reserved. Within the field of bibliometrics, there is sustained interest in how nations "compete" in terms of academic disciplines, and in what determinants explain why countries may have a specific advantage in one discipline over another. However, this literature has not, to date, presented a comprehensive structured model that could be used in the interpretation of a country's research profile and academic output. In this paper, we use frameworks from international business and economics to present such a model. Our study makes four major contributions. First, we include a very wide range of countries and disciplines, explicitly including the Social Sciences, which unfortunately are excluded from most bibliometrics studies. Second, we apply theories of revealed comparative advantage and the competitive advantage of nations to academic disciplines. 
Third, we cluster our 34 countries into five different groups that have distinct combinations of revealed comparative advantage in five major disciplines. Finally, based on our empirical work and prior literature, we present an academic diamond that details factors likely to explain a country's research profile and competitiveness in certain disciplines. (C) 2013 Elsevier Ltd. All rights reserved. This paper contributes to the longitudinal study and representation of the diffusion of scholarly knowledge through bibliometrics. The case of systems biology is used to illustrate a means for considering the structure and different roles of journals in the diffusion of a relatively new field to diverse subject areas. Using a bipartite network analysis of journals and subject categories, a core-intermediary-periphery diffusion structure is detected through comparative analysis of betweenness centrality over time. Systems biology diffuses from a core of foundational, theoretical areas to more specific, applied, practical fields, most of which relate to human health. Next, cluster analysis is applied to subject category co-occurrence networks to longitudinally trace the movement of fields within the core-intermediary-periphery structure. The results of these analyses reveal patterns of systems biology's diffusion across both theoretical and applied fields, and also suggest how the dynamics of a field's interdisciplinary evolution can be traced. The author concludes by presenting a typology for considering how journals may function to support attributes of the core-intermediary-periphery structure and diffusion patterns more broadly. (C) 2013 Elsevier Ltd. All rights reserved. Interdisciplinary teams are assembled in scientific research and are aimed at solving complex problems. Given their increasing importance, it is not surprising that considerable attention has been focused on processes of collaboration in interdisciplinary teams. 
Despite such efforts, we know less about the factors affecting the assembly of such teams in the first place. In this paper, we investigate the structure and the success of interdisciplinary scientific research teams. We examine the assembly factors using a sample of 1103 grant proposals submitted to two National Science Foundation interdisciplinary initiatives during a 3-year period, including both awarded and non-awarded proposals. The results indicate that individuals' likelihood of collaboration on a proposal is higher among those with longer tenure, lower institutional tier, lower H-index, and higher levels of prior co-authorship and citation relationships. However, successful proposals show somewhat different relational patterns: individuals' likelihood of collaboration is higher among those with lower institutional tier, lower H-index, (female) gender, and higher levels of prior co-authorship, but lower levels of prior citation relationships. (C) 2013 Elsevier Ltd. All rights reserved. The publication credit allocation problem is one of the fundamental problems in bibliometrics. There are two solutions which do not use any additional information: the equal weights measure and the Shapley value. The paper justifies the equal weights measure by showing its equivalence with the Shapley value approach for sharing co-authors' performance in specific games. (C) 2013 Elsevier Ltd. All rights reserved. One of the flaws of the journal impact factor (IF) is that it cannot be used to compare journals from different fields or multidisciplinary journals because the IF differs significantly across research fields. This study proposes a new measure of journal performance that captures field-different citation characteristics. We view journal performance from the perspective of the efficiency of a journal's citation generation process. 
Together with the conventional variables used in calculating the IF, the number of articles as an input and the number of total citations as an output, we additionally consider the two field-different factors, citation density and citation dynamics, as inputs. We also separately capture the contribution of external citations and self-citations and incorporate their relative importance in measuring journal performance. To accommodate multiple inputs and outputs whose relationships are unknown, this study employs data envelopment analysis (DEA), a multi-factor productivity model for measuring the relative efficiency of decision-making units without any assumption of a production function. The resulting efficiency score, called DEA-IF, can then be used for the comparative evaluation of multidisciplinary journals' performance. A case study example of industrial engineering journals is provided to illustrate how to measure DEA-IF and its usefulness. (C) 2013 Elsevier Ltd. All rights reserved. Ever more frequently, governments have decided to implement policy measures intended to foster and reward excellence in scientific research. This is in fact the intended purpose of national research assessment exercises. These are typically based on the analysis of the quality of the best research products; however, a different approach to analysis and intervention is based on the measure of productivity of the individual scientists, meaning the overall impact of their entire scientific production over the period under observation. This work analyzes the convergence of the two approaches, asking if and to what extent the most productive scientists achieve highly cited articles; or, vice versa, what share of highly cited articles is achieved by scientists that are "non-top" for productivity. To do this we use bibliometric indicators, applied to the 2004-2008 publications authored by academics of Italian universities and indexed in the Web of Science. (C) 2013 Elsevier Ltd. 
All rights reserved. Dynamic development is an intrinsic characteristic of research topics. To study this, this paper proposes two sets of topic attributes to examine topic dynamic characteristics: topic continuity and topic popularity. Topic continuity comprises six attributes: steady, concentrating, diluting, sporadic, transforming, and emerging topics; topic popularity comprises three attributes: rising, declining, and fluctuating topics. These attributes are applied to a data set of library and information science publications from the past 11 years (2001-2011). Results show that topics on "web information retrieval", "citation and bibliometrics", "system and technology", and "health science" have the highest average popularity; topics on "h-index", "online communities", "data preservation", "social media", and "web analysis" are becoming increasingly popular in library and information science. (C) 2013 Elsevier Ltd. All rights reserved. The increasing costs of research and the decreasing lifetime of products and processes make decisions on the allocation of R&D funds strategically important. Therefore, the ability to predict research trends is crucial in minimizing the risks of R&D expenditure planning. The purpose of this paper is to propose a model for efficient prediction of research trends in a chosen branch of science. The approach is based on population dynamics with Burgers' type global interaction and selective neighborhood. The model is estimated based on a training set. Then, an out-of-sample forecast is performed. The research trends of filtration and rectification processes were analyzed in this paper. The simulation results show that the model is able to predict the trends with considerable accuracy and should, therefore, be tested on a wider range of research fields. (C) 2013 Elsevier Ltd. All rights reserved. 
Journal self-citations strongly affect journal evaluation indicators (such as impact factors) at the meso- and micro-levels, and therefore they are often increased artificially to inflate the evaluation indicators in journal evaluation systems. This coercive self-citation is a form of scientific misconduct that severely undermines the objective authenticity of these indicators. In this study, we developed a feature space for describing journal citation behavior and conducted feature selection by combining a GA-based wrapper with ReliefF. We also constructed a journal classification model using the logistic regression method to identify normal and abnormal journals. We evaluated the performance of the classification model using journals in three subject areas (BIOLOGY, MATHEMATICS and CHEMISTRY, APPLIED) during 2002-2011 as the test samples, and good results were achieved in our experiments. Thus, we developed an effective method for the accurate identification of coercive self-citations. (C) 2013 Elsevier Ltd. All rights reserved. The non-citation rate refers to the proportion of papers that do not attract any citation over a period of time following their publication. After reviewing the related papers in the Web of Science, Google Scholar and Scopus databases, we find that the current literature on citation distributions focuses more on the distribution of the percentages and citations of papers receiving at least one citation, with fewer studies on the time-dependent patterns of the percentage of never-cited papers, on what distribution model can fit these patterns, and on the factors influencing the non-citation rate. Here, we perform an empirical pilot analysis of the time-dependent distribution of the percentages of never-cited papers in a series of different, consecutive citation time windows following their publication in our six selected sample journals, and study the influence of paper length on the chance of a paper getting cited. 
Through the above analysis, the following general conclusions are drawn: (1) a three-parameter negative exponential model fits the time-dependent distribution curve of the percentages of never-cited papers well; (2) in the initial citation time window, the percentage of never-cited papers in each journal is very high; however, as the citation time window widens, the percentage of never-cited papers drops rapidly at first and then more slowly, and the total decline for most journals is very large; (3) with wider citation time windows, the percentage of never-cited papers for each journal approaches a stable value, after which these percentages change very little, unless a large number of "Sleeping Beauty" type papers appears; (4) the length of a paper has a great influence on whether it will be cited or not. (C) 2013 Elsevier Ltd. All rights reserved. Across the various scientific domains, significant differences occur with respect to research publishing formats, frequencies and citing practices, the nature and organisation of research and the number and impact of a given domain's academic journals. Consequently, differences occur in the citations and h-indices of the researchers. This paper attempts to identify cross-domain differences using quantitative and qualitative measures. The study focuses on the relationships among citations, most-cited papers and h-indices across domains and for research group sizes. The analysis is based on the research output of approximately 10,000 researchers in Slovenia, of which we focus on 6536 researchers working in 284 research group programmes in 2008-2012. As comparative measures of cross-domain research output, we propose the research impact cube (RIC) representation and the analysis of most-cited papers, highest impact factors and citation distribution graphs (Lorenz curves). 
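To make conclusion (1) above concrete, here is a minimal sketch of a three-parameter negative exponential model for the never-cited share as a function of citation-window width. The parameter values are illustrative only, not the ones fitted in the study.

```python
import math

def never_cited_share(t, a, b, c):
    """Three-parameter negative exponential model: the percentage of
    never-cited papers declines rapidly for narrow citation windows t
    (in years) and levels off at an asymptote c for wide windows."""
    return a * math.exp(-b * t) + c

# Illustrative parameters: ~70% never cited immediately after
# publication, approaching a stable floor of about 8%.
params = dict(a=62.0, b=0.9, c=8.0)

shares = [never_cited_share(t, **params) for t in range(0, 11)]
print([round(s, 1) for s in shares])
```

In a real analysis the parameters a, b, c would be estimated by nonlinear least squares against the observed percentages per window; the shape above reproduces the qualitative pattern reported in conclusions (2) and (3).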
The analysis of Lotka's model resulted in the proposal of a binary citation frequencies (BCF) distribution model that describes publishing frequencies well. The results may be used as a model to measure, compare and evaluate fields of science at the global, national and research community levels to streamline research policies and evaluate progress over a definite time period. (C) 2013 Elsevier Ltd. All rights reserved. We have developed a (freeware) routine for "Referenced Publication Years Spectroscopy" (RPYS) and apply this method to the historiography of "iMetrics," that is, the junction of the journals Scientometrics, Journal of Informetrics, and the relevant subset of JASIST (approx. 20%) that shapes the intellectual space for the development of information metrics (bibliometrics, scientometrics, informetrics, and webometrics). The application to information metrics (our own field of research) provides us with the opportunity to validate this methodology, and to add a reflection on using citations for historical reconstruction. The results show that the field is rooted in individual contributions of the 1920s to 1950s (e.g., Alfred J. Lotka), and was then shaped intellectually in the early 1960s by a confluence of the history of science (Derek de Solla Price), documentation (e.g., Michael M. Kessler's "bibliographic coupling"), and "citation indexing" (Eugene Garfield). Institutional development at the interfaces between science studies and information science has been reinforced by the new Journal of Informetrics since 2007. In a concluding reflection, we return to the question of how a historiography of science using algorithmic means in terms of citation practices can differ from an intellectual history of the field based, for example, on reading source materials. (C) 2013 Elsevier Ltd. All rights reserved. 
The findings of Bornmann, Leydesdorff, and Wang (2013b) revealed that the consideration of journal impact improves the prediction of long-term citation impact. This paper further explores the possibility of improving citation impact measurements based on a short citation window by considering journal impact and other variables, such as the number of authors, the number of cited references, and the number of pages. The dataset contains 475,391 journal papers published in 1980 and indexed in the Web of Science (WoS, Thomson Reuters), and all annual citation counts (from 1980 to 2010) for these papers. As an indicator of citation impact, we used percentiles of citations calculated using the approach of Hazen (1914). Our results show that citation impact measurement can indeed be improved: if factors generally influencing citation impact are considered in the statistical analysis, the explained variance in long-term citation impact can be increased considerably. However, this increase is only visible when using the years shortly after publication, not when using later years. (C) 2013 Elsevier Ltd. All rights reserved. The journal impact factor (JIF) reported in the Journal Citation Reports has been used to represent the influence and prestige of a journal. Whereas consideration of the stochastic nature of a statistic is a prerequisite for statistical inference, an estimate of JIF uncertainty is necessary yet unavailable for comparing impact among journals. Using journals in the Database of Research in Science Education (DoRISE), the current study proposes bootstrap methods to estimate JIF variability. The paper also provides a comprehensive exposition of the sources of JIF variability. The collections of articles in the year of interest and in the preceding years both contribute to JIF variability. In addition, the variability estimate differs depending on the way a database selects its journals for inclusion. 
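The Hazen (1914) approach mentioned above assigns the i-th ranked of n values (ascending, 1-based) the percentile 100(i - 0.5)/n. A minimal sketch follows; the tie handling shown (averaging percentiles over equal citation counts) is one common choice and an assumption here, not necessarily the study's exact rule.

```python
def hazen_percentiles(citations):
    """Citation percentiles via Hazen's plotting position:
    the i-th ranked of n values gets 100 * (i - 0.5) / n.
    Tied citation counts share the mean percentile of their ranks."""
    n = len(citations)
    order = sorted(range(n), key=lambda k: citations[k])
    # raw Hazen percentile for each (0-based) rank position
    raw = [100.0 * (i + 0.5) / n for i in range(n)]
    perc = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and citations[order[j]] == citations[order[i]]:
            j += 1  # extend over the tie group
        mean_p = sum(raw[i:j]) / (j - i)
        for k in range(i, j):
            perc[order[k]] = mean_p
        i = j
    return perc

print(hazen_percentiles([0, 3, 3, 10, 50]))
```

Unlike the naive i/n rule, Hazen's formula never assigns 0 or 100, which keeps the extremes of the citation distribution usable in regression analyses of the kind described above.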
In the bootstrap process, the nested structure of articles in a journal was accounted for to ensure that each bootstrap replication reflects the actual citation characteristics of articles in the journal. In conclusion, the proposed point and interval estimates of the JIF statistic are obtained and more informative inferences on the impact of journals can be drawn. (C) 2013 Elsevier Ltd. All rights reserved. Author co-citation analysis (ACA) has long been used as an effective method for identifying the intellectual structure of a research domain, but it relies on simple co-citation counting, which does not take citation content into consideration. The present study proposes a new method for measuring the similarity between co-cited authors by considering the authors' citation content. We collected full-text journal articles in the information science domain and extracted the citing sentences to calculate their similarity distances. We compared our method with traditional ACA and found that our approach, while displaying a similar intellectual structure for the information science domain as the other baseline methods, also provides more detail about the sub-disciplines in the domain than traditional ACA. (C) 2013 Elsevier Ltd. All rights reserved. In this contribution we show how results obtained in a series of papers by Egghe can be refined in the sense that fewer additional conditions are needed. In these articles Egghe considered a general h-type index which has the value n if n is the largest natural number such that the first n publications (ranked according to the number of received citations) have each received at least f(n) citations, with f(n) any increasing function defined on the strictly positive numbers. His results deal with the increments I2 and I1 defined by I2(n) = I1(n + 1) - I1(n), where I1(n) = (n + 1)f(n + 1) - nf(n). Our results differ from Egghe's because we also consider Ik(0), k = 1, 2. 
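The general h-type index and the increments I1 and I2 just defined can be sketched as follows; with f(n) = n the index reduces to the ordinary Hirsch h-index. The citation record used is illustrative.

```python
def h_type_index(citations, f):
    """General h-type index: the largest n such that the n most-cited
    publications each have at least f(n) citations, for increasing f."""
    ranked = sorted(citations, reverse=True)
    n = 0
    while n < len(ranked) and ranked[n] >= f(n + 1):
        n += 1
    return n

def I1(f, n):
    # first increment: I1(n) = (n + 1) f(n + 1) - n f(n)
    return (n + 1) * f(n + 1) - n * f(n)

def I2(f, n):
    # second increment: I2(n) = I1(n + 1) - I1(n)
    return I1(f, n + 1) - I1(f, n)

f = lambda n: n  # f(n) = n recovers the ordinary h-index
print(h_type_index([10, 8, 5, 4, 3, 0], f))
print([I1(f, n) for n in range(0, 4)])
print([I2(f, n) for n in range(0, 4)])
```

Note that both increment functions are well defined at n = 0 (here I1(0) = f(1) and I2(0) = I1(1) - I1(0)), which is exactly the boundary case the abstract says is added to Egghe's treatment.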
We, moreover, provide a non-recursive definition of the increment functions Ik(n). (C) 2013 Elsevier Ltd. All rights reserved. Research institutions play an important role in scientific research and technical innovation. The topical analysis of research institutions in different countries can facilitate mutual learning and promote potential collaboration. In this study, we illustrate how an unsupervised artificial neural network technique, the Self-Organizing Map (SOM), can be used to visually analyze the research fields of research institutions. A novel SOM display named the Compound Component Plane (CCP) was presented and applied to determine the institutions that made significant contributions to the salient research fields. Eighty-seven Chinese and American LIS institutions and the technical LIS fields were taken as examples. Potential international and domestic collaborators were identified based upon their research similarities. An approach to dividing research institutions into clusters was proposed based on their geometric distances in the SOM display, the U-matrix values and the most salient research topics in which they were involved. The concepts of swarm institutions, pivots and landmarks were also defined and their instances identified. (C) 2013 Elsevier Ltd. All rights reserved. This paper introduces a novel application in bibliometrics of the barycenter method. Using places-of-publication barycenters, we measure the internationalization of book publishing in the Social Sciences and Humanities. Based on 2002-2011 data for Flanders, Belgium, we demonstrate how the geographic center of weight of book publishing differs between the Social Sciences and the Humanities. Whereas the latter still rely predominantly on domestic Flemish and continental European publishers, the former are firmly Anglo-Saxon oriented. The Humanities, however, show a more pronounced evolution toward further internationalization. 
For the already largely internationally oriented Social Sciences, the share of British publishers has grown in the most recent years. The barycenter method proves to be a valuable tool for representing the internationalization of research in book publications, especially when applied to non-Anglophone countries. (C) 2013 Elsevier Ltd. All rights reserved. This study established a technological impact factor (TIF) derived from the journal impact factor (JIF), which is proposed to evaluate journals from the aspect of practical innovation. This impact factor mainly examines the influence of journal articles on patents by calculating the number of patents citing a journal divided by the number of articles published in that journal. TIF values for five-year (TIF5) and ten-year (TIF10) periods at the journal level and aggregated TIF values (TIFAGG-3 and TIFAGG-10) at the category level were provided and compared to the JIF. The results reveal that journals with higher TIF values showed varied performances in the JCR, while the top ten journals on JIF5 showed consistently good performance on the TIFs. Journals in three selected categories - Electrical & Electronic Engineering, Research & Experimental Medicine, and Organic Chemistry - showed that TIF5 and TIF10 values are not strongly correlated with JIF5. Thus, TIFs can provide a new indicator for evaluating journals from the aspect of practical innovation. (C) 2013 Elsevier Ltd. All rights reserved. We axiomatize the well-known Hirsch index (h-index), which evaluates researcher productivity and impact on a field, and formalize a new axiom called head-independence. Under head-independence, a decrease, to some extent, in the number of citations of "frequently cited papers" has no effect on the index. Together with symmetry and axiom D, head-independence uniquely characterizes the h-index on a certain domain of indices. 
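At its core, the TIF described above is a simple ratio over a citation window. A minimal sketch, with hypothetical journal figures rather than data from the study:

```python
def technological_impact_factor(patent_citations, articles_published):
    """TIF over a window: patents citing a journal's articles,
    divided by the number of articles the journal published
    in that window (five years for TIF5, ten for TIF10)."""
    if articles_published == 0:
        raise ValueError("journal published no articles in the window")
    return patent_citations / articles_published

# Hypothetical journal: 120 patent citations to 400 articles
# over a five-year window gives a TIF5 of 0.3.
print(technological_impact_factor(120, 400))
```

The structure mirrors the JIF, but with patents instead of journal articles as the citing documents, which is what lets the two indicators diverge in the way the abstract reports.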
Some relationships between our axiomatization and those in the literature are also investigated. (C) 2013 Elsevier Ltd. All rights reserved. Knowledge transfer between science and technology has been studied at micro- and macro-levels of analysis. This has contributed to the understanding of the mechanisms and drivers, but the actual transfer mechanisms and processes, be they through codified or tacit sources, have very rarely been mapped and measured in full and remain, to a large extent, a black box. We develop a novel method for mapping science-technology flows and introduce 'concept clusters' as an instrument for doing so. Using patent and publication data, we quantitatively and visually demonstrate the flows of knowledge between academia and industry. We examine the roles of exogenous and endogenous knowledge sources, and of co-inventors and co-authors, in the application of university-generated knowledge. When applied to a stylised case, we show that the method is able to trace the linkages between base knowledge and skill sets and their application to a technology, which in some instances span over twenty-five years. (C) 2014 Elsevier Ltd. All rights reserved. It is widely believed that collaboration is advantageous in science, with, for example, collaboratively written articles tending to attract more citations than solo articles, and with strong arguments for the value of interdisciplinary collaboration. Nevertheless, it is not known whether the same is true for research that produces books. This article tests whether coauthored scholarly monographs attract more citations than solo monographs using books published before 2011 from 30 categories in the Web of Science. The results show that solo monographs numerically dominate collaborative monographs, but they give no evidence of a citation advantage for collaboration on monographs. In contrast, for nearly all these subjects (28 out of 30) there was a citation advantage for collaboratively produced journal articles. 
As a result, research managers and funders should not incentivise collaborative research in book-based subjects or in research that aims to produce monographs, but should allow the researchers themselves to decide freely whether or not to collaborate. (C) 2013 Elsevier Ltd. All rights reserved. In this paper, we show that an information source composed of n random variables may be split into 2^n or 2^n - 1 "states"; therefore, one can compute the maximum entropy of the source. We derive the efficiency and the unused capacity of an information source. We demonstrate that in more than two dimensions the transmission's variability depends on the system configuration; thus, we determine the upper and lower bounds of the mutual information and propose the transmission power as an indicator of the Triple Helix of university-industry-government relationships. The transmission power is defined as the fraction of the total 'configurational information' produced in a system; it behaves like the efficiency of the transmission and may be interpreted as the strength of the variables' dependency, the strength of the synergy between the system's variables, or the strength of information flow within the system. (C) 2013 Elsevier Ltd. All rights reserved. This paper describes the results of a multi-level network analysis of web-citations among the 1,000 universities with the greatest presence on the world wide web. Using data from January 2011, it describes the web-citation network of the world's universities and ascertains the antecedent factors that determine its structure. At the university level, the network is composed of ten groups, and the most central universities are mainly from the United States. 
The factors that predict the structure of the network are whether or not the universities are in the same country, the language of instruction, the size and excellence of the institution (university ranking and the number of Nobel Prizes received), whether they offer doctoral degrees, and the infrastructure of the university's country. Physical distance was not a determinant of the network's structure. At the nation-state level, international connections among a nation's universities form a single cluster with the United States, United Kingdom and Germany at the center. The structure of the international network may be predicted by the countries' overall hyperlink connections, international co-authorships, student flows and the number of Nobel Prizes won by their citizens. Mutual information in three (or more) dimensions can be considered as a Triple Helix indicator of possible synergy in university-industry-government relations. An open-source routine th4.exe makes the computation of this indicator interactively available on the internet, and thus applicable to large sets of data. Th4.exe computes all probabilistic entropies and mutual information in two, three, and, if available in the data, four dimensions among, for example, classes such as geographical addresses (cities, regions), technological codes (e.g. OECD's NACE codes), and size categories; or, alternatively, among institutional addresses (academic, industrial, public sector) in document sets. The relations between the Triple Helix indicator, as an indicator of synergy, and the Triple Helix model, which specifies the possibility of feedback by an overlay of communications, are also discussed. This paper investigates the dynamic evolution profiles of science and technology knowledge production in Brazil and the Republic of Korea from 2000 to 2009. 
The two countries have followed different models of publication profiles, the bioenvironmental model and the Japanese model, and both currently belong to the periphery in terms of the center-periphery framework. Brazil and the Republic of Korea have successfully established a few core disciplines and increased their share of the world publication of scientific papers over the last decade. Notwithstanding the fact that the two countries have recorded sustained growth in the percentage of published scientific papers, South Korea has evolved into a more balanced science and technology knowledge production system, whereas Brazil has moved toward a more unbalanced one. Core-lagging or periphery-lagging patterns of science production have been revealed in Brazil, indirectly implying that the existing science base has not been fully stimulated or utilized. In recent years, the Triple Helix model has identified feasible approaches to measuring relations among universities, industries, and governments. Results have been extended to different databases, regions, and perspectives. This paper explores how bibliometrics and text mining can inform Triple Helix analyses. It engages Competitive Technical Intelligence concepts and methods for studies of Newly Emerging Science & Technology (NEST) in support of technology management and policy. A semantic TRIZ approach is used to assess NEST innovation patterns by associating topics (using noun phrases to address subjects and objects) and actions (via verbs). We then classify these innovation patterns by the dominant categories of origination: Academy, Industry, or Government. We also use TRIZ tags and benchmarks to locate NEST progress using Technology Roadmapping. Triple Helix inferences can then be related to the visualized patterns. We demonstrate these analyses via a case study of dye-sensitized solar cells. 
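The mutual information in three dimensions that recurs in the Triple Helix abstracts above is T(xyz) = Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz, with negative values read as synergy in this literature. A minimal sketch computed directly from category tuples follows; the example data are illustrative, not from any of the studies.

```python
from collections import Counter
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def triple_helix_T(records):
    """Three-dimensional mutual information
    T(xyz) = Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz
    over (x, y, z) category tuples, e.g. (university, industry,
    government) attributions of documents. Negative T is read as
    synergy in the Triple Helix indicator literature."""
    n = len(records)

    def H(project):
        counts = Counter(project(r) for r in records)
        return entropy([c / n for c in counts.values()])

    Hx = H(lambda r: r[0]); Hy = H(lambda r: r[1]); Hz = H(lambda r: r[2])
    Hxy = H(lambda r: (r[0], r[1])); Hxz = H(lambda r: (r[0], r[2]))
    Hyz = H(lambda r: (r[1], r[2])); Hxyz = H(lambda r: r)
    return Hx + Hy + Hz - Hxy - Hxz - Hyz + Hxyz

# XOR-like configuration: no variable pair is informative alone,
# yet the three together are fully determined -> negative T.
xor_records = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(triple_helix_T(xor_records))
```

For three independent uniform variables T is zero, and for fully redundant variables it is positive, so the sign separates the synergy and redundancy regimes that the transmission-power indicator described above builds on.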
Since the cluster began to receive attention as a critical environmental factor in geographical economics, it has provided a major research methodology across multiple disciplines, from industrial organization, strategic management, regional innovation systems, and the Triple Helix to virtual clusters. Network structure analysis (NSA) offers a common framework for observing clusters that have been studied separately from the viewpoints of industrial organization and strategic management. Industrial structure analysis, based on network externalities, and the resource-based view, focused on inherent network capacity, have been combined with the study of structural change through cluster NSA to create a new direction for the growth of industries and individual firms. This study aims to analyze the correlation between the networking of structural change and a firm's performance by selecting a software industrial cluster as a representative case of the knowledge industry. We examine the network structural positions of each node during the cluster evolution process. This empirical study has significance for establishing a firm's growth strategy as well as for supporting cluster policy, by outlining the dynamic evolution of networking activities in a knowledge industry cluster. The purpose of this study is to explore the possibility of applying research collaboration as a new way of measuring research performance in Korean universities. In this study, we examine whether the intensity of university-government-industry research collaboration can also serve as a measure of research performance, aside from typical indicators such as the number of articles published or citations received by universities. This study also analyzes whether such performance differs according to universities' characteristics and disciplines. 
For the analysis, we gathered publication and citation data (2000-2009) of 46 Korean universities that are actively involved in research and analyzed their Science Citation Index Expanded (SCIE) and Social Sciences Citation Index (SSCI) data. Notable findings include: (1) several low-ranked universities have shown rapid improvement in their research performance despite the rigid hierarchical character of the Korean higher education system; (2) although universities in Korea are involved in various kinds of collaboration, such dynamics are not necessarily reflected in the existing hierarchy; (3) academic relations involving education-oriented universities and research-oriented universities show different dynamics and patterns of research collaboration; (4) in terms of collaborative publication rates, private universities collaborate more actively within the university sector, whereas public universities collaborate more with government and industry; (5) owing to the nature of the social sciences, research indexed in the SSCI relies more on the researcher's independence, so more international collaboration was found among researchers in the natural sciences and engineering. Information Communication Technology (ICT) has a significant impact on the socioeconomic development of a country. However, inequitable access to ICT remains a major issue in developing countries. In this context, this study examines the ICT knowledge infrastructure in South Asia from a network point of view. Existing research on ICTs is useful in understanding the common facts, but is limited in revealing the hidden structures and properties of the ICT research domain in South Asia. These hidden structures and properties, such as key players, networks of key players for scientific collaboration, and their network characteristics, are analyzed and synthesized in this study. 
This study applies a mixed approach of Social Network Analysis techniques and Triple Helix indicators to scholarly papers obtained from the Web of Science database. Further, a burst detection algorithm is applied to keywords appearing in the titles of the South Asian ICT scholarly papers to understand emerging trends in the ICT research domain. This study helps provide a better understanding of the current trends, strengths, and weaknesses of ICT in South Asia, which in turn informs efforts to bridge the digital divide and achieve socioeconomic development through ICT. The literature on the characteristics of the Triple Helix (TH) and university-industry-government relationships has not sufficiently covered research on this topic in Asia. Based on the assumption that the literature does not provide a sufficient overview of the sparse, complex, yet diverse process of TH research in Asia, this study examines the characteristics of TH scholars, such as their affiliations, preferred journals, international linkage patterns, and semantic discourse networks, by analyzing their research articles. The results identify the most prominent TH scholars, journals, issues, and research trends in Asia and suggest a need for deeper and more creative analyses of the TH model in the region and for longer time periods for longitudinal analyses. This contribution explores how work on Triple Helix (TH) indicators has evolved. Over the past 15 years a body of literature has emerged that brings together a variety of approaches to capture, map, or measure the dynamics of TH relationships. We apply bibliographic coupling and co-citation in combination with content analysis to develop a better understanding of this literature. We identify several clusters that can be aggregated into two broad streams of work: one 'neo-evolutionary', the other 'neo-institutional' in nature. 
We make this observation for both the bibliographic coupling and co-citation analyses, which we take as an indication of an emerging differentiation of the field. Our content analysis underlines this observation about the 'two faces' of the TH. We conclude this paper with a discussion of future opportunities for research; we see great potential in developing the application side of TH indicators. This study examines the implications of the predicted big data revolution in the social sciences for research using the Triple Helix (TH) model of innovation and knowledge creation in the context of developing and transitional economies. While big data research promises to transform the nature of social inquiry and improve the world economy by increasing the productivity and competitiveness of companies and enhancing the functioning of the public sector, it may also lead to a growing divide in research capabilities between developed and developing economies. More specifically, given uneven access to digital data and the scarcity of computational resources and talent, developing countries are at a disadvantage when it comes to employing data-driven, computational methods for studying TH relations between universities, industries, and governments. A scientometric analysis of the TH literature conducted in this study reveals a growing disparity between developed and developing countries in their use of innovative computational research methods. As a potential remedy, an extension of the TH model is proposed to include non-market actors as subjects of study as well as potential providers of computational resources, education, and training. By considering Korea's presidential election of December 19, 2012, this study examines how a presidential campaign can be measured using (negative) entropy indicators. We collected data from Google-indexed web documents, Twitter, and Facebook for four time periods. 
More specifically, we measured bilateral, trilateral, and quadruple relationships based on the number of web and social media mentions referring only to a candidate (that is, with no mention of other candidates or the term "president"). The results indicate that Twitter tended to generate the highest entropy values across the three time periods, but that President Geun-Hye Park outperformed the other candidates across all three periods on Google in terms of (negative) entropy indicators. This study aims to explore the effects of both journal self-citations and mutual citations within a group of journals on the increase in the impact factors (IFs) of social sciences journals published in Eastern Europe. We found that the practice of mutual citation is prevalent among the new journals, a trend that raises questions about possible manipulation of the IF and the potential isolation of these journals from the international network of scholarly communication. Based on country-level comparisons, this study applies geographic (internal vs. external) and knowledge (exploitation vs. exploration) boundaries to explore the influence of knowledge sources and ambidexterity on production and innovation performance in the thin film transistor-liquid crystal display (TFT-LCD) industries of the three major players, Japan, Korea, and Taiwan, from 1995 to 2009. Our findings suggest that different resource-based industrial development strategies are associated with specific knowledge acquisition strategies in the technology leader, Japan, and its followers, Korea and Taiwan. The contribution of this study is empirical verification of the influence of knowledge sources and ambidextrous capabilities on production and innovation activities in the TFT-LCD industries of these countries. 
Since each country is endowed with different resources, this study aims to reveal the implications for the design of an industrial strategy that must acquire both known and new knowledge through internal and external sources simultaneously, while carefully integrating them and exploiting their interactions. The family of indicators presented in this paper includes indices created by taking into account not only the direct but also the indirect impact of citations and references. Three types of citation graphs are presented, namely the Paper-Citation graph, the Author-Citation graph, and the Journal-Citation graph, along with different methods for constructing them. In addition, the concept of generations of citations is examined in detail, again by presenting the various methods for defining them found in the literature. Finally, a number of indirect indicators for papers, authors, and journals are discussed, which, among others, include PageRank, CiteRank, the indirect h-index, and the EigenFactor score. Journal impact factors (JIFs) are computed by Thomson Reuters to three decimal places. Some authors have cast doubt on the validity of the third decimal place in JIFs. In this paper I present a new approach to evaluating the significance of decimal places in JIFs. To do so, two modified JIFs were computed by adding or removing one citation from the number used by Thomson Reuters to compute the JIF for journals listed in the 2008 Journal Citation Report. The rationale is that one citation is the minimum amount of impact that can be observed and analyzed. Next, the modified JIFs were compared with the original JIF to identify the decimal place that changed as a consequence of adding or removing one citation. The results suggest that for about two-thirds of journals, the number of decimal places used by Thomson Reuters to compute JIFs can be considered appropriate for the most part. The measurement of the quality of academic research is a rather controversial issue. 
Recently Hirsch proposed a measure that has the advantage of summarizing in a single statistic the information contained in the citation counts of each scientist. Since that seminal paper, a large body of research has followed, focusing on the one hand on the development of correction factors to the h index and, on the other, on the pros and cons of the measure, proposing several possible alternatives. Although the h index has received a great deal of interest since its inception, only a few papers have analyzed its statistical properties and implications. In the present work we propose a statistical approach to derive the distribution of the h index. To achieve this objective we work directly on the two basic components of the h index, the number of produced papers and the related citation counts vector, by introducing convolution models. Our proposal is applied to a database of homogeneous scientists made up of 131 full professors of statistics employed in Italian universities. The results show that while "sufficient" authors are reasonably well detected by a crude bibliometric approach, outstanding ones are underestimated, motivating the development of a statistically based h index. Our proposal offers such a development, in particular confidence intervals to compare authors as well as quality control thresholds that can be used as target values. The paper presents results from social network analysis applied to data on the patenting of academic inventors employed in two Italian universities (Trieste University and Udine University, both located in the Friuli Venezia Giulia region). The aim is to compare the co-invention networks generated by the academic inventors, tenured by one of the two universities, in their patenting activity with several organisations (firms and public research organisations) and in their activity for patents owned by one of the two universities. 
Results show that, despite the structural similarity, non-marginal differences emerge in the interaction of the two forms of patenting across the two universities. The empirical evidence suggests new research questions related in particular to the role played by the differing university patenting strategies in shaping local networks. It is examined whether the relationship J ≈ A/r^α, and the subsequent coauthor (CA) core notion (Ausloos, Scientometrics 95(3):895-909, 2013), between the number (J) of joint publications (JPs) by a "main scientist" [leading investigator (LI)] with her/his CAs, can be extended to a team-like system. This is done by considering that each CA can be so strongly tied to the LI that they form binary scientific star (BSS) systems with respect to their other collaborators. Moreover, publications in peer review journals and in "proceedings", often thought to be of "different quality", are distinguished separately. The role of the time interval for measuring J and α is also examined. New indirect measures are also introduced. To make the point, two LI cases with numerous CAs are studied. It is found that only a few BSS need to be usefully examined. The exponent α turns out to be weakly dependent on the "second scientist", but still "size" and "publication type" dependent, according to the number of CAs or JPs. The CA core value is found to be (CA or JP) size and publication type dependent, but remains in an understandable range. Somewhat unexpectedly, no special qualitative difference in the BSS CA core value is found between publications in peer review journals and in proceedings. In conclusion, some remarks are made on partner cooperation in BSS teams. It is suggested that such measures can serve as criteria for distinguishing the roles of scientists in a team. 
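The rank-size relation J ≈ A/r^α above can be fitted by ordinary least squares in log-log space. The sketch below uses made-up joint-publication counts (roughly following 60/r) and also computes an h-index-style coauthor core for illustration; the exact core definition in the cited paper may differ, and none of the numbers come from the study itself.

```python
from math import log, exp

# Hypothetical joint-publication counts J_r for a leading investigator's
# coauthors, ranked r = 1, 2, ... by decreasing J (illustrative numbers).
J = [60, 31, 20, 16, 12, 10, 9, 8, 7, 6]

# Least-squares fit of log J = log A - alpha * log r (the Ausloos form).
xs = [log(r) for r in range((1), len(J) + 1)]
ys = [log(j) for j in J]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
alpha, A = -slope, exp(my - slope * mx)

# Coauthor "core", analogous to the h-index: the largest m such that the
# m-th ranked coauthor has at least m joint publications with the LI.
core = max((m for m in range(1, len(J) + 1) if J[m - 1] >= m), default=0)
print(round(alpha, 2), round(A, 1), core)
```

With data following 60/r the fit recovers α close to 1, which is the typical shape such rank plots take when the relation holds.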
This article, which began as an effort to gauge trends in and contributions to the broad field of "entrepreneur/entrepreneurship," reviews 5,476 academic articles on entrepreneurship that were published in 522 Social Sciences Citation Index and Science Citation Index journals from 1996 to June 2012. The survey identifies keywords and uses them to search for and identify related articles in the Institute for Scientific Information Web of Science database. We then present our findings, including the number of publications by year, a categorization of article types, and the main academic journals, authors, and most-cited articles. The citation counts for authors, journals, and articles are also analyzed. The study indicates that the number of articles related to the keyword entrepreneur increased from 1996 to the end of 2011, a sign of an upward trend in the influence of entrepreneurs. Entrepreneur research fascinated numerous scholars over the 16.5-year study period. In particular, researchers from the USA, England, Canada, Germany, and the Netherlands have made the most contributions to this field. This literature review provides evidence that the concept of the entrepreneur attracted academic researchers, resulting in significant contributions to the field of entrepreneur research. A bibliometric study based on the Science Citation Index Expanded was carried out to provide insights into research activities on bioinformatics in China. Annual publication output increased continuously both worldwide and for China from 1998 to 2012. In recent years, China showed faster growth rates than the world average. As the second most productive country in the field of bioinformatics, China did not do equally well in terms of citation counts and h-index. 
Chinese Academy of Sciences and Shanghai Jiao Tong University were among the ten most productive institutes in the world, and their basic metrics and collaboration patterns were compared with other institutes, especially two institutes from Japan. The journal PLoS One was found to have published the most papers from China. In addition, this paper compared the most active categories in Web of Science worldwide with those of China. Personal perspectives of bioinformatics research in China were also presented. In this paper the question of returns to scale in scientific production is analysed using non-parametric techniques of multidimensional efficiency measurement. Based on survey data for German research groups from three scientific fields, it is shown that the multidimensional production possibility sets are weakly non-convex and locally strictly non-convex. This suggests that the production functions for the groups in the sample are characterised by increasing returns to scale in some regions and at least constant returns to scale otherwise. This has two implications for the organisation of scientific research: first, the size of at least some groups in our sample is suboptimal and they would benefit from growth. Second, greater specialisation in certain tasks in science (e.g. transfer-oriented groups vs. research-oriented groups) would increase the output of the overall system. Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on development of methods and algorithms. 
Differences in the type of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms. Here we show how the same organizational structures can arise across seemingly unrelated domains of human activity. To this end we examine the examples of academic journal publishing and the stock market. A number of academic journals with low prestige and limited resources may compete in the same selection process for high-quality manuscripts. This shared selection process is performed by an independent editorial committee. A journal editor is interested in maximizing the growth rate of journal wealth based on an optimal strategy of allocations to candidate manuscripts. Here we introduce the system of optimality equations for the maximization problem. Next, we find an optimal set of manuscripts to allocate to, as well as the optimal allocation fractions. The method can be easily implemented by a simple algorithm for use in the shared selection process of high-quality manuscripts. The proposed structure presents a loose network of economic transactions, i.e. journal editors compete on something like a manuscript market by placing stakes and risking their money. We provide a publicly available suite of web-based tools designed for the computation of the optimal set of manuscripts and the respective allocation fractions. Examples of the performance of the web application for allocating journal resources are presented for two different selection processes of high-quality manuscripts. 
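The growth-rate maximization described above is analogous to log-optimal (Kelly) betting on mutually exclusive outcomes. As a hedged sketch, not the paper's actual optimality equations, the code below brute-forces the allocation fractions that maximize expected log growth for three hypothetical manuscripts; the probabilities `p` and payoff multipliers `odds` are invented for illustration.

```python
from math import log
from itertools import product

# Hypothetical setup: three candidate manuscripts compete in the shared
# selection; p[i] is the (assumed known) probability that manuscript i is
# the one accepted, and odds[i] is the payoff multiplier on the resources
# staked on it.
p = [0.5, 0.3, 0.2]
odds = [1.8, 3.5, 4.0]

def growth(f):
    """Expected log growth rate of wealth for allocation fractions f."""
    return sum(pi * log(fi * oi) for pi, fi, oi in zip(p, f, odds))

# Brute-force search over a grid of full allocations summing to 1
# (every fraction is at least 1/steps, so log() is always defined).
steps = 100
best, best_f = float("-inf"), None
for a, b in product(range(1, steps), repeat=2):
    c = steps - a - b
    if c < 1:
        continue
    f = (a / steps, b / steps, c / steps)
    g = growth(f)
    if g > best:
        best, best_f = g, f

# For mutually exclusive outcomes with all wealth allocated, the
# log-optimal fractions equal the probabilities, independent of the odds.
print(best_f)
```

The grid search recovers the textbook result f_i = p_i; a real system would of course derive the allocation analytically rather than by enumeration.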
A scientometric analysis of Babeș-Bolyai University in Romania is provided, highlighting the strong and weak points with respect to a range of leading international universities and referring to some extent to nation-wide data from several countries. Taken into account are such items as the total number of publications, analyses per subject area or per research field, number of citations, types of publications, Hirsch indexes, and books. Internationally, chemistry, physics, mathematics, computer science, religion, area studies, geology, paleontology, and public administration are identified as the most active areas. Nationally, a number of additional strong points are identified, such as psychology, history, and environmental sciences. The percentage of researchers with reasonably high activity (e.g., at least ~1 publication per year as indexed in major databases) is relatively low (~10 %), and the percentage with reasonably high international competitiveness (based on citation counts, number of publications, and books indexed in international libraries) is at only ~2 %. The decisive factor controlling an exponential increase in publications since ~2000-2004 appears to have been a conservatively managed exponential increase of the national GDP and implicitly of the research budgets. The degree to which scholarly journal articles published in subscription-based journals could be provided open access (OA) through publisher-permitted uploading to freely accessible web locations, so-called green OA, is an underexplored area of research. This study combines article volume data originating from the Scopus bibliographic database with manually coded publisher policies of the 100 largest journal publishers measured by article output volume for the year 2010. 
Of the 1.1 million articles included in the analysis, 80.4 % could be uploaded either as an accepted manuscript or as the publisher version to an institutional or subject repository after one year of publication. Publishers were found to be substantially more permissive in allowing accepted manuscripts on personal webpages (78.1 % of articles) or in institutional repositories (79.9 %) compared to subject repositories (32.8 %). With previous studies suggesting that realized green OA is around 12 % of total annual articles, the results highlight the substantial unused potential for green OA. Peer evaluation of research grant applications is a crucial step in the funding decisions of many science funding agencies. Funding bodies take various measures to increase the independence and quality of this process, sometimes leading to difficult combinatorial problems. We propose a novel method based on network flow theory to find assignments of evaluators to grant applications that obey the rules formulated by the Slovak Research and Development Agency. In an era of energy crisis, biomass-based bioenergy research has attracted the attention of R&D managers and policy makers around the world. This study explores the structural and dynamic patterns of biomass-based bioenergy research. We measure the profile of biomass research at both the macro (nations) and meso (institutions) levels in an international context. We find that biomass publications are concentrated in developed regions and some emerging economies. The U.S. leads in this emerging field as evidenced by research quantity, impact, and international collaboration links. China is developing rapidly in this domain in terms of publication volume and collaborative links but suffers from low research visibility. The study also finds that strong interactions are taking place both within and between macro-disciplines. Research limitations are presented. 
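A minimal illustration of the network-flow idea behind the evaluator-assignment method above: model evaluators and applications as a bipartite flow network and compute a maximum flow with a unit-augmenting Edmonds-Karp search. The capacities, review demands, and conflict-of-interest pairs below are invented for illustration; the Agency's actual rules are not reproduced here.

```python
from collections import deque, defaultdict

# Illustrative data: each application needs `need` reviews, each evaluator
# can take at most `cap` applications, and `conflict` pairs are forbidden.
evaluators = ["E1", "E2", "E3"]
applications = ["A1", "A2"]
cap = {"E1": 1, "E2": 2, "E3": 1}
need = {"A1": 2, "A2": 2}
conflict = {("E1", "A2")}

# Flow network: S -> evaluator (capacity cap), evaluator -> application
# (capacity 1 unless conflicted), application -> T (capacity need).
capacity = defaultdict(int)
graph = defaultdict(list)

def add_edge(u, v, c):
    graph[u].append(v)
    graph[v].append(u)          # residual arc
    capacity[(u, v)] += c

for e in evaluators:
    add_edge("S", e, cap[e])
    for a in applications:
        if (e, a) not in conflict:
            add_edge(e, a, 1)
for a in applications:
    add_edge(a, "T", need[a])

def augmenting_path():
    """BFS for a path S -> T with residual capacity; None if saturated."""
    parent = {"S": None}
    queue = deque(["S"])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent and capacity[(u, v)] > 0:
                parent[v] = u
                if v == "T":
                    path, node = [], "T"
                    while node is not None:
                        path.append(node)
                        node = parent[node]
                    return path[::-1]
                queue.append(v)
    return None

# Push one unit per augmentation (simple, and enough for unit demands).
flow = 0
while (path := augmenting_path()) is not None:
    for u, v in zip(path, path[1:]):
        capacity[(u, v)] -= 1
        capacity[(v, u)] += 1
    flow += 1

# A pair is assigned iff its unit edge is saturated, i.e. the residual
# (reverse) arc now carries capacity.
assignment = sorted((e, a) for e in evaluators for a in applications
                    if (e, a) not in conflict and capacity[(a, e)] > 0)
print(flow, assignment)
```

If the maximum flow equals the total review demand, every application gets its required number of non-conflicted evaluators; otherwise the rules are infeasible with the given capacities.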
Drawing on social choice theory we derive a procedure in which each reviewer is asked to provide his or her second, third, and fourth choices in addition to his or her first-choice recommendation regarding the acceptance/revision/rejection of a given manuscript. All reviewers' hierarchies of alternatives are collected and combined such that an overall ranking can be computed. Consequently, conflicting recommendations are resolved not by asking a third adjudicating reviewer for his or her recommendation, as is the usual editorial praxis in many scientific journals, but rather by using more of the information from the available judges. After a brief introduction to social choice theory and a description and justification of the maximum likelihood rule for ranking alternatives, we describe and demonstrate a publicly available web application that provides easy-to-use tools to apply these methods for aggregating conflicting reviewers' recommendations. This application might be accessed by editors to aid their decision process in case they receive conflicting recommendations from their reviewers. Several publication metrics are used for the evaluation of academic productivity. The h index and g index are relatively new statistics for this purpose. Our aim is to evaluate academic psychiatrists' h and g indices at different academic ranks in the United States. 30 psychiatry programs from the American Medical Association's FREIDA online database were included in the study. For each academic rank, the total number of papers (P (total)), the number of single-authored papers (P (single)), and the h and g indices of faculty members were calculated, with one-way ANOVA for multiple comparisons as the primary analysis. The metric medians were as follows: P (total) = 34.5, P (single) = 13, g index = 19.5, and h index = 9. The h index significantly differed between academic ranks except chairperson-professor. 
The other indices failed to distinguish the junior academic ranks (associate professor-assistant professor) in addition to chairperson-professor. The strongest correlation was between the h and g indices. Of the indices evaluated, the h index tracked best with academic rank in the psychiatry programs studied. This paper presents a bibliometric study of the world's research activity in Sustainable Development using the scientific literature. The study was conducted using data from the Scopus database over the time period 2000-2010. We investigated the research landscape in Sustainable Development at the country level and at the institute level. Sustainable Development and its sub-areas are defined by keywords vetted by domain experts, allowing publications to be identified independent of the journals and conferences in which they are published. The results indicate that institutes strong in Sustainable Development overall may not be strong in all sub-areas and that institutes not strong in Sustainable Development overall may have significant niche strengths in a given sub-area. It is also noted that China appears strong in terms of publication output in Sustainable Development and its sub-areas but does not appear strong in terms of citation counts. The information produced in this study can be useful for government research agencies in terms of understanding how to more effectively knit together the various niche strengths in the country, and for institutes seeking strategic partners that can coordinate in niche areas of Sustainable Development and complement their strengths. In order to conduct bibliometric analysis in an interdisciplinary research area, the keyword collection approach appears to be very useful. This approach is flexible and can be used to conduct such analysis for interdisciplinary research fields. Authors have various motivations in citing references during scientific production. 
The study of these motivations has led to the introduction of different theories of citing behavior, such as normative theory and social constructivist theory. Using the social constructivist approach to citing behavior, this research introduces citing conformity, whereby some authors' social, personal, or non-professional citing behaviors are determined by societal pressure. This is explained at three levels, namely normative, informational, and identification. This paper aims to design, validate, and determine the reliability of a questionnaire to measure citing conformity at these three levels. To devise the instrument, a questionnaire with 45 items was preliminarily designed. After the face validity of the questionnaire had been determined by ten scholars, data was gathered. 150 Iranian authors with at least two articles indexed in the Arts and Humanities Citation Index (AHCI) or Social Science Citation Index (SSCI) during the period 2001-2010 were selected using systematic random allocation and were asked to fill out the questionnaire. Exploratory factor analysis was used to analyze the data. Factor analysis was administered using principal components analysis (PCA) with Varimax rotation, eigenvalues greater than one, and a factor loading of 0.45 to extract three factors. Out of 45 items, 11 were deleted due to low factor loadings. The remaining 34 items were retained and constitute three factors: normative (13 items), informational (13 items), and identification (8 items). The KMO coefficient was 0.726 and the Bartlett sphericity index was 2431.91 (P < 0.0001), which confirmed the sufficiency of the sample size and the reliability of the test. Cronbach's alpha was employed to determine the reliability of the instrument. The Cronbach's alpha coefficients for the normative, informational, and identification conformities were 0.86, 0.81, and 0.85 respectively, so the reliability of all the factors was acceptable, with approximately high coefficients. 
As the Cronbach's alpha coefficients convey, the reliability of all factors was acceptable. The development of a citing conformity instrument at the normative, informational, and identification levels provides a scale to measure authors' citing behavior in its social, personal, or non-professional aspects according to the above-mentioned psychological variables (normative, informational, and identification conformities). Therefore, this instrument will be able to explain authors' citing behavior and motivations across a large extent of a subject area. The scientific knowledge contributed by industries remains ambiguous because prior studies have evaluated only specific industries, large companies, or industries collaborating with universities. We conducted a bibliometric analysis of data in the Web of Science database to explore the research partners of industries and determine which industries generate scientific articles, observing industrial trends in Taiwan from 1982 to 2011. The results showed that articles were published related to 26 industries, and the electronic components industry generated the highest percentage of articles (42.4 %), followed by the computer, electronics, and optical product industry (12.3 %). High-tech industries dominated, generating 84.5 % of the articles and demonstrating an annual increase in publications. In addition, industry researchers tended to cooperate with researchers affiliated with domestic institutions, particularly universities. Those in high-tech industries produced a higher percentage of articles coauthored with universities compared with those in low-tech industries. In recent years, cultural heritage institutions have increasingly used social tagging. To better understand the nature of these tags, we analyzed tags assigned to a collection of 100 images of art (provided by the steve.museum project) using subject matter categorization. 
Our results show that the majority of tags describe the people and objects in the image and are generic in nature. This contradicts prior subject matter analyses of queries, tags, and index terms of other image collections, suggesting that the nature of social tags largely depends on the type of collection and on user needs. This insight may help cultural heritage institutions improve their management and use of tags. Search engine users typically engage in multiquery sessions in their quest to fulfill their information needs. Despite a plethora of research findings suggesting that a significant group of users look for information within a specific geographical scope, existing reformulation studies lack a focused analysis of how users reformulate geographic queries. This study comprehensively investigates the ways in which users reformulate such needs in an attempt to fill this gap in the literature. Reformulated sessions were sampled from a query log of a major search engine to extract 2,400 entries that were manually inspected to filter geo sessions. This filter identified 471 search sessions that included geographical intent, and these sessions were analyzed quantitatively and qualitatively. The results revealed that one in five of the users who reformulated their queries were looking for geographically related information. They reformulated their queries by changing the content of the query rather than the structure. Users were not following a unified sequence of modifications and instead performed a single reformulation action. However, in some cases it was possible to anticipate their next move. A number of tasks in geo modifications were identified, including standard, multi-needs, multi-places, and hybrid approaches. 
The research concludes that it is important to specialize query reformulation studies to focus on particular query types rather than analyzing them generically, as it is apparent that geographic queries have their own special reformulation characteristics. This study explores the use of online newsgroups and discussion groups by people in situations of information poverty. Through a qualitative content analysis of 200 posts across Internet groups, we identify topics and information needs expressed by people who feel they have no other sources of support available to them. We uncover various health, well-being, social, and identity issues that are not only crucial to the lives of the people posting but which they are unwilling to risk revealing elsewhere, offering evidence that these online environments provide an outlet for the expression of critical and hidden information needs. To enable this analysis, we first describe our method for reliably identifying situations of information poverty in messages posted to these groups and outline our coding approach. Our work contributes to the study of both information seeking within the context of information poverty and the use of Internet groups as sources of information and support, bridging the two by exploring the manifestation of information poverty in this particular online setting. E-patients seeking information online often seek specific advice related to coping with their health condition(s) on social networking sites. They may be looking for social connectivity with compassionate strangers who may have experienced similar situations, to share opinions and experiences, rather than for authoritative medical information. Previous studies document distinct technological features and different levels of social support interaction patterns. The design of social media functions is thus expected to have an impact on user behavior in social support exchange. 
In this part of a multipart study, we investigate the social support types, in particular informational support types, across multiple computer-mediated communication formats (forum, journal, and notes) within an alcoholism community using descriptive content analysis on 3 months of data from a MedHelp online peer support community. We present the results of identified informational support types, including advice, referral, fact, personal experiences, and opinions, either offered or requested. The fact type was exchanged most often among the messages; however, there were some different patterns between notes and journal posts. Notes were used for maintaining relationships rather than as a main source for seeking information. Notes were similar to comments made on journal posts, which may indicate friendship between journal readers and the author. These findings suggest that users may have initially joined the MedHelp Alcoholism Community for information-seeking purposes but continue participation even after they have completed their information gathering because of the relationships they formed with community members through social media features. Searches for specific factual health information constitute a significant part of consumer health information requests, but little is known about how users search for such information. This study attempts to fill this gap by observing users' behavior while using MedlinePlus to search for specific health information. Nineteen students participated in the study, and each performed 12 specific tasks. During the search process, they submitted short queries or complete questions, and they examined fewer than 1 result per search on average. Participants rarely reformulated queries; when they did, they tended to make a query more specific or more general, or iterate in different ways. Participants also browsed, relying primarily on the alphabetical list and the anatomical classification, to navigate to specific health topics. 
Participants overall had a positive experience with MedlinePlus, and the experience was significantly correlated with task difficulty and participants' spatial abilities. The results suggest that, to better support specific item search in the health domain, systems could provide a more "natural" interface to encourage users to ask questions; effective conceptual hierarchies could be implemented to help users reformulate queries; and the search results page should be reconceptualized as a place for accessing answers rather than documents. Moreover, multiple schemas should be provided to help users navigate to a health topic. The results also suggest that users' experience with information systems in general and health-related systems in particular should be evaluated in relation to contextual factors, such as task features and individual differences. Multi-session search tasks are complex and span more than one web session. Such tasks are challenging because searchers must keep track of their search progress and the information they encounter across sessions. Multi-session tasks can be cognitively taxing for visually impaired users because the lack of persistence of screen readers causes the load on working memory to be high. In this article, we first discuss the habitual behavior of visually impaired participants for multi-session tasks when using popular search interfaces. We then present the evaluation of a search interface developed to support complex information seeking for visually impaired users. The user evaluation was structured in two sessions to simulate a multi-session task. Thus, we discuss the strategies observed among participants to resume the search, to review previously encountered information, and to satisfy their evolved information need. We also compare the information-seeking behavior across the two sessions and examine how the proposed interface supports participants for multi-session tasks. 
Findings from this evaluation contribute to our understanding of the information-seeking behavior of visually impaired users and have implications for the design of tools to support searchers in managing and making sense of information during multi-session search tasks. This study examines early warning from the users' perspective as a special category of information seeking. Specifically, we look at the 2009 Victorian bushfires in Australia as an instructive case of early warning information seeking. The bushfires, the worst in Australia's recorded history, were unique not only in their ferocity and the damage they caused, but also in the amount of data and research they generated. We analyzed the affected residents' information needs, seeking, and use in terms of their cognitive, affective, and situational dimensions. We found that residents wanted information that would act as a "trigger for action," provide timely warning, and clearly indicate fire severity. Nearly two thirds of residents surveyed did not receive an official warning. Almost half first found out that the bushfire was in their area through personal observation of smoke, embers, or flames. We suggest that a form of normalcy bias may have been at work during information seeking, causing people to interpret their situations as "normal" even when disaster warnings had been issued. Although the authorities had adopted a "Stay or Go" policy to help residents use warning information to decide between staying to defend their property or leaving early, the policy's effectiveness was undermined by information challenges. Here we present an investigation of the use of computers and mobile phones by Hispanic day laborers at Casa Latina, a community-building nonprofit organization for Latino immigrants in Seattle, Washington. 
Drawing from 95 structured interviews, 6 in-depth interviews, a focus group, and a series of participatory observations of computer training classes at Casa Latina, we find that information and communication technologies (ICT) help immigrant day laborers remain connected with their families and their employers and facilitate their navigation of, and integration into, the society in which they have precarious social and economic standing. ICT help immigrant day laborers maintain links with their past and their roots, offer tools to navigate their present needs, and help them build future plans and aspirations. Hispanic day laborers experience ICT mostly through mobile phones used to communicate with employers and families; they use computers and the Internet to communicate with family and friends. In addition, the experience of immigrant day laborers is strongly influenced by their English-language proficiency, which helps them navigate daily life in the United States and communicate with employers, and by their use of transportation to move around the city for work and daily life. The results of this study offer new insight into the ways in which day laborers in Seattle use ICT to help them meet personal and employment needs and realize their long-term goals. Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. 
In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets. News portals are a popular destination for web users. News providers are therefore interested in attaining higher visitor rates and promoting greater engagement with their content. One aspect of engagement deals with keeping users on site longer by allowing them to have enhanced click-through experiences. News portals have invested in ways to embed links within news stories but so far these links have been curated by news editors. Given the manual effort involved, the use of such links is limited to a small scale. In this article, we evaluate a system-based approach that detects newsworthy events in a news article and locates other articles related to these events. Our system does not rely on resources like Wikipedia to identify events, and it was designed to be domain independent. A rigorous evaluation, using Amazon's Mechanical Turk, was performed to assess the system-embedded links against the manually-curated ones. Our findings reveal that our system's performance is comparable with that of professional editors, and that users find the automatically generated highlights interesting and the associated articles worthy of reading. Our evaluation also provides quantitative and qualitative insights into the curation of links, from the perspective of users and professional editors. 
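The second stage of the two-stage L2R active learning method described earlier (query-by-committee selection, seeded by the first-stage association-rule sample) can be illustrated with a minimal sketch. The committee of linear scorers, the toy feature vectors, and the use of score variance as the disagreement measure are illustrative assumptions, not the authors' implementation:

```python
import statistics

def committee_scores(features, committee):
    """Score one document's feature vector with every committee member
    (each member is a weight vector of a simple linear ranker)."""
    return [sum(w * x for w, x in zip(member, features)) for member in committee]

def select_by_disagreement(unlabeled, committee, budget):
    """Pick the `budget` unlabeled instances the committee disagrees on
    most, measured here as the variance of the members' scores."""
    ranked = sorted(
        unlabeled,
        key=lambda feats: statistics.pvariance(committee_scores(feats, committee)),
        reverse=True,
    )
    return ranked[:budget]

# Toy data: 2-feature vectors for unlabeled documents.
unlabeled = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1], [0.2, 0.2]]
# A hypothetical committee of three perturbed linear models.
committee = [[1.0, -1.0], [0.8, -0.6], [-0.2, 0.4]]

picked = select_by_disagreement(unlabeled, committee, budget=2)
print(picked)
```

In an iterative loop, the selected instances would be labeled, added to the training set, and the committee retrained until the labeling budget or a target effectiveness is reached.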
The success of an enterprise information retrieval system is determined by interactions among three key entities: the search engine employed; the service provider who delivers, modifies, and maintains the engine; and the users of the service within the organization. Evaluations of enterprise search have predominantly focused on the effectiveness and efficiency of the engine, with very little analysis of user involvement in the process, and none on the role of service providers. We propose and evaluate a model of the costs and benefits to a service provider when investing in enhancements to the ranking of documents returned by its search engine. We demonstrate the model through a case study that analyzes the potential impact of using domain experts to provide enhanced mediated search results. By demonstrating how to quantify the cost and benefit of an improved information retrieval system to the service provider, our case study shows that using the relevance assessments of domain experts to rerank original search results can significantly improve the accuracy of ranked lists. Moreover, the service provider gains a substantial return on investment and a higher search success rate by investing in the relevance assessments of domain experts. Our cost and benefit analysis results are contrasted with standard modes of effectiveness analysis, including quantitative (using measures such as precision) and qualitative (through user preference surveys) approaches. Modeling costs and benefits explicitly can provide useful insights that the other approaches do not convey. Using the referencing patterns in articles in Cognitive Science over three decades, we analyze the knowledge base of this literature in terms of its changing disciplinary composition. Three periods are distinguished: (A) construction of the interdisciplinary space in the 1980s, (B) development of an interdisciplinary orientation in the 1990s, and (C) reintegration into "cognitive psychology" in the 2000s. 
The fluidity and fuzziness of the interdisciplinary delineations in the different visualizations can be reduced and clarified using factor analysis. We also explore newly available routines ("CorText") to analyze this development in terms of "tubes" using an alluvial map and compare the results with an animation (using "Visone"). The historical specificity of this development can be compared with the development of "artificial intelligence" into an integrated specialty during this same period. Interdisciplinarity should be defined differently at the level of journals and of specialties. Almost any conceivable authorship attribution problem can be reduced to one fundamental problem: whether a pair of (possibly short) documents were written by the same author. In this article, we offer an (almost) unsupervised method for solving this problem with surprisingly high accuracy. The main idea is to use repeated feature subsampling methods to determine if one document of the pair allows us to select the other from among a background set of "impostors" in a sufficiently robust manner. The authors sought to systematically examine patterns of change over time, as well as current variations, in educational journal specifications and publishing, including higher education. The 100 journals in this sample included research and practitioner venues across a range of subdomains and specializations. The authors gathered data for 3 points in time, at 10-year intervals during the 20-year period from 1989 to 2009. They examined the following in detail: journal profiles for publishing characteristics, submission specifications for manuscript parameters, and full-text published manuscripts for compliance with specifications. All were analyzed as point-in-time comparisons as well as trajectories of change over time. Journal profiles demonstrated patterns of increased centralization and digitization, inclusiveness of readership, increased frequency of issues, and increased length of articles. 
Key findings for manuscript parameter trends included increased specificity of detail, range of manuscript types and research designs accepted, and average manuscript length. In addition, journals have more explicitly specified elements previously left implicit, such as professional and ethical standards. Criteria for submission procedures and manuscript quality are consistent with university faculty performance standards for productivity and technological advancement. Findings carry implications for publishing, journal management, faculty work, and performance evaluation. Citations measure an aspect of scientific quality: the impact of publications (A.F.J. van Raan, 1996). Percentiles normalize the impact of papers with respect to their publication year and field without using the arithmetic average. They are suitable for visualizing the performance of a single scientist. Beam plots make it possible to present the distributions of percentiles in the different publication years combined with the medians from these percentiles within each year and across all years. The notions that information seeking is not always a solitary activity and that people working in collaboration for information intensive tasks should be studied and supported have become more prevalent in recent years. Several new research questions, methodologies, and systems have emerged around these notions that may prove to be useful beyond the field of collaborative information seeking (CIS), with relevance to the broader area of information seeking and behavior. This article provides an overview of such key research work from a variety of domains, including library and information science, computer-supported cooperative work, human-computer interaction, and information retrieval. It starts with explanations of collaboration and how CIS fits in different contexts, emphasizing the interactive, intentional, and mutually beneficial nature of CIS activities. 
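The percentile normalization described above (ranking a paper's citation count within its publication-year and field reference set, rather than dividing by an arithmetic average) can be sketched minimally as follows. This is an illustrative sketch with made-up records; the handling of ties and the exact percentile definition vary between bibliometric implementations:

```python
from collections import defaultdict

def citation_percentiles(papers):
    """papers: list of (year, field, citations) tuples. Returns a parallel
    list of percentile ranks in [0, 100], each computed within the paper's
    (year, field) reference set."""
    groups = defaultdict(list)
    for year, field, cites in papers:
        groups[(year, field)].append(cites)
    result = []
    for year, field, cites in papers:
        ref = groups[(year, field)]
        # Percentile = share of papers in the reference set cited less often.
        below = sum(1 for c in ref if c < cites)
        result.append(100.0 * below / len(ref))
    return result

papers = [
    (2009, "physics", 50),
    (2009, "physics", 10),
    (2009, "physics", 0),
    (2010, "physics", 5),
]
print(citation_percentiles(papers))
```

A beam plot would then display, per publication year, the distribution of these percentile values together with the yearly and overall medians.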
Relations to similar and related fields such as collaborative information retrieval, collaborative information behavior, and collaborative filtering are also clarified. Next, the article presents a synthesis of various frameworks and models that exist in the field today, along with a new synthesis of 12 different dimensions of group activities. A discussion on issues and approaches relating to evaluating various parameters in CIS follows. Finally, a list of known issues and challenges is presented to provide an overview of research opportunities in this field. Open access (OA) is free, unrestricted access to electronic versions of scholarly publications. For peer-reviewed journal articles, there are two main routes to OA: publishing in OA journals (gold OA) or archiving of article copies or manuscripts at other web locations (green OA). This study focuses on summarizing and extending current knowledge about green OA. A synthesis of previous studies indicates that green OA coverage of all published journal articles is approximately 12%, with substantial disciplinary variation. Typically, green OA copies become available after considerable time delays, partly caused by publisher-imposed embargo periods, and partly by author tendencies to archive manuscripts only periodically. Although green OA copies should ideally be archived in proper repositories, a large share is stored on home pages and similar locations, with no assurance of long-term preservation. Often such locations contain exact copies of published articles, which may infringe on the publisher's exclusive rights. The technical foundation for green OA uploading is becoming increasingly solid largely due to the rapid increase in the number of institutional repositories. The number of articles within the scope of OA mandates, which strongly influence the self-archival rate of articles, is nevertheless still low. This paper examines how scientists working in government agencies in the U.S. 
are reacting to the "ethos of sharing" government-generated data. For scientists to leverage the value of existing government data sets, critical data sets must be identified and made as widely available as possible. However, government data sets can only be leveraged when policy makers first assess the value of the data, in much the same way they decide the value of grants for research outside government. We argue that legislators should also remove structural barriers to interoperability by funding technical infrastructure according to issue clusters rather than administrative programs. As developers attempt to make government data more accessible through portals, they should consider a range of other nontechnical constraints attached to the data. We find that agencies react to the large number of constraints by mostly posting their data only on their own websites rather than in data portals that can facilitate sharing. Despite the nontechnical constraints, we find that scientists working in government agencies exercise some autonomy in data decisions, such as data documentation, that determine whether or not the data can be widely shared. Fortunately, scientists indicate a willingness to share the data they collect or maintain. However, we argue further that a complete measure of access should also consider the normative decisions to collect (or not) particular data. In the wake of the global financial crisis, the U.S. Dodd-Frank Wall Street Reform and Consumer Protection Act (Dodd-Frank) was enacted to provide increased transparency in financial markets. In response to Dodd-Frank, a series of rules relating to swaps record keeping has been issued, and one such rule calls for the creation of a financial products classification system. The manner in which financial products are classified will have a profound effect on data integration and analysis in the financial industry. 
This article considers various approaches that can be taken when classifying financial products and recommends the use of facet analysis. The article argues that this type of analysis is flexible enough to accommodate multiple viewpoints and rigorous enough to facilitate inferences that are based on the hierarchical structure. Various use cases are examined that pertain to the organization of financial products. The use cases confirm the practical utility of taxonomies that are designed according to faceted principles. Using citation data of articles written by some Nobel Prize winners in physics, we show that concave, convex, and straight curves represent different types of interactions between old ideas and new insights. These cases illustrate different diffusion characteristics of academic knowledge, depending on the nature of the knowledge in the new publications. This work adds to the study of the development of science and links this development to citation analysis. Understanding book-locating behavior in libraries is important and leads to more effective services that support patrons throughout the book-locating process. This study adopted a design-based approach to incorporate robotic assistance in investigating the book-locating behaviors of child patrons, and developed a service robot for child patrons in library settings. We describe the iterative cycles and process to develop a robot to assist with locating resources in libraries. Stakeholders, including child patrons and librarians, were consulted about their needs, preferences, and performance in locating library resources with robotic assistance. Their needs were analyzed and incorporated into the design of the library robot to provide comprehensive support. The results of the study suggest that the library robot was effective as a mobile and humanoid service agent for providing motivation and knowledgeable guidance to help child patrons in the initially complicated sequence of locating resources. 
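As a minimal sketch of what faceted organization buys in practice, consider products described along independent facets, with queries free to constrain any subset of them. The facet names and products below are hypothetical illustrations, not drawn from the Dodd-Frank rules or any actual classification standard:

```python
# Each product is described along independent facets; a faceted taxonomy
# lets any combination of facet values act as a classification viewpoint.
products = [
    {"name": "IRS-1", "asset_class": "rates",  "instrument": "swap",   "region": "US"},
    {"name": "CDS-1", "asset_class": "credit", "instrument": "swap",   "region": "EU"},
    {"name": "FXO-1", "asset_class": "fx",     "instrument": "option", "region": "US"},
]

def facet_filter(items, **constraints):
    """Return the items matching every given facet=value constraint."""
    return [p for p in items
            if all(p.get(facet) == value for facet, value in constraints.items())]

us_swaps = facet_filter(products, instrument="swap", region="US")
print([p["name"] for p in us_swaps])
```

Because the facets are independent, the same products can be regrouped by asset class, instrument type, or region without rebuilding the hierarchy, which is the flexibility the article attributes to facet analysis.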
Readership popularity has been an important proxy for success for many emerging online interactive media sites. Given the exponential growth of new web applications and services, and the intense competition among them, attaining and retaining popularity are difficult. One possible approach to this problem is to enhance the competitiveness of a web presence by using appropriate web design mechanisms. Research in this area has mainly focused on technological issues and usability studies. Few studies have paid attention to the socially constructed meaning (cognition) that is embedded in the context of media. Drawing on the heuristic model from the social cognition perspective, the author posits that source credibility and content freshness are two important media-embedded heuristics potentially influential to the readership popularity of online interactive media sites. The content analysis results of 100 leading web-logs strongly supported these hypotheses. Key quantitative findings were also consistent with the qualitative evidence provided by 28 expert practitioners. This study expands our understanding of the related constructs from a social cognition perspective and calls for design attention to influential media-embedded traits when developing online interactive media and related websites. Democracy is represented in web interface design (Li, 2010). Wittfogel's (1957) theory of Eastern autocracy states that 2 environmental dimensions, rainfall and sea border, influence the origin of democracy. This study examined Wittfogel's Eastern autocracy theory through statistical analysis of the average annual precipitation, land boundaries, latitudes, and annual temperature of 196 countries and territories with their freedom levels as defined by Freedom House, to determine the correlations between these geospatial factors and democracy. 
In addition, this study extended its investigation to web interface design by examining democracy represented in college/university websites in correlation with these geospatial factors. A total of 130 college/university websites selected from 65 countries were coded and examined systematically in linear and multiple regression analyses. This study concluded that democracy correlates positively with annual precipitation and latitude, but negatively with land boundaries and annual temperature. Furthermore, this study indicated that these 4 geospatial variables are associated with democracy represented in web interface design, although the associations are not statistically significant. This study also suggested that it is more accurate to predict democracy if the 4 geospatial factors are considered together as predictors. By examining Wittfogel's theory of hydraulic civilization in web interface design, this study not only extended its sociological perspective to the information science arena, but also provided a better understanding of the functionality of the Internet in information dissemination and its cultural and sociological aspects. Portfolio analysis of the publication profile of a unit of interest, ranging from individuals and organizations to a scientific field or interdisciplinary programs, aims to inform analysts and decision makers about the position of the unit, where it has been, and where it may go in a complex adaptive environment. A portfolio analysis may aim to identify the gap between the current position of an organization and a goal that it intends to achieve, or to identify the competencies of multiple institutions. We introduce a new visual analytic method for analyzing, comparing, and contrasting the characteristics of publication portfolios. The new method introduces a novel design of dual-map thematic overlays on global maps of science. 
Each publication portfolio can be added as one layer of dual-map overlays over 2 related, but distinct, global maps of science: one for citing journals and the other for cited journals. We demonstrate how the new design facilitates a portfolio analysis in terms of patterns emerging from the distributions of citation threads and the dynamics of trajectories as a function of space and time. We first demonstrate the analysis of portfolios defined on a single source article. Then we contrast publication portfolios of multiple comparable units of interest; namely, colleges in universities and corporate research organizations. We also include examples of overlays of scientific fields. We expect that our method will provide new insights to portfolio analysis. Bioinformatics is a fast-growing field based on the optimal use of "big data" gathered in genomic, proteomics, and functional genomics research. In this paper, we conduct a comprehensive and in-depth bibliometric analysis of the field of bioinformatics by extracting citation data from PubMed Central full-text. Citation data for the period 2000 to 2011, comprising 20,869 papers with 546,245 citations, was used to evaluate the productivity and influence of this emerging field. Four measures were used to identify productivity; most productive authors, most productive countries, most productive organizations, and most popular subject terms. Research impact was analyzed based on the measures of most cited papers, most cited authors, emerging stars, and leading organizations. Results show the overall trends between the periods 2000 to 2003 and 2004 to 2007 were dissimilar, while trends between the periods 2004 to 2007 and 2008 to 2011 were similar. In addition, the field of bioinformatics has undergone a significant shift, co-evolving with other biomedical disciplines. 
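Productivity measures like those in the bibliometric analysis above (most productive authors, countries, and organizations) reduce to frequency counts over publication records. A toy sketch with made-up records, not the study's PubMed Central data:

```python
from collections import Counter

# Each record carries the fields a productivity tally needs.
papers = [
    {"authors": ["Lee", "Chen"], "country": "Taiwan"},
    {"authors": ["Lee"],         "country": "Taiwan"},
    {"authors": ["Smith"],       "country": "USA"},
]

# Tally one count per author occurrence and one per paper for the country.
author_counts = Counter(a for p in papers for a in p["authors"])
country_counts = Counter(p["country"] for p in papers)

print(author_counts.most_common(1))   # most productive author
print(country_counts.most_common(1))  # most productive country
```

Impact measures (most cited papers, most cited authors) follow the same pattern, summing citation counts instead of publication counts before ranking.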
Some 1,857 highly cited reviews, namely those cited at least 1,000 times from publication to 2011, were identified using data hosted in the Science Citation Index Expanded database (Thomson Reuters, New York, NY) between 1899 and 2011. The data are disaggregated by publication date, citation counts, journals, Web of Science (Thomson Reuters) subject areas, citation life cycles, and publications by Nobel Prize winners. Six indicators (total publications, independent publications, collaborative publications, first-author publications, corresponding-author publications, and single-author publications) were applied to evaluate the publication output of institutions and countries. Among the highly cited reviews, 33% were single-author, 61% were single-institution, and 83% were single-country reviews. The United States ranked top for all 6 indicators. The G7 countries (United States, United Kingdom, Germany, Canada, France, Japan, and Italy) were the site of almost all the highly cited reviews. The top 12 most productive institutions were all located in the United States, with Harvard University (Cambridge, MA) the leader. The top 3 most productive journals were Chemical Reviews, Nature, and the Annual Review of Biochemistry. In addition, the impact of the reviews was analyzed by total citations from publication to 2011, citations in 2011, and citations in the publication year. The study of interhuman communication requires a more complex framework than Claude E. Shannon's (1948) mathematical theory of communication because "information" is defined in the latter case as meaningless uncertainty. Assuming that meaning cannot be communicated, we extend Shannon's theory by defining mutual redundancy as a positional counterpart of the relational communication of information. Mutual redundancy indicates the surplus of meanings that can be provided to the exchanges in reflexive communications. 
The information is redundant because it is based on "pure sets" (i.e., without subtraction of mutual information in the overlaps). We show that in the three-dimensional case (e.g., of a triple helix of university-industry-government relations), mutual redundancy is equal to mutual information (R_xyz = T_xyz); but when the dimensionality is even, the sign is different. We generalize to the measurement in N dimensions and proceed to the interpretation. Using Niklas Luhmann's (1984-1995) social systems theory and/or Anthony Giddens's (1979, 1984) structuration theory, mutual redundancy can be provided with an interpretation in the sociological case: different meaning-processing structures code and decode with different algorithms. A surplus of ("absent") options can then be generated that adds to the redundancy. Luhmann's "functional (sub)systems" of expectations or Giddens's "rule-resource sets" are positioned mutually, but coupled operationally in events or "instantiated" in actions. Shannon-type information is generated by the mediation, but the "structures" are (re-)positioned toward one another as sets of (potentially counterfactual) expectations. The structural differences among the coding and decoding algorithms provide a source of additional options in reflexive and anticipatory communications. Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. 
Our article contributes a systematic study of the potential of company mandatory disclosures from a computational viewpoint in the following aspects: (a) we characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods to predict change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and about speculation that analysts have access to more information when producing earnings forecasts. Until now, most of the methods published for polarity classification in Twitter have used a supervised approach. They differ only in the features selected and the method used to weight them. In this article, we present an unsupervised method for polarity classification in Twitter. The method is based on the expansion of the concepts expressed in the tweets through the application of PageRank to WordNet. In addition, we integrate SentiWordNet to compute the final polarity value. The synset values are weighted with the PageRank scores obtained in the previous random walk process over WordNet. The results obtained show that disambiguation and expansion are good strategies for improving overall performance. The h-index, as originally proposed (Hirsch, 2005), is a purely heuristic construction. Burrell (2013) showed that efforts to derive formulae for it from the mathematical framework of Lotkaian informetrics could lead to misleading results. On this note, we argue that a simple heuristic "thermodynamical" model can enable a better three-dimensional (3D) evaluation of the information production process, leading to what we call the zynergy-index. Google Scholar has been well received by the research community. 
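The concept-expansion step in the unsupervised polarity method described above rests on a random walk (PageRank) over WordNet. A minimal, self-contained power-iteration sketch on a toy graph follows; the graph, damping factor, and iteration count are illustrative assumptions, not values from the study.

```python
def pagerank(graph, damping=0.85, iters=50):
    # graph: dict mapping each node to a list of its out-links
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # teleportation mass distributed uniformly
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, out in graph.items():
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical mini "synset" graph with WordNet-style relation edges.
toy = {"good": ["nice", "great"], "nice": ["good"], "great": ["good"], "bad": ["good"]}
scores = pagerank(toy)
```

In the actual method, scores from such a walk weight SentiWordNet synset values; here they simply rank the toy nodes by connectivity.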
Its promises of free, universal, and easy access to scientific literature, coupled with the perception that it covers the social sciences and the humanities better than other traditional multidisciplinary databases, have contributed to the quick expansion of Google Scholar Citations and Google Scholar Metrics: 2 new bibliometric products that offer citation data at the individual and journal levels. In this article, we show the results of an experiment undertaken to analyze Google Scholar's capacity to detect citation-counting manipulation. For this, we uploaded to an institutional web domain 6 documents authored by a fictitious researcher that referenced all the publications of the members of the EC3 research group at the University of Granada. The detection of these papers by Google Scholar caused an outburst in the number of citations included in the Google Scholar Citations profiles of the authors. We discuss the effects of such an outburst and how it could affect the future development of such products, at both the individual and the journal level, especially if Google Scholar persists in its lack of transparency. Complex cognitive activities, such as analytical reasoning, problem solving, and sense making, are often performed through the mediation of interactive computational tools. Examples include visual analytics, decision support, and educational tools. Through interaction with visual representations of information at the visual interface of these tools, a joint, coordinated cognitive system is formed. This partnership results in a number of relational properties (those depending on both humans and tools) that researchers and designers must be aware of if such tools are to effectively support the performance of complex cognitive activities. This article presents 10 properties of interactive visual representations that are essential and relational and whose values can be adjusted through interaction. 
By adjusting the values of these properties, better coordination between humans and tools can be effected, leading to higher-quality performance of complex cognitive activities. This article examines how the values of these properties affect cognitive processing and visual reasoning and demonstrates the necessity of making their values adjustable, all of which is situated within a broader theoretical framework concerned with human-information interaction in complex cognitive activities. This framework can facilitate systematic research, design, and evaluation in numerous fields, including information visualization, health informatics, visual analytics, and educational technology. This study investigates the retrieval effectiveness of collaborative tags and author keywords in different environments through controlled experiments. Three test collections were built. The first collection tests the impact of tags on retrieval performance when only the title and abstract are available (the abstract environment). The second tests the impact of tags when the full text is available (the full-text environment). The third compares the retrieval effectiveness of tags and author keywords in the abstract environment. In addition, both single-word queries and phrase queries are tested to understand the impact of different query types. Our findings suggest that including tags and author keywords in indexes can enhance recall but may improve or worsen average precision depending on the retrieval environment and query type. Indexing tags and author keywords for searching using phrase queries in the abstract environment showed improved average precision, whereas indexing tags for searching using single-word queries in the full-text environment led to a significant drop in average precision. 
The comparison between tags and author keywords in the abstract environment indicates that they have comparable impact on average precision, but author keywords are more advantageous in enhancing recall. The findings from this study provide useful implications for designing retrieval systems that incorporate tags and author keywords. Several researchers have studied serendipitous knowledge discovery in information-seeking behavior. Electronic data in the form of semantic predications have a potential role in literature-based discovery, which can be guided by serendipitous knowledge discovery research findings. We sought to model information-seeking behavior within the context of serendipitous knowledge discovery by leveraging existing research. These efforts were done with an eye for a potential literature-based discovery application that utilizes semantic predications. We performed a literature search, reviewed the results, and applied the findings in developing a model for serendipitous knowledge discovery as an information-seeking behavior. The literature review indicated four important themes in serendipitous knowledge discovery: iteration, change or clarification, a seeker's prior knowledge, and the role of information organization and presentation. The Interaction Flow in Serendipitous Knowledge Discovery (IF-SKD) model includes these themes, and accommodates iterative, evolving search interests. Output can be presented in a manner to enhance short-term memory conceptualization and connections with prior knowledge. Although the IF-SKD model is currently a theoretical structure, its utility is demonstrated through replicating a literature-based discovery event, using a documented search method within the model's steps. The IF-SKD model can potentially serve as the foundation for future literature-based discovery applications. Twitter is a social network in which people publish publicly accessible brief, instant messages. 
With its exponential growth and the public nature and transversality of its contents, more researchers are using Twitter as a source of data for multiple purposes. In this context, the ability to retrieve those messages (tweets) related to a certain topic becomes critical. In this work, we define the topic-related tweet retrieval task and propose a dynamic, graph-based method with which to address it. We have applied our method to capture a data set containing tweets related to the participation of the Spanish team in the Euro 2012 soccer competition, measuring the precision and recall against other simple but commonly used approaches. The results demonstrate the effectiveness of our method, which significantly increases coverage of the chosen topic and is able to capture related but unknown a priori subtopics. Online social networks (OSNs) have been built as platforms for information sharing, with their concomitant potential for misuse of information and unsafe sharing practices. The frontline of defense against such threats is the "privacy settings" controls provided by OSNs such as Facebook. However, the efficacy of these settings is often undermined by their poor design. The current design fatigues users with information overload and fails to provide them with a more integrative and global understanding of their information-sharing practices. In this article, we develop a more efficacious design for the display of OSNs' privacy settings by following recommendations for appropriate use of visualization techniques. The new "wheel" interface simplifies the presentation of privacy settings to reduce information overload. It also incorporates an additional layer of information, indicating the safety of users' settings. A within-subject experiment with 67 students suggests that this interface is more versatile than the current tabular interfaces typically used on OSNs. 
More important, it allows users to easily comprehend complex information and provides them with a more integrative, higher level understanding of their privacy settings. This research focuses on an important niche at the intersection of information representation, interface design, and OSN privacy. Recent research has involved identifying communities in networks. Traditional methods of community detection usually assume that the network's structural information is fully known, which is not the case in many practical networks. Moreover, most previous community detection algorithms do not differentiate multiple relationships between objects or persons in the real world. In this article, we propose a new approach that utilizes social interaction data (e.g., users' posts on Facebook) to address the community detection problem in Facebook and to find the multiple social groups of a Facebook user. Some advantages to our approach are (a) it does not depend on structural information, (b) it differentiates the various relationships that exist among friends, and (c) it can discover a target user's multiple communities. In the experiment, we detect the community distribution of Facebook users using the proposed method. The experiment shows that our method can achieve the result of having the average scores of Total-Community-Purity and Total-Cluster-Purity both at approximately 0.8. This study explored differences between genders regarding adolescents' behavioral characteristics and moral judgment in the Internet environment. A questionnaire was administered to 1,048 students in the 7th to 11th grades in six different schools, one class in each grade. The questionnaire included personal data, characteristics of Internet interaction patterns, moral dilemmas in daily life, and moral dilemmas in the virtual environment. 
No significant differences were found between the genders regarding the age at which Internet use began, Internet experience, and average daily hours of Internet use. We found that boys prefer, more than girls, to surf at school and in Internet cafes. Girls tend to use the Internet more than boys for doing homework and blogging, whereas boys tend to play Internet games more than girls. Gender differences were found regarding immoral behavior. Boys were involved more frequently than girls in behaviors such as cyberbullying, plagiarism, impersonation, and downloading music and movies illegally from the Internet. A correlation was found between gender and moral judgment. Although both boys and girls made relatively little "humane judgment" in the Internet environment, girls tended to make "humane judgment" more frequently than boys. In the Internet environment, boys tended to make "absence of judgment" evaluations more than girls. Girls tended, relatively more, toward "normative judgment" that reflects adherence to peer-group conventions with minimal reflexivity. A model based on a set of bibliometric indicators is proposed for predicting the ranking of applicants to an academic position as produced by a committee of peers. The results show that a very small number of indicators may lead to a robust prediction in about 75% of the cases. We start with 12 indicators and build a few composite indicators by factor analysis. Following a discrete choice model, we arrive at 3 comparatively good predictive models. We conclude that these models have a surprisingly good predictive power and may help peers in their selection process. This study examines smartphone adoption behavior among American college students by combining all components of innovation diffusion theory (IDT), the technology acceptance model (TAM), the value-based adoption model (VAM), and the social influence (SI) model. 
Data indicate that the smartphone adoption rates are beyond the early majority and are now approaching the late majority. The findings of analysis of variance tests revealed that all variables of TAM, VAM, and SI varied across the adopter groups: The current adopter's mean values of the variables were the highest, followed by those of potential and nonadoption groups. Multinomial logistic regression (MLR) analyses revealed that perceived value and affiliation mainly determine the different perceptions of adoption groups. Smartphone adoption, however, was relatively unaffected by perceived ease of use and perceived usefulness. Perceived popularity, perceived price, and ethnicity played a role in distinctive determinants between current adopters and nonadopters. The results imply that adopters perceive smartphones as not only a worthwhile device in which to invest money but also a symbolic device to signal their affiliation and timely technology adoption. Another intriguing finding is the differences of interest in contents between current adopters and nonadopters. Social interactions via social networking services, acquisition for lifestyle, information seeking, and entertainment via gaming were the main applications of interest. This article reports on a research study investigating the use and perceptions of corporate information agencies by competitive intelligence (CI) practitioners. The corporate information agency is a corporate library or an information/knowledge center. CI practitioner refers to those business professionals, in various organizations, who are particularly committed to strategic and competitive intelligence analysis and production activities. In this study, we administered a survey to a sample of 214 CI practitioners to ascertain the extent to which they utilize, are aware of, and perceive the usefulness of corporate information agencies provided by their organizations. 
With 63 valid responses, we observed high degrees of use, awareness, and perceived usefulness. Multiple regression results also show significant correlations between perceived usefulness and use of the corporate information agency among the responding CI practitioners. Supported by empirical evidence, these findings provide a benchmark of knowledge regarding the value of corporate information agencies in CI practices. Health research shows that knowing about health risks may not translate into behavior change. However, such research typically operationalizes health information acquisition with knowledge tests. Information scientists who investigate socially embedded information behaviors could help improve understanding of potential associations between information behavior-as opposed to knowledge-and health behavior formation, thus providing new opportunities to investigate the effects of health information. We examine the associations between information behavior and HIV testing intentions among young men who have sex with men (YMSM), a group with high rates of unrecognized HIV infection. We used the theory of planned behavior (TPB) to predict intentions to seek HIV testing in an online sample of 163 YMSM. Multiple regression and recursive path analysis were used to test two models: (a) the basic TPB model and (b) an adapted model that added the direct effects of three information behaviors (information exposure, use of information to make HIV-testing decisions, prior experience obtaining an HIV test) plus self-rated HIV knowledge. As hypothesized, our adapted model improved predictions, explaining more than twice as much variance as the original TPB model. The results suggest that information behaviors may be more important predictors of health behavior intentions than previously acknowledged. An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. 
One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or through infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, TFxIDF_Tempo (temporal Term Frequency x Inverse Document Frequency) and TFxEnhanced-IDF_Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TFxIDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TFxEnhanced-IDF_Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TFxIDF_Tempo. Scientometric indicators influence the standing of journals among peers, thus affecting decisions regarding manuscript submissions, scholars' careers, and funding. Here we hypothesize that impact-factor boosting (unethical behavior documented previously in several underperforming journals) should not be considered exceptional, but that it affects even the top-tier journals. We performed a citation analysis of documents recently published in 11 prominent general science and biomedical journals. In these journals, only 12 to 79% of what was published was considered original research, whereas editorial materials alone constituted 11 to 44% of the total document types published. 
Citations to commissioned opinion articles comprised 3 to 15% of the total citations to the journals within 3 postpublication years, with an even higher share occurring during the first postpublication year. An additional 4 to 15% of the citations were received by the journals from commissioned opinion articles published in other journals. Combined, this parallel world of uncitable documents was responsible for up to 30% of the total citations to the top-tier journals, with the highest values found for the medical science journals (New England Journal of Medicine, JAMA, and the Lancet) and lower values found for the Science, Nature, and Cell series journals. Self-citations to some of the top-tier journals reach values higher than the total citation counts accumulated by papers in most of the Web of Science-indexed journals. Most of the self-citations were generated by commissioned opinion articles. The parallel world of supposedly uncitable documents flourishes and severely distorts the commonly used scientometric indicators. F1000 is a postpublication peer review service for biological and medical research. F1000 recommends important publications in the biomedical literature, and from this perspective F1000 could be an interesting tool for research evaluation. By linking the complete database of F1000 recommendations to the Web of Science bibliographic database, we are able to make a comprehensive comparison between F1000 recommendations and citations. We find that about 2% of the publications in the biomedical literature receive at least one F1000 recommendation. Recommended publications on average receive 1.30 recommendations, and more than 90% of the recommendations are given within half a year after a publication has appeared. There turns out to be a clear correlation between F1000 recommendations and citations. However, the correlation is relatively weak, at least weaker than the correlation between journal impact and citations. 
More research is needed to identify the main reasons for differences between recommendations and citations in assessing the impact of publications. Previous research indicates that during the past 20 years, the highest-quality work has been published in an increasingly diverse and larger group of journals. In this article, we examine whether this diversification has also affected the handful of elite journals that are traditionally considered to be the best. We examine citation patterns during the past 40 years of seven long-standing traditionally elite journals and six journals that have been increasing in importance during the past 20 years. To be among the top 5% or 1% cited papers, papers now need about twice as many citations as they did 40 years ago. Since the late 1980s and early 1990s, elite journals have been publishing a decreasing proportion of these top-cited papers. This also applies to the two journals that are typically considered as the top venues and often used as bibliometric indicators of "excellence": Science and Nature. On the other hand, several new and established journals are publishing an increasing proportion of the most-cited papers. These changes bring new challenges and opportunities for all parties. Journals can enact policies to increase or maintain their relative position in the journal hierarchy. Researchers now have the option to publish in more diverse venues knowing that their work can still reach the same audiences. Finally, evaluators and administrators need to know that although there will always be a certain prestige associated with publishing in "elite" journals, journal hierarchies are in constant flux. Data collected by social media platforms have been introduced as new sources for indicators to help measure the impact of scholarly research in ways that are complementary to traditional citation analysis. Data generated from social media activities can be used to reflect broad types of impact. 
This article aims to provide systematic evidence about how often Twitter is used to disseminate information about journal articles in the biomedical sciences. The analysis is based on 1.4 million documents covered by both PubMed and Web of Science and published between 2010 and 2012. The number of tweets containing links to these documents was analyzed and compared to citations to evaluate the degree to which certain journals, disciplines, and specialties were represented on Twitter and how far tweets correlate with citation impact. With less than 10% of PubMed articles mentioned on Twitter, its uptake is low in general but differs between journals and specialties. Correlations between tweets and citations are low, implying that impact metrics based on tweets are different from those based on citations. A framework using the coverage of articles and the correlation between Twitter mentions and citations is proposed to facilitate the evaluation of novel social-media-based metrics. The majority of the effort in metrics research has addressed research evaluation. Far less research has addressed the unique problems of research planning. Models and maps of science that can address the detailed problems associated with research planning are needed. This article reports on the creation of an article-level model and map of science covering 16 years and nearly 20 million articles using cocitation-based techniques. The map is then used to define discipline-like structures consisting of natural groupings of articles and clusters of articles. This combination of detail and high-level structure can be used to address planning-related problems such as identification of emerging topics and the identification of which areas of science and technology are innovative and which are simply persisting. 
In addition to presenting the model and map, several process improvements that result in more accurate structures are detailed, including a bibliographic coupling approach for assigning current papers to cocitation clusters and a sequential hybrid approach to producing visual maps from models. The Gene Ontology (GO), a scientific vocabulary widely used in molecular biology databases, is examined by an analysis of its structure, a comparison of its principles to those of traditional controlled vocabularies, and a detailed analysis of a single concept within it. It is found that the GO deviates in some respects from its principles of ontological realism, and it is suggested that each form of vocabulary could benefit from adopting good practices from the other. Subject repositories are open web collections of working papers or manuscript copies of published scholarly articles, specific to particular scientific disciplines. The first repositories emerged in the early 1990s, and in some fields of science they have become an important channel for the dissemination of research results. With quite strict inclusion criteria, 56 subject repositories were identified from a much larger number indexed in 2 repository indices. A closer study of these demonstrated a huge variety in sizes, organizational models, functions, and topics. When they first started to emerge, subject repositories catered to a strong market demand, but the later development of Internet search engines, the rapid growth of institutional repositories, and the tightening of journal publishers' open access policies seem to be slowing their growth. Searches conducted on web search engines reflect the interests of users and society. Google Trends, which provides information about the queries searched by users of the Google web search engine, is a rich data source from which a wealth of information can be mined. 
We investigated the possibility of using web search volume data from Google Trends to predict academic fame. As queries are language-dependent, we studied universities from two countries with different languages, the United States and Spain. We found a significant correlation between the search volume of a university name and the university's academic reputation or fame. We also examined the effect of some Google Trends features, namely, limiting the search to a specific country or topic category on the search volume data. Finally, we examined the effect of university sizes on the correlations found to gain a deeper understanding of the nature of the relationships. Academic social network sites Academia.edu and ResearchGate, and reference sharing sites Mendeley, Bibsonomy, Zotero, and CiteULike, give scholars the ability to publicize their research outputs and connect with each other. With millions of users, these are a significant addition to the scholarly communication and academic information-seeking eco-structure. There is thus a need to understand the role that they play and the changes, if any, that they can make to the dynamics of academic careers. This article investigates attributes of philosophy scholars on Academia.edu, introducing a median-based, time-normalizing method to adjust for time delays in joining the site. In comparison to students, faculty tend to attract more profile views but female philosophers did not attract more profile views than did males, suggesting that academic capital drives philosophy uses of the site more than does friendship and networking. Secondary analyses of law, history, and computer science confirmed the faculty advantage (in terms of higher profile views) except for females in law and females in computer science. There was also a female advantage for both faculty and students in law and computer science as well as for history students. 
Hence, Academia.edu overall seems to reflect a hybrid of scholarly norms (the faculty advantage) and a female advantage that is suggestive of general social networking norms. Finally, traditional bibliometric measures did not correlate with any Academia.edu metrics for philosophers, perhaps because more senior academics use the site less extensively or because of the range of informal scholarly activities that cannot be measured by bibliometric methods. University rankings generally present users with the problem of placing the results given for an institution in context. Only a comparison with the performance of all other institutions makes it possible to say exactly where an institution stands. In order to interpret the results of the SCImago Institutions Ranking (based on Scopus data) and the Leiden Ranking (based on Web of Science data), in this study we offer thresholds with which it is possible to assess whether an institution belongs to the top 1%, top 5%, top 10%, top 25%, or top 50% of institutions in the world. The thresholds are based on the excellence rate, or PP(top 10%). Both indicators measure the proportion of an institution's publications that belong to the 10% most frequently cited publications and are the most important indicators for measuring institutional impact. For example, while an institution must achieve a value of 24.63% in the Leiden Ranking 2013 to be considered one of the top 1% of institutions worldwide, the SCImago Institutions Ranking requires 30.2%. The S-shaped functional relation between the mean citation score and the proportion of top 10% publications for the 500 Leiden Ranking universities is explained using results of the shifted Lotka function. Also, the concave or convex relation between the proportion of top 100θ% publications, for different fractions θ, is explained using the obtained new informetric model. The h-index provides us with 9 natural classes, which can be written as a matrix of 3 vectors. 
The 3 vectors are: X = (X_1, X_2, X_3), which indicates the publication distribution over the h-core, the h-tail, and the uncited papers, respectively; Y = (Y_1, Y_2, Y_3), which denotes the citation distribution of the h-core, the h-tail, and the so-called "excess" citations (above the h-threshold), respectively; and Z = (Z_1, Z_2, Z_3) = (Y_1 - X_1, Y_2 - X_2, Y_3 - X_3). The matrix V = (X, Y, Z)^T constructs a measure of academic performance, in which the 9 numbers can all be provided with meanings in different dimensions. The "academic trace" tr(V) of this matrix follows naturally, and contributes a unique indicator for total academic achievement by summarizing and weighting the accumulation of publications and citations. This measure can also be used to combine the advantages of the h-index and the integrated impact indicator (I3) into a single number with a meaningful interpretation of the values. We illustrate the use of tr(V) for the cases of 2 journal sets, 2 universities, and ourselves as 2 individual authors. We introduce the quantitative method named "Reference Publication Year Spectroscopy" (RPYS). With this method one can determine the historical roots of research fields and quantify their impact on current research. RPYS is based on the analysis of the frequency with which references are cited in the publications of a specific research field in terms of the publication years of these cited references. The origins show up in the form of more or less pronounced peaks, mostly caused by individual publications that are cited particularly frequently. In this study, we use research on graphene and on solar cells to illustrate how RPYS functions and what results it can deliver. Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. 
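A minimal numeric sketch of the trace construction described above, computing X, Y, Z, and tr(V) = X_1 + Y_2 + Z_3 from a list of per-paper citation counts. It follows the common reading that Y_1 = h^2 for the h-core; that reading, and the counts in the example, are our assumptions, not data from the study.

```python
def academic_trace(citations):
    # citations: per-paper citation counts for one author or journal set
    cites = sorted(citations, reverse=True)
    h = sum(1 for i, c in enumerate(cites) if c >= i + 1)  # Hirsch index
    x1 = h                                   # X_1: papers in the h-core
    x3 = sum(1 for c in cites if c == 0)     # X_3: uncited papers
    x2 = len(cites) - x1 - x3                # X_2: cited papers in the h-tail
    y1 = h * h                               # Y_1: h-threshold citations (assumed h^2)
    y3 = sum(cites[:h]) - y1                 # Y_3: "excess" citations above threshold
    y2 = sum(cites[h:])                      # Y_2: citations to the h-tail
    z3 = y3 - x3                             # Z_3 = Y_3 - X_3
    return x1 + y2 + z3                      # tr(V) = X_1 + Y_2 + Z_3

# Hypothetical record: five papers cited 10, 5, 3, 1, and 0 times (h = 3).
trace = academic_trace([10, 5, 3, 1, 0])
```

Uncited papers lower the trace through Z_3, while tail and excess citations raise it, which is how the single number weights both publication and citation accumulation.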
To improve the PubMed user experience with author name queries, we designed an author name disambiguation system consisting of similarity estimation and agglomerative clustering. A machine-learning method was employed to score the features for disambiguating a pair of papers with ambiguous names. These features enable the computation of pairwise similarity scores to estimate the probability of a pair of papers belonging to the same author, which drives an agglomerative clustering algorithm regulated by 2 factors: name compatibility and probability level. With transitivity violation correction, high-precision author clustering is achieved by focusing on minimizing false-positive pairing. Disambiguation performance is evaluated with manual verification of random samples of pairs from clustering results. When compared with a state-of-the-art system, our evaluation shows that among all the pairs the lumping error rate drops from 10.1% to 2.2% for our system, while the splitting error rises from 1.8% to 7.7%. This results in an overall error rate of 9.9%, compared with 11.9% for the state-of-the-art method. Other evaluations based on gold standard data also show the increase in accuracy of our clustering. We attribute the performance improvement to the machine-learning method driven by a large-scale training set and the clustering algorithm regulated by a name compatibility scheme preferring precision. With integration of the author name disambiguation system into the PubMed search engine, the overall click-through rate of PubMed users on author name query results improved from 34.9% to 36.9%. The use of robo-readers to analyze news texts is an emerging technology trend in computational finance. Recent research has developed sophisticated financial polarity lexicons for investigating how financial sentiments relate to future company performance. 
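The agglomerative clustering over pairwise same-author probabilities described in the PubMed disambiguation study above can be sketched as follows. This is a single-linkage sketch only: the name-compatibility regulation from the paper is omitted, and the threshold value is illustrative:

```python
def cluster_papers(pair_prob, papers, threshold=0.8):
    """Greedy agglomerative clustering: repeatedly merge two clusters
    whenever their best cross-pair same-author probability reaches the
    threshold. pair_prob maps frozenset({a, b}) -> estimated probability."""
    clusters = [{p} for p in papers]

    def link(c1, c2):
        # single linkage: strongest pairwise probability across the clusters
        return max(pair_prob.get(frozenset({a, b}), 0.0)
                   for a in c1 for b in c2)

    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if link(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters
```

With probabilities {p1, p2} = 0.9 and {p2, p3} = 0.1 and a threshold of 0.8, p1 and p2 are grouped as one author while p3 remains separate.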
However, based on experience from fields that commonly analyze sentiment, it is well known that the overall semantic orientation of a sentence may differ from that of individual words. This article investigates how semantic orientations can be better detected in financial and economic news by accommodating the overall phrase-structure information and domain-specific use of language. Our three main contributions are the following: (a) a human-annotated finance phrase bank that can be used for training and evaluating alternative models; (b) a technique to enhance financial lexicons with attributes that help to identify the expected direction of events that affect sentiment; and (c) a linearized phrase-structure model for detecting contextual semantic orientations in economic texts. The relevance of the newly added lexicon features and the benefit of using the proposed learning algorithm are demonstrated in a comparative study against general sentiment models as well as the popular word frequency models used in recent financial studies. The proposed framework is parsimonious and avoids the explosion in feature space caused by the use of conventional n-gram features. Group-based trajectory modeling (GBTM) is applied to the citation curves of articles in six journals and to all citable items in a single field of science (virology, 24 journals) to distinguish among the developmental trajectories in subpopulations. Can citation patterns of highly cited papers be distinguished in an early phase as "fast-breaking" papers? Can "late bloomers" or "sleeping beauties" be identified? Most interestingly, we find differences between "sticky knowledge claims" that continue to be cited more than 10 years after publication and "transient knowledge claims" that show a decay pattern after reaching a peak within a few years. Only papers following the trajectory of a "sticky knowledge claim" can be expected to have a sustained impact. 
These findings raise questions about indicators of "excellence" that use aggregated citation rates after 2 or 3 years (e.g., impact factors). Because aggregated citation curves can also be composites of the two patterns, fifth-order polynomials (with four bending points) are needed to capture citation curves precisely. For the journals under study, the most frequently cited groups were, furthermore, much smaller than 10%. Although GBTM has proved a useful method for investigating differences among citation trajectories, the methodology does not allow us to define a percentage of highly cited papers inductively across different fields and journals. Using multinomial logistic regression, we conclude that predictor variables such as journal names, number of authors, etc., do not affect the stickiness of knowledge claims in terms of citations but only the levels of aggregated citations (which are field-specific). The number of authors collaborating to write scientific articles has been increasing steadily, and with this collaboration, other factors have also changed, such as the length of articles and the number of citations. However, little is known about potential discrepancies in the use of tables and graphs between single and collaborating authors. In this article, we ask whether multiauthor articles contain more tables and graphs than single-author articles; to answer this question, we studied 5,180 recent articles published in six science and social science journals. We found that pairs and multiple authors used significantly more tables and graphs than single authors. Such findings indicate that there is a greater emphasis on the role of tables and graphs in collaborative writing, and we discuss some of the possible causes and implications of these findings. 
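The distinction above between "sticky" and "transient" knowledge claims can be illustrated with a crude heuristic. This is not GBTM itself: the 10-year cutoff follows the study, but the peak-share threshold is an arbitrary assumption:

```python
def classify_trajectory(yearly_citations, late_share=0.5):
    """Heuristic: a 'sticky knowledge claim' keeps attracting citations
    long after publication, while a 'transient' one peaks early and decays.
    yearly_citations[i] = citations received in year i after publication.
    late_share is an illustrative threshold, not a value from the study."""
    if len(yearly_citations) <= 10 or not any(yearly_citations):
        return "insufficient data"
    peak = max(yearly_citations)
    late = yearly_citations[10:]
    # sticky if citations beyond year 10 still reach a share of the peak level
    if max(late) >= late_share * peak:
        return "sticky"
    return "transient"
```

A curve that plateaus around its peak for over a decade is labeled "sticky"; one that spikes and decays to near zero is labeled "transient".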
This article provides first-time empirical evidence that the digital age has first increased and then (only very recently) decreased global, international, and national inequalities of information and communication capacities among and within societies. Previous studies on the digital divide were unable to capture the detected trends appropriately, because they worked with proxies, such as the number of subscriptions or related investments, without considering the vast heterogeneity in informational performance among technological devices. We created a comprehensive data set (based on over 1,100 sources) that allows measuring information capacity directly, in bits per second, bits, and instructions per second. The newly proposed indicators provide insights into inequalities in access to, usage of, and impact of digitized information flows. The study shows that the digital divide has entered a second stage, which is based on a relative universalization of technological devices and a continuously evolving divide in terms of communication capacity. With the recent interest in socially created metadata as a potentially complementary resource for image description in relation to established tools such as thesauri and other forms of controlled vocabulary, questions remain about the quality and reuse value of these metadata. This study describes and examines a set of tags using quantitative and qualitative methods and assesses relationships among categories of image tags, tag assignment order, and users' perceptions of usefulness of index terms and user-contributed tags. The study found that tags provide much descriptive information about an image but that users also value and trust controlled vocabulary terms. The study found no correlation between tag length and assignment order, or between tag length and perceived usefulness. The findings of this study can contribute to the design of controlled vocabularies, indexing processes, and retrieval systems for images. 
In particular, the findings of the study can advance the understanding of image tagging practices, tag facet/category distributions, relative usefulness and importance of these categories to the user, and potential mechanisms for identifying useful terms. An index is proposed that is based on the h-index and a 3-year publication/citation window. When updated regularly, it shows the current scientific performance of researchers rather than their lifetime achievement as indicated by common scientometric indicators. In this respect, the new rating scheme resembles established sports ratings such as in chess or tennis. Using the example of ACM SIGMOD E.F. Codd Innovations Award winners and Priestley Medal recipients, we illustrate how the new rating can be represented by a single number and visualized. Technological change in the digital age is a combination of both more and better technology. This work quantifies how much of the technologically-mediated information and communication explosion during the period of digitization (1986-2007) was driven by the deployment of additional technological devices, and how much by technological progress in hardware and software. We find that technological progress has contributed between two and six times more than additional technological infrastructure. While infrastructure seems to reach a certain level of saturation at roughly 20 storage devices per capita and 2 to 3 telecommunication subscriptions per capita, informational capacities are still expanding greatly. Besides progress in better hardware, software for information compression turns out to be an important and often neglected driver of the global growth of technologically-mediated information and communication capacities. This article estimates the first constant-quality price index for Internet domain names. 
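The rating proposed above, an h-index restricted to a 3-year publication/citation window, can be sketched as follows. The (year, citations) tuple format is an assumption for illustration:

```python
def windowed_h_index(papers, current_year, window=3):
    """h-index restricted to a recent publication/citation window, so the
    score tracks current performance rather than lifetime achievement.
    papers: list of (publication_year, citations_received_in_window)."""
    recent = sorted((c for y, c in papers
                     if current_year - y < window), reverse=True)
    # largest h such that h recent papers each have >= h citations
    return sum(1 for i, c in enumerate(recent) if c >= i + 1)
```

Recomputing the score each year lets it rise and fall with current output, much like a sports rating; an inactive researcher's windowed index decays to zero as old papers leave the window.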
The suggested index provides a benchmark for domain name traders and investors looking for information on price trends, historical returns, and the fundamental risk of Internet domain names. The index increases transparency in the market for this newly emerged asset class. A cointegration analysis shows that domain registrations and resale prices form a long-run equilibrium and indicates supply constraints in domain space. This study explores a large data set of domain sales spanning the years 2006 to 2013. Differences in the quality of individual domain names are controlled for in hedonic repeat sales regressions. This study analyzes coauthorship patterns in the social sciences and humanities (SSH) for the period 2000 to 2010. The basis for the analysis is the Flemish Academic Bibliographic Database for the Social Sciences and Humanities (VABB-SHW), a comprehensive bibliographic database of peer-reviewed publications in the SSH by researchers affiliated with Flemish universities. Combining data on journal articles and book chapters, our findings indicate that collaborative publishing in the SSH is increasing, though considerable differences between disciplines remain. Conversely, we observed a sharp decline in single-author publishing. We further demonstrate that coauthored SSH articles in journals indexed in the Web of Science (WoS) generally have a higher (and growing) number of coauthors than do either those in non-WoS journals or book chapters. This illustrates the need to include non-WoS data and book chapters when studying coauthorship in the SSH. In this paper we describe a study aimed at evaluating and improving the quality of online deliberation. We consider the rationales used by participants in deletion discussions on Wikipedia in terms of the literature on democratic and online deliberation and collaborative information quality. 
Our findings suggest that most participants in these discussions were concerned with the notability and credibility of the topics presented for deletion, and that most presented rationales rooted in established site policies. We found that article topic and unanimity (or lack thereof) were among the factors that tended to affect the outcome of a debate. Our results also suggested that the blackout of the site in response to the proposed Stop Online Piracy Act (SOPA) law affected the decisions of deletion debates that occurred close to the event. We conclude by suggesting implications of this study for broader considerations of online information quality and democratic deliberation. A systematic understanding of factors and criteria that affect consumers' selection of sources for health information is necessary for the design of effective health information services and information systems. However, current studies have overly focused on source attributes as indicators for 2 criteria, source quality and accessibility, and overlooked the role of other factors and criteria that help determine source selection. To fill this gap, guided by decision-making theories and the cognitive perspective on information search, we interviewed 30 participants about their reasons for using a wide range of sources for health information. Additionally, we asked each of them to report a critical incident in which sources were selected to fulfill a specific information need. Based on the analysis of the transcripts, 5 categories of factors were identified as influential to source selection: source-related factors, user-related factors, user-source relationships, characteristics of the problematic situation, and social influences. 
In addition, about a dozen criteria that mediate the influence of the factors on source-selection decisions were identified, including accessibility, quality, usability, interactivity, relevance, usefulness, familiarity, affection, anonymity, and appropriateness. These results significantly expanded the current understanding of the nature of costs and benefits involved in source-selection decisions, and strongly indicated that a personalized approach is needed for information services and information systems to provide effective access to health information sources for consumers. Understanding specific patterns or knowledge of self-disclosing health information could support public health surveillance and healthcare. This study aimed to develop an analytical framework to identify self-disclosing health information with unusual messages on web forums by leveraging advanced text-mining techniques. To demonstrate the performance of the proposed analytical framework, we conducted an experimental study on 2 major human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS) forums in Taiwan. The experimental results show that the classification accuracy increased significantly (up to 83.83%) when using features selected by the information gain technique. The results also show the importance of adopting domain-specific features in analyzing unusual messages on web forums. This study has practical implications for HIV/AIDS prevention and healthcare support. For example, public health agencies can reallocate resources and deliver services to people who need help via social media sites. In addition, individuals can also join a social media site to get better suggestions and support from each other. This paper presents an evaluation study of the navigation effectiveness of a multifaceted organizational taxonomy that was built on the Dewey Decimal Classification and several domain thesauri in the area of library and information science education. 
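The information gain feature selection used in the forum-message study above can be sketched for discrete features. This is a minimal sketch of the standard technique, not the study's implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - sum_v p(v) * H(labels | feature = v).
    Features with higher IG are more useful for the classifier."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder
```

Ranking candidate features (e.g., word occurrences) by this score and keeping the top-ranked ones is the usual selection step; a perfectly class-separating binary feature scores 1 bit, an uninformative one scores 0.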
The objective of the evaluation was to detect deficiencies in the taxonomy and to infer problems of applied construction steps from users' navigation difficulties. The evaluation approach included scenario-based navigation exercises and postexercise interviews. Navigation exercise errors and underlying reasons were analyzed in relation to specific components of the taxonomy and applied construction steps. Guidelines for the construction of the hierarchical structure and categories of an organizational taxonomy using existing general classification schemes and domain thesauri were derived from the evaluation results. The scientific metadata model proposed in this article encompasses both classical descriptive metadata such as those defined in the Dublin Core Metadata Element Set (DC) and the innovative structural and referential metadata properties that go beyond the classical model. Structural metadata capture the structural vocabulary in research publications; referential metadata include not only citations but also data about other types of scholarly output that is based on or related to the same publication. The article describes the structural, descriptive, and referential (SDR) elements of the metadata model and explains the underlying assumptions and justifications for each major component in the model. ScholarWiki, an experimental system developed as a proof of concept, was built over the wiki platform to allow user interaction with the metadata and the editing, deleting, and adding of metadata. By allowing and encouraging scholars (both as authors and as users) to participate in the knowledge and metadata editing and enhancing process, the larger community will benefit from more accurate and effective information retrieval. 
The ScholarWiki system utilizes machine-learning techniques that can automatically produce self-enhanced metadata by learning from the structural metadata that scholars contribute, which will add intelligence to automatically enhance and update the metadata wiki pages. Most measures of networks are based on the nodes, although links are also elementary units in networks and represent interesting social or physical connections. In this work we suggest an option for exploring networks, called the h-strength, with explicit focus on links and their strengths. The h-strength and its extensions can naturally simplify a complex network to a small and concise subnetwork (h-subnet) while retaining the most important links and its core structure. Its applications in 2 typical information networks, the paper cocitation network of a topic (the h-index) and 5 scientific collaboration networks in the field of "water resources," suggest that h-strength and its extensions could be a useful choice for abstracting, simplifying, and visualizing a complex network. Moreover, we observe that the 2 informetric models, the Glänzel-Schubert model and the Hirsch model, roughly hold in the context of the h-strength for the collaboration networks. This study continues a long history of author cocitation analysis (and more recently, author bibliographic coupling analysis) of the intellectual structure of information science (IS) into the time period 2006 to 2010 (IS 2006-2010). We find that web technologies continue to drive developments, especially at the research front, although perhaps more indirectly than before. A broadening of perspectives is visible in IS 2006-2010, where network science becomes influential and where full-text analysis methods complement traditional computer science influences. Research in the areas of the h-index and mapping of science appears to have been highlights of IS 2006-2010. 
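The h-strength described above (the h-index applied to link strengths) can be computed directly. The h-subnet helper and the edge tuple format are illustrative assumptions:

```python
def h_strength(link_strengths):
    """h-strength of a network: the largest h such that h links each have
    strength >= h (i.e., the h-index applied to link weights)."""
    s = sorted(link_strengths, reverse=True)
    return sum(1 for i, w in enumerate(s) if w >= i + 1)

def h_subnet(edges):
    """Keep only the links whose strength reaches the h-strength,
    yielding a small, concise core subnetwork (h-subnet).
    edges: list of (node_a, node_b, strength) tuples."""
    h = h_strength([w for _, _, w in edges])
    return [e for e in edges if e[2] >= h]
```

For link strengths [9, 5, 4, 3, 1], three links have strength at least 3, so the h-strength is 3 and the h-subnet keeps the four links of strength 3 or more.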
This study tests and confirms a forecast made previously by comparing knowledge-base and research-front findings for IS 2001-2005, which expected both the information retrieval (IR) systems and webometrics specialties to shrink in 2006 to 2010. A corresponding comparison of the knowledge base and research front of IS 2006-2010 suggests a continuing decline of the IR systems specialty in the near future, but also a considerable (re)growth of the webometrics area after a period of decline from 2001 to 2005 and 2006 to 2010, with the latter due perhaps in part to its contribution to an emerging web science. Science linkage is a widely used patent bibliometric indicator to measure patent linkage to scientific research based on the frequency of citations to scientific papers within the patent. Science linkage is also regarded as noisy because the subject of patent citation behavior varies from inventors/applicants to examiners. In order to identify and ultimately reduce this noise, we analyzed the different citing motivations of examiners and inventors/applicants. We built 4 hypotheses based upon our study of patent law, the unique economic nature of a patent, and a patent citation's market effect. To test our hypotheses, we conducted an expert survey based on our science linkage calculation in the catalyst domain from U.S. patent data (2006-2009) over 3 types of citations: self-citation by inventor/applicant, non-self-citation by inventor/applicant, and citation by examiner. According to our results, evaluated by domain experts, we conclude that the non-self-citation by inventor/applicant is quite noisy and cannot indicate science linkage and that self-citation by inventor/applicant, although limited, is more appropriate for understanding science linkage. Journal-based citations are an important source of data for impact indices. However, the impact of journal articles extends beyond formal scholarly discourse. 
Measuring online scholarly impact calls for new indices, complementary to the older ones. This article examines a possible alternative metric source, blog posts aggregated at ResearchBlogging.org, which discuss peer-reviewed articles and provide full bibliographic references. Articles reviewed in these blogs therefore receive "blog citations." We hypothesized that articles receiving blog citations close to their publication time receive more journal citations later than the articles in the same journal published in the same year that did not receive such blog citations. Statistically significant evidence for articles published in 2009 and 2010 supports this hypothesis for seven of 12 journals (58%) in 2009 and 13 of 19 journals (68%) in 2010. We suggest, based on these results, that blog citations can be used as an alternative metric source. Web search engines are important gateways for users to access health information. This study explored whether a search interface based on the Bing API and enabled by Scatter/Gather, a well-known document-clustering technique, can improve health information searches. Forty participants without medical backgrounds were randomly assigned to two interfaces: a baseline interface that resembles typical web search engines and a Scatter/Gather interface. Both groups performed two lookup and two exploratory health-related tasks. It was found that the baseline group was more likely to rephrase queries and less likely to access general-purpose sites than the Scatter/Gather group when completing exploratory tasks. Otherwise, the two groups did not differ in behavior and task performance, with participants in the Scatter/Gather group largely overlooking the features (key words, clusters, and the recluster function) designed to facilitate the exploration of semantic relationships between information objects, a potentially useful means for users in the rather unfamiliar domain of health. 
The results suggest a strong effect of users' mental models of search on their use of search interfaces and a high cognitive cost associated with using the Scatter/Gather features. It follows that novel features of a search interface should not only be compatible with users' mental models but also provide sufficient affordance to inform users of how they can be used. Compared with the interface, tasks showed more significant impacts on search behavior. In future studies, more effort should be devoted to identifying salient features of health-related information needs. This study explores the effect of interaction in research on knowledge creation (KC) and its dependence on the conceptualization of a human being. A framework for understanding KC with hermeneutic phenomenology is developed, based on an analysis of recent KC research and key texts on hermeneutic phenomenology. The results obtained indicate that recent KC research still emphasizes the concept of knowledge as an asset inside the human mind, although the interest is in knowing and interpersonal relationships in working communities. Examination of the effect of interaction in research on KC shows that successful interaction is connected to the ideas of openness, critical thinking, and awareness of past experiences. These elements reflect the general ideas of the hermeneutic tradition without taking into account the historical roots of hermeneutics or questioning the concept of a human being behind them. It is concluded that the hermeneutic circle and phenomenological conceptualization of a human being provide a better defined and more coherent structure for understanding the event of KC as a future-oriented, conscious act of interaction. The framework developed offers three fundamental areas for exploration: structure of the interactive event, construction of the human experience in interaction, and modes of being in interaction. 
Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. 
Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested). An altogether different view on the properties of a good performance measure from that given in Egghe (2012) is offered. Egghe argued that a good impact measure should reward nonconsistency; that is, the more citations over papers are unequally distributed, the higher the impact should be. Here, a quantitative proxy for consistency is offered, and it is shown that as consistency increases, the ideal performance measure, which is sensitive to changes in consistency, should increase, reflecting this virtue. Recently, Harzing's Publish or Perish software was updated to include Microsoft Academic Search as a second citation database search option for computing various citation-based metrics. This article explores the new search option by scoring 50 top economics and finance journals and comparing them with the results obtained using the original Google Scholar-based search option. The new database delivers significantly smaller scores for all metrics, but the rankings across the two databases for the h-index, g-index, AWCR, and e-index are significantly correlated, especially when the time frame is restricted to more recent years. Comparisons are also made to the Article Influence score from eigenfactor.org and to the RePEc h-index, both of which adjust for journal-level self-citations. This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. 
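The automatic query formulation idea discussed earlier (turning a passage of interesting text into a search query) can be sketched as follows. This is a crude tf-idf-style stand-in, not the article's phrase-level method; the stopword list and background document frequencies are illustrative:

```python
import re
from math import log

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "that", "it"}

def formulate_query(passage, background_df, n_docs, k=5):
    """Score candidate terms from a text passage by frequency in the
    passage weighted by rarity in a background corpus, and return the
    top-k terms as a query."""
    words = [w for w in re.findall(r"[a-z]+", passage.lower())
             if w not in STOPWORDS]
    scores = {}
    for w in set(words):
        tf = words.count(w)                        # frequency in the passage
        idf = log(n_docs / (1 + background_df.get(w, 0)))  # rarity in corpus
        scores[w] = tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Terms that are both frequent in the highlighted passage and rare in the background corpus rise to the top, approximating "the essence" of the text as a query.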
Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems surveyed. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development. Previous studies have shown that users' cognitive styles play an important role during web searching. However, only a limited number of studies have shown the relationship between cognitive styles and web search behavior. Most importantly, it is not clear which components of web search behavior are influenced by cognitive styles. This article examines the relationships between users' cognitive styles and their web searching and develops a model that portrays the relationship. The study uses qualitative and quantitative analyses based on data gathered from 50 participants. A questionnaire was utilized to collect participants' demographic information, and Riding's (1991) Cognitive Styles Analysis (CSA) test was used to assess their cognitive styles. Results show that users' cognitive styles influenced their information-searching strategies, query reformulation behavior, web navigational styles, and information-processing approaches. The user model developed in this study depicts the fundamental relationships between users' web search behavior and their cognitive styles. 
Modeling web search behavior with a greater understanding of users' cognitive styles can help information science researchers and information systems designers to bridge the semantic gap between the user and the systems. Implications of the research for theory and practice, and future work, are discussed. This article calls for a conceptual and empirical research agenda on ways in which policymakers and researchers can aggregate socioeconomic information shared by diverse communities without losing contextual information that is important for extracting meaning from the data. We describe the knowledge loss that occurs when information is aggregated across diverse ontologies into databases or archives relying on a single schema and use a series of illustrative examples to demonstrate the significance of this information loss for policy design and implementation. While there are important gains from information aggregation across ontologies, the potential trade-offs involved in creating large-scale databases are significant. The differences between locally constituted ways of knowing and the organizing ontology used for larger scale databases affect the extent to which these collections, or "knowledge banks," provide accurate guidance for policy and action. The article draws on insights from information science and social science to discuss two classes of socio-technical approaches for overcoming information loss at the interface between ontologies: first, technology-enabled efforts to soften ontological interfaces by making data open, unconstructed, and available and/or creating ontologies collaboratively and, second, organizational changes that reduce the need for information to cross interfaces, such as reconstructing knowledge platforms to be more interactive, thereby decentralizing decision-making. 
The framing of the challenges involved in building large-scale knowledge banks as a matter of ontology mismatch creates an opportunity for an interdisciplinary and analytically integrated research agenda to implement and test these potential approaches. While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true, and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvements. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgments of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance. 
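The bigram IDF weighting and document-frequency pruning steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the whitespace tokenization, and the `max_df_ratio` pruning threshold are all hypothetical.

```python
from collections import Counter
import math

def bigram_idf(docs, max_df_ratio=0.2):
    """IDF weights for bigrams, pruning noisy bigrams with large document frequency."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        # count each bigram at most once per document
        df.update(set(zip(tokens, tokens[1:])))
    # keep only bigrams whose document-frequency ratio is at or below the threshold
    return {bg: math.log(n / d) for bg, d in df.items() if d / n <= max_df_ratio}

docs = ["the cat sat", "the cat ran", "a dog ran", "a dog sat", "birds fly high"]
weights = bigram_idf(docs)
# ("the", "cat") occurs in 2 of 5 documents (ratio 0.4 > 0.2) and is pruned;
# bigrams occurring in one document keep an IDF of log(5)
```

In the study's actual method the surviving bigram weights feed into context-dependent term weights mixed with unigram evidence; the sketch only shows the pruning and IDF step.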
This article offers a comparative analysis of the personal profiling capabilities of the two most important free citation-based academic search engines, namely, Microsoft Academic Search (MAS) and Google Scholar Citations (GSC). Author profiles can be useful for evaluation purposes once the advantages and the shortcomings of these services are described and taken into consideration. In total, 771 personal profiles appearing in both the MAS and the GSC databases were analyzed. Results show that the GSC profiles include more documents and citations than those in MAS but with a strong bias toward the information and computing sciences, whereas the MAS profiles are disciplinarily better balanced. MAS shows technical problems such as a higher number of duplicated profiles and a lower updating rate than GSC. It is concluded that both services could be used for evaluation purposes only if they are applied along with other citation indices as a way to supplement that information. Since its creation in 1991, arXiv has become central to the diffusion of research in a number of fields. Combining data from the entirety of arXiv and the Web of Science (WoS), this article investigates (a) the proportion of papers across all disciplines that are on arXiv and the proportion of arXiv papers that are in the WoS, (b) the elapsed time between arXiv submission and journal publication, and (c) the aging characteristics and scientific impact of arXiv e-prints and their published version. It shows that the proportion of WoS papers found on arXiv varies across the specialties of physics and mathematics, and that only a few specialties make extensive use of the repository. Elapsed time between arXiv submission and journal publication has shortened but remains longer in mathematics than in physics. In physics, mathematics, as well as in astronomy and astrophysics, arXiv versions are cited more promptly and decay faster than WoS papers. 
The arXiv versions of papers, both published and unpublished, have lower citation rates than published papers, although there is almost no difference in the impact of the arXiv versions of published and unpublished papers. With the rapid growth of Web 2.0, community question answering (CQA) has become a prevalent information seeking channel, in which users form interactive communities by posting questions and providing answers. Communities may evolve over time, because of changes in users' interests, activities, and new users joining the network. To better understand user interactions in CQA communities, it is necessary to analyze the community structures and track community evolution over time. Existing work in CQA focuses on question searching or content quality detection, and the important problems of community extraction and evolutionary pattern detection have not been studied. In this article, we propose a probabilistic community model (PCM) to extract overlapping community structures and capture their evolution patterns in CQA. The empirical results show that our algorithm appears to improve the community extraction quality. We show empirically, using the iPhone data set, that interesting community evolution patterns can be discovered, with each evolution pattern reflecting the variation of users' interests over time. Our analysis suggests that individual users could benefit from tracking the transition of products to gain comprehensive information. We also show that the communities provide a decision-making basis for business. Scholarly metadata have traditionally centered on descriptive representations, which have been used as a foundation for scholarly publication repositories and academic information retrieval systems. In this article, we propose innovative and economic methods of generating knowledge-based structural metadata (structural keywords) using a combination of natural language processing-based machine-learning techniques and human intelligence. 
By allowing low-barrier participation through a social media system, scholars (both as authors and users) can participate in the metadata editing and enhancing process and benefit from more accurate and effective information retrieval. Our experimental web system ScholarWiki uses machine learning techniques, which automatically produce increasingly refined metadata by learning from the structural metadata contributed by scholars. The cumulated structural metadata add intelligence and recursively enhance and update the quality of metadata, wiki pages, and the machine-learning model. Metadata quality presents a challenge faced by many digital repositories. A variety of quality assurance frameworks have been proposed and applied in repositories deployed in various contexts. Although studies report an improvement in the quality of the metadata in many of these applications, the transfer of a successful approach from one application context to another has not been studied to a satisfactory extent. This article presents the empirical results of the application of a metadata quality assurance process that was developed and successfully applied in an educational context (learning repositories) to 2 different application contexts, to compare results with the previous application and assess its generalizability. More specifically, it reports results from the adaptation and application of this process in a library context (institutional repositories) and in a cultural context (digital cultural repositories). Initial empirical findings indicate that content providers seem to gain a better understanding of metadata when the proposed process is put in place and that the quality of the produced metadata records increases. Source-based writing assignments conducted by groups of students are a common learning task used in information literacy instruction. 
The fundamental assumption in group assignments is that students' collaboration substantially enhances their learning. The present study focused on the group work strategies adopted by upper secondary school students in source-based writing assignments. Seventeen groups authored Wikipedia or Wikipedia-style articles and were interviewed during and after the assignment. Group work strategies were analyzed in 6 activities: planning, searching, assessing sources, reading, writing, and editing. The students used 2 cooperative strategies: delegation and division of work, and 2 collaborative strategies: pair and group collaboration. Division of work into independently conducted parts was the most popular group work strategy. Group collaboration, in which students worked together to complete an activity, was also commonly applied. Division of work was justified by efficiency in completing the project and by ease of control in the fair division of contributions. The motivation behind collaboration was related to quality issues and shared responsibility. We suggest that the present designs of learning tasks lead students to avoid collaboration, increasing the risk of low learning outcomes in information literacy instruction. Social image-sharing websites have attracted a large number of users. These systems allow users to associate geolocation information with their images, which is essential for many interesting applications. However, only a small fraction of social images have geolocation information. Thus, an automated tool for suggesting geolocation is essential to help users geotag their images. In this article, we use a large data set consisting of 221 million Flickr images uploaded by 2.2 million users. For the first time, we analyze user uploading patterns, user geotagging behaviors, and the relationship between the taken-time gap and the geographical distance between two images from the same user. 
Based on the findings, we represent a user profile by the user's historical tags and build a multinomial model on the user profile for geotagging. We further propose a unified framework to suggest geolocations for images, which combines the information from both image tags and the user profile. Experimental results show that for images uploaded by users who have never done geotagging, our method outperforms the state-of-the-art method by 10.6 to 34.2%, depending on the granularity of the prediction. For images from users who have done geotagging, a simple method is able to achieve very high accuracy. This article studies the impact of differences in citation practices at the subfield, or Web of Science subject category, level, using the model introduced in Crespo, Li, and Ruiz-Castillo (2013a), according to which the number of citations received by an article depends on its underlying scientific influence and the field to which it belongs. We use the same Thomson Reuters data set of about 4.4 million articles used in Crespo et al. (2013a) to analyze 22 broad fields. The main results are the following: First, when the classification system goes from 22 fields to 219 subfields, the effect on citation inequality of differences in citation practices increases from approximately 14% at the field level to 18% at the subfield level. Second, we estimate a set of exchange rates (ERs) over a wide [660, 978] citation quantile interval to express the citation counts of articles into the equivalent counts in the all-sciences case. In the fractional case, for example, we find that in 187 of 219 subfields the ERs are reliable in the sense that the coefficient of variation is smaller than or equal to 0.10. Third, in the fractional case the normalization of the raw data using the ERs (or subfield mean citations) as normalization factors reduces the importance of the differences in citation practices from 18% to 3.8% (3.4%) of overall citation inequality. 
Fourth, the results in the fractional case are essentially replicated when we adopt a multiplicative approach. We present a novel 3-step self-training method for author name disambiguation, SAND (self-training associative name disambiguator), which requires no manual labeling and no parameterization (in real-world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real-world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. The most representative of these clusters are then selected to serve as training data for the third, supervised author assignment step. The third step exploits a state-of-the-art transductive disambiguation method capable of detecting unseen authors not included in any training example and of incorporating reliable predictions into the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our proposed method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods. Thus, different from other bootstrapping methods that explore privileged, hard-to-obtain information such as self-citations and personal information, our proposed method produces top-notch performance with no (manual) training data or parameterization and in the presence of scarce information. I analyze the text of an article that appeared in this journal in 2007 that published the results of a questionnaire in which a number of experts were asked to define the concepts of data, information, and knowledge. I apply standard information retrieval techniques to build a list of the most frequent terms in each set of definitions. 
I then apply information extraction techniques to analyze how the top terms are used in the definitions. As a result, I draw data-driven conclusions about the aggregate opinion of the experts. I contrast this with the original analysis of the data to provide readers with an alternative viewpoint on what the data tell us. This Brief Communication discusses the benefits of citation analysis in research evaluation based on Galton's "Wisdom of Crowds" (1907). Citations are based on the assessments of many, which is why they can be considered to have some credibility. However, we show that citations are incomplete assessments and that one cannot assume that a high number of citations correlates with a high level of usefulness. Only when one knows that a rarely cited paper has been widely read is it possible to say, strictly speaking, that it was obviously of little use for further research. Using a comparison with "like" data, we try to show that cited reference analysis allows for a more meaningful analysis of bibliometric data than times-cited analysis. I propose the use of h-plots for visualizing the asymmetric relationships between the citing and cited profiles of journals in a common map. With this exploratory tool, we can better understand a journal's dual roles of citing and being cited in a reference network. The h-plot is introduced and its use is validated with a set of 25 journals belonging to the statistics area. The relatedness factor is considered for describing the relations of citations from a journal "i" to a journal "j," and the citations from the journal "j" to the journal "i." More information has been extracted from the h-plot, compared with other statistical techniques for modelling and representing asymmetric data, such as multidimensional unfolding. This study examines collaboration dynamics with the goal to predict and recommend collaborations starting from the current topology. 
Author-, institution-, and country-level collaboration networks are constructed using a ten-year data set on library and information science publications. Different statistical approaches are applied to these collaboration networks. The study shows that, for the employed data set in particular, higher-level collaboration networks (i.e., country-level collaboration networks) tend to yield more accurate prediction outcomes than lower-level ones (i.e., institution- and author-level collaboration networks). Based on the recommended collaborations of the data set, this study finds that neighbor-information-based approaches are more clustered on a 2-D multidimensional scaling map than topology-based ones. Limitations of the applied approaches on sparse collaboration networks are also discussed. (C) 2014 Elsevier Ltd. All rights reserved. This paper examines the effects of inflationary and equalizing bias on publication output rankings. Any identifiable amount of bias in authorship accreditation was detrimental to accuracy when ranking a select group of leading Canadian aquaculture researchers. Bias arose when publication scores were calculated without taking into account information about multiple authorship and differential coauthor contributions. The ensuing biased equal credit scores, whether fractional or inflated, produced rankings that were fundamentally different from the ranking of harmonic estimates of actual credit calculated by using all relevant byline information in the source data. In conclusion, the results indicate that both fractional and inflated rankings are misleading, and suggest that accurate accreditation of coauthors is the key to reliable publication performance rankings. (C) 2014 The Author. Published by Elsevier Ltd. All rights reserved. Citation-based approaches, such as the impact factor and h-index, have been used to measure the influence or impact of journals for journal rankings. 
A survey of the related literature for different disciplines shows that the level of correlation between these citation-based approaches is domain dependent. We analyze the correlation between the impact factors and h-indices of the top-ranked computer science journals for five different subjects. Our results show that the correlation between these citation-based approaches is very low. Since using a different approach can result in different journal rankings, we further combine the different results and then re-rank the journals using a combination method. These new ranking results can be used as a reference for researchers to choose their publication outlets. (C) 2014 Elsevier Ltd. All rights reserved. One problem confronting the use of citation-based metrics in science studies and research evaluations is the Matthew effect. This paper reviews the role of citations in science and decomposes the Matthew effect in citations into three components: networking, prestige, and appropriateness. The networking and prestige effects challenge the validity of citation-based metrics, but the appropriateness effect does not. Using panel data on the citation histories of 1279 solo-authored papers and fixed effects models, we test these three effects while controlling for unobserved paper characteristics. We find no evidence of a retroactive networking effect and only weak evidence of a prestige effect (very small and not always significant), which provides some support for the use of citation-based metrics in science studies and evaluation practices. In addition, adding the appropriateness effect reduces the size of the prestige effect considerably, suggesting that previous studies controlling for paper quality but not appropriateness may have overestimated the prestige effect. (C) 2014 Elsevier Ltd. All rights reserved. 
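A correlation between impact factors and h-indices of the kind reported above is typically quantified with a rank correlation coefficient. The sketch below computes Spearman's rho with tie-aware average ranks; the journal values are invented toy data, and the function names are assumptions used only to illustrate the statistic, not the paper's analysis.

```python
import math

def average_ranks(values):
    """Rank values in descending order, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two lists of indicator values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# toy data: impact factors and h-indices for five hypothetical journals
impact_factors = [3.2, 1.1, 2.5, 0.9, 4.0]
h_indices = [40, 35, 22, 18, 55]
rho = spearman(impact_factors, h_indices)  # a low rho means the two rankings disagree
```

A value of rho near 1 would mean the two indicators rank the journals almost identically; the very low correlations found in the study imply that the choice of indicator materially changes the ranking.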
In the present paper the Percentage Rank Position (PRP) index, derived from the principle of Similar Distribution of Information Impact in different fields of science (Vinkler, 2013), is suggested for assessing journals in different research fields comparatively. The publications in the journals dedicated to a field are ranked by citation frequency, and the PRP-index of the papers in the elite set of the field is calculated. The PRP-index relates the citation rank number of the paper to the total number of papers in the corresponding set. The sum of the PRP-indices of the elite papers in a journal, PRP(j,F), may represent the eminence of the journal in the field. The non-parametric and non-dimensional PRP(j,F) index of journals is believed to be comparable across fields. (C) 2014 Elsevier Ltd. All rights reserved. The objective of the study is to examine the empirical relationship between educational indicators and research outcomes in the top twenty nations of the world in terms of number of publications, citations and patents. The literature on higher education is useful in expressing the general and visible characteristics of a research domain, but cannot reveal the possible interaction between educational reforms and research outcomes. In order to overcome this limitation, the current study employed a panel cointegration technique to evaluate the long-run relationship between educational indicators and research productivity over the period 1980-2011. The results reveal that educational indicators act as an important driver to increase research productivity in the panel of selected countries. The most promising educational factor, i.e., higher education enrolment, increases GDP and the number of publications by 0.898% and 1.425%, respectively. Similarly, higher education expenditures per student increase research and development (R&D) expenditures, the number of citations and the number of patents by 1.128%, 0.968% and 0.714%, respectively. 
Finally, increasing school-life expectancy contributed to researchers in R&D by 0.401%. The study concludes that there is a window of opportunity to equip the youth with the necessary skills to ensure a sustainable future for the nations. Higher education empowers and enables students to compete in a highly competitive and interconnected world through research and innovations, which are the drivers of new ideas, businesses and economic growth. (C) 2014 Elsevier Ltd. All rights reserved. The distribution of impact factors has been modeled in the recent informetric literature using the two-exponent law proposed by Mansilla, Koppen, Cocho, and Miramontes (2007). This paper shows that two distributions widely used in economics, namely the Dagum and Singh-Maddala models, possess several advantages over the two-exponent model. Compared to the latter, the former models give as good as or slightly better fit to data on impact factors in eight important scientific fields. In contrast to the two-exponent model, both proposed distributions have closed-form probability density functions and cumulative distribution functions, which facilitates fitting these distributions to data and deriving their statistical properties. (C) 2014 Elsevier Ltd. All rights reserved. Subject classification arises as an important topic for bibliometrics and scientometrics, which seek to develop reliable and consistent tools and outputs. Such objectives also call for a well delimited underlying subject classification scheme that adequately reflects scientific fields. Within the broad ensemble of classification techniques, clustering analysis is one of the most successful. Two clustering algorithms based on modularity, the VOS and Louvain methods, are presented here for the purpose of updating and optimizing the journal classification of the SCImago Journal & Country Rank (SJR) platform. 
We used network analysis and the Pajek visualization software to run both algorithms on a network of more than 18,000 SJR journals, combining three citation-based measures of direct citation, co-citation and bibliographic coupling. The set of clusters obtained was labeled using category labels assigned to SJR journals and significant words from journal titles. Although both algorithms exhibited slight differences in performance, the results show a similar behaviour in grouping journals. Consequently, they are deemed to be appropriate solutions for classification purposes. The two newly generated algorithm-based classifications were compared to other bibliometric classification systems, including the original SJR and WoS Subject Categories, in order to validate their consistency, adequacy and accuracy. In addition to some noteworthy differences, we found a certain coherence and homogeneity among the four classification systems analysed. (C) 2014 Elsevier Ltd. All rights reserved. We show mathematically that the success-index can be any of the following impact indices, depending on the value of the threshold used in the definition of the success-index: the Hirsch-index (h-index), the g-index, the generalized Wu- and Kosmulski-indices, and the average. (C) 2014 Elsevier Ltd. All rights reserved. This paper shows how bibliometric models can be used to assist peers in selecting candidates for academic openings. Several studies have demonstrated that a relationship exists between results from peer-review evaluations and results obtained with certain bibliometric indicators. However, very little has been done to analyse the predictive power of models based on bibliometric indicators. Indicators with high predictive power will be seen as good instruments to support peer evaluations. 
The goal of this study is to assess the predictive power of a model based on bibliometric indicators for the results of academic openings at the level of Associado and Catedratico at Portuguese universities. Our results suggest that the model can predict the results of peer review at this level with a reasonable degree of accuracy. This predictive power is better when only scientific performance is assessed by peers. (C) 2014 Elsevier Ltd. All rights reserved. The journal impact factor is not comparable among fields of science and social science because of systematic differences in publication and citation behavior across disciplines. In this work, a source normalization of the journal impact factor is proposed. We use the aggregate impact factor of the citing journals as a measure of the citation potential in the journal topic, and we employ this citation potential in the normalization of the journal impact factor to make it comparable between scientific fields. An empirical application comparing some impact indicators with our topic-normalized impact factor in a set of 224 journals from four different fields shows that our normalization, using the citation potential in the journal topic, reduces the between-group variance with respect to the within-group variance in a higher proportion than the rest of the indicators analyzed. The effect of journal self-citations on the normalization process is also studied. (C) 2014 Elsevier Ltd. All rights reserved. Unlike Impact Factors (IF), Article Influence (AI) scores assign greater weight to citations that appear in highly cited journals. The natural sciences tend to have higher citation rates than the social sciences. We might therefore expect that relative to IF, AI overestimates the citation impact of social science journals in subfields that are related to (and presumably cited in) higher-impact natural science disciplines. 
This study evaluates that assertion through a set of simple and multiple regressions covering seven social science disciplines: anthropology, communication, economics, education, library and information science, psychology, and sociology. Contrary to expectations, AI underestimates 5IF (five-year Impact Factor) for journals in science-related subfields such as scientific communication, science education, scientometrics, biopsychology, and medical sociology. Journals in these subfields have low AI scores relative to their 5IF values. Moreover, the effect of science-related status is considerable, typically 0.60 5IF units or 0.50 SD. This effect is independent of the more general finding that AI scores underestimate 5IF for higher-impact journals. It is also independent of the very modest curvilinearity in the relationship between AI and 5IF. (C) 2014 Elsevier Ltd. All rights reserved. A great deal of work has been done to understand how science contributes to technological innovation and medicine. This is no surprise given the amount of money invested annually in R&D. However, what is not well known is that US science (R&D) investment is only one-sixth of the annual revenue received by non-profit organizations (NPOs) in the US. The large majority of NPO revenues are devoted to the remaining landscape of altruistic causes, those not relying as heavily on scientific inquiry. Given this broader context, one might reasonably expect the non-profit world to have been as well characterized as that of scientific research. The unfortunate truth is that no map of altruistic missions and causes exists; the landscape of altruistic activity is virtually unknown. In this paper, we present the first maps of altruistic mission space. These maps were created using the text from the websites of 125,000 non-profit organizations (NPOs) in the US. The maps consist of 357 topics covering areas such as religion, education, sports, culture, human services, public policy and medical care. 
The role of science in this altruistic landscape is examined. Possible applications are discussed. Here we show a novel technique for comparing subject categories, where the prestige of academic journals in each category is represented statistically by an impact-factor histogram. For each subject category we compute the probability of occurrence of scholarly journals with impact factor in different intervals. Here impact factor is measured with the Thomson Reuters Impact Factor, Eigenfactor Score, and Immediacy Index. Given the probabilities associated with a pair of subject categories, our objective is to measure the degree of dissimilarity between them. To do so, we use an axiomatic characterization for predicting dissimilarity between subject categories. The scientific subject categories of Web of Science in 2010 were used to test the proposed approach, benchmarking Cell Biology and Computer Science Information Systems against the rest as two case studies. The former is best-in-class benchmarking, which involves studying the leading competitor category; the latter is strategic benchmarking, which involves observing how other scientific subject categories compete. We examine the incidence and extent of co-authorship in environmental and resource economics by investigating the leading journal of environmental and resource economics: the Journal of Environmental Economics and Management. Previous studies of general economic journals have offered empirical evidence that intellectual collaboration is most prevalent in the field of environmental and resource economics. However, no previous study has examined this finding more carefully. This is a gap in the literature we hope to fill. Accordingly, we investigate all 1,436 papers published in JEEM from 1974 until 2010 with respect to potential drivers of co-authorship. We start with a descriptive analysis in order to depict the most important trends in the past 36 years. 
We then employ empirical methods to test several hypotheses that are commonly used to analyze the structure of co-authorship. However, we do not restrict ourselves to these hypotheses but also investigate other potentially relevant drivers of co-authorship, such as external funding. We find empirical support for a relation between the number of authors and key characteristics of an article, such as the number of equations or tables and the presence of external funding. Research in environmental and resource economics is demanding in terms of both disciplinary and interdisciplinary skills, so the likelihood of collaboration and jointly written publications is present and significant. Peer review works as the hinge of the scientific process, mediating between research and the awareness/acceptance of its results. While it might seem obvious that science would regulate itself scientifically, the consensus on peer review is eroding; a deeper understanding of its workings and potential alternatives is sorely needed. Employing a theoretical approach supported by agent-based simulation, we examined computational models of peer review, performing what we propose to call redesign, that is, the replication of simulations using different mechanisms. Here, we show that we are able to obtain the high sensitivity to rational cheating that is present in the literature. In addition, we also show how this result appears to be fragile against small variations in mechanisms. Therefore, we argue that exploration of the parameter space is not enough if we want to support theoretical statements with simulation, and that exploration at the level of mechanisms is needed. These findings also support prudence in the application of simulation results based on single mechanisms, and endorse the use of complex agent platforms that encourage experimentation with diverse mechanisms. This paper demonstrates that basic research has been overshadowed by applied research in China for decades, from the perspective of S&T policy. 
The data involve 4,707 Chinese S&T policies during the period between 1949 and 2010, which are grouped into five phases, based on the process of S&T system reform in China. We also found that S&T policies in China are leaning more towards basic research, and the gap between basic research and applied research is shrinking. Based on publications in mathematics of Chinese authors indexed in Chinese domestic and international databases, namely, the CNKI and the Web of Science, the current paper explores the impact of collaboration and funding support on academic productivity. Collaboration is classified into domestic and international collaboration, and domestic collaboration is further divided into within-institutional collaboration and cross-institutional collaboration. Regional performance in terms of collaboration and funding support has also been investigated. The results show that collaboration and funding support are highly skewed among Chinese regions. Beijing, Jiangsu, Shanghai, and Zhejiang are most active in collaboration and are the major winners of research funds. Zhejiang and Shaanxi perform in contrasting ways: the former publishes mostly internationally, whereas the latter publishes mainly domestically. Compared with within-institutional collaboration, cross-institutional and international collaboration perform better in raising productivity and attracting research funds. Digital and scientific realms are commonly believed to be gendered. The wide pervasiveness of e-science may result in an interaction between the scientific and digital gender divides, increasing the disparities faced by women. Selecting web-presence as a manifestation of web activity, and applying a quasi-experimental scientometric method, the present study aims to investigate the effects of the interaction, if any, on web-present females and males compared to web-absent ones in Nanoscience and Nanotechnology. 
The results show that the web-present Nanoscientists are not necessarily superior in their scientific production, though they are higher in their recognition. The web-present females and males are equal in their numbers and production. Although web-present females are found to be equal in their recognition to their male counterparts, there is a significant difference between the web-present and web-absent males in this regard, signifying the higher impact of the web on males' recognition. Because of enhanced anthropogenic nitrogen input, eutrophication, hypoxia, and acidification threaten the health of aquatic ecosystems. To better understand the current state of research and emerging trends in this area, a bibliometric approach was applied to quantitatively evaluate global nitrogen research at the watershed scale. Using 9,748 articles selected from among 10,163 returned by a search in the Science Citation Index Expanded (SCI-Expanded) database from 1900 to 2011, spatial and temporal characteristics of the articles, authors, institutions, countries, and keywords are presented, and focal research areas are derived. Compared with the annual increase in all articles in the SCI-Expanded (4.5 %), the studies on nitrogen in watersheds increased more quickly (11.2 %), indicating an increasing interest in this area. The relationship between authors and their output was evaluated by a two-step function, in which 6,074 authors (26.8 %) publishing on this topic were key scientists who contributed 56.4 % of the total articles. Based on the number of authors, first authors, international collaborators, and citations, four types of authors were analyzed using cluster methods. The influence of the authors, institutions and countries was also analyzed in terms of publication and citation, and a co-occurrence analysis was used to assess cooperation among countries and research hotspots. 
The keywords were compared among countries to assist our understanding of national research interests and modes. From the analysis of the primary subjects and the co-occurrence of keywords, studies involving nitrogen's environmental effects, nitrogen processes, and models are increasing, which indicates that they are likely to become a primary research focus in the near future. This study represents one of the first attempts to use empirical analysis to estimate the academic productivity complex, and it supports the thesis that academic productivity is a function of a multidimensional combination of the work of academic researchers: scientific work, education, and external relationships. Given the complexity of academic productivity, it is necessary to clarify that it is divided into scientific productivity of the first type (scientific publications); scientific productivity of the second type (awards and academic positions); productivity in terms of external relationships (or external advice); and educational productivity. The objective of this paper is achieved through a sample survey (2,738 academics responded) conducted by Italian researchers from the PIR research project. The results, being estimates obtained from a sample survey, reflect a working reality in which Italian academics are flooded by a myriad of activities that are not always consistent with the primary aims of a researcher's work, with organisational and environmental well-being at the limit of hyper-productivity. The overall productivity (academic productivity) is significantly correlated with the four dimensions: average annual scientific productivity of the first type, average annual scientific productivity of the second type, productivity in external advice and, lastly, teaching productivity. 
The estimates of the four productivity indicators are the result of a literature search on the primary techniques used to assess productivity in academia. By comparing the most significant indicators, we managed to identify the technical aspects missing in the Italian system of evaluation. This process allowed us to add additional variables characterising the various aspects of productivity and prove the validity of our theory about the multidimensionality of academic productivity. This paper provides an analysis of the relationship between research performance and individual characteristics (e.g., career path information) of researchers, based on information provided in the curricula vitae of 565 excellent researchers within the life sciences and medical sciences fields in Japan. I specifically analyzed the relationship between experience as a practicing physician and research performance. As a result, I found that experience as a practicing physician had a statistically significant positive relationship with the number of research papers, but there was not a significant relationship with the number of citations. Moreover, the diversity of a researcher's career related significantly to the number of citations and patents. An employment experience at a young age with a company or independent administrative agency had a significant and positive relationship with the number of coauthors. However, a significant relationship between work experience in a foreign country and research performance was not observed. Hirsch's h-index cannot be used to compare academics who work in different disciplines or are at different career stages. Therefore, a metric that corrects for these differences would provide information that the h-index and its many current refinements cannot deliver. This article introduces such a metric, namely the hI,annual (or hIa for short). The hIa-index represents the average annual increase in the individual h-index. 
Using a sample of 146 academics working in five major disciplines and representing a wide variety of career lengths, we demonstrate that this metric attenuates h-index differences attributable to disciplinary background and career length. It is also easy to calculate with readily available data from all major bibliometric databases, such as Thomson Reuters Web of Knowledge, Scopus and Google Scholar. Finally, as the metric represents the average number of single-author-equivalent "impactful" articles that an academic has published per year, it also allows an intuitive interpretation. Although, just like any other metric, the hIa-index should never be used as the sole criterion to evaluate academics, we argue that it provides a more reliable comparison between academics than currently available metrics. Research evaluation is a necessity for management of academic units (scientists, research groups, departments, institutes, universities) and for government decision making in science and technology. Yet, wrong conclusions may be drawn due to errors in assignments of authors to institutions. To improve existing techniques of institution name disambiguation (IND) based on word similarity or editing distance, a rule-based algorithm is proposed in this study. One-to-many relationships between an institution and many variant names under which it is referred to in bylines of publications are recognized with the aid of statistical methods and specific rules. The performance of the rule-based IND algorithm is evaluated on large datasets in four fields. These experimental results demonstrate that the precision of the algorithm is high. Yet, recall should be improved. In a previous paper we introduced the quantitative method named reference publication year spectroscopy (RPYS). With this method one can determine the historical roots of research fields and quantify their impact on current research. 
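The hIa metric described above can be sketched in a few lines. Following its published definition, the sketch below computes an individual h-index by dividing each paper's citation count by its number of authors before applying the usual h-index cutoff, then divides by career length; the paper records are hypothetical:

```python
def h_index(citations):
    """Standard h-index: largest h such that h items have >= h citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def hia(papers, career_years):
    """hIa = individual h-index divided by academic age (career length).

    papers: list of (citations, n_authors) tuples. Each paper's citations
    are first normalized by its number of authors, yielding the
    single-author-equivalent counts on which the h cutoff is applied.
    """
    normalized = [c / n for c, n in papers]
    return h_index(normalized) / career_years

# Hypothetical publication record: (citations, number of authors)
papers = [(100, 4), (60, 2), (30, 3), (12, 1), (8, 2), (3, 5)]
print(h_index([c for c, _ in papers]))   # 5: conventional h-index
print(hia(papers, career_years=10))      # 0.4: annualized individual h
```

Note how the author-normalization and the division by career length are exactly the two corrections the abstract argues are needed to compare academics across disciplines and career stages.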
RPYS is based on the analysis of the frequency with which references are cited in the publications of a specific research field in terms of the publication years of these cited references. In this study, we illustrate that RPYS can also be used to reveal the origin of scientific legends. We selected "Darwin finches" as an example for illustration. Charles Darwin, the originator of evolutionary theory, was given credit for finches he did not see and for observations and insights about the finches he never made. We have shown that a book published in 1947 is the most-highly cited early reference cited within the relevant literature. This book had already been revealed as the origin of the term "Darwin finches" by Sulloway through careful historical analysis. This article is concerned with the cooperation patterns of science among European countries from the viewpoint of small countries. This is an issue that the empirical literature has so far overlooked but that remains relevant for understanding the implications of integration processes in the EU. We have replicated and expanded in sample, indicator and time dimensions the empirical analysis suggested by Frenken (Economic Systems Research 14(4):345-361, 2002) for assessing the homogeneity of cooperation patterns among European countries. We find that small states collaborate less homogeneously with other countries in the European research system and their intra-national research cooperation is also more fragmented. Our analysis reveals the outcomes of cooperation processes, and also highlights the factors such as research funding and research specialisation that can impact the results of the connectivity measurement. We also show that the results are sensitive to the size and measurement of the science system. 
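The RPYS procedure described above reduces to counting cited references by publication year and locating peaks in the resulting spectrum. The sketch below uses one common variant, deviation of each year's count from the median of the surrounding five years, to flag candidate historical roots; the cited-reference years are hypothetical, with a spike planted at 1947 to mirror the Darwin-finches example:

```python
from collections import Counter
from statistics import median

def rpys(cited_ref_years, window=5):
    """Reference publication year spectroscopy (sketch).

    cited_ref_years: publication years of all cited references in a field.
    Returns {year: (count, deviation)}, where deviation is the count minus
    the median count of the surrounding `window` years; strong positive
    deviations mark candidate historical roots of the field.
    """
    counts = Counter(cited_ref_years)
    half = window // 2
    spectrum = {}
    for y in range(min(counts), max(counts) + 1):
        neighborhood = [counts.get(y + d, 0) for d in range(-half, half + 1)]
        spectrum[y] = (counts.get(y, 0), counts.get(y, 0) - median(neighborhood))
    return spectrum

# Hypothetical cited-reference years with a conspicuous spike at 1947
refs = [1945, 1946, 1947, 1947, 1947, 1947, 1948, 1949, 1950]
spec = rpys(refs)
peak = max(spec, key=lambda y: spec[y][1])
print(peak)  # 1947: the most conspicuous early reference year
```

On real data one would compute `refs` from the cited-reference fields of all publications retrieved for the research field, and inspect the full spectrum rather than only the single largest peak.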
This paper investigates the impact of burgeoning Chinese publication on academic alphabetical authorship in the 25 subject categories that have the highest percentage of intentionally alphabetical publications. The use of alphabetical authorship is common in the social sciences and humanities, mathematics, and in some physical disciplines. Chinese academic publication has increased rapidly in recent decades (Hong Kong and Macau were excluded from the study because they are much more internationalized than mainland China). However, authors from mainland China do not prefer alphabetical authorship. The increase in publications from mainland China lowers the probability of intentional alphabetical authorship in the natural science and technology subject categories that we examined. In some natural science and technology categories, the influence is strong. But for the social sciences and humanities, the influence is weak, due to the lower share of world publications from mainland China. Yet, in some social science and humanities subject categories such as 'Economics', the relative share of publications from mainland China is increasing rapidly, and its effect on alphabetical authorship trends will be felt in the near future. This paper aims to evaluate the health issues related to urbanization and get an overview of urban health with a bibliometric approach, a powerful tool for quantitative macroscopic analysis across multiple disciplines. A total of 11,299 articles and 5,579 Medical Subject Headings (MeSH) terms from 1978 to 2012 were retrieved by searching PubMed/MEDLINE using the MeSH term "urban health". The bibliographic information was analyzed to summarize the overall research characteristics. MeSH terms were sorted by their normalized frequency. The top 10 % of high-frequency MeSH terms were classified into categories (physical environment, health effects, social environment and counter-measures) and analyzed. 
We investigated the themes of the corresponding categories and their trends by co-occurrence word (co-word) and regression analysis. We identified and elaborated nine themes of physical environment, ten themes of health effects, three themes of social environment and four themes of counter-measures in urban health, as well as the main themes in five representative countries (USA, India, China, South Africa and Japan). We present a data-based overview of the issues in urban health, as a reference for future researchers. In this study, we validated the usefulness of examiners' forward citations, especially from the viewpoint of the applicants' self-selection (ASS) decisions during the patent application procedure. We believe that the ASS in an early stage would be decided by a potential-value comparison among patent applications. We focused on six self-selection decision points of the applicants as patent value parameters: whether to file patent applications in foreign countries, request examination, request accelerated examination, reply to a notification of reasons for refusal, appeal after receiving a decision of refusal, and register after receiving a decision to grant a patent. We found that application groups that selected "Yes" have a significantly larger number of examiners' forward citations than groups that selected "No" at all decision points. In addition, we confirmed that applications that were finally granted and those that were renewed for a full term after grant have a significantly larger number of examiners' forward citations. We concluded that the number of examiners' forward citations would be a useful indicator of the potential value of patent applications in macroscopic analysis. Comparison, rating, and ranking of alternative solutions, in the case of multicriteria evaluations, have been a perennial focus of operations research and optimization theory. There exist numerous practical approaches to solving the multicriteria ranking problem. 
A recent focus of interest in this domain was the parametric evaluation of research entities in Poland. The principal methodology was based on pairwise comparisons. For each single comparison, four criteria have been used. One of the controversial points of the assumed approach was that the weights of these criteria were arbitrary. The main focus of this study is to put forward a theoretically justified way of extracting weights from the opinions of domain experts. The theoretical basis for the whole procedure rests on a survey and its experimental results. The two resulting sets of weights and the computed inconsistency indicator are discussed and compared. A Triple Helix (TH) network of bi- and trilateral relations among universities, industries, and governments can be considered as an ecosystem in which uncertainty can be reduced when functions become synergetic. The functions are based on correlations among distributions of relations, and are therefore latent. The correlations span a vector space in which two vectors (P and Q) can be used to represent forward "sending" and reflexive "receiving," respectively. These two vectors can also be understood in terms of the generation versus reduction of uncertainty in the communication field that results from interactions among the three bilateral channels of communication. We specify a system of Lotka-Volterra equations between the vectors that can be solved. Redundancy generation can then be simulated and the results can be decomposed in terms of the TH components. Furthermore, we show that the strength and frequency of the relations are independent parameters in the model. Redundancy generation in TH arrangements can be decomposed using Fourier analysis of the time-series of empirical studies. As an example, the case of co-authorship relations in Japan is re-analyzed. The model allows us to interpret the sinusoidal functions of the Fourier analysis as representing redundancies. 
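The weight-extraction problem described above, deriving criteria weights from experts' pairwise comparisons together with an inconsistency indicator, can be illustrated with standard machinery; the abstract does not specify the authors' exact procedure, so the sketch below uses geometric-mean prioritization and Koczkodaj's triad-based inconsistency as representative stand-ins, on a hypothetical judgment matrix over four criteria:

```python
from math import prod
from itertools import combinations

def weights_from_pairwise(M):
    """Geometric-mean prioritization: M[i][j] states how many times more
    important criterion i is than criterion j (reciprocal matrix,
    M[j][i] = 1/M[i][j]). Returns normalized weights."""
    n = len(M)
    gm = [prod(row) ** (1.0 / n) for row in M]
    total = sum(gm)
    return [g / total for g in gm]

def koczkodaj_inconsistency(M):
    """Worst triad inconsistency: for each i < j < k, M[i][k] should equal
    M[i][j] * M[j][k]; returns 0 for a fully consistent matrix."""
    n = len(M)
    worst = 0.0
    for i, j, k in combinations(range(n), 3):
        a, b, c = M[i][j], M[i][k], M[j][k]
        worst = max(worst, min(abs(1 - b / (a * c)), abs(1 - (a * c) / b)))
    return worst

# Hypothetical, fully consistent expert judgments over four criteria
M = [[1,   2,   4,   4],
     [1/2, 1,   2,   2],
     [1/4, 1/2, 1,   1],
     [1/4, 1/2, 1,   1]]
print([round(w, 3) for w in weights_from_pairwise(M)])  # [0.5, 0.25, 0.125, 0.125]
print(koczkodaj_inconsistency(M))                       # 0.0: consistent
```

With real expert data the matrix would rarely be consistent; the inconsistency indicator then flags how far the survey answers deviate from any single coherent set of weights.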
This paper reports results on a bibliometric case study of the long-term development of research organizations, using an internationally leading biomedical institute as an example. Using scientometric concepts, small group theory, organizational ecology, and process-based organizational theory, we developed a life cycle based theoretical model for analyzing the long-term development of research groups and institutes. Three bibliometric indicators are proposed for growth, activity profile stability, and focus. With these, the research dynamics of the case institute are described. First, overall output growth matches developments internationally in developmental biology and stem cell research, and, in line with this, journal article output increasingly dominates the institute's activity profile. Second, superposed on the overall growth curve, a stepwise development is observed, consisting of long phases of growth and stabilisation. These steps reflect local conditions and events. Historical sources from the institute's archive and interviews with the current staff of the institute suggest that the pattern of life cycles reflects a strong influence of pioneering individuals. But once settled, pioneering directors who remain in post for many years delay adaptation of the institute's mission to field developments. Furthermore, national science policies on PhD training and on priority areas have influenced the life cycles, as did merging with other institutes. As in earlier social science cases, in this case study stabilized local conditions led to adaptation to research field dynamics in a delayed fashion. In the present case stable output periods lasted at most 15 years, when local impulses led to new growth of research output and thus prevented onset of a lifecycle decline. The continued growth in the larger field both promoted and legitimized these local impulses. 
This study examines long-term trends and shifting behavior in the collaboration network of mathematics literature, using a subset of data from Mathematical Reviews spanning 1985-2009. Rather than modeling the network cumulatively, this study traces the evolution of the "here and now" using fixed-duration sliding windows. The analysis uses a suite of common network diagnostics, including the distributions of degrees, distances, and clustering, to track network structure. Several random models that take these diagnostics as parameters help tease them apart as factors influencing the values of the others. Some behaviors are consistent over the entire interval, but most diagnostics indicate that the network's structural evolution is dominated by occasional dramatic shifts in otherwise steady trends. These behaviors are not distributed evenly across the network; stark differences in evolution can be observed between two major subnetworks, loosely thought of as "pure" and "applied", which approximately partition the aggregate. The paper characterizes two major events along the mathematics network trajectory and discusses possible explanatory factors. Scientifically liberated and developed countries produce huge numbers of cutting-edge publications in peer-reviewed, impact-creating journals. These publications may become the basis for various policies and other blueprints. There is no reported study regarding the publication trends of Periodontists from India. The aim of this study was to assess the trends of Indian Periodontists' publications in the Pubmed database up to 1 March 2012, using a quantitative bibliometric approach. Studies were identified by running select search phrases on the Pubmed search engine. Search inputs included 'dental', 'oral', 'periodontal', 'gingiva', 'gingival', 'periodontology', 'periodontics', 'periodontia', 'periodontitis', 'gingivitis', and 'dental implant'. A parallel search combining the above phrases with 'India' was also performed to assess India-specific publications. 
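The sliding-window design described earlier for the mathematics collaboration network can be sketched concretely. Below, fixed-duration windows slide over a hypothetical list of dated coauthorship edges, and a single diagnostic (mean degree) is tracked per window as a minimal stand-in for the study's full suite of diagnostics:

```python
from collections import defaultdict

def window_edges(dated_edges, start, duration):
    """Edges whose year falls inside the window [start, start + duration)."""
    return [(u, v) for (u, v, year) in dated_edges
            if start <= year < start + duration]

def mean_degree(edges):
    """Mean degree of the graph induced by a window's edges."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(deg.values()) / len(deg) if deg else 0.0

# Hypothetical coauthorship edges: (author_a, author_b, year)
collabs = [("A", "B", 1985), ("B", "C", 1986), ("A", "C", 1987),
           ("C", "D", 1992), ("D", "E", 1993)]

# Five-year windows sliding one year at a time over the "here and now"
for start in range(1985, 1990):
    edges = window_edges(collabs, start, duration=5)
    print(start, len(edges), round(mean_degree(edges), 2))
```

The same loop would feed degree, distance, and clustering distributions for each window in the full analysis; abrupt changes in these per-window series are exactly the "dramatic shifts in otherwise steady trends" the study reports.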
All publications with or without available abstracts were analyzed for selected parameters. Analysis was performed to determine the name of the journal, number of authors, year of publication, type of institute, statewide distribution, type of study, etc. The approximate contribution of Indian Dental/Periodontal literature to the Pubmed database is 1.45 % up to 1 March 2012. The number of articles published by Indian Periodontists is 764 across 107 journals, starting from 1960. The number of original articles published was 510 (66.75 %), as opposed to 127 (16.62 %) each for review articles and case reports/case series. The average contribution of an Indian Periodontist to the Pubmed database is 0.53 articles. The contribution of Indian Periodontists to world literature through the Pubmed database is not voluminous, but the publications are growing almost exponentially with every passing year. There is also an increasing trend towards publishing original articles. Mobile health (mHealth) platforms offer a promising solution to some of the more important problems facing the current healthcare system. This paper examines some of the key challenges facing mHealth with a focus on privacy and security issues. In the first part of the paper, the security engineering process is described, which can assist healthcare organizations in developing an architecture-level protection strategy that is compliant with privacy and security legislation and industry initiatives. In the second part of the paper, use cases are selected to illustrate the diverse security architecture contexts in which the protection strategy will be deployed, and to emphasize the importance of integrating security across these contexts. In the third part of the paper, industry and government security best practices are discussed, which can assist healthcare organizations in implementing security architectures to meet their specific privacy and security requirements. 
Amyotrophic lateral sclerosis (ALS) is a progressively debilitating neurodegenerative condition that occurs in adulthood and targets the motor neurons. Social support is crucial to the well-being and quality of life of people with unpredictable and incurable diseases such as ALS. Members of the PatientsLikeMe (PLM) ALS online support community share social support but also exchange and build distributed knowledge within their discussion forum. This qualitative analysis of 1,000 posts from the PLM ALS online discussion examines the social support within the PLM ALS online community and explores ways community members share and build knowledge. The analysis responds to 3 research questions: RQ1: How and why is knowledge shared among the distributed participants in the PLM-ALS threaded discussion forum?; RQ2: How do the participants in the PLM-ALS threaded discussion forum work together to discover knowledge about treatments and to keep knowledge discovered over time?; and RQ3: How do participants in the PLM-ALS forum co-create and treat authoritative knowledge from multiple sources including the medical literature, healthcare professionals, lived experiences of patients and "other" sources of information such as lay literature and alternative health providers? The findings have implications for supporting knowledge sharing and discovery in addition to social support for patients. In the age of mobile commerce, users receive floods of commercial messages. How do users judge the relevance of such information? Is their relevance judgment affected by contextual factors, such as location and time? How do message content and contextual factors affect users' privacy concerns? With a focus on mobile ads, we propose a research model based on theories of relevance judgment and mobile marketing research. We suggest topicality, reliability, and economic value as key content factors and location and time as key contextual factors. 
We found that mobile relevance judgment is affected mainly by content factors, whereas privacy concerns are affected by both content and contextual factors. Moreover, topicality and economic value have a synergetic effect that makes a message more relevant. Higher topicality and location precision exacerbate privacy concerns, whereas message reliability alleviates privacy concerns caused by location precision. These findings reveal an interesting intricacy in user relevance judgment and privacy concerns and provide nuanced guidance for the design and delivery of mobile commercial information. There are untold conceptions of information in information science, and yet the nature of information remains obscure and contested. This article contributes something new to the conversation as the first arts-informed, visual, empirical study of information utilizing the draw-and-write technique. To approach the concept of information afresh, graduate students at a North American iSchool were asked to respond to the question "What is information?" by drawing on a 4- by 4-inch piece of paper, called an iSquare. One hundred thirty-seven iSquares were produced and then analyzed using compositional interpretation combined with a theoretical framework of graphic representations. The findings indicate how students visualize information, what was drawn, and associations between the iSquares and prior renderings of information based on words. In the iSquares, information appears most often as pictures of people, artifacts, landscapes, and patterns. There are also many link diagrams, grouping diagrams, symbols, and written text, each with distinct qualities. Methodological reflections address the relationship between visual and textual data, and the sample for the study is critiqued. A discussion presents new directions for theory and research on information, namely, the iSquares as a thinking tool, visual stories of information, and the contradictions of information. 
Ideas are also provided on the use of arts-informed, visual methods and the draw-and-write technique in the classroom. Children represent an increasing group of web users. Some of the key problems that hamper their search experience are their limited vocabulary, their difficulty in using the right keywords, and the inappropriateness of general-purpose query suggestions. In this work, we propose a method that uses tags from social media to suggest queries related to children's topics. Concretely, we propose a simple yet effective approach to bias a random walk defined on a bipartite graph of web resources and tags through keywords that are more commonly used to describe resources for children. We evaluate our method using a large query log sample of queries submitted by children. We show that our method outperforms by a large margin the query suggestions of modern search engines and state-of-the-art query suggestions based on random walks. We improve further the quality of the ranking by combining the score of the random walk with topical and language modeling features to emphasize even more the child-related aspects of the query suggestions. This qualitative investigation is situated in the field of information seeking and use, and decision-making theory provided a framework for the study. In a naturalistic setting and across a range of curriculum areas, it investigated the behavior of secondary school students undertaking information search tasks. Research questions focused on students' criteria for assessing the relevance and reliability of information. Thirty-seven students between 14 and 17 years of age from a southeastern Australian school participated. The study collected data from journals; interviews, including video-stimulated recall interviews; think-aloud reports; video screen captures; and questionnaires. Data analysis culminated in grounded theory. 
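The biased random walk for child-oriented query suggestion described above can be sketched as a random walk with restart on the bipartite resource-tag graph, where transitions toward tags from a child vocabulary are up-weighted. The edge data, vocabulary, and parameter values below are hypothetical, not the paper's:

```python
def suggest_tags(edges, query_tags, child_tags, boost=3.0, alpha=0.15, iters=100):
    """Random walk with restart on a bipartite resource-tag graph.

    edges: (resource, tag) pairs harvested from social media.
    query_tags: tags extracted from the child's query (the restart set).
    child_tags: tags commonly used to describe resources for children;
    transitions toward them are up-weighted by `boost`, biasing the walk.
    Returns candidate tags ranked by stationary score, query tags excluded.
    """
    # Weighted adjacency over the bipartite graph.
    nbrs = {}
    for r, t in edges:
        nbrs.setdefault(r, {})[t] = boost if t in child_tags else 1.0
        nbrs.setdefault(t, {})[r] = 1.0
    restart = {t: 1.0 / len(query_tags) for t in query_tags}
    score = {n: restart.get(n, 0.0) for n in nbrs}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in nbrs}
        for n, out in nbrs.items():
            total = sum(out.values())
            for m, w in out.items():
                nxt[m] += (1 - alpha) * score[n] * w / total
        score = nxt
    candidates = {t for _, t in edges} - set(query_tags)
    return sorted(candidates, key=lambda t: -score[t])

# Hypothetical tagged resources
edges = [("r1", "dinosaurs"), ("r1", "kids"), ("r2", "dinosaurs"),
         ("r2", "paleontology"), ("r3", "kids"), ("r3", "animals")]
ranked = suggest_tags(edges, query_tags=["dinosaurs"], child_tags={"kids", "animals"})
print(ranked)  # "kids" ranks first: child vocabulary co-occurring with the query
```

The boost steers probability mass toward child-appropriate tags without disconnecting the rest of the graph, which is the essence of biasing the walk rather than filtering its output.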
Initial judgments of an item's relevance were based on comprehensibility, completeness of source, whether the item needed to be purchased, whether video sources were suitable, and whether factual or opinionative material met students' needs. Participants preferred information that provided topic overviews, information that linked to prior knowledge, and sources that treated topics in acceptable depth and were structured to facilitate accessibility. Students derived clues about reliability from URLs and considered the reputation of sources. The ability of an item to corroborate prior knowledge, its graphic design, its style of writing, and the perceived authority of its creators influenced participants' decisions about reliability. On February 15, 2013, asteroid 2012 DA14 passed close to Earth during its flyby. We used this opportunity to analyze how the event affected the social-networking community of Twitter. We analyzed whether the flyby of the asteroid elicited more tweets about the asteroid close to the asteroid's trajectory compared to the neutral search term NASA. A spatio-temporal analysis of tweets about NASA revealed a natural movement of the geographical mean from east to west, mirroring the Sun's path through the sky. For the geolocation of users tweeting about the asteroid, this east-west movement changed direction, mirroring the asteroid's trajectory (from south-east to north-west) as soon as the asteroid was potentially visible from Earth. This effect appears to represent emotionally contagious flocking behavior among Twitter users influenced by the position of the asteroid itself. Recently, Twitter has received much attention, both from the general public and researchers, as a new method of transmitting information. Among other measures, the number of retweets (RTs) and user types are two important items of analysis for understanding the transmission of information on Twitter. 
To analyze this point, we conducted text classification and feature extraction experiments using random forest machine learning with conventional stylistic and Twitter-specific features. We first collected tweets from 40 accounts with a high number of followers and created tweet texts from 28,756 tweets. We then conducted 15 types of classification experiments using a variety of combinations of features such as function words, speech terms, Twitter's descriptive grammar, and information roles. We systematically observed the effects of the features on classification performance. The results indicated that classification by user achieved the best performance. Furthermore, we observed that certain features had a greater impact on classification. In the case of the experiments that assessed the level of RT quantity, information roles had an impact. In the case of the user experiments, features such as honorific postpositional particles and auxiliary verbs such as "desu" and "masu" had an impact. This research clarifies the features that are useful for categorizing tweets according to the number of RTs and user types. The Brazilian Lattes Platform is an important academic/resume data set that registers all academic activities of researchers associated with different major knowledge areas. The academic information collected in this data set is used to evaluate, analyze, and document the scientific production of research groups. Information about the interactions between Brazilian researchers in the form of coauthorships, however, has not been analyzed. In this article, we identified and characterized Brazilian academic coauthorship networks of researchers registered in the Lattes Platform using topological properties of graphs. 
For this purpose, we explored (a) strategies to develop a large Lattes curricula vitae data set, (b) an algorithm for automatically identifying coauthorships based on bibliographic information, and (c) topological metrics to investigate interactions among researchers. This study characterized coauthorship networks to gain an in-depth understanding of the network structures and dynamics (social behavior) among researchers in all available major Brazilian knowledge areas. In this study, we evaluated information from a total of 1,131,912 researchers associated with the eight major Brazilian knowledge areas: agricultural sciences; biological sciences; exact and earth sciences; humanities; applied social sciences; health sciences; engineering; and linguistics, letters, and arts. This article first analyzes library and information science (LIS) research articles published in core LIS journals in 2005. It also examines the development of LIS from 1965 to 2005 in light of comparable data sets for 1965, 1985, and 2005. In both cases, the authors report (a) how the research articles are distributed by topic and (b) what approaches, research strategies, and methods were applied in the articles. In 2005, the largest research areas in LIS by this measure were information storage and retrieval, scientific communication, library and information-service activities, and information seeking. The same research areas constituted the quantitative core of LIS in the previous years since 1965. Information retrieval has been the most popular area of research over the years. The proportion of research on library and information-service activities decreased after 1985, but the popularity of information seeking and of scientific communication grew during the period studied. The viewpoint of research has shifted from library and information organizations to end users and the development of systems for the latter. 
The proportion of empirical research strategies was high and rose over time, with the survey method being the single most important method. However, attention to evaluation and experiments increased considerably after 1985. Conceptual research strategies and system analysis, description, and design were quite popular, but declining. The most significant changes from 1965 to 2005 are the decreasing interest in library and information-service activities and the growth of research into information seeking and scientific communication. The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews. How to provide users a positive experience during interaction with information (i.e., the "Information eXperience" (IX)) is still an open question. As a starting point, this work investigates how the emotion of interest can be influenced by modifying the complexity of the information presented to users. The appraisal theory of interest suggests a "sweet spot" where interest will be at its peak: information that is novel and complex yet still comprehensible. This "sweet spot" is approximated using two studies. 
Study One develops a computational model of textual complexity founded on psycholinguistic theory of processing difficulty. The model was trained and tested on 12,420 articles, achieving a classification performance of 90.87% on two classes of complexity. Study Two puts the model to its ultimate test: its application to change the user's IX. Using 18 news articles, the influence of complexity on interest and its appraisals is unveiled. A structural equation model shows a positive influence of complexity on interest, yet a negative influence of comprehensibility, confirming a seemingly paradoxical relationship between complexity and interest. By identifying when complexity becomes interesting, this paper shows how information systems can use the model of textual complexity to construct an interesting IX. This work investigates recent claims that citation in a review article provokes a decline in a paper's later citation count, with citations being given to the review article instead of the original paper. Using the Science Citation Index Expanded, we looked at the yearly percentages of lifetime citations of papers published in 1990 and first cited in review articles in 1992 or 1995 in the field of biomedical research, and found that no significant change occurred after citation in a review article, regardless of the papers' citation activity or specialty. An additional comparison was done for papers from the field of clinical research, and this yielded no meaningful results to support the notion that review articles have any substantial effect on the citation count of the papers they review. In this brief communication, we show how a simple 3D bibliometric performance evaluation based on the zynergy-index (Prathap, 2013) can be simplified by the recently introduced 3-class approach (Ye & Leydesdorff, in press). 
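The before/after comparison used in the review-citation study above (yearly percentages of a paper's lifetime citations) reduces to simple arithmetic over yearly counts. The sketch below uses invented citation counts, not data from the study, and the window choices are illustrative only.

```python
def yearly_shares(cites_by_year):
    """Convert a paper's yearly citation counts into each year's share of its lifetime citations (%)."""
    total = sum(cites_by_year.values())
    return {year: 100.0 * n / total for year, n in cites_by_year.items()}

def mean_share(shares, years):
    """Average yearly share over a window of years (years absent from `shares` count as 0)."""
    return sum(shares.get(y, 0.0) for y in years) / len(years)

# Hypothetical paper published in 1990 and first cited by a review in 1992.
cites = {1990: 2, 1991: 4, 1992: 6, 1993: 5, 1994: 3}
shares = yearly_shares(cites)
pre_review = mean_share(shares, [1990, 1991])   # 15.0
post_review = mean_share(shares, [1993, 1994])  # 20.0
```

A "no significant change" finding corresponds to pre- and post-review shares that do not differ beyond what the paper's own citation trajectory would predict.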
Author ambiguity mainly arises when several different authors express their names in the same way, generally known as the namesake problem, and also when the name of a single author is expressed in many different ways, referred to as the heteronymous name problem. These author ambiguity problems have long been an obstacle to efficient information retrieval in digital libraries, causing incorrect identification of authors and impeding correct classification of their publications. It is a nontrivial task to distinguish those authors, especially when there is very limited information about them. In this paper, we propose a graph-based approach to author name disambiguation, in which a graph model is constructed using co-author relations and author ambiguity is resolved by graph operations such as vertex (or node) splitting and merging based on the co-authorship. In our framework, called the Graph Framework for Author Disambiguation (GFAD), the namesake problem is solved by splitting an author vertex involved in multiple cycles of coauthorship, and the heteronymous name problem is handled by merging multiple author vertices having similar names if those vertices are connected to a common vertex. Experiments were carried out with the real DBLP and Arnetminer collections, and the performance of GFAD was compared with that of three representative unsupervised author name disambiguation systems. We confirm that GFAD shows better overall performance on representative evaluation metrics. An additional contribution is that we released the refined DBLP collection to the public to facilitate a performance benchmark for future systems on author disambiguation. This study uncovers the evolution of a fuel cell research network through a bibliometric study focusing on the period from 1991 to 2010. From a dataset of 37,435 research articles, the study focuses on the evolution of fuel cell research networks at a national level. 
Focusing solely on the expansion of the research networks and the policies affecting collaboration, the paper poses three research questions: (1) Is research into fuel cells less concentrated than science overall? If so, (2) are there changes over time? And (3) can we identify clusters among certain countries? To answer these research questions, the data were compared to findings on overall scientific output worldwide. In addition, an ego network analysis was performed and a modularity algorithm was used to identify clusters in the network data. The study showed that fuel cell research co-operation has had a distinct evolution within the time frame of the study. Research has increased in both volume and co-operation, but research co-operation is less concentrated than in science overall. Non-TRIAD countries have a stronger role in fuel cell research than in science overall. Clusters in research co-operation have evolved into two modes: one around Asia and North America, and a second around European co-operation with the US and Asia. Interdisciplinarity results from dynamics at two levels. First, research questions are approached using inputs from a variety of disciplinary fields. Second, the results of this multidisciplinary research feed back into the various research fields. This may either contribute to the further development of these fields or lead to disciplinary reconfiguration. In the latter case, a new interdisciplinary field may emerge. Following this perspective, the scientific landscape of river research and river science is mapped to assess to what extent current river research is a multidisciplinary endeavor, and to what extent it results in a new emerging (inter)disciplinary field of river science. The paper suggests that this two-level approach is a useful method for studying interdisciplinary research and, more generally, disciplinary dynamics. 
With respect to river research, we show that it is mainly performed in several fields (limnology, fisheries & fish research, hydrology & water resources, and geomorphology) that hardly exchange knowledge. The different river research topics are multidisciplinary in nature, as they are shared by different fields. However, river science does not emerge as an interdisciplinary field, and often-mentioned new interdisciplinary fields such as hydroecology or hydromorphology are not (yet) visible. There is hardly any involvement of the social sciences within river research. Finally, the field of ecology occupies a central position within river research, whereas an expected engineering field is notably absent. Together, this may signal the acceptance of the ecosystem-based paradigm in river management, replacing the traditional engineering paradigm. Technological change evolves along a cyclical divergent-convergent pattern in knowledge diffusion paths. Technological divergence occurs as a breakthrough innovation, or discontinuity, inaugurating an era of ferment in which several competing technologies emerge and gradually advance. Technological convergence occurs as a series of evolutionary, variant changes that are gradually combined or fused together to open the industry to successive dominant designs or guideposts. To visualize such a pattern of technological evolution, we choose to study lithium iron phosphate (LFP) battery technology through an extension of citation-based main path analysis, namely key-route main path analysis. The key-route method discloses the main paths that travel through a specified number of key citations. The resulting multiple paths reveal the structure of the knowledge diffusion paths. The citation network is constructed from 1,531 academic articles on LFP battery technology published between 1997 and early 2012. Findings illustrate that LFP battery technology has completed two full technological cycles and is in the middle of the third cycle. 
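Main path analysis of the kind applied to the LFP citation network rests on traversal weights such as the search path count (SPC): the number of source-to-sink paths passing through an edge. The toy sketch below computes SPC weights on an invented six-paper citation DAG and greedily extracts a main path; it is a minimal illustration of the SPC idea, not the key-route algorithm used in the study.

```python
from functools import lru_cache

# Invented citation DAG: an edge points from an earlier paper to the later
# paper citing it, so knowledge "flows" along the edge direction.
edges = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

# Sources have no incoming edges; sinks have no outgoing edges.
sources = [n for n in edges if all(n not in succ for succ in edges.values())]
sinks = [n for n, succ in edges.items() if not succ]

@lru_cache(maxsize=None)
def paths_from(node):
    """Number of paths from `node` down to any sink."""
    return 1 if not edges[node] else sum(paths_from(s) for s in edges[node])

@lru_cache(maxsize=None)
def paths_to(node):
    """Number of paths from any source up to `node`."""
    preds = [p for p, succ in edges.items() if node in succ]
    return 1 if not preds else sum(paths_to(p) for p in preds)

def spc(u, v):
    """Search path count of edge (u, v): source-to-sink paths traversing it."""
    return paths_to(u) * paths_from(v)

def main_path():
    """Start at the source carrying the most paths, then follow the highest-SPC edge."""
    path = [max(sources, key=paths_from)]
    while edges[path[-1]]:
        path.append(max(edges[path[-1]], key=lambda t: spc(path[-1], t)))
    return path

print(main_path())  # → ['A', 'C', 'D', 'F']
```

The greedy walk here is the simplest variant; key-route analysis instead requires the path to pass through specified high-SPC edges, producing multiple main paths rather than one.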
Nanoscience and nanotechnology are research areas of a multidisciplinary nature. Having a good knowledge of the rapidly evolving nature of these research areas is important for understanding the research paths, as well as national and global developments in these areas. Accordingly, in this study nanoscience and nanotechnology research undertaken globally was compared with that of Australia by analyzing research publications. Initially, four different bibliometric Boolean-based search methodologies were used to analyze publications in the Web of Science database (Thomson Reuters ISI Web of Knowledge). These methodologies were (a) lexical query, (b) search in nanoscience and nanotechnology journals, (c) combination of lexical query and journal search, and (d) search in the ten nano-journals with the highest impact factors. Based on the results obtained, the third methodology was found to be the most comprehensive approach. Consequently, this search methodology was used to compare global and Australian nanoscience and nanotechnology publications for the period 1988-2000. Results demonstrated that, depending on the search technique used, Australia ranks fourteenth to seventeenth internationally, with a higher than world average number of nanoscience and nanotechnology publications. Over the last decade, Australia showed a relative growth rate in nanoscience and nanotechnology publications of 16% compared to 12% for the rest of the world. China, the USA, and the UK are the main countries whose researchers collaborate with Australian researchers on nanoscience and nanotechnology publications. A technological trajectory is a representation of the development of a technology. Based on the analysis of the trajectories of prominent technologies, we can explore the phenomena of technology evolution and knowledge diffusion. In this study, we focus on explaining knowledge diffusion in the core technology used in fuel cells, i.e. 
the development of 5-layer membrane electrode assembly (MEA) technologies. Through path analysis, this study explores how the knowledge of this technology has evolved and diffused across different locations. The empirical analysis also explains how certain technological knowledge plays a critical role in the main path. In this study, patent data on 5-layer MEA technologies for fuel cells, a total of 1,356 patents, was collected from the US Patent Office; a patent citation network was then constructed from the citation relationships, and prominent, highly cited patents were recognised through path analysis. Using local main path analysis and the global key-route method, we identify three stages of technological development, including improvements in the proton exchange membrane (PEM) and in catalyst synthesis. Additionally, we use regression analysis to demonstrate that patents with specific characteristics play a vital role in the process of knowledge diffusion. Patents from Japan and South Korea are relatively more important than patents from other countries. The brokerage characteristics of a patent (e.g., coordinating domestically or liaising among three or more countries) also facilitate the diffusion of technological knowledge. However, the importance of these brokerages changes when the time of invention is taken into account. Furthermore, the technological diversification of a patent exerted no substantial influence on its network position. This study seeks to bridge the gap between scientometrics literature on scientific collaboration and science and technology management literature on partner selection by linking scientists' collaborator preferences to the marginal advantage in citation impact. The 1981-2010 South Korea NCR (National Citation Report), a subset of the Web of Science that includes 297,658 scholarly articles, was used for this research. 
We found that, during this period, multi-author scientific articles increasingly dominated single-author articles; multi-university collaboration grew significantly; and the numbers of research publications produced by teams working within a single institution or by a single author diminished. This study also demonstrated that multi-university collaboration produces higher-impact articles when it includes "Research Universities," that is, top-tier university schools. We also found that elite universities experienced impact degradation of their scientific results when they collaborated with lower-tier institutions, whereas their lower-tier partners gained impact benefits from the collaboration. Finally, our research revealed that Korean universities are unlikely to work with other universities in the same tier. This propensity for cross-tier collaboration can be interpreted as strategic partner selection by lower-tier schools seeking marginal advantage in citation impact. The upflow anaerobic sludge blanket/bed (UASB) has recently been recognized as a robust technology attracting wide attention in wastewater treatment research. In this study, a bibliometric analysis was performed to evaluate the publications on UASB research from 1983 to 2012, based on the Science Citation Index databases. A total of 2,363 UASB-related outputs were published in 220 journals over the past 30 years. Results showed that China was the most productive country and the Indian Institute of Technology in India the most productive institute publishing articles on UASB. The most productive field, "wastewater treatment," will likely maintain its leading role and provide a good reference for future UASB research. In addition, performance-improving approaches and practical applications of UASB will probably continue as the two main directions of development. This study serves as an alternative and innovative way of revealing research trends in UASB. 
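Bibliometric tallies of the sort reported for UASB (most productive country, most productive journal) are, at their core, counts over bibliographic records. The minimal sketch below uses invented records; the field layout and values are illustrative, not data from the study.

```python
from collections import Counter

# Hypothetical bibliographic records: (year, country, journal).
records = [
    (1995, "China", "Water Research"),
    (1998, "India", "Bioresource Technology"),
    (2003, "China", "Water Research"),
    (2007, "Brazil", "Water Research"),
    (2010, "China", "Bioresource Technology"),
]

by_country = Counter(country for _, country, _ in records)
by_journal = Counter(journal for _, _, journal in records)

top_country, n_papers = by_country.most_common(1)[0]
print(top_country, n_papers)  # → China 3
```

Real studies add normalization steps (country name cleaning, fractional counting for multi-country papers) before such tallies are meaningful.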
The main purpose of this paper is to investigate the causal relationship between knowledge (research output) and economic growth in the US over 1981-2011. To avoid ignoring possible instability and hence falsely assuming a constant relationship through the years, we use bootstrapped Granger non-causality tests with a fixed-size rolling window to analyze time-varying causal links between the two series. Instead of just performing causality tests on the full sample, which assumes a single causality relationship, we also perform Granger causality tests on rolling sub-samples with a fixed window size. Unlike the full-sample Granger causality test, this method allows us to capture any structural shifts in the model, as well as the evolution of causal relationships between sub-periods, with the bootstrapping approach controlling for small-sample bias. Full-sample bootstrap causality tests reveal no causal relationship between research and growth in the US. Further, parameter stability tests indicate that there were structural shifts in the relationship, and hence we cannot entirely rely on full-sample results. The bootstrap rolling-window causality tests show that during the sub-periods of 2003-2005 and 2009, GDP Granger-caused research output, while in 2010 the causality ran in the opposite direction. Using a two-state regime-switching vector smooth autoregressive model, we find unidirectional Granger causality from research output to GDP in the full sample. In this study we examined a sample of 100 European astrophysicists and their publications indexed by the citation database Scopus, submitted to the arXiv repository and bookmarked by readers in the reference manager Mendeley. Although it is believed that astrophysicists use arXiv widely and extensively, the results show that on average more items are indexed by Scopus than submitted to arXiv. 
A considerable proportion of the items indexed by Scopus also appear on Mendeley, but on average the number of readers who bookmarked an item on Mendeley is much lower than the number of citations reported in Scopus. The comparisons between the data sources were done based on the authors and the titles of the publications. Small and medium enterprises (SMEs) have difficulties identifying appropriate technology opportunities under severe capability and resource constraints. To tackle this issue, we suggest a method for identifying technology opportunities that is customized to the existing technologies and technological capabilities of SMEs through two-stage patent analysis. An expert-based technological attribute-application table makes it possible to identify basic opportunities by multiple keyword matching. Also, non-traditional opportunities can be explored and identified by an iterative action-object analysis of patents. This two-stage patent analysis approach provides managers with a way of identifying specific technology opportunities in which their existing technologies can be utilized to the maximum extent, thereby helping them to develop technology strategies. The concept of citer analysis investigated earlier by Ajiferuke and Wolfram (in: B. Larsen & J. Leta (Eds.), Proceedings of the 12th International Conference of the International Society for Scientometrics and Informetrics (ISSI), pp. 798-808, 2009; Scientometrics, 83, 623-638, 2010) is extended to journals, where different citing units (citers, citing articles, citing journals) are compared with the journal impact factor and with each other to determine whether differences in ranking arise from the different measures. The citer measures for the 31 high-impact journals studied from information science and library science are significantly correlated, even more so than the earlier citer analysis findings, indicating that there is a close relationship among the different units of measure. 
Still, notable differences in rankings for the journals examined were evident for the different measures used, especially from either the 5-year impact factor or the number of citing articles per publication to the number of citing journals per publication. The journals that are adversely affected seem to be those whose citations are concentrated in a few journals. This informed the need to develop a journal citation concentration index, which can serve as a complementary measure to existing journal impact indices. Understanding the direction and magnitude of soil science publication in the Philippines is crucial in formulating research priorities and funding allocation. There is no consensus on the current state of soil science publication in the Philippines, so this study was conducted to elucidate the trend in soil science publication. We conducted an in-depth analysis of the total number of publications and the total number of citations of soil science publications collected from the Thomson ISI database. Results revealed an upsurge in soil science publication from 1970 to 2000, with no indication that this trend is slowing down. Increases in the number of citations with time are consistent with increases in the total number of publications (r = 0.93; p < 0.05). Results further revealed that soil science publication in the Philippines is biased towards rice research, particularly soil water, with very few studies published on plant nutrition and soil chemistry. The present study highlights the need for a paradigm shift in soil science research from mostly rice-related research to environmental research. Ways to increase soil science publication among Filipino soil scientists, particularly in academic institutions, are proposed. Finally, since only a few government-funded studies have been published, future research should focus on identifying the factors that influence the scientific productivity of soil scientists in the Philippines. 
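The publication-citation relationship reported for Philippine soil science (r = 0.93) is a Pearson correlation over yearly totals. A minimal implementation, applied here to invented yearly counts rather than the study's data, looks like this:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented yearly publication and citation totals for five periods.
pubs = [2, 3, 5, 8, 13]
cites = [10, 14, 30, 55, 80]
r = pearson_r(pubs, cites)  # close to 1: citations grow with publications
```

In practice one would also report a p-value, since with only a handful of yearly observations a high r can arise by chance.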
To demonstrate the importance and the actual research situation of Antarctic studies in the humanities and social sciences, we collected data from the SSCI and A&HCI covering a period of over 100 years and focused on the number of articles published each year, major journals, types of document, the authors and countries publishing the most articles, collaboration, the major research subjects covered, and citations. Comparisons were also made with Arctic studies to show some similarities and differences. The results suggest that research in the humanities and social sciences has been developing without interruption for over 100 years. With regard to the number of articles in high-capacity journals, Geographical Journal performs best, followed by Petermanns Geographische Mitteilungen and Scottish Geographical Magazine. The literature is rather scattered, without strong cohesion, while book reviews and articles are the two most common types of document. There have not been many stable collaborative teams working on Antarctic topics. Joyner, Savours, and Beck are the three authors with the highest number of publications. The USA is the most active country, while the most active research institute is the University of Tasmania in Australia. The Antarctic expedition has remained the main theme throughout this period. In addition, research in the humanities and social sciences has generated many high-impact articles, among which the article entitled "Chemical concentrations of pollutant lead aerosols, terrestrial dusts and sea salts in Greenland and Antarctic snow strata" has the highest citation count. This paper uses two large databases, one of given names and one of family names, to categorise the names of researchers from Italy, Sweden, the UK and the USA whose papers in astronomy and oncology were published in 2006-2007 and in 2011-2012 by sex (gender) and ethnicity or national origin. 
For all the countries, there were relatively many more females publishing papers in oncology than in astronomy, but their share of contributions was lower than the percentage of researchers. Sweden and the UK had much higher percentages of both other European and Rest of the World researchers than Italy did. US researchers with non-European names were categorised in six main country groups. The ones with the greatest presence were Chinese (mainly Mandarin) and South Asians (mainly Indians). The method could be adapted to investigate the progress of women in research in many other countries, and the role played by non-national researchers in their scientific output. This study demonstrates the continued existence of gender disparity with respect to salary in four neurologic specialties in the largest public healthcare system of the Western United States without the bias of self-report. We extracted physician salary information from the publicly available UC pay system database and obtained Scopus (http://www.scopus.com/home.url) and Web of Science publication counts and h-indices via searching individual faculty by name and specialty. Faculty gender, institution, specialty, ranking, chairmanship, degrees, and salary data were collected through review of departmental websites and individual faculty profiles. All faculty members (n = 433) from the departments of ophthalmology, otolaryngology, neurosurgery and neurology in the UC pay system database in 2008 were selected for analysis. We found that female faculty members in the 2008 UC healthcare system were significantly underrepresented from the highest salary brackets, representing only 12.5 and 2.6 % of those earning $300,001-$400,000 and over $400,000, respectively (p < 0.01). The female-to-male salary ratio in 2008 for all UC physicians earning over $100,000 was 0.698 (p < 0.00001). 
Multivariate regression modeling demonstrated a 12 % salary deficit (95 % CI 2-21 %, p = 0.02) for women in the UC healthcare system after controlling for institution, professorial rank, chairmanship, specialty, Scopus publication count, and Scopus h-index. Despite recent efforts at educational equality in the training of physicians, gender disparities still persist within academic medicine. This study proposes an archaeology as a means of exploring the practices by which digitally encoded resources are generated, circulated, and received. The discussion grapples with the ambiguous relationship between digitizations and their exemplars in the well-known database, Early English Books Online (EEBO), and suggests ways in which digitizations might be analyzed as witnesses of current perceptions about the past and used accordingly in scholarly research. The article therefore offers a critical reading of EEBO and its digitizations as part of a broader effort to investigate the role of digitally encoded resources in the transmission of ideas and the production of cultural heritage. The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. 
We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification. Electronic health record (EHR) systems can improve service efficiency and quality within the health care sector and thus have been widely considered for adoption. Yet the introduction of such systems has caused much concern about patients' information privacy. This study provides new insights into how privacy concerns play a role in patients' decisions to permit digitization of their personal health information. We conducted an online experiment and collected data from 164 patients who are involved in the nonmandatory EHR adoption in the Netherlands. We found that the negative effect of information privacy concerns on patients' willingness to opt in is influenced by the degree of EHR system interoperability and patients' ability to control disclosure of their information. The results show that, for a networked EHR system, the negative effect of privacy concerns on opt-in behavior was reinforced more than for the stand-alone system. The results also suggest that giving patients greater ability to control their information can alleviate their privacy concerns when they make opt-in decisions. We discuss the implications of these findings. Language as a symbolic medium plays an important role in virtual communications. In a primarily linguistic environment such as cyberspace, words are an expressed form of intent and actions. 
We investigate the functions of words and actions in identifying behavioral anomalies of social actors to safeguard the virtual organization. Social actors are likened to "sensors" as they observe changes in a focal individual's behavior during computer-mediated communications. Based on social psychology theories and pragmatic views of words and actions in online communications, we theorize a dyadic attribution model that helps make sense of anomalous behavior in creative online experiments. This model is then tested in an experiment. Findings show that observation of the behavioral differences between words and actions, based on either external or internal causality, can offer an increased ability to detect the compromised trustworthiness of observed individuals, possibly leading to early detection of insider threat potential. The dyadic attribution model developed in this sociotechnical study can function to detect behavioral anomalies in cyberspace and protect the operations of a virtual organization. A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, inferring that two terms co-occur more often than would be expected by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. 
We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine. The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy. Some curricula vitae (web CVs) of academics on the web, including homepages and publication lists, link to open-access (OA) articles, resources, abstracts in publishers' websites, or academic discussions, helping to disseminate research. 
To assess how common such practices are and whether they vary by discipline, gender, and country, the authors conducted a large-scale e-mail survey of astronomy and astrophysics, public health, environmental engineering, and philosophy across 15 European countries and analyzed hyperlinks from web CVs of academics. About 60% of the 2,154 survey respondents reported having a web CV or something similar, and there were differences between disciplines, genders, and countries. A follow-up outlink analysis of 2,700 web CVs found that a third had at least one outlink to an OA target, typically a public eprint archive or an individual self-archived file. This proportion was considerably higher in astronomy (48%) and philosophy (37%) than in environmental engineering (29%) and public health (21%). There were also differences in linking to publishers' websites, resources, and discussions. Perhaps most important, however, the amount of linking to OA publications seems to be much lower than allowed by publishers and journals, suggesting that many opportunities for disseminating full-text research online are being missed, especially in disciplines without established repositories. Moreover, few academics seem to be exploiting their CVs to link to discussions, resources, or article abstracts, which seems to be another missed opportunity for publicizing research. Although there is evidence that counting the readers of an article in the social reference site, Mendeley, may help to capture its research impact, the extent to which this is true for different scientific fields is unknown. In this study, we compare Mendeley readership counts with citations for different social sciences and humanities disciplines. The overall correlation between Mendeley readership counts and citations for the social sciences was higher than for the humanities. 
Low and medium correlations between Mendeley bookmarks and citation counts in all the investigated disciplines suggest that these measures reflect different aspects of research impact. Mendeley data were also used to discover patterns of information flow between scientific fields. Comparing information flows based on Mendeley bookmarking data and cross-disciplinary citation analysis for the disciplines revealed substantial similarities and some differences. Thus, the evidence from this study suggests that Mendeley readership data could be used to help capture knowledge transfer across scientific disciplines, especially for people who read but do not author articles, as well as giving impact evidence at an earlier stage than is possible with citation counts. Structural cohesion, hierarchy, holes, and percolating clusters share a complementary existence in many social networks. Although the individual influences of these attributes on the structure and function of a network have been analyzed in detail, a more accurate picture emerges in proper perspective and context only when research methods are employed to integrate their collective impacts on the network. In a major research project, we have undertaken this examination. This paper presents an extract from this project, using a global network assessment of these characteristics. We apply our methods to analyze the collaboration networks of a subset of researchers in India through their coauthored papers in peer-reviewed journals and conference proceedings in management science, including related areas of information technology and economics. We find the Indian networks to be currently suffering from a high degree of fragmentation, which severely restricts researchers' long-range connectivity in the networks. Comparisons are made with networks of a similar sample of researchers working in the United States. 
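The fragmentation diagnosis in the coauthorship study above lends itself to a simple computation. The following is a minimal sketch, not the authors' actual method: the union-find construction and the pairwise-reachability definition of fragmentation (the share of researcher pairs that cannot reach each other) are assumptions for illustration.

```python
from collections import defaultdict

def connected_components(edges):
    """Union-find over coauthorship edges; returns component sizes, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    sizes = defaultdict(int)
    for node in parent:
        sizes[find(node)] += 1
    return sorted(sizes.values(), reverse=True)

def fragmentation(sizes):
    """Share of ordered node pairs NOT connected (0 = one component, 1 = isolates)."""
    n = sum(sizes)
    if n < 2:
        return 0.0
    reachable = sum(s * (s - 1) for s in sizes)
    return 1 - reachable / (n * (n - 1))
```

A network split into components of sizes 3 and 2, for instance, leaves 60% of its pairs mutually unreachable, which is the kind of figure that signals the high fragmentation reported for the Indian networks.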
According to many studies, social network sites (SNS) have become some of the most popular online destinations. It has been pointed out that very little is known about the psychosocial variables that predict people's use of these websites. In this article, our general objective is to understand behavioral intentions to use SNS by employing the well-known unified theory of acceptance and use of technology (UTAUT), which has been validated in a number of studies. Data were collected through a questionnaire survey from a final sample of 1,039 social networking website users in Africa. We have shown that there are contexts, such as the African context, in which the UTAUT does not hold up. Explanations are provided to support the findings. This article explores the mental models of article indexing of taggers and experts in keyword usage. Better understanding of the mental models of taggers and experts and their usage gap may inspire better selection of appropriate keywords for organizing information resources. Using a data set of 3,972 tags from CiteULike and 6,708 descriptors from Library and Information Science Abstracts (LISA) from 1,489 scholarly articles of 13 library and information science journals, social network analysis and frequent-pattern tree methods were used to capture and build up the mental models of article indexing of taggers and experts when using keywords, and to generalize their structures and patterns. When measured with respect to the terms used (their power-law distribution, a comparison of terms used as tags and descriptors, social network analysis including centrality, overall structure, and role equivalence, and frequent-pattern tree analysis), little similarity was found between the mental models of taggers and experts. Twenty-five patterns of path-based rules and 12 identical rules of frequent-pattern trees were shared by taggers and experts. 
Title- and topic-related keyword categories were the most popular keyword categories used in path-based rules of frequent-pattern trees, and also the most popular members of the 25 patterns and the starting point of the 12 identical rules. With electronic book (e-book) sales and readership rising, are e-books positioned to replace print books? This study examines the preference for e-books and print books in the contexts of reading purpose, reading situation, and contextual variables such as age, gender, education level, race/ethnicity, income, community type, and Internet use. In addition, this study aims to identify factors that contribute to e-book adoption. Participants were a nationally representative sample of 2,986 people in the United States from the Reading Habits Survey, conducted by the Pew Research Center's Internet & American Life Project (http://pewinternet.org/Shared-Content/Data-Sets/2011/December-2011-Reading-Habits.aspx). While the results of this study support the notion that e-books have firmly established a place in people's lives, due to their convenience of access, e-books are not yet positioned to replace print books. Both print books and e-books have unique attributes and serve irreplaceable functions to meet people's reading needs, which may vary by individual demographic, contextual, and situational factors. At this point, the leading significant predictors of e-book adoption are the number of books read, the individual's income, the occurrence and frequency of reading for research topics of interest, and the individual's Internet use, followed by other variables such as race/ethnicity, reading for work/school, age, and education. Prior studies have shown that articulating and sharing rationales in traditional small-group activities contribute to the maintenance of common ground, members' knowledge awareness, and contribution awareness. 
It is likely that the importance of articulating and sharing rationales will be increasingly acknowledged in online crowdsourcing because in such a context, large-scale participation is expected with participants often not knowing each other and being flexible about their participation status (e.g., participants may join after the activity has started and leave before it completes), and thus more grounding efforts/support are expected. To better understand the role of shared rationales in online crowdsourcing, three experiments were conducted investigating whether and how rationale awareness affects the ideation crowdsourcing task and idea-evaluation crowdsourcing task based on the findings about the rationale awareness effects in small-group idea-generation activities. The results suggest that one's awareness of previous workers' rationales in the current task can slightly improve the average quality of generated ideas in an iterative approach. In addition, one's evaluation of an idea could be positively or negatively affected by the idea's rationale depending on the quality of the rationales. The results also suggest that showing previous workers' rationales in the ideation task may not be an effective approach for improving the best quality of generated ideas. Linked open data allow interlinking and integrating any kind of data on the web. Links between various data sources play a key role insofar as they allow software applications (e.g., browsers, search engines) to operate over the aggregated data space as if it were a single local database. In this new data space, where DBpedia, a data set including structured information from Wikipedia, seems to be the central hub, we analyzed and highlighted outgoing links from this hub in an effort to discover broken links. The paper reports on an experiment to examine the causes of broken links and proposes some treatments for solving this problem. 
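The broken-link discovery described above can be approximated with a basic status-code check over a data set's outgoing links. This is a hedged sketch, not the paper's actual procedure: classifying HTTP 404/410 as "broken" and accepting an injectable `fetch` callable (so the logic can be exercised without network access) are assumptions for illustration.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_link(url, fetch=None):
    """Classify an outgoing link as 'ok', 'broken', or 'unreachable'.

    `fetch` maps a URL to an HTTP status code; it can be injected for
    testing. By default a HEAD request is issued.
    """
    if fetch is None:
        def fetch(u):
            return urlopen(Request(u, method="HEAD"), timeout=10).status
    try:
        status = fetch(url)
    except HTTPError as e:
        status = e.code
    except (URLError, OSError):
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if status in (404, 410):
        return "broken"  # target gone: a candidate for link repair
    return "unreachable"

def broken_ratio(urls, fetch):
    """Fraction of checked links whose targets are gone."""
    results = [check_link(u, fetch) for u in urls]
    return results.count("broken") / len(results)
```

Distinguishing permanently gone targets (404/410) from transient failures matters for the repair treatments the paper proposes, since only the former warrant rewriting or removing the link.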
The role of conference proceedings for scientific communication varies among the different research fields. It is thus difficult to determine how to use them in bibliometric studies that cover all or at least a variety of the research fields without favouring or penalizing observation subjects that are specialized in fields that rely heavily on conference proceedings. Also, the coverage of conference proceedings in bibliometric databases is often unclear. There have been reports not only of misclassification but also of insufficient coverage. In this study, the Web of Science is used to give an overview of coverage of conference proceedings as well as advantages and pitfalls of their usage in bibliometric analyses. In particular, the focus lies on different citation behaviour of and for conference proceedings and the implications for bibliometric indicators. This is complemented by an investigation of the development of coverage and publication behaviour in conference proceedings which is compared to those of journal publications. Finally, the importance but also drawbacks and opportunities of acknowledging conference proceedings publications for bibliometric studies are summarized. We collected 382 landmark papers written by 193 Nobel Laureates in physics from 1901 to 2012 and used bibliometric methods, citation frequencies, impact factor (IF), and tendency of the landmark journals to analyze their contents. The results show: (1) Of landmark papers published during 1980-2009, 74.7% were cited more than 500 times. Average citation frequencies and proportion of highly cited papers were higher for theoretical discoveries than for experimental methods. However, the proportion of highly cited papers in both domains was lower than for an invention. The average test period for the latter was markedly shorter too. (2) Landmark papers by Nobelists were mainly published in journals with IF from 5.0 to 10.0, but journals below IF 5.0 ranked first among all landmark journals. 
(3) As to countries where landmark papers were published, the Netherlands ranked at the top of the countries with the most landmark journals, apart from the United States and England. In addition, the majority of landmark papers written by non-mainstream countries' Nobelists were published in foreign journals with IF < 7.0. These data indicate some regularity and tendency of landmark papers written by Nobelists in physics. The purpose of this research is to furnish the OR/MS research community with an updated assessment of the discipline's journal set, with refinements that also highlight the various characteristics of OR/MS journals. More specifically, we apply a refined PageRank method initially proposed by Xu et al. (2011) to evaluate the top 31 OR/MS journals for 2010, and report our findings. We also report the shifts in the rankings that span 5 years, from 2006 to 2010. We observe that Manufacturing and Service Operations Management, indexed by the SCI only in 2008, is a specialized journal that is consistently highly regarded within the discipline. The rankings also suggest that Management Science is more established as a generalized journal as it has more external impact. In general, our ranking results correlate with expert opinions, and we also observe, report and discuss some interesting patterns that have emerged over the past 5 years from 2006 to 2010. Scientific research collaboration networks are well-established research topics, which can be divided into two kinds of research paradigms: (1) The topological features of the whole scientific collaboration networks and the collaboration representations in some given fields. (2) The individual nodes' characteristics in the collaboration networks and their endorsements in the networks. 
However, in the above studies, all the nodes' roles in the scientific collaboration network are the same, all of them simply called collaborators; thus the relationships among all the nodes in the scientific collaboration network are symmetric, and the scientific collaboration network is undirected. Such symmetric roles and relationships in the undirected networks have no incentive effects on the members' participation and effort in the team's scientific research. In this paper, the roles of team members in the scientific research collaborations are defined, including the scientific research pioneers and contributors, their collaboration relationships are considered from the viewpoint of principal-agent theory, and then the directed scientific collaboration network is built. Then the benefit distribution mechanism in the team members' networked scientific research collaborations is presented, which will encourage the team members with different roles to make their efforts in their scientific research collaborations and improve the quality of scientific research outputs. An example is used to test the above ideas and to conclude that an individual member's real outputs lie not only in his/her real scientific research efforts but also in his/her contributions to other members' scientific research. Defined errors are entered into data collections in order to test their influence on the reliability of multivariate rankings. Random numbers and real ranking data serve as data origins. In the course of data collection small random errors often lead to a switch in ranking, which can influence the general ranking picture considerably. For stabilisation an objective weighting method is evaluated. The robustness of these rankings is then compared to the original forms. Robust forms of the published Shanghai top 100 rankings are calculated and compared to each other. As a result, the possibilities and restrictions of this type of weighting become recognisable. 
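The error-injection procedure described in the last abstract (small random errors entered into ranking data to see whether the ranking switches) can be sketched as follows. This is an illustrative approximation, not the published method: the uniform relative noise model and the all-or-nothing switch criterion are assumptions.

```python
import random

def rank(scores):
    """Rank items by score, highest first (1 = best)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {item: pos + 1 for pos, item in enumerate(order)}

def perturbed_rank_switches(scores, noise=0.01, trials=1000, seed=42):
    """Fraction of trials in which at least one pair of items swaps rank
    after adding small uniform relative noise to every score."""
    rng = random.Random(seed)
    base = rank(scores)
    switched = 0
    for _ in range(trials):
        noisy = {k: v * (1 + rng.uniform(-noise, noise)) for k, v in scores.items()}
        if rank(noisy) != base:
            switched += 1
    return switched / trials
```

Scores separated by more than the noise amplitude never switch, whereas near-ties switch in a large share of trials, which is exactly the instability the study reports for closely scored institutions.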
Within the same research field, different subfields and topics may exhibit varied citation behaviors and scholarly communication patterns. For more effective scientific evaluation at the topic level, this study proposes a topic-based PageRank approach. This approach aims to evaluate the scientific impact of research entities (e.g., papers, authors, journals, and institutions) at the topic level. The proposed topic-based PageRank, when applied to a data set on library and information science publications, has effectively detected a variety of research topics and identified authors, papers, and journals of the highest impact from each topic. Evaluation results show that compared with the standard PageRank and a topic modeling technique, the proposed topic-based PageRank has the best performance on relevance and impact. Different perspectives of organizing scientific literature are also discussed and this study recommends the mode of organization that integrates stable research domains and dynamic topics. This paper probes into the current status of collaboration in the field of the Chinese humanities and social sciences with respect to the degree of collaboration and the status of the relationships. It researches the status quo in humanities, the growth of social development science and cross-disciplinary social science, and the maturity of applied social science. In addition, it further highlights the important roles of economics, management, and library and information science in the collaboration network of humanities and social science with their extensive intra-disciplinary cooperation and crucial roles in the whole collaboration network. This study aims at exploring whether significant inventions are more technologically diversified or have more diverse applications, investigating whether there are any innovation laws existing in R&D activities. 
Based on technology co-classification analysis, we select as our sample a patent dataset that meets specific criteria from the worldwide patent database Derwent Innovations Index. Three indicators out of four verify the proposed hypotheses, i.e., that significant inventions are more diversified at the level of the individual invention. The fourth indicator implies that focusing on some core technology domains may be better for creating significant inventions when R&D activities are considered as a whole. The results are of great theoretical significance, helping us identify the diversification laws of significant inventions; moreover, they have crucial practical implications for R&D work and technology innovation activities. An analysis of the number of research papers from India and China in the fields of sciences and engineering between the years 1975 and 2012 is presented. The results show that while Indian research output has increased steadily, the Chinese research output has been increasing at a rate far outpacing that of India. The research output of China has been increasing with distinct inflection points that show an acceleration in output growth. The research output for India shows periodic inflection points that show either an acceleration or deceleration in output growth. The possible reasons for the inflection points are discussed. Simple statistical analyses are used to analyze the trends in output. Although multiple factors affect a nation's research output, this paper highlights that the government programs targeted to increase the research output from universities may create inflection points resulting in a rapid increase in the research output. The article also highlights that India has fallen far behind China in terms of scientific and engineering research output, providing important clues for the future growth of the two countries. 
Although the nuclear era and the Cold War superpower competition have long since passed, governments are still investing in Big Science, although these large facilities are nowadays mostly geared towards areas of use closer to utility. Investments in Big Science are also motivated not only by promises of scientific breakthroughs but also by expectations (and demands) of measurable impact, and with an emerging global market of competing user-oriented Big Science facilities, quantitative measures of productivity and quality have become mainstream. Among these are rather simple and one-sided publication counts. This article uses publication counts and figures of expenditure for three cases that are disparate but all represent the state-of-the-art of Big Science of their times, discussing in depth the problems of using simple publication counts as a measure of performance in science. Showing, quite trivially, that Big Science is very expensive, the article also shows the absurd consequences of consistently using simple publication counts to display productivity and quality of Big Science, and concludes that such measures should be deemed irrelevant for analyses on the level of organizations in science and replaced by qualitative assessment of the content of the science produced. The aim of this paper is to empirically test whether interlinking patterns between higher education institutions (HEIs) conform to a document model, where links are motivated by webpage content, or a social relationship model, where they are markers of underlying social relationships between HEIs. To this aim, we analyzed a sample of approximately 400 European HEIs, using the number of pages on their web domains and the total number of links sent and received; in addition we test whether these two characteristics are associated with organizational size, reputation, and the volume of teaching and research activities. 
Our main findings are as follows: first, the number of webpages of HEI websites is strongly associated with their size, and to a lesser extent, with the volume of their educational activities, research orientation, and reputation; differences between European countries are rather limited, supporting the insight that the academic Web has reached a mature stage. Second, the distribution of connectivity (as measured by the total degree of HEIs) follows a lognormal distribution typical of social networks between organizations, while counts of weblinks can be predicted with good precision from organizational characteristics. HEIs with larger websites tend to send and receive more links, but the effect is rather limited and does not fundamentally modify the resulting network structure. We conclude that aggregated counts of weblinks between pairs of HEIs are not significantly affected by the web policies of HEIs and thus can be considered as reasonably robust measures. Furthermore, interlinking should be considered a proxy of social relationships between HEIs rather than a reputational measure of the content published on their websites. Emerging scientific fields are commonly identified by different citation based bibliometric parameters. However, their main shortcoming is the existence of a time lag needed for a publication to receive citations. In the present study, we assessed the relationship between the age of references in scientific publications and the change in publication rate within a research field. Two indices based on the age of references are presented, the relative age of references and the ratio of references published during the preceding 2 years, and applied thereafter on four datasets from the previously published studies, which assessed eutrophication research, sturgeon research, fisheries research, and the general field of ecology. 
We observed a consistent pattern that the emerging research topics had a lower median age of references and a higher ratio of references published in the preceding 2 years than their respective general research fields. The main advantage of indices based on the age of references is that they are not influenced by a time lag, and as such they are able to provide insight into current scientific trends. The best potential of the presented indices is to use them combined with other approaches, as each one can reveal different aspects and properties of the assessed data, and provide validation of the obtained results. Their use should, however, be assessed further before they are employed as standard tools by scientists, science managers, and policy makers. Due to the recession in the world economy, there is a trend towards a reduction in growth of R&D expenditure in the G7 countries. At the same time countries like China and Korea are investing more in scientific research. We compare the differences in the inputs to science for different countries and explore the level of efficiency in the output of scientific papers with respect to inputs such as manpower and investment. We find that the EU countries are relatively more efficient than Japan, the USA and also China and Korea so far as the production of papers is concerned. However, if efficiency is considered in terms of patents, Japan, Korea, and the USA are ahead. We compare our results with Albuquerque's model linking patent to paper ratios and development, and find significant deviations for some countries. We deduce that there has been a shift from publishing towards patenting in certain countries and link it to the high contribution of the business sector to R&D expenditure. Preliminary results of this analysis have been presented in Basu (In Proceedings of the 14th International Society for Scientometrics and Informetrics (ISSI) Conference, 2013). 
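The two indices based on the age of references discussed above (the relative age of references and the ratio of references published in the preceding two years) are straightforward to compute from a publication's reference list. The following is a minimal sketch; the choice of the median as the age summary and an inclusive two-year window are assumptions, as the exact operationalization is not spelled out here.

```python
from statistics import median

def reference_age_indices(pub_year, ref_years):
    """Compute two reference-age indices for one publication:
    the median age of its references, and the share of references
    published within the two years preceding publication."""
    ages = [pub_year - y for y in ref_years]
    recent = sum(1 for a in ages if 0 <= a <= 2)
    return {
        "median_age": median(ages),
        "recent_ratio": recent / len(ages),
    }
```

Applied per paper and averaged over a topic, a falling median age together with a rising recent ratio would flag the topic as emerging, with no citation time lag involved.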
The purpose of this paper is to highlight the weakness of innovative activities and guide the improvement of innovation efficiency at country-level through carefully comparing innovation efficiency across countries. Following the conceptual framework which divides innovation processes into knowledge production process (KPP) and knowledge commercialization process (KCP) and applying dual network-DEA models, this paper tries to take the economic benefit of R&D outputs into account. Moreover, we construct the production frontier of the innovation processes and two component processes under different assumptions (e.g., constant returns-to-scale, variable returns-to-scale and non-increasing returns-to-scale) for 35 countries over the period 2007-2011. Based on the production frontier, we not only estimate technical efficiency and scale efficiency for each country but also investigate and verify whether returns-to-scale of each country are decreasing or increasing. Furthermore, we add together the radial movement and the slack movement to acquire input redundancy. We decompose the input redundancy into two parts: redundancy caused by technical inefficiency (R_TI) and redundancy caused by scale inefficiency (R_SI), and carry out a detailed analysis of the input redundancy. We find specific circumstances of inefficiency and redundancy vary with the different countries' characteristics and development stages. Moreover, innovation efficiency statistically mainly depends on the KCP efficiency. In particular, the study reveals that China suffers scale inefficiency, attributable to insufficient macro-level coordination, a malfunctioning funding system, and flawed evaluations and incentives. Finally, public policy implications are proposed for the inefficient countries. This paper compared and contrasted patent counts by examining the inventor country and the assignee country. An empirical analysis of the patent data revealed how assignment principles (i.e. 
by the inventor country and by the assignee country) and counting methods (i.e. whole counts, first country and fractional counts) generate different results. Quadrant diagrams were utilised to present the patent data of the 33 selected countries. When countries had similar patent counts by inventor country with patent counts by assignee country, all the countries located along the diagonal line in the quadrant diagram were developed countries. When countries had more patent counts by inventor than by assignee, developed countries were more likely to sit in the right upper section of the quadrant diagram, while more developing countries were situated in the left lower section. Countries with higher patent counts by assignee than by inventor were more likely to be tax havens. A significant contribution of this paper resides in the recommendation that patent counts be analysed using both the inventor country and the assignee country at the same time if meaningful implications from patent statistics are to be obtained. Delayed recognition refers to the phenomenon where papers did not achieve recognition in terms of citations until some years after their original publication. A paper with delayed recognition was termed a "sleeping beauty": a princess sleeps (goes unnoticed) for a long time and then, almost suddenly, is awakened (receives a lot of citations) by a prince (another article). There are a sleeping period and an awakening period in the definition of a "sleeping beauty". Apart from and prior to the two periods, an awaking period was found in citation curves of some publications; the concept of "sleeping beauties" was hence expanded to "all-elements-sleeping-beauties". The opposite effect of "delayed recognition" was described as "flash in the pan": documents that were noticed immediately after publication but did not seem to have a lasting impact. In this work, we briefly discussed the citation curves of two remarkable "all-elements-sleeping-beauties". 
We found that they appeared as a "flash in the pan" first and then showed "delayed recognition". We also found happy endings of sleeping beauties and princes, and hence suggest that the citation curve of an "all-elements-sleeping-beauty" includes an awaking period, a sleeping period, an awakening period, and a happy ending. This paper describes the range and variation in access and use control policies and tools used by 24 web-based data repositories across a variety of fields. It also describes the rationale provided by repositories for their decisions to control data or provide means for depositors to do so. Using a purposive exploratory sample, we employed content analysis of repository website documentation, a web survey of repository managers, and selected follow-up interviews to generate data. Our results describe the range and variation in access and use control policies and tools employed, identifying both commonalities and distinctions across repositories. Using concepts from commons theory as a guiding theoretical framework, our analysis describes the following five dimensions of repository rules, or data commons boundaries: locus of decision making (depositor vs. repository), degree of variation in terms of use within the repository, the mission of the repository in relation to its scholarly field, what use means in relation to specific sorts of data, and types of exclusion. This article reports the results of a study that examined relationships between primary emotions, secondary emotions, and mood in the online information search context. During the experiment, participants were asked to search Google to obtain information on the two given search tasks. Participants' primary emotions were inferred from analysis of their facial expressions, data on secondary emotions were obtained through participant interviews, and mood was measured using the Positive Affect Negative Affect Scale (PANAS; Watson, Clark, & Tellegen, 1988) prior, during, and after the search. 
The search process was represented by the collection of search actions, search performance, and search outcome quality variables. The findings suggest the existence of direct relationships between primary emotions and search actions, which in turn imply the possibility of inferring emotions from search actions and vice versa. The link between secondary emotions and searchers' evaluative judgments, and the lack of evidence of any relationships between secondary emotions and other search process variables, point to the strengths and weaknesses of self-reported emotion measures in understanding searchers' affective experiences. Our study did not find strong relationships between mood and search process and outcomes, indicating that while mood can have a limited effect on search activities, it is a relatively stable and long-lasting state that cannot be easily altered by the search experience and, in turn, cannot significantly affect the search. The article proposes a model of relationships between emotions, mood, and several facets of the search process. Directions for future work are also discussed. This article reports a research study about historians' experiences using digital archival collections for research articles that they published in the American Historical Review. We contacted these authors to ask about their research processes, with regard to digital archival collections, and their perceptions of the usefulness of digital archival collections to historical research. This study presents a realistic portrayal of the uses and impacts of digital primary sources from the perspectives of historians who use digital collections for their research projects. The findings from this study indicate that digital archival collections are important source materials for historical studies for various reasons. However, the amount of authority digital materials possess as historical resources was disputed. 
Many historians preferred documents in their original form, but historians' preferences began to change as they increasingly consulted digital formats. As the web has developed into an important research platform, historians have adopted different research patterns, one of which is using random web searches to find digital primary sources. Historians' understandings of the use of digital archival collections revealed a spectrum of activities including finding, understanding, interpreting, and citing digital information. Historians in this study worked concurrently on multiple studies or on a larger project for a book, and each of their searches for digital collections had the potential to provide them with useful results for several research studies. Are our memories of the world well described by the international news coverage in our country? If so, sources central to international news may also be central to international recall patterns; in particular, they may reflect an American-centric focus, given the previously proposed central U.S. position in the news marketplace. We asked people of four different nationalities (China, Israel, Switzerland, and the United States) to list all the countries they could name. We also constructed a network representation of the world for each nation based on the co-occurrence pattern of countries in the news. To compare news and memories, we developed a computational model that predicts the recall order of countries based on the news networks. Consistent with previous reports, the U.S. news was central to the news networks overall. However, although national recall patterns reflected their corresponding national news sources, the Chinese news was substantially better than other national news sources at predicting both individual and aggregate memories across nations. 
Our results suggest that news and memories are related but may also reflect biases in the way information is transferred to long-term memory, potentially biased against the transient coverage of freer presses. We discuss possible explanations for this Chinese news effect in relation to prominent cognitive and communications theories. Traditional citation analysis has been widely applied to detect patterns of scientific collaboration, map the landscapes of scholarly disciplines, assess the impact of research outputs, and observe knowledge transfer across domains. It is, however, limited, as it assumes all citations are of similar value and weights each equally. Content-based citation analysis (CCA) addresses a citation's value by interpreting each one based on its context at both the syntactic and semantic levels. This paper provides a comprehensive overview of CCA research in terms of its theoretical foundations, methodological approaches, and example applications. In addition, we highlight how increased computational capabilities and publicly available full-text resources have opened this area of research to vast possibilities, which enable deeper citation analysis, more accurate citation prediction, and increased knowledge discovery. The acknowledgments in scientific publications are an important feature in the scholarly communication process. This research analyzes funding acknowledgment presence in scientific publications and introduces a novel approach for discovering text patterns by discipline in the acknowledgment section of papers. First, the presence of acknowledgments in 38,257 English-language papers published by Spanish researchers in 2010 is studied by subject area on the basis of the funding acknowledgment information available in the Web of Science database. 
Funding acknowledgments are present in two thirds of Spanish articles, with significant differences by subject area, number of authors, impact factor of journals, and, in one specific area, basic/applied nature of research. Second, the existence of specific acknowledgment patterns in English-language papers of Spanish researchers in 4 selected subject categories (cardiac and cardiovascular systems, economics, evolutionary biology, and statistics and probability) is explored through a combination of text mining and multivariate analyses. Peer interactive communication predominates in the more theoretical or social-oriented fields (statistics and probability, economics), whereas the recognition of technical assistance is more common in experimental research (evolutionary biology), and the mention of potential conflicts of interest emerges forcefully in the clinical field (cardiac and cardiovascular systems). The systematic inclusion of structured data about acknowledgments in journal articles and bibliographic databases would have a positive impact on the study of collaboration practices in science. Marketing professionals' work activities are heavily reliant on access to and the use of large amounts of quality information. This study aims to examine the information journey experienced by marketing professionals, including task-driven information seeking, information judgments, information use, and information sharing, from a more contextualized and holistic viewpoint. The information journey presents a more comprehensive picture of user-information interaction than is usually offered in the literature. Using a diary method and post-diary in-depth interviews, data consisting of 1,198 diary entries relating to 101 real work tasks were collected over a period of 5 work days. The data were used to ascertain characteristics of the stages of marketing professionals' information journeys as well as the relationships between them. 
Five stages of the information journey, including determining the need for work task-generated information, seeking such information, judging and evaluating the information found, making sense of and using the obtained information, and sharing the obtained or assembled information, were identified. The information journey also encompassed types of gaps and gap-bridge techniques that occurred during information seeking and use. Based on the empirical findings, an information journey model was developed. The implications for information systems design solutions that enable different stages of the information journey to be linked together are also discussed. A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. Recently, methods that exploit named entities have been shown to be more effective for query expansion than traditional pseudorelevance feedback methods. In this article, we introduce a supervised learning approach that exploits named entities for query expansion using Wikipedia as a repository of high-quality feedback documents. In contrast with existing entity-oriented pseudorelevance feedback approaches, we tackle query expansion as a learning-to-rank problem. As a result, not only do we select effective expansion terms but we also weigh these terms according to their predicted effectiveness. To this end, we exploit the rich structure of Wikipedia articles to devise discriminative term features, including each candidate term's proximity to the original query terms, as well as its frequency across multiple article fields and in category and infobox descriptors. 
Experiments on three Text REtrieval Conference web test collections attest to the effectiveness of our approach, with gains of up to 23.32% in terms of mean average precision, 19.49% in terms of precision at 10, and 7.86% in terms of normalized discounted cumulative gain compared with a state-of-the-art approach for entity-oriented query expansion. Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights into the expected characteristics of a markup language, as well as any regularities that may span vocabularies and languages and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies, by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of 8 plus 3 structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The results allowed the creation of visualization and content extraction tools that are completely independent of the schema and require no previous knowledge of the semantics and organization of the XML vocabulary of the documents. Terminology registries (TRs) are a crucial element of the infrastructure required for resource discovery services, digital libraries, Linked Data, and semantic interoperability generally. They can make the content of knowledge organization systems (KOS) available both for human and machine access. The paper describes the attributes and functionality for a TR, based on a review of published literature, existing TRs, and a survey of experts. A domain model based on user tasks is constructed and a set of core metadata elements for use in TRs is proposed. 
Ideally, the TR should allow searching as well as browsing for a KOS, matching a user's search while also providing information about existing terminology services, accessible to both humans and machines. The issues surrounding metadata for KOS are also discussed, together with the rationale for different aspects and the importance of a core set of KOS metadata for future machine-based access; a possible core set of metadata elements is proposed. This is dealt with in terms of practical experience and in relation to the Dublin Core Application Profile. Patent analysis has become important for management as it offers timely and valuable information to evaluate R&D performance and identify the prospects of patents. This study explores the scattering patterns of patent impact based on citations in 3 distinct technological areas, the liquid crystal, semiconductor, and drug technological areas, to identify the core patents in each area. The research follows the approach from Bradford's law, which equally divides total citations into 3 zones. While the result suggests that the scattering of patent citations corresponded with features of Bradford's law, the proportion of patents in the 3 zones did not match the proportions proposed by the law. As a result, the study shows that the distributions of citations in all 3 areas were more concentrated than what Bradford's law proposed. The Groos (1967) droop was also present in the scattering of patent citations, and the growth rate of cumulative citations decreased in the third zone. Literature citation analysis plays a very important role in bibliometrics and scientometrics, underpinning indicators such as the Science Citation Index (SCI) impact factor and the h-index. Existing citation analysis methods assume that all citations in a paper are equally important, and they simply count the number of citations. Here we argue that the citations in a paper are not equally important and some citations are more important than others. 
We use a strength value to assess the importance of each citation and propose to use the regression method with a few useful features for automatically estimating the strength value of each citation. Evaluation results on a manually labeled data set in the computer science field show that the estimated values can achieve good correlation with human-labeled values. We further apply the estimated citation strength values for evaluating paper influence and author influence, and the preliminary evaluation results demonstrate the usefulness of the citation strength values. Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted. The extent to which an article attracts citations has long been of interest. However, recent research has emphasized not just the receipt but also the pacing of citation. Citation speed has been shown to be affected by journal prestige and self-citation but also public funding of research. Amidst these viewpoints, this paper explores the speed of article citation of a multi-institutional, multi-disciplinary publicly funded research center relative to that of a comparison group of articles. Results indicate that articles by authors affiliated with the center are significantly more likely to have early-cited papers within the year of publication than the random comparison group, with controls by field also being significant. Implications for the ability of a publicly funded center to attract attention toward articles are discussed. 
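The early-citation comparison described above reduces to a simple proportion: the share of papers whose first citation arrives within the year of publication. A minimal sketch, assuming a hypothetical record layout (`pub_year`, `citation_years`) rather than the study's actual data; the function name is mine, not the authors':

```python
from statistics import mean

def early_cited_share(papers):
    """Fraction of papers whose first citation arrives within the
    publication year (a simple 'citation speed' indicator).
    Uncited papers are excluded from the denominator."""
    flags = [
        min(p["citation_years"]) <= p["pub_year"]
        for p in papers
        if p["citation_years"]
    ]
    return mean(flags) if flags else 0.0

center_papers = [
    {"pub_year": 2010, "citation_years": [2010, 2011]},
    {"pub_year": 2011, "citation_years": [2011]},
    {"pub_year": 2011, "citation_years": [2013]},
]
print(early_cited_share(center_papers))  # 2 of 3 papers are early-cited
```

Comparing this share for the center's papers against a random comparison group, with field controls, mirrors the kind of test the abstract reports.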
Literature-related discovery (LRD) is the linking of two or more literature concepts that have heretofore not been linked (i.e., disjoint), in order to produce novel, interesting, and intelligible knowledge (i.e., potential discovery). The mainstream software for assisting LRD is Arrowsmith. It uses text-based linkage to connect two disjoint literatures, and it generates intermediate linking literatures by matching Title phrases from two disjoint literatures (literatures that do not share common records). Arrowsmith then prioritizes these linking phrases through a series of text-based filters. The present study examines citation-based linkage in addition to text-based linkage to link disjoint literatures through a process called bibliographic coupling. Two disjoint literatures were selected for the demonstration: Parkinson's Disease (PD) (neurodegeneration) and Crohn's Disease (CD) (autoimmune). Three cases were examined: (1) matching phrases in records with no shared references (text-based linkage only); (2) shared references in records with no matching phrases (citation-based linkage only); (3) matching phrases in records with shared references (text-based and citation-based linkages). In addition, the main themes in the body of shared references were examined through grouping techniques to identify the common themes between the two literatures. All the high-level concepts in the Case 1 records could be found in the Case 3 records. Some new concepts (at the subset level of the main themes) not found in the Case 3 records were identified in the Case 2 records. The synergy of matching phrases and shared references provides a strong prioritization to the selection of promising matching phrases as discovery mechanisms. There were three major themes that unified the PD and CD literatures: Genetics; Neuroimmunology; Cell Death. However, these themes are not completely independent. For example, there are genetic determinants of the inflammatory response. 
Naturally occurring genetic variants in important inflammatory mediators such as TNF-alpha appear to alter inflammatory responses in numerous experimental and a few clinical models of inflammation. Additionally, there is a strong link between neuroimmunology and cell death. In PD, for example, neuroinflammatory processes that are mediated by activated glial and peripheral immune cells might eventually lead to dopaminergic cell death and subsequent disease progression. The present paper tries to show that the current state of the art in syntactics and semantics, in computer systems based on the theory of inventive problem solving known as TRIZ, may help in the task of literature-based discovery. With a structured and logical cause linkage between concepts, LBD could be faster and require less expert involvement at the beginning of the LBD process. The author tries to demonstrate the concept with two different problems: the hearing and balance problem known as Meniere's disease, and some of the current problems in lithium-air batteries for electric vehicles. By using open literature-based discovery from An to Bn and from Bn to Cn, and with the logic relationships of a real causes-and-effects approach, the author finds several relatively new concepts, such as vitamin A. Other concepts, such as niacin or fish oil, are also found as potentially helpful in Meniere's disease. Secondly, using this procedure the author is able to find patents from disparate domains of expertise, such as patents about odor control or metal casting. Progress on the development of nanotechnology has led to a number of initiatives which serve to normalize activities in this area. Among emerging technologies, nanotechnology is one of the most prominent, and it raises high expectations in a wide range of areas affecting daily life. The risks to human health, the pathways of exposure to nanomaterials, and occupational safety are recent issues which require more attention. 
The study was performed on nanopatents by collecting, processing and analyzing information extracted from specialized patent databases covering the period from 1991 to 2011, totalling 1,343 patents and representing 36 countries. These patents were classified by the International Patent Classification, using the methodology proposed in a study published by the Organization for Economic Co-operation and Development, which resulted in six groups of patents, distributed as follows: nanomaterials (40.3 %), medicine and biotechnology (26.6 %), measurement and production (10 %), electronics (2.7 %), energy and the environment (2.2 %), and optical electronics (1 %). Around 17 % of the patents in question did not fall into the adopted classification. The aim of this paper is to analyze the main trends of patenting related to nanotechnology, its development and environmental implications. An additional goal is to assist policy-makers to adjust the regulatory framework on nanotechnology, and to make recommendations for governments, industry, and national organizations, on creating specific subsidies for a regulatory framework in Brazil. Research that integrates the social and natural sciences is vital to address many societal challenges, yet is difficult to arrange, conduct, and disseminate. This paper compares diffusion of the research supported by a unique U.S. National Science Foundation program on Human and Social Dynamics ("HSD") with a matched group of heavily cited papers. We offer a measure of the distance of citations between the Web of Science Category ("WoSC") in which a publication appears and the WoSC of the journal citing it, and find that HSD publications are cited more distantly than are comparison publications. We provide another measure, citation velocity, finding that HSD publications are cited with similar lag times as are the comparison papers. These basic citation distance and velocity measures enrich analyses of research knowledge diffusion patterns. 
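The citation distance and velocity measures just described can be prototyped simply. Citation velocity here is the mean lag between a paper's publication year and the years of its citations; for citation distance, the sketch below uses a crude binary stand-in (my simplification, not the authors' graded WoSC-distance measure): the share of citations arriving from outside the publication's own category.

```python
from statistics import mean

def citation_velocity(pub_year, citing_years):
    # Mean lag, in years, between publication and each citation.
    return mean(y - pub_year for y in citing_years)

def citation_distance(pub_category, citing_categories):
    # Share of citations from outside the paper's own Web of Science
    # Category -- a binary proxy for a graded category distance.
    return mean(c != pub_category for c in citing_categories)

print(citation_velocity(2008, [2008, 2009, 2011]))  # mean lag of 4/3 years
print(citation_distance("Sociology", ["Sociology", "Economics", "Ecology"]))  # 2/3
```

A graded distance would instead weight each citing category by how far it sits from the cited category in a category-similarity map; the binary version is only the simplest special case.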
Increased competition due to rapid technological development pushes all participants in the market to focus on the prospect of New and Emerging Science & Technologies (NESTs). One promising NEST, dye-sensitized solar cells (DSSCs), has attracted attention in recent years. We focus on three research questions: how can we estimate DSSC research activity trends; how can we identify DSSC market expansion patterns; and, seeking to identify potential subsystems, what are the likely evolutionary paths of DSSC development? In this paper, patent analysis is applied to help determine the developmental stage of a particular technology and trace its potential evolutionary pathways. In addition, since patent information can reflect the degree of commercialization, we use patent transfer patterns to help evaluate market shift prospects. Patent activity in China for vibration-reduction control technology in high-speed railway vehicle systems was analyzed based on a portfolio of 193 patents or applications from the State Intellectual Property Office of the People's Republic of China official Web-based database and a search of the World Intellectual Property Organization PCT database. Patent activity features such as timing, applicant, technology classification, technical themes, and patents in force were obtained and analyzed. As a further stage of research, patent data on locomotive wheel sets were analyzed by means of a matrix analysis of problems and technologies. The main statistical information and conclusions include estimating the development stage, discovering the distributions of applications and applicants, weighing the roles played by major applicants, determining R&D hotspots, and providing a better understanding of domestic patent activities in this field. Policy implications for innovation-related domestic R&D institutions in the technologies under study were proposed based on the analytical results. 
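Patent activity features of the kind listed above (timing, applicant, technology classification) are, at base, frequency counts over the portfolio. A minimal sketch with invented example records; the applicant names and IPC codes below are placeholders, not data from the study:

```python
from collections import Counter

# Placeholder records standing in for a patent portfolio.
patents = [
    {"year": 2006, "applicant": "Applicant A", "ipc": "B61F"},
    {"year": 2009, "applicant": "Applicant B", "ipc": "F16F"},
    {"year": 2009, "applicant": "Applicant A", "ipc": "B61F"},
    {"year": 2011, "applicant": "Applicant A", "ipc": "B61F"},
]

by_year = Counter(p["year"] for p in patents)            # timing of activity
by_applicant = Counter(p["applicant"] for p in patents)  # major applicants
by_ipc = Counter(p["ipc"] for p in patents)              # technology classes

print(sorted(by_year.items()))      # filing counts per year
print(by_applicant.most_common(1))  # most active applicant
```

Plotting the per-year counts against an S-curve is one common way to estimate the development stage the abstract mentions.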
As the National Science Foundation (NSF) implements new cross-cutting initiatives and programs, interest in assessing the success of these experiments in fostering interdisciplinarity grows. A primary challenge in measuring interdisciplinarity is identifying and bounding the discrete disciplines that comprise interdisciplinary work. Using statistical text-mining techniques to extract topic bins, the NSF recently developed a topic map of all of its awards issued between 2000 and 2011. These new data provide a novel means for measuring interdisciplinarity by assessing the language or content of award proposals. Using the Directorate for Social, Behavioral, and Economic Sciences as a case study and drawing on the new topic model of the NSF's awards, this paper explores new methods for quantifying interdisciplinarity in the NSF portfolio. We report progress on new developments in the breakthrough paper indicator, which allows early selection of a small group of publications which may become potential breakthrough candidates based on dynamics of publication citations and certain qualitative characteristics of citations. We used a quantitative approach to identify typical citation patterns of highly cited papers. Based on these analyses, we propose two forecasting models to select groups of breakthrough paper candidates that exceed high citation thresholds five years post-publication. Here we study whether interdisciplinarity in the subject categories or geographical diversity serve as possible measures to improve ranking of breakthrough paper candidates. We found that the ranked geographical diversities of known breakthrough papers have equal or better ranks than the corresponding citation ranks. This allows us to apply additional filtering for better identification of breakthrough candidates. We studied several interdisciplinarity indices, including richness, the Shannon index, the Simpson index, and the Rao-Stirling-Porter index. 
We did not find any correlations between citation ranks and ranked interdisciplinarity indices. Topic modeling is a type of statistical model for discovering the latent "topics" that occur in a collection of documents through machine learning. Currently, latent Dirichlet allocation (LDA) is a popular and common modeling approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters. To evaluate the results, we generate a collection of documents that contain academic papers from several different fields and see whether papers in the same field will be clustered together. We explore potential scientometric applications of such text analysis capabilities. A knowledge organization system (KOS) can help easily indicate the deep knowledge structure of a patent document set. Compared to classification code systems, a personalized KOS made up of topics can represent the technology information in a more agile, detailed manner. This paper presents an approach to automatically construct a KOS of patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is adopted to generate a better bag-of-words for topic modeling and the LDA model is applied to generate raw topics. Then by iteratively using K-Means clustering and PCA on the document set and topics matrix, we generated new upper topics and computed the relationships between topics to construct a KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics, which are represented by terms and their weights, and the leaves are patent documents. We evaluated the approach with a set of Large Aperture Optical Elements (LAOE) patent documents as an empirical study and constructed the LAOE KOS. The method used discovered the deep semantic relationships between the topics and helped better describe the technology themes of LAOE. 
Based on the KOS, two types of applications were implemented: the automatic classification of patent documents and the categorical refinement of search results. This paper analyzes several well-known bibliometric indices using an axiomatic approach. We concentrate on indices aiming at capturing the global impact of a scientific output and do not investigate indices aiming at capturing an average impact. Hence, the indices that we study are designed to evaluate authors or groups of authors but not journals. The bibliometric indices that are studied include classic ones such as the number of highly cited papers as well as more recent ones such as the h-index and the g-index. We give conditions that characterize these indices, up to the multiplication by a positive constant. We also study the bibliometric rankings that are induced by these indices. Hence, we provide a general framework for the comparison of bibliometric rankings and indices. (C) 2014 Elsevier Ltd. All rights reserved. Most current h-type indicators use only a single number to measure a scientist's productivity and the impact of his/her published works. Although a single number is simple to calculate, it fails to outline how his/her academic performance varies with time. We empirically study the basic h-index sequence for cumulative publications with consideration of the yearly citation performance (for convenience, referred to as the L-Sequence). The L-Sequence consists of a series of L factors. Based on the citations received in the corresponding individual year, every factor along a scientist's career span is calculated by using the h-index formula. Thus the L-Sequence shows the scientist's dynamic research trajectory and provides insight into his/her scientific performance at different periods. Furthermore, a total indicator, obtained by summing up all the factors of the L-Sequence, can be used to evaluate the whole research career as an alternative to other h-index variants. 
Importantly, the partial factors of the L-Sequence can be adapted for different evaluation tasks. Moreover, the L-Sequence could be used to highlight scientists who were outstanding in a specific period, whose research interests can be used to study the history and trends of a specific discipline. (C) 2014 Elsevier Ltd. All rights reserved. We first introduced interesting definitions of "heartbeat" and "heartbeat spectrum" for "sleeping beauties", based on van Raan's variables. Then, we investigated 58,963 papers of Nobel laureates during 1900-2000 and found 758 sleeping beauties. By proposing and using the G(s) index, an adjustment of the Gini coefficient, to measure the inequality of the "heartbeat spectrum", we observed that publications which possess "late heartbeats" (most citations were received in the second half of the sleeping period) have a higher awakening probability than those with "early heartbeats" (most citations were received in the first half of the sleeping period). The awakening probability appears highest if an article's G(s) index lies in the interval [0.2, 0.6). (C) 2014 Elsevier Ltd. All rights reserved. The Relative Specialization Index (RSI) was introduced as a simple transformation of the Activity Index (AI), the aim of this transformation being standardization of AI, and therefore more straightforward interpretation. RSI is believed to have values between -1 and 1, with -1 meaning no activity of the country (institution) in a certain scientific field, and 1 meaning that the country is only active in the given field. While it is obvious from the definition of RSI that it can never be 1, it is less obvious, and essentially unknown, that its upper limit can be quite far from 1, depending on the scientific field. This is a consequence of the fact that AI has different upper limits for different scientific fields. This means that comparisons of RSIs, or AIs, across fields can be misleading. We therefore believe that RSI should not be used at all. 
We also show how an appropriate standardization of AI can be achieved. (C) 2014 Elsevier Ltd. All rights reserved. A recent paper (Canavero et al., 2014. Journal of the American Society for Information Science and Technology, doi:10.1109/TPC.2013.2255935) performed a bibliometric analysis of an extensive set of scientific journals within the Engineering field, published by IEEE (Institute of Electrical and Electronics Engineers). The analysis was based on (i) the citation impact of journal articles and (ii) the reputation of journal authors in terms of total scientific production and relevant citation impact. The goal of this paper is to complement the prior analysis by investigating the different citation cultures of these journals, depending on the sub-field/specialty of interest. To perform this evaluation, a novel technique is suggested, which takes into account the connections between journals and some highly specialized communities of scientists, known as IEEE Technical Societies and Councils. After showing significant differences in terms of propensity to cite, probably attributable to the large variety of sub-fields and specialties covered by IEEE journals, a simplified technique is presented for the sub-field normalization of the results of the prior study. The main contribution of this work is (1) providing an empirical confirmation of the complexity of the problem of normalization, even for journals within the same field but different sub-fields/specialties, and (2) showing how the use of highly specialized information on a journal's reference sub-field(s) may be helpful for improving the estimation of the journal's propensity to cite. The description is supported by a large amount of empirical data. (C) 2014 Elsevier Ltd. All rights reserved. The aim of this study is to analyze some properties of the distribution of journals that are cited in the h-core of citing journals listed in the Journal Citation Reports. 
Data were obtained from the 2011 edition of JCR available for universities in Spain. The citing journal matrix available in JCR was used to identify the cited journals that appear most frequently in the h-core. The results show that about 70% of citing journals occupy positions other than the first one in the set of journals cited by them. Some properties of the distribution of cited journals that appear in the h-core are also studied, such as the cost, in terms of citations, of occupying a given position, and the spectrum of positions (the distribution of frequencies with which a given cited journal appears in different positions). The measures calculated here could be used to define new scientometric indicators. (C) 2014 Elsevier Ltd. All rights reserved. A new link-based document ranking framework is devised with, at its heart, a content- and time-sensitive random literature explorer designed to more accurately model the behaviour of readers of scientific documents. In particular, our ranking framework dynamically adjusts its random walk parameters according to both the content and age of encountered documents, thus incorporating the diversity of topics and how they evolve over time into the score of a scientific publication. Our random walk framework results in a ranking of scientific documents which is shown to be more effective in facilitating literature exploration than PageRank, measured against a proxy gold standard based on papers' potential usefulness in facilitating later research. One of its many strengths lies in its practical value in reliably retrieving and placing promisingly useful papers at the top of its ranking. (C) 2014 Elsevier Ltd. All rights reserved. How is the published scientific literature used by the scientific community? Many previous studies analyze static usage data. In this research, we propose the concept of dynamic usage data. 
Based on the platform of realtime.springer.com, we have been monitoring and recording the dynamic usage data of Scientometrics articles round the clock. Our analysis finds that papers published in the most recent four years have many more downloads than older papers. According to our quantitative calculation, papers downloaded on a given day have an average lifetime of approximately 4.1 years. Classic papers are still being downloaded frequently even long after their publication. Additionally, we find that social media may reboot the attention paid to old scientific literature in a short time. (C) 2014 Elsevier Ltd. All rights reserved. Cited non-source documents such as articles from regional journals, conference papers, books and book chapters, working papers and reports have begun to attract more attention in the literature. Most of this attention has been directed at understanding the effects of including non-source items in research evaluation. In contrast, little work has been done to examine the effects of including non-source items on science maps and on the structure of science as reflected by those maps. In this study we compare two direct citation maps of a 16-year set of Scopus documents - one that includes only source documents, and one that includes non-source documents along with the source documents. In addition to more than doubling the contents of the map, from 19 M to 43 M documents, the inclusion of non-source items strongly augments the social sciences relative to the natural sciences and medicine and makes their position in the map more central. Books are also found to play a significant role in the map, and are much more highly cited on average than articles. (C) 2014 Elsevier Ltd. All rights reserved. Bornmann, Stefaner, de Moya Anegon, and Mutz (2014) have introduced a web application (www.excellencemapping.net) which is linked to both academic ranking lists published hitherto (e.g. 
the Academic Ranking of World Universities) as well as spatial visualization approaches. The web application visualizes institutional performance within specific subject areas as ranking lists and on custom tile-based maps. The new, substantially enhanced version of the web application and the generalized linear mixed model for binomial data on which it is based are described in this paper. Scopus data are used which have been collected for the SCImago Institutions Ranking. Only those universities and research-focused institutions that have published at least 500 articles, reviews and conference papers in the period 2006-2010 in a certain Scopus subject area are considered. In the enhanced version, the effect of single covariates (such as the per capita GDP of the country in which an institution is located) on two performance metrics (best paper rate and best journal rate) is examined and visualized. A covariate-adjusted ranking and mapping of the institutions is produced in which the single covariates are held constant. The results on the performance of institutions can then be interpreted as if the institutions all had the same value (reference point) for the covariate in question. For example, institutions can be identified worldwide that show very good performance despite a poor financial situation in the corresponding country. (C) 2014 Elsevier Ltd. All rights reserved. This study presents a unique approach to investigating the knowledge diffusion structure of the field of data quality through an analysis of the main paths. We study a dataset of 1880 papers to explore the knowledge diffusion path, using citation data to build the citation network. The main paths are then investigated and visualized via social network analysis. 
This paper applies three different main path analyses, namely local, global, and key-route, to depict the knowledge diffusion path, and additionally implements the g-index and h-index to evaluate the most important journals and researchers in the data quality domain. (C) 2014 Elsevier Ltd. All rights reserved. The percentages of shares of world publications of the European Union and its member states, China, and the United States have been represented differently as a result of using different databases. An analytical variant of the Web of Science (of Thomson Reuters) enables us to study the dynamics in the world publication system in terms of the field-normalized top-1% and top-10% most-frequently cited publications. Comparing the EU28, USA, and China at the global level shows a top-level dynamic that is different from the analysis in terms of shares of publications: the United States remains far more productive in the top-1% of all papers; China drops out of the competition for elite status; and the EU28 increased its share among the top-cited papers from 2000 to 2010. Some of the EU28 member states overtook the United States during this decade; but a clear divide remains between the EU15 (Western Europe) and the Accession Countries. Network analysis shows that China was embedded in this top layer of internationally co-authored publications. These publications often involve more than a single European nation. (C) 2014 Elsevier Ltd. All rights reserved. Equalizing bias (EqB) is a systematic inaccuracy which arises when authorship credit is divided equally among coauthors who have not contributed equally. As the number of coauthors increases, the diminishing amount of credit allocated to each additional coauthor is increasingly composed of equalizing bias, such that when the total number of coauthors exceeds 12, the credit score of most coauthors is composed mostly of EqB. 
In general, EqB reverses the byline hierarchy and skews bibliometric assessments by underestimating the contribution of primary authors, i.e. those adversely affected by negative EqB, and overestimating the contribution of secondary authors, those benefitting from positive EqB. The positive and negative effects of EqB are balanced and sum to zero, but are not symmetrical. The lack of symmetry exacerbates the relative effects of EqB, and explains why primary authors are increasingly outnumbered by secondary authors as the number of coauthors increases. Specifically, for a paper with 50 coauthors, the benefit of positive EqB goes to 39 secondary authors while the burden of negative EqB befalls 11 primary authors. Relative to harmonic estimates of their actual contribution, the EqB of the 50 coauthors ranged from < -90% to > 350%. Senior authorship, when it occurs, is conventionally indicated by a corresponding last author and recognized as being on a par with a first author. If senior authorship is not recognized, then the credit lost by an unrecognized senior author is distributed among the other coauthors as part of their EqB. The powerful distortional effect of EqB is compounded in bibliometric indices and performance rankings derived from biased equal credit. Equalizing bias must therefore be corrected at the source by ensuring accurate accreditation of all coauthors prior to the calculation of aggregate publication metrics. (C) 2014 The Authors. Published by Elsevier Ltd. This paper investigates directional returns to scale (RTS) and illustrates this approach by studying biological institutes of the Chinese Academy of Sciences (CAS). 
The following input-output indicators are used: senior professional and technical staff; middle-level and junior professional and technical staff; research expenditure on personnel salaries and other expenditures; SCI papers; high-quality papers; graduates trained; and intellectual properties. Using these indicators, the paper applies the methods recently proposed by Yang to analyze the directional returns to scale and the effect of directional congestion of biological institutes in the Chinese Academy of Sciences. Based on our analysis we come to the following findings: (1) we detect the regime of directional returns to scale (increasing, constant, decreasing) for each biological institute. This information can be used as the basis for decision-making about organizational adjustment; (2) congestion and directional congestion occur in several biological institutes. In such cases the outputs of these institutes decrease when the inputs increase. Such institutes should analyze the underlying reasons for the occurrence of congestion so that S&T resources can be used more efficiently. (C) 2014 Elsevier Ltd. All rights reserved. In this work we address the comprehensive SCImago Institutions Ranking 2012, proposing a data visualization of the listed bibliometric indicators for the 509 Higher Education Institutions among the 600 largest research institutions ranked according to their outputs. We focus on research impact, internationalization and leadership indicators, which have become important benchmarks in a worldwide discussion about research quality and impact policies for universities. Our data visualization reveals a qualitative difference between the behavior of Northern American and Western European Higher Education Institutions concerning international collaboration levels. Chinese universities still show systematically low international collaboration levels, which are linked to their low research impact. 
The data suggest that research impact can be related directly to internationalization only at rather low values of both indicators. Above the world average, other determinants may become relevant in fostering further impact. The leadership indicator provides further insights into the collaborative environment of universities in different geographical regions, as well as the optimized collaboration portfolio for enhancing research impact. (C) 2014 Elsevier Ltd. All rights reserved. Preferential Attachment (PA) models the scientific citation process. In the PA model, a new paper attaches itself to the citation network based only on the popularity of the currently existing papers. This invariably leads to a network whose degree distribution satisfies a power law. Yet, empirical results show that paper age should also play a role in the citation process. In other words, when references are chosen for a new paper, the age of an existing paper may also affect the choice to cite it. In this paper, we derive a generalized PA model that includes the effect of aging, with an analytical solution. Such a model can be used to analyze quantitatively the competing influence of preferential attachment and the aging effect in the citation process, and to explain differences between various research domains by the extent of aging. It may also serve as a general model of network formation. (C) 2014 Elsevier Ltd. All rights reserved. In a recent paper, Chambers and Miller introduced two fundamental axioms for scientific research indices. We perform a detailed analysis of these two axioms, thereby providing clean combinatorial characterizations of the research indices that satisfy these axioms and of the so-called step-based indices. We single out the staircase indices as a particularly simple subfamily of the step-based indices, and we provide a simple axiomatic characterization for them. (C) 2014 Elsevier Ltd. All rights reserved. 
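The preferential-attachment-with-aging process discussed in the abstract above can be illustrated with a short simulation. This is a minimal sketch, not the paper's actual model: the exponential aging kernel exp(-age/tau), the additive offset on the in-degree, the function name simulate_citations, and all parameter values are illustrative assumptions.

```python
import math
import random

def simulate_citations(n_papers=500, refs_per_paper=5, tau=50.0, seed=42):
    """Grow a citation network in which paper t cites earlier paper i with
    probability proportional to (in-degree of i + 1) * exp(-(t - i) / tau).
    Small tau means strong aging (old papers are rarely cited); large tau
    recovers plain preferential attachment. Returns the in-degree list."""
    rng = random.Random(seed)
    indeg = [0]  # paper 0 starts the network with no citations
    for t in range(1, n_papers):
        weights = [(indeg[i] + 1) * math.exp(-(t - i) / tau) for i in range(t)]
        cited = rng.choices(range(t), weights=weights, k=min(refs_per_paper, t))
        for i in set(cited):  # de-duplicate references within one paper
            indeg[i] += 1
        indeg.append(0)
    return indeg
```

Varying tau in such a simulation shows the competition the abstract describes: with weak aging the degree distribution approaches a power law, while strong aging shifts citations toward recent papers.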
Previous research shows that researchers' social network metrics obtained from a collaborative output network (e.g., a joint publications or co-authorship network) impact their performance as measured by the g-index. We use a richer dataset to show that a scholar's performance should be considered with respect to position in multiple networks. Previous research using only the network of researchers' joint publications shows that a researcher's distinct connections to other researchers, a researcher's number of repeated collaborative outputs, and a researcher's redundant connections to a group of researchers who are themselves well-connected have a positive impact on the researcher's performance, while a researcher's tendency to connect with other researchers who are themselves well-connected (i.e., eigenvector centrality) has a negative impact on the researcher's performance. Our findings are similar, except that we find that eigenvector centrality has a positive impact on the performance of scholars. Moreover, our results demonstrate that a researcher's tendency toward dense local neighborhoods and the researcher's demographic attributes, such as gender, should also be considered when investigating the impact of social network metrics on the performance of researchers. (C) 2014 Elsevier Ltd. All rights reserved. This paper explores a possible approach to research evaluation, by calculating the renown of authors of scientific papers. The evaluation is based on citation analysis and its results should be close to a human viewpoint. The PageRank algorithm and its modifications were used for the evaluation of various types of citation networks. Our main research question was whether better evaluation results were based directly on an author network or on a publication network. Other issues concerned, for example, the determination of weights in the author network and the distribution of publication scores among their authors. 
The citation networks were extracted from the computer science domain in the ISI Web of Science database. The influence of self-citations was also explored. To find the best network for research evaluation, the outputs of PageRank were compared with lists of prestigious awards in computer science such as the Turing and Codd awards, ISI Highly Cited and ACM Fellows. Our experiments showed that the best ranking of authors was obtained by using a publication citation network from which self-citations were eliminated, and by distributing the same proportional parts of the publications' values to their authors. The ranking can be used as a criterion for the financial support of research teams, for identifying leaders of such teams, etc. (C) 2014 Elsevier Ltd. All rights reserved. Genre is considered to be an important element in scholarly communication and in the practice of scientific disciplines. However, scientometric studies have typically focused on a single genre, the journal article. The goal of this study is to understand the role that handbooks play in knowledge creation and diffusion and their relationship with the genre of journal articles, particularly in highly interdisciplinary and emergent social science and humanities disciplines. To shed light on these questions we focused on handbooks and journal articles published over the last four decades belonging to the research area of science and technology studies (STS), broadly defined. To get a detailed picture we used the full text of five handbooks (500,000 words) and a well-defined set of 11,700 STS articles. We confirmed the methodological split of STS into qualitative and quantitative (scientometric) approaches. Even when the two traditions explore similar topics (e.g., science and gender) they approach them from different starting points. The change in cognitive foci in both handbooks and articles partially reflects the changing trends in STS research, often driven by technology. 
Using text similarity measures we found that, in the case of STS, handbooks play no special role in either focusing the research efforts or marking their decline. In general, they do not represent summaries of research directions that have emerged since the previous edition of the handbook. (C) 2014 Elsevier Ltd. All rights reserved. By modeling research systems as complex systems we generalize similarity measures used in the literature during the last two decades. We propose to use the mathematical tools developed within the spin-glass literature to evaluate similarity within systems and between systems in a unified manner. Our measure is based on the 'overlap' of disciplinary profiles of a set of research systems and can readily be integrated into the framework of traditional bibliometric profile analysis. The investigation of the distribution of the overlaps provides useful insights into the dynamics of the general system, that is, whether it converges toward a unique disciplinary structure or to a differentiated pattern. We illustrate the usefulness of the approach by investigating the dynamics of disciplinary profiles of European countries from 1996 to 2011. We analyze several bibliometric indicators (including publications and citations) of European countries in the 27 Scopus subject categories. We compare the disciplinary profiles of European countries (i) among them; (ii) with respect to the European standard; and (iii) to the World reference. We find that there is a convergence toward a unique European disciplinary profile of scientific production, even if large differences in the scientific profiles still remain. The investigation of the dynamics by year shows that developing countries are converging toward the European model while some developed countries are departing from it. (C) 2014 Elsevier Ltd. All rights reserved. 
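The 'overlap' of disciplinary profiles used in the preceding abstract can be illustrated with a toy computation. This is a minimal sketch under assumptions: profiles are represented as dictionaries mapping Scopus subject categories to publication shares, and the cosine-style normalization is an illustrative choice, not necessarily the authors' exact spin-glass formulation.

```python
import math

def overlap(profile_a, profile_b):
    """Normalized overlap of two disciplinary profiles, in the spirit of
    spin-glass overlaps: the inner product of the two category-share
    vectors, divided by the product of their Euclidean norms. Identical
    profiles give 1.0; profiles with no shared categories give 0.0."""
    cats = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(c, 0.0) * profile_b.get(c, 0.0) for c in cats)
    na = math.sqrt(sum(v * v for v in profile_a.values()))
    nb = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Computing the distribution of pairwise overlaps across countries and years would then reveal whether the system drifts toward a single disciplinary structure (overlaps near 1) or a differentiated pattern.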
The main objective of this study is to analyze the relationship between research impact and the structural properties of co-author networks. A new bibliographic source, Microsoft Academic Search, is introduced to test its suitability for bibliometric analyses. Citation counts and 500 one-step ego networks were extracted from this engine. Results show that tiny and sparse networks - characterized by a high Betweenness centrality and a high Average path length - achieved more citations per document than dense and compact networks described by a high Clustering coefficient and a high Average degree. According to disciplinary differences, Mathematics, Social Sciences and Economics & Business are the disciplines with more sparse and tiny networks; while Physics, Engineering and Geosciences are characterized by dense and crowded networks. This suggests that in sparse ego networks the central author has more control over their collaborators, being more selective in recruitment, and that this behaviour has positive implications for research impact. (C) 2014 Elsevier Ltd. All rights reserved. A new percentile-based rating scale, P100, has recently been proposed to describe citation impact in terms of the distribution of the unique citation values. Here I investigate P100 for five example datasets: two simple fictitious models and three larger empirical samples. Counterintuitive behavior is demonstrated in the model datasets, pointing to difficulties when the evolution of the indicator over time is analyzed or when different fields or publication years are compared. It is shown that similar problems can occur for the three larger datasets of empirical citation values. Further, it is observed that performance evaluation results in terms of percentiles can be influenced by selecting different journals for publication of a manuscript. (C) 2014 Elsevier Ltd. All rights reserved. 
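The core idea behind P100 - rating a paper on the scale of unique citation values in a reference set - can be sketched as follows. The tie handling, the function name p100, and the linear mapping of ranks onto [0, 100] are illustrative assumptions, not the published definition.

```python
def p100(citations, reference_set):
    """Percentile rank of a citation count on the scale of *unique*
    citation values in a reference set: the lowest unique value maps
    to 0 and the highest to 100, with ranks spaced evenly in between."""
    unique_vals = sorted(set(reference_set))
    if len(unique_vals) < 2:
        return 100.0
    # rank = position of the largest unique value <= citations
    rank = max(sum(1 for v in unique_vals if v <= citations) - 1, 0)
    return 100.0 * rank / (len(unique_vals) - 1)
```

A sketch like this also makes the abstract's "counterintuitive behavior" easy to reproduce: adding a single new unique value to the reference set reshuffles the ranks of all papers above it, even though their citation counts are unchanged.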
Omitted citations - i.e., missing links between a cited paper and the corresponding citing papers - are the main consequence of several bibliometric database errors. This paper investigates the possible relationship between omitted citations and the publishers of the relevant citing papers. This relationship is potentially meaningful because: (i) publishers generally impose editorial styles, which could affect database errors, and (ii) some publishers may be more efficient than others in detecting and correcting pre-existing errors in the manuscripts to be published, reducing the risk of database errors. Based on an extensive sample of scientific papers in the Manufacturing Engineering field, this study examines the citations omitted by the Scopus and WoS databases, using a recent automated algorithm. The major results are that (i) there are significant differences in terms of omitted-citation rate between publishers and (ii) the omitted-citation rates of publishers may vary depending on the database in use. (C) 2014 Elsevier Ltd. All rights reserved. National policies aimed at fostering the effectiveness of scientific systems should be based on reliable strategic analysis identifying strengths and weaknesses at the field level. Approaches and indicators thus far proposed in the literature have not been completely satisfactory, since they fail to distinguish the effect of the size of production factors from that of their quality, particularly the quality of labor. The current work proposes an innovative "input-oriented" approach, which permits: (i) estimation of national research performance in a field and comparison to that of other nations, independent of the size of their respective research staffs; and, for fields of comparable intensity of publication, (ii) identification of the strong and weak research fields within a national research system on the basis of international comparison. 
In reference to the second objective, the proposed approach is applied to the Italian case, through the analysis of the 2006-2010 scientific production of the Italian academic system, in the 200 research fields where bibliometric analysis is meaningful. (C) 2014 Elsevier Ltd. All rights reserved. This study proposes a temporal analysis method to utilize heterogeneous resources such as papers, patents, and web news articles in an integrated manner. We analyzed the time gap phenomena between the three resources and two academic areas by conducting text mining-based content analysis. To this end, a topic modeling technique, Latent Dirichlet Allocation (LDA), was used to estimate the optimal time gaps among the three resources (papers, patents, and web news articles) in two research domains. The contributions of this study are summarized as follows: firstly, we propose a new temporal analysis method to understand the content characteristics and trends of heterogeneous multiple resources in an integrated manner. We applied it to measure the exact time intervals between academic areas by understanding the time gap phenomena. The results of the temporal analysis showed that the resources of the medical field were more up to date than those of the computer field, implying prompter disclosure to the public. Secondly, we adopted a power-law exponent measurement and content analysis to evaluate the proposed method. With the proposed method, we demonstrate how to analyze heterogeneous resources more precisely and comprehensively. (C) 2014 Elsevier Ltd. All rights reserved. This study proposes a network-based model with two parameters to find influential authors, based on the idea that the prestige of a whole network changes when a node is removed. We apply the Katz-Bonacich centrality to define network prestige, which agrees with the idea behind the PageRank algorithm. 
We further deduce a concise mathematical formula to calculate each author's influence score and find the influential ones. Furthermore, the functions of the two parameters are revealed by simulation analysis and tests on real-world data. Parameter alpha provides useful information exogenous to the established network, and parameter beta measures the robustness of the result for cases in which the incompleteness of the network is considered. On the basis of the coauthor network of Paul Erdos, a comprehensive application of this new model is also provided. (C) 2014 Elsevier Ltd. All rights reserved. Transformations and applications of scientific knowledge into new technologies are usually complex interactive processes. Is it possible to detect, from bibliographic information alone, structural alterations and significant events within these processes that may indicate breakthrough discoveries? In this empirical study we focus on R&D processes leading to HIV/AIDS medicines called Integrase Inhibitors. Where scientific progress and discoveries are reflected in research papers, patents signify inventions and technological achievements. Our temporal analysis of distinctive events in this R&D area, tracing trends within both bibliographic information sources, is driven by three bibliometric indicators: (1) contributions of 'bridging researchers' who are also inventors, (2) 'key papers' that subject experts in the field considered milestones in the research process, and (3) the multidisciplinary impact of those papers. The main results indicate that a combination of key papers, bridging researchers and multidisciplinary impact might help track potential 'Charge type' breakthrough developments. 
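The node-removal prestige model built on Katz-Bonacich centrality, described two abstracts above, can be sketched as follows. The iterative Katz computation, the choice of alpha, the function names, and the reading of 'influence' as the drop in total prestige after removing a node are illustrative assumptions rather than the authors' exact formulation.

```python
def katz_centrality(adj, alpha=0.1, iters=100):
    """Katz-Bonacich centrality computed iteratively as the fixed point of
    x_v = 1 + alpha * sum of x_u over neighbors u of v. Converges when
    alpha is below the reciprocal of the largest eigenvalue of the graph.
    adj: dict mapping each node to its set of neighbor nodes."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        x = {v: 1.0 + alpha * sum(x.get(u, 0.0) for u in adj[v]) for v in adj}
    return x

def influence(adj, node, alpha=0.1):
    """Influence of `node`: the drop in total network prestige (sum of
    Katz centralities) when the node and its links are removed."""
    total = sum(katz_centrality(adj, alpha).values())
    reduced = {v: nbrs - {node} for v, nbrs in adj.items() if v != node}
    return total - sum(katz_centrality(reduced, alpha).values())
```

On a small star-shaped coauthor graph, removing the hub lowers total prestige more than removing a leaf, matching the intuition that well-connected authors are the influential ones.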
Journal rankings and journal ratings are important to governments, research institutes, and scientific research in general, and they frequently serve as the criteria for evaluating research performance to determine whether specific researchers will receive promotions and/or earn research grants. However, the only widely adopted journal assessment method is the impact factor (IF), which focuses on citations in academic journals. Yet IF disregards the technological applications and value of academic journals. In this article, we propose a method to rank academic journals that utilizes non-patent references in patent documents. We also compare the differences between journal rankings derived by using IF and those derived from the Intellectual Property Citation Index (IPCI) across different fields; moreover, some fields show positive and significant correlations between IF and the IPCI. The results of this study offer a new perspective from which to assess the technological value of academic journals, particularly those in the technological and scientific fields. This study considers the linkages between science and technology and the needs of the stakeholders in journal assessment to shed light on journal assessment and journal ranking methods. Although the world has experienced rapid urbanization, rural areas have always been and still are an important research field in human geography. This paper performs a bibliometric analysis of rural geography studies based on the peer-reviewed articles concerning rural geography published in SSCI-listed journals from 1990 to 2012. 
Our analysis examines publication patterns (document types and publishing languages, article outputs and their categories, major journals and their publication, most productive authors, geographic distribution and international collaboration) and demonstrates the evolution of intellectual development of rural geography by studying highly cited papers and their citation networks and the temporal evolution of keywords. Our research findings include: the article count has been increasing since the 1990s, passing through three phases, and rural geography research is dominated in size by the UK and USA. The USA is the most productive in rural geography, but the UK has had more impact than other countries in terms of the average citation of articles. Three distinct but loosely linked research streams of rural geography were identified, dominated by UK rural geographers. Keyword frequencies evolved with the context of rural development and academic advances in human geography, but remained loose and scattered, since rural research in different regions or different systems faced different problems. This paper presents a new methodology to describe global innovation networks. Using 167,315 USPTO patents granted in 2009 and the papers they cited, this methodology shows "scientific footprints of technology" that cross national boundaries, and how multinational enterprises interact globally with universities and other firms. The data and the map of these flows provide insights to support a tentative taxonomy of global innovation networks. Evaluation has become a regular practice in the management of science, technology and innovation (ST&I) programs. Several methods have been developed to identify the results and impacts of programs of this kind. 
Most evaluations that adopt such an approach conclude that the interventions concerned, in this case ST&I programs, had a positive impact compared with the baseline, but do not control for any effects that might have improved the indicators even in the absence of intervention, such as improvements in the socio-economic context. The quasi-experimental approach therefore arises as an appropriate way to identify the real contributions of a given intervention. This paper describes and discusses the utilization of the propensity score (PS) in quasi-experiments as a methodology to evaluate the impact of research programs on scientific production, presenting a case study of the BIOTA Program run by FAPESP, the State of São Paulo Research Foundation (Brazil). Fundamentals of quasi-experiments and causal inference are presented, stressing the need to control for biases due to lack of randomization, along with a brief introduction to the PS estimation and weighting technique used to correct for observed bias. The application of the PS methodology is compared to the traditional multivariate analysis usually employed. Does devoting more academic research resources promote academic quality? This study aims to examine the influence of higher education R&D expenditure (HERD) on academic quality measured by the relative citation impact (RCI). Both ordered Probit and panel data models are employed to implement the empirical estimation; the cross-country evidence suggests that an increase in academic R&D is positively related to academic quality. Further analyses of different academic disciplines show that HERD is more relevant to science publications. This finding is robust across various specifications. An increasing demand for bibliometric assessment of individuals has led to a growth of new bibliometric indicators as well as new variants or combinations of established ones. 
The aim of this review is to contribute objective facts about the usefulness of bibliometric indicators of the effects of publication activity at the individual level. This paper reviews 108 indicators that can potentially be used to measure performance at the individual author level, and examines the complexity of their calculations in relation to what they are supposed to reflect and ease of end-user application. We provide a schematic overview of author-level indicators, where the indicators are broadly categorised into indicators of publication count, indicators that qualify output (on the level of the researcher and journal), indicators of the effect of output (effect as citations, citations normalized to field or the researcher's body of work), indicators that rank the individual's work and indicators of impact over time. Supported by an extensive appendix, we present how the indicators are computed, the complexity of the mathematical calculation and demands on data collection, their advantages and limitations, as well as references to the surrounding discussion in the bibliometric community. The Appendix supporting this study is available online as supplementary material. This study describes the basic methodological approach and the results of URAP-TR, the first national ranking system for Turkish universities. URAP-TR is based on objective bibliometric data resources and includes both size-dependent and size-independent indicators that balance total academic performance with performance per capita measures. In the context of Turkish national university rankings, the paper discusses the implications of employing multiple size-independent and size-dependent indicators on national university rankings. Fine-grained ranking categories for Turkish universities are identified through an analysis of ranking results across multiple indicators. How has terrorism affected the research process and its findings? 
The author tries to answer this question through an exploratory analysis of the impact of these tragic events on the research outputs of scientists, institutions and countries. In particular, this report provides a wide range of scientometric data related to terrorism studies around the world during the two decades from 1991 to 2011. After the September 11, 2001 events (9/11) in the United States, concerned academics responded by producing an increasing number of research publications, as if under the influence of some driving force stimulating the overall academic production linked to this tragic event. However, after this trend reached its peak in 2002, that driving force visibly weakened, and since the mid-2000s the number of research publications in the field of terrorism studies has steadily decreased. Nonetheless, the number of terrorist events per year, along with the property damage and fatality rate, has continuously increased over the observed period. Using these results as a backdrop, this paper argues that the field of terrorism research should be explored from a critical and multi-cultural perspective, and that all scientific researchers should remain objective, since scientific research ought to be independent of political systems, their contingent events in any form, and transitory historical circumstances. By means of their academic publications, authors form a social network. Instead of sharing casual thoughts and photos (as on Facebook), authors select co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Academic Search and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What type of information and queries would be useful for users to discover, beyond the search queries already available from services such as Google Scholar?
In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues, and institutions. We go beyond traditional metrics such as paper counts, citations, and the h-index. Specifically, we define metrics such as influence, connections, and exposure for authors. An author gains influence by receiving more citations, but also by receiving citations from influential authors. An author increases his or her connections by co-authoring with other authors, especially with authors who themselves have high connections. An author receives exposure by publishing in selective venues where publications have received high citations in the past, and the selectivity of these venues in turn depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors' rankings for each type of metric as well as for different domains. We demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors. In this study we compare the internationalization of academic journals in six fields of science. Internationalization was investigated through journals' concentration on publishing papers from particular countries, the relationship between the geographical distributions of editors and authors, and the relationship between language of publication and the geographical distribution of papers. Having analyzed more than 1,000 journals, we can state that social sciences literature in the fields considered is still more nationally and linguistically fragmented than natural sciences literature, but in some cases the gap is not so big.
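A recursive "influence" metric of the kind described above (citations count more when they come from influential authors) is naturally computed with a PageRank-style power iteration over an author citation graph. The sketch below is a generic illustration on a fabricated three-author graph, not the authors' exact formulation:

```python
import numpy as np

def influence(cite_matrix, damping=0.85, iters=100):
    """PageRank-style influence. cite_matrix[i, j] = citations author i gives
    to author j; an author gains more from citations by influential citers."""
    C = np.asarray(cite_matrix, dtype=float)
    n = C.shape[0]
    out = C.sum(axis=1, keepdims=True)
    # Normalize each citer's outgoing citations; authors citing nobody
    # spread their weight uniformly (the 1/n fallback).
    P = np.divide(C, out, out=np.full_like(C, 1.0 / n), where=out > 0)
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        score = (1 - damping) / n + damping * (P.T @ score)
    return score / score.sum()

# Hypothetical citation counts: authors 0 and 1 both cite author 2 heavily
C = [[0, 1, 5],
     [1, 0, 5],
     [1, 1, 0]]
print(influence(C).round(3))  # author 2 should rank highest
```

The same iteration, run on a co-authorship matrix instead of a citation matrix, gives a "connections"-style score in the spirit of the abstract.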
One of the consequences for research output assessment is that the usefulness of international databases with national disparities in coverage is still limited in the social sciences. This paper aims to examine authorship trends in software engineering, especially those related to the number of authors of scientific publications. We collected and mined around 70,000 entries from DBLP for 122 conferences and journals, covering the period 1971-2012, in order to compute several bibliometric indicators. We provide evidence that the number of authors of articles in software engineering is increasing on average by around +0.40 authors/decade. The results also indicate that until 1980 the majority of articles had a sole author, while nowadays articles with 3 or 4 authors represent almost half of the total. The aim of this study is to map and analyze the structure and evolution of the scientific literature on gender differences in higher education and science, focusing on factors related to differences between 1991 and 2012. Co-word analysis was applied to identify the main concepts addressed in this research field. Hierarchical cluster analysis was used to cluster the keywords, and a strategic diagram was created to analyze trends. The data set comprised a corpus of 652 articles and reviews published between 1991 and 2012, extracted from the Thomson Reuters Web of Science database. In order to see how the results changed over time, documents were grouped into three periods: 1991-2001, 2002-2007, and 2008-2012. The results showed that the number of themes has increased significantly over the years and that gender differences in higher education and science have been considered by specific research disciplines, suggesting important research-field-specific variations. Overall, the study helps to identify the major research topics in this domain, as well as highlighting issues to be addressed or strengthened in further work.
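A figure like "+0.40 authors/decade" is simply the slope of a linear fit of mean authors per paper against publication year; the sketch below illustrates the calculation on made-up yearly means, not the DBLP data itself:

```python
import numpy as np

# Hypothetical mean number of authors per paper, by sample year
years = np.array([1975, 1985, 1995, 2005])
mean_authors = np.array([1.2, 1.6, 2.1, 2.4])

# Degree-1 polynomial fit: coefficients come back highest degree first
slope, intercept = np.polyfit(years, mean_authors, 1)
print(f"{slope * 10:.2f} authors/decade")  # slope per year scaled to a decade
```

With real data one would compute the yearly means by grouping the bibliographic records by publication year before fitting.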
This paper focuses on methods to study the distribution of an author's collaborative relationships among different communities in co-authorship networks. Based on the index of extensity centrality, we propose a new index, which we name extensity centrality-Newman (Cext-N). Drawing upon a data set of three top journals (MISQ, ISR, JMIS) in Information Systems between 2010 and 2012, we verify and describe the application and value of our approach. Because the starting points of Cext-N and the classical indices are quite different, and because relying on a single index is not advocated in scientific evaluation, indices can be selected in practice by considering their starting points, ensuring that the value of each index is taken into account. We aim to identify (1) whether and how various data sources influence the mapping of the intellectual structure of the field of bioinformatics, and (2) the landscape of bioinformatics obtained by integrating those sources. To this end, we conduct a comprehensive bibliometric analysis by harvesting bibliographic information from DBLP, PubMed Central, and Web of Science. We then measure and compare the topological characteristics of networks generated from these sources. The results show a dichotomous pattern dominated by PubMed Central and WoS. In addition, a few influential scientists in the field of bioinformatics receive very high citations from their colleagues, which has been a driving force in the field's growth. These few scientists are connected to a much larger research community. Most researchers are intellectually linked within a few steps, in spite of the domain's interdisciplinary character. In particular, the influential authors form a small world. We also find that there is not yet a coherent disciplinary core in bioinformatics, since the field is still under development.
Finally, the journals and conferences indexed by each source cover different research topics, and PubMed Central is more inclusive than DBLP as an indexing database. Academics can now use the web and social websites to disseminate scholarly information in a variety of different ways. Although some scholars have taken advantage of these new online opportunities, it is not clear how widespread their uptake is or how much impact they can have. This study assesses the extent to which successful scientists have social web presences, focusing on one influential group: highly cited researchers working at European institutions. It also assesses the impact of these presences. We manually and systematically identified whether the European highly cited researchers had profiles in Google Scholar, Microsoft Academic Search, Mendeley, Academia and LinkedIn, or any content in SlideShare. We then used URL mentions and altmetric indicators to assess the impact of the web presences found. Although most of the scientists had an institutional website of some kind, few had created a profile in any of the social websites investigated, and LinkedIn, the only non-academic site in the list, was the most popular. Scientists having one kind of social web profile were in many cases more likely to have another, especially in the life sciences and engineering. In most cases it was possible to estimate the relative impact of the profiles using a readily available statistic, and there were disciplinary differences in the impact of the different kinds of profiles. Most social web profiles showed some evidence of uptake, if not impact; nevertheless, the value of the indicators used is unclear. To better understand the rapidly growing social media research domain, this study presents the findings of a scientometric analysis of the corresponding literature.
We conducted a research productivity analysis and citation analysis of individuals, institutions, and countries based on 610 peer-reviewed social media articles published in journals and conference proceedings between October 2004 and December 2011. Results indicate that research productivity is exploding and that several leading authors, institutions, and countries, along with a small set of foundational papers, have emerged. Based on the results, which indicate that the social media domain displays limited diversity and is still heavily influenced by practitioners, the paper raises two fundamental challenges facing the social media domain and its future advancement, namely the lack of academic maturity and the Matthew Effect. Plagiarism is one of the most important current debates among scientific stakeholders. A separate but related issue is the reuse of authors' own ideas in different papers (i.e., self-plagiarism). Opinions on this issue are mixed, and there is a lack of consensus. Our goal was to gain deeper insight into plagiarism and self-plagiarism through a citation analysis of the documents involved. The Déjà vu database, which comprises around 80,000 duplicate records, was used to select 247 pairs of documents that had been examined by curators on a full-text basis following a stringent protocol. We then used the Scopus database to perform a citation analysis of the selected documents. For each document pair, we used specific bibliometric indicators, such as the number of authors, full-text similarity, journal impact factor, the Eigenfactor, and article influence. Our results confirm that cases of plagiarism are published in journals with lower visibility and thus tend to receive fewer citations. Moreover, full-text similarity was significantly higher in cases of plagiarism than in cases of self-plagiarism.
Among pairs of documents with shared authors, duplicates not citing the original document showed higher full-text similarity than those citing the original document, and also showed greater overlap in the references cited in the two documents. In this paper we analyze topic evolution over time within bioinformatics to uncover the underlying dynamics of the field, focusing on developments in the 2000s. We select 33 bioinformatics-related conferences indexed in DBLP from 2000 to 2011. The major reason for choosing DBLP as the data source instead of PubMed is that DBLP indexes most bioinformatics-related conferences, and for studying the dynamics of the field, conference papers are more suitable than journal papers. We divide the dozen years into four periods: period 1 (2000-2002), period 2 (2003-2005), period 3 (2006-2008) and period 4 (2009-2011). To conduct the topic evolution analysis, we employ three major procedures, and for each procedure we develop a novel technique: Markov Random Field-based topic clustering, automatic cluster labeling, and topic similarity based on Within-Period Cluster Similarity and Between-Period Cluster Similarity. The experimental results show distinct topic transition patterns between different time periods. From period 1 to period 3, new topics seem to have emerged and expanded, whereas from period 3 to period 4, topics merged and interacted more intensively with each other. This trend is confirmed by the collaboration pattern over time. This paper focuses on measuring the academic research performance of Chinese universities using the Scopus database from 2007 to 2010. We provide meaningful indicators to measure the research performance of Chinese universities as compared to world-class universities in the US and the European region.
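Cross-period topic similarity measures such as the Within-Period and Between-Period Cluster Similarity mentioned above are typically built on cosine similarity between cluster term-weight vectors; a minimal sketch with fabricated term weights (not the authors' exact definition):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two term-weight vectors over a shared vocabulary."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical term weights for one topic cluster in each of two periods
period1_cluster = [3, 0, 2, 5, 0]   # e.g. counts of the cluster's top terms
period2_cluster = [2, 1, 2, 4, 0]

print(round(cosine(period1_cluster, period2_cluster), 3))
```

A high value between clusters from adjacent periods suggests topic continuity or merging; a low value against all later clusters suggests a topic fading out.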
Using these indicators, we first measure the quantity and quality of the research outcomes of the universities and then examine the internationalization of research by using international collaborations, international citations and international impact metrics. Using all of this data, we finally present an overall score, called the research performance point, to measure the comprehensive research strength of the universities for the selected subject categories. The comparison identifies the gap between Chinese universities and top-tier universities from the selected regions across various subject areas. We find that Chinese universities are doing well in terms of publication volume but receive fewer citations for their published work. We also find that the Chinese universities have a relatively low percentage of publications in high-impact venues, which may be the reason they are not receiving more citations. Therefore, a careful selection of publication venues may help the Chinese universities to compete with world-class universities and increase their research internationalization. The purpose of this paper is twofold: methodological and empirical. Methodologically, we describe a matching and disambiguation procedure for the identification of author-inventors (researchers who publish and patent) located in the same country. Our methodology aims to maximize precision and recall rates by taking into account national name-writing customs and country-specific dictionaries for person and institution names (academic and non-academic) in the name matching stage, and by including a recursive validation step in the person disambiguation stage. An application of this methodology to the identification of Spanish author-inventors is described in detail. Empirically, we present the first results of applying the described methodology to the matching of all SCOPUS 2003-2008 publications of Spanish authors to all 1978-2009 EPO applications with Spanish inventors.
Using this data, we identify 4,194 Spanish author-inventors. A first look at their patenting and publication patterns reveals that they make quite a significant contribution to the country's overall scientific and technological production in the period considered: 27 % of all EPO patent applications invented in Spain and 15 % of all SCOPUS publications authored in Spain, excluding non-technological disciplines. To our knowledge, this is the first time that a large-scale identification of author-inventors from Spain has been done, with no limitation in terms of fields, regions or types of institutions. We also make available online for scientific use an anonymized subset of the database (patent applications invented by authors affiliated with Spanish public universities). Inventor disambiguation is an increasingly important issue for users of patent data. We propose and test a number of refinements to the original Massacrator algorithm, originally proposed by Lissoni et al. (The KEINS database on academic inventors: methodology and contents, 2006) and now applied to APE-INV, a free-access database funded by the European Science Foundation. Following Raffo and Lhuillery (Res Policy 38:1617-1627, 2009) we describe disambiguation as a three-step process: cleaning & parsing, matching, and filtering. By means of sensitivity analysis, based on Monte Carlo simulations, we show how various filtering criteria can be manipulated in order to obtain optimal combinations of precision and recall (type I and type II errors). We also show how these different combinations generate different results for applications to studies of inventors' productivity, mobility, and networking, and discuss data quality problems related to language. The filtering criteria based upon information on inventors' addresses are sensitive to data quality, while those based upon information on co-inventorship networks are always effective.
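The cleaning & parsing and matching stages of a name-disambiguation pipeline like those above can be sketched with simple name normalization and token comparison. The rules below are purely illustrative and are not the Massacrator algorithm or the Spanish author-inventor procedure; the sample names are fabricated:

```python
import re
import unicodedata

def normalize(name):
    """Lowercase, strip accents and punctuation, and tokenize a person name."""
    name = unicodedata.normalize("NFKD", name)       # split base chars from accents
    name = name.encode("ascii", "ignore").decode()   # drop the accent marks
    return sorted(re.findall(r"[a-z]+", name.lower()))

def match(author, inventor):
    """Crude match: the shorter name's tokens must all appear in the other,
    so 'Jose Garcia' matches 'Garcia, Jose M.' despite order and initials."""
    a, b = normalize(author), normalize(inventor)
    shared = set(a) & set(b)
    return len(shared) >= min(len(a), len(b))

print(match("García, José M.", "Jose Garcia"))   # -> True
print(match("García, José M.", "Juan Garcia"))   # -> False
```

Real pipelines add country-specific dictionaries, similarity scoring for spelling variants, and a filtering stage using affiliations or co-inventorship networks to separate homonyms, which this sketch omits.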
Details on data access and data quality improvement via feedback collection are also discussed. Quality evaluation and its assurance in higher education institutions constitute an obligation and goal of most European universities. To accomplish this, quantitative indices, known as bibliometrics, are employed, as they are considered a useful evaluation tool, particularly for the research performance of academics and universities. In the present study, the research quality of the five Greek civil engineering departments (Athens, Patras, Thessaloniki, Volos, Xanthi) is assessed by means of several advanced bibliometric indices calculated separately for each academic. Statistical analysis of the data is also performed to compare the observed differences in the mean values of the calculated indices. The study is conducted at both department and academic rank level to explore how research activity is distributed among the various ranks. In addition, to evaluate the research status of the Greek departments in the European context, their research output is compared with that of the London civil engineering department. To explore the dependence of bibliometrics on seniority, a bibliometric analysis considering the research activity of all academics only during the last decade is also made. Finally, the temporal progress of research productivity leads to interesting findings about the impact of the European economic crisis on research performance. In general, the bibliometrics demonstrate that the Patras department hosts academics of better quality, but Athens exhibits higher scientific activity over the last decade. The superiority of the London department is evident, although a few bibliometrics of the Greek departments are comparable with its own. Results also indicate that no common standards in the hiring/promotion of academics are established, while the European socio-economic crisis has had a significant negative impact on research productivity.
Identifying academic inventors is crucial for reliable assessments of academic patenting and for understanding patent-based university-to-industry technology transfer. It requires solving the "who is who" problem at the individual inventor level. This article describes data collection and matching techniques applied to identify academic inventors in Germany. To manage the large dataset, we adjust a matching technique applied in prior research by comparing the inventor and professor names in a first step after cleaning. We also suggest a new approach for determining the similarity score. To evaluate our methodology we apply it to the EP-INV-PatStat database and compare its results to alternative approaches. For our German data, results are less sensitive to the choice of name comparison algorithm than to the specific filtering criteria employed. Restricting the search to EPO applications or identifying inventors by professor title underestimates academic patenting in Germany. Nature is among the world's most highly cited multidisciplinary science journals, with one of the highest impact factors at 38.597 (Nature Publishing Group (NPG) 2013), and it is used relatively often in many scientific rankings. When analysing the regional distribution of Nature publications, we found a high correlation between research expenditures and the number of local affiliations counted on a national basis. The same regularity can be observed for the world's top 30 and the US's top 50 universities; however, the correlation is skewed by the so-called cumulative advantage, or Matthew Effect, which evidently rewards those ranked at the top of the Academic Ranking of World Universities. The rich get richer and the poor get poorer. Surprisingly, the size of the endowment better determines the number of Nature publications for universities than total research expenditure.
This paper analyses the information science research field of informetrics to identify publication strategies that have been important for its successful researchers. The study uses a micro-analysis of informetrics researchers drawn from 5,417 informetrics papers published in 7 core informetrics journals during 1948-2012. The most productive informetrics researchers were analysed in terms of productivity, citation impact, and co-authorship. The 30 most productive informetrics researchers of all time span several generations and usually seem to be the primary authors of their research, highly collaborative, affiliated with one institution at a time, and often affiliated with a few core European centres. Their research usually has a high total citation impact but not the highest citation impact per paper. Perhaps surprisingly, the US does not seem to be good at producing highly productive researchers but is successful at producing high-impact researchers. Although there are exceptions to all of the patterns found, researchers wishing to have the best chance of being part of the next generation of highly productive informetricians may wish to emulate some of these characteristics. We introduce and evaluate a novel network-based approach for determining the individual credit of coauthors in multi-authored papers. In the proposed model, coauthorship is conceptualized as a directed, weighted network, in which authors transfer coauthorship credits among one another. We validate the model by fitting it to empirical data about authorship credits from economics, marketing, psychology, chemistry, and biomedicine. We also show that our model outperforms prior alternatives such as fractional, geometric, arithmetic, and harmonic counting in generating coauthorship credit allocations that approximate the empirical data.
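The baseline counting schemes mentioned above assign each of a paper's n coauthors a fixed share of credit; for instance, fractional counting gives every author 1/n, while harmonic counting gives the i-th listed author a share proportional to 1/i. A minimal sketch of these two standard formulas:

```python
def fractional(n):
    """Fractional counting: each of n coauthors receives an equal 1/n share."""
    return [1.0 / n] * n

def harmonic(n):
    """Harmonic counting: the i-th listed author's share is (1/i) / sum_k(1/k),
    so shares decline with author position and still sum to 1."""
    total = sum(1.0 / k for k in range(1, n + 1))
    return [(1.0 / i) / total for i in range(1, n + 1)]

print(fractional(4))                        # [0.25, 0.25, 0.25, 0.25]
print([round(c, 2) for c in harmonic(4)])   # [0.48, 0.24, 0.16, 0.12]
```

The network model described in the abstract goes beyond such fixed formulas by letting credit flows depend on the coauthorship structure, but these schemes are the comparison baselines.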
The results from the empirical evaluation, as well as the model's capability to be adapted to domains with different norms for ordering authors on a paper, make the proposed model a robust and flexible framework for studying substantive questions about coauthorship across domains. This paper presents a bibliometric analysis of articles from the Republic of Serbia in the period 2006-2012 that are indexed in the Thomson Reuters SCI-EXPANDED database. The Republic of Serbia is a small European country with about seven million citizens that became independent in 2006. Since 2006, Serbian science has achieved some recognition. The analysis included 14,293 articles whose authors were all from Serbia. The distribution of published articles across Web of Science categories, journals, scientific-research institutions and researchers was analysed. The most cited independent research articles from Serbia were also analysed. The Y-index indicator was used for rating the productivity of researchers and institutions; this indicator takes into account the contribution of the researcher to the published results. The results showed that the productivity of articles from Serbia is significant compared to Serbia's neighbouring countries, taking into account the number of researchers in these countries, their GDPs and the percentages of GDP spent on research. This article uses document co-citation analysis to objectively explore the underlying structure of the intellectual property research domain from a managerial and strategic standpoint. The goal of this study is to identify its main research areas, understand its current state of development and suggest potential future directions, by analyzing the co-citations from 181 papers published between 1992 and 2011 in the most influential academic journals.
Five main clusters have been identified, mapped, and labeled as follows: economics of the patent system, technological and institutional capabilities, university patenting, intellectual property exploitation, and division of labor. The most active areas on this topic, and the most influential and co-cited papers, have been identified and described. Also, intra- and inter-cluster knowledge base diversity has been assessed using indicators stemming from the domains of information theory and biology. A t test has been performed to assess the significance of the inter-cluster diversity. The knowledge bases of these five clusters are significantly diverse, meaning that they represent five co-existing paradigms. To investigate patterns of technology collaboration within the Chinese automobile industry, this study employs a unique dataset of patent applications that reveals a record of 64,938 collaborative relations in the industry during the period from 1985 to 2010. Our results indicate that over 60 % of the total collaborations were conducted after China entered the WTO. The invention and utility types of patents account for 98 % of the total collaborations throughout the sample period. Using a network analysis method, we find that the key differences between domestic enterprises collaborating with indigenous enterprises (DD collaboration) and with foreign firms (DF collaboration) lie in patent types and technology domains. The DF network is also denser and more centralized than the DD network, although the number of nodes and links of the DD network is greater than that of the DF collaboration network. The analysis and visualization of the collaboration networks and their corresponding largest components reveal that a large number of domestic enterprises prefer to collaborate with top global automobile manufacturers. We also find that a number of universities have become key players in the collaborations among industry, universities and research institutes.
This study provides a deeper understanding of technology collaborations from various perspectives and also highlights several avenues for future research. An extended latent Dirichlet allocation (LDA) model is presented in this paper for patent competitive intelligence analysis. After part-of-speech tagging and defining noun phrase extraction rules, technological words were extracted from patent titles and abstracts. This allows us to go one step further and perform patent analysis at the content level. The LDA model is then used to identify underlying topic structures based on the latent relationships of the extracted technological words. This lets us review research hot spots and directions within subclasses of patented technology in a given field. To extend the traditional LDA model, an institution-topic probability level is added to the original model. For each topic, the distribution probabilities of directly competing enterprises and their technological positions are identified. A case study is then carried out on one of the core patented technologies in next-generation telecommunications: LTE. This empirical study reveals emerging hot spots of LTE technology, and finds that major companies in this field have focused on different technological fields with different competitive positions. The current study investigates parts manufacturers' innovative behavior from the population ecology perspective. Specifically, this paper proposes that firm-level inertia and network-level inertia matter in parts manufacturers' innovation. Using data from auto parts manufacturers, we test four hypotheses, and the results show that firm-level inertia indicated by age does not matter, while a firm's innovative inertia does matter in parts manufacturers' innovation. At the same time, we find that clusters can promote the innovation of general parts firms, but they harm the innovative behavior of innovative firms.
These results contribute to our understanding of parts manufacturers' innovation. Numerous studies have sought to uncover violations of objectivity and impartiality in peer review; however, the notion of reciprocity has been absent from much of this discussion, particularly as it relates to gendered and ethnicized behaviors in peer review. The current study addresses this gap by investigating patterns of reciprocity (i.e., correspondences between the patterns of recommendations received by authors and the patterns of recommendations given by reviewers in the same social group) by perceived gender and ethnicity of reviewers and authors for submissions to the Journal of the American Society for Information Science and Technology from June 2009 to May 2011. The degree of reciprocity for each social group was examined by employing Monte Carlo resampling to extrapolate more robust patterns from the limited data available. We found that papers with female authors received more negative reviews than papers with male authors. Reciprocity was suggested by the fact that female reviewers gave lower reviews than male reviewers. Reciprocity was also exhibited by ethnicity, although non-Western reviewers gave disproportionately more recommendations of major revision, while non-Western authors tended to receive more outright rejections. This study provides a novel theoretical and methodological basis for future studies of reciprocity in peer review. In many countries, culture, practice or regulations inhibit the co-presence of relatives within a university faculty. We test the legitimacy of such attitudes and provisions by investigating the phenomenon of nepotism in Italy, a nation with high rates of favoritism. We compare the individual research performance of "children" who have "parents" in the same university against that of "non-children" with the same academic rank and seniority, in the same field. The results show non-significant differences in performance.
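Monte Carlo resampling of the kind used above to stabilize patterns from limited review data can be sketched as a simple bootstrap of a difference in group means; the recommendation scores below are fabricated for illustration (not the journal's data):

```python
import random

random.seed(42)

# Hypothetical review scores (higher = more positive recommendation)
group_a = [3, 4, 2, 4, 3, 5, 4]
group_b = [2, 3, 2, 3, 4, 2, 3]

def bootstrap_diff(a, b, n_resamples=10000):
    """Resample each group with replacement, collect mean differences,
    and return a 95% percentile interval for the difference in means."""
    diffs = []
    for _ in range(n_resamples):
        ra = [random.choice(a) for _ in a]
        rb = [random.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    return diffs[int(0.025 * n_resamples)], diffs[int(0.975 * n_resamples)]

lo, hi = bootstrap_diff(group_a, group_b)
print(f"95% CI for mean difference: [{lo:.2f}, {hi:.2f}]")
```

If the interval excludes zero, the difference between the two groups is unlikely to be an artifact of the small sample, which is the kind of robustness the resampling is meant to provide.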
Analyses of career advancement show that children's research performance is on average superior to that of their colleagues who did not advance. The study's findings do not rule out the existence of nepotism, which was actually recorded in a low percentage of cases, but neither do they prove its most serious presumed consequence, namely that relatives who are poor performers are getting ahead of non-relatives who are better performers. In light of these results, many attitudes and norms concerning parental ties in academia should be reconsidered. Health care sciences and services research (HCSSR) has come to the fore in recent years, and the related research literature has increased rapidly over the last few decades. The main purpose of this study is to describe the global progress and current trends in HCSSR by using a scientometric approach to survey the related literature in the Web of Science database from 1900 to 2012. The document types, languages, publication patterns, subject categories, journals, geographic and institutional distributions, top cited articles, and the distribution of keywords were thoroughly examined. The results show that HCSSR has increased rapidly over the past 20 years, most notably in the last decade. In total, there are currently 128,728 research articles in 156 journals listed in 39 WoS subject categories. The top 20 most productive countries and institutions were analyzed in detail, and 11 frequently cited papers and research foci were identified based on citation analysis. HCSSR spans many disciplines and focuses mainly on public, environmental & occupational health and education & educational research. Medical Care, Academic Medicine, Health Affairs and the Journal of School Health are the core journals with both high quantity and quality. High-income countries are the leading nations, especially the G7 countries. Meanwhile, "emerging economies" are also increasingly engaging with this field.
American and Canadian institutions have made greater advances in production, citations, and cooperation, with stronger and better development prospects overall. The hot topics include internet use and decision making in health care, palliative care and end-of-life research, health status and quality of life, quality of health care and patient satisfaction, medical education, and health communication. Also, most researchers tend to study health care sciences through the lens of quality-of-life assessment, and their interest in quality-of-life measures has increased. Increasing attention has been paid to developing countries, especially "emerging economies" like China. Although health research has made much progress, many questions remain unanswered and there are few assessments of how well research systems carry out their essential functions. Hence, there is currently an urgent need for the timely establishment of an effective health research system. In a bid for an eye-catching title, many writers use devices such as question and exclamation marks, metaphors, double meanings, and vague expressions that do not comply with the accepted standards of scientific style manuals. The purpose of this article is to analyse the lack of accuracy of titles in articles on bibliometrics published in biomedical journals and to discuss the effect this may have on the reader. A corpus of 1,505 titles included in PubMed and Web of Science between 2009 and 2011 and retrieved under the MeSH major topic "bibliometrics" and other related terms was analyzed. Different types of inaccuracy were identified, and a classification was developed and used for this particular study. 23.4 % of the titles contain inaccuracies of some kind. 
Editorial titles show a higher percentage of these (11.43 %) than original articles (8.83 %) and letters (3.2 %), the most frequent inaccuracy being the inclusion of a question in the title (seen in 30.9 % of the papers), followed by vague and imprecise expressions (17.8 %), acronyms (16.4 %), and double meanings (14 %). Many titles fail to comply with the conventions of scientific writing. A descriptive title accurately reflecting the content of an article would give readers a better idea of its content, help them decide more rapidly whether they want to read it, and facilitate retrieval from bibliographic databases. This study investigates whether academics can capitalize on their external prominence (measured by the number of pages indexed on Google, TED talk invitations, or New York Times bestselling book successes) and internal success within academia (measured by publication and citation performance) in the speakers' market. The results indicate that the larger the number of web pages indexing a particular scholar, the higher the minimum speaking fee. Invitations to speak at a TED event or making the New York Times Best Seller list are also positively correlated with speaking fees. Scholars with a stronger internal impact or success also achieve higher speaking fees. However, once external impact is controlled for, most metrics used to measure internal impact are no longer statistically significant. The main objective of this study was to analyze research productivity originating from Middle East Arab (MEA) countries in the field of diabetes mellitus (DM). Data from January 1, 1996 to December 31, 2012 were searched for documents with specific diabetes-related words as a "source title" and a list of 13 MEA countries as the affiliation country. Research productivity was evaluated based on the number of publications, citation analysis, indexing in the Institute for Scientific Information, and impact factor (IF). The 13 MEA countries published a total of 479 documents in 41 diabetes journals. 
This number represents 0.75 % of the total documents produced globally in the field of DM. The number of published documents increased around fivefold from the early 2000s to 2012. Of the 41 journal titles retrieved, 24 (58.5 %) had their IF listed in the 2012 journal citation reports. Forty-two documents (14.5 %) were published in journals that had no official IF. The total number of citations for documents published from MEA countries in the field of DM, at the time of data analysis, was 5,565, with an h-index of 35. The median (inter-quartile range) citation count for documents from the 13 MEA countries was 4 (1-11). The most productive institution in the field of DM was United Arab Emirates University, with 51 documents (10.6 %). Authors from MEA countries collaborated mostly with authors in countries such as the United Kingdom, the USA, and Germany. The present data show promising and relatively good diabetes research productivity in MEA countries, especially after 2008. Citation analysis has become an essential tool for evaluating the research and academic effectiveness of universities. However, authorship identity has long been difficult to resolve in bibliometric analyses for many scientific fields, where the performance of algorithms against human judgment is far from universal. Now, with the growing number of authors with compound names (mainly Latino researchers and those from Portuguese-speaking countries) in scientific publications, clustering methods continue to lose performance because they entirely ignore the context and order of each author's names (first name(s) and last name(s)) in the publication (authorship identity). These kinds of mistakes affect the visibility of publications, decreasing the likelihood of finding a given article by a specific author and generating incorrect citations in online systems. This has led to unsuitable registration and unsuitable grouping of author names ("ambiguous authorship identity") for each scientific publication. 
This process requires more work, time, attention, and accountability on the part of authors, reviewers, journal editors, and providers of bibliographic databases. These errors can be corrected by cross-referencing with each full original article, using manual checks, and by not ignoring the naming issue when drafting and/or reviewing a manuscript. This paper seeks to raise awareness of how author names are written, highlighting the way in which authors and co-authors are cited and self-cited in publications. This study examines the research performance and international research collaborations (IRC) of ASEAN nations in the area of economics. Over the last 3 decades, internationally collaborated papers have increased in the region, while locally co-authored papers have declined. Singapore towered above other ASEAN nations in research efficiency based on geographical area, population, and GDP. Vietnam performed relatively better in research efficiency than in research productivity (number of papers produced), while Indonesia performed poorly. Overall, internationally co-authored papers were cited twice as often as locally authored papers, except that both the Philippines and Indonesia exhibited almost no difference in how their local and internationally co-authored papers were cited. The study also examined IRC from the perspective of social networks. Centrality had a strong correlation with research performance; however, vertex tie-strength (a result of repeat collaboration) showed the strongest correlation with research performance. While Malaysia emerged as the nation with the highest betweenness centrality, or 'bridging' power, the US emerged as the most favoured international partner of ASEAN nations. However, collaboration between ASEAN countries accounted for just 4 % of all international collaborations. 
Increased academic mobility and more joint scientific work are suggestions to consider for boosting educational co-operation among the ASEAN nations. Here we examine the evolution of journal sharing between scientific subject categories using evolutionary game theory. We assume that there is journal sharing between subject categories if they share common scholarly journals. In this paper, the Prisoner's Dilemma (within evolutionary game theory) is used as a metaphor for the problems surrounding the evolution of journal sharing between scientific subject categories. Using evolutionary games, we show that connections between categories (that share common journals) can enable journal sharing to persist indefinitely on stationary configurations. The conclusion is that journal sharing between subject categories is an evolutionary advantage. Using a set of experiments, we have explored the asymptotic behaviour of this system for various values of the model's parameters, and the results appear robust. Subject categories are described in terms of graphs, such that categories occupy the vertices. Sharing categories are connected through the edges of those graphs. The combination of evolutionary game theory and graph theory provides the flexibility for carrying out more realistic simulations. The science of polymer solar cells and the technology based on it are now pursued as a very exciting and promising area of research at leading universities, national laboratories, and companies throughout the world. In this paper, we conduct a comprehensive and in-depth bibliometric analysis of this area that breaks down scholarly performance into three components: quantity, quality, and consistency. The citation data are retrieved from the Web of Science. Using these criteria, we identify the most productive organisations, countries, and authors, and also the most influential journals in which this newly emerging area is published. 
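The evolutionary-game mechanism described in the journal-sharing study above can be illustrated with a minimal sketch. The payoff values, the example graph, and the update rule (imitate the highest-scoring neighbour) are illustrative assumptions for this sketch, not the authors' actual model:

```python
# Minimal Prisoner's Dilemma on a graph: vertices are subject categories,
# "C" means cooperating (sharing journals), "D" means defecting.
# Payoffs to the row player, with the standard ordering T > R > P > S.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def payoffs(strategies, edges):
    """Total payoff of each vertex against all of its neighbours."""
    score = {v: 0 for v in strategies}
    for u, v in edges:
        score[u] += PAYOFF[(strategies[u], strategies[v])]
        score[v] += PAYOFF[(strategies[v], strategies[u])]
    return score

def imitate_best(strategies, edges):
    """Each vertex adopts the strategy of the highest-scoring vertex in its
    closed neighbourhood (itself plus its neighbours) -- a common update
    rule in evolutionary games on graphs."""
    score = payoffs(strategies, edges)
    nbrs = {v: [v] for v in strategies}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return {v: strategies[max(nbrs[v], key=score.get)] for v in strategies}

# Four tightly connected cooperating categories plus one defector attached
# to the cluster: the cooperative cluster outscores the defector.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
strategies = {0: "C", 1: "C", 2: "C", 3: "C", 4: "D"}
for _ in range(3):
    strategies = imitate_best(strategies, edges)
print(strategies)
```

On this configuration the clustered cooperators each earn more than the lone defector, so sharing persists (and here even converts the defector); with cooperators spread thinly among defectors, the same rule can drive sharing out, which is why the connection structure matters.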
While peer review has been, and continues to be, an accepted approach for evaluation, science and technology (S&T) metrics have been demonstrated to be more accurate and objectively independent tools for evaluation. This article provides insights from an example of a relevant use of S&T metrics to assess a national research policy and, subsequently, universities' achievements within this policy. One of the main findings was that merely setting S&T metrics as objective indicators increased research outputs: productivity, impact, and collaboration. However, overall productivity is still far too low when related to academic staff size, and a huge difference exists among universities' achievements. The reliability of scientometric evaluation as a performance tool is increasing in universities, and the culture of this evaluation's usefulness in research policy has spread widely. Surprisingly, this evaluation shows that even though S&T metrics have substantially increased, fund execution, measured as the rate of payment against the total budget, was less than 15 %, due mainly to the unusually large increase in funding allocations compared with before the policy, a development for which universities were managerially not well prepared. Finally, future evaluations should follow in the very short term to quantify the extent of the policy's impact revealed in this annual evaluation. Recently, national governments have tried to improve their technology ecosystems by formulating research and development (R&D) policies and investing in R&D programs. For strategically designed national R&D plans, analytic approaches that identify and assess the impact of each technology from short-term and long-term perspectives are necessary. 
Further, from a methodological perspective, such approaches should be able to synthetically consider the most recent technological information, the direct and hidden impacts among technologies, and the relative impacts of the focal technology within globally linked technological relationships from an overall perspective. However, most previous studies based on patent citation networks are insufficient for these requirements. As a remedy, we present a combined approach for constructing a technology impact network and identifying the impact and intermediating capability of technology areas from the perspective of a national technology system. To construct and analyze the technology impact network, our method integrates three network techniques: patent co-classification (PCA), decision making trial and evaluation laboratory (DEMATEL), and social network analysis (SNA). The advantages of the proposed method are threefold. First, it identifies directed technological knowledge flows from the most recent patents by employing PCA. Second, the proposed network contains both the direct and indirect impacts among different technology areas, obtained by applying the DEMATEL method. Third, using SNA, the method can analyze the characteristics of the technologies in terms of their comprehensive impacts and potential brokerage capabilities. The method is illustrated using all of the recent Korean patents (58,279) in the United States patent database from 2008 to 2012. We expect that our method can provide input to decision makers for effective R&D planning. People who are collaborating can share files in two main ways: performing Group Information Management (GIM) using a common repository, or performing Personal Information Management (PIM) by distributing files as e-mail attachments and storing them in personal repositories. There is a trend toward using common repositories, with many organizations encouraging workers to use GIM to avoid duplication of files and management effort. 
So far, PIM and GIM have been studied by different research communities, so their effectiveness for file retrieval has not yet been systematically compared. We compared PIM and GIM in a large-scale elicited personal information retrieval study. We asked 275 users to retrieve 860 of their own shared files, testing the effect of the sharing method on the success and efficiency of retrieval. Participants preferred PIM over GIM. More important, PIM retrieval was more successful: Participants using GIM failed to find 22% of their files compared with 13% failures using PIM. This may be because active organization aids retrieval: When using personally created folders, the failure percentage was 65% lower than when using default folders (e.g., My Documents), and more than 5 times lower than when using folders created by others for GIM. Theoretical reasons for this are discussed. An important aspect of performing text categorization is selecting appropriate supervised classification and feature selection methods. A comprehensive benchmark is needed to inform best practices in this broad application field. Previous benchmarks have evaluated performance for only a few supervised classification and feature selection methods and limited ways of optimizing them. The present work updates prior benchmarks by increasing the number of classifiers and feature selection methods by an order of magnitude, including recently developed, state-of-the-art methods. Specifically, this study used 229 text categorization data sets/tasks and evaluated 28 classification methods (both well-established and proprietary/commercial) and 19 feature selection methods according to 4 classification performance metrics. We report several key findings that will be helpful in establishing best methodological practices for text categorization. Online content providers, such as news portals and social media platforms, constantly seek new ways to attract large shares of online attention by keeping their users engaged. 
A common challenge is to identify which aspects of online interaction influence user engagement the most. In this article, through an analysis of a news article collection obtained from Yahoo News US, we demonstrate that news articles exhibit considerable variation in the sentimentality and polarity of their content, depending on factors such as news provider and genre. Moreover, through a laboratory study, we observe the effect of the sentimentality and polarity of news and comments on a set of subjective and objective measures of engagement. In particular, we show that attention, affect, and gaze differ across news of varying interestingness. As part of our study, we also explore methods that exploit the sentiments expressed in user comments to reorder the lists of comments displayed in news pages. Our results indicate that user engagement can be predicted if we account for the sentimentality and polarity of the content as well as other factors that drive attention and inspire human curiosity. This article seeks to make visible the information-sharing activities that take place within a geographically dispersed network of design researchers. For this purpose, a theoretical approach is applied that comprises the analytical notion of material objects and a document theory. The empirical material was produced primarily ethnographically over a period of 6 months, including 2 seminars within the network. Trajectories of sharing that reach across time and space have been identified by studying how people interact with multidimensional objects, such as documents. These were found to coordinate and shape the social practice under study. The theoretical framework has made it possible to highlight aspects of information sharing that have tended to be black-boxed in previous research. It has been suggested in previous research that the concept of information sharing can be reduced to that of mere sharing. 
Such a stance potentially reduces conceptual ambiguity but may also decrease analytical sharpness. Based on the present study, it appears beneficial to adopt the concept of the document into the discourse of information-sharing research. By adding the concept of the document to our analytical toolbox, which has hitherto been dominated by the slightly diffuse concept of information, material features can be emphasized without reducing the social and cognitive dimensions of information sharing. The article offers insight into the information-sharing activities of design researchers. Through its focus on materiality, it presents a novel theoretical approach and methodological strategy for studying information practices. Organizing and structuring online information is becoming a mainstream activity within large organizations as increasing volumes of information are made available via the web. General methodologies, best practices, and guidelines for web information architecture (IA) have been developed and refined. This research paper extends the knowledge base for web IA by examining situated practice within large organizations and building theory to provide a deeper understanding of how large organizations construct information-rich websites. A grounded theory, The Situated Practice of Web IA in Large Organizations, is proposed as an integrated theoretical framework for practice in this context. The theoretical framework is composed of 4 foundational constructs: owning, negotiating, enacting, and knowing web IA. Building on these foundations, an integrating central construct of practicing web IA is proposed. This theoretical framework will inform large organizations and practitioners as they approach web IA. Social media have transformed social interactions and now look set to transform workplace communications. In this exploratory study, we investigate how employees use and derive value from a variety of social networking technologies. 
The context of this research is 4 software firms located in China. Notwithstanding differences in corporate attitudes toward social networking, we identify common themes in the way Web 2.0 technologies are leveraged as value is created by employees at all levels. We draw on the communication ecology framework to analyze the application of the various technologies. We inductively develop 5 propositions that describe how social networking technologies contribute directly to horizontal and vertical communication in organizations, and ultimately to individual, team, and organizational performance. Implications for research and practice are discussed. Data fusion is currently used extensively in information retrieval for various tasks. It has proved to be a useful technology because it frequently improves retrieval performance. However, almost all prior research on data fusion has used static search environments, and dynamic search environments have generally not been considered. In this article, we investigate adaptive data fusion methods that can change their behavior when the search environment changes. Three adaptive data fusion methods are proposed and investigated. To test the proposed methods properly, we generate a benchmark from a historic Text REtrieval Conference data set. Experiments with the benchmark show that 2 of the proposed methods perform well and may potentially be used in practice. Recently, an increasing number of information retrieval studies have triggered a resurgence of interest in redefining the algorithmic estimation of relevance, which implies a shift from topical to multidimensional relevance assessment. A key underlying aspect that emerged when addressing this concept is the aggregation of the relevance assessments related to each of the considered dimensions. The most commonly adopted forms of aggregation are based on classical weighted means and linear combination schemes. 
Although some initiatives have recently been proposed, none has considered the inherent dependencies and interactions existing among the relevance criteria, as is the case in many real-life applications. In this article, we present a new fuzzy-based operator, called iAggregator, for multidimensional relevance aggregation. Its main originality, beyond its ability to model interactions between different relevance criteria, lies in its generalization of many classical aggregation functions. To validate our proposal, we apply our operator within a tweet search task. Experiments using a standard benchmark, namely, the Text REtrieval Conference Microblog track, emphasize the relevance of our contribution when compared with traditional aggregation schemes. In addition, it outperforms state-of-the-art aggregation operators, such as the Scoring and the And prioritized operators, as well as some representative learning-to-rank algorithms. Cocitation and co-word methods have long been used to detect and track emerging topics in scientific literature, but both have weaknesses. Recently, while many researchers have adopted generative probabilistic models for topic detection and tracking, few have compared generative probabilistic models with traditional cocitation and co-word methods in terms of their overall performance. In this article, we compare the performance of the hierarchical Dirichlet process (HDP), a promising generative probabilistic model, with that of the 2 traditional topic detection and tracking methods: cocitation analysis and co-word analysis. We visualize and explore the relationships between topics identified by the 3 methods in hierarchical edge bundling graphs and time flow graphs. Our results show that HDP is more sensitive and reliable than the other 2 methods in both detecting and tracking emerging topics. Furthermore, we demonstrate the important topics and topic evolution trends in the literature of terrorism research with the HDP method. 
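Traditional co-word analysis, used as a baseline in the HDP comparison above, starts from raw keyword co-occurrence counts. A minimal sketch of that counting step, with invented keyword sets for illustration (a full analysis would then normalize the counts and cluster the resulting keyword network):

```python
from collections import Counter
from itertools import combinations

def coword_counts(documents):
    """Count how often each pair of keywords co-occurs in a document.
    These raw counts are the starting point of co-word analysis."""
    counts = Counter()
    for keywords in documents:
        # Sort so each unordered pair is counted under one canonical key.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts

# Toy document keyword sets (invented for illustration).
docs = [
    {"terrorism", "social network", "radicalization"},
    {"terrorism", "social network"},
    {"radicalization", "topic model"},
]
counts = coword_counts(docs)
print(counts[("social network", "terrorism")])  # 2
```

Generative models such as HDP replace these fixed pairwise counts with inferred topic distributions, which is one reason they can track gradual topic drift that count-based methods miss.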
For the last decade or so, sentiment analysis, which aims to automatically identify opinions, polarities, or emotions from user-generated content (e.g., blogs, tweets), has attracted interest from both academic and industrial communities. Most sentiment analysis strategies fall into 2 categories: lexicon-based and corpus-based approaches. While the latter often requires sentiment-labeled data to build a machine learning model, both approaches need sentiment-labeled data for evaluation. Unfortunately, most data domains lack sufficient quantities of labeled data, especially at the subdocument level. Semisupervised learning (SSL), a machine learning technique that requires only a few labeled examples and can automatically label unlabeled data, is a promising strategy to deal with the issue of insufficient labeled data. Although previous studies have shown promising results from applying various SSL algorithms to sentiment-analysis problems, co-training, an SSL algorithm, has not attracted much attention for sentiment analysis, largely due to its restrictive assumptions. Therefore, this study revisits co-training in depth and discusses several co-training strategies for sentiment analysis that follow a looser assumption. Results suggest that co-training can be more effective than other currently adopted SSL methods for sentiment analysis. International coauthorship relations have increasingly shaped an additional dynamic in the natural and life sciences during recent decades. However, much less is known about such internationalization in the social sciences. In this study, we analyze the international and domestic coauthorship relations of all citable items in the DVD version of the Social Sciences Citation Index 2011 (SSCI). 
Network statistics indicate 4 groups of nations: (a) an Asian-Pacific one, to which all Anglo-Saxon nations (including the United Kingdom and Ireland) are attributed; (b) a continental European one, also including the Latin American countries; (c) the Scandinavian nations; and (d) a community of African nations. Within the EU-28, 11 of the EU-15 states have dominant positions. In many respects, the network parameters are not so different from those of the Science Citation Index. In addition to these descriptive statistics, we address the question of the relative weights of the international versus domestic networks. An information-theoretical test is proposed at the level of organizational addresses within each nation; the results are mixed, but the international dimension is more important than the national one in the aggregated sets (as in the Science Citation Index). In some countries (e.g., France), however, the national distribution is more dominant than the international one. Decomposition of the United States in terms of states shows a similarly mixed result; more U.S. states are domestically oriented in the SSCI and more are internationally oriented in the SCI. The international networks have grown during the last decades in addition to the national ones, but not by replacing them. How do social ties in online worlds evolve over time? This research examined the dynamic processes of relationship formation, maintenance, and demise in a massively multiplayer online game. Drawing from evolutionary and ecological theories of social networks, this study focuses on the impact of three sets of evolutionary factors in the context of social relationships in the online game EverQuest II (EQII): the aging and maturation processes, the social architecture of the game, and homophily and proximity. A longitudinal analysis of tie persistence and decay demonstrated the transient nature of social relationships in EQII, but ties became considerably more durable over time. 
Also, character-level similarity, shared guild membership, and geographic proximity were powerful mechanisms in preserving social relationships. We show how automatically extracted citations in historical corpora can be used to measure the direct and indirect influence of authors on each other. These measures can in turn be used to determine an author's overall prominence in the corpus and to identify distinct schools of thought. We apply our methods to two major historical corpora. Using scholarly consensus as a gold standard, we demonstrate empirically the superiority of indirect influence over direct influence as a basis for various measures of authorial impact. This paper describes high-quality journals in Brazil and Spain, with an emphasis on the distribution models used. It presents their general characteristics (age, type of publisher, and theme) and analyzes the distribution model by studying the type of format (print or digital), the type of access (open access or subscription), and the technology platform used. The 549 journals analyzed (249 in Brazil and 300 in Spain) are included in the 2011 Web of Science (WoS) and Scopus databases. Data on each journal were collected directly from their websites between March and October 2012. Brazil has an almost fully open access distribution model (97%) in which few journals require payment by authors, thanks to the cultural, financial, operational, and technological support provided by public agencies. In Spain, open access journals account for 55% of the total and have also received support from public agencies, although to a lesser extent. These results show that there are systems of support for open access in scientific journals other than the author-pays model advocated by the Finch report for the United Kingdom. It is shown that a normalized version of the g-index is a good normalized impact and concentration measure. A proposal for such a measure by Bartolucci is improved. 
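For reference, the base indicator behind the normalized measure discussed above is Egghe's g-index: the largest rank g such that the g most-cited papers jointly received at least g squared citations. The specific normalization proposed in the paper is not reproduced here; this sketch shows only the unnormalized index, capped at the number of papers (one common convention):

```python
def g_index(citations):
    """Egghe's g-index: the largest g such that the g most-cited papers
    together received at least g**2 citations (capped here at the
    number of papers)."""
    ranked = sorted(citations, reverse=True)
    cumulative, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        cumulative += cites
        if cumulative >= rank * rank:
            g = rank
    return g

print(g_index([10, 8, 5, 4, 3]))  # 5
print(g_index([1, 1, 1, 0]))      # 1
```

One simple normalization divides g by the number of papers to compare units of different sizes, though the abstract does not specify which normalization the paper adopts.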
The analysis of the high end of citation distributions, represented by the tail, provides important supplementary information on the citation profile of the unit under study. In a previous study by Glanzel (Scientometrics 97: 13-23, 2013a), a parameter-free solution providing four performance classes was proposed. Unlike methods based on pre-set percentiles, this method is not sensitive to ties and ensures seamless integration of measures of outstanding and even extreme performance into the standard tools of scientometric performance assessment. The applicability of the proposed method is demonstrated for both subject analysis and the combination of different subjects at the macro and meso levels. Co-authorship has become common practice in most science and engineering disciplines and, with the growth of co-authoring, has come a fragmentation of norms and practices, some of them discipline-based, some institution-based. It becomes increasingly important to understand these practices, in part to reduce the likelihood of misunderstanding in collaborations among authors from different disciplines and fields. Moreover, while there is evidence of widespread satisfaction with collaborative and co-authoring experiences, dissatisfactions also occur. In some cases the dissatisfactions are more in the realm of bruised feelings and miscommunication, but in others there is clear exploitation and even legal disputes about, for example, intellectual property. Our paper is part of a multiyear study funded by the U.S. National Science Foundation (NSF) and draws its data from a representative national survey of scientists working in 108 Carnegie Doctoral/Research Universities-Very High Research Activity (n = 641). The paper tests hypotheses about the determinants of collaboration effectiveness. Results indicate that having an explicit discussion about co-authorship reduces the odds of a bad collaboration on a recent scholarly article. 
Having co-authors from different universities also reduces the odds of a bad collaboration, while large numbers of co-authors have the reverse effect. The results shed some systematic, empirical light on research collaboration practices, including not only norms and business-as-usual but also routinely bad collaborations. The goal of this paper is to introduce the citer-success-index (cs-index), i.e., an indicator that uses the number of different citers as a proxy for the impact of a generic set of papers. For each of the articles of interest, a comparison term is defined (representing the number of citers that an article published in a certain period and scientific field is, on average, expected to "infect") and compared with the actual number of citers of the article. Similarly to the recently proposed success-index (Franceschini et al., Scientometrics 92(3): 621-6415, 2011), the cs-index allows the selection of a subset of "elite" papers. The cs-index is analyzed from a conceptual and empirical perspective. Special attention is devoted to the study of the link between the number of citers and cited authors for articles from different fields, and to the possible correlation between the cs-index and the success-index. Some advantages of the cs-index are that (i) it can be applied to multidisciplinary groups of papers, thanks to the field normalization that it achieves at the level of the individual paper, and (ii) it is not significantly affected by self-citers and recurrent citers. Its main drawback is its computational complexity. Author co-citation analysis (ACA) is an important method for discovering the intellectual structure of a given scientific field. There is sufficient experience that ACA will work with almost any user data that lends itself to co-occurrence analysis, although most current research still relies on data from the scientific literature. 
In this study, in order to provide useful information for better enterprise management, the ideas and methods of ACA are applied to analyze the intensity and content of information interaction among enterprise web users. First, the development of ACA is briefly introduced. Then the sample data and methods used in this study are described. The instant messages of three QQ groups of a Chinese company were selected as the raw data, and the concept and model of user interaction intensity (UII) were proposed with reference to ACA theory. Social network analysis, combined with in-depth interviews, was used to analyze the intensity and content of information interaction among enterprise users. In practice, Excel, UCINET, Pajek, NetDraw and VOSviewer were combined to carry out the quantitative and visual analysis. The study concludes that the UII model is reasonably sound and can effectively measure the intensity and content of information interaction among enterprise web users. In order to evaluate approaches for identifying Science Citation Index (SCI)-covered publications within non-patent references (NPRs), the author employs a computer science method that uses two key indicators, recall and precision, to evaluate the relevance of information retrieval systems. There are two primary reasons that this method is adequate: in contrast to the retrievability ratios used previously, first, this method can evaluate two dimensions of matching accuracy, and second, the results of its evaluation are independent of the intermediate outcome. The author then proposes an approach for identifying SCI publications within NPRs that consists of five steps: (1) data collection, (2) creation of supervised and test data, (3) selection and execution of matching algorithms, (4) evaluation of algorithms and optimization of their combinations, and (5) evaluation of optimized combinations. 
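The precision/recall evaluation used in step (4) above can be sketched as follows. This is a minimal illustration with invented identifiers; the original study's matching algorithms and gold-standard data are of course far larger.

```python
# Minimal sketch of precision/recall evaluation: the NPR strings an algorithm
# matched to SCI records are compared against a hand-verified ("supervised")
# gold standard. Sample identifiers are invented for illustration.

def precision_recall(predicted, relevant):
    predicted, relevant = set(predicted), set(relevant)
    true_pos = len(predicted & relevant)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall

matched = {"npr-1", "npr-2", "npr-3", "npr-5"}  # algorithm output
gold = {"npr-1", "npr-2", "npr-3", "npr-4"}     # verified SCI matches
p, r = precision_recall(matched, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

The two indicators capture the two dimensions of matching accuracy the abstract refers to: precision penalizes spurious matches, recall penalizes missed ones.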
A comparison of the proposed and conventional approaches showed that the proposed approach works well, with results far better (99 % precision and 95 % recall) than the targets implicitly set in previous studies. The author also applied the approach to comprehensive NPR data in U.S. utility patents registered between 1992 and 2012 and checked its performance. Results showed that the approach could identify SCI publications from within millions of NPRs in an acceptable time (i.e., within a couple of weeks) and that it performs as expected from the evaluation in step (5). On the basis of these results, the proposed approach is considered valuable for studies on the relations and/or interactions between science publications and patents. Counts of hyperlinks between websites can be unreliable for webometrics studies, so researchers have attempted to find alternative counting methods or have tried to identify the reasons why links in websites are created. Manual classification of individual links in websites is infeasible for large webometrics studies, so a more efficient approach to identifying the reasons for link creation is needed to fully harness the potential of hyperlinks for webometrics research. This paper describes a machine learning method to automatically classify hyperlink source and target page types in university websites. 78 % accuracy was achieved for automatically classifying web page types, and up to 74 % accuracy for predicting link target page types from link source page characteristics. This paper investigates disciplinary differences in how researchers use the microblogging site Twitter. Tweets from selected researchers in ten disciplines (astrophysics, biochemistry, digital humanities, economics, history of science, cheminformatics, cognitive science, drug discovery, social network analysis, and sociology) were collected and analyzed both statistically and qualitatively. 
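The automatic page-type classification described above can be illustrated with a deliberately tiny word-overlap classifier. This is only a hedged sketch: the original study used a trained machine-learning model with richer features, and the page categories and example texts below are invented.

```python
# Toy sketch of web page type classification by word overlap with labelled
# example pages. The real method was a supervised machine-learning classifier;
# the labels and page texts here are invented for illustration.

def classify(page, examples):
    """Assign the label of the example page sharing the most words."""
    words = set(page.split())
    return max(examples, key=lambda label: len(words & set(examples[label].split())))

examples = {
    "teaching": "course syllabus lecture notes exam timetable",
    "research": "research project publications grant laboratory",
    "personal": "staff profile contact office hours biography",
}
print(classify("lecture notes and exam dates", examples))  # teaching
```

Even this crude scheme shows the shape of the task: source page text is mapped to a page type, and a second model can then predict the likely type of the link's target.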
The researchers tended to share more links and retweet more than the average Twitter users examined in earlier research, and there were clear disciplinary differences in how they used Twitter. Biochemists retweeted substantially more than researchers in the other disciplines. Researchers in digital humanities and cognitive science used Twitter more for conversations, while researchers in economics shared the most links. Finally, whilst researchers in biochemistry, astrophysics, cheminformatics and digital humanities seemed to use Twitter for scholarly communication, scientific use of Twitter in economics, sociology and history of science appeared to be marginal. Download indicators are of major potential interest because the great quantity of readily available download data means that any statistical inferences drawn from them will be of robust significance. We study the relationship between citations and downloads at the journal and paper levels, and the influence of language on that relationship. The data used were taken from the Scopus (citations) and ScienceDirect (downloads) databases. The results showed that downloads have limited utility as predictors of citation, since it is in the early years that any correlations have the least significance. The relationship between downloads and citations also differs from one discipline to another. The relationship at the paper level is considerably weaker than at the journal level, which may indicate that the number of downloads depends largely on the diffusion of the journal. In francophone regions, downloading from journals is proportionately lower than citation of those same journals. A portion of the citations to non-English-language journals appears to be invisible to Scopus. This makes the number of downloads proportionately greater than that of citations, leading to a lack of correlation between downloads and citations for that class of journal. 
Increasing pressure on the budgets of funding bodies has led to discussion of how to make financial resources go further, and to the concern that some researchers take more money from funding bodies for a particular project than needed, a practice that has been termed "double-dipping". Some evidence has emerged that this might be occurring, and in this context of suddenly increased funding scarcity, albeit in a system with more diverse forms of support, a proposal has been made that funding bodies monitor and manage individual researcher portfolios to optimize resource use. Our paper provides evidence relevant to both the "double-dipping" issue and the proposal to manage portfolios. We show that where certain pre-conditions for "double-dipping" are met (i.e. when funding comes from more than one organisation, and the organisations fund research in a very similar area), and where therefore an argument to monitor researcher portfolios might be applicable, the research produced under these conditions has greater citation impact. We query the claim that acknowledging more funding is inherently undesirable, and we express our doubts that subjecting the allocation of funding to researchers to a bureaucratic management process will necessarily increase the impact of research. This study puts an emphasis on the disciplinary differences observed in the behaviour of citations and downloads. This was exemplified by studying citations over the last 10 years in four selected fields, namely arts and humanities; computer science; economics, econometrics and finance; and oncology. Differences in obsolescence characteristics were studied using synchronic as well as diachronic counts. Furthermore, differences between document types were taken into consideration and correlations between journal impact and journal usage measures were calculated. 
The number of downloads per document has remained almost constant for all four observed areas within the last four years, varying from approximately 180 (oncology) to 300 (economics). The percentage of downloaded documents is higher than 90 % for all areas. The number of citations per document ranges from one (arts and humanities) to three (oncology). The percentages of cited documents range from 40 to 56 %. According to our study, 50-140 downloads correspond to one citation. A differentiation according to document type reveals further download- and citation-specific characteristics for the observed subject areas. This study points to the fact that citations can only measure impact within the 'publish or perish' community; this approach is applicable neither to the whole scientific community nor to society in general. Downloads may not be a perfect proxy for estimating overall usage. Nevertheless, they measure at least the intention to use the downloaded material, which is invaluable information for better understanding publication and communication processes. Usage metrics should consider the unique nature of downloads and ought to reflect their intrinsic differences from citations. In this study, we evaluated future trends of worldwide patenting in nanotechnology and its domains using logistic growth curves, while the patent activity of the main countries, technological domains and subdomains was assessed in four different contexts: worldwide, patents filed in the United States Patent and Trademark Office (USPTO), and patent applications in the triadic (TRIAD) and tetradic (TETRAD) countries. The indicators were developed based on a set of records recovered from the Derwent Innovation Index database. 
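The logistic growth curves used for the patent forecasting above follow the familiar S-shaped model N(t) = K / (1 + e^(-r(t - t0))). The sketch below evaluates that model with invented parameters; the study fitted such parameters to Derwent patent counts.

```python
# Hedged sketch of the logistic (S-shaped) growth model commonly fitted to
# cumulative patent counts: N(t) = K / (1 + exp(-r * (t - t0))).
# Parameter values here are invented for illustration.
import math

def logistic(t, K, r, t0):
    """Cumulative count at year t: carrying capacity K, growth rate r,
    inflection year t0 (the point of fastest growth, where N = K/2)."""
    return K / (1 + math.exp(-r * (t - t0)))

K, r, t0 = 10000, 0.4, 2005
for year in (1995, 2005, 2015):
    print(year, round(logistic(year, K, r, t0)))
```

The difficulty the abstract mentions is choosing the upper limit K: before the curve saturates, very different values of K fit the observed early data almost equally well, which is why the authors compiled alternative future scenarios instead of a single forecast.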
Nanotechnology has recently emerged as a new research field, and the logistic trend behaviors generate interesting discussions since they suggest that technological development in nanotechnology and its domains has reached an initial maturation stage. Future scenarios were compiled because of the difficulty of establishing upper limits for the forecasting curves. Although China's share of patents is small in some cases, it was the only country to constantly increase its number of patents from a worldwide perspective. In contrast, the USA and the EU were the most active in the USPTO, TRIAD and TETRAD cases, followed by Japan and Korea. The technological subdomains of main interest to each country/region changed according to the perspective adopted, even though there was a clear bias towards the semiconductors, surface treatments, electrical components, macromolecular chemistry, materials-metallurgy, pharmacy-cosmetics and analysis-measurement-control subdomains. We conclude that monitoring of nanotechnology advances should be constantly reviewed in order to confirm the evidence observed and forecasted. In our article we compare downloads from ScienceDirect, citations from Scopus and readership data from the social reference management system Mendeley for articles from two information systems journals ("Journal of Strategic Information Systems" and "Information and Management") published between 2002 and 2011. Our study shows a medium to high correlation between downloads and citations (Spearman r = 0.77/0.76) and between downloads and readership data (Spearman r = 0.73/0.66). The correlation between readership data and citations, however, was only medium-sized (Spearman r = 0.51/0.59). These results suggest that there is at least "some" difference between the two usage measures and the citation impact of the analysed information systems articles. As expected, downloads and citations have different obsolescence characteristics. 
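The Spearman rank correlations reported above can be computed with the standard rank-difference formula, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The sketch below uses invented download and citation counts, and for brevity does not handle tied ranks.

```python
# Hedged sketch of Spearman rank correlation between per-article downloads
# and citations. Values are invented; ties are not handled in this minimal
# ranking (real data would need average ranks for ties).

def rank(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

downloads = [120, 300, 45, 980, 210]
citations = [3, 7, 1, 15, 2]
print(round(spearman(downloads, citations), 2))  # 0.9
```

Because Spearman correlation works on ranks rather than raw counts, it is robust to the heavily skewed distributions typical of both downloads and citations, which is presumably why it is the coefficient of choice in these studies.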
While the highest number of downloads is usually recorded in the publication year and immediately afterwards, it takes several years until the citation maximum is reached. Furthermore, there was a renewed increase in downloads in later years, which might be an indication that citations also have some effect on downloads. Productivity is the quintessential indicator of efficiency in any production system. It seems to have become a norm in bibliometrics to define research productivity as the number of publications per researcher, distinguishing it from impact. In this work we operationalize the economic concept of productivity for the specific context of research activity and show the limits of the commonly accepted definition. We then propose a measurable form of research productivity through the indicator "Fractional Scientific Strength" (FSS), in keeping with the microeconomic theory of production. We present the methodology for measuring FSS at various levels of analysis: individual, field, discipline, department, institution, region and nation. Finally, we compare the ranking lists of Italian universities under the two definitions of research productivity. Altmetrics, indices based on social media platforms and tools, have recently emerged as alternative means of measuring scholarly impact. Such indices assume that scholars in fact populate online social environments and interact with scholarly products on the social web. We tested this assumption by examining the use and coverage of social media environments amongst a sample of bibliometricians, examining both their own use of online platforms and the use of their papers on social reference managers. As expected, coverage varied: 82 % of articles published by the sampled bibliometricians were included in Mendeley libraries, while only 28 % were included in CiteULike. Mendeley bookmarking was moderately correlated (.45) with Scopus citation counts. 
We also conducted a survey among the participants of the STI2012 conference. Over half of the respondents asserted that social media tools were affecting their professional lives, although uptake of online tools varied widely. 68 % of those surveyed had LinkedIn accounts, while Academia.edu, Mendeley, and ResearchGate each claimed a fifth of respondents. Nearly half of those responding had Twitter accounts, which they used both personally and professionally. The surveyed bibliometricians had mixed opinions on the potential of altmetrics; 72 % valued download counts, while a third saw potential in tracking articles' influence in blogs, Wikipedia, reference managers, and social media. Altogether, these findings suggest that some online tools are seeing substantial use by bibliometricians, and that they present a potentially valuable source of impact data. Using the Chinese National Knowledge Infrastructure as the data source, this paper retrieved papers about open access (OA). Several Visual Basic for Applications programs were developed to generate the co-word matrix and to compute the E-index values of keywords as well as the density and centrality of thematic clusters. Callon's clustering method was used to generate keyword clusters. Then, the co-word analysis method and strategic diagrams were used to detect the main research themes and to explore the development and status of these themes. Furthermore, an author-theme coupling network was mapped with the help of NetDraw in order to detect the relationship between core authors and OA research themes, as well as the core authors' influence on these themes. On this basis, some conclusions are drawn. This micro-level study explores the extent to which citation analysis provides an accurate and representative assessment of the use and impact of bioinformatics e-research infrastructure. 
The bioinformatics e-research infrastructure studied offers common tools used by life scientists to analyse and interpret genetic and protein sequence information. These e-resources therefore provide an interesting example with which to explore how representative citations are as acknowledgements of knowledge use in the life sciences. The examples presented here suggest that there is a relation between the number of visits to these databases and the number of citations; however, a parallel finding shows how citation analysis frequently underestimates acknowledged use of the resources offered by this e-research infrastructure. The paper discusses the implications of the findings for various aspects of impact measurement and also considers how appropriate citation analysis is as a measure of knowledge claims. Publications that are not indexed by citation indices such as Web of Science (WoS) or Scopus are called "non-source items". These have so far been neglected by most bibliometric analyses. The central issue of this study is to investigate the characteristics of non-source items and the effect of their inclusion in bibliometric evaluations in the social sciences, specifically for German political science publications. The results of this study show that non-source items significantly increase the number of publications (+1,350 %) and, to a lesser extent, the number of citations from SCIE, SSCI, and A&HCI (+150 %) for the evaluated political scientists. 42 % of non-source items are published as book chapters. Edited books and books are cited the most among non-source items. About 40 % of non-source items are in English, while 80 % of source items are in English. The citation rates of researchers are lower when non-source items are taken into account than when based on source items alone, partially as a result of the limited coverage of WoS. In contrast, the H-indices of researchers based only on non-source items are higher than those based on source items. 
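The H-index comparison above rests on a simple definition: a researcher has index h when h of their publications each have at least h citations. A minimal sketch, with invented citation counts:

```python
# Minimal sketch of the h-index: the largest h such that h publications
# each have at least h citations. Citation counts below are invented.

def h_index(citations):
    cited = sorted(citations, reverse=True)
    h = 0
    while h < len(cited) and cited[h] >= h + 1:
        h += 1
    return h

print(h_index([10, 8, 5, 4, 3, 0]))  # 4
```

Because the h-index depends on the number of moderately cited items rather than total citations, a large set of non-source items with modest citation counts can raise it even while the average citation rate falls, which is consistent with the contrast reported above.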
In short, the results of this study show that non-source items should be included in bibliometric evaluations, regardless of their impact or the citations they attract. This demonstrates the demand for more comprehensive coverage of bibliometric databases in the social sciences to enable higher-quality evaluations. The purpose of this article is to map the evolving patterns of patent assignees' collaboration networks and to build a latent collaboration index (LCI) model for evaluating the collaboration probability among assignees. The demonstration was carried out in the field of industrial biotechnology (IB) for the period 2000 to 2010. The results show that the number of assignees in the field of IB grew steadily, while the number of patents decreased slowly year by year after reaching a peak in 2002 and 2003. Densification and growth analysis, together with average degree, density and component analysis, showed that the collaboration networks tended to densify. In particular, the diameter analysis indicated that the IB field had entered a mature mode after completing the topological transition that occurred around 2002 or 2003. The node degrees k followed a power-law distribution, which implies a preferential-attachment feature of network evolution and thus provides a foundation for link prediction from the perspective of network evolution. Based on this, two network-related factors were brought into the LCI model: degree and network distance, whose values contribute positively and negatively to link prediction, respectively. In addition, the types of assignees, geographical distances and topic similarities were also added to the LCI model. Different types of assignees also had different probabilities of being linked: for example, corporations collaborated more frequently, while universities ranked lowest in collaborations. Assignees from the same country seemed more likely to collaborate with each other. 
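A power-law degree distribution P(k) ~ k^(-a), as observed above, is conventionally checked by the straight-line behaviour of log P(k) against log k. The sketch below estimates the exponent with a least-squares fit on log-log values; the degree counts are invented, and the original study's actual fitting procedure is not specified in the abstract.

```python
# Hedged sketch of estimating a power-law exponent from a degree distribution
# via least squares on log-transformed values. Counts are invented to follow
# roughly count ~ k^-2.
import math

def loglog_slope(degrees, counts):
    xs = [math.log(k) for k in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

degrees = [1, 2, 4, 8, 16]
counts = [1000, 250, 62, 16, 4]
print(round(loglog_slope(degrees, counts), 2))  # approximately -2
```

A slope of about -a on the log-log plot is what licenses the preferential-attachment interpretation: high-degree assignees attract new collaboration links disproportionately, which is exactly the property the degree factor in the LCI model exploits.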
It has to be noted that the LCI model is flexible: its factors and their weights can be adjusted according to different subjects, time periods or data. For instance, the topic similarities between assignees would be removed from the LCI model for link prediction in the field of IB because topic similarity proved a poor predictor of collaboration. Indeed, many promising pairs of assignees that seemed to have the potential to collaborate according to one or more of these factors never did so. One possible reason might be that collaboration is not a popular behaviour among assignees during the process of patent application or maintenance. Another reason could be competition between assignees: many a time the promising pairs are competing pairs. Therefore, it was hard to carry out a regression analysis based on those four factors to obtain a usable set of coefficients, and in its revised form the LCI model could only be used for qualitative analysis of collaboration potential. Researchers typically pay greater attention to scientific papers published within the last 2 years, and especially to papers that may have great citation impact in the future. However, the accuracy of current citation impact prediction methods is still not satisfactory. This paper argues that objective features of scientific papers can make citation impact prediction relatively accurate. The external features of a paper, features of its authors, features of the journal of publication, and features of citations are all considered in constructing a paper's feature space. Stepwise multiple regression analysis is used to select appropriate features from this space and to build a regression model explaining the relationship between citation impact and the chosen features. The validity of this model is also experimentally verified in the subject area of Information Science & Library Science. 
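The regression modelling above reduces, in its simplest single-feature form, to ordinary least squares. The sketch below regresses invented citation counts on one invented feature ("journal impact"); the actual study performed stepwise multiple regression over many paper, author, journal, and citation features.

```python
# Hedged sketch of least-squares regression of citation counts on a single
# paper feature. The study used stepwise *multiple* regression; feature
# values and citation counts below are invented for illustration.

def ols(x, y):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

impact = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical journal impact values
cites = [2, 5, 7, 9, 12]             # hypothetical citation counts
a, b = ols(impact, cites)
print(round(a, 2), round(b, 2))  # 2.4 -0.2
```

Stepwise selection extends this by adding (or removing) one feature at a time, keeping only those whose coefficients significantly improve the fit, which is how the paper prunes its large feature space down to the model it validates.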
The results show that the regression model is effective within this subject area. Understanding the evolution of research topics is crucial to detecting emerging trends in science. This paper proposes a new approach and framework to discover the evolution of topics based on dynamic co-word networks and the communities within them. The NEViewer software was developed according to this approach and framework. Compared with existing studies and science mapping software tools, our work is innovative in three aspects: (a) it designs a longitudinal framework based on the dynamics of co-word communities; (b) it proposes a community labelling algorithm and community evolution verification algorithms; and (c) it visualizes the evolution of topics at the macro and micro levels using alluvial diagrams and colored networks, respectively. A case study in computer science and a careful assessment demonstrate that the new method and the NEViewer software are feasible and effective. This paper analyses the following seven sub-fields of Sustainable Energy Research with respect to the influence of proceedings papers on citation patterns across citing and cited document types, overall sub-field and document type impacts, and citedness: the Wind Power, Renewable Energy, Solar and Wave Energy, Geo-thermal, Bio-fuel and Bio-mass energy sub-fields. The analyses cover peer-reviewed research and review articles as well as two kinds of proceedings papers from conferences published 2005-2009 in (a) book series or volumes and (b) special journal issues, excluding meeting abstracts, cited 2005-2011 through Web of Science. Central findings are: The distribution across document types of cited versus citing documents is highly asymmetric. Proceedings papers, whether from proceedings volumes or published in journals, predominantly cite research articles (60-76 %). 
Journal-based proceedings papers are largely cited rather than papers published in book series or volumes, and have field impacts corresponding to those of research articles. With decreasing proceedings-paper dominance in research fields, the ratio of proceedings-volume papers to journal-based proceedings papers decreases significantly, and the percentage of proceedings papers in journals citing journal-based proceedings papers, over all publications citing journal-based proceedings papers, decreases significantly (from 26.3 % in Wind Power to 4 % in Bio-fuel). Further, the segment of all kinds of proceedings papers (the combined proceedings-paper types) citing all proceedings papers, over all publications citing all kinds of proceedings papers, decreases significantly (from 36.1 % in Wind Power to 11.3 % in Bio-fuel). Simultaneously, field citedness increases across the seven research fields. The distribution of citations from review articles shows that novel knowledge essentially derives directly from research articles (53-72 %) and to a much lesser extent from proceedings publications published in journals (9-13 %). While the citation context of a reference may provide detailed and direct information about the nature of a citation, few studies have specifically addressed the role of this information in retrieving relevant documents from the literature, primarily due to the lack of full-text databases. In this paper, we design a retrieval system based on full texts in the PubMed Central database. We constructed two modules in the retrieval system: a reference retrieval module based on citation contexts, and a citation context retrieval module for searching the citation contexts of a specific paper. Comparisons show that the reference retrieval module performed better than Google Scholar and the PubMed database in finding proper references based on topic words extracted from citation contexts. 
It also performed very well in searching for highly cited papers and classic papers. The citation context retrieval module visualizes the topics of citation contexts as tag clouds and classifies citation contexts based on cue words within them. The great importance that international rankings have achieved in the research policy arena warns against the many threats that follow from the flaws and shortcomings of these tools. One of them has to do with their inability to accurately represent national university systems, as their original purpose is only to rank world-class universities. Another has to do with the lack of representativeness of universities' disciplinary profiles, as they usually provide a single league table. Although some rankings offer great coverage and others offer league tables by field, no international ranking does both. In order to surpass this limitation from a research policy viewpoint, this paper analyzes the possibility of using national rankings to complement international rankings. For this, we analyze the Spanish university system as a case study, presenting the I-UGR Rankings of Spanish universities by fields and subfields. We then compare their results with those obtained by the Shanghai Ranking, the QS Ranking, the Leiden Ranking and the NTU Ranking, as they all have basic common grounds which allow such a comparison. We conclude that it is advisable to use national rankings to complement international rankings; however, we observe that this must be done with caution, as the rankings differ in the methodology employed as well as in the construction of the fields. This paper analyses the patterns of Danish research productivity, citation impact and (inter)national collaboration across document types 2000-2012, prior to and after the introduction of the Norwegian publication-point-based performance indicator in 2008. 
The document types analysed are research articles, conference proceedings papers (excluding meeting abstracts), and review articles. The Danish Research & Innovation Agency's basic statistics combined with the Web of Science (WoS) are used for data collection and analyses. Findings demonstrate that research article productivity increased steeply (37 %) after the start of the performance indicator, while citation impact progressed linearly over the entire period regardless of the introduction of the performance indicator. Academic staff progression was only 24 % during the same period. The collaboration ratio between purely Danish and internationally co-authored research articles remained stable during the period, and the number of collaborating countries increased, while the ratio declined significantly for proceedings papers. The citation impact of internationally co-authored research articles has increased since 2009 but dropped for proceedings papers; their productivity has also declined slightly since 2009 according to Research Agency statistics. Since 2006 the WoS indexing of proceedings papers has been declining rapidly; as a consequence, the ratio between Danish proceedings papers and research articles declines in WoS. According to Research Agency statistics a similar decline takes place, starting from 2009. The positive growth in research articles mainly derives from the Science and Technology fields publishing in prestigious Level 2 journals; the growth of articles published in less prestigious Level 1 journals derives from all fields. Three of the eight Danish universities have significantly altered their research publication profiles since 2009. The publication performance model is regarded as a significant accelerator of these processes in recent years. This paper introduces author-level bibliometric co-occurrence networks by discussing their history and contribution to the analysis of scholarly communication and intellectual structure. 
However, the differences among the various author co-occurrence networks, the question of which type of network should be adopted in which situation, and the relationships among these networks remain unexplored. Five types of author co-occurrence networks are considered: (1) co-authorship (CA); (2) author co-citation (ACC); (3) author bibliographic coupling (ABC); (4) word-based author coupling (WAC); and (5) journal-based author coupling (JAC). Networks of 98 high-impact authors from 30 journals indexed under the Information Science & Library Science category of the 2011 Journal Citation Report (SSCI) are constructed for study. Social network analysis and hierarchical cluster analysis are applied to identify sub-networks, with the results visualized using the VOSviewer software. The QAP test is used to find potential correlations among the networks. The cluster analysis results show that all five types of networks have the power to reveal the intellectual structure of a science, but the revealed structures differ from each other. ABC identified more sub-structures than the other types of network, followed by CA and ACC. The WAC result is easily distorted and the JAC result is ambiguous. The QAP test result shows that the ABC network has the highest proximity to the other types of networks, while the CA network has relatively low proximity to the others. This paper provides a better comprehension of author interaction and contributes to the cognitive application of author co-occurrence network analysis. This study adopts a bibliometric approach to quantitatively assess current research trends in nanofiltration membrane (NFM) technology, a new membrane separation technology widely used in various fields. It analyses scientific papers published between 1988 and 2011 in all journals contained in the Science Citation Index, and patent data for the same time span from the Derwent patent database. The study examines developments in basic NFM research and technological innovations. 
Over the past 24 years, there has been notable growth in publication output. Compared with other countries, China exhibited rapid growth, particularly from 2000 to 2011, with its total number of papers ranking second only to the United States (US). Chinese NFM papers focus on energy and agriculture, while the US focuses on biochemistry and molecular biology. China holds the most NFM patents globally, with rapid growth in patent numbers from 2005 to 2011. China, the US and Japan together hold 78 % of the total global NFM patents and have a strong technological advantage in water treatment and separation technology. Although there are four Chinese institutions in the top-10 patentee list, most of their patents are application patents that focus on the integrated application of existing nanofiltration membranes. In contrast, the patents owned by foreign patentees are mostly research patents involving technological innovations in the nanofiltration membrane itself. Therefore, NFM research capacity in China should be further strengthened to maximize the advantages gained via research to date. Competitive technical intelligence addresses the landscape of both opportunities and competition for emerging technologies, as the boom of newly emerging science & technology (NEST)-characterized by a challenging combination of great uncertainty and great potential-has become a significant feature of the globalized world. We have been focusing on the construction of a "NEST Competitive Intelligence" methodology that blends bibliometric and text mining methods to explore key technological system components, current R&D emphases, and key players for a particular NEST. This paper emphasizes the semantic TRIZ approach as a useful tool to process "Term Clumping" results to retrieve "problem & solution (P&S)" patterns, and applies them to technology roadmapping. We attempt to extend our approach to NEST Competitive Intelligence studies by using both inductive and purposive bibliometric approaches. 
Finally, an empirical study of dye-sensitized solar cells is used to demonstrate these analyses. The Relative Specialization Index (RSI) is an indicator that measures the research profile of a country by comparing the share of a given field in the publications of that country with the share of the same field in the world total of publications. If measured over time, this indicator may be influenced by changes in the world total caused by the increased representation of certain other countries with different research profiles. As a case, we study the effect on the RSI for The Netherlands of the increased representation of China in the ISI Web of Science. Although the booming of China is visible in the RSI for The Netherlands, especially in the last decade and in fields where the two countries have opposite specializations, the basic research profile as measured by the RSI remains the same. We conclude that the indicator is robust with regard to booming countries, and that it may suffice to observe the general changes in the research profile of the database if the RSI for a country is studied over time. The study of science at the individual scholar level requires the disambiguation of author names. The creation of an author's publication oeuvre involves matching the list of unique author names to names used in publication databases. Despite recent progress in the development of unique author identifiers, e.g., ORCID, VIVO, or DAI, author disambiguation remains a key problem when it comes to large-scale bibliometric analysis using data from multiple databases. This study introduces and tests a new methodology called seed + expand for semi-automatic bibliographic data collection for a given set of individual authors. Specifically, we identify the oeuvre of a set of Dutch full professors during the period 1980-2011. In particular, we combine author records from the Dutch National Research Information System (NARCIS) with publication records from the Web of Science. 
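The RSI described above can be written as (AI - 1)/(AI + 1), where the Activity Index AI is the country's share of a field divided by the world's share of that field; the RSI is then 0 for a world-average profile, negative for under-specialization, and positive (approaching 1) for over-specialization. A small sketch with made-up publication counts:

```python
def activity_index(country_field, country_total, world_field, world_total):
    """AI = (country's share of the field) / (world's share of the field)."""
    return (country_field / country_total) / (world_field / world_total)

def rsi(country_field, country_total, world_field, world_total):
    """Relative Specialization Index, mapping AI into the interval (-1, 1)."""
    ai = activity_index(country_field, country_total, world_field, world_total)
    return (ai - 1) / (ai + 1)

# Hypothetical counts: a country with 200 of its 1,000 papers in a field
# that makes up 5% of the world's 1,000,000 papers.
print(rsi(200, 1000, 50000, 1000000))  # AI = 4, so RSI = 0.6
```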
Starting with an initial list of 8,378 names, we identify 'seed publications' for each author using five different approaches. Subsequently, we 'expand' the set of publications using three different approaches. The different approaches are compared and the resulting oeuvres are evaluated on precision and recall using a 'gold standard' dataset of authors for which verified publications in the period 2001-2010 are available. The prospects of altmetrics are especially encouraging for research fields in the humanities that currently are difficult to study using established bibliometric methods. Yet, little is known about the altmetric impact of research fields in the humanities. Consequently, this paper analyses the altmetric coverage and impact of humanities-oriented articles and books published by Swedish universities during 2012. Some of the most common altmetric sources are examined using a sample of 310 journal articles and 54 books. Mendeley has the highest coverage of journal articles (61%), followed by Twitter (21%), while very few of the publications are mentioned in blogs or on Facebook. Books, on the other hand, are quite often tweeted, while the coverage of both Mendeley and the novel data source LibraryThing is low. Many of the problems of applying bibliometrics to the humanities are also relevant for altmetric approaches: the importance of non-journal publications, the reliance on print, as well as the limited coverage of non-English language publications. However, the continuing development and diversification of methods suggests that altmetrics could evolve into a valuable tool for assessing research in the humanities. In this article, barycenters of the places of publication of monographs, edited books and book chapters are used to represent the internationalization of research in the Social Sciences and Humanities (SSH) as practiced at universities in Flanders (Belgium). 
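A publication barycenter of the kind used above is essentially a weighted mean of the coordinates of the places of publication, with weights given by the number of publications per place. A minimal sketch (treating latitude/longitude as planar coordinates, which is a rough approximation at continental scale; the coordinates below are illustrative):

```python
def barycenter(places):
    """places: list of (lat, lon, n_publications) triples.
    Returns the publication-weighted mean coordinate."""
    total = sum(n for _, _, n in places)
    lat = sum(la * n for la, _, n in places) / total
    lon = sum(lo * n for _, lo, n in places) / total
    return lat, lon

# Three publications at Leuven-like coordinates, one at London-like ones:
# the barycenter is pulled three quarters of the way toward Leuven.
print(barycenter([(50.9, 4.7, 3), (51.5, -0.1, 1)]))
```

A shift of the barycenter away from the home country over time is then a simple signal of growing internationalization.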
Our findings indicate that, in terms of places of publication, the distance between peer reviewed and non-peer reviewed SSH book literature is growing. Whereas peer reviewed books are increasingly published abroad and in English, non-peer reviewed book literature remains firmly domestic and published in the Dutch language. This divergence is more pronounced for the Social Sciences than for the Humanities. For Law we have found a pattern along the lines of the Social Sciences. We discuss these findings in view of the two main readerships of SSH publications: international academia on the one hand, and a mostly domestic intelligentsia on the other. To evaluate the usefulness of a full-text database as a source for assessing obliteration by incorporation (OBI), 3,707 article records including the catchphrases "bounded rationality" and/or "boundedly rational" (connected with the work of H. A. Simon) in the article text were retrieved from JSTOR, a full-text database with broad disciplinary coverage. Two subsets were analyzed: a 10% systematic sample of all records, and a set of all articles in Economics journals (with the addition of the Journal of Economic Theory). A majority of articles in the 10% sample came from Economics and Management journals, while Psychology was poorly represented. In the 10% sample, based on the percentage of true implicit citations between 1992 and 2009 in the 80% of records that had a catchphrase in the body of the article, rather than just in the reference list, annual OBI ranged from 0 to 70% (mean 33%) with no discernible trend. The Economics articles showed a narrower range of OBI, fluctuating around 40% implicit citations over the same time period. In both data sets, a large proportion of indirect citations were to sources that themselves cited a relevant work by Simon. 
Over 90% of the articles in both the 10% sample and the economics journal set would not have been retrieved with a database record search, because they lacked the catchphrase in the record fields. We introduce a method to predict or recommend high-potential future (i.e., not yet realized) collaborations. The proposed method is based on a combination of link prediction and machine learning techniques. First, a weighted co-authorship network is constructed. We calculate scores for each node pair according to different measures called predictors. The resulting scores can be interpreted as indicative of the likelihood of future linkage for the given node pair. To determine the relative merit of each predictor, we train a random forest classifier on older data. The same classifier can then generate predictions for newer data. The top predictions are treated as recommendations for future collaboration. We apply the technique to research collaborations between cities in Africa, the Middle East and South Asia, focusing on the topics of malaria and tuberculosis. Results show that the method yields accurate recommendations. Moreover, the method can be used to determine the relative strengths of each predictor. Using curricula vitae (CVs) or short bios in published resources such as the Internet enables us to analyze many issues concerning researchers' careers. However, analysis of CVs or short bios concerning researchers' life history, such as movement between countries, has rarely been conducted. In this paper, we pursue two purposes: to demonstrate which conditions (citation impact, countries or sectors) are favorable for the analysis, and to show structures of production of highly cited papers. To grasp more obvious tendencies, we compare two "extreme" samples: highly cited and uncited papers. First, we assess the identification rates of researchers' origin broken down by researchers' affiliation (countries and sectors). 
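Two classic predictors used in link-prediction studies of the kind described above, common neighbors and Adamic-Adar, can be computed directly from a co-authorship graph; the study combines such predictor scores in a random forest, but even the raw scores already rank candidate pairs. A sketch on a toy unweighted graph (node names are illustrative):

```python
import math

def common_neighbors(graph, u, v):
    """Number of co-authors shared by u and v."""
    return len(graph[u] & graph[v])

def adamic_adar(graph, u, v):
    """Weights each shared neighbor by 1/log(degree), so rare intermediaries
    count more; degree-1 neighbors are skipped to avoid log(1) = 0."""
    return sum(1 / math.log(len(graph[w]))
               for w in graph[u] & graph[v] if len(graph[w]) > 1)

def recommend(graph, predictor):
    """Rank all currently unlinked pairs by predictor score, best first."""
    nodes = sorted(graph)
    pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
             if v not in graph[u]]
    return sorted(pairs, key=lambda p: predictor(graph, *p), reverse=True)

g = {'A': {'B', 'C'}, 'B': {'A', 'C', 'D'},
     'C': {'A', 'B', 'D'}, 'D': {'B', 'C'}}
print(recommend(g, adamic_adar))  # the only unlinked pair is ('A', 'D')
```

In the full method, several such scores per pair become feature vectors for the classifier, trained on whether the link actually appeared in a later time window.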
Then, we analyze the influence of these researchers' international movement based on their origin. The results show the full landscape of the movement's influence on national publication, the characteristics of each country in terms of researchers' countries of origin, and the research experience of both internationally mobile and domestic researchers. Moreover, we analyze the contributions of researchers who returned from abroad to their home countries. Finally, we assess the limitations of our research method and the topics to be addressed concerning this method. In this paper an analysis of the presence and possibilities of altmetrics for bibliometric and performance analysis is carried out. Using the web-based tool Impact Story, we collected metrics for 20,000 random publications from the Web of Science. We studied both the presence and distribution of altmetrics in the set of publications, across fields, document types and over publication years, as well as the extent to which altmetrics correlate with citation indicators. The main result of the study is that the altmetrics source that provides the most metrics is Mendeley, with metrics on readerships for 62.6% of all the publications studied; other sources only provide marginal information. In terms of the relation with citations, a moderate Spearman correlation (r = 0.49) has been found between Mendeley readership counts and citation indicators. Other possibilities and limitations of these indicators are discussed and future research lines are outlined. To survive worldwide competition in research and development amid the current rapid growth of information, decision-makers and researchers need support in finding promising research fields and papers. Finding those fields in such a heavy flood of available information, however, is difficult. We aim to develop a bibliometric methodology that supports the discovery of emerging leading papers. 
The analyses in this work cover four academic domains using our time transition analysis. In the time transition analysis, after citation networks are constructed, the centralities of each paper are calculated and their changes are tracked. Then, the centralities are plotted, and the features of the leading papers are extracted. Based on these features, we propose ways to detect leading papers by focusing on in-degree centrality and its transition. This work contributes to finding leading papers, and it helps decision-makers and researchers decide which research topics are worth investing their resources in. Historically, science of science (Sci2) studies have been performed by single investigators or small teams. As the size and complexity of data sets and analyses scales up, a "Big Science" approach (Price, Little science, big science, 1963) is required that exploits the expertise and resources of interdisciplinary teams spanning academic, government, and industry boundaries. Big Sci2 studies utilize "big data", i.e., large, complex, diverse, longitudinal, and/or distributed datasets that might be owned by different stakeholders. They apply a systems science approach to uncover hidden patterns, bursts of activity, correlations, and laws. They make available open data and open code in support of replication of results, iterative refinement of approaches and tools, and education. This paper introduces a database-tool infrastructure that was designed to support big Sci2 studies. The open access Scholarly Database (http://sdb.cns.iu.edu) provides easy access to 26 million paper, patent, grant, and clinical trial records. The open source Sci2 tool (http://sci2.cns.iu.edu) supports temporal, geospatial, topical, and network studies. The scalability of the infrastructure is examined. Results show that temporal analyses scale linearly with the number of records and file size, while the geospatial algorithm showed quadratic growth. 
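The time transition analysis described above, tracking each paper's in-degree centrality across yearly snapshots of the citation network, reduces to a cumulative citation count per paper per year; a sudden rise in the trajectory is the kind of feature used to flag emerging leading papers. A minimal sketch (the citation records are made up):

```python
from collections import defaultdict

def indegree_trajectories(citations, years):
    """citations: (citing, cited, year) triples. Returns each cited paper's
    cumulative in-degree at the end of every year in `years` -- the curve
    whose shape the time transition analysis inspects."""
    arrivals = defaultdict(list)
    for citing, cited, year in citations:
        arrivals[year].append(cited)
    papers = {cited for _, cited, _ in citations}
    counts = defaultdict(int)
    traj = {p: [] for p in papers}
    for y in sorted(years):
        for cited in arrivals.get(y, []):
            counts[cited] += 1
        for p in papers:
            traj[p].append(counts[p])
    return traj

cites = [('a', 'x', 2001), ('b', 'x', 2001), ('c', 'x', 2002), ('d', 'y', 2002)]
print(indegree_trajectories(cites, [2001, 2002]))
```

Normalizing each year's counts by network size would turn these raw counts into in-degree centralities proper.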
The number of edges rather than nodes determined performance for network-based algorithms. Collaboration is believed to be influential on researchers' productivity. However, the impact of collaboration relies on factors such as disciplines, collaboration patterns, and collaborators' characteristics. In addition, at different career stages, such as the growth or establishment stages of scientists' careers, collaboration differs in scale and scope, and its effect on productivity varies. In this paper, we study the relationships between collaboration and productivity in four disciplines: Organic Chemistry, Virology, Mathematics and Computer Science. Our study found that productivity is correlated with collaboration in general, but the correlation can be positive or negative depending on which aspect of collaboration is measured, i.e., the scale or the scope of the collaboration. The correlation becomes stronger as individual scientists progress through the stages of their career. Furthermore, experimental disciplines, such as Organic Chemistry and Virology, have shown stronger correlation coefficients than theoretical ones such as Mathematics and Computer Science. People frequently answer consequential questions, such as those with a medical focus, using Internet search engines. Their primary goal is to revise or establish beliefs in one or more outcomes. Search engines are not designed to furnish answers, and instead provide results that may contain answers. Information retrieval research has targeted aspects of information access such as query formulation, relevance, and search success. However, there are important unanswered questions on how beliefs, and potential biases in those beliefs, affect search behaviors and how beliefs are shaped by searching. To understand belief dynamics, we focus on yes-no medical questions (e.g., Is congestive heart failure a heart attack?), with consensus answers from physicians. 
We show that (a) presearch beliefs are affected only slightly by searching, and changes are likely to skew positive (yes); (b) presearch beliefs affect search behavior; (c) search engines can shift some beliefs by manipulating result rank and availability, but strongly held beliefs are difficult to move with uncongenial information, and attempting to do so can be counterproductive; and (d) search engines exhibit near-random answer accuracy. Our findings suggest that search engines should provide correct answers to searchers' questions and develop methods to persuade searchers to shift strongly held but factually incorrect beliefs. Serendipity occurs when unexpected circumstances and an aha moment of insight result in a valuable, unanticipated outcome. Designing digital information environments to support serendipity can not only provide users with new knowledge, but also propel them in directions they might not otherwise have traveled in, surprising and delighting them along the way. As serendipity involves unexpected circumstances, it cannot be directly controlled, but it can potentially be influenced. However, to the best of our knowledge, no previous work has focused on providing a rich empirical understanding of how it might be influenced. We interviewed 14 creative professionals to identify their self-reported strategies aimed at increasing the likelihood of serendipity. These strategies form a framework for examining ways existing digital environments support serendipity and for considering how future environments can create opportunities for it. This is a new way of thinking about how to design for serendipity; by supporting the strategies found to increase its likelihood rather than attempting to support serendipity as a discrete phenomenon, digital environments not only have the potential to help users experience serendipity but also encourage them to adopt the strategies necessary to experience it more often. 
The web encourages the constant creation and distribution of large amounts of information; it is also a valuable resource for understanding human behavior and communication. To take full advantage of the web as a research resource that extends beyond the consideration of snapshots of the present, however, it is necessary to begin to take web archiving much more seriously as an important element of any research program involving web resources. The ephemeral character of the web requires that researchers take proactive steps in the present to enable future analysis. Efforts to archive the web or portions thereof have been developed around the world, but these efforts have not yet provided reliable and scalable solutions. This article summarizes the current state of web archiving in relation to researchers and research needs. Interviews with researchers, archivists, and technologists identify the differences in purpose, scope, and scale of current web archiving practice, and the professional tensions that arise given these differences. Findings outline the challenges that still face researchers who wish to engage seriously with web content as an object of research, and archivists who must strike a balance reflecting a range of user needs. As the Internet becomes ubiquitous, it has advanced to more closely represent aspects of the real world. Due to this trend, researchers in various disciplines have become interested in studying relationships between real-world phenomena and their virtual representations. One such area of emerging research seeks to study relationships between the real-world and virtual activism of social movement organizations (SMOs). In particular, SMOs holding extreme social perspectives are often studied due to their tendency to have robust virtual presences to circumvent real-world social barriers preventing information dissemination. 
However, many previous studies have been limited in scope because they utilize manual data-collection and analysis methods. They also often have failed to consider the real-world aspects of groups that partake in virtual activism. We utilize automated data-collection and analysis methods to identify significant relationships between aspects of SMO virtual communities and their respective real-world locations and ideological perspectives. Our results also demonstrate that the interconnectedness of SMO virtual communities is affected specifically by aspects of the real world. These observations provide insight into the behaviors of SMOs within virtual environments, suggesting that the virtual communities of SMOs are strongly affected by aspects of the real world. A constant shifting between two main tenets of the information behavior (IB) field, the centrality of the user and the essential role of context, has become a differentiation point for contemporary approaches in the field, but it also poses a major difficulty in tracing information practices. On one side, the user-centered paradigm asks researchers to focus on the individual; on the other, emerging context-centered approaches move the position of context into the foreground of information studies. Although there have been attempts to create in-between approaches to achieve a compromise between these two positions, they have merely generated more positions between the two poles in a continuum between approaches focusing on the individual and those focusing on context. Such positioning not only creates an endless debate about the research focus of information studies but also limits such studies to a set of factors defined a priori by the researcher. This article argues that IB research could benefit from actor-network theory, which could give the actors a space to perform their own positioning. 
Anyone who has clarified a thought or prompted a response during a conversation by drawing a picture has exploited the potential of image making to convey information. Images are increasingly ubiquitous in daily communication due to advances in visually enabled information and communication technologies (ICT), such as information visualization applications, image retrieval systems, and virtual collaborative work tools. Although images are often used in social contexts, information science research concerned with the visual representation of information typically focuses on the image artifact and system building. To learn more about image making as a form of social interaction and as a form of information practice, a qualitative study examined face-to-face conversations involving the creation of ad hoc visualizations (i.e., napkin drawings). Interactional sociolinguistic concepts of conversational involvement and coordination guided multimodal analysis of video-recorded interactions that included spontaneous drawing. Findings show patterns in communicative activities associated with the visual representation of information. Furthermore, the activity of mark making contributes to the maintenance of conversational involvement in ways that are not always evident in the drawn artifact. This research has implications for the design and evaluation of visually enabled virtual collaboration environments, visual information extraction and retrieval systems, and data visualization tools. A theory of megacitation is introduced and used in an experiment to demonstrate how a qualitative scholarly book review can be converted into a weighted bibliometric indicator. We employ a manual human-coding approach to classify book reviews in the field of history based on reviewers' assessments of a book author's scholarly credibility (SC) and writing style (WS). 
In total, 100 book reviews were selected from the American Historical Review and coded for their positive/negative valence on these two dimensions. Most were coded as positive (68% for SC and 47% for WS), and there was also a small positive correlation between SC and WS (r=0.2). We then constructed a classifier, combining both manual design and machine learning, to categorize sentiment-based sentences in history book reviews. The machine classifier produced a matched accuracy (matched to the human coding) of approximately 75% for SC and 64% for WS. WS was found to be more difficult to classify by machine than SC because of the reviewers' use of more subtle language. With further training data, a machine-learning approach could be useful for automatically classifying a large number of history book reviews at once. Weighted megacitations can be especially valuable if they are used in conjunction with regular book/journal citations, and libcitations (i.e., library holding counts) for a comprehensive assessment of a book/monograph's scholarly impact. Primary and secondary (K-12) teachers form the essential core of children's formal learning before adulthood. Even though teaching is a mainstream, information-rich profession, teachers are understudied as information users. More specifically, not much is known about teacher personal information management (PIM). Teacher PIM is critically important, as teachers navigate a complex information space complicated by the duality of digital and physical information streams and changing demands on instruction. Our research study increases understanding of teacher PIM and informs the development of tools to support educators. Some important unknowns exist about teachers as information users: What are teachers' PIM practices? What are the perceived consequences of these practices for teaching and learning? How can PIM practices be facilitated to benefit teaching and learning? 
This study employed a qualitative research design, with interviews of 24 primary and secondary teachers. We observed various systems for information organization, and teachers reported their systems to be effective. The most important sources of teachers' information are, in order, personal collections, close colleagues, and the Internet. Key findings reveal that inheriting and sharing information play an important part in information acquisition for teachers and that information technology supporting education creates unintentional demands on information management. The findings on the nature of teacher information, and on teachers' information finding, keeping, and organizational practices, have important implications for teachers themselves, school principals, digital library developers, school librarians, curriculum developers, educational technology developers, and educational policy makers. Annotations, in the form of markings and comments on the text, are often part of scholarly work. Digital platforms increasingly allow these annotations to be shared in group and public environments. To explore scholars' current behavior and attitudes toward shared annotations, semistructured interviews with 20 doctoral students were conducted. The findings suggest that sociocognitive processes are integral to scholars' creation and use of shared annotations. However, although scholars clearly support creating and using shared annotations, several sociocognitive hurdles have hampered adoption of scholarly shared annotation systems. This article discusses common themes emerging from the findings and relates them to research on shared annotations. How information resources can be meaningfully related has been addressed in contexts from bibliographic entries to hyperlinks and, more recently, linked data. The genre structure and relationships among genre structure constituents shed new light on organizing information by purpose or function. 
This study examines the relationships among a set of functional units previously constructed in a taxonomy, each of which is a chunk of information embedded in a document and is distinct in terms of its communicative function. Through a card-sort study, relationships among functional units were identified with regard to their occurrence and function. The findings suggest that a group of functional units can be identified, collocated, and navigated by particular relationships. Understanding how functional units are related to each other is significant in linking information pieces in documents to support finding, aggregating, and navigating information in a distributed information environment. Guided by literature on fatigue from within the domains of clinical and occupational studies, the present article seeks to define the phenomenon termed social network fatigue in the context of one of the popular uses of social networks, namely, to stay socially connected. This is achieved through an identification of the antecedents and effects of experiences that contribute to negative emotions or to a reduction in interest in using social networks with the help of a mixed-methods study. Five generic antecedents and varying effects of these antecedents on individual user activities have been identified. Fatigue experiences could result from social dynamics or social interactions of the members of the community, content made available on social networks, unwanted changes to the platform that hosts the network, self-detected immersive tendencies of the users themselves, or a natural maturing of the life cycle of the community to which the user belongs. The intensity of the fatigue experience varies along a continuum ranging from a mild or transient experience to a more severe experience, which may eventually result in the user's decision to quit the environment that causes stress. 
Thus, users were found to take short rest breaks from the environment, moderate their activities downward, or suspend their social network activities altogether as a result of fatigue experiences. In this study, we discover Russian centers of excellence and explore patterns of their collaboration with each other and with foreign partners. Highly cited papers serve as a proxy for excellence and coauthored papers as a measure of collaborative efforts. We find that currently research institutes (of the Russian Academy of Sciences as well as others) remain the key players despite recent government initiatives to stimulate university science. The contribution of the commercial sector to high-impact research is negligible. More than 90% of Russian highly cited papers involve international collaboration, and Russian institutions often do not play a dominant role. Partnership with U.S., German, U.K., and French scientists increases markedly the probability of a Russian paper becoming highly cited. Patterns of national (intranational) collaboration in world-class research differ significantly across different types of organizations; the strongest ties are between three nuclear/particle physics centers. Finally, we draw a coauthorship map to visualize collaboration between Russian centers of excellence. This paper uncovers patterns of knowledge dissemination among scientific disciplines. Although the transfer of knowledge is largely unobservable, citations from one discipline to another have been proven to be an effective proxy to study disciplinary knowledge flow. This study constructs a knowledge-flow network in which a node represents a Journal Citation Reports subject category and a link denotes the citations from one subject category to another. Using the concept of shortest path, several quantitative measurements are proposed and applied to a knowledge-flow network. 
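The shortest-path measurements mentioned above can be computed with a plain breadth-first search over the directed knowledge-flow network; the intermediate nodes of a path are the domains that knowledge must pass through. A sketch (the category names and links are illustrative, not the Journal Citation Reports data):

```python
from collections import deque

def shortest_path(flows, src, dst):
    """BFS over a directed graph {category: [cited categories]}; returns
    one shortest path from src to dst, or None if dst is unreachable."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in flows.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

flows = {'Sociology': ['Biomedicine'],
         'Biomedicine': ['Chemistry', 'Economics'],
         'Chemistry': ['Economics']}
path = shortest_path(flows, 'Sociology', 'Economics')
print(path, 'intermediates:', path[1:-1])
```

Counting how many shortest paths between pairs of social science categories contain at least one science category as an intermediate yields the kind of disunity statistic the study reports.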
Based on an examination of subject categories in Journal Citation Reports, this study indicates that social science domains tend to be more self-contained, so it is more difficult for knowledge from other domains to flow into them; at the same time, knowledge from science domains, such as biomedicine-, chemistry-, and physics-related domains, can access and be accessed by other domains more easily. This study also shows that social science domains are more disunified than science domains, because three fifths of the knowledge paths from one social science domain to another require at least one science domain to serve as an intermediate. This work contributes to discussions on disciplinarity and interdisciplinarity by providing empirical analysis. This article explores the feasibility, benefits, and limitations of in-text author citation analysis and tests how well it works compared with traditional author citation analysis using citation databases. In-text author citation analysis refers to author-based citation analysis using in-text citation data from full-text articles rather than reference data from citation databases. It has the potential to help with the application of citation analysis to research fields such as the social sciences that are not covered well by citation databases, and to support weighted citation and cocitation counting for improved citation analysis results. We found that in-text author citation analysis can work as well as traditional citation analysis using citation databases for both author ranking and mapping, if author name disambiguation is performed properly. Using in-text citation data without any author name disambiguation, ranking authors by citations is useless, whereas cocitation analysis works well for identifying major specialties and their interrelationships, with caution required for the interpretation of small research areas and some authors' memberships in specialties. 
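Author cocitation counting of the kind discussed above reduces to counting, over all citing articles, how often each pair of authors appears together in the same article's citations. A minimal sketch (the author names are illustrative):

```python
from collections import Counter
from itertools import combinations

def author_cocitations(citing_articles):
    """citing_articles: iterable of sets of cited author names, one set per
    citing article. Two authors are co-cited whenever one article cites both."""
    pairs = Counter()
    for cited_authors in citing_articles:
        # sorted() makes the pair key order-independent: (A, B) == (B, A)
        for a, b in combinations(sorted(cited_authors), 2):
            pairs[(a, b)] += 1
    return pairs

articles = [{'White', 'Small'}, {'White', 'Small', 'Garfield'}, {'Garfield'}]
print(author_cocitations(articles))
```

As the abstract stresses, with raw in-text data the author names should be disambiguated before the sets are built, or the resulting matrix conflates distinct scholars.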
As a complement to Nelson and Winter's (1977) article titled In Search of a Useful Theory of Innovation, a sociological perspective on innovation networks can be elaborated using Luhmann's social systems theory, on the one hand, and Latour's sociology of translations, on the other. Because of a common focus on communication, these perspectives can be combined as a set of methodologies. Latour's sociology of translations specifies a mechanism for generating variation in relations (associations), whereas Luhmann's systems perspective enables the specification of (functionally different) selection environments such as markets, professional organizations, and political control. Selection environments can be considered as mechanisms of social coordination that can self-organizebeyond the control of human agencyinto regimes in terms of interacting codes of communication. Unlike relatively globalized regimes, technological trajectories are organized locally in landscapes. A resulting duality of structure (Giddens, 1979) between the historical organization of trajectories and evolutionary self-organization at the regime level can be expected to drive innovation cycles. Reflexive translations add a third layer of perspectives to (a) the relational analysis of observable links that shape trajectories and (b) the positional analysis of networks in terms of latent dimensions. These three operations can be studied in a single framework, but using different methodologies. Latour's first-order associations can then be analytically distinguished from second-order translations in terms of requiring other communicative competencies. The resulting operations remain infrareflexively nested, and can therefore be used for innovative reconstructions of previously constructed boundaries. Each co-author (CA) of any scientist can be given a rank of importance according to the number of joint publications which the authors have together. In this paper, the Zipf-Mandelbrot-Pareto law, i.e. 
J(r) = A / (ν + r)^ζ, is shown to reproduce the empirical relationship between the number of joint publications J and the co-author rank r, and shown to be preferable to a mere power law, J(r) = A / r^α. The CA core value, i.e. the core number of CAs, is unaffected, of course. The demonstration is made on data for two authors, with a high number of joint publications, recently considered by Bougrine (Scientometrics, 98(2): 1047-1064, 2014) and for seven authors, distinguishing between their "journal" and "proceedings" publications as suggested by Miskiewicz (Physica A, 392(20), 5119-5131, 2013). The rank-size statistics are discussed and the ζ and α exponents are compared. The correlation coefficient is much improved (0.99, instead of 0.92). There are marked deviations of such a co-authorship popularity law depending on sub-fields. On one hand, this suggests an interpretation of the parameter ν. On the other hand, it suggests a novel model on the (likely time dependent) structural and publishing properties of research teams. Thus, one can propose a scenario for how a research team is formed and grows. This is based on a hierarchy utility concept, justifying the empirical Zipf-Mandelbrot-Pareto law, assuming a simple form for the CA publication/cost ratio. In conclusion, such a law and model can suggest practical applications on measures of research teams. In Appendices, the frequency-size cumulative distribution function is discussed for two sub-fields, with other technicalities. This study employs social network analysis to identify institutions with strong international collaborative relationships in astronomical research. We find that the strongest ties tend to link institutions across continents in research collaboration. However, the effect of geographic factors is still notable in light of the fact that most of the institutions in the largest subgroup are located in Europe. Examination of the network position, measured by degree centrality, indicates that homophily is more common than heterophily in the network. 
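The rank-size fitting procedure sketched above can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the standard ZMP form J(r) = A / (ν + r)^ζ and estimates the exponent by ordinary least squares in log-log space over a grid of candidate ν values (all function names are hypothetical).

```python
import math

def fit_power(xs, ys):
    """OLS of log(y) on log(x); returns (intercept, exponent, r-squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    inter = my - slope * mx
    pred = [inter + slope * a for a in lx]
    ss_res = sum((b - p) ** 2 for b, p in zip(ly, pred))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return inter, -slope, 1.0 - ss_res / ss_tot

def fit_zmp(ranks, counts, nu_grid):
    """Fit J(r) = A / (nu + r)**zeta by scanning nu and fitting in log-log space.

    Returns the best (A, nu, zeta, r2) by r-squared; nu = 0 reduces
    to the plain power law, so the two models can be compared directly.
    """
    best = None
    for nu in nu_grid:
        inter, zeta, r2 = fit_power([nu + r for r in ranks], counts)
        if best is None or r2 > best[3]:
            best = (math.exp(inter), nu, zeta, r2)
    return best
```

On data generated exactly from a ZMP law the scan recovers the generating ν and ζ; on real co-authorship counts, comparing the best ZMP r² with the ν = 0 (pure power law) r² mirrors the kind of improvement reported above (0.99 instead of 0.92).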
A relatively high number of relational ties are observed among institutions that have similar levels of network centrality. Mutual relations are prevalent among central institutions, while strong mutual solidarity exists between institutions on the periphery of the network. This study shows a generally unstable international collaborative relationship among astronomical institutions. While more and more institutions have linked up in research collaboration, many of them keep relatively weak ties. Institutions tend not to remain in the same subgroup, but link to different partners over time. This paper provides useful insights for the design of networks that promote research productivity. The results suggest that the different dimensions of social capital affect scientific performance differently depending on the area of knowledge. Overall, dense networks negatively affect the creation of new knowledge. In addition, the analysis shows that a division of labor in academia, in the sense of interdisciplinary research, increases the productivity of researchers. It is also found that the position in a network is critical. Researchers who are central tend to create more knowledge. Finally, the findings suggest that the number of ties has a positive impact on future productivity. Related to areas of knowledge, Exact Sciences is the area in which social capital has a stronger impact on research performance. On the other hand, Social Sciences and Humanities, as well as Engineering, are the ones in which social capital has a lesser effect. The differences found across multiple domains of science suggest the need to consider this heterogeneity in policy design. This article is a contribution towards an understanding of open access (OA) publishing. It proposes an analysis framework of 18 core attributes, divided into the areas of bibliographic information, activity metrics, economics, accessibility, and predatory issues. 
The framework has been employed in a systematic analysis of 30 OA journals in software engineering (SE) and information systems (IS), which were selected from among 386 OA journals in Computer Science from the Directory of OA Journals. An analysis was performed on the sample of the journals, to provide an overview of the current situation of OA journals in the fields of SE and IS. The journals were then compared between-group, according to the presence of article processing charges. A within-group analysis was performed on the journals requesting article processing charges from authors, in order to understand what value is added at different price ranges. This article offers several contributions. It presents an overview of OA definitions and models. It provides an analysis framework born from the observation of data and the existing literature. It raises the need to study OA in the fields of SE and IS while offering a first analysis. Finally, it provides recommendations to readers of OA journals. This paper highlights several concerns still threatening the adoption of OA publishing in the fields of SE and IS. Among them, it is shown that high article processing charges are not sufficiently justified by the publishers, which often lack transparency and may prevent authors from adopting OA. In recent years, collaborations between scholars have drastically increased in all fields. Using individual and country collaboration data from the past 30 years, this paper studies the evolution and trend of collaboration networks in the field of information systems. Our research shows that individual scholars and all countries display the "long tail" phenomenon in article publishing. Average collaboration degree and co-authorship ratio of articles over time are on the rise overall. Evolutionary analysis of collaboration networks manifests that network development is basically mature, although it has not yet reached a stable status. 
International collaborations have shown a gradual increase, with the increase in participating countries distributed mainly in Europe and Asia and the increase in collaborations mainly in North America and Europe, especially the United States, England and Canada. Field normalization is a necessary step in a fair cross-field comparison of citation impact. In practice, the mean-based method (m-score) is the most popular method for field normalization. However, considering that the mean-based method utilizes only the central tendency of the citation distribution in the normalization procedure, while dispersion is also a significant characteristic, an open and important issue is whether alternative normalization methods which take both central tendency and variability into account perform better than the mean-based method. With the aim of collapsing citation distributions of different fields into a universal distribution, this study compares the normalization effect of m-score and z-score based on 236 Web of Science (WoS) subject categories. The results show that both m-score and z-score have a remarkable normalization effect as compared with raw citations, but neither of them can realize the ideal goal of "universality of citation distributions". The results also suggest that m-score is generally preferable to z-score. The essential reason that m-score has an edge over z-score overall lies in the characteristics of skewed citation distributions, to which m-score is more applicable than z-score. The ProQuest Dissertations and Theses database contains records for approximately 2.3 million dissertations conferred at 1,490 research institutions across 66 countries. Despite the scope of the Dissertations and Theses database, no study has explicitly sought to validate the accuracy of the ProQuest subject categories (SCs). 
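The two normalization methods compared above have simple textbook definitions: the m-score divides a paper's citation count by its field's mean, while the z-score removes the field mean and rescales by the field's standard deviation. A minimal stdlib sketch (function names are hypothetical, not the study's code):

```python
from statistics import mean, pstdev

def m_scores(citations_by_field):
    """Mean-based normalization: each paper's citations divided by its field mean."""
    out = {}
    for field, cites in citations_by_field.items():
        mu = mean(cites)
        out[field] = [c / mu for c in cites]
    return out

def z_scores(citations_by_field):
    """Standard-score normalization: (citations - field mean) / field std. dev."""
    out = {}
    for field, cites in citations_by_field.items():
        mu, sigma = mean(cites), pstdev(cites)
        out[field] = [(c - mu) / sigma for c in cites]
    return out
```

By construction the m-scores of every field average to 1 and the z-scores to 0, which is what makes either score comparable across fields with very different raw citation levels.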
This research examines the degree to which ProQuest SCs serve as proxies for disciplinarity, the relevance of doctoral work to doctoral graduates' current work, and the permeability of disciplines from the perspective of the mismatch between SCs and disciplinarity. To examine these issues we conducted a survey of 2009-2010 doctoral graduates, cluster-sampled from Economics, Political Science, and Sociology ProQuest SCs. The results from the survey question the utility of traditional disciplinary labels and suggest that scholars may occupy a post-interdisciplinary space in which they move freely across disciplinary boundaries and identify with topics instead of disciplines. Rankings have become a major form of quality assessment in higher education over the past few decades. Most rankings rely, to varying extent, on bibliometric indicators intended to capture the quantity and quality of the scientific output of institutions. The growing popularity of this practice has raised a number of concerns, one of the most important being whether evaluations of this sort treat different work styles and publication habits in an unbiased manner and, consequently, whether the resulting rankings properly respect the particular modes of research characteristic of various disciplines and subdisciplines. The research reported in this paper looked at this issue, using data on more than one hundred US sociology departments. Our results showed that institutions that are more quantitative in character are more likely to favor journals over books as the dominant form of scientific communication and fare, in general, considerably better on the National Research Council's assessment than their qualitative equivalents. After controlling for differences in publication practices, the impact of research style declined but remained statistically significant. 
It thus seems that the greater preference of qualitative departments for books over articles as publication outlets puts them at a disadvantage as far as quality assessments are concerned, although their lagging behind their quantitative counterparts cannot fully be explained by this factor alone. Measuring scientific performance is currently a common practice of funding agencies, fellowship evaluations and hiring institutions. However, as has already been recognized by many authors, comparing performance across different scientific fields is a difficult task due to the different publication and citation patterns observed in each field. In this article, we argue that the scientific performance of an individual scientist, laboratory or institution should be analysed within the corresponding context, and we provide objective tools to perform this kind of comparative analysis. The usage of the new tools is illustrated by using two control groups, to which several performance measurements are referred: one group being the Physics and Chemistry Nobel laureates from 2007 to 2012, the other group consisting of a list of outstanding scientists affiliated to two different institutions. New institutions are coming to the fore as stakeholders in research, particularly hospitals and clinical departments involved in providing health care. As a result, new environments for research are gaining importance. This study aims to investigate how different individual characteristics, together with collective and contextual factors, affect the activity and performance of researchers in the particular setting of hospitals and research centres affiliated with the Spanish National Health System (NHS). We used a combination of quantitative science indicators and perception-based data obtained through a survey of researchers working at NHS hospitals and research centres. 
Inbreeding and involvement in clinical research is the combination of factors with the greatest influence on scientific productivity, because these factors are associated with increased scientific output both overall as well as in high-impact journals. Ultimately, however, satisfaction with human resources in the research group combined with gender (linked in turn to leadership) is the combination of factors associated most clearly with the most relevant indicator of productivity success, i.e. the number of articles in high-impact journals as principal author. Researchers' competitiveness in obtaining research funding as principal investigator is associated with a combination of satisfaction with research autonomy and involvement in clinical research. Researchers' success is not significantly related to their age, seniority and international experience. The way health care institutions manage and combine the factors likely to influence research may be critical for the development and maintenance of research-conducive environments, and ultimately for the success of research carried out in hospitals and other settings within the national public health system. In this paper, we look at the issue of the high end of research performance, which is captured in the tail of a citation distribution. As the mean is insufficient to capture the skewness of such distributions, a consistency or concentration measure is the additional parameter needed. We show that the h-index is only an approximate heuristic mock-up of a composite indicator built from three primary indicators: the number, the mean, and a consistency term. The z-index is able to sense the change in consistency in the distribution due to the outliers in the tail of the distribution. It is widely accepted that biotechnology is a globally significant and growing research field. Because of its biodiversity, Colombia has a comparative advantage to innovate and commercialize biotechnological products and services. 
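For reference, the h-index mentioned above is the largest h such that h of an author's papers have at least h citations each. A minimal illustration (hypothetical function name, stdlib only):

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i  # the i-th most cited paper still has >= i citations
        else:
            break
    return h
```

Note how the index is insensitive to the tail: an author with citation counts [25, 8, 5, 3, 3] and one with [5, 5, 5, 3, 3] both get h = 3, which is exactly the kind of information the consistency term discussed above is meant to recover.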
The aim of this study is to obtain a research profile and intellectual structure of the country in the biotechnology field by using bibliometric methods. These methods are needed to monitor the capacities and the compliance of national policies in biotechnology. By using records extracted from the ISI WoK database, this study describes the biotechnology publication trend, productivity and collaboration among institutions and countries, preferred journals, and the intellectual structure at the level of research subject areas. Although a growing trend in biotechnology publications was observed, productivity is still low compared to other countries in the region and the world. On the other hand, researchers seem to show a preference for international over domestic collaboration. The results suggest two elements: first, policy has not had the expected outcome in the short term, and second, a lack of internal collaboration could therefore reflect low endogenous capacities. The bibliometric methods used in this study can be applied to a wide range of research fields other than biotechnology. Digital preservation of scientific papers enables their wider accessibility, but also provides a valuable source of information that can be used in a longitudinal scientometric study. The Electronic Library of the Mathematical Institute of the Serbian Academy of Sciences and Arts (eLib) digitizes the most prominent mathematical journals printed in Serbia. In this paper, we study a co-authorship network which represents collaborations among authors who published their papers in the eLib journals in an 80-year period (from 1932 to 2011). Such a study enables us to identify patterns and long-term trends in scientific collaborations that are characteristic of a community which mainly consists of Serbian (Yugoslav) mathematicians. 
Analysis of connected components of the network reveals a topological diversity in the network structure: the network contains a large number of components whose sizes obey a power law, the majority of components are isolated authors or small trivial components, but there is also a small number of relatively large, non-trivial components of connected authors. Our evolutionary analysis shows that the evolution of the network can be divided into six periods that are characterized by different intensity and type of collaborative behavior among eLib authors. Analysis of author metrics shows that betweenness centrality is a better indicator of author productivity and long-term presence in the eLib journals than degree centrality. Moreover, the strength of correlation between productivity metrics and betweenness centrality increases as the network evolves, suggesting that an even stronger correlation can be expected in the future. Many webometric studies have used hyperlinks to investigate links to or between specific collections of websites to estimate their impact or identify connectivity patterns. Whilst major commercial search engines have previously been used to identify hyperlinks for these purposes, their hyperlink search facilities have now been shut down. In response, a range of alternative sources of link data have been suggested, but all have limitations. This article introduces a new type of link that can be identified from commercial search engines, linked title mentions. These can be found by querying title mentions in a search engine and then removing those not associated with a relevant hyperlink. Results of a proof-of-concept test on 51 U.S. library and information science schools and four other sets of schools suggest that linked title mentions may tend to give better results than title mentions in some cases when used for site inlinks but may not always be an improvement on URL citations. 
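The component analysis described above starts from a plain traversal of the co-authorship graph. A minimal stdlib sketch (the adjacency representation and function name are assumptions for illustration, not the eLib tooling):

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph given as {node: set(neighbors)}.

    Returns components as sets of nodes, largest first, so isolated authors
    (trivial components of size 1) end up at the tail of the list.
    """
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search from an unvisited node
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)
```

The sorted component sizes are exactly the data one would histogram to check the power-law size distribution reported above.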
For links between or co-inlinks to specified pairs of academic websites, linked title mentions do not generally provide an improvement over title mentions, but they do over URL citations in some cases. Linked title mentions may also be useful for sets of non-academic websites when the alternatives give too few or misleading results. Patents and licenses are foundational to successful technology transfer in universities. In this article, the activities and performance of university patenting and licensing are studied to gauge the effectiveness of the Bayh-Dole Act (the "Act"), the most influential piece of US legislation on university technology transfer (UTT). Based on raw data from five sources, the annual numbers of patents granted, licenses signed, startup companies launched, and research expenditures are analyzed. Correlations are performed for all data presented to quantify trends over different time periods. We found that patenting and licensing activities in US universities slowed down greatly after 2000 and remained flat until the period from 2010 to 2012, when activities recovered to the level of strength characterizing the period before 2000 and after the enactment of the Act. We identify economic recessions as the major cause of the flatness of patenting activities during the 2000s. We also explain some of the differences found among different data sources and time periods. The Stirling index of the set of references of the corpus documents is widely used in the literature on interdisciplinary research and is defined as the integration score of the corpus under study. Such an indicator is relevant at the scale of a research institution; however, there is a gap between the integration scores of individual documents and a global score computed on the whole set of references. 
The difference between the global index and the average of individual document indexes carries additional relevant information about the corpus: it measures the diversity between the reference profiles of the corpus documents. It is, therefore, named the between-article index, whereas the average of the individual article indexes is called the within-article index. The statistical properties of these two indexes as well as of the global index are derived from a general approximation method for distributions and lead to statistical tests which can be used to make meaningful comparisons between an institution's indexes and benchmark values. The two dimensions of the global index provide sharper information on the interdisciplinary practices of an institution's researchers in a given research domain and are, therefore, likely to contribute to strategic and management issues. This paper explores the changing role of world regions (North America, EU15, South EU, Central and Eastern Europe (CEE), Former-USSR, Latin America, Asia Pacific and the Middle East) in science from 1981 to 2011. We use bibliometric data extracted from Thomson Reuters' National Science Indicators (2011) for 21 broad disciplines, and aggregated the data into the four major science areas: life, fundamental, applied and social sciences. Comparing three sub-periods (1981-1989, 1990-2000 and 2001-2011), we investigate (i) over-time changes in descriptive indicators such as publications, citations, and relative impact; (ii) static specialization measured by revealed comparative advantage (RCA) in citations and papers; and (iii) dynamic specialization measured by absolute growth in papers. Descriptive results show a global shift in science largely in quantity (papers) and much less in impact (citations). 
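The Stirling (Rao) integration index discussed above is conventionally computed as the sum over pairs of distinct categories of p_i * p_j * d_ij, where p_i is the proportion of references falling in category i and d_ij is a between-category distance. A minimal sketch under that standard definition (function name hypothetical):

```python
def rao_stirling(proportions, distance):
    """Rao-Stirling diversity: sum over pairs i != j of p_i * p_j * d_ij.

    proportions: {category: share of references}, shares summing to 1.
    distance: {(i, j): d_ij}, e.g. 1 - cosine similarity between categories.
    """
    cats = list(proportions)
    total = 0.0
    for i in cats:
        for j in cats:
            if i != j:
                total += proportions[i] * proportions[j] * distance[(i, j)]
    return total
```

A document citing a single category scores 0; a document split evenly between two maximally distant categories (d = 1) scores 0.5, the maximum for two categories. Computing this per document and on the pooled reference set gives, respectively, the within-article and global indexes whose gap is the between-article index described above.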
We argue this should be interpreted as a shift in science's absorptive capacity but not necessarily a shift of knowledge generation at the world science frontier, which reflects the nature of science systems operating with high inertia and path dependency in areas of their historically inherited advantages and disadvantages. In view of their common historical legacy in science we are particularly interested in the process of convergence/divergence of the catching-up/transition regions with the world frontier regions. We implement an interpretative framework to compare regions in terms of their static and dynamic specialization from 1981-1989 to 2001-2011. Again, our analysis shows that while science systems are mostly characterised by strong inertia and historically inherited (dis)advantages, Asia Pacific, Latin America and CEE show strong catching-up characteristics but largely in the absorptive capacity of science. A bibliometric analysis was applied in this work to evaluate Antarctic research from 1993 to 2012 based on the Science Citation Index database. According to samples of 30,024 articles related to Antarctica, this study reveals the evolution of the scientific outputs on Antarctic research from the aspects of subject categories, major journals, international collaboration, and temporal trends in keywords focus. Antarctic research has developed rapidly in the past two decades, with an increasing amount of article output, references and citations. Geosciences multidisciplinary, oceanography, ecology, meteorology and atmospheric sciences and geography physical were the most popular subject categories. Among the 20 major journals related to Antarctic research, Polar Biology, Geophysical Research Letters and Journal of Geophysical Research-Atmospheres ranked as the top three. With the largest quantity of articles and high citations, USA was the leading contributor to global Antarctic research and had a dominant position in collaborative networks. 
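Revealed comparative advantage, used above to measure static specialization, is standardly defined as a region's share of a field divided by the world's share of that field; RCA > 1 indicates specialization. A small illustration with made-up counts (not the study's data; function name hypothetical):

```python
def rca(papers):
    """Revealed comparative advantage from {region: {field: paper count}}.

    RCA(region, field) = (region's share of its output in the field)
                       / (world's share of output in the field).
    """
    region_tot = {r: sum(f.values()) for r, f in papers.items()}
    fields = {f for d in papers.values() for f in d}
    field_tot = {f: sum(papers[r].get(f, 0) for r in papers) for f in fields}
    world = sum(region_tot.values())
    return {
        r: {f: (papers[r].get(f, 0) / region_tot[r]) / (field_tot[f] / world)
            for f in fields}
        for r in papers
    }
```

With two regions publishing 100 papers each, a region putting 60% of its output into a field that makes up 40% of world output gets an RCA of 1.5 in that field.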
In addition, a keyword analysis determined that climate change, sea ice and krill were the topics that generated the most interest and concern. Because this paper reveals underlying patterns in scientific outputs, research subjects and academic collaboration, it may serve as a summary of global research history on Antarctica and a potential basis for future research. The study demonstrates an integrated method of forecasting the trend of a country's publications. In this context the paper examines international collaboration in a country's overall publication output and forecasts its future trend. The integrated method is based on regression and scaling relationships. India is taken as a case study for this examination. The study shows some interesting features of India's publication pattern based on time-series data. One observes the exponential nature of its publication growth from 2002 onwards. International collaboration also exhibits exponential growth from roughly the same period. Moreover, internationally collaborative papers grow faster than research papers overall. The study predicts the number of internationally collaborative papers for the years 2015 and 2020. The robustness of the method is also demonstrated. The name ambiguity problem presents many challenges for scholar finding, citation analysis and other related research fields. To attack this issue, various disambiguation methods combined with separate disambiguation features have been put forward. In this paper, we offer an unsupervised Dempster-Shafer theory (DST) based hierarchical agglomerative clustering algorithm for author disambiguation tasks. Distinct from existing methods, we exploit the DST in combination with Shannon's entropy to fuse various disambiguation features and come up with a more reliable candidate pair of clusters for amalgamation in each iteration of clustering. 
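The kind of exponential trend fitting and forecasting described above can be sketched by regressing the logarithm of yearly publication counts on the year. This is an illustration with made-up counts, not the paper's full regression-and-scaling method:

```python
import math

def fit_exponential(years, counts):
    """OLS of ln(count) on year, i.e. count ~ exp(a + b * year); returns (a, b)."""
    n = len(years)
    mx = sum(years) / n
    my = sum(math.log(c) for c in counts) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (math.log(c) - my) for x, c in zip(years, counts))
    b = sxy / sxx          # annual growth rate in log space
    a = my - b * mx
    return a, b

def forecast(a, b, year):
    """Extrapolate the fitted exponential trend to a future year."""
    return math.exp(a + b * year)
```

Fitting both the overall and the internationally collaborative series this way and comparing the two growth rates b is one simple way to verify the claim that collaborative papers grow faster than overall output.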
Also, some solutions to determine the convergence condition of the clustering process are proposed. In experiments, our method outperforms three unsupervised models and achieves performance comparable to a supervised model, while requiring no hand-labelled training data. This study illustrates scientists' referencing (mis)behavior by structuring the dissemination network of referencing errors. The sample set consists of 16,622 referencing errors of a highly cited paper published by Laemmli, U.K., in Nature in 1970. Dissemination networks of thirteen types of volume-page double errors and one type of page-only error are constructed and analyzed. Focusing on papers which carry the same volume-page double error, or the same page error, the citing-cited relationship between any two of them was identified and author bylines were compared to find common author(s). Our investigation reveals three routes by which referencing errors disseminate. Route 1: citing a paper and copying its reference; Route 2: copying a reference from another paper without citing that paper; Route 3: copying references from an earlier paper published by the author himself (herself) without rechecking the accuracy of the reference. The first two routes reflect scientists' referencing misbehavior, while the third calls attention to self-copying of references. A bibliometric approach is explored for tracking international scientific migration, based on an analysis of the affiliation countries of authors publishing in peer reviewed journals indexed in Scopus (TM). The paper introduces a model that relates base concepts in the study of migration to bibliometric constructs, and discusses the potentialities and limitations of a bibliometric approach both with respect to data accuracy and interpretation. Synchronous and asynchronous analyses are presented for 10 rapidly growing countries and 7 scientifically established countries. 
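Dempster-Shafer evidence fusion, at the core of the clustering method above, combines two mass functions with Dempster's rule: products of masses whose focal sets intersect are accumulated on the intersection, and the mass falling on conflicting (disjoint) pairs is renormalized away. A minimal sketch of the rule itself (hypothetical function name; the paper's full algorithm additionally weights features by Shannon entropy):

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: {frozenset of hypotheses: mass}, masses each summing to 1.
    Returns the combined, conflict-renormalized mass function.
    """
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc  # mass on disjoint focal sets
    norm = 1.0 - conflict
    return {a: v / norm for a, v in combined.items()}
```

In an author-disambiguation setting, each disambiguation feature (co-authors, venues, topics) would supply one mass function over candidate "same author" hypotheses, and repeated application of the rule fuses them into a single belief used to pick the next pair of clusters to merge.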
Rough error rates of the proposed indicators are estimated. It is concluded that the bibliometric approach is promising provided that its outcomes are interpreted with care, based on insight into the limits and potentialities of the approach, and combined with complementary data obtained, for instance, from researchers' curricula vitae or from survey- or questionnaire-based data. Error rates for units of assessment with indicator values based on sufficiently large numbers are estimated to be well below 10 %, but can be expected to vary substantially among countries of origin, especially between Asian countries and Western countries. Person identification based on iris recognition is getting more and more attention among the modalities used for biometric recognition. This fact is due to the immutable and unique characteristics of the iris. Therefore it is of utmost importance for researchers interested in this discipline to know who and what is relevant in this area. This paper presents a comprehensive overview of the field of iris recognition research using a bibliometric approach. In addition, this article provides historical records, basic concepts, current progress and trends in the field. With this purpose in mind, our bibliometric study is based on 1,354 documents written in English, published between 2000 and 2012. Scopus was used to perform the information retrieval. In the course of this study, we synthesized significant bibliometric indicators on iris recognition research in order to evaluate to what extent this particular field has been explored. Thereby, we focus on foundations, temporal evolution, leading authors, most cited papers, significant conventions, leading journals, outstanding research topics and enterprises and patents. Research topics are classified into three main categories: ongoing, emerging, and decreasing, according to their corresponding number of publications over the period under study. 
An analysis of these indicators suggests there have been major advances in iris recognition research and also reveals promising new avenues worthy of investigation in the future. This study will be useful to future investigators in the field. An original cross-sectional dataset referring to a medium-sized Italian university is assembled in order to analyze the determinants of scientific research production at the individual level. The dataset includes 942 permanent researchers of various scientific sectors for a 3-year time-span (2008-2010). Three different indicators, based on the number of publications and/or citations, are considered as response variables. The corresponding distributions are highly skewed and display an excess of zero-valued observations. In this setting, the goodness-of-fit of several Poisson mixture regression models is explored by assuming an extensive set of explanatory variables. As to the personal observable characteristics of the researchers, the results emphasize the age effect and the gender productivity gap, as previously documented by existing studies. Analogously, the analysis confirms that productivity is strongly affected by the publication and citation practices adopted in different scientific disciplines. The empirical evidence on the connection between teaching and research activities suggests that no univocal substitution or complementarity thesis can be claimed: a major teaching load does not affect the odds of being a non-active researcher and does not significantly reduce the number of publications for active researchers. In addition, new evidence emerges on the effect of researchers' administrative tasks, which seem to be negatively related to researchers' productivity, and on the composition of departments. 
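The excess of zeros noted above is what zero-inflated count models address: with probability pi a researcher is a structural zero (never publishes in the window), otherwise output follows a Poisson law. A sketch of the zero-inflated Poisson probability mass function and its log-likelihood (illustrative only; the study fits richer Poisson mixture regressions with covariates):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it is drawn from Poisson(lam)."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1 - pi) * pois if k == 0 else (1 - pi) * pois

def zip_loglik(counts, lam, pi):
    """Log-likelihood of observed counts under the ZIP(lam, pi) model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)
```

Maximizing this log-likelihood over (lam, pi), e.g. on a publication-count vector per researcher, yields the mixture fit whose goodness-of-fit the study compares across model variants.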
Researchers' productivity is apparently enhanced by operating in departments with more administrative and technical staff, and it is not significantly affected by the composition of the department in terms of senior/junior researchers. Results from self-observation of the working time distribution of a university teacher for a period of 45 years (starting from the very beginning of his career until two years after retirement) are reported, classified into (i) teaching; (ii) scientific; (iii) administrative, organizational, technical; (iv) social/other activities. For the whole period, teaching takes 19 %, scientific work 22 % and various kinds of administrative, organizational and technical activities 52 % of the overall working time. The latter varies within the limits of 6.1-14.5 h per calendar day (mean values for a year) and on average is 10 h per calendar day for the entire 45-year period. The changes of the working time distribution and the working day duration over the years are shown. The time consumed in fulfillment of obligations as head of a research institution and of university units is revealed. The data are accompanied by information on the growth of the scientific production of the observed person. It appears that the latter, as well as the growth of citations to his papers, can be depicted by the well-known exponential law describing the accelerated development of science. Hundreds of scholarly studies have investigated various aspects of Wikipedia. Although a number of literature reviews have provided overviews of this vast body of research, none has specifically focused on the readers of Wikipedia and issues concerning its readership. In this systematic literature review, we review 99 studies to synthesize current knowledge regarding the readership of Wikipedia and provide an analysis of research methods employed. 
The scholarly research has found that Wikipedia is popular not only for lighter topics such as entertainment but also for more serious topics such as health and legal information. Scholars, librarians, and students are common users, and Wikipedia provides a unique opportunity for educating students in digital literacy. We conclude with a summary of key findings, implications for researchers, and implications for the Wikipedia community. This paper reviews the worldwide growth of open-access (OA) repositories, 2005 to 2012, using data collected by the OpenDOAR project. Initial repository development was focused on North America, Western Europe, and Australasia, particularly the United States, United Kingdom, Germany, and Australia, followed by Japan. Since 2010, there has been repository growth in East Asia, South America, and Eastern Europe, especially in Taiwan, Brazil, and Poland. During the period, some countries, including France, Italy, and Spain, have maintained steady growth, whereas other countries, notably China and Russia, have experienced limited growth. Globally, repositories are predominantly institutional, multidisciplinary and English-language based. They typically use open-source OAI-compliant software but have immature licensing arrangements. Although the size of repositories is difficult to assess accurately, available data indicate that a small number of large repositories and a large number of small repositories make up the repository landscape. These trends are analyzed using innovation diffusion theory, which is shown to provide a useful explanatory framework for repository adoption at global, national, organizational, and individual levels. Major factors affecting both the initial development of repositories and their take-up include IT infrastructure, cultural factors, policy initiatives, awareness-raising activity, and usage mandates. Mandates are likely to be crucial in determining future repository development. 
This article develops a framework for analyzing and comparing privacy and privacy protections across (inter alia) time, place, and polity and for examining factors that affect privacy and privacy protection. This framework provides a method to describe precisely aspects of privacy and context and a flexible vocabulary and notation for such descriptions and comparisons. Moreover, it links philosophical and conceptual work on privacy to social science and policy work and accommodates different conceptions of the nature and value of privacy. The article begins with an outline of the framework. It then refines the view by describing a hypothetical application. Finally, it applies the framework to a real-world privacy issue: campaign finance disclosure laws in the United States and France. The article concludes with an argument that the framework offers important advantages to privacy scholarship and to privacy policy makers. This paper presents a new global patent map that represents all technological categories and a method to locate patent data of individual organizations and technological fields on the global map. This overlay map technique may support competitive intelligence and policy decision making. The global patent map is based on similarities in citing-to-cited relationships between categories of the International Patent Classification (IPC) of European Patent Office (EPO) patents from 2000 to 2006. This patent data set, extracted from the PATSTAT database, includes 760,000 patent records in 466 IPC-based categories. We compare the global patent maps derived from this categorization to related efforts of other global patent maps. The paper overlays the nanotechnology-related patenting activities of two companies and two different nanotechnology subfields on the global patent map. The exercise shows the potential of patent overlay maps to visualize technological areas and potentially support decision making. 
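The similarity in citing-to-cited relationships that underlies such a patent map can be sketched as cosine similarity between category citation profiles. The IPC labels and counts below are invented for illustration, not taken from the PATSTAT data set:

```python
import numpy as np

# Toy citing-to-cited count matrix: rows = citing IPC categories,
# columns = cited IPC categories (labels and counts are hypothetical).
categories = ["A61K", "B82Y", "H01L", "G06F"]
C = np.array([
    [120,  80,   5,   2],
    [ 90, 150,  10,   4],
    [  4,  12, 200,  60],
    [  1,   6,  70, 180],
], dtype=float)

# Cosine similarity between the citing profiles of each pair of categories;
# categories with similar profiles sit close together on an overlay map.
norms = np.linalg.norm(C, axis=1, keepdims=True)
sim = (C / norms) @ (C / norms).T
```

A map layout algorithm (e.g., multidimensional scaling) would then place the 466 categories so that high-similarity pairs are near each other.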
Furthermore, this study shows that IPC categories that are similar to one another based on citing-to-cited patterns (and thus close in the global patent map) are not necessarily in the same hierarchical IPC branch, thereby revealing new relationships between technologies that are classified as pertaining to different (and sometimes distant) subject areas in the IPC scheme. Web search engines (WSEs) use search queries to profile users and to provide personalized services like query disambiguation or refinement. These services are valuable because users get an enhanced search experience. However, the compiled user profiles may contain sensitive information that might represent a privacy threat. This issue should be addressed in a way that also preserves the utility of the profile with regard to search services. State-of-the-art approaches tackle these issues by generating and submitting fake queries that are related to the interests of the user. This technique allows the WSE to only know general (and useful) data while the detailed (and potentially private) data are obfuscated. To build fake queries, these proposals rely on past queries to obtain user interests. However, we argue that this is not always the best strategy and, in this article, we study the use of social networks to gather more accurate user profiles that enable better personalized services while offering a similar, or even better, level of practical privacy. These hypotheses are empirically supported by evaluations using real profiles gathered from Twitter and a set of AOL search queries. This paper provides a better understanding of the implications of researchers' social networks in bibliographic references. Using a set of chemistry papers and conducting interviews with their authors (n = 32), I characterize the type of relation the author has with the authors of the references contained in his/her paper (n = 3,623). 
I show that citation relationships do not always involve underlying personal exchanges and that unknown references are an essential component, revealing segmentations in scientific groups. The relationships implied by references are of various strengths and origins. Several inclusive social circles are then identified: co-authors, close acquaintances, colleagues, invisible colleges, peers, contactables, and strangers. I conclude that publication is a device that contributes to a relatively stable distribution among the various social circles that structure scientific sociability. This study proposes a new way of using WordNet for query expansion (QE). We choose candidate expansion terms from a set of pseudo-relevant documents; however, the usefulness of these terms is measured based on their definitions provided in a hand-crafted lexical resource such as WordNet. Experiments with a number of standard TREC collections show that this method outperforms existing WordNet-based methods. It also compares favorably with established QE methods such as KLD and RM3. Leveraging earlier work in which a combination of QE methods was found to outperform each individual method (as well as other well-known QE methods), we next propose a combination-based QE method that takes into account three different aspects of a candidate expansion term's usefulness: (a) its distribution in the pseudo-relevant documents and in the target corpus, (b) its statistical association with query terms, and (c) its semantic relation with the query, as determined by the overlap between the WordNet definitions of the term and query terms. This combination of diverse sources of information appears to work well on a number of test collections, viz., TREC123, TREC5, TREC678, TREC robust (new), and TREC910 collections, and yields significant improvements over competing methods on most of these collections. This study explores the effect of considering citation relevancy in main path analysis. 
Traditional citation-based analyses treat all citations equally even though there can be various reasons and different levels of relevancy for one document to reference another. Taking the relevancy level into consideration is intuitively advantageous because it adopts more accurate information and will thus make the results of a citation-based analysis more trustworthy. This is nevertheless a challenging task. We are aware of no citation-based analysis that has taken the relevancy level into consideration. The difficulty lies in the fact that the existing patent or patent citation database provides no readily available relevancy level information. We overcome this issue by obtaining citation relevancy information from a legal database that has relevancy level ranked by legal experts. This paper selects trademark dilution, a legal concept that has been the subject of many lawsuit cases, as the target for exploration. We apply main path analysis, taking citation relevancy into consideration, and verify the results against a set of test cases that are mentioned in an authoritative trademark book. The findings show that relevancy information helps main path analysis uncover legal cases of higher importance. Nevertheless, in terms of the number of significant cases retrieved, relevancy information does not seem to make a noticeable difference. This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption made is that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The annotation of the extracted knowledge is done at two levels: ontological and logical. The extracted properties are evaluated in the traditional way, that is, by computing the precision of the extraction procedure and in a clustering task. 
The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology. Citation indicators are increasingly used in book-based disciplines to support peer review in the evaluation of authors and to gauge the prestige of publishers. However, because global citation databases seem to offer weak coverage of books outside the West, it is not clear whether the influence of non-Western books can be assessed with citations. To investigate this, citations were extracted from Google Books and Google Scholar to 1,357 arts, humanities and social sciences (AHSS) books published by 5 university presses during 1961-2012 in 1 non-Western nation, Malaysia. A significant minority of the books (23% in Google Books and 37% in Google Scholar, 45% in total) had been cited, with a higher proportion cited if they were older or in English. The combination of Google Books and Google Scholar is therefore recommended, with some provisos, for non-Western countries seeking to differentiate between books with some impact and books with no impact, to identify the highly-cited works or to develop an indicator of academic publisher prestige. A number of bibliometric indices have been developed to evaluate an individual's scientific impact, and the most popular are the h-index and its variants. However, existing bibliometric indices are computed based on the number of citations received by each article, but they do not consider the frequency with which individual citations are mentioned in an article. We use "citation mention" to denote a unique occurrence of a cited reference mentioned in the citing article, and thus some citations may have more than one mention in an article. According to our analysis of the ACL Anthology Network corpus in the natural language processing field, more than 40% of cited references have been mentioned twice or more in the corresponding citing articles. 
We argue that citation mention is preferable for representing the citation relationships between articles, that is, a reference article mentioned m times in the citing article will be considered to have received m citations, rather than one citation. Based on this assumption, we revise the h-index and propose a new bibliometric index, the WL-index, to evaluate an individual's scientific impact. According to our empirical analysis, the proposed WL-index more accurately discriminates between program committee chairs of reputable conferences and ordinary authors. It is often necessary to compose a team consisting of experts with diverse competencies to accomplish complex tasks. However, for its proper functioning, it is also preferable that a team be socially cohesive. A team recommendation system, which facilitates the search for potential team members, can be of great help both for (a) individuals who need to seek out collaborators and for (b) managers who need to build a team for some specific tasks. Such a decision support system that readily helps summarize multiple metrics indicating a team's (and its members') quality, and possibly rank the teams in a personalized manner according to the end users' preferences, thus serves as a tool to cope with what would otherwise be an information avalanche. In this work, we present Social Web Application for Team Recommendation, a general-purpose framework to compose various information retrieval and social graph mining and visualization subsystems together to build a composite team recommendation system, and instantiate it for a case study of academic teams. Visually assisted product image search has gained increasing popularity because of its capability to greatly improve end users' e-commerce shopping experiences. 
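The mention-weighted revision of the h-index described in the WL-index study can be sketched as follows. This assumes the WL-index simply applies the usual h-index rule to mention-weighted counts, which is a simplification of the paper's exact definition; the per-paper figures are hypothetical:

```python
def h_like_index(counts):
    """Largest h such that h items each have at least h counts."""
    counts = sorted(counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Hypothetical author: classic per-paper citation counts, and the same
# citations expanded so each citing article contributes m mentions.
citations = [9, 7, 5, 4, 2, 1]        # one count per citing article
mentions  = [15, 11, 8, 7, 5, 1]      # mention-weighted counts (>= citations)

h_index  = h_like_index(citations)    # standard h-index
wl_index = h_like_index(mentions)     # mention-based, WL-style index
```

With mention weighting, papers that are discussed repeatedly in their citing articles count for more, so the mention-based index can exceed the plain h-index.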
Different from general-purpose content-based image retrieval (CBIR) applications, the specific goal of product image search is to retrieve and rank relevant products from a large-scale product database to visually assist a user's online shopping experience. In this paper, we explore the problem of product image search through salient edge characterization and analysis, for which we propose a novel image search method coupled with an interactive user region-of-interest indication function. Given a product image, the proposed approach first extracts an edge map, based on which contour curves are further extracted. We then segment the extracted contours into fragments according to the detected contour corners. After that, a set of salient edge elements is extracted from each product image. Based on salient edge elements matching and similarity evaluation, the method derives a new pairwise image similarity estimate. Using the new image similarity, we can then retrieve product images. To evaluate the performance of our algorithm, we conducted 120 sessions of querying experiments on a data set comprising around 13k product images collected from multiple, real-world e-commerce websites. We compared the performance of the proposed method with that of a bag-of-words method (Philbin, Chum, Isard, Sivic, & Zisserman, 2008) and a Pyramid Histogram of Orientated Gradients (PHOG) method (Bosch, Zisserman, & Munoz, 2007). Experimental results demonstrate that the proposed method improves the performance of example-based product image retrieval. This research adopts a repertoire approach to examine the concept of a health information repertoire, defined as a set of sources through which people get health information. Drawing on a random sample survey in Austin, TX, it borrows the concepts of cultural omnivores and univores to investigate how health information repertoires are related to social capital and digital inequalities. 
Results demonstrate that both the size and the composition of health information repertoires vary by social and digital connectivity. People with greater social capital have a larger repertoire and are less likely to be univores dependent on the Internet or interpersonal contacts. People with Internet access have a larger repertoire and are less likely to be univores dependent on television. More skilled Internet users are less likely to be univores dependent on interpersonal contacts, whereas frequent Internet users are more likely to be omnivores with a four-channel repertoire including the Internet, interpersonal contacts, television, and newspaper. The positive relationship between social capital and repertoire size is stronger among less-skilled Internet users. There are significant variations in health information repertoires in terms of media access and sociodemographic characteristics. Scholarly and practical implications are discussed. Electronic documents produced in business processes are valuable information resources for organizations. In many cases they have to be accessible long after the life of the business processes or information systems in connection with which they were created. To improve the management and preservation of documents, organizations are deploying Extensible Markup Language (XML) as a standardized format for documents. The goal of this paper is to increase understanding of XML document management and provide a framework to enable the analysis and description of the management of XML documents throughout their life. We followed the design science approach. We introduce a document life cycle model consisting of five phases. For each of the phases we describe the typical activities related to the management of XML documents. Furthermore, we also identify the typical actors, systems, and types of content items associated with the activities of the phases. 
We demonstrate the use of the model in two case studies: one concerning the State Budget Proposal of the Finnish government and the other concerning a faculty council meeting agenda at a university. In this article, the authors introduce two citation-based approaches to facilitate a multidimensional evaluation of 39 selected management journals. The first is a refined application of PageRank via the differentiation of citation types. The second is a form of mathematical manipulation to identify the roles that the selected management journals play. Their findings reveal that Academy of Management Journal, Academy of Management Review, and Administrative Science Quarterly are the top three management journals, respectively. They also discovered that these three journals play the role of a knowledge hub in the domain. Finally, when compared with Journal Citation Reports (Thomson Reuters, Philadelphia, PA), their results closely match expert opinions. We present CitNetExplorer, a new software tool for analyzing and visualizing citation networks of scientific publications. CitNetExplorer can for instance be used to study the development of a research field, to delineate the literature on a research topic, and to support literature reviewing. We first introduce the main concepts that need to be understood when working with CitNetExplorer. We then demonstrate CitNetExplorer by using the tool to analyze the scientometric literature and the literature on community detection in networks. Finally, we discuss some technical details on the construction, visualization, and analysis of citation networks in CitNetExplorer. (C) 2014 Elsevier Ltd. All rights reserved. The citations to a set of academic articles are typically unevenly shared, with many articles attracting few citations and few attracting many. 
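The refined PageRank with differentiated citation types used for the management-journal evaluation above can be sketched as a power iteration on a weighted journal citation graph. The journal set and edge weights below are illustrative only, and the weights stand in for whatever differentiation of citation types one adopts:

```python
import numpy as np

# Toy journal citation graph: W[i, j] is the (type-weighted) citation
# weight from journal i to journal j. Names and weights are hypothetical.
journals = ["AMJ", "AMR", "ASQ", "Other"]
W = np.array([
    [0.0, 3.0, 2.0, 1.0],
    [2.0, 0.0, 2.0, 1.0],
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 2.0, 2.0, 0.0],
])

# Column-stochastic transition matrix: each journal distributes its
# outgoing citation weight proportionally across the journals it cites.
P = (W / W.sum(axis=1, keepdims=True)).T

d, n = 0.85, len(journals)            # standard damping factor
r = np.full(n, 1.0 / n)
for _ in range(100):                  # power iteration to convergence
    r = (1 - d) / n + d * P @ r
scores = dict(zip(journals, r))
```

Varying the weights by citation type (e.g., discounting perfunctory citations) is what differentiates this from plain PageRank on raw citation counts.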
It is important to know more precisely how citations are distributed in order to help statistical analyses of citations, especially for sets of articles from a single discipline and a small range of years, as normally used for research evaluation. This article fits discrete versions of the power law, the lognormal distribution and the hooked power law to 20 different Scopus categories, using citations to articles published in 2004 and ignoring uncited articles. The results show that, despite its popularity, the power law is not a suitable model for collections of articles from a single subject and year, even for the purpose of estimating the slope of the tail of the citation data. Both the hooked power law and the lognormal distributions fit best for some subjects but neither is a universal optimal choice and parameter estimates for both seem to be unreliable. Hence only the hooked power law and discrete lognormal distributions should be considered for subject-and-year-based citation analysis in future and parameter estimates should always be interpreted cautiously. (C) 2014 Elsevier Ltd. All rights reserved. A similarity comparison is made between 120 journals from five allied Web of Science disciplines (Communication, Computer Science-Information Systems, Education & Educational Research, Information Science & Library Science, Management) and a more distant discipline (Geology) across three time periods using a novel method called citing discipline analysis that relies on the frequency distribution of Web of Science Research Areas for citing articles. Similarities among journals are evaluated using multidimensional scaling with hierarchical cluster analysis and Principal Component Analysis. 
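The distribution-fitting exercise above (discrete lognormal vs. power law for a single subject and year, uncited articles excluded) can be sketched on simulated data. The parameters are illustrative, and the estimators are simple continuous approximations rather than the discrete maximum-likelihood fits used in the study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Discrete lognormal citation counts: round a continuous lognormal sample
# to integers and, as in the study, discard uncited (zero-count) articles.
mu, sigma = 1.2, 1.1
raw = np.rint(rng.lognormal(mu, sigma, size=20_000)).astype(int)
cites = raw[raw > 0]

# Continuous-approximation MLE for the lognormal parameters...
logs = np.log(cites)
mu_hat, sigma_hat = logs.mean(), logs.std()

# ...and the continuous power-law MLE exponent for the same data
# (Clauset-style estimator with xmin = 1), for comparison of the tail.
alpha_hat = 1 + len(cites) / np.log(cites).sum()
```

Comparing log-likelihoods (or goodness-of-fit statistics) of the fitted lognormal, hooked power law, and power law is then what shows the power law to be a poor model for such collections.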
The resulting visualizations and groupings reveal clusters that align with the discipline assignments for the journals for four of the six disciplines, but also greater overlaps among some journals for two of the disciplines or categorizations that do not necessarily align with their assigned disciplines. Some journals categorized into a single given discipline were found to be more closely aligned with other disciplines and some journals assigned to multiple disciplines more closely aligned with only one of the assigned disciplines. The proposed method offers a complementary way to more traditional methods such as journal co-citation analysis to compare journal similarity using data that are readily available through Web of Science. (C) 2014 Elsevier Ltd. All rights reserved. International scientific collaboration has been the dominant driving force for promoting scientific and technological advancement. However, current international scientific collaboration analysis and evaluation mainly concentrate on the exploration of international collaboration network; hence, an evaluation method of international scientific collaboration is yet to be formed. In this paper, we take the dye-sensitized solar cells (DSSCs) as an empirical object and combine the international collaboration network with geographic information, which we call the International Collaboration Map, to display the international collaboration situations among countries or regions worldwide (inter-country collaboration), the collaborations among countries or regions within each continent (intra-continent collaboration), and the collaborations among continents (inter-continent collaboration) from different angles. 
Based on the idea of fractional counting, this study introduces the indicators of collaborative country rank, international collaboration width, and international collaboration activity; the study employs the International Collaboration Activity Index (ICAI) to comprehensively measure the degree of international collaboration at the country or region level. It systematically evaluates differences in how actively the relevant countries or regions engage in collaborative research. We use correlation analysis among the five sub-criteria and verify the rationality of the index construction. K-means clustering analysis is undertaken among 84 countries or regions in the DSSCs field. The results show the formation of three groups, each with their unique international collaboration features. (C) 2014 Elsevier Ltd. All rights reserved. Delayed recognition is a concept applied to articles that receive very few to no citations for a certain period of time following publication, before becoming actively cited. To determine whether such a time spent in relative obscurity had an effect on subsequent citation patterns, we selected articles that received no citations before the passage of ten full years since publication, investigated the subsequent yearly citations received over a period of 37 years and compared them with the citations received by a group of papers without such a latency period. Our study finds that papers with delayed recognition do not exhibit the typical early peak, then slow decline in citations, but that the vast majority enter decline immediately after their first - and often only - citation. Middling papers' citations remain stable over their lifetime, whereas the more highly cited papers, some of which fall into the "sleeping beauty" subtype, show non-stop growth in citations received. Finally, papers published in different disciplines exhibit similar behavior and did not differ significantly. (C) 2014 Elsevier Ltd. 
All rights reserved. Using empirical data, I demonstrate that the result of performance evaluations by percentiles can be drastically influenced by the proper choice of the journal in which a manuscript is published. (C) 2014 Elsevier Ltd. All rights reserved. We apply the test of Ijiri and Simon (1974) to a large data set of authors in economics. This test has been used by Tol (2009, 2013a) to identify a (within-author) Matthew effect for authors based on citations. We show that the test is quite sensitive to its underlying assumptions and too often identifies a potential Matthew effect. We propose an alternative test based on the pure form of Gibrat's law. It states that stochastic proportionate citation growth, i.e., growth independent of size, leads to a lognormal distribution. By using a one-sided Kolmogorov-Smirnov test we test for deviations from the lognormal distribution, which we interpret as an indication of the Matthew effect. Using our large data set we also explore potential empirical characteristics of economists with a Matthew effect. (C) 2014 Elsevier Ltd. All rights reserved. Today, it is not clear how the impact of research on areas of society other than science should be measured. While peer review and bibliometrics have become standard methods for measuring the impact of research in science, there is not yet an accepted framework within which to measure societal impact. Alternative metrics (called altmetrics to distinguish them from bibliometrics) are considered an interesting option for assessing the societal impact of research, as they offer new ways to measure (public) engagement with research output. Altmetrics is a term to describe web-based metrics for the impact of publications and other scholarly material by using data from social media platforms (e.g., Twitter or Mendeley). This overview of studies explores the potential of altmetrics for measuring societal impact. It deals with the definition and classification of altmetrics. 
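The Gibrat-based lognormality test in the Matthew-effect study can be sketched with a one-sided Kolmogorov-Smirnov test on log citation counts. The data below are simulated under pure proportionate growth (i.e., exactly lognormal), and note that estimating the parameters from the same sample makes the nominal p-value conservative, a caveat the sketch ignores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Citation counts of a hypothetical author under pure Gibrat-style growth:
# proportionate stochastic growth yields a lognormal distribution.
gibrat = rng.lognormal(mean=2.0, sigma=0.8, size=500)

# Fit the lognormal parameters on the log scale and run a one-sided
# Kolmogorov-Smirnov test; a significant deviation from lognormality
# would be read as a hint of a Matthew effect.
logs = np.log(gibrat)
stat, p = stats.kstest(logs, "norm", args=(logs.mean(), logs.std()),
                       alternative="greater")
```

Applied to a real author whose largest counts grow faster than proportionately, the empirical CDF would fall below the fitted normal CDF in the tail and the one-sided statistic would become significant.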
Furthermore, their benefits and disadvantages for measuring impact are discussed. (C) 2014 Elsevier Ltd. All rights reserved. It is well-known that the distribution of citations to articles in a journal is skewed. We ask whether journal rankings based on the impact factor are robust with respect to this fact. We exclude the most cited paper, the top 5 and 10 cited papers for 100 economics journals and recalculate the impact factor. Afterwards we compare the resulting rankings with the original ones from 2012. Our results show that the rankings are relatively robust. This holds both for the 2-year and the 5-year impact factor. (C) 2014 Elsevier Ltd. All rights reserved. This paper examined the citation impact of Chinese- and English-language articles in Chinese-English bilingual journals indexed by Scopus and Web of Science (WoS). Two findings were obtained from comparative analysis: (1) Chinese-language articles were not biased in citations compared with English-language articles, since they received a large number of citations from Chinese scientists; (2) a Chinese-language community was found in Scopus, in which Chinese-language articles mainly received citations from Chinese-language articles, but it was not found in WoS whose coverage of Chinese-language articles is only one-tenth of Scopus. The findings suggest some implications for academic evaluation of journals including Chinese-language articles in Scopus and WoS. (C) 2014 Elsevier Ltd. All rights reserved. This paper exploits a unique 2003-2011 large dataset, indexed by Thomson Reuters, consisting of 17.2 million disambiguated authors classified into 30 broad scientific fields, as well as the 48.2 million articles resulting from a multiplying strategy in which any article coauthored by two or more persons is wholly assigned as many times as necessary to each of them. The dataset is characterized by a large proportion of authors who have their oeuvre in several fields. 
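The robustness check in the impact-factor study above (recompute the indicator after excluding each journal's most cited papers, then compare rankings) can be sketched on simulated journals. Here the "impact factor" is simplified to mean citations per paper, and all data are synthetic:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Toy journals: each row holds the citation counts of one journal's
# papers, drawn from a skewed (lognormal-ish) distribution.
n_journals, n_papers = 100, 50
cites = np.rint(rng.lognormal(1.0, 1.2, size=(n_journals, n_papers)))

if_full = cites.mean(axis=1)                       # "impact factor"

# Recompute after dropping each journal's single most cited paper.
top = cites.max(axis=1)
if_trimmed = (cites.sum(axis=1) - top) / (n_papers - 1)

# Rank robustness: Spearman correlation between the two rankings.
rho, _ = spearmanr(if_full, if_trimmed)
```

A high rank correlation after trimming is what the study reports as robustness; the same computation extends directly to dropping the top 5 or top 10 papers.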
We measure individual productivity in two ways that are uncorrelated: as the number of articles per person and as the mean citation per article per person in the 2003-2011 period. We analyze the shape of the two types of individual productivity distributions in each field using size- and scale-independent indicators. To assess the skewness of productivity distributions we use a robust index of skewness, as well as the Characteristic Scores and Scales approach. For productivity inequality, we use the coefficient of variation. In each field, we study two samples: the entire population, and what we call "successful authors", namely, the subset of scientists whose productivity is above their field average. The main result is that, in spite of wide differences in production and citation practices across fields, the shape of field productivity distributions is very similar across fields. The parallelism of the results for the population as a whole and for the subset of successful authors, when productivity is measured as mean citation per article per person, reveals the fractal nature of the skewness of scientific productivity in this case. These results are essentially maintained when any article co-authored by two or more persons is fractionally assigned to each of them. (C) 2014 Elsevier Ltd. All rights reserved. Can altmetric data be validly used for the measurement of societal impact? The current study seeks to answer this question with a comprehensive dataset (about 100,000 records) from very disparate sources (F1000, Altmetric, and an in-house database based on Web of Science). In the F1000 peer review system, experts attach particular tags to scientific papers which indicate whether a paper could be of interest for science or rather for other segments of society. The results show that papers with the tag "good for teaching" do achieve higher altmetric counts than papers without this tag, if the quality of the papers is controlled. 
At the same time, a higher citation count is shown especially by papers with a tag that is specifically scientifically oriented ("new finding"). The findings indicate that papers tailored for a readership outside the area of research should lead to societal impact. If altmetric data is to be used for the measurement of societal impact, the question arises of its normalization. In bibliometrics, citations are normalized for the papers' subject area and publication year. This study has taken a second analytic step involving a possible normalization of altmetric data. As the results show there are particular scientific topics which are of especial interest for a wide audience. Since these more or less interesting topics are not completely reflected in Thomson Reuters' journal sets, a normalization of altmetric data should not be based on the level of subject categories, but on the level of topics. (C) 2014 Elsevier Ltd. All rights reserved. In this study, we identified and analyzed characteristics of top-cited single-author articles published in the Science Citation Index Expanded from 1991 to 2010. A top-cited single-author article was defined as an article that had been cited at least 1000 times from the time of its publication to 2012. Results showed that 1760 top-cited single-author articles were published in 539 journals listed in 130 Web of Science categories between 1901 and 2010. The top productive journal was Science and the most productive category was multidisciplinary physics. Most of the articles were not published in high-impact journals. Harvard University led all other institutions in publishing top-cited single-author articles. Nobel Prize winners contributed 7.0% of articles. In total, 72 Nobel Prize winners published 124 single-author articles. Single-authored papers published in different periods exhibited different patterns of citation trends. 
However, top-cited articles consistently showed repetitive peaks regardless of the time period of publication. "Theory (or theories)" was the most frequently appearing title word of all time. Leading title words varied at different time periods, and only five title words, method(s), protein(s), structure(s), molecular, and quantum, consistently remained in the top 20 in different time periods. (C) 2014 Elsevier Ltd. All rights reserved. Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer), this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if it proves to be lognormally distributed. (C) 2014 Elsevier Ltd. All rights reserved.
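The recommended strategy in the regression abstract above (add one to the citation counts, take the log, then fit ordinary least squares) can be sketched on simulated discrete lognormal data. The data-generating parameters below are made up for illustration; the point is that OLS on log(1 + citations) approximately recovers the true coefficient.

```python
import math
import random

random.seed(42)

# Simulate discrete lognormal citation counts driven by one covariate x:
# log-citations are linear in x plus Gaussian noise, then rounded to integers.
n = 5000
x = [random.uniform(0, 1) for _ in range(n)]
cites = [round(math.exp(3.0 + 1.0 * xi + random.gauss(0, 0.4))) for xi in x]

# Recommended strategy: regress log(1 + citations) on x with ordinary least squares.
y = [math.log1p(c) for c in cites]
mx = sum(x) / n
my = sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
print(slope)  # close to the true value 1.0, mildly shrunk by the +1 shift
```

The "+1" shift slightly attenuates the slope for small counts (d log(1+z)/d log z = z/(1+z) < 1), which is why the estimate comes out a little below the true coefficient.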
The distribution of cumulative citations L and contributed citations L_f to individual multi-authored papers published by selected authors working in different scientific disciplines is analyzed and discussed using a Langmuir-type function: y(n) = y_0[1 - alpha*K*n/(1 + K*n)], where y(n) denotes the total number of normalized cumulative citations l_n* and normalized contributed citations l_nf* received by individual papers of rank n, y_0 is the maximum value of y(n), attained at n = 0, alpha >= 1 is an effectiveness parameter, and K is the Langmuir constant related to the dimensionless differential energy Q = ln(KN_c), with N_c as the number of papers receiving citations. Relationships between the values of the Langmuir constant K of the distribution function, the number N_c of papers of an individual author receiving citations and the effectiveness parameter alpha of this function, obtained from analysis of the data of rank-size distributions of the authors, are investigated. It was found that: (1) the quantity KN_c obtained from the real citation distribution of papers of various authors working in different disciplines is inversely proportional to (alpha - 1) with a proportionality constant (KN_c)_0 < 1, (2) the relation KN_c = (KN_c)_0/(alpha - 1) also holds for the citation distribution of journals published in countries of two different groups, investigated earlier (Sangwal, K. (2013). Journal of Informetrics, 7, 487-504), and (3) deviations of the real citation distribution from curves predicted by the Langmuir-type function are associated with changing activity of sources of generation of items (citations). (C) 2014 Elsevier Ltd. All rights reserved. We study the correlation between citation-based and expert-based assessments of journals and series, which we collectively refer to as sources.
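The Langmuir-type rank-size function from the abstract above can be evaluated directly. The parameter values below (y_0 = 100, alpha = 1.1, K = 0.2) are made up for illustration; note that with alpha > 1 the curve reaches zero at the finite rank n = 1/(K(alpha - 1)), consistent with a finite number of cited papers.

```python
def langmuir_citation(n, y0, alpha, K):
    # Langmuir-type rank-size function: y(n) = y0 * [1 - alpha*K*n / (1 + K*n)].
    return y0 * (1 - alpha * K * n / (1 + K * n))

# Illustrative (made-up) parameters: y0 = 100 citations at rank 0,
# effectiveness alpha = 1.1, Langmuir constant K = 0.2.
# With these values y(n) hits zero at n = 1/(K*(alpha - 1)) = 50.
for rank in [0, 1, 5, 20, 50]:
    print(rank, round(langmuir_citation(rank, 100, 1.1, 0.2), 1))
```

The monotone decay from y_0 toward zero mirrors how an author's most-cited paper (rank 0) dominates and the tail of rarely cited papers fades out.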
The source normalized impact per paper (SNIP), the SCImago Journal Rank 2 (SJR2) and the raw impact per paper (RIP) indicators are used to assess sources based on their citations, while the Norwegian model is used to obtain expert-based source assessments. We first analyze - within different subject area categories and across such categories - the degree to which RIP, SNIP and SJR2 values correlate with the quality levels in the Norwegian model. We find that sources at higher quality levels on average have substantially higher RIP, SNIP, and SJR2 values. Regarding subject area categories, SNIP seems to perform substantially better than SJR2 from the field normalization point of view. We then compare the ability of RIP, SNIP and SJR2 to predict whether a source is classified at the highest quality level in the Norwegian model or not. SNIP and SJR2 turn out to give more accurate predictions than RIP, which provides evidence that normalizing for differences in citation practices between scientific fields indeed improves the accuracy of citation indicators. (C) 2014 Elsevier Ltd. All rights reserved. The percentile-based rating scale P100 describes the citation impact in terms of the distribution of unique citation values. This approach has recently been refined by considering also the frequency of papers with the same citation counts. Here I compare the resulting P100' with P100 for an empirical dataset and a simple fictitious model dataset. It is shown that P100' is not much different from standard percentile-based ratings in terms of citation frequencies. A new indicator P100'' is introduced. (C) 2014 Elsevier Ltd. All rights reserved. The delivery of personalized news content depends on the ability to predict user interests. We evaluated different methods for acquiring user profiles based on declared and actual interest in various news topics and items.
In an experiment, 36 students rated their interest in six news topics and in specific news items and, over 6 days, read standard nonpersonalized editions and personalized (basic or adaptive) news editions. We measured subjective satisfaction with the editions and expressed preferences, along with objective measures, to infer actual interest in items. Users' declared interest in news topics did not strongly predict their actual interest in specific news items. Satisfaction with all news editions was high, but participants preferred the personalized editions. User interest was weakly correlated with reading duration, article length, and reading order. Different measures predicted interest in different news topics. Explicit measures predicted interest in relatively clearly defined topics such as sports, but were less appropriate for broader topics such as science and technology. Our results indicate that explicit and implicit methods should be combined to generate user profiles. We suggest that a personalized newspaper should contain both general information and personalized items, selected based on specific combinations of measures for each of the different news topics. Based on the findings, we present a general model to decide on the personalization of news content to generate personalized editions for readers. Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources.
We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China. This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary; statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure.
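The Jensen-Shannon divergence used above as a term-association measure compares two probability distributions symmetrically and is always finite. A minimal sketch, with made-up term-context distributions standing in for a query term and a candidate translation:

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence (natural log); terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    # JSD(p, q) = 0.5*KL(p, m) + 0.5*KL(q, m), with m the midpoint distribution.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy term-context distributions (made-up) over a 4-word context vocabulary.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.4, 0.3, 0.2, 0.1]
print(jensen_shannon(p, q))  # small value => strongly associated terms
```

Because the midpoint m is never zero where either input is nonzero, JSD avoids the infinities of raw KL divergence, which is one reason it suits sparse term statistics.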
Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines. Pseudorelevance feedback (PRF) was proposed to overcome the main limitation of relevance feedback (RF): its reliance on a user-in-the-loop process. In PRF, the top-k retrieved images are regarded as pseudorelevant. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine these factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and the similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on the NUS-WIDE-LITE and Caltech 101+Corel 5000 data sets show that the optimal feature representation is color moment+wavelet texture in terms of retrieval efficiency and effectiveness. Regarding the other representation choices, using the top-20 ranked images as the pseudopositive and pseudonegative feedback sets, with equal weight (i.e., 0.5), and the correlation and cosine distance functions produces the optimal retrieval result. Personalization of information retrieval tailors search towards individual users to meet their particular information needs by taking into account information about users and their contexts, often through implicit sources of evidence such as user behaviors.
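The Rocchio update at the heart of the PRF abstract above moves the query vector toward the centroid of the pseudopositive set and away from the centroid of the pseudonegative set. A minimal sketch, with the equal 0.5 weights mentioned in the abstract and made-up toy feature vectors:

```python
def rocchio_update(query, pos_docs, neg_docs, beta=0.5, gamma=0.5):
    # Rocchio refinement: q' = q + beta*centroid(pos) - gamma*centroid(neg).
    # The original-query weight (often called alpha) is fixed at 1 here.
    dim = len(query)
    pos_centroid = [sum(d[i] for d in pos_docs) / len(pos_docs) for i in range(dim)]
    neg_centroid = [sum(d[i] for d in neg_docs) / len(neg_docs) for i in range(dim)]
    return [query[i] + beta * pos_centroid[i] - gamma * neg_centroid[i]
            for i in range(dim)]

# Toy 3-dimensional feature vectors (made-up), e.g. color/texture image features.
query = [0.2, 0.5, 0.1]
pseudo_pos = [[0.4, 0.6, 0.0], [0.6, 0.4, 0.2]]   # top-ranked images
pseudo_neg = [[0.0, 0.1, 0.9]]                    # low-ranked images
print(rocchio_update(query, pseudo_pos, pseudo_neg))
```

In a PRF setting the pos_docs are simply the top-k results of the initial query, so the whole update runs with no user involvement.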
This study looks at users' dwelling behavior on documents and several contextual factors: the stage of users' work tasks, task type, and users' knowledge of task topics, to explore whether or not taking contextual factors into account could help infer document usefulness from dwell time. A controlled laboratory experiment was conducted with 24 participants, each coming 3 times to work on 3 subtasks in a general work task. The results show that task stage could help interpret certain types of dwell time as reliable indicators of document usefulness in certain task types, as could topic knowledge, and the latter played a more significant role when both were available. This study contributes to a better understanding of how dwell time can be used as implicit evidence of document usefulness, as well as how contextual factors can help interpret dwell time as an indicator of usefulness. These findings have both theoretical and practical implications for using behaviors and contextual factors in the development of personalization systems. Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor-based scheme works in two phases: first, it finds the most similar neighbors of a new image from the set of labeled images; then, it propagates the keywords associated with the neighbors to the new image. In this article, we propose a novel approach for image annotation, which simultaneously improves both phases of the nearest-neighbor-based scheme. In the phase of neighbor search, unlike existing work that discovers the nearest neighbors with a predicted distance, we introduce a ranking-oriented neighbor search mechanism (RNSM), where the ordering of labeled images is optimized directly without going through the intermediate step of distance prediction.
In the phase of keyword propagation, unlike existing work that uses simple heuristic rules to select the propagated keywords, we present a learning-based keyword propagation strategy (LKPS), where a scoring function is learned to evaluate the relevance of keywords based on their multiple relations with the nearest neighbors. Extensive experiments on the Corel 5K data set and the MIR Flickr data set demonstrate the effectiveness of our approach. With the ubiquity of the Internet and the rapid development of Web 2.0 technology, social question-and-answer (SQA) websites have become popular knowledge-sharing platforms. As the number of posted questions and answers (QAs) continues to increase rapidly, the massive amount of question-answer knowledge is causing information overload. The problem is compounded by the growing number of redundant QAs. SQA websites such as Yahoo! Answers are open platforms where users can freely ask or answer questions. Users may also wish to learn more about the information provided in an answer so they can use related keywords in the answer to search for extended, complementary information. In this article, we propose a novel approach to identify complementary QAs (CQAs) of a target QA. We define two types of complementarity: partial complementarity and extended complementarity. First, we utilize a classification-based approach to predict complementary relationships between QAs based on three measures: question similarity, answer novelty, and answer correlation. Then we construct a CQA network based on the derived complementary relationships. In addition, we introduce a CQA network analysis technique that searches the QA network to find direct and indirect CQAs of the target QA. The results of experiments conducted on the data collected from Yahoo! Answers Taiwan show that the proposed approach can more effectively identify CQAs than can the conventional similarity-based method.
Case and user study results also validate the helpfulness and the effectiveness of our approach. The intent of this article is to use cooperative game theory to predict the level of social impact that citation networks create for scholarly papers. The social impact of papers can be defined as the net effect of citations on a network. A publication exerts direct and indirect influence on others (e.g., on citing articles) and is itself influenced directly and indirectly (e.g., by cited articles). This network leads to an influence structure of citing and cited publications. Drawing on cooperative game theory, our research problem is to translate into mathematical equations the rules that govern the social impact of a paper in a citation network. In this article, we show that when citation relationships between academic papers function within a citation structure, the result is social impact instead of the (individual) citation impact of each paper. Mathematical equations explain the interaction between papers in such a citation structure. The equations show that the social impact of a paper is affected by the (individual) citation impact of citing publications, the immediacy of citing articles, and the number of both citing and cited papers. Examples are provided for several academic papers. This study aims to understand the learning experience of English language learners (ELLs) within the framework of Kuhlthau's Information Search Process (ISP). Forty-eight ELL students from three classes at a high school participated in the study while they conducted a research project in English. Data were collected through a demographic questionnaire and process surveys. Students' demographic information, knowledge about their research topic, labeling of knowledge, estimate of interest and knowledge, and learning outcomes were collected and analyzed with content analysis and statistical techniques.
The findings indicate that ELL students, as a whole group, showed significant increases in their topical knowledge and estimate of interest and knowledge as they progressed in the research project, which is consistent with what other ISP-based studies found. When three different English proficiency-level groups were compared, only the intermediate group showed significant increases in topical knowledge and estimate of knowledge throughout the process. Also, different research tasks impacted the amount and substance of knowledge students built and their estimated knowledge during the research project. The findings led to suggestions for instructional strategies such as learning goals reflecting various kinds of learning, differentiated instruction in mixed-ability classrooms, structured interventions, personalized research topics, and teacher-school librarian collaborations. Citizen access to information, particularly scientific information used for public policy discussions and decision making, is important in a democracy. However, access to this information can sometimes be restricted or blocked in various ways. This research adopts Jaeger and Burnett's (2005) conceptualization of information access as "the presence of a robust system through which information is made available to citizens and others" (p. 465), with physical, intellectual, and social components. This provides the conceptual framework through which incidents of restricted access to science policy (RASP) were analyzed in a comparative case study. The research found that citizens' physical, intellectual, and social access to scientific research was restricted in these cases. Furthermore, the theoretical framework of democratic accountability held normative and symbolic power for the respondents; although democratic accountability did not accurately predict respondents' actions, it was used as a significant justification for their actions.
This research suggests that the conceptual framework of information access (with physical, intellectual, and social components) and the theoretical framework of democratic accountability (although primarily normative) may be useful approaches to subsequent investigations of censorship and restricted access in other situations and research areas. We identify the effects of specific organizational norms, arrangements, and policies regarding uses of social technologies for informal knowledge sharing by consultants. For this study, the term social technologies refers to the fast-evolving suite of tools such as traditional applications like e-mail, phone, and instant messenger; emerging social networking platforms (often known as social media) such as blogs and wikis; public social networking sites (e.g., Facebook, Twitter, and LinkedIn); and enterprise social networking technologies that are specifically hosted within one organization's computing environment (e.g., Socialtext). Building from structuration theory, the analysis presented focuses on the knowledge practices of consultants related to their uses of social technologies and the ways in which organizational norms and policies influence these practices. A primary contribution of this research is a detailed contextualization of social technology uses by knowledge workers. As many organizations are allowing social media-enabled knowledge sharing to develop organically, most corporate policy toward these platforms remains defensive, not strategic, limiting opportunities. Implications for uses and expectations of social technologies arising from this research will help organizations craft relevant policies and rules to best support technology-enabled informal knowledge practices. With the increasing pressure on researchers to produce scientifically rigorous and relevant research, researchers need to find suitable publication outlets with the highest value and visibility for their manuscripts.
Traditional approaches for discovering publication outlets mainly focus on manually matching research relevance in terms of keywords as well as comparing journal qualities, but other research-relevant information, such as the social connections, publication rewards, and productivity of authors, is largely ignored. To assist in identifying effective publication outlets and to support effective journal recommendations for manuscripts, a three-dimensional profile-boosted research analytics framework (RAF) that holistically considers relevance, connectivity, and productivity is proposed. To demonstrate the usability of the proposed framework, a prototype system was implemented using the ScholarMate research social network platform. Evaluation results show that the proposed RAF-based approach outperforms traditional recommendation techniques that can be applied to journal recommendations in terms of quality and performance. This research is the first attempt to provide an integrated framework for effective recommendation in the context of scientific item recommendation. Previous studies of international scientific collaboration have rarely gone beyond revealing the structural relationships between countries. Considering how scientific collaboration is actually initiated, this study focuses on the organization and sector levels of international coauthorship networks, going beyond a country-level description. Based on a network analysis of coauthorship networks between members of the Organisation for Economic Co-operation and Development (OECD), this study attempts to gain a better understanding of international scientific collaboration by exploring the structure of the coauthorship network in terms of university-industry-government (UIG) relationships, the mode of knowledge production, and the underlying dynamics of collaboration in terms of geographic, linguistic, and economic factors.
The results suggest that the United States showed overwhelming dominance in all bilateral UIG combinations with the exception of the government-government (GG) network. Scientific collaboration within the industry sector was concentrated in a few players, whereas that between the university and industry sectors was relatively less concentrated. Despite the growing participation from other sectors, universities were still the main locus of knowledge production, with the exception of 5 countries. The university sector in English-speaking wealthy countries and the government sector of non-English-speaking, less-wealthy countries played a key role in international collaborations between OECD countries. The findings did not provide evidence supporting the institutional proximity argument. Wikipedia, like other encyclopedias, includes biographies of notable people. However, because it is jointly written by many contributors, it is subject to constant manipulation by contributors attempting to add biographies of non-notable people. Over time, Wikipedia has developed inclusion criteria for notable people (e.g., receiving a significant award) based on which newly contributed biographies are evaluated. In this paper we present and analyze a set of simple indicators that can be used to predict which articles will eventually be accepted. These indicators do not refer to the content itself, but to meta-content features (such as the number of categories that the biography is associated with) and to author-based features (such as whether it is a first-time author). By training a classifier on these features, we reached high predictive performance (area under the receiver operating characteristic [ROC] curve [AUC] of 0.97) even though we ignored the actual biography text. Ranking journals is an important exercise in academia.
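The AUC figure reported for the biography classifier above has a simple rank interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch with made-up labels and scores (any real classifier's scores could be plugged in):

```python
def auc_score(labels, scores):
    # AUC as a rank statistic (equivalent to the Mann-Whitney U formulation):
    # probability that a random positive outscores a random negative; ties count half.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: classifier scores over accepted (1) / rejected (0) biographies.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auc_score(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is what makes 0.97 a strong result for content-free features.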
While several approaches to rank journals exist, an inherent assumption of these approaches is that there is indeed a hierarchy of journals, which is captured by the methods used for ranking them. We address a more fundamental question: Is there a linear hierarchy within journals? In this article, we introduce the dominance ranking approach, which investigates the extent of hierarchy in a given set of objects by examining the extent of intransitivity in the system of interactions. We test the efficacy of the approach by ranking information systems journals based on citation data spanning a 3-year period from 2009 to 2011. Results indicate that the approach is very effective in identifying the extent of hierarchy within journals, and subsequently in ranking the journals. With its statistical underpinnings, the approach brings greater objectivity to the ranking of journals than prior approaches. Bibliometric studies often measure and compare scholarly performance, but they rarely investigate why universities, departments, and research groups have different performance. In this paper we try to explain differences in the scholarly performance of research groups in terms of organizational variables. In order to do this, we extensively review the relevant literature and develop a model using two theoretical approaches. A multivariate analysis shows which of the independent variables play a role in the various scholarly performance dimensions. The study shows what organizational strategies may help in optimizing performance in various dimensions. Implications are discussed. Technology management (TM) is multidisciplinary in nature. This paper investigates the multidisciplinary characteristics of TM through journal citation network analysis. The TM network, composed of ten TM specialty journals and relevant journals of other disciplines, is constructed based on their citation relationships.
In particular, the relatedness index is employed to capture the citation relationships between journals with consideration of different journal sizes. Scrutinizing the network reveals which disciplines have contributed to TM and to which disciplines TM has contributed. The role of TM journals in exchanging knowledge with other disciplines is also identified by using brokerage analysis. TM is shown to have a high degree of interaction with six disciplines: Business and Management, Marketing, Economics, Planning and Development, Information Science, and Industrial Engineering and Operations Research. It is shown that visualizing and analyzing the TM network can provide an excellent overview of its multidisciplinary structure in terms of knowledge flow. This can help TM researchers easily grasp the historical development and fundamental features of TM. Patent search is a substantial basis for many operational questions and scientometric evaluations. We consider it as a sequence of distinct stages. The "patent wide search" involves a definition of system boundaries by means of classifications and a keyword search producing a patent set with a high recall level (see Schmitz in Patentinformetrie: Analyse und Verdichtung von technischen Schutzrechtsinformationen, DGI, Frankfurt (Main), 2010 for an overview of searchable patent metadata). In this set of patents a "patent near search" takes place, producing a patent set with high(er) precision. Hence, the question arises of how the researcher should operate within this patent set to efficiently identify patents that contain paraphrased descriptions of the sought inventive elements in contextual information, and whether this produces different results compared to a conventional search. We present a semiautomatic iterative method for the identification of such patents, based on semantic similarity. In order to test our method we generate an initial dataset in the course of a patent wide search.
This dataset is then analyzed by means of the semiautomatic iterative method as well as by an alternative method emulating the conventional process of keyword refinement. It thus becomes obvious that both methods have their particular "raison d'être", and that the semiautomatic iterative method seems to be able to support a conventional patent search very effectively. Research output and impact metrics derived from commercial citation databases such as Web of Science and Scopus have become the de facto indicators of scholarly performance across different disciplines and regions. However, it has been pointed out that the existing metrics are largely inadequate to reflect scholars' overall peer-mediated performance, especially in the social sciences and humanities (SSH), where publication channels are more diverse. In this paper, alternative metrics exploring a variety of formal and informal communication channels are proposed, with the aim of better reflecting SSH scholarship. Data for a group of SSH scholars in Taiwan on these metrics were collected. Principal component analysis revealed four underlying dimensions represented by the 18 metrics. Multiple-regression analyses were then performed to examine how well each of these dimensions predicted the academic standing of the scholars, measured by the number of public grants awarded and prestigious research awards received. Differences in the significance of the predictors were found between the social sciences and humanities. The results suggest the need to consider disciplinary differences when evaluating scholarly performance. Certain scholarly publications or patent publications may signal breakthroughs in basic scientific research or radical new technological developments. Are there bibliographic indicators that enable an analysis of R&D dynamics to help identify these 'local revolutions' in science and technology?
The focus of this paper is on early-stage identification of potential breakthroughs in science that may evolve into new technology. We analyse bibliographic information for a typical example of such a breakthrough to pinpoint information that has the potential to be used as a bibliographic indicator. The typical example used is the landmark research paper by Novoselov et al. (Science 306(5696): 666-669, 2004) concerning graphene. After an initial accumulation of theoretical knowledge about graphene over a period of 50 years, this publication of the discovery of a method to produce graphene had an immediate and significant impact on the R&D community; it provides a link between theory, experimental verification, and new technological applications. The publication of this landmark discovery marks a sharp rise in the number of scholarly publications, and not much later an increase in the number of filings for related patent applications. Noticeable within 2 years after publication is an above-average influx of researchers and of organisations. Changes in the structure of co-citation term maps point to renewed interest from theoretical physicists. The analysis uncovered criteria that can help in identifying at an early stage potential breakthroughs that link science and technology. Scholarly publications reify fruitful collaborations between co-authors. A branch of research in the science studies focuses on analyzing the co-authorship networks of established scientists. Such studies tell us about how their collaborations developed through their careers. This paper updates previous work by reporting a transversal and a longitudinal study spanning the lifelong careers of a cohort of researchers from the DBLP bibliographic database. We mined 3,860 researchers' publication records to study the evolution patterns of their co-authorships. Two features of co-authors were considered: (1) their expertise, and (2) the history of their partnerships with the sampled researchers.
Our findings reveal the ephemeral nature of most collaborations: 70 % of the new co-authors were one-shot partners who did not collaborate on any further publications. Overall, researchers consistently extended their co-authorships (1) by steadily enrolling beginning researchers (i.e., people who had never published before), and (2) by increasingly working with confirmed researchers with whom they had already collaborated. The scientific problem of this study is the analysis of the portfolio of outputs of public research labs in the presence of a hybrid funding scheme based on public and market-oriented financing mechanisms. Research institutes are considered Decision Making Units, which produce two different kinds of scientific outputs using inputs. We consider some scientific outputs with more international visibility (High Visibility Outputs, HVOs) than others, called Low Visibility Outputs (LVOs). We confront this problem with a scientometric approach applying models of the Directional Output Distance Function, which endeavours to measure and analyze the effects of hybrid financing of public research labs in terms of the potential loss of high-quality scientific outputs, in particular when the share of market-oriented funds is beyond a specific threshold. Results for R&D organizations in the "hard sciences" seem to show that a hybrid financing scheme that is too market-oriented to support the operation (and survival) of research labs tends to affect the scientific output portfolio by lowering scientific performance and HVOs. The study also proposes a preliminary analysis of the optimal level of market financing in relation to total financial resources for a fruitful co-existence of market and public funding schemes that maximizes the scientific output (publications) of R&D labs. The findings show main differences across scientific departments as well as some critical weaknesses and threats for public research labs in the production of scientific outputs.
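The threshold analysis described above suggests an inverted-U relationship between the share of market-oriented funding and scientific output. The paper's Directional Output Distance Function models are not reproduced here; instead, the sketch below illustrates the simpler idea of locating an optimal funding share by fitting a concave quadratic to hypothetical lab-level data (all numbers and names are illustrative assumptions, not the authors' data).

```python
import numpy as np

def optimal_market_share(shares, outputs):
    """Fit a quadratic output = a*s^2 + b*s + c to (share, output) pairs and
    return the share s* = -b / (2a) that maximises predicted output."""
    a, b, c = np.polyfit(shares, outputs, deg=2)
    if a >= 0:
        raise ValueError("fitted curve is not concave; no interior optimum")
    return -b / (2 * a)

# Hypothetical lab-level data: output first rises with some market funding
# (extra resources) and then falls as the mix becomes too market-oriented.
shares = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
outputs = 10 + 8 * shares - 20 * shares**2   # synthetic concave relation

s_star = optimal_market_share(shares, outputs)
print(round(float(s_star), 3))  # optimum share of market funding
```

The quadratic is only one possible functional form; the point is that a threshold analysis reduces to locating the turning point of a fitted concave curve.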
International collaboration has become a strategic policy initiative for building scientific competency in different countries. This is driven by the increasing realisation that no country possesses all the wherewithal to address the complexities of scientific research, dedicate huge funding, and confront global challenges. Varied institutional mechanisms have been created by different countries for strategising international collaboration, such as signing bilateral agreements and initiating dedicated programs with partner countries in different S&T areas. Some countries have further deepened their relationships by creating bilateral S&T organisations/specialised centres. The role of bilateral organisations in strengthening inter-country research and innovation partnerships is not explicitly underscored in collaboration studies. The present study addresses this issue by taking up the case study of a bilateral organisation, IFCPAR/CEFIPRA (Indo-French Centre for Promotion of Advanced Research/Centre Franco-Indien pour la Promotion de la Recherche Avancée), which was established by India and France in 1987 to support their science and technology partnership. Through this case study the paper draws insights into inter-country collaboration in S&T and shows how its dynamics and structural aspects are affected by a bilateral organisation. The article discusses the scientific output of the three South Caucasus republics: Armenia, Azerbaijan and Georgia (Armenia, Azerbaijan and Georgia are widely referred to as the Transcaucasia Republics or South Caucasus Republics). It focuses on the scientific publications of Armenia, Azerbaijan and Georgia indexed in the Web of Science international database. The article first examines the role of the three republics in Soviet science and the scientific papers they produced during the last decade of the Union of Soviet Socialist Republics.
The article then studies the scientific situation in Armenia, Azerbaijan and Georgia after the restoration of their independence in 1991, reviewing the three republics' scientific publications, their citations and their scientific cooperation, as well as other scientific indicators. Many studies have found that co-authored research is more highly cited than single-author research. This finding is policy relevant as it indicates that encouraging co-authored research will tend to maximise citation impact. Nevertheless, whilst the citation impact of research increases as the number of authors increases in the sciences, the extent to which this occurs in the social sciences is unknown. In response, this study investigates the average citation level of articles with one to four authors published in 1995, 1998, 2001, 2004 and 2007 in 19 social science disciplines. The results suggest that whilst having at least two authors gives a substantial citation impact advantage in all social science disciplines, additional authors are beneficial in some disciplines but not in others. This pioneering approach to the subject area of Information Literacy Assessment in Higher Education (ILAHE) aims at gaining further knowledge about its scope from a terminological-spatial perspective and also at weighting and categorizing relevant terms on the basis of levels of similarity. From a retrospective and selective search, the bibliographic references of scientific literature on ILAHE were obtained from the most representative databases (LISA, ERIC and WOS), covering the period 2000-2011 and restricting results to the English language. Keywords in titles, descriptors and abstracts of the selected items were labelled and extracted with the Atlas.ti software. The main research topics in this field were determined through a co-word analysis and graphically represented with the software VOSviewer.
The results showed two areas of different density and five clusters involving the following issues: evaluation-education, assessment, students-efficacy, learning-research, and library. This method has facilitated the identification of the main research topics in ILAHE and their degree of proximity and overlapping. This paper provides a comprehensive comparative analysis of the South East European (SEE) countries' scientific output and impact by Frascati fields of science in the period 2005-2010. The aim is to determine the volume of scientific output in the mentioned period, the level of development of certain scientific fields in the selected countries and the quality of scientific publication production. The SEE countries' scientific performance is examined on several indicators, including the total number of country publications per full-time equivalent researcher, revealed publication advantage, the h-index and top cited articles. The results of the study could be especially significant to planners and policy-makers because they provide facts important for the long-term S&T planning of a country. Research dissemination in the Computer Science domain depends heavily on conference publications. The review processes of major conferences are rigorous, and the work presented in those venues has more visibility and receives more citations than work in many journals, with the advantage of a faster dissemination of ideas. We consider that any evaluation system in the Computer Science domain must treat conferences as having the same importance as journals. This makes the evaluation of venues an important issue. While journals are usually evaluated through their Impact Factor, there is no widely accepted method for evaluating conferences. In our work we analyzed the possibility of using machine learning techniques to extend an existing ranking to new conferences, based on a set of measurements that are available for the majority of venues.
Our proposal consists of applying a machine learning technique, self-organizing maps, with some extensions, in order to classify new conferences based on an existing ranking. We also try to estimate the theoretical maximal accuracy that can be obtained using statistical learning techniques. In this exploratory study, we analyze co-authorship networks of collaborative cancer research in India. The complete network is constructed from bibliometric data on published scholarly articles indexed in two well-known electronic databases, covering two 6-year windows from 2000 to 2005 and 2006 to 2011 inclusive. Employing a number of important metrics pertaining to the underlying topological structures of the network, we discuss implications for effective policies to enhance knowledge generation and sharing in cancer research in the country. With some modifications, our methods can be applied without difficulty to examine the policy structure of related disciplines in other countries of the world. The present paper introduces two independent concepts. X-centage is a statistical indicator characterizing distributions of percentage-valued variables in a vein similar to Hirsch's h-index. Heterodisciplinarity is a measure of polydisciplinarity using the disciplinary categorization of references and/or citations. The Journal Citation Reports database is used for an empirical study that applies the X-centage to measuring the reference heterodisciplinarity of science fields. The aim of this study is to examine how scientific collaborative features influence scientific collaboration networks and then affect scientific output. In order to explore the influence of scientific collaboration, we define three collaborative features: inertia, diversity and strength. The data are collected from Scopus and the Web of Science databases.
Using the technique for order preference by similarity to ideal solution (TOPSIS), we first combine the h-index, impact factor and SCImago Journal Rank to rank journals in the field of wind power. Then we construct the collaboration network of institutions and use structural equation modelling with partial least squares to examine the relationships among collaborative features, network structure, and scientific output. The results show that collaborative diversity and strength have positive effects on scientific output, while collaborative inertia has a negative effect. Both centrality and structural holes fully account for (mediate) the relationships between collaborative features and outputs. The findings have some important policy implications for scientific collaboration: (1) research institutions should actively participate in diverse collaborations; (2) rather than only collaborating with previous partners, they should seek more new partners; and (3) collaborative features are important antecedents of scientific networks. The global number of papers in different areas has increased over the years. Additionally, changes in academic production scenarios, such as the decrease in the relative number of single-authored (SA) papers, have been observed. Thus, the aims of this study are to assess the trend of SA papers in four subareas of biology and also to estimate the year when 0.1 % of papers in these subareas will be SA (considering two adjusted models). The subareas investigated were Ecology, Genetics, Zoology and Botany. Our hypothesis is that all subareas show a decay in the number of SA papers; however, this pattern is more pronounced in subareas that were originally interdisciplinary (Genetics and Ecology) than in disciplinary areas (Zoology and Botany). In fact, SA papers have declined over the years in all subareas of biology, and according to the best model (Akaike criteria), the first area that will have 0.1 % SA papers is Genetics, followed by Ecology.
A partial regression indicates that the decrease in SA papers can be related to the increase in the number of authors and the number of citations, suggesting the greater scientific impact of interdisciplinary research. However, other variables (e.g., political, linguistic and behavioral) can contribute to the decrease in SA papers. Lastly, we conclude that the number of SA papers in all subareas of biology might continue decreasing in the coming years, becoming rare, perhaps even to the point of extinction (to use a very common term in biology). In addition, all subareas of biology have become more interdisciplinary, combining the knowledge of various authors (and perhaps authors from different areas). The consequence of this approach is increasingly collaborative work, which may facilitate the increased success of the group. Based on a study of 2,217,047 references in 280,280 source articles in the Chinese Social Science Citation Index for the years 2006-2008, we identified the overall aging phenomenon of the humanities and social sciences by means of synchronous citation analysis, and compared the aging laws of seven disciplines. The results reveal that the aging speed of the seven disciplines roughly descends in the following order: Management, Economics, Education, Law, Literature, Philosophy, History. This is because the aging speed of the humanities is slower than that of the social sciences, and the dependence of History and Philosophy on archival literature is the strongest. Moreover, each discipline of the humanities and social sciences follows a basic relation: half-life (H) × Price index (P) = constant C, where C is approximately 2.6. Furthermore, the maximum citation age of the humanities and social sciences at this stage is found to be about 3 years. Rare earth elements (REE) are needed to produce many cutting-edge products, and their depletion is a major concern. In this paper, we identify unique characteristics of REE-related patents granted from 1975 to 2013 in five large patent offices around the world.
Through topic detection and clustering of patent text, we found that purification processes related to oxides, nitrogen oxide, and exhaust gas were highlighted in the Korean Intellectual Property Office and the Japan Patent Office (JPO). Molecular sieve, dispersion, and preparation methods involving yttrium, cerium, methane, zirconium, and ammonia were prominent in the China Patent and Trademark Office (CPTO) in the areas of performing operations and transporting. Quadratic assignment procedure correlation analysis was performed for IPC co-occurrence among REE patents in different offices, and the United States Patent and Trademark Office showed significantly different patterns from the CPTO and JPO. Furthermore, using betweenness centrality as an indicator of technology transition, the manufacture and treatment of nanostructures, nanotechnology for materials and surface science, and electrodes were identified as important REE technologies to be protected in Korea. In Japan, the technological areas identified as important for protection were the apparatuses and processes of manufacturing or assembling devices, compounds of iron, and materials. Our study results offer insights into national strategies for REE-related technologies in each country. Null hypothesis statistical significance tests (NHST) are widely used in quantitative research in the empirical sciences, including scientometrics. Nevertheless, since their introduction nearly a century ago significance tests have been controversial. Many researchers are not aware of the numerous criticisms raised against NHST. As practiced, NHST has been characterized as a 'null ritual' that is overused and too often misapplied and misinterpreted. NHST is in fact a patchwork of two fundamentally different classical statistical testing models, often blended with some wishful quasi-Bayesian interpretations. This is undoubtedly a major reason why NHST is very often misunderstood.
But NHST also has intrinsic logical problems, and the epistemic range of the information provided by such tests is much more limited than most researchers recognize. In this article we introduce to the scientometric community the theoretical origins of NHST, which are mostly absent from standard statistical textbooks, and we discuss some of the most prevalent problems relating to the practice of NHST, tracing these problems back to the mix-up of the two different theoretical origins. Finally, we illustrate some of the misunderstandings with examples from the scientometric literature and bring forward some modest recommendations for a more sound practice in quantitative data analysis. The paper investigates the interdisciplinarity of scientific fields based on the graph of collaboration between researchers. A new measure of interdisciplinarity is proposed that takes into account graph content and structure. Similarity between science categories is estimated based on the text similarity between their descriptions. The proposed new measure is applied in an exploratory analysis of the research community in Slovenia. We found that Biotechnology and Natural sciences are the most interdisciplinary in their publications and collaborations on research projects. In addition, the evolution of the interdisciplinarity of scientific fields in Slovenia is observed, showing that over the last decade interdisciplinarity has increased fastest in Medical sciences, mainly due to collaborations with Natural and Technical sciences. In recent years, numerous studies have been published which have used bibliometric data to look at collaborations in research. This study presents a proposal with which the topical connections of the institutions of an organization can be investigated through analysis of co-authorships, direct citation links, and co-citations.
Based on various bibliometric data sets for an organization whose institutions are used as an example, this study illustrates the possibility of comparing the self-perception of the institutions of this organization (co-authorships, direct citation links) with a view to (possible) mutual collaboration with the external perception (co-citations). This comparison is made firstly for the whole organization with the aid of network graphs; secondly, the comparison is presented in a table for a specific institution and its (possible) collaborations in the organization. In particular, the tabular breakdown of the links between the institutions can provide concrete indications of possible further collaboration between institutions which has not yet manifested itself in co-authorships. Concepts and methods of complex networks have been employed to uncover patterns in a myriad of complex systems. Unfortunately, the relevance and significance of these patterns strongly depend on the reliability of the datasets. In the study of collaboration networks, for instance, unavoidable noise arises when authors share the same name. To address this problem, we derive a hybrid approach based on authors' collaboration patterns and topological features of collaborative networks. Our results show that the combination of strategies, in most cases, performs better than the traditional approach, which disregards topological features. We also show that the main factor accounting for the improvement in the discriminability of homonymous authors is the average shortest path length. Finally, we show that it is possible to predict the weighting associated with each strategy composing the hybrid system by examining the discrimination obtained from the traditional analysis of collaboration patterns. Because the methodology devised here is generic, our approach is potentially useful for classifying many other networked systems governed by complex interactions.
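A minimal sketch of the topological part of such a hybrid disambiguation approach: given a co-authorship graph and the co-author sets of two records signed with the same name, the average shortest path length between the two neighbourhoods can serve as a merging feature (small distances suggest the same person). The graph, names and feature values below are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Unweighted BFS shortest-path length; None if disconnected."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None

def mean_coauthor_distance(graph, coauthors_a, coauthors_b):
    """Average shortest-path length between the co-author sets of two
    ambiguous author records; small values suggest the same person."""
    dists = [shortest_path_length(graph, a, b)
             for a in coauthors_a for b in coauthors_b]
    dists = [d for d in dists if d is not None]
    return sum(dists) / len(dists) if dists else float("inf")

# Toy co-authorship graph as adjacency sets (undirected).
graph = {
    "ana":  {"bo", "cy"},
    "bo":   {"ana", "cy"},
    "cy":   {"ana", "bo"},
    "dana": {"eli"},
    "eli":  {"dana"},
}
# Two records signed "J. Smith": co-authors {ana} vs {bo, cy} (same cluster)
# and co-authors {ana} vs {dana} (disconnected component).
close = mean_coauthor_distance(graph, {"ana"}, {"bo", "cy"})
far = mean_coauthor_distance(graph, {"ana"}, {"dana"})
print(close, far)
```

In the hybrid scheme this distance would be combined (with a learned weighting) with traditional collaboration-pattern evidence rather than used alone.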
Through a bibliometric approach and citation analysis, this study analyzes the disciplines and subjects of the literature citing important information science journals during the period from 1998 to 2010. The four information science journals under study are the Journal of the American Society for Information Science and Technology, Information Processing and Management, the Journal of Information Science, and the Journal of Documentation. Ulrich's Periodicals Directory, Library of Congress Subject Headings retrieved from WorldCat, and the LISA database were used to identify the main classes, subclasses, and subjects of the citing journals. We also identify and analyze the highly citing journals and the main classes and subclasses of citing journals for the four journals under study, as well as highly cited subjects in journals related to library and information science. Overall, the knowledge flow out of the domain of information science goes mainly to information science itself, and also, at a lower percentage, to science and technology, with minor outputs to various other subjects. The comparison of knowledge flow into and out of the domain of information science reveals that the main knowledge flow is into information science itself. This comparison also reveals significant knowledge flow from computer science to information science. China's rise in science has been widely acknowledged. Yet we know little empirically about academic research focusing on China. Utilizing a uniquely constructed large-scale dataset, this paper explores China-related publications through bibliometric analysis. Our data suggest that not only interest in China but also knowledge about China has developed rapidly over the years. Despite an increasingly diverse profile of participants, the substantial rise of research focusing on China is largely limited to affluent regions and some geographically proximate neighbors of China.
The research discloses that overseas Chinese facilitate academic research focusing on China. The research foci of China-related studies have gradually shifted from social science to natural science and, in more recent years, to Chinese environmental issues, public health and the economy. This article examines the conceptual evolution of qualitative research in the field of marketing from 1956 to 2011, identifying the main themes and applications for which it has been used and the trends for the future. Science mapping analysis was employed, using co-word networks in a longitudinal framework. Science mapping analysis differs from other tools in that it includes the use of bibliometric indicators. The great number of studies published makes it possible to undertake a conceptual analysis of how qualitative marketing research has evolved. To show the conceptual evolution of qualitative marketing research, four study periods were chosen. The results made it possible to identify eight thematic areas that employ qualitative research in the field of marketing: Consumer behaviour, Supply chain management, Dynamic capabilities, Methodology, Media, Business to business marketing, International Marketing and Customer Satisfaction. Doctoral theses are an important source of publication in universities, although little research has been carried out on the publications resulting from theses, the so-called derivative articles. This study investigates how derivative articles can be identified through a text analysis based on the full text of a set of medical theses and the full text of the articles with which they shared authorship. The text similarity analysis methodology consisted in analysing the full-text articles according to the IMRaD organization of scientific discourse (Introduction, Methodology, Results and Discussion), using the TurnItIn plagiarism tool.
The study found that the text similarity rate in the Discussion section can be used to discriminate derivative articles from non-derivative articles. Additional findings were: the thesis author was the first author in 85 % of derivative articles; the participation of supervisors as coauthors occurred in 100 % of derivative articles; the authorship credit retained by the thesis author was 42 % in derivative articles; the number of coauthors per article was 5 in derivative articles versus 6.4 coauthors, on average, in non-derivative articles; and the time differential regarding the year of thesis completion showed that 87.5 % of derivative articles were published before or in the same year as thesis completion. The mean-based method may be the most popular linear method for field normalization of citation impact. However, the relatively good but not ideal performance of the mean-based method, plus its being a special case of the general scaling method y = kx and the more general affine method y = kx + b, implies that more effective linear methods may exist. Under the idea of making the citation distribution of each field approximate a common reference distribution through the transformation of the scaling method and the affine method with unknown parameters k and b, we derived the scaling and affine methods under separate unweighted and weighted optimization models for 236 Web of Science subject categories. While the unweighted-optimization-based scaling and affine methods did not show full advantages over the mean-based method, the weighted-optimization-based affine method showed a decided advantage over the mean-based method along most parts of the distributions. At the same time, the trivial advantage of the weighted-optimization-based scaling method over the mean-based method indirectly validated the good normalization performance of the mean-based method.
Based on these results, we conclude that the mean-based method is acceptable for general field normalization, but where there are higher demands on the normalization effect, the weighted-optimization-based affine method may be a better choice. This study characterizes the volume and visibility of Latin American scientific output in the area of Public Health, through a combined analysis of bibliometric, socioeconomic and health indicators for the top 10 Latin American producers of documents. The information was obtained from the SCImago Institutions Rankings (SIR) portal, based on Scopus data, in the category Public Health, Environmental and Occupational Health of the area Medicine, for the period 2003-2011. Our scientometric analysis involved a set of quantitative indicators (based on document counts), plus performance indicators measuring impact and excellence (based on citation counts) and international collaboration. The socioeconomic indicators measured investment in health and in research, and the number of researchers. Basic health indicators were used, along with the inequity indicator known as INIQUIS. The main results reveal that the research systems with the greatest capacity to communicate scientific results are those of Brazil and Mexico, and potentially Colombia and Argentina. The best visibility was demonstrated by Uruguay, Puerto Rico and Peru, countries with high rates of collaboration. No single country stands out as having a perfectly balanced relationship across all the dimensions analyzed. A relative balance is achieved by Brazil, Uruguay and Argentina, though with different levels of scientific output. The tangible achievements in health attained by Cuba and Chile do not appear to be related to the results of research published in the area of Public Health. There is clearly a need to find methods that would allow us to evaluate the transfer of research knowledge into practice from the scientometric perspective.
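Indicator sets of the kind used above (document counts, citation-based impact, international collaboration) can be sketched from raw publication records as follows; the record schema and numbers are hypothetical, not SIR/Scopus data.

```python
def bibliometric_indicators(records, fte_researchers):
    """Compute simple volume, impact and collaboration indicators for one
    country from a list of publication records (hypothetical toy schema)."""
    n_docs = len(records)
    citations = sum(r["citations"] for r in records)
    # A paper counts as international collaboration if its author
    # affiliations span more than one country.
    intl = sum(1 for r in records if len(set(r["countries"])) > 1)
    return {
        "docs_per_fte": n_docs / fte_researchers,
        "citations_per_doc": citations / n_docs,
        "pct_intl_collab": 100.0 * intl / n_docs,
    }

# Invented records for one country; country codes are illustrative.
records = [
    {"citations": 10, "countries": ["BR", "US"]},
    {"citations": 2,  "countries": ["BR"]},
    {"citations": 6,  "countries": ["BR", "MX"]},
    {"citations": 0,  "countries": ["BR"]},
]
ind = bibliometric_indicators(records, fte_researchers=2.0)
print(ind)
```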
The recently developed Cooperative Patent Classifications of the U.S. Patent and Trademark Office (USPTO) and the European Patent Office (EPO) provide new options for an informed delineation of samples in both USPTO data and the Worldwide Patent Statistical Database (PatStat) of the EPO. Among the "technologies for the mitigation of climate change" (class Y02), we zoom in on nine material technologies for photovoltaic cells, and focus on one of them (CuInSe2) as a lead case. Two recently developed techniques for making patent maps with interactive overlays (geographical maps using Google Maps, and maps based on citation relations among International Patent Classifications, IPC) are elaborated into dynamic versions that allow for online animations and comparisons by using split screens. Various forms of animation are discussed. The longitudinal development of Rao-Stirling diversity in the IPC-based maps provided us with a heuristic for studying technological diversity in terms of generations of the technology. The longitudinal patterns are clearer in USPTO data than in PatStat data, because PatStat aggregates patent information from countries in different stages of technological development, whereas one can expect USPTO patents to be competitive at the technological edge. As enterprises expand and post increasing information about their business activities on their websites, website data promises to be a valuable source for investigating innovation. This article examines the practicalities and effectiveness of web mining as a research method for innovation studies. We use web mining to explore the R&D activities of 296 UK-based green goods small and mid-size enterprises. We find that website data offers additional insights when compared with other traditional unobtrusive research methods, such as patent and publication analysis.
We examine the strengths and limitations of enterprise innovation web mining in terms of a wide range of data quality dimensions, including accuracy, completeness, currency, quantity, flexibility and accessibility. We observe that far more companies in our sample report undertaking R&D activities on their websites than would be suggested by looking only at conventional data sources. While traditional methods offer information about the early phases of R&D and invention through publications and patents, web mining offers insights that are more downstream in the innovation process. Handling website data is not as easy as handling alternative data sources, and care needs to be taken in executing search strategies. Website information is also self-reported, and companies may vary in their motivations for posting (or not posting) information about their activities on websites. Nonetheless, we find that web mining is a significant and useful complement to current methods, as well as offering novel insights not easily obtained from other unobtrusive sources. Increased specialization and extensive collaboration are common behaviours in the scientific community, as is the evaluation of scientific research based on bibliometric indicators. This paper aims to analyse the effect of collaboration (co-authorship) on the scientific output of Italian economists. We use social network analysis to investigate the structure of co-authorship, and econometric analysis to explain the productivity of individual Italian economists in terms of 'attributional' variables (such as age, gender, academic position, tenure, scientific sub-discipline and geographical location), 'relational' variables (such as the propensity to cooperate and the stability of cooperation patterns) and 'positional' variables (such as betweenness and closeness centrality indexes and clustering coefficients).
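The 'positional' variables mentioned above can be computed directly from a co-authorship graph. A minimal sketch for two of them, closeness centrality and the local clustering coefficient, on a toy undirected graph stored as adjacency sets (the graph and author labels are invented for illustration):

```python
from collections import deque

def closeness(graph, node):
    """Closeness centrality: (n-1) / sum of shortest-path distances,
    computed over the node's connected component via BFS."""
    dist = {node: 0}
    frontier = deque([node])
    while frontier:
        u = frontier.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def clustering(graph, node):
    """Local clustering coefficient: realised ties among a node's
    neighbours divided by the possible ties k*(k-1)/2."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

# Toy co-authorship graph: a triangle (a, b, c) plus a pendant author d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
print(closeness(graph, "b"), clustering(graph, "a"), clustering(graph, "b"))
```

In a study like the one summarised above, such per-author values would enter the econometric model as explanatory variables alongside the attributional and relational ones.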
This study investigates whether scientific publications can give plausible indications of whether R&D support infrastructures in the UK successfully foster scientific activity and cooperation. For this purpose, research publications associated with UK science parks (SPs) were identified in Scopus for the years 1975-2010 and analysed by region, infrastructure type and organisation type. There was apparently a systematic intensification of R&D from the 1990s, as evidenced by the publications of on-park firms and research institutions. Science Parks and Research Parks were the most successful infrastructures in fostering cooperation and research production, in comparison with Science and Innovation centres, Technology parks, Incubators and other parks, and HEIs were the major off-park partners for the on-park businesses. The East of England, the South East, and Scotland concentrate the highest proportion of parks, and each of these three major geographical agglomerations exhibits distinct areas of scientific specialisation. Parks seem to have a positive impact on the overall level of collaboration and production of science and technology, which are highly concentrated in competitive regions. Nevertheless, industry-academia collaborations show that on-park firms tend to collaborate with partners beyond their local region rather than with the local HEI. Support infrastructures may therefore not help to reduce the uneven development and geographic distribution of research-intensive industries in the UK. The Dutch Economics top-40, based on publications in ISI-listed journals, is, to the best of our knowledge, the oldest ranking of individual academics in Economics and is well accepted in the Dutch academic community. However, this ranking is based on publication volume, rather than on the actual impact of the publications in question.
This paper therefore uses two relatively new metrics, the citations per author per year (CAY) metric and the individual annual h-index (hIa), to provide two alternative, citation-based rankings of Dutch academics in Economics & Business. As a data source, we use Google Scholar instead of ISI to provide a more comprehensive measure of impact, including citations to and from publications in non-ISI listed journals, books, working and conference papers. The resulting rankings are shown to be substantially different from the original ranking based on publications. Just like other research metrics, the CAY or hIa-index should never be used as the sole criterion to evaluate academics. However, we do argue that the hIa-index and the related CAY metric provide an important additional perspective over and above a ranking based on publications in high impact journals alone. Citation-based rankings are also shown to inject a higher level of diversity in terms of age, gender, discipline and academic affiliation, and thus appear to be more inclusive of a wider range of scholarship. Argentina's patterns of publication in the humanities and social sciences were studied for the period 2003-2012, using the Scopus database and distinguishing the geographic realm of the research. The results indicate that "topics of national scope" have grown and gained international visibility. They can be broadly characterized by Spanish as the language of publication and a marked preference for single authorship; in contrast, publications on "global topics", which are not geographically limited, characteristically have English as the language of dissemination, and their institutional collaboration is stronger and more consolidated. Citation is apparently not determined only by the geographic realm of research, but also by the language of publication, co-authorship, and the profiles of the journals where the work is published. These results could contribute to constructive reflection upon publishing policy. 
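The CAY and hIa metrics discussed above are straightforward to compute from per-paper citation and author counts. The sketch below follows the published definitions as we understand them (the hIa annualizes an author-fractionalized h-index over career length; CAY averages author-fractionalized citations per year); the `papers` data format and function names are illustrative assumptions, not the paper's code:

```python
def h_index(citations):
    # Standard h-index: the largest h such that h papers have at least
    # h citations each.
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

def hIa(papers, career_years):
    # Individual annual h-index: compute the h-index over citation counts
    # normalized by the number of co-authors, then divide by career length.
    normalized = [cites / authors for cites, authors in papers]
    return h_index(normalized) / career_years

def cay(papers, career_years):
    # Citations per author per year: author-fractionalized citations,
    # averaged over the academic career length.
    return sum(cites / authors for cites, authors in papers) / career_years
```

For example, an author with papers cited (20, 12, 9, 4) times by (2, 3, 1, 4) co-authors over a 10-year career has h = 4, hIa = 0.3 and CAY = 2.4 under these definitions.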
The existence of a community of journals that tolerates biased patterns may lead researchers to echo and perpetuate poor practices, constructing or adapting the channels of communication. Such results also prove useful as a point of reference when evaluation criteria are elaborated by scientific committees, as unsupervised promotion and evaluation patterns could become based on local or overly subjective precepts, disregarding the disciplinary practices of the international scientific community. As a basic knowledge resource, patents play an important role in identifying technology development trends and opportunities, especially for emerging technologies. However, patent mining is restricted and even incomplete because of the obscure descriptions provided in patent text. In this paper, we conduct an empirical study to try out alternative methods with Derwent Innovation Index data. Our case study focuses on nano-enabled drug delivery (NEDD), which is a very active emerging biomedical technology encompassing several distinct technology spaces. We explore different ways to enhance topical intelligence from patent compilations. We further analyze extracted topical terms to identify potential innovation pathways and technology opportunities in NEDD. Since repositories are a key tool in making scholarly knowledge open access (OA), determining their web presence and visibility on the Web (both are proxies of web impact) is essential, particularly in Google (the search engine par excellence) and Google Scholar (a tool increasingly used by researchers to search for academic information). The few studies conducted so far have been limited to very specific geographic areas (USA), which makes it necessary to find out what is happening in other regions that are not part of mainstream academia, and where repositories play a decisive role in the visibility of scholarly production. The main objective of this study is to ascertain the web presence and visibility of Latin American repositories in Google and Google Scholar through the application of page count and web mention indicators, respectively. For a sample of 137 repositories, the results indicate that the indexing ratio is low in Google and virtually nonexistent in Google Scholar; they also indicate a complete lack of correspondence between the repository records and the data produced by these two search tools. These results are mainly attributable to limitations arising from the use of description schemas that are incompatible with Google Scholar (repository design) and the reliability of web mention indicators (search engines). We conclude that neither Google nor Google Scholar accurately represents the actual size of OA content published by Latin American repositories; this may indicate a non-indexed, hidden side to OA, which could be limiting the dissemination and consumption of OA scholarly literature. Nobel laureates have achieved the highest recognition in academia, reaching the boundaries of human knowledge and understanding. Owing to past research, we have a good understanding of the career patterns behind their performance. 
Yet, we have only limited understanding of the factors driving their recognition with respect to major institutionalized scientific honours. We therefore look at the award life cycle achievements of the 1901-2000 Nobel laureates in physics, chemistry, and physiology or medicine. The results show that Nobelists with a theoretical orientation achieved more awards than laureates with an empirical orientation. Moreover, it seems their educational background shapes their future recognition. Researchers educated in Great Britain and the US tend to attract more awards than other Nobelists, although there are career pattern differences. Among those, laureates educated at Cambridge or Harvard are more successful in Chemistry, those from Columbia and Cambridge excel in Physics, while Columbia educated laureates dominate in Physiology or Medicine. The main bibliometric databases indicate large differences in country-level scientific publishing productivity, with high growth in many East Asian countries. However, it is difficult to translate country-level publishing productivity to individual-level productivity due to cross-country differences in the size and composition of the research workforce, as well as limited coverage of publications in the social sciences and humanities. Alternative data sources, such as individual-level self-reported publication data, may capture a wider range of publication channels but potentially include non-peer reviewed output and research re-published in different languages. Using individual-level academic survey data across 11 countries, this study finds large differences across countries in individual-level publishing productivity. However, when fractionalised for English-language and peer-reviewed publications, cross-country differences are relatively smaller. This suggests that publishing productivity in certain countries is inflated by a tendency to publish in non-peer reviewed outlets. 
Academics in large, non-English speaking countries also potentially benefit from a wider range of domestic publication channels. Demographic, motivational and institutional characteristics associated with high individual-level publishing productivity account for part of the publishing productivity differences within and between countries in English-language and peer-reviewed publishing productivity, but not in total publishing productivity, where such workforce characteristics only account for within-country differences. This paper presents a comparative impact analysis of collaborative research in Malaysia. All analyses were conducted using ISI-indexed journal articles published in the 10-year period spanning the years 2000-2009. The publication growth and distribution of domestic versus international Malaysian-addressed collaborative articles were examined. Then, a three-pronged approach was used to compare the research performance between international and domestic research for the top ten high-productivity subject categories. Firstly, the potentiality of collaborative research impact is determined using the Mann-Whitney-Wilcoxon and Bootstrap Kolmogorov-Smirnov tests. Then, the Hirsch and Egghe indices were computed for each subject category to estimate the distance needed to bridge the gap between international and domestic research. Lastly, the composition of researchers was measured using the internationality index. We discuss how the findings of our methodology help advise collaborative research strategies that will contribute to better research performance in the leading scientific categories. Alzheimer's disease (AD) is a degenerative brain disease whose cause is hard to diagnose accurately. As the number of AD patients has increased, researchers have strived to understand the disease and develop treatments for it, through means such as medical experiments and literature analysis. 
In the area of literature analysis, several traditional studies analyzed the literature at the macro level of author, journal, and institution. However, analysing the literature at both the macro level and the micro level allows for better understanding of the AD research field. Therefore, in this study we adopt a more comprehensive approach to analyze the AD literature, which consists of productivity analysis (year, journal/proceeding, author, and Medical Subject Heading terms), network analysis (co-occurrence frequency, centrality, and community) and content analysis. To this end, we collect metadata of 96,081 articles retrieved from PubMed. We specifically perform a concept graph-based network analysis applying five centrality measures after mapping the semantic relationships between the UMLS concepts from the AD literature. We also analyze the time-series topical trend using the Dirichlet multinomial regression topic modeling technique. The results indicate that the year 2013 is the most productive year and the Journal of Alzheimer's Disease the most productive journal. In discovering the core biological entities and their relationships residing in the AD-related PubMed literature, the relationship with glycogen storage disease is found to be the most frequently mentioned. In addition, we analyze 16 main topics of the AD literature and find a noticeable increasing trend in the topic of the transgenic mouse. A total of 959 full-text articles were studied to explore the intellectual structure of scientometrics in the period 2005-2010 using text mining and co-word analysis. The trends and patterns of scientometrics in the journal Scientometrics were revealed by measuring the association strength of selected keywords which represent the concepts and ideas produced in the field of scientometrics. 
All articles were collected from the journal Scientometrics through SpringerLink (a full-text database), and keywords were added non-parametrically from the LISA database and from the articles themselves (keywords provided by the authors). Other important keywords were extracted manually from the titles and abstracts of the articles. These keywords were standardized using a vocabulary tool. With the objective of delineating dynamic changes in the field of scientometrics, the period 2005-2010 was studied and further divided into two consecutive periods: 2005-2007 and 2008-2010. The results show that publication has some well-established topics which are changing gradually to adopt new themes. An analysis of 3,089 papers on global camel research during 2003-2012, as indexed in the Scopus international multidisciplinary database, indicates an average annual growth rate of 11.20% and an average citation rate of 2.24 per paper. The publication output was scattered across 257 journal titles and originated in 104 countries, of which the top 15 countries contributed an 87.44% share of the global publication output during 2003-2012. The highest publication output came from the USA, followed by India, Saudi Arabia, Iran, Egypt, the United Arab Emirates, the United Kingdom, France, China, Germany, Sudan, Belgium, Australia, Canada and Kenya. The publication share increased in the case of Iran, Saudi Arabia, Egypt, China, France, Sudan, India, Australia and Canada, as against a decrease in the UK, USA, Kenya, Belgium, Germany and the UAE from 2003-2007 to 2008-2012. Nine of the 15 most productive countries achieved a high relative citation index (1 and above) during 2003-2012: Belgium (3.61), Australia (2.69), UK (2.38), Canada (2.33), France (2.07), USA (1.87), Germany (1.65), UAE (1.11) and Kenya (1.09). 
Agricultural and biological sciences (43.35% share) contributed the largest share, followed by veterinary science (29.75% share), medicine (17.74% share), immunology and microbiology (13.99% share), biochemistry, genetics and molecular biology (13.99% share), environmental science (5.08% share) and pharmacology, toxicology and pharmaceutics (3.11% share) during 2003-2012. Among narrow sub-fields, the focused areas were camel disease and infection, camel milk and dairy produce, camel non-milk products, camel reproduction, camel feed and diet, camel physiology, camel genetics, camel parasitology, etc. The world camel research output originated from 311 organizations, of which the top 20 contributed a 31.72% share of global publications during 2003-2012. We analyze the data about works (papers, books) from the time period 1990-2010 that are collected in the Zentralblatt MATH database. The data were converted into four 2-mode networks (works × authors, works × journals, works × keywords and works × mathematical subject classifications) and into a partition of works by publication year. The networks were analyzed using Pajek, a program for the analysis and visualization of large networks. We explore the distributions of some properties of works and the collaborations among mathematicians. We also take a closer look at the characteristics of the field of graph theory as realized in the publications. Community structure is one of the important properties of social networks in general, and of citation networks in the field of scientometrics in particular. A majority of existing methods are not suitable for detecting communities in a directed network, which hinders their application to citation networks. In this paper, we provide a novel method which not only overcomes the above-mentioned limitation, but also has relatively low time complexity, which facilitates its application to large-scale networks. 
We use the concept of Shannon entropy to measure a network's information and then consider the process of detecting communities as a process of information loss. Based on this idea, we develop an optimization model to depict the process of detecting communities and introduce the principle of dynamic programming to solve it. A simulation test is also designed to examine the model's accuracy in discovering the community structure and identifying the optimal number of communities. Finally, we apply our method to a citation network from the journal Scientometrics and provide several insights into promising research topics through the communities detected by our method. The h-index is a widely used bibliometric indicator for assessing individual scientists or other units of analysis. When evaluating aggregated authors, the h-index may produce rankings that are not consistent with the individual ones. The problem is claimed to affect all h-type indices, while the highly cited publications indicator, which belongs to a different class, represents an alternative that is immune to this issue. The main objective of this work is to perform a comparative analysis of some bibliometric indicators originally designed to measure the overall impact of individual scientific production, when applied to the evaluation of groups, in order to investigate the consistency between rankings at different levels of aggregation. For that, we use part of a previously reported citation database. The results indicate that, although consistency across aggregation levels is not formally guaranteed by the h-index and all its variants, it is met with reasonable frequency. Digitization, the Internet, and informetric or webometric interdisciplinary approaches are affecting the fields of Scientometrics and Library and Information Science. These new approaches can be used to improve citation-only procedures for estimating the quality and impact of research. 
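The rank inconsistency of the h-index across aggregation levels, mentioned above, can be reproduced with a small constructed example; the groups below are hypothetical illustrations, not data from the paper:

```python
def h_index(citations):
    # Largest h such that h papers have at least h citations each.
    cites = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

# Hypothetical groups: every author in group A individually outranks
# every author in group B, yet the pooled group ranking reverses.
group_a = [[2, 2], [2, 2], [2, 2]]  # each author: h = 2
group_b = [[3], [3], [3]]           # each author: h = 1

pooled_a = [c for author in group_a for c in author]  # h-index 2
pooled_b = [c for author in group_b for c in author]  # h-index 3
```

Here each group A author has h = 2 and each group B author has h = 1, yet pooling the papers gives group A an h-index of 2 and group B an h-index of 3, reversing the ranking.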
A European pilot to explore this potential was called "European Educational Research Quality Indicators" (EERQI, FP7 # 217549). An interdisciplinary consortium was involved from 2008 to 2011. Different types of indicators were developed to score 171 educational research documents. Extrinsic bibliometric and citation indicators were collected from the Internet for each document; intrinsic indicators reflecting content-based quality were developed and relevant data gathered by peer review. Exploratory and confirmatory factor analysis and structural modeling were used to explore statistical relationships among latent factors or concepts and their indicators. Three intrinsic and two extrinsic latent factors were found to be relevant. Moreover, the more a document was related to a reviewer's own area of research, the higher the score the reviewer gave concerning (1) significance, originality, and consistency, and (2) methodological adequacy. The conclusions are that a prototype EERQI framework has been constructed: intrinsic quality indicators add specific information to extrinsic quality or impact indicators, and vice versa. Also, a problem of "objective" impact scores is that they are based on "subjective" or biased peer-review scores. Peer-review, which is foundational to having a work cited, seems biased and this bias should be controlled or improved by more refined estimates of quality and impact of research. Some suggestions are given and limitations of the pilot are discussed. As the EERQI development approach, instruments, and tools are new, they should be developed further. We explore which innovation support infrastructures help Higher Education Institutions (HEIs) with research and technology (R&T) production and knowledge commercialisation. 
The objectives are to determine (1) the time required by innovation support infrastructures like science parks (SPs) to promote research activities and the factors that may influence it; and (2) if a HEI's R&T output and commercial performance are helped by innovation support infrastructures like SPs or incubators. The analysis is based upon publications produced by on-park firms (1975-2010), as well as patents and quantitative data from national HEIs with collaborative ties with 92 support infrastructures. Statistical analyses reveal that research parks & campuses and SPs are the infrastructures that are most likely to promote prompt R&T activities and University-Industry (U-I) collaboration for their residents and newer parks seem to be the most successful at encouraging U-I interactions. HEIs' efforts to exploit their academic research base through support infrastructures have no significant impact on the volume of patents or research publications produced by them, and on entrepreneurial activities with less institutionalised support, such as joint research, contract research or consultancy. However, relationships with SPs and incubators strongly associate with the commercial performance of universities in terms of their academic spin-offs and facilities and equipment services. This paper presents cross-country comparisons between Canada and the United States in terms of the impact of public grants and scientific collaborations on subsequent nanotechnology-related publications. In this study we present the varying involvement of academic researchers and government funding to capture the influence of funded research in order to help government agencies evaluate their efficiency in financing nanotechnology research. We analyze the measures of quantity and quality of research output using time-related econometric models and compare the results between nanotechnology scientists in Canada and the United States. 
The results reveal that both research grants and the position of researchers in co-publication networks have a positive influence on scientific output. Our findings demonstrate that research funding has a significantly positive linear impact in Canada and a positive non-linear impact in the United States on the number of papers; in terms of the number of citations, we observe a positive impact only in the US. Our research shows that the position of scientists in past scientific networks plays an important role in the quantity and quality of papers published by nanotechnology scientists. This stated-preference study addressed the issue of sub-categorizing the information science-library science (IS-LS) journals listed in the Journal Citation Reports (JCR) 2011. To investigate this, 243 active authors/editors publishing in this field were asked to indicate their preferred category for each of the 83 journal titles listed in JCR 2011, choosing from four options: information science (IS), library science (LS), information systems (ISys) and do not know/undecided. Based on the popularity count, respondents assigned 39 titles to LS, 23 titles to IS and 21 titles to ISys. Twenty-five titles received high "do-not-know" counts; these are titles in non-English languages and in the information management and publishing sub-fields. Only one title in LS was grouped in the highest quartile by impact factor, compared to 8 titles in IS and 11 in ISys. This indicates that LS journals are hardly represented among the top 25% of the impact factor distribution of JCR's ranked IS-LS journals. Respondents showed concern about the "fit" of information systems journals in the IS-LS category. Did the demise of the Soviet Union in 1991 influence the scientific performance of researchers in Eastern European countries? Did this historical event affect international collaboration by researchers from the Eastern European countries with those of Western countries? 
Did it also change international collaboration among researchers from the Eastern European countries? Trying to answer these questions, this study aims to shed light on international collaboration by researchers from the Eastern European countries (Russia, Ukraine, Belarus, Moldova, Bulgaria, the Czech Republic, Hungary, Poland, Romania, and Slovakia). The number of publications and normalized citation impact values are compared for these countries based on InCites (Thomson Reuters), from 1981 up to 2011. The international collaboration by researchers affiliated to institutions in Eastern European countries at the time points of 1990, 2000 and 2011 was studied with the help of Pajek and VOSviewer software, based on data from the Science Citation Index (Thomson Reuters). Our results show that the breakdown of the communist regime did not lead, on average, to a huge improvement in the publication performance of the Eastern European countries and that the increase in international co-authorship relations by the researchers affiliated to institutions in these countries was smaller than expected. Most of the Eastern European countries are still subject to changes and are still awaiting their boost in scientific development. Research and education are organically connected in that lectures convey the results of research, which is frequently initiated by inspiring lectures. As a result, the contents of lecture materials and research publications and the research capabilities of universities should be considered in the investigations of the relationships between research and teaching. We examine the relationship between research and teaching using automatic text analysis. In particular, we scrutinize the relatedness of the content of research papers with the content of lecture materials to investigate the association between teaching and research. 
We adopt topic modeling to analyze the correlation between research capabilities and the degree to which research topics are reflected in lecture materials. We select the field of machine learning as a case study because the field is contemporary and because data related to teaching and research are easily accessible via the Internet. The results reveal interesting characteristics of lecture materials and research publications in the field of machine learning. The research capability of an institute is independent of its lecture materials. However, for introductory courses, teaching and research measures showed a weak negative relationship, and there is little relationship between the measures for advanced courses. The five BRICS countries (Brazil, Russia, India, China and South Africa) are among the most important developing countries, and they are joined in an association to foster mutual development. In their meetings, officials have made statements on the importance of scientific collaboration. The present article analyses scientific collaborations between the five countries using co-authorship of scientific products. Gross counts, Salton's indexes and Jaccard coefficients, as well as probabilistic affinity indexes (PAI), are calculated to highlight the different dimensions of inter-BRICS collaborations, as well as their evolution. Collaboration with external actors, and in different scientific sub-areas, is also measured. Bilateral collaborations are heterogeneous. PAIs, which are size independent, show that the trends of inter-BRICS collaborations are stable over time. Heterogeneity across different scientific areas is also present. At the end of the article, the results are discussed and policy suggestions are offered. We investigated the extent to which different selection mechanisms for awarding scholarships varied in their short- and longer-term consequences for the scientific production of awardees. 
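The collaboration indicators named in the BRICS study above have standard textbook forms; the sketch below uses the usual definitions (co-publication count, country publication totals, and a grand total of collaborative links), which may differ in detail from the paper's exact operationalization:

```python
import math

def salton(c_ij, p_i, p_j):
    # Salton's (cosine) index: co-publications of countries i and j,
    # normalized by the geometric mean of their publication counts.
    return c_ij / math.sqrt(p_i * p_j)

def jaccard(c_ij, p_i, p_j):
    # Jaccard coefficient: co-publications over the union of both outputs.
    return c_ij / (p_i + p_j - c_ij)

def pai(c_ij, c_i, c_j, total):
    # Probabilistic affinity index: observed share of i-j links divided by
    # the share expected if links were distributed at random; values above
    # 1 indicate stronger-than-expected affinity, and the measure is
    # independent of country size.
    return (c_ij / total) / ((c_i / total) * (c_j / total))
```

For instance, 10 co-publications between countries with 100 and 25 publications give a Salton index of 0.2 and a Jaccard coefficient of 10/115, while a PAI above 1 signals a preferential partnership regardless of how large the two countries' outputs are.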
We conducted an impact evaluation study on undergraduate, master's, and PhD research scholarships and compared two different funding sources in Brazil: in one, the selection mechanism was based on a peer review system; in the other, it was based on an institutional system other than peer review. Over 8,500 questionnaires were successfully completed, covering the period 1995-2009. The two groups were compared in terms of their scientific performance using a propensity score approach. We found that the peer-reviewed scholarship awardees showed better performance: they published more often and in journals with higher impact factors than scholarship awardees from the other group. However, two other results indicate a more nuanced situation. First, over the long term, awardees under the peer review system continued to increase their publication rate and published in higher-quality journals; however, the differences with the control group tended to diminish after PhD graduation. Second, the better performance of peer-reviewed scholarship awardees was not observed in all subject areas. The main policy implications of this study relate to a better understanding of selection mechanisms and of the heterogeneous relation between selection processes and scientific and academic output. In this study, we investigate and compare the number of examiners' forward citations in the United States and Japan. The most effective way to do so is to compare pairs of patent applications that have equivalent content. Therefore, we propose a new method of extracting substantially equivalent pairs of US and Japanese patent applications, focusing on the equivalence of the specifications and the claims. Our results reveal that during substantive examination, US examiners cite patent application publications (PAPs) as well as granted patent publications (GPPs), whereas Japanese examiners tend to cite PAPs only. We further examine why GPPs are frequently cited by US examiners. 
The most likely reason seems to be that many US examiners retain the old habit of searching and citing only GPPs, but not PAPs. The insights offered by this study could be significant for future analyses based on the number of citations, particularly in the United States. This exploratory work sheds light on important functional information characteristics of the system of research collaboration by examining large-scale topological structures of co-authorship networks, created through the affiliative ties of scholarly articles published by collaborating researchers in peer-reviewed journals and conference proceedings. The model adopted in this work to understand the underlying collaboration system incorporates the strengths of collaborative coupling among the researchers. The questions we examine in this work are as follows: (1) What new functional characteristics emerge when combined structural effects of collaborative coupling and large-scale connectivity exist in the networks? (2) What information does a specific closeness distribution of collaborating researchers convey with regard to the flow of knowledge through collaborative activities? (3) What is the temporal dynamics of large-scale structure formation in these networks? The work involves a comparative study of these characteristics using the networks of two countries: India and the US. Our results have important implications for scientometric studies of collaboration research. The number of scientific papers published by researchers in Africa has been rising faster than the total world scientific output in recent years. This trend is relevant, as for a long period up until 1996, Africa's share of the world scientific output remained below 1.5 %. The propensity to publish in the continent has risen particularly fast since 2004, suggesting that a possible take-off of African science is taking place. 
This paper highlights that, in parallel with this most recent growth in output, the apparent productivity of African science, as measured by the ratio of publications to gross domestic product, has risen in recent years to a level above the world average, although, when one looks at the equivalent ratio normalized by population, there is still a huge gap to overcome. Further, it is shown that publications from the few African countries whose scientific communities demonstrate higher levels of specialization and integration in international networks have a higher impact than the world average. Additionally, the paper discusses the potential applications of the new knowledge produced by African researchers, highlighting that so far South Africa seems to be the only African country where a reasonable part of that new knowledge is connecting with innovation. Although universities have played an important role in knowledge creation, it is also important to examine how universities perform in knowledge utilization. In the present article, an effective approach is proposed to evaluate and compare university performance in knowledge utilization for patented inventions. Growth trajectories of the cumulative patent citations to scientific publications produced by individual universities are analyzed using latent growth modeling. Moreover, we examine how the utilization of scientific knowledge created in 1995 and 2005 is affected by research impact and university-industry collaboration among universities in Europe, North America, and East Asia. The results indicate that not all of the world's top 300 research universities perform well in knowledge utilization for patented inventions. Some policy implications are discussed. 
Cycles that cross two or more boundaries between disciplines in the co-authorship graph for all of science are used to set upper limits on the number of co-authored papers required to cross 15 disciplines or subdisciplines ranging from macroeconomics to neurology. The upper limits obtained range from one (discrete mathematics, macroeconomics and nuclear physics) to six (neuroscience). The 15 disciplines or subdisciplines examined form a "small world" with an average separation of only 2.0 co-authorship links. It is conjectured that the high-productivity, high average degree centers of all scientific disciplines form a small world, and therefore that the diameter of the co-authorship graph of all of science is only slightly larger than the average diameter of the co-authorship graphs of its subdisciplines. This study explores the evolution of interdisciplinarity in Biochemistry and Molecular Biology (BMB) over a one-hundred-year period on several fronts, namely change in interdisciplinarity, identification of core disciplines, disciplinary emergence, and potential discipline detection. Science overlay maps and a StreamGraph were used to visualize interdisciplinary evolution. Our study confirms that interdisciplinarity evolves mainly from neighbouring fields to distant cognitive areas and provides evidence of an increasing tendency of BMB researchers to cite literature from other disciplines. Additionally, from our results, we can see that the top potential interdisciplinary relations belong to disciplines distant from BMB; their share of references is small, but is increasing markedly. On the whole, these results confirm the dynamic nature of interdisciplinary relations, and suggest that current scientific problems are increasingly addressed using knowledge from a wide variety of disciplines. University rankings frequently struggle to delineate the separate contributions of institutional size and excellence. 
This presents a problem for public policy and university leadership, for example by blurring the pursuit of excellence with the quest for growth. This paper provides some insight into the size/excellence debate by exploring the explicit contribution of institutional size to the results of the Shanghai ranking indicators. Principal components analysis of data from the Shanghai ranking (2013 edition) is used to explore factors that contribute to the variation of the total score. The analysis includes the five non-derived ARWU indicators (Alumni, Award, HiCi, S&N and PUB) and uses the number of equivalent full-time academic staff (FTE) as a measure of size. Two significant but unequal factors are found, together explaining almost 85 % of the variance in the sample. A factor clearly associated with the size of the institution explains around 30 % of the variance. To sharpen the interpretation of the smaller factor as a measure of the effect of size, we extend the analysis to a larger set of institutions to eliminate size-dependent selection effects. We also show that eliminating outlying universities makes little difference to the factors. Our inferences are insensitive to the use of raw data, compared with the compressed and scaled indicators used by ARWU. We conclude that around 30 % of the variation in the ARWU indicators can be attributed to variation in size. Clearly, size-related factors cannot be overlooked when using the ranking results. Around 55 % of the variation arises from a component which is uncorrelated with size and which measures the quality of research conducted at the highest levels. The presence of this factor encourages further work to explore its nature and origins. In this contribution, we measure how long researchers are willing to wait (WTW) for an editorial decision on the acceptance or rejection of a submitted manuscript. 
This measure serves as a proxy for the expected value of a publication to a researcher in the field of economic, business and financial history. We analyze how this WTW measure varies with the characteristics of the submitting authors themselves. We distinguish the impact of personal characteristics (including age, gender and geographic location) as well as work-related characteristics (including research discipline, affiliation and academic position). To identify the factors determining economic history authors' WTW for editorial decisions, we use a valuation technique known as stated choice experiments. Our results show that respondents found the standing of the journal to be at least as important as its ISI impact factor. Moreover, we find differences in publication culture between economics and history departments. Overall, researchers' willingness to wait is influenced to a greater extent by the research discipline in which the respondents are active (history vs. economics) than by their personal characteristics (e.g. the education or the type of Ph.D. they obtained). This study prospectively evaluates the accessibility of Internet references in leading general medical journals and explores the impact of their lost accessibility. We identified all original contributions published at two time points (January 2005 and January 2008) in five leading peer-reviewed traditional general medical journals and one leading on-line journal. We followed the sample prospectively for 5 years and determined the number of Internet references that remained accessible. Our sample of 165 original contributions contained 154 Internet references. Accessibility to Internet references declined from 51 % after 4 years to 37 % after 8 years in the articles published in January 2005, and decreased from 78 % after 1 year to 44 % after 5 years in the articles published in January 2008. 
Among those Internet references published in the most highly-cited articles, only 19 % (95 % CI 10-35 %) remained accessible in March 2013. Among the Internet references cited in the Methods section of the articles, only 30 % (95 % CI 20-43 %) remained accessible. Of the 91 Internet references which were no longer accessible at the end of the follow-up period, 39 (43 %) were assigned a rating of either 'important' or 'very important'. Accessibility of Internet references declines substantially over time, most often because the information is updated or the sites become unavailable. Accessibility remains poor even among those Internet references that are most important. A positive influence of international collaboration on the impact of research has been extensively described. This paper delves further into this issue and studies to what extent the type of collaborating country (high, medium or low R&D intensity) and which country leads the research may influence the impact of the final scientific output. Among 9,961 papers co-authored by scientists from Spain and from another country (bilateral collaboration) during 2008-2009, papers with high R&D intensive countries predominated (60 %) and received the highest number of citations. This holds true in eight out of nine fields, with the Social Sciences benefiting the most from partnerships with high R&D intensive countries. Mathematics emerges as a special case where other factors, such as the partner's specialisation in the field, may have a greater influence on research impact than the level of investment in R&D of the collaborating country. No significant influence of the type of country leading the research on the impact of the final papers is observed in most fields. Research policy implications are finally discussed. Technological breadth, as an indicator of knowledge integration, and breadth of technological diffusion are two sides of a coin when studying the value of patents. 
In this contribution some h-type indices are developed, and applications in the field of technological innovation are provided. The obtained results suggest that these h-type indices can be used to describe two dimensions of a firm's vitality with respect to technological innovation and technological breadth of patents. Hence, patent-related h-indices can serve as simple alternative indicators to describe a firm's performance in technological innovation. They can, moreover, be used to compare firms active in the same industry. We test 16 bibliometric indicators with respect to their validity at the level of the individual researcher by estimating their power to predict later successful researchers. We compare the indicators, measured before their first landmark paper, of a sample of astrophysics researchers who later co-authored highly cited papers with the distributions of these indicators over a random control group of young authors in astronomy and astrophysics. We find that field and citation-window normalisation substantially improves the predicting power of citation indicators. The sum of citation numbers normalised with expected citation numbers is the only indicator which shows differences between later stars and random authors significant at the 1 % level. Indicators of paper output are not very useful for predicting later stars. The famous h-index makes no difference at all between later stars and the random control group. The aim of this paper is to determine the role that academic collaboration plays in the impact of Latin-American and Caribbean research on management as an academic research discipline. The results show that the impact of Latin American articles on management, which were published between 1990 and 2010 in JCR journals, is positively associated with collaboration r (s) = .133, p = .001. Co-authored articles have on average 1.22 times more impact than single-authored ones. 
The level of collaboration is positively correlated with impact r (s) = .337, p = .001. Articles published through international collaboration have 1.59 times more impact than those published through domestic collaboration. The typical young Polish scientist is an alumnus of doctoral studies at the same university and department where he/she completed his/her Master's degree. The career continues with a habilitation at the same university and department, after which the holder of the habilitation is promoted to a tenured position at the same university and department. A detailed analysis of the scientific careers of 154 recent Ph.D. recipients and of 16 habilitation candidates in chemistry from the University of Warsaw is presented. More than 96 % of the Ph.D. theses were results of doctoral studies. A typical doctor is a Polish citizen (> 98 %), an alumnus/alumna of the University of Warsaw (> 85 %), and a holder of a Master's degree in chemistry (88 %) who joined the Ph.D. program at the same university directly after having completed his/her Master's degree, and completed the Ph.D. program 5.5 years after completion of the Master's degree. The fraction of recent female Ph.D. recipients in chemistry (61 %) is very high as compared with the corresponding fractions in other countries (e.g., the USA), but it is still substantially lower than the fraction of female Master's degree recipients. Among recent habilitation candidates, the female ratio is 50 %; thus relative male dominance is observed at higher levels. At least one-third of the recent Ph.D. recipients were employed by the same university where they received their Ph.D., while the fraction of the recent Ph.D. recipients employed by other universities in Poland was below 5 %. This high degree of academic inbreeding is due to the legal system in Poland, which (nominally) is designed to prevent academic inbreeding, but the regulations can be easily circumvented. Over 10 % of the recent Ph.D. 
recipients found post-doctoral positions abroad, chiefly in EU countries and in the USA. The citation potential is a measure of the probability of being cited. Obviously, it is different among fields of science, social science, and humanities because of systematic differences in publication and citation behaviour across disciplines. In the past, the citation potential was studied at journal level considering the average number of references in established groups of journals (for example, the crown indicator is based on the journal subject categories in the Web of Science database). In this paper, some characterizations of the author's scientific research through three different research dimensions are proposed: production (journal papers), impact (journal citations), and reference (bibliographical sources). Then, we propose different measures of the citation potential for authors based on a proportion of these dimensions. An empirical application, in a set of 120 randomly selected highly productive authors from the CSIC Research Centre (Spain) in four subject areas, shows that the ratio between production and impact dimensions is a normalized measure of the citation potential at the level of individual authors. Moreover, this ratio reduces the between-group variance in relation to the within-group variance in a higher proportion than the rest of the indicators analysed. Furthermore, it is consistent with the type of journal impact indicator used. A possible application of this result is in the selection and promotion process within interdisciplinary institutions, since it allows comparisons of authors based on their particular scientific research. In the social sciences, university departments are the governance units where the demand for and the supply of researchers interact. 
As a first step towards a formal model of this process, this paper investigates the characteristics of productivity distributions in a unique dataset consisting of 2,530 faculty members with at least one publication who were working in the 81 top world Economics departments in 2007. Individual productivity is measured in two ways: as the number of publications up to 2007, and as a quality index that weights differently the articles published in four journal equivalent classes. The academic age of individuals, measured as the number of years since obtaining a Ph.D. up to 2007, is used to measure productivity per year. Independently of the two productivity measures, and both before and after age normalization, the five main findings of the paper are the following. First, individuals within each department have very different productivities. Second, there is not a single pattern of productivity inequality and skewness at the department level. On the contrary, productivity distributions are very different across departments. Third, the effect on overall productivity inequality of differences in productivity distributions between departments is greater than the analogous effect in other contexts. Fourth, to a large extent, this effect on overall productivity inequality is accounted for by scale factors well captured by departments' mean productivities. Fifth, this high degree of departmental heterogeneity is found to be compatible with greater homogeneity across the members of a partition of the sample into seven countries and a residual category. This paper uses a bibliometric analysis method to probe into the evolution of China's science and technology policies from 1949 to 2010, and the roles of core government agencies in policy-making. We obtained 4,707 Chinese S&T policies from GDIS, a Chinese public policy database provided by Tsinghua University. 
Co-word analysis and network analysis were applied in mapping the topics of S&T policies and collaboration among the agencies, while citation analysis was applied to assess the influence of S&T policies. Findings include: first, the focus of Chinese S&T policies is mainly on applied research and industrialization, rather than basic research; second, more and more government agencies are involved in making S&T policies, but collaboration efforts are not significantly increasing; last but not least, the influence of different S&T policies is determined by the administrative ranking of the policy-making agencies responsible for drafting those policies. Gender disparities persist in several areas of society and scientific research is no exception. This study describes the evolution of the place of women in Russian science from 1973 to 2012, in terms of published research output, research productivity, international and national collaboration, and scientific impact, taking into account the socioeconomic, political and historic context of the country, which was marked by the fall of the USSR in 1991. The results show that gender parity is far from being achieved. Women remain underrepresented in terms of their contribution to research output and scientific impact in almost all disciplines, with Mathematics and Physics, research areas in which Russia is specialized, having the largest gap. Men and women show different collaboration patterns at the national and international levels: women are preeminent on the national scene, whereas men are on the international one. Although the impact of women's scientific output significantly increases after the fall of the USSR, the gap between both genders remains stable over time for most of the disciplines. As a result, this increase cannot be interpreted as an improvement of women's relative influence in Russian science, but rather as an improvement of the impact of Russian science in general. 
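Co-word analysis, of the kind applied to the Chinese S&T policy corpus above, reduces in its simplest form to counting how often pairs of keywords co-occur in the same document and treating the counts as weighted network edges. The sketch below illustrates this with invented keyword sets; the study's actual GDIS data is not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets extracted from four policy documents
docs = [
    {"basic research", "funding", "universities"},
    {"industrialization", "applied research", "funding"},
    {"applied research", "industrialization", "technology transfer"},
    {"funding", "applied research"},
]

# Each sorted pair of keywords becomes an edge; its weight is the
# number of documents in which both terms appear together.
cooccurrence = Counter()
for keywords in docs:
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

for (a, b), w in cooccurrence.most_common(3):
    print(f"{a} -- {b}: {w}")
```

The resulting weighted edge list can be fed directly into a network analysis package to map topic clusters and, with agency names in place of keywords, collaboration among policy-making agencies.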
The objective of this study was to make a current diagnosis of the scientific production of Ibero-American researchers on information literacy and information competences during the last four decades. The literature output on information literacy was examined using the techniques of bibliometric analysis and information visualization. The literature considered was constituted by the articles included in the Web of Science (Thomson Reuters), Scopus (Elsevier), Library and Information Science Abstracts, and Library, Information Science and Technology Abstracts databases. The occurrence of descriptors was analysed using VOSViewer, a program that groups them into clusters and generates a map of their connections. The results showed exponential growth of some 30 % annually between 2005 and 2011, with a mean of 14.45 documents per year. Spain, with 119 documents, was the top producing country, followed by Brazil with 76. The distribution of the more than 500 authors fitted a Lotka-law pattern, and the distribution of the 105 journals fitted the three zones of a Bradford-law pattern. The visualization map showed the 62 descriptors to group into seven clusters. "Information literacy" stood out for its centrality, being strongly related to "Information Science". At the edge of the map were "Digital literacy" and "School library", indicative of their lack of any strong relationship with other terms. The "Education", "Knowledge management", "Universities & colleges", and "University libraries" descriptors were linked closely with the main IL theme. The main methods for ranking academic journals are peer-review-based approaches and applications of various bibliometric indicators, or a mixture of the two. Such rankings are used to assess the overall quality of journals, although their real meaning remains unclear as long as the notion of "quality" is not precisely defined. 
In our approach we examine journal evaluation from the perspective of knowledge accumulation, taking the citation distribution into account. A new indicator, the sub-impact factor (SIF), is proposed, together with derived sub-impact factor sequences and an aggregated SIF indicator. An empirical study is performed on 64 journals in the area of operations research and management science, illustrating the use of these indicators. In this paper, we make use of keywords in scientific articles in solar energy during the period 2000-2013 to investigate scientific relatedness at the topic level (i.e. relatedness between topic and topic) and the country level (i.e. relatedness between topic and country). The bibliometric analyses show that both publications and knowledge topics exhibit a significant rise, and China has exceeded the USA and developed into the largest scientific producer after 2010. We determine the degree of relatedness by means of the topic co-occurrence network and explore the evolving dynamics of scientific relatedness, which indicate decreasing patterns in the two countries. The results also highlight differences between the research directions in the USA and China: in the USA "energy efficiency and environment" is more developed, while in China "solar power" is more central. This study assesses the extent to which scientific relatedness exerts influence on literature productivity at the country level. We find negative relationships between scientific relatedness and publications in both countries. Our work has potential implications for future policies with respect to innovative research in the solar energy field. Despite the fact that diffusion research has existed for more than a century, a quantitative review covering this subject in a broad and general context is still lacking. This article reviews diffusion research by providing an extensive bibliometric and clustering analysis. 
In total, we identified thirteen clusters comprising 6,811 publications over the period of 2002-2011, and thereby describe the characteristics of diffusion research in an extensive and general way based on quantitative bibliometric methods. The analysis reveals that diffusion research is highly interdisciplinary in character, involving several disciplines from ethnology to economics, with many overlapping research trails. The concluding section indicates that diffusion research seems to be data driven and relies heavily on solely empirical studies. Consequently, influential publications rely on empirical data that support and change theories in modest ways only. In this contribution, we propose a review method that produces a fairly good overview of the research area and which can be applied to any knowledge field to replace or complement the traditional literature review. An estimation of the h-index is proposed for cases when the original variable underlying the distribution for which the h-index had been determined was rescaled. Within its validity limits, the approximation can be usefully applied for field normalization, change of time frames or other changes of measurement scales. Journals are increasingly making use of online supplemental information (OSI) as a means to convey part of the material previously included in the papers themselves. Quite often, material displaced to OSI is accompanied by references that, with rare exceptions, are not incorporated into citation databases. An analysis of OSI in a random sample of papers published in 2013 in the Proceedings of the National Academy of Sciences of the USA revealed that unique references only listed in OSI amount to more than 10 % of the number of references included in the papers themselves. Obliteration of these references in citation databases contributes to substantial inaccuracies in citation counts, with a bias against papers that are cited only in the methods sections usually displaced to OSI. 
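Several of the abstracts above lean on the h-index, whose definition is simple to compute directly: h is the largest number such that at least h of an author's papers have at least h citations each. The sketch below computes it exactly; the rescaling approximation proposed above is not reproduced here, but recomputing h on rescaled citation counts, as in the last line, is precisely the quantity such an approximation estimates. The citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    h = 0
    # Rank papers by citations in descending order; h is the last rank
    # at which the citation count still reaches the rank.
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

papers = [25, 8, 5, 3, 3, 1, 0]
print(h_index(papers))                   # -> 3
print(h_index([2 * c for c in papers]))  # -> 5 (after doubling every count)
```

Note that h does not simply double when every citation count doubles, which is why a field- or scale-normalized estimate of the rescaled h-index is non-trivial.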
The application of machine learning algorithms in the construction of ranking models is a relatively new research area which has emerged during the last 10 years within the field of artificial intelligence and information retrieval. This paper presents a bibliometric study of scientific output on learning to rank (L2R) between 2000 and 2013. To this end, every relevant bibliographic L2R record retrieved from the Scopus database was considered. The records were processed according to a series of one-dimensional and multi-dimensional metric indicators which were selected for the study. The results of this research provide the scientific community with reliable, up-to-date information about the state of L2R research and trends, and will enable researchers to develop valuable studies to reinforce research, development and innovation. Online bibliographic databases are powerful resources for research in data mining and social network analysis, especially of co-author networks. Predicting future rising stars means finding brilliant scholars/researchers in co-author networks. In this paper, we propose a solution for rising star prediction by applying machine learning techniques. For the classification task, discriminative and generative modeling techniques are considered and two algorithms are chosen for each category. Author-, co-authorship- and venue-based information is incorporated, resulting in eleven features with their mathematical formulations. Extensive experiments are performed to analyze the impact of individual features, feature categories and their combinations with respect to classification accuracy. Then, two ranking lists for the top 30 scholars are presented from the predicted rising stars. In addition, this concept is demonstrated for the prediction of rising stars in the database domain. Data from the DBLP and Arnetminer databases (1996-2000 for wide disciplines) are used for the algorithms' experimental analysis. 
Highly cited papers are an important reference point in a research field. H-Classics is a new method for identifying highly cited papers that is based on the h-index and is sensitive to the specific characteristics of each research discipline and to its evolution. Recently, Ho (Scientometrics 98(1):137-155, 2014) presented a study on highly cited papers in the Social Work area using a threshold of 50 citations received as the selection criterion. In this paper, we present a new study of the highly cited papers in the Social Work discipline which is developed using the concept of H-Classics. This new study provides more precise results and a different vision of the Social Work area. Since the early 1970s, scholars have contributed their talent and intellect towards the establishment of the discipline and the education of the next generation of hospitality and tourism professionals. Espousing the popular notion "publish or perish", numerous scholars have explored the discipline's research foundations from an array of different perspectives, such as the ranking and rating of scholars, journal publications and institutions. This novel empirical endeavor aims to enrich the existing intellectual capital by investigating the publication strategies of forty-four prolific hospitality and tourism scholars, focusing on three distinctive thematic areas, namely a journal's impact factor and citations, authorship specifics, and research themes. Findings are of interest to both current and future scholars in their quest for academic excellence and contributions, which further enhance the hospitality and tourism discipline. There is a worldwide trend towards the application of bibliometric research evaluation, in support of the needs of policy makers and research administrators. However, the assumptions and limitations of bibliometric measurements suggest a probabilistic rather than the traditional deterministic approach to the assessment of research performance. 
The aim of this work is to propose a multivariate stochastic model for measuring the performance of individual scientists and to compare the results of its application with those arising from a deterministic approach. The dataset covers the 2006-2010 scientific production, indexed in Web of Science, of over 900 Italian academic scientists working in two distinct fields of the life sciences. An analysis of article-level metrics of 27,856 PLOS ONE articles reveals that the number of tweets was weakly associated with the number of citations (beta = 0.10), and weakly negatively associated with citations when the number of article views was held constant (beta = -0.06). The number of tweets was predictive of other social media activity (beta = 0.34 for Mendeley and beta = 0.41 for Facebook), but not of the number of article views on PubMed Central (beta = 0.01). It is concluded that the scientific citation process acts relatively independently of the social dynamics on Twitter. In the past 30 years, publications from both China and Germany have increased exponentially, with China growing much faster, especially in the natural sciences, engineering and technology. In medical and health sciences, however, China still lags behind Germany in terms of publication production. Germany performs better in producing high-quality papers (measured by citations). Collaboration between the two countries has increased significantly, especially in the natural sciences and engineering and technology, but less so in medical and health sciences. Collaboration between China and Germany may help to raise the quality of Chinese research in terms of highly cited papers. The Computer Science (CS) community has been discussing, for some time now, the role of conferences as publication venues. 
In this regard, computer scientists claim to have a long-standing tradition of publishing their research results in conferences, which are also recognized as being different from events in other disciplines. This practice, however, contrasts with journal-driven publication practices, which are the prevailing academic standard. Consequently, the assessment of the quality of CS conferences with respect to journals is a recurrent topic of discussion within evaluation boards in charge of judging researchers' performance. Even when agreements are feasible inside the discipline, they are often subject to scrutiny by multi-disciplinary evaluation boards, usually ruled by standard bibliometrics, in which CS researchers compete for scholarships, positions and funding. The Argentinian CS community is not an exception in this respect. In this paper, we present a study of the publication practices of the Argentinian CS community, their evolution over time and, more importantly, the impact they achieved in terms of citations. The findings of this study are a good basis for understanding the publishing practices of our community, promoting future discussions as well as supporting the community's positions regarding these issues. A paper featured on the cover of a journal has more visibility than the ordinary articles in the same issue, for both printed and electronic journals. Does this kind of visibility guarantee more attention and greater impact for its associated content than for non-cover papers? In this research, usage and citation data of 60 issues of PLOS Biology from 2006 to 2010 are analyzed to compare attention and scholarly impact between cover and non-cover papers. Our empirical study confirms that, in most cases, the group difference between cover and non-cover papers is not significant for attention or impact. The cover paper is neither the best nor even among the top papers in its issue in terms of attention or citation impact. 
Having a paper featured on the cover of a journal may be a source of pride to researchers; many institutions and researchers even release news about it. However, being featured on the cover of a journal does not guarantee more attention or greater impact. It is well known in bibliometrics that the average number of citations per paper differs greatly between the various disciplines. The differing citation culture (in particular the different average number of references per paper and thereby the different probability of being cited) is widely seen as the cause of this variation. Based on all Web of Science (WoS) records published in 1990, 1995, 2000, 2005, and 2010, we demonstrate that almost all disciplines show similar numbers of references in the appendices of their papers. Our results suggest that the average citation rate is far more influenced by the extent to which the papers (cited as references) are included in WoS as linked database records. For example, the comparatively low citation rates in the humanities are not at all the result of a lower average number of references per paper but are caused by the low fraction of linked references which refer to papers published in the core journals covered by WoS. Wikipedia may be the best-developed attempt thus far to gather all human knowledge in one place. Its accomplishments in this regard have made it a point of inquiry for researchers from different fields of knowledge. A decade of research has thrown light on many aspects of the Wikipedia community, its processes, and its content. However, due to the variety of fields inquiring about Wikipedia and the limited synthesis of the extensive research, there is little consensus on many aspects of Wikipedia's content as an encyclopedic collection of human knowledge. This study addresses the issue by systematically reviewing 110 peer-reviewed publications on Wikipedia content, summarizing the current findings, and highlighting the major research trends. 
Two major streams of research are identified: the quality of Wikipedia content (including comprehensiveness, currency, readability, and reliability) and the size of Wikipedia. Moreover, we present the key research trends in terms of the domains of inquiry, research design, data source, and data gathering methods. This review synthesizes scholarly understanding of Wikipedia content and paves the way for future studies. To be effective and at the same time sustainable, a community data curation model needs to be aligned with the community's current data practices, including research project activities, data types, and perceptions of data quality. Based on a survey of members of the condensed matter physics (CMP) community gathered around the National High Magnetic Field Laboratory, a large national laboratory, this article defines a model of CMP research project tasks consisting of 10 task constructs. In addition, the study develops a model of data quality perceptions by CMP scientists consisting of four data quality constructs. The paper also discusses relationships among the data quality perceptions, project roles, and demographic characteristics of CMP scientists. The findings of the study can inform the design of a CMP data curation model that is aligned and harmonized with the community's research work structure and data practices. Micro-blogging services such as Twitter represent constantly evolving, user-generated sources of information. Previous studies show that users search such content regularly but are often dissatisfied with current search facilities. We argue that an enhanced understanding of the motivations for search would aid the design of improved search systems, better reflecting what people need. Building on previous research, we present qualitative analyses of two sources of data regarding how and why people search Twitter. 
The first, a diary study (p=68), provides descriptions of Twitter information needs (n=117) and important meta-data from active study participants. The second data set was established by collecting first-person descriptions of search behavior (n=388) tweeted by Twitter users themselves (p=381) and complements the first data set by providing similar descriptions from a more plentiful source. The results of our analyses reveal numerous characteristics of Twitter search that differentiate it from more commonly studied search domains, such as web search. The findings also shed light on some of the difficulties users encounter. By highlighting examples that go beyond those previously published, this article adds to the understanding of how and why people search such content. Based on these new insights, we conclude with a discussion of possible design implications for search systems that index micro-blogging content. Information use intrigues information behavior researchers, though many have struggled with how to conceptualize and study this phenomenon. Some work suggests that information may have social uses, hinting that information use is more complicated than previous frameworks suggest. Therefore, we use a micro-sociological, symbolic interactionist approach to examine the use of one type of information, biomedical information, in the everyday life interactions of chronic illness patients and their families. Based on a grounded theory analysis of 60 semi-structured interviews (30 individual patient interviews and 30 family group interviews) and observations within the family group interviews, we identify 4 categories of information use: (a) knowing my body; (b) mapping the social terrain; (c) asserting autonomy; and (d) puffing myself up. Extending previous research, the findings demonstrate use of biomedical information in interactions that construct a valued self for the patient: a person who holds authority, and who is unique and cared for. 
In so doing, we contribute novel insights regarding the use of information to manage social emotions such as shame, and to construct embodied knowledge that is mobilized in action to address disease-related challenges. We thus offer an expanded conceptualization of information use that provides new directions for research and practice. Recent studies have shown that counting citations from books can help scholarly impact assessment and that Google Books (GB) is a useful source of such citation counts, despite its lack of a public citation index. Searching GB for citations produces approximate matches, however, and so its raw results need time-consuming human filtering. In response, this article introduces a method to automatically remove false and irrelevant matches from GB citation searches in addition to introducing refinements to a previous GB manual citation extraction method. The method was evaluated by manual checking of sampled GB results and comparing citations to about 14,500 monographs in the Thomson Reuters Book Citation Index (BKCI) against automatically extracted citations from GB across 24 subject areas. GB citations were 103% to 137% as numerous as BKCI citations in the humanities, except for tourism (72%) and linguistics (91%), 46% to 85% in social sciences, but only 8% to 53% in the sciences. In all cases, however, GB had substantially more citing books than did BKCI, with BKCI's results coming predominantly from journal articles. Moderate correlations between the GB and BKCI citation counts in the social sciences and humanities, with most BKCI results coming from journal articles rather than books, suggest that the two sources may measure different aspects of impact. We explore classifying scientific disciplines, including their temporal features, by focusing on their collaboration structures over time. Bibliometric data for Slovenian researchers registered at the Slovenian Research Agency were used. 
These data were obtained from the Slovenian National Current Research Information System. We applied a recently developed hierarchical clustering procedure for symbolic data to the coauthorship structure of scientific disciplines. To track temporal changes, we divided data for the period 1986-2010 into five 5-year time periods. The clustering for the Slovene science system revealed 5 clusters of scientific disciplines that, in large measure, correspond with the official national classification of sciences. However, there were also some significant differences pointing to the need for a dynamic classification system of sciences to better characterize them. Implications stemming from these results, especially with regard to classifying scientific disciplines, understanding the collaborative structure of science, and research and development policies, are discussed. According to Leventhal's Common Sense Model of illness regulation, people approach and deal with their illnesses differently depending on their cognitive representations of them. Thus, understanding people's illness representations can be invaluable when assisting them to develop lifestyle modifications that improve their health. What role does information use play in this equation? This is the crucial question addressed by this two-part study. Part 1 hypothesizes a model of how information use at different timepoints may affect illness representations, and then tests this model. The study found that a number of information use type and time pairings (e.g., information used to consult healthcare practitioners at symptom onset) were significantly associated with present-day level of personal control. The results suggest that it is not merely type or timing of information use alone that is helpful in illness coping, but the coupling of the two; this has several implications for the design of patient education programs. 
Part 2 examines how information use and illness representations differ based on the way an individual participates in online health forums and social media sites. The following four different participation styles were investigated: nonuser, only reading (lurker), posting occasionally but largely reading (infrequent poster), and reading and posting (poster). Differences in both information use and illness perceptions were found, and the implications of these are discussed. In this article, we present a new algorithm for clustering a bilingual collection of comparable news items in groups of specific topics. Our hypothesis is that named entities (NEs) are more informative than other features in the news when clustering fine-grained topics. The algorithm does not need as input any information related to the number of clusters, and carries out the clustering based only on information about the shared named entities of the news items. This proposal is evaluated using different data sets and outperforms other state-of-the-art algorithms, thereby proving the plausibility of the approach. In addition, because the applicability of our approach depends on the possibility of identifying equivalent named entities among the news, we propose a heuristic system to identify equivalent named entities in the same and different languages, thereby obtaining good performance. The construct of value is highly relevant to information. For research on the value of information, Saracevic and Kantor (1997) proposed a framework from a value perspective in philosophy. In this report, we substantiate the framework with an updated review of the literature and demonstrate its applicability to understanding the value of user feedback as one type of information. 
Our field study, in the setting of a health information provider whose information products serve thousands of Canadian healthcare professionals, provides an example of how this value-of-information framework can be operationalized for an organization. In addition to the theoretical and methodological contributions, this research adds to the literature by documenting the way that textual feedback data were used to optimize the content of an information resource. This contrasts with published studies that only dealt with the use of quantitative feedback by information providers not involved in content production. Decision making for the complex patient is challenging for doctors because of increased complexity, such as multiple co-morbidities and interprofessionality, for which evidence-based literature and guidelines are currently lacking. The consequent uncertainty causes vagueness, threatening patient safety and the quality of care. This article is motivated by the design science paradigm and describes the interprofessional decision-making model for the complex patient, namely INDECO, along with an example instantiation. Drawing on our experience in an intensive care unit of a tertiary hospital in Israel, the bi-dimensional view of this model includes the medical and the interprofessional perspectives. Retrospective assessment of 3 case studies of complex patients is used to assess the usefulness of INDECO in decision making. The study reported here draws support from relevant literature, including the information science, information systems, and medical domains. The findings resonate with emerging research developments focusing on healthcare decision making. The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. 
We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index. Many papers have appeared recently assessing the effects of using tables and graphs in scientific publications. In this brief communication, we assess some of the methodological difficulties that have arisen in this context. These difficulties encompass issues of data availability, suitability of indicators, nature and purpose of tables and graphs, and the role of supplementary information. This paper presents a database link network to measure the impact of databases on biological research. To this end, we used the 20,861 full-text articles from PubMed Central in the field of Bioinformatics. We then extracted databases from the methodology sections of these articles and their references. The list of databases was built with The 2013 Nucleic Acids Research Molecular Biology Database Collection (available online), which includes 1512 databases. 
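The influence-primed h-index (hip-index) described above weights each citation by the number of times the reference is mentioned in the body of the citing paper. A minimal sketch in Python; the data and the exact weighting scheme here are illustrative assumptions, not the authors' implementation:

```python
def h_index(citation_counts):
    """Classic h-index: the largest h such that h papers have >= h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
    return h

def hip_index(papers):
    """Influence-primed variant: each paper's score is the sum of in-text
    mention counts over its citing papers, rather than the raw citation
    count. (Hypothetical weighting; the published hip-index may differ.)"""
    weighted = [sum(mentions) for mentions in papers.values()]
    return h_index(weighted)

# Hypothetical data: each list holds, per citing paper, the number of
# times the cited paper is mentioned in the citing paper's body.
papers = {
    "p1": [3],           # 1 citation, mentioned 3 times
    "p2": [2, 2],        # 2 citations, mentioned twice each
    "p3": [1, 1, 1, 1],  # 4 citations, mentioned once each
}
print(h_index([len(m) for m in papers.values()]))  # conventional h-index: 2
print(hip_index(papers))                           # mention-weighted: 3
```

On this toy data the two indices diverge because heavily mentioned references are promoted, which is the behavior the mention-count features reward.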
The database link network was constructed from sets of pairs of databases mentioned in the methodology sections of full-text PubMed Central articles. The edges of the database link network represent the link relationships between two databases. The weight of each edge is determined either by the link frequency of the two databases (i.e., in the link-weighted database link network) or the topic similarity between two databases (i.e., in the similarity-weighted database link network). We then analyzed the topological structure and main paths of this network to trace the usage, connection, and evolution of databases. We also conducted content analysis by comparing content similarities among the papers citing databases. (C) 2014 Elsevier Ltd. All rights reserved. This article examines how different factors influence the number of times articles in the five most recognized transportation journals are cited. The effects of most of the explanatory variables indicating the characteristics of articles, authors and journals correspond with earlier studies of citation counts. Special focus in this study is placed on estimating the relationship between researchers' human capital or skills and their experience. For the purpose of this study, human capital is defined as a scientist's ability to conduct research at the frontier of his or her discipline and is measured by how frequently his or her research is cited. Experience is measured by counting the number of a scientist's previous scientific articles. Using negative binomial regression, we find that experience offers a statistically significant positive effect on the human capital of scientists. However, this effect diminishes rapidly with the level of experience. This suggests that young researchers relatively quickly learn the skills and gain the knowledge necessary to produce high-quality research. (C) 2014 Elsevier Ltd. All rights reserved. 
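The link-weighted variant of the database link network described above can be sketched as a co-mention count over article methodology sections. A minimal reconstruction with hypothetical database names, not the authors' pipeline:

```python
from collections import Counter
from itertools import combinations

def link_weighted_network(articles):
    """Build a link-weighted database link network: every pair of databases
    mentioned together in one article's methodology section adds 1 to the
    weight of the edge between them (an illustrative reconstruction)."""
    edges = Counter()
    for dbs in articles:
        # Sort so each unordered pair maps to one canonical edge key.
        for a, b in combinations(sorted(set(dbs)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical methodology-section database mentions for three articles.
articles = [
    {"GenBank", "PDB", "UniProt"},
    {"GenBank", "PDB"},
    {"PDB", "UniProt"},
]
print(link_weighted_network(articles))
```

The similarity-weighted variant would replace the `+= 1` increment with a topic-similarity score between the two databases' descriptions.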
The literature on gender differences in research performance seems to suggest a gap between men and women, where the former outperform the latter. Whether or not one agrees with the different factors proposed to explain the phenomenon, it is worthwhile to verify if comparing the performance within each gender, rather than without distinction, gives significantly different ranking lists. If there were some structural factor that determined a penalty in performance of female researchers compared to their male peers, then under conditions of equal capacities of men and women, any comparative evaluations of individual performance that fail to account for gender differences would lead to distortion of the judgments in favor of men. In this work we measure the extent of differences in rank between the two methods of comparing performance in each field of the hard sciences: for professors in the Italian university system, we compare the distributions of research performance for men and women and subsequently the ranking lists with and without distinction by gender. The results are of interest for the optimization of efficient selection in formulation of recruitment, career advancement and incentive schemes. (C) 2014 Elsevier Ltd. All rights reserved. This study explores the connections between social and usage metrics (altmetrics) and bibliometric indicators at the author level. It studies to what extent these indicators, gained from academic sites, can provide a proxy for research impact. Close to 10,000 author profiles belonging to the Spanish National Research Council were extracted from the principal scholarly social sites (ResearchGate, Academia.edu, and Mendeley) and academic search engines (Microsoft Academic Search and Google Scholar Citations). Results show little overlap between sites, because most researchers manage only one profile (72%). Correlations indicate that there is scant relationship between altmetric and bibliometric indicators at the author level. 
This is because the altmetric indicators are site-dependent, while the bibliometric ones are more stable across web sites. It is concluded that altmetrics could reflect an alternative dimension of research performance, close, perhaps, to science popularization and networking abilities, but far from citation impact. (C) 2014 Elsevier Ltd. All rights reserved. In this article we study three types of uncitedness in Library and Information Science journals: uncitedness for articles, authors and topics. One important aspect in this study is giving accurate definitions of the indicators for measuring uncited papers, uncited authors and uncited topics. It is found that for the period 1991-2010 ratios of uncited papers fluctuate within the interval [0,0.1]. This ratio is relatively stable and not very high. Comparison of average number of pages, average number of references, average number of authors per paper and percentage of single-authored papers between cited and uncited papers shows that no matter the journal, the first three indicators' values for uncited papers are lower, while the values of the fourth indicator are higher, than the corresponding values for cited papers. The fact that almost all uncited authors in a journal published only one paper in this journal illustrates that a journal's uncited authors are the least productive authors in this journal. Yet, productive and highly cited authors also publish uncited papers. As to why some topics fall into the group of uncited topics, the hypothesis is that the combination of unfamiliar keywords forms an unfamiliar topic, a topic authors have elected not to study further. Another assumption is that some uncited topics fall outside the field of Library and Information Science. Retrieval results in the Web of Science for a set of uncited keywords and keyword combinations support this assumption. (C) 2014 Elsevier Ltd. All rights reserved. 
This work aims at establishing the task-force involved in scientific production at the institutional or national level, globally or per area or sub-area of knowledge. In the proposed system, the estimated task-force is further divided into core (permanent members of the institution(s)) and collaborators (more mobile members), and allows normalization of scientific production. Research groups/institutions/countries of different sizes/scientific areas can, thus, be directly compared and the time evolution of these groups inspected. Results are presented for the characterization of four universities (from Portugal, Sweden and USA) in the 2008-2012 period, for the research area of Chemistry. It is shown that it is possible not only to estimate the task-force, but also to derive new, relevant indicators for the set under analysis. Aspects pertaining to collaboration fluxes are also assessed. (C) 2014 Elsevier Ltd. All rights reserved. This paper analyzes the diverse scientific careers of researchers in order to understand the key factors that could lead to a successful career. Essentially, we intend to answer some specific questions pertaining to a researcher's scientific career - What are the local and the global dynamics regulating a researcher's decision to select a new field of research at different points of her entire career? What are the suitable quantitative indicators to measure the diversity of a researcher's scientific career? We propose two entropy-based metrics to measure a researcher's choice of research topics. Experiments with large computer science bibliographic dataset reveal that there is a strong correlation between the diversity of the career of a researcher and her success in scientific research in terms of the number of citations. 
We observe that while most of the researchers are biased toward either adopting diverse research fields or concentrating on very few fields, a majority of the prominent researchers tend to follow a typical "scatter-gather" policy - although their entire careers are immensely diverse with different types of fields selected at different time periods, they remain focused primarily in at most one or two fields at any particular time point of their career. Finally, we propose a stochastic model which, quite accurately, mimics the field selection process observed in the real publication dataset. (C) 2014 Elsevier Ltd. All rights reserved. Purpose: The goal is to identify the features of top-rated gold open access (OA) journals by testing seven main variables: languages, countries, years of activity and years in the DOAJ repository, publication fee, the field of study, whether the journal has been launched as OA or converted, and the type of publisher. Sample: A sample of 1910 gold OA journals has been obtained by combining Scopus SJR 2012, the DOAJ, and data provided by previous studies (Solomon, 2013). Method: We have divided the SJR index into quartiles for all journals' subject areas. First, we show descriptive statistics by combining quartiles based on their features. Then, after having converted the quartiles into a dummy variable, we test it as a dependent variable in a binary logistic regression. Contribution: This work contributes empirically to a better understanding of gold OA journal performance, which may be helpful in improving journals' rankings in areas where this is still a struggle. Findings: Significant results have been found for all variables, except for the types of publishers, and for born or converted journals. (C) 2014 Elsevier Ltd. All rights reserved. 
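The entropy-based diversity metrics proposed earlier for a researcher's field choices can be illustrated with a plain Shannon entropy over the fields a researcher publishes in. This is a generic sketch; the paper's two metrics are defined over career time windows and may differ in detail:

```python
from collections import Counter
from math import log2

def field_entropy(fields):
    """Shannon entropy of a researcher's publication fields: 0 bits for a
    single-field career, log2(k) bits for k equally represented fields."""
    counts = Counter(fields)
    n = len(fields)
    return sum((c / n) * log2(n / c) for c in counts.values())

# Hypothetical careers with made-up field labels.
focused = ["IR", "IR", "IR", "IR"]   # one field only
diverse = ["IR", "ML", "DB", "HCI"]  # four distinct fields
print(field_entropy(focused))  # 0.0
print(field_entropy(diverse))  # 2.0
```

Computing this per time window, rather than over the whole career at once, is what lets the "scatter-gather" pattern show up: high entropy overall, low entropy within any single period.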
By empirical demonstration, this study extends the assessments of BRICS countries (Brazil, Russia, India, China and South Africa) in performing science and technology in previous studies by exploring their cumulative patterns of science and technology (proxied by publications and patents, respectively). Projections of cumulative production in science and technology are made using a logistic growth function. Our analyses show that - though having different growth trajectories in science production - the BRICS countries exhibit similar patterns in pursuing technology. This embodies the strong commitment of BRICS to improve their technological capabilities in the process of industrial development. Inspired by the Relative Impact Index (RII) proposed by Nesta and Patel, we propose the Relative Science Impact Index (RSII) to evaluate the relative impact of science and technology on the process of technological catching-up in emerging economies and examine the co-evolution between science-based patents and patent citations. Our correlation analysis between forward citations and RSII marks some distinctive pursuits of BRICS countries in science-based patenting activities. (C) 2014 Elsevier Ltd. All rights reserved. We study the problem of normalizing citation impact indicators for differences in citation practices across scientific fields. Normalization of citation impact indicators is usually done based on a field classification system. In practice, the Web of Science journal subject categories are often used for this purpose. However, many of these subject categories have a quite broad scope and are not sufficiently homogeneous in terms of citation practices. As an alternative, we propose to work with algorithmically constructed classification systems. We construct these classification systems by performing a large-scale clustering of publications based on their citation relations. 
In our analysis, 12 classification systems are constructed, each at a different granularity level. The number of fields in these systems ranges from 390 to 73,205 in granularity levels 1-12. This contrasts with the 236 subject categories in the WoS classification system. Based on an investigation of some key characteristics of the 12 classification systems, we argue that working with a few thousand fields may be an optimal choice. We then study the effect of the choice of a classification system on the citation impact of the 500 universities included in the 2013 edition of the CWTS Leiden Ranking. We consider both the MNCS and the PPtop 10% indicator. Globally, for all the universities taken together, citation impact indicators generally turn out to be relatively insensitive to the choice of a classification system. Nevertheless, for individual universities, we sometimes observe substantial differences between indicators normalized based on the journal subject categories and indicators normalized based on an appropriately chosen algorithmically constructed classification system. (C) 2014 Elsevier Ltd. All rights reserved. CiteSpace is visual document analysis software with which the performance and trends of a discipline can be displayed for a given period. Moreover, the evolution of a research frontier can be explored with it as well. This research focuses on the visualization and quantitative study of bibliographic databases by taking university-industry collaboration studies as an example. 
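The MNCS indicator discussed above normalizes each publication's citation count by the mean citation count of its field, so the choice of classification system directly determines the denominators. A bare-bones sketch with hypothetical field labels and counts; the actual Leiden Ranking computation also conditions on publication year and document type:

```python
from collections import defaultdict

def mncs(unit_papers, all_papers):
    """Mean Normalized Citation Score of a unit: the average, over the
    unit's papers, of (citations / mean citations in the paper's field),
    where field means are computed over the whole database."""
    by_field = defaultdict(list)
    for field, cites in all_papers:
        by_field[field].append(cites)
    field_mean = {f: sum(cs) / len(cs) for f, cs in by_field.items()}
    scores = [cites / field_mean[field] for field, cites in unit_papers]
    return sum(scores) / len(scores)

# Hypothetical database: biology papers are cited ~5x as often as math papers.
all_papers = [("bio", 20), ("bio", 10), ("math", 2), ("math", 4)]
unit = [("bio", 20), ("math", 2)]  # one university's output
print(mncs(unit, all_papers))
```

Here the unit's above-average biology paper and below-average mathematics paper balance to a score near 1; reassigning papers to coarser or finer fields changes `field_mean` and hence the result, which is exactly the sensitivity the article measures.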
Using the Web of Science (WOS), 587 publications and over 30,000 references were selected for analysis, which produced the following results: (1) Our method can clearly reveal the key elements of certain disciplines, such as the largest share of publications, the most frequently cited authors and journals in the university-industry cooperation research field; (2) The relationships among the frequently cited authors, references, journals and keywords can be explained visually in the university-industry cooperation research field; (3) Of special note is that the potential problems and evolutionary trends of certain research fields such as university-industry cooperation can also be ascertained via our method; (4) In general, according to the case study, our visualization and quantitative method provides a new research framework for evaluating the performance of research areas. (C) 2014 Elsevier Ltd. All rights reserved. Research networks play a crucial role in the production of new knowledge since collaboration contributes to determining the cognitive and social structure of scientific fields and has a positive influence on research. This paper analyses the structure of co-authorship networks in three different fields (Nanoscience, Pharmacology and Statistics) in Spain over a three-year period (2006-2008) and explores the relationship between the research performance of scientists and their position in co-authorship networks. A denser co-authorship network is found in the two experimental fields than in Statistics, where the network is of a less connected and more fragmented nature. Using the g-index as a proxy for individual research performance, a Poisson regression model is used to explore how performance is related to different co-authorship network measures and to disclose interfield differences. The number of co-authors (degree centrality) and the strength of links show a positive relationship with the g-index in the three fields. 
Local cohesion presents a negative relationship with the g-index in the two experimental fields, where open networks and the diversity of co-authors seem to be beneficial. No clear advantages from intermediary positions (high betweenness) or from being linked to well-connected authors (high eigenvector) can be inferred from this analysis. In terms of g-index, the benefits derived by authors from their position in co-authorship networks are larger in the two experimental fields than in the theoretical one. (C) 2014 Elsevier Ltd. All rights reserved. In this contribution we consider one particular node in a network, referred to as the ego. We combine Zipf lists and ego measures to put forward a conceptual framework for characterizing this particular node. In this framework we unify different forms of h-indices, in particular the h-degree, introduced in the literature. Similarly, different forms of the g-index, the a-index and the R-index are unified. We focus on the pure mathematical and logical concepts, referring to the existing literature for practical examples. (C) 2014 Elsevier Ltd. All rights reserved. The h-index has been shown to increase in many cases mostly because of citations to rather old publications. This inertia can be circumvented by restricting the evaluation to a publication and citation time window. Here I report results of an empirical study analyzing the evolution of the thus defined timed h-index as a function of the length of the time window. (C) 2014 Elsevier Ltd. All rights reserved. Lotkaian informetrics is the framework most often used to study statistical distributions in the production and usage of information. Although Lotkaian distributions are traditionally used to characterize the Information Production Process (IPP), we have shown in a previous article that the IPP can successfully be studied using the effort function - the latter having been initially introduced to define the Exponential Informetric Process (EIP). 
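The timed h-index described above restricts both the publications considered and the citations counted to a trailing time window. A sketch under assumed data shapes (a list of citation years per paper), not the study's actual code:

```python
def timed_h_index(papers, now, window):
    """Timed h-index: only papers published within the last `window` years
    count, and only citations received within that window count, so old
    publications can no longer keep inflating the index.
    `papers` is a list of (publication_year, [citation_years])."""
    start = now - window
    counts = sorted(
        (sum(1 for y in cite_years if y >= start)
         for pub_year, cite_years in papers
         if pub_year >= start),
        reverse=True,
    )
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
    return h

# Hypothetical record: (publication year, years of incoming citations).
papers = [
    (1995, [1996, 1997, 1998, 2013]),  # old paper, mostly old citations
    (2011, [2012, 2013, 2014]),
    (2012, [2012, 2013, 2014]),
]
print(timed_h_index(papers, now=2014, window=5))   # recent window only: 2
print(timed_h_index(papers, now=2014, window=50))  # effectively untimed: 3
```

Sweeping `window` over a range of lengths reproduces the kind of evolution-versus-window analysis the abstract reports.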
These themes continue to be developed in this article, in which we present a necessary and sufficient condition for the existence of the EIP. Our current approach is similar to the one used to study IPPs. Inverse power and exponential distributions serve to illustrate the results obtained in the context of an EIP. Numerical examples are discussed. (C) 2014 Elsevier Ltd. All rights reserved. In this paper, we analyze the adequacy and applicability of readership statistics recorded in social reference management systems for creating knowledge domain visualizations. First, we investigate the distribution of subject areas in user libraries of educational technology researchers on Mendeley. The results show that around 69% of the publications in an average user library can be attributed to a single subject area. Then, we use co-readership patterns to map the field of educational technology. The resulting visualization prototype, based on the most read publications in this field on Mendeley, reveals 13 topic areas of educational technology research. The visualization is a recent representation of the field: 80% of the publications included were published within ten years of data collection. The characteristics of the readers, however, introduce certain biases to the visualization. Knowledge domain visualizations based on readership statistics are therefore multifaceted and timely, but it is important that the characteristics of the underlying sample are made transparent. (C) 2015 Elsevier Ltd. All rights reserved. Distributing scientific funding to suitable universities and research fields is very important for accelerating innovation in science and technology. Using a longitudinal panel dataset of the National Natural Science Foundation of China (NSFC), a total of 224,087 sponsored projects are used to investigate the distributions of scientific funding across universities and research disciplines. 
The inequality of the funding distribution is studied through the Gini coefficient, and its fundamental rules are discovered through distribution fitting. It is found that the inequality of distributions of NSFC funding across 1971 universities is decreasing, and that the distributions of funding and of supported universities across 971 research fields follow a Generalized Pareto distribution and a Geometric distribution, respectively. This study aims to provide an overall landscape to inform policy on the distribution of scientific funding. (C) 2015 Elsevier Ltd. All rights reserved. Science is increasingly produced in collaborative teams, but collaborative teams in science are self-assembled and fluid. Such characteristics call for a network approach that accounts for external activities which contribute to a team's output but take place beyond closed team boundaries, in the open network. Given these characteristics of collaborative teams in science, we empirically test the interdependence between collaborative teams in the same network. Specifically, using fixed effects Poisson models and panel data of 1310 American scientists' life-time publication histories, we demonstrate knowledge spillovers from new collaborators to other teams not involving these new collaborators. Our findings have important implications for studying the organization of science. (C) 2014 Elsevier Ltd. All rights reserved. Author direct citation analysis (ADCA, also called inter-citation or cross-citation) is a new, feasible, and applicable technique for exploring knowledge communication and discovering scientific structure. This study explored ADCA among prolific, highly cited, and core authors in information science in China and around the world. The results revealed the following. (1) The datasets in China and around the world cover overlapping, but also unique, topics.
Research subjects in information science around the world can be divided into three categories and 10 clusters, while those in China can be divided into three categories and 9 clusters. Chinese scholars, who are mostly involved in cross-disciplinary and multi-field research, are not as specialized as foreign scholars. An obvious imbalance exists in the evolution of the discipline structure around the world, indicating the need to promote research specialization and cross-disciplinary breadth in tandem. Chinese scholars concentrate more on topics such as competitive intelligence, information resource management, and information retrieval, and focus less on information security and user analysis. (2) Around the world, knowledge communication between active authors is stronger than the knowledge flow from highly influential authors to active authors; Chinese researchers, meanwhile, tend to adopt the knowledge of authoritative literature. Knowledge flow through bidirectional direct citation is related to mutual knowledge communication. Authoritative scholars emerge when prolific authors cite highly cited authors. The level of mutual recognition among Chinese scholars has not reached that among foreign scholars: less bidirectional flow of knowledge is involved, and unidirectional flow is limited to geographical proximity, cooperation, or teacher-student relationships. (3) In contrast to traditional author co-citation analysis (ACA), ADCA pays more attention to mutual interaction among currently active scholars and mainly reflects current research foci. (C) 2015 Elsevier Ltd. All rights reserved. This paper shows empirically how the choice of certain data pre-processing methods for disambiguating author names affects our understanding of the structure and evolution of co-publication networks. Thirty years of publication records from 125 Information Systems journals were obtained from DBLP.
Author names in the data were pre-processed via algorithmic disambiguation. We applied the commonly used all-initials and first-initial based disambiguation methods to the data, generated over-time networks with a yearly resolution, and calculated standard network metrics on these graphs. Our results show that initial-based methods underestimate the number of unique authors, average distance, and clustering coefficient, while overestimating the number of edges, average degree, and ratios of the largest components. These self-reinforcing growth and shrinkage mechanisms amplify over time. This can lead to false findings about fundamental network characteristics such as topology and reasoning about underlying social processes. It can also cause erroneous predictions of trends in future network evolution and suggest unjustified policies, interventions and funding decisions. The findings from this study suggest that scholars need to be more attentive to data pre-processing when analyzing or reusing bibliometric data. (C) 2015 Elsevier Ltd. All rights reserved. The use of quantum concepts and formalisms in the information sciences is assessed through an analysis of published literature. Five categories are identified: use of loose analogies and metaphors between concepts in quantum physics and library/information science; use of quantum concepts and formalisms in information retrieval; use of quantum concepts and formalisms in studying meaning and concepts; quantum social science, in areas adjacent to information science; and the qualitative application of quantum concepts in the information disciplines. Quantum issues have led to demonstrable progress in information retrieval and semantic modelling, with less clear-cut progress elsewhere. Whether there may be a future "quantum turn" in the information sciences is debated, the implications of such a turn are considered, and a research agenda outlined. 
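The two initial-based disambiguation methods compared in the co-publication study above can be sketched as name-key functions. Real pre-processing pipelines are more elaborate, and the sample names here are hypothetical:

```python
# "all-initials" keeps the surname plus every initial; "first-initial"
# keeps the surname plus only the first initial, so distinct authors
# are more likely to collapse into a single key.

def all_initials_key(name):
    """'John A. Smith' -> 'smith_j_a'"""
    parts = name.replace(".", " ").split()
    surname, givens = parts[-1], parts[:-1]
    return "_".join([surname.lower()] + [g[0].lower() for g in givens])

def first_initial_key(name):
    """'John A. Smith' -> 'smith_j'"""
    parts = name.replace(".", " ").split()
    surname, givens = parts[-1], parts[:-1]
    key = surname.lower()
    return (key + "_" + givens[0][0].lower()) if givens else key

authors = ["John A. Smith", "Jane B. Smith", "J. Smith"]
# first-initial merges all three into one "author", underestimating the
# number of unique authors, consistent with the findings reported above
assert len({first_initial_key(a) for a in authors}) == 1
assert len({all_initials_key(a) for a in authors}) == 3
```

This collapsing is exactly the mechanism by which initial-based methods overestimate average degree and component sizes in the resulting networks.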
This study examines the ways in which informational support based on user-generated content is provided for the needs of leisure-related travel planning in an online discussion group and a Q&A site. Attention is paid to the grounds by which the participants bolster the informational support. The findings draw on the analysis of 200 threads from a Finnish online discussion group and a Yahoo! Answers Q&A (question and answer) forum. Three main types of informational support were identified: providing factual information, providing advice, and providing personal opinion. The grounds used in the answers varied across the types of informational support. When providing factual information, the most common ground was a description of the attributes of an entity. When providing advice, reference to external sources of information was employed most frequently. Finally, when providing personal opinions, the participants most often bolstered their views with positive or negative evaluations of an entity. Overall, regarding the grounds, there were more similarities than differences between the discussion group and the Q&A site. In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following 4 types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real time, or to quickly identify viral memes that might inform marketing decisions, among other applications.
The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter, as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters. Participation is today central to many kinds of research and design practice in information studies and beyond. From user-generated content to crowdsourcing to peer production to fan fiction to citizen science, the concept remains both unexamined and heterogeneous in its definition. Intuitions about participation are confirmed by some examples, but scandalized by others, and it is difficult to pinpoint why participation seems to be robust in some cases and partial in others. In this paper we offer an empirically based, comparative analysis of participation that demonstrates its multidimensionality and provides a framework that allows clear distinctions and better analyses of the role of participation. We derive 7 dimensions of participation from the literature and exemplify those dimensions using a set of 102 cases of contemporary participation that include uses of the Internet and new media. A similarity-oriented approach for deriving reference values used in citation normalization is explored and contrasted with the dominant approach of utilizing database-defined journal sets as a basis for deriving such values. In the similarity-oriented approach, an assessed article's raw citation count is compared with a reference value that is derived from a reference set, which is constructed in such a way that articles in this set are estimated to address a subject matter similar to that of the assessed article. This estimation is based on second-order similarity and utilizes a combination of 2 feature sets: bibliographic references and technical terminology. The contribution of an article in a given reference set to the reference value is dependent on its degree of similarity to the assessed article.
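One plausible reading of the similarity-weighted contribution described above is a weighted mean of the reference set's citation counts. This is a sketch under that assumption (the article's exact weighting scheme may differ), with hypothetical similarity scores:

```python
def reference_value(reference_set):
    """Similarity-weighted mean citation count of a reference set.

    `reference_set` is a list of (citation_count, similarity) pairs;
    each article contributes to the reference value in proportion to
    its estimated similarity to the assessed article.
    """
    weight = sum(sim for _, sim in reference_set)
    if weight == 0:
        return 0.0
    return sum(c * sim for c, sim in reference_set) / weight
```

The assessed article's raw citation count would then be compared against this value, e.g. as a ratio, to normalize for subject matter.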
It is shown that reference values calculated by the similarity-oriented approach are considerably better at predicting the assessed articles' citation count compared to the reference values given by the journal-set approach, thus significantly reducing the variability in the observed citation distribution that stems from the variability in the articles' addressed subject matter. This research creates an architecture for investigating the existence of probable lexical divergences between articles, categorized as Institute for Scientific Information (ISI) and non-ISI, and consequently, if such a difference is discovered, to propose the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naive Bayesian and K-Nearest Neighbors techniques. Using a large data set, indexed by Thomson Reuters, consisting of 4.4 million articles published in 1998-2003 with a 5-year citation window for each year, this article studies country citation distributions for a partitioning of the world into 36 countries and two geographical areas in eight broad scientific fields and the all-sciences case. The two key findings are the following. First, country citation distributions are highly skewed and very similar to each other in all fields. Second, to a large extent, differences in country citation distributions can be accounted for by scale factors. 
The empirical situation described in the article helps to understand why international comparisons of citation impact according to (a) mean citations and (b) the percentage of articles in each country belonging to the top 10% of the most cited articles are so similar to each other. In recent years, searching the web on mobile devices has become enormously popular. Because mobile devices have relatively small screens and show fewer search results, search behavior with mobile devices may differ from that with desktops or laptops. Therefore, examining these differences may suggest better, more efficient designs for mobile search engines. In this experiment, we use eye tracking to explore user behavior and performance. We analyze web searches with 2 task types on 2 differently sized screens: one for a desktop and the other for a mobile device. In addition, we examine the relationships between search performance and several search behaviors to allow further investigation of the differences engendered by the screens. We found that users have more difficulty extracting information from search results pages on the smaller screens, although they exhibit less eye movement as a result of an infrequent use of the scroll function. However, in terms of search performance, our findings suggest that there is no significant difference between the 2 screens in time spent on search results pages and the accuracy of finding answers. This suggests several possible ideas for the presentation design of search results pages on small devices. In a knowledge-intensive environment, a task in an organization is typically performed by a group of people who have task-related knowledge and expertise. Each group may require task-related knowledge of different topic domains and documents to accomplish its tasks.
Document recommendation methods are very useful to resolve the information overload problem and proactively support knowledge workers in the performance of tasks by recommending appropriate documents to meet their information needs. A worker's document referencing behavior can be modeled as a knowledge flow (KF) to represent the evolution of his information needs over time. However, the information needs of workers and groups may change over time, so that modeling the knowledge referencing behavior of a group of workers is difficult. Additionally, most traditional recommendation methods which provide personalized recommendations do not consider workers' KFs, or the information needs of the majority of workers in a group to recommend task knowledge. In this work, I integrate the KF mining method and propose group-based recommendation methods, including group-based collaborative filtering (GCF) and group content-based filtering (GCBF), to actively provide task-related documents for groups. Experimental results show that the proposed methods have better performance than the personalized recommendation methods in recommending the needed documents for groups. Thus, the recommended documents can fulfill the groups' task needs and facilitate knowledge sharing among groups. Informational support and nurturant support are two basic types of social support offered in online health communities. This study identifies types of social support in the QuitStop forum and brings insights to exchange patterns of social support and user behaviors with content analysis and social network analysis. Motivated by user information behavior, this study defines two patterns to describe social support exchange: initiated support exchange and invited support exchange. It is found that users with a longer quitting time tend to actively give initiated support, and recent quitters with a shorter abstinent time are likely to seek and receive invited support. 
This study also finds that support givers of informational support quit longer ago than support givers of nurturant support, and support receivers of informational support quit more recently than support receivers of nurturant support. Usually, informational support is offered by users at late quit stages to users at early quit stages. Nurturant support is also exchanged among users within the same quit stage. These findings help us understand how health consumers are supporting each other and reveal new capabilities of online intervention programs that can be designed to offer social support in a timely and effective manner. In the past three decades, several studies have extracted antecedents to the user adoption of health information systems (HIS). This study proposes a reflective pause on the HIS adoption literature to broaden our understanding of factors contributing to the user adoption of electronic medical records (EMR). This paper provides a comprehensive taxonomy of the factors influencing the user adoption of EMR and classifies these factors into meaningful categories. We searched the selected keywords on several academic databases and found an initial set of 9,684 studies. We excluded papers on the basis of their title, abstract, and full text, leaving 89 papers. The effectiveness of adoption theories has been explored based on the empirical results identified in the EMR research. Furthermore, according to the conceptualization of the factors in the literature, a list of 78 factors affecting EMR adoption was identified. These factors were classified into eight categories: individual factors, psychological factors, behavioural factors, environmental factors, organizational factors, financial factors, legal factors, and technical factors.
The results have implications for researchers and practitioners, including policymakers, marketers, information technology (IT) professionals, health information management (HIM) practitioners, health practice managers, and EMR system developers. This article presents the first situation-rooted typology of intimate partner violence (IPV) postings in social question and answer (Q&A) sites. Survivors as well as abusers post high-risk health, legal, and financial questions to Q&A sites; answers come from individuals who self-identify as lawyers, experts, survivors, and abusers. Using grounded theory this study examines 1,241 individual posts, each within its own context, raising issues of agency and expectations. Informed by Savolainen's everyday life information seeking (ELIS) and Nahl's affective load theory (ALT), the resultant Q&A typology suggests implications for IPV service design, policy development, and research priorities. Conceptual frameworks and taxonomies are an important part of the emerging base of knowledge on the curation of research data. We present the Data Practices and Curation Vocabulary (DPCVocab), a functional vocabulary created for specifying relationships among data practices in research, types of data produced and used, and curation roles and activities. The vocabulary consists of 3 categories (Research Data Practices, Data, and Curation) with 187 terms validated through empirical studies of scientific data practices in the Earth and life sciences. The present article covers the DPCVocab development process and examines applications for mapping relationships across the 3 categories, identifying factors for projecting curation costs and important differences in curation requirements across disciplines. As a tool for curators, the vocabulary provides a framework for charting curation options and guiding systematic administration of curation services.
It can serve as a shared terminology or lingua franca to support interactions and collaboration among curators, data producers, system developers, and other stakeholders in data infrastructure and services. The DPCVocab as a whole supports both the technical and the human aspects of professional curation work essential to the modern research system. Name ambiguity in the context of bibliographic citation affects the quality of services in digital libraries. Previous methods are not widely applied in practice because of their high computational complexity and their strong dependency on excessive attributes, such as institutional affiliation, research area, address, etc., which are difficult to obtain in practice. To solve this problem, we propose a novel coarse-to-fine framework for name disambiguation which sequentially employs 3 common and easily accessible attributes (i.e., coauthor name, article title, and publication venue). Our proposed framework is based on multiple clustering and consists of 3 steps: (a) clustering articles by coauthorship to obtain rough clusters, that is, fragments; (b) clustering the fragments obtained in step 1 by title information to get bigger fragments; and (c) clustering the fragments obtained in step 2 by the latent relations among venues. Experimental results on a Digital Bibliography and Library Project (DBLP) data set show that our method outperforms the existing state-of-the-art methods by 2.4% to 22.7% on the average pairwise F1 score and is 10 to 100 times faster in terms of execution time. This article examines whether Sister Nivedita (1867-1911) played any role in the writing of the articles and books by J. C. Bose (1858-1937), an Indian scientist, during their long period of association (1898-1911). Here, the writings of J.C. Bose are studied from this perspective. It may be noted that the relation between Bose and Nivedita was personal in nature.
Three important style markers (function words, punctuation marks, and word usage) are used to trace any changes in Bose's writing style. The results show that during the association between Bose and Nivedita, Bose's style of writing changed considerably in comparison to the earlier period, when they did not know each other. This indicates that Nivedita helped Bose in preparing his journal articles and books. The results reveal a distinct change in Bose's writing style after his meeting with Nivedita, reflected in his changing pattern of usage of these three stylistic features. Bose slowly moved back towards his original style of writing after Nivedita's death, but his later works still carried Nivedita's influence. Service research in information systems (IS) has received attention over many years. This study probes into the development of the service science, management, and engineering literature through the perspective of bibliometrics under IS. Data were drawn from the Social Sciences Citation Index of the Institute for Scientific Information's Web of Science database. A total of 4,513 entries spanning the 22 years from 1991 to 2012 were collected. This paper classified service science, service management, and SE articles by publication type and language, characteristics of article outputs, country, subject categories and journals, and the frequency of title words and keywords used. The paper also performs the K-S test to check whether the distribution of author article production follows Lotka's law. The analysis indicated that the disciplines most relevant to the SSME subject category are business economics, information science and library science, and computer science. The measurement of the research output of Higher Education Institutions (HEIs) is problematic, due to the multi-product nature of their teaching and research activities.
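The K-S check of Lotka's law mentioned in the bibliometric study above compares the observed author-productivity distribution with Lotka's inverse-square prediction. A minimal sketch, assuming the classical exponent of 2 (empirical studies often fit the exponent rather than fixing it):

```python
# Lotka's law predicts the proportion of authors with x papers is
# f(x) = C / x^2, with C chosen so the proportions sum to 1 over the
# observed range. The K-S statistic is the maximum absolute deviation
# between the observed and predicted cumulative distributions.

def lotka_expected(max_papers):
    """Expected proportion of authors with 1..max_papers papers."""
    c = 1.0 / sum(1.0 / x**2 for x in range(1, max_papers + 1))
    return [c / x**2 for x in range(1, max_papers + 1)]

def ks_statistic(observed):
    """Max |cumulative observed - cumulative Lotka| over paper counts.

    `observed[i]` is the proportion of authors with i+1 papers.
    """
    expected = lotka_expected(len(observed))
    d, cum_o, cum_e = 0.0, 0.0, 0.0
    for o, e in zip(observed, expected):
        cum_o += o
        cum_e += e
        d = max(d, abs(cum_o - cum_e))
    return d
```

The statistic is then compared against a critical value (depending on the number of authors) to accept or reject conformity with the law.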
This study analyses the difficulties related to the measurement of the research output of the HEI and proposes a simple overall indicator which incorporates quantitative and qualitative aspects to permit the decomposition of the influence of the two factors. On the basis of this indicator homogeneous comparisons are made of the relative research output of the countries of the European Union and its evolution during the period 1996-2010. In organizations, knowledge creation activities are embedded in collaborative networks and are influenced by their partners. Therefore, we examine how entire networks change over time in this study, as well as the reasoning behind the structures of ego networks based on unique scientific research discoveries published in the emerging cross-disciplinary field of nano-energy. These data were extracted from Science Citation Index Expanded. Specifically, we mainly focus on two dimensions of ego network changes: network growth and diversity. Results demonstrate the recent remarkable growth of inter-organizational collaborative networks in the nano-energy field and empirically prove that the subsequent growth and diversity of ego networks are caused by three coexisting driving forces (collaborative capacity, network status position and cohesion) that act collectively. Our study is conducted at the organizational level because we investigate the universities, research institutes and firms that participate in nano-energy scientific research and the collaborative networks formed through co-authorships among these institutions in knowledge creation processes. Moreover, our study has significant implications for the scientific research conducted by organizations in developing countries and emerging fields. Business portfolio restructuring (BPR) has received considerable attention in the fields of management and finance. 
However, to the best of our knowledge, there are no studies applying extensive qualitative and quantitative methods to BPR research. The aim of the present paper is to fill this gap by presenting the first complete bibliometric review of BPR research. In this work, for the first time, not only is the extant literature published between 1993 and 2012 analysed but also the most cited bibliographic references, using bibliometric techniques. In this way, past and present academic contributions are reviewed. Four main results are forthcoming: first, a certain parallelism is found with bibliometric studies in strategic management. Second, the intellectual grounding for this field involves the subjects of economics, management and finance as its principal contributors. Third, the theoretical basis for the study of BPR mainly involves agency theory, transaction cost theory, and the resource-based view. Fourth, the financial crisis of 2008 explains an important part of current research priorities and trends among BPR scholars. Eutrophication has become a top environmental issue for most lake ecosystems in the world, and enhanced phosphorus (P) input is usually considered the primary stressor. Focusing on the role of phosphorus in eutrophic lakes, a bibliometric approach was applied to quantitatively evaluate the main research interests and trends in this area. Using data from the Science Citation Index Expanded database between 1900 and 2013, a total of 3,875 publications were returned by searching topic keywords. Spatial, temporal, and interactive characteristics of the articles, countries, and keywords are presented using time series, frequency, and co-occurrence analysis. Results show that annual publications on P in eutrophic lakes have maintained exponential growth (R² = 0.93; p < 0.0001) over the last two decades, reflecting increasing attention to this area.
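An exponential growth claim like the one above (R² = 0.93) is typically checked by a log-linear least-squares fit to the annual publication counts. A sketch with hypothetical counts (counts must be positive, since the fit is done on their logarithms):

```python
import math

def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * year) by least squares on log(counts).

    Returns (b, r_squared): the growth rate and the goodness of fit
    of the log-linear regression.
    """
    xs, ys = list(years), [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return b, 1.0 - ss_res / ss_tot
```

A series that doubles every year fits perfectly, with growth rate b = ln 2 and R² = 1.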
However, publications of phosphorus research make up only 40 % of the total records on eutrophic lakes, indicating that there are other significant topics in lake eutrophication problems. The USA is the largest output country in this area, contributing 23 % of the total articles, followed by China with a proportion of 15 %. However, China has overtaken the USA as the largest output country since 2011, although its citations per paper are significantly lower than those of the USA, indicating a preference for quantity over quality. Based on international cooperation analysis, five regional groups were found, with the USA, the UK, P.R. China, Sweden, and Germany as the centers of their groups. The top 20 title keywords, author keywords, and keywords plus were identified according to their frequency to assist our understanding of research interests and modes. Surprisingly, nitrogen is a high co-occurrence keyword in this study, and its share of publications with P research in eutrophic lakes is increasing rapidly. Furthermore, the high correlation between P and N research in spatial distribution also indicates the increasing significance of N research in eutrophic lakes. Many areas of academic and industrial work make use of the notion of a 'technology'. This paper attempts to reduce the ambiguity around the definition of what constitutes a 'technology' by extending a previously described method that finds highly relevant patent sets for specified technological fields. The method relies on a less ambiguous definition that includes both a functional component and a component consisting of the underlying knowledge in a technological field, forming a two-component definition. These two components form a useful definition of a technology that allows for objective, repeatable, and thus comparable analysis of specific technologies.
Twenty-eight technological domains are investigated: the extension of an earlier technique is shown to be capable of finding highly relevant and complete patent sets for each of the technologies. Overall, about 500,000 patents from 1976 to 2012 are classified into these 28 domains. The patents in each of these sets are not only highly relevant to the domain of interest, but relatively few patents are classified into any two of these domains (patents classified in two domains account for 2.9 % of the total; the great majority of patent class pairs have zero overlap, and a few of the 378 patent class pairs contain the bulk of the doubly listed patents). On the other hand, the patents within a given domain cite patents in other domains about 90 % of the time. These results suggest that technology can be usefully decomposed into distinct units, but that the inventions in these relatively tightly contained units depend upon widely spread additional knowledge. We examine the sub-field of philosophy of science using a new method developed in information science, Referenced Publication Years Spectroscopy (RPYS). RPYS allows us to identify peak years in the citations of a field, which promises to help scholars identify the key contributions to a field, and revolutionary discoveries in it. We discovered that philosophy of science, a sub-field in the humanities, differs significantly from other fields examined with this method. Books play a more important role in philosophy of science than in the sciences. Further, Einstein's famous 1905 papers created a citation peak in the philosophy of science literature. But rather than being contributions to the philosophy of science, their importance lies in the fact that they are revolutionary contributions to physics with important implications for philosophy of science.
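RPYS, as used above, aggregates the cited references of a field by their publication years and flags years whose reference counts stand out from their neighborhood. A minimal sketch, assuming the common deviation-from-5-year-median formulation (the sample years are illustrative):

```python
from collections import Counter
from statistics import median

def rpys_spectrum(cited_years, window=5):
    """For each referenced publication year, the deviation of its
    reference count from the median count of the surrounding
    `window` years; large positive deviations mark peak years,
    i.e. candidate key contributions to the field."""
    counts = Counter(cited_years)
    lo, hi = min(counts), max(counts)
    half = window // 2
    spectrum = {}
    for year in range(lo, hi + 1):
        neighborhood = [counts.get(y, 0) for y in range(year - half, year + half + 1)]
        spectrum[year] = counts.get(year, 0) - median(neighborhood)
    return spectrum
```

In a field where references to 1905 dominate their neighbors, as with Einstein's papers in the philosophy of science literature, 1905 would show the largest deviation.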
This paper examined the coauthorship patterns of China's humanities and social sciences (HSS), based on articles and reviews covered by the Social Science Citation Index and the Arts and Humanities Citation Index of the Web of Science. We defined four types of coauthorship: no collaboration (NOC), national collaboration (NAC), bilateral international collaboration (BIC) and multilateral international collaboration (MIC), and proposed three development phases of China's HSS: 1978-1991, 1992-2000 and 2001-present. Accordingly, we explored the evolution of coauthorship patterns through a number of metrics. Findings include: (1) the coauthorship patterns of China's HSS evolved significantly from NOC to NAC, BIC and MIC; (2) China's major collaborators have not varied significantly over the past decade, with the USA always taking the lead (one in every four of China's HSS articles was coauthored with the USA); (3) pic (percentage of internationally coauthored articles) was negatively correlated with pnc (percentage of uncited articles); (4) the CPP (citations per publication) of MIC is 1.5 times that of BIC, 3 times that of NAC and 4 times that of NOC. The Chinese government has been eagerly promoting economic development through science and technology. However, after more than 30 years of rapid growth, it realized that China's HSS had been overshadowed, and initiated prosperity programs in response. Most studies investigating individual achievement in criminology and criminal justice equate total publications with scholarly productivity. The current study sought to broaden the definition of scholarly productivity by incorporating empirical indices of the quantity and quality of scholarly productivity and applying these indices to both total and first author publications.
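The four coauthorship types defined above follow mechanically from the list of author countries on an article. A sketch (the function name and country labels are illustrative):

```python
def coauthorship_type(author_countries):
    """Classify an article into the four patterns defined above:
    NOC (single author), NAC (several authors, one country),
    BIC (authors from exactly two countries), MIC (three or more)."""
    countries = set(author_countries)
    if len(author_countries) == 1:
        return "NOC"          # no collaboration
    if len(countries) == 1:
        return "NAC"          # national collaboration
    return "BIC" if len(countries) == 2 else "MIC"
```

Applying this per article and tallying by publication year yields the phase-by-phase evolution the study reports.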
Analyses performed using publication and citation data from the top 100 criminology and criminal justice scholars over the past 5 years revealed that the total number of publications was no substitute for an integrated (quantity and quality) assessment. Results further indicated that averaging across the total publication and first author integrated models seemed to provide the fairest and most balanced assessment of scholarly productivity. It was also noted that compared to non-theoreticians, theoreticians were more likely to publish first author articles and fared significantly better when evaluated against the first author integrated model than when evaluated against the total publications integrated model. Use of these models to assess scholarly productivity in criminology, criminal justice, and other fields may be warranted. Early exploratory webometrics studies have typically used simple network methods or multi-dimensional scaling to identify hyperlink or text-based relationships between collections of related academic websites. This paper uses unsupervised machine learning techniques to identify groups of computer science departments with similar interests through co-word occurrences in the homepages of the departmental research groups. The clustering results reflect inter-department research similarity reasonably well, at least as reflected online. This clustering approach may be useful for policy makers in identifying future collaborators with similar research interests or for monitoring research fields. Research fronts represent areas of cutting-edge study in specific fields. They not only provide insights into current focuses and future trends, but also serve as important indicators for government policymaking with regard to technology. 
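The co-word approach in the webometrics study above rests on comparing departments by the terms appearing on their research-group homepages. A minimal sketch of the similarity step, using cosine similarity over term counts; the department names and counts are invented, and the original study applied more elaborate unsupervised clustering on top of such similarities:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical co-word profiles extracted from homepages.
dept_terms = {
    "dept_a": Counter({"machine": 4, "learning": 4, "vision": 2}),
    "dept_b": Counter({"machine": 3, "learning": 5, "robotics": 1}),
    "dept_c": Counter({"databases": 6, "query": 3}),
}
```

Departments with overlapping vocabularies score high against each other and would fall into the same cluster.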
This study employed both bibliographic coupling and co-citation as methods to analyze the evolution of research fronts in the OLED field, and compared the outcomes in order to identify the differences between, and assess the effectiveness of, the two methods in detecting such research fronts. This study indicated that both analytic methods can be employed to track the evolution of research fronts. Compared with co-citation, bibliographic coupling identifies a higher number of research fronts and detects their emergence earlier, thus showing better performance in detecting research fronts. We compared general and specialized databases by searching for bibliographic information on journal articles in the computer science field and by evaluating their bibliographic coverage and the quality of the bibliographic records retrieved. We selected a sample of computer science articles from an Italian university repository (AIR) to carry out our comparison. The databases selected were INSPEC, Scopus, Web of Science (WoS), and DBLP. We found that DBLP and Scopus indexed the highest number of unique articles (4.14 and 4.05 % respectively), that each of the four databases indexed a set of unique articles, that 12.95 % of the articles sampled were not indexed in any of the databases selected, that Scopus was better than WoS for identifying computer science publications, and that DBLP had a greater share of unique articles indexed (19.03 %) than INSPEC (11.28 %). We also measured the quality of a set of bibliographic records by comparing five databases: Scopus, WoS, INSPEC, DBLP and Google Scholar (GS). We found that WoS, INSPEC and Scopus provided better quality indexing and better bibliographic records in terms of accuracy, control and granularity of information, when compared to GS and DBLP. WoS and Scopus also provided more sophisticated tools for measuring trends of scholarly publications. 
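The coverage figures reported in the database-comparison study above (articles unique to one database, articles indexed nowhere) reduce to set operations once each database's hits for the sample are known. A sketch with invented article identifiers:

```python
# Hypothetical indexing of a small article sample across databases.
dbs = {
    "Scopus": {"a1", "a2", "a3", "a5"},
    "WoS":    {"a1", "a2", "a4"},
    "DBLP":   {"a2", "a3", "a6"},
    "INSPEC": {"a1", "a3"},
}
sample = {"a1", "a2", "a3", "a4", "a5", "a6", "a7"}

def unique_to(db, dbs):
    """Articles indexed by `db` and by no other database."""
    others = set().union(*(v for k, v in dbs.items() if k != db))
    return dbs[db] - others

# Articles in the sample that no database indexes at all.
not_indexed = sample - set().union(*dbs.values())
```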
Using bibliometric techniques, this work investigates the evolution of titles in economics research. It attempts to present a complete and accurate picture of systematic changes in the average character number, syllable number, word number and conceptual diversity in the titles over a long period of time. Based on a total of 338,866 academic paper titles in economics published between 1890 and 2012 from EconLit and the Web of Knowledge, the economics titles were analyzed from the perspectives of social network analysis, computational phonetics and conceptual diversity. The results showed that in the evolution of this discipline, authors were using increasingly more words for their paper titles, and the conceptual diversity in paper titles underwent interesting periodic fluctuations over more than 100 years. The 1970s was a decade that achieved special prominence in conceptual diversity and relational complexity of titles. Utilizing a unique dataset of the Chinese Academy of Sciences academicians (1993-2013), this paper investigates the Matthew effect in China's science. Three indicators, namely the concentration index, the Matthew index and the coefficient of variation, are adopted to measure the uneven distribution of academicians of the Chinese Academy of Sciences among different regions and disciplines. The empirical analysis demonstrates the existence of the Matthew effect in China's science for the above two dimensions. Yet, this effect has weakened for all regions with the exception of Beijing. We argue that this uneven distribution of the nation's brightest minds makes scientifically competitive regions and disciplines even more competitive while putting less developed regions and research domains at a further disadvantage. The research streams of transition economies and emerging markets share some common ground yet differ. 
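The simplest of the title-evolution measures above are easy to reproduce: here a mean word count and a crude type-token ratio stand in for the study's richer network- and phonetics-based measures of conceptual diversity (the titles are invented):

```python
def title_stats(titles):
    """Mean title length in words, plus a type-token ratio as a
    crude stand-in for 'conceptual diversity'."""
    words = [t.lower().split() for t in titles]
    mean_len = sum(len(w) for w in words) / len(words)
    all_words = [w for ws in words for w in ws]
    diversity = len(set(all_words)) / len(all_words)
    return mean_len, diversity
```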
The goal of this study is to provide a better understanding of the commonalities and differences regarding trends and topics of this cross-disciplinary research area. We employ the novel method of topic models on a corpus of nearly 6,000 articles in more than 600 journals from 1995 to 2012 to identify 25 topics and analyze their trends and use across scope (transition or emerging), discipline (business or economics) and geography (countries or regions). The introduction of new research evaluation policies in most of the Eastern European (EE) countries was followed by substantial growth in their (international) scientific productivity. The article starts with a brief review of current research evaluation practice in EE countries and then explores the pattern of changes in the international scientific production of 20 EE countries in the field of social sciences and humanities during 2004-2013. A new indicator named the Journal Diversity Index (JDI) is suggested as a possible measure of the sustainability and genuineness of the globalization of social sciences in EE countries. The JDI represents the number of journals that account for 50 % of a country's published articles, corrected for the total number of unique journals in which articles by authors from all EE countries appear. The analysis has shown that EE countries with a lower JDI largely base their international scientific production on national journals covered by the Web of Science (WoS). Those countries also have a lower average citation rate of articles. With the exception of Hungary and Poland, the "globalization" of EE social sciences still relies strongly on linguistic, regional and cultural proximities. This is potentially harmful given the unstable status of EE journals in the WoS. EE science policy institutions should take more responsibility for controlling the quality of national journals indexed in international databases. 
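The raw ingredient of the JDI defined above, the smallest number of journals accounting for half of a country's output, can be sketched as follows. The abstract does not spell out how the correction for the total number of unique journals is applied, so only the uncorrected count is shown, and the journal counts are invented:

```python
def journals_for_half(journal_counts):
    """Smallest number of journals that together account for at
    least 50 % of a country's articles (raw JDI ingredient)."""
    total = sum(journal_counts.values())
    covered = n = 0
    for c in sorted(journal_counts.values(), reverse=True):
        covered += c
        n += 1
        if covered * 2 >= total:
            return n
    return n
```

A country whose output is concentrated in one national journal gets a low count, matching the abstract's reading of low-JDI countries.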
They should also be aware of significant differences in the coverage policies of Thomson Reuters and Elsevier and the possible implications of those differences for science evaluation practice. This paper examines the varying prevalence of conflict of interest (COI), and "no conflict", statements on biomedical research papers, which are increasingly being required by journal editors. They are important as they may detract from the perceived objectivity of the results if the authors are in the pay of commercial companies. However, the frequency of these statements in the Web of Science (WoS) is only a few percent of the total number of biomedical papers. A survey of journal editors revealed that many COI statements are excluded from the WoS because they are printed separately from the acknowledgement section of the paper. One consequence of the appearance of COI statements on papers is that the WoS mistakenly includes companies that have given money to some of the researchers for unrelated work among the listed funding organizations; this will distort the analysis of the funding of the research being reported in some of the papers and appears nearly to double companies' apparent tally of papers. We compare estimates of past institutional research performance coming from two bibliometric indicators to the results of the UK's Research Assessment Exercise, which last took place in 2008. We demonstrate that a version of the departmental h-index is better correlated with the actual results of that peer-review exercise than a competing metric known as the normalised citation-based indicator. We then determine the corresponding h-indices for 2008-2013, the period examined in the UK's Research Excellence Framework (REF) 2014. We place the resulting predictions on the arXiv in advance of the REF results being published (December 2014). These may be considered as unbiased predictions of relative performances in that exercise. 
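The departmental h-index used in the REF-prediction study above follows the standard h-index rule applied to a department's pooled publications: h is the largest n such that n outputs have at least n citations each. A minimal sketch:

```python
def h_index(citations):
    """Largest n such that n of the outputs have >= n citations each."""
    cs = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cs, start=1):
        if c >= i:
            h = i
        else:
            break
    return h
```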
We will revisit this paper after the REF results are available and comment on the reliability or otherwise of these bibliometrics as compared with peer review. A DOI (Digital Object Identifier) is a character string that uniquely identifies entities that are objects of intellectual property. In bibliometrics, DOIs are used to identify scientific papers unambiguously. The aim of this short communication is to raise the reader's awareness of bibliometric database errors in DOI indexing, in particular the incorrect assignment of a single DOI to multiple papers. This error is quite interesting since the DOI is commonly regarded as an effective means of identifying scientific articles unambiguously. By way of example, a short list of DOIs wrongly assigned by the Scopus database to multiple papers is shown. Although relatively rare, DOI indexing errors should be considered by bibliometricians when querying bibliometric databases by DOI. Informetrics and information retrieval (IR) represent fundamental areas of study within information science. Historically, researchers have not fully capitalized on the potential research synergies that exist between these two areas. Data sources used in traditional informetrics studies have their analogues in IR, with similar types of empirical regularities found in IR system content and use. Methods for data collection and analysis used in informetrics can help to inform IR system development and evaluation. Areas of application have included automatic indexing, index term weighting and understanding user query and session patterns through the quantitative analysis of user transaction logs. Similarly, developments in database technology have made the study of informetric phenomena less cumbersome, and recent innovations used in IR research, such as language models and ranking algorithms, provide new tools that may be applied to research problems of interest to informetricians. 
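A bibliometrician can screen a downloaded record set for the DOI error described above (one DOI attached to several records) with a simple grouping pass; the record identifiers and DOIs here are invented:

```python
from collections import defaultdict

def dois_with_multiple_records(records):
    """Given (record_id, doi) pairs, return only the DOIs that the
    database has assigned to more than one record."""
    by_doi = defaultdict(set)
    for rec_id, doi in records:
        if doi:  # skip records with no DOI
            by_doi[doi].add(rec_id)
    return {d: ids for d, ids in by_doi.items() if len(ids) > 1}
```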
Building on the author's previous work (Wolfram in Applied informetrics for information retrieval research, Libraries Unlimited, Westport, 2003), this paper reviews a sample of relevant literature published primarily since 2000 to highlight how each area of study may help to inform and benefit the other. This paper attempts to sketch the interrelation between information retrieval and scientometrics, pointing at possible synergy effects provided by some recently developed bibliometric methods in the context of subject delineation and clustering. Examples of specific search strategies based on both traditional retrieval techniques and bibliometric methods are used to illustrate this approach. Special attention is paid to hybrid techniques and the use of 'core documents'. The latter are defined purely on the basis of bibliometric similarities, but by definition have properties that also make them interesting and attractive for information retrieval. In this position paper, we comment on various approaches to the delineation of scientific fields or domains, a typical prerequisite for a wide class of bibliometric studies. There is growing evidence that this meso-level, between the micro targets of typical IR and the large disciplines handled by macro-level bibliometric studies, takes full advantage of hybrid approaches. Firstly, delineation tasks gain from combining the a priori thinking of traditional IR, which typically involves clearly targeted expectations, with the a posteriori thinking of bibliometric mapping, where decisions are built on an external structuring of the domain in a wider context. The combination of the two ways of thought is far from new, with IR increasingly building on bibliometric networks for query expansion, and bibliometrics building on IR for evaluating and refining its outcomes. 
Secondly, delineation benefits from the multi-network perspective, which gives different representations of the scientific topics, representations that converge all the more closely when the objects are dense and well separated. Focusing on two basic networks, words and citations, various sequences or combinations of operations are discussed. Bibliometrics and IR, especially when properly combined in multi-network approaches, provide an efficient toolbox for studies of domain delineation. It should be recalled, however, that the context of such studies is often loaded with policy stakes that call for cautious supervision and consultation processes. We introduce a novel ranking of search results based on a variant of the h-index for directed information networks such as the Web. The h-index was originally introduced to measure an individual researcher's scientific output and influence, but here a variant of it is applied to assess the "importance" of web pages. Like PageRank, the "importance" of a page is defined by the "importance" of the pages linking to it. However, unlike the computation of PageRank, which involves the whole web graph, computing the h-index for web pages (the hw-rank) is a local computation in which only the neighbors of the neighbors of the given node are considered. Preliminary results show a strong correlation between ranking with the hw-rank and with PageRank; moreover, its computation is simpler and less costly than that of PageRank. Further, larger-scale experiments are needed in order to assess the applicability of the method. We describe ongoing research whose aim is to apply recent results from the research field of information fusion to bibliometric analysis and information retrieval. We highlight the importance of 'uncertainty' within information fusion and argue that this concept is crucial also for bibliometrics and information retrieval. 
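The hw-rank abstract above gives only the outline of the method: a local, two-hop computation rather than a whole-graph one. One plausible reading, scoring a page by the h-index of its in-neighbours' in-degrees, can be sketched as follows; this formulation is an assumption for illustration, not necessarily the authors' exact definition:

```python
def h_index(values):
    """Largest n such that n of the values are at least n."""
    vs = sorted(values, reverse=True)
    return max((i for i, v in enumerate(vs, 1) if v >= i), default=0)

def hw_rank(page, in_links):
    """Two-hop local score: h-index over the in-degrees of the pages
    linking to `page` (in_links maps page -> list of linking pages)."""
    neighbour_degrees = [len(in_links.get(q, [])) for q in in_links.get(page, [])]
    return h_index(neighbour_degrees)
```

Only `page`'s in-neighbours and their in-neighbours are touched, which is what makes the computation cheaper than PageRank's whole-graph iteration.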
More specifically, we elaborate on three research strategies related to uncertainty: uncertainty management methods, explanation of uncertainty and visualization of uncertainty. We apply our strategies to the classical problem of author name disambiguation, where we show how uncertainty can be modeled, explained and visualized using information fusion. We show how an information seeker can benefit from tracing increases and decreases of uncertainty in the reasoning process. We also present how such changes can be explained to the information seeker through visualization techniques, which are employed to highlight the complexity involved in the process of modeling and managing uncertainty in bibliometric analysis. Finally, we argue that a further integration of information fusion approaches in the research area of bibliometrics and information retrieval may result in new and fruitful avenues of research. Given a user-selected seed author, a unique experimental system called AuthorWeb can return the 24 authors most frequently co-cited with the seed in a 10-year segment of the Arts and Humanities Citation Index. The Web-based system can then instantly display the seed and the others as a Pathfinder network, a Kohonen self-organizing map, or a pennant diagram. Each display gives a somewhat different overview of the literature cited with the seed in a specialty (e.g., Thomas Mann studies). Each is also a live interface for retrieving (1) the documents that co-cite the seed with another user-selected author, and (2) the works by the seed and the other author that are co-cited. This article describes the Pathfinder and Kohonen maps, but focuses much more on AuthorWeb pennant diagrams, exhibited here for the first time. Pennants are interesting because they unite ego-centered co-citation data from bibliometrics, the TF*IDF formula from information retrieval, and Sperber and Wilson's relevance theory (RT) from linguistic pragmatics. 
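The TF*IDF side of the pennant diagrams just mentioned can be sketched as follows: the TF component grows with how often an author is co-cited with the seed, and the IDF component discounts authors who are heavily cited everywhere. The logarithm base and the collection-size constant are assumptions for illustration, and the data are invented:

```python
import math

def pennant_coords(cocitations, total_citations, collection_size):
    """Map each author to (x, y): x grows with co-citation with the
    seed (TF side), y with overall rarity of citations to the author
    (IDF side). Authors high on both axes sit at the pennant's tip."""
    coords = {}
    for author, tf in cocitations.items():
        idf = math.log2(collection_size / total_citations[author])
        coords[author] = (math.log2(tf), idf)
    return coords
```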
RT provides a cognitive interpretation of TF*IDF weighting. By making people's inferential processes a primary concern, RT also yields insights into both topical and non-topical relevance, central matters in information science. Pennants for several authors in the humanities demonstrate these insights. The increasing number of publications makes searching and accessing the published literature a challenging task. A recent development in bibliographic databases is to use advanced information retrieval techniques in combination with bibliographic means such as citations. In this work we present an approach that combines a cognitive information retrieval framework based on the principle of polyrepresentation with document clustering to enable the user to explore a collection more interactively than by just examining a ranked result list. Our approach uses information need representations as well as different document representations, including citations. To evaluate our ideas we employ a simulated user strategy utilising a cluster ranking approach. We report on the possible effectiveness of our approach and on several strategies by which users can achieve higher search effectiveness through cluster browsing. Our results confirm that our proposed polyrepresentative cluster browsing strategy can in principle significantly improve search effectiveness. However, further evaluations including a more refined user simulation are needed. Models of science address statistical properties and mechanisms of science. From the perspective of scholarly information retrieval (IR), science models may provide some potential to improve retrieval quality when operationalized as specific search strategies or used for rankings. From the science modeling perspective, on the other hand, scholarly IR can play the role of a validation model for science models. 
The paper studies the applicability and usefulness of two particular science models for re-ranking search results (Bradfordizing and author centrality). The paper provides a preliminary evaluation study that demonstrates the benefits of using science model driven ranking techniques, but also how different the quality of search results can be if different conceptualizations of science are used for ranking. The article considers whether Big Data, in the form of data-driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences. It points out, initially, that such aspirations are similar to the now-discredited inductivist approach to science. On the positive side, Big Data may permit larger sample sizes, cheaper and more extensive testing of theories, and the continuous assessment of theories. On the negative side, data-driven science encourages passive data collection, as opposed to experimentation and testing, and hornswoggling (unsound statistical fiddling). The roles of theory and data in inductive algorithms, statistical modeling, and scientific discoveries are analyzed, and it is argued that theory is needed at every turn. Data-driven science is a chimera. In this paper we explored three areas: decision making and information seeking, the relationship between information seeking and uncertainty, and the role of expertise in influencing information use. This was undertaken in the context of a qualitative study into decision making in the initial stages of emergency response to major incidents. The research took an interpretive approach in which activity theory is used as an analytical framework. The research provides further evidence that the context of the activity and individual differences influence the choice of decision mode and associated information behavior. 
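Of the two science-model rankings studied above, Bradfordizing is the easier to sketch: the result list is re-ordered so that papers from the journals contributing the most papers to the list (the Bradford core) come first. A simplified rendering; the documents and journals are invented:

```python
from collections import Counter

def bradfordize(results):
    """Re-rank (doc_id, journal) pairs so that articles from the
    journals most productive within this result set come first;
    the sort is stable, so the original order breaks ties."""
    freq = Counter(journal for _, journal in results)
    return sorted(results, key=lambda r: (-freq[r[1]], r[1]))
```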
We also established that information is often not used to resolve uncertainty in decision making and indeed information is often sought and used after the decision is made to justify the decision. Finally, we point to the significance of both expertise and confidence in understanding information behavior. The contribution of the research to existing theoretical frameworks is discussed and a modified version of Wilson's problem-solving model is proposed. The Internet facilitates the provision of accessible information to people with learning disabilities. However, problems with navigation and retrieval represent a barrier for this cohort. This article addresses one aspect of page design, testing whether a horizontal or vertical contents arrangement facilitates faster access to content for people with learning disabilities. Participants were timed as they looked for one-word dummy menu entries appearing in various locations along a horizontal or vertical grid. The words corresponded to images shown at random in a word-search type activity. Results were analyzed using mixed effects models. Results showed that mean search times increased as the position shifted from left to right and from top to bottom. Thus, participants undertook the test as if it were a reading exercise, despite the images appearing in the center of the page and the words appearing at random positions. The research also suggests that a horizontal menu may be more effective than a vertical one, with the most important links placed on the left. The propensity to imbibe information serially (word-for-word) rather than to skim or look globally has important website design implications. Many discussions exist regarding the credibility of information on the Internet. Similar discussions happen on the interpretation of social scientific research data, for which information triangulation has been proposed as a useful method. 
In this article, we explore a design theory, consisting of a kernel theory, meta-requirements, and meta-designs, for software and services that triangulate Internet information. The kernel theory identifies 5 triangulation methods based on Churchman's inquiring systems theory and related meta-requirements. These meta-requirements are used to search for existing software and services that contain design features for Internet information triangulation tools. We discuss a prototyping study of the use of an information triangulator among 72 college students and how its use contributes to their opinion formation. From these findings, we conclude that triangulation tools can contribute to opinion formation by information consumers, especially when the tool is not a mere fact checker but includes the search and delivery of alternative views. Finally, we discuss other empirical propositions and design propositions for an agenda for triangulator developers and researchers. In particular, we propose investment in theory triangulation, that is, tools to automatically detect ethically and theoretically alternative information and views. We sought to understand how users interpret the meanings of symbols commonly used in information systems, especially how icons are processed by the brain. We investigated Chinese and English speakers' processing of 4 types of visual stimuli: icons, pictures, Chinese characters, and English words. The goal was to examine, via functional magnetic resonance imaging (fMRI) data, the hypothesis that people cognitively process icons as logographic words and to provide neurological evidence related to human-computer interaction (HCI), which has been rare in traditional information system studies. According to the neuroimaging data of 19 participants, we conclude that icons are not cognitively processed as logographic words like Chinese characters, although both stimulate the semantic system in the brain that is needed for language processing. 
Instead, more similar to images and pictures, icons are not as efficient as words in conveying meanings, and brains (people) make more effort to process icons than words. We use this study to demonstrate that it is practicable to test information system constructs such as elements of graphical user interfaces (GUIs) with neuroscience data and that, with such data, we can better understand individual or group differences related to system usage and user-computer interactions. In this article, we investigate the application of entity type models in extractive multi-document summarization, using automatic caption generation for images of geo-located entities (e.g., Westminster Abbey) as an application scenario. Entity type models contain sets of patterns aiming to capture the ways geo-located entities are described in natural language. They are automatically derived from texts about geo-located entities of the same type (e.g., churches, lakes). We integrate entity type models into a multi-document summarizer and use them to address the 2 major tasks in extractive multi-document summarization: sentence scoring and summary composition. We experiment with 3 different representation methods for entity type models: signature words, n-gram language models, and dependency patterns. We evaluate the summarizer with integrated entity type models relative to (a) a summarizer using standard text-related features commonly used in text summarization and (b) the Wikipedia location descriptions. Our results show that entity type models significantly improve the quality of output summaries over that of summaries generated using standard summarization features and Wikipedia summaries. The representation of entity type models using dependency patterns is superior to the representations using signature words and n-gram language models. 
This study examines whether there are some general trends across subject fields regarding the factors affecting the number of citations of articles, focusing especially on those factors that are not directly related to the quality or content of articles (extrinsic factors). For this purpose, from 6 selected subject fields (condensed matter physics, inorganic and nuclear chemistry, electric and electronic engineering, biochemistry and molecular biology, physiology, and gastroenterology), original articles published in the same year were sampled (n=230-240 for each field). Then, the citation counts received by the articles in relatively long citation windows (6 and 11 years after publication) were predicted by negative binomial multiple regression (NBMR) analysis for each field. Various article features about author collaboration, cited references, visibility, authors' achievements (measured by past publications and citedness), and publishing journals were considered as the explanatory variables of NBMR. Some generality across the fields was found with regard to the selected predicting factors and the degree of significance of these predictors. The Price index was the strongest predictor of citations, and number of references was the next. The effects of number of authors and authors' achievement measures were rather weak. This paper proposes an approach to analyzing and prioritizing venture capital investments with the use of scientometric and patentometric indicators. The article highlights the importance of such investments in the development of technology-based companies and their positive impacts on the economic development of regions and countries. It also notes that the managers of venture capital funds struggle to objectify the evaluation of investment proposals. 
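The Price index, the strongest citation predictor in the study above, is simply the share of an article's cited references published within a window (conventionally 5 years) before the citing article. A minimal sketch:

```python
def price_index(reference_years, pub_year, window=5):
    """Share of cited references published within `window` years
    of the citing article's publication year (Price's index)."""
    recent = sum(1 for y in reference_years if pub_year - y <= window)
    return recent / len(reference_years)
```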
This paper analyzes the selection process of 10 companies, five of which received investments by the largest venture capital fund in Brazil and the other five of which were rejected by this same fund. We formulated scientometric and patentometric indicators related to each company and conducted a comparative analysis of each by considering the indicators grouped by the nonfinancial criteria (technology, market, and divestiture team) from analysis of the investment proposals. The proposed approach clarifies aspects of the criteria evaluated and contributes to the construction of a method for prioritizing venture capital investments. To explore the possible facilitative role of the Internet in the process of research collaboration, this study endeavored to systematically compare the phenomenon of co-authorship and the impacts of co-authorship between pre-web and post-web stages in the field of information systems. Three hypotheses were proposed in this study. First, research collaboration increases in the post-web stage relative to the pre-web stage. Second, research collaboration is positively related to research impact, operationally defined as the number of citations. Lastly, the positive relationship between research collaboration and research impact is stronger in the post-web stage than that in the pre-web stage. Articles published in the field of information systems in both time periods were collected to test the hypotheses. The empirical results strongly support H1 and H2, showing that co-authorship increases in the post-web stage, and positively correlates with citations received by information systems articles. The positive effects of interdisciplinary collaborations and collaborations among multiple authors are enhanced in the post-web stage, but such enhancement is not found for international collaboration. H3 is partially supported. This paper presents a methodological discussion of a study of tagging quality in subject indexing. 
The data analysis in the study was divided into 3 phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of the semantic values of tags. To analyze indexing consistency, this study employed the vector space model-based indexing consistency measures. An analysis of tagging effectiveness with tagging exhaustivity and tag specificity was conducted to ameliorate the drawbacks of consistency analysis based on only the quantitative measures of vocabulary matching. To further investigate the semantic values of tags at various levels of specificity, a latent semantic analysis (LSA) was conducted. To test statistical significance for the relation between tag specificity and semantic quality, correlation analysis was conducted. This research demonstrates the potential of tags for web document indexing with a complete assessment of tagging quality and provides a basis for further study of the strengths and limitations of tagging. Green information technology (IT) initiatives cannot be implemented in isolation if they are to have a significant and lasting impact on environmental sustainability. Instead, there is a need to harness the collective IT resources of the diverse stakeholders operating in the interorganizational business networks that characterize the contemporary business landscape. This, in turn, demands an appropriate leadership structure. However, the notion of "green leadership" has not received adequate research attention to date. Using a case study of green IT implementation at China Mobile, the world's largest mobile telecommunications provider, this study seeks to shed light on the underlying process through which green leadership is achieved and subsequently enacted to facilitate collective green IT initiatives. 
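The tagging study above uses vector space measures of indexing consistency; the simplest set-based baseline, Rolling's inter-indexer consistency, conveys the core idea of vocabulary matching between two indexers (the tag sets are invented):

```python
def rolling_consistency(tags_a, tags_b):
    """Rolling's inter-indexer consistency: 2*|A & B| / (|A| + |B|),
    i.e. 1.0 for identical tag sets and 0.0 for disjoint ones."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```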
With its findings, this study presents a process theory that complements the dominant, internally-oriented perspective of green IT and provides practitioners with a useful reference for leveraging the collective IT resources of their network partners to contribute toward preserving the environment for future generations. Although social network sites (SNS) users' privacy concerns cannot be completely removed by privacy policies and security safeguards, the user base of SNS is constantly expanding. To explain this phenomenon, we use the lens of the calculus of behavior within a cost-benefit framework, treating privacy concerns as cost factors and behavior enticements as benefit factors, and examine how the enticements operate against privacy concerns in users' cost-benefit calculus regarding disclosing personal information and using SNS continuously. Adopting social influence process theory, we examine three enticements: the motivation of relationship management through SNS, the perceived usefulness of SNS for self-presentation, and the subjective social norms of using SNS. From a survey of 362 Facebook users who have disclosed personal information on Facebook, we find that the motivation of relationship management through SNS and the perceived usefulness of SNS for self-presentation lead users to disclose information but that subjective social norms do not, suggesting that the perceived benefit of behavior enticements should be assimilated into users' own value systems to truly operate as benefit factors. The results regarding the positive and negative effects of suggested benefit and cost factors on information disclosure show that only the combined positive effects of all three behavior enticements exceed the negative effect of privacy concerns, suggesting that privacy concerns can be offset only by multiple benefit factors. The Internet Movie Database (IMDb) is one of the most-visited websites in the world and the premier source for information on films. 
Similar to Wikipedia, much of IMDb's information is user contributed. IMDb also allows users to voice their opinion on the quality of films through voting. We investigate whether there is a connection between user voting data and economic film characteristics. We perform distribution and correlation analysis on a set of films chosen to mitigate effects of bias due to the language and country of origin of films. Production budget, box office gross, and total number of user votes for films are consistent with double-log normal distributions for certain time periods. Both total gross and user votes are consistent with a double-log normal distribution from the late 1980s onward while for budget it extends from 1935 to 1979. In addition, we find a strong correlation between number of user votes and the economic statistics, particularly budget. Remarkably, we find no evidence for a correlation between number of votes and average user rating. Our results suggest that total user votes is an indicator of a film's prominence or notability, which can be quantified by its promotional costs. The concept of h-index has been proposed to easily assess a researcher's performance with a single number. However, by using only this number, we lose significant information about the distribution of citations per article in an author's publication list. In this article, we study an author's citation curve and we define two new areas related to this curve. We call these "penalty areas", since the greater they are, the more an author's performance is penalized. We exploit these areas to establish new indices, namely Perfectionism Index and eXtreme Perfectionism Index (XPI), aiming at categorizing researchers in two distinct categories: "influentials" and "mass producers"; the former category produces articles which are (almost all) with high impact, and the latter category produces a lot of articles with moderate or no impact at all. 
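The reported budget-votes relationship is a correlation computed on log-transformed values. A minimal sketch of that computation; the budget and vote figures below are invented for illustration, not taken from the study:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical budgets ($M) and IMDb vote counts for five films.
budgets = [5, 20, 60, 100, 200]
votes = [8_000, 45_000, 90_000, 300_000, 500_000]
log_b = [math.log(v) for v in budgets]
log_v = [math.log(v) for v in votes]
print(round(pearson(log_b, log_v), 3))
```

Taking logarithms first is what makes the correlation meaningful for heavy-tailed quantities like budgets and vote counts.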
Using data from Microsoft Academic Service, we mainly evaluate the merits of PI as a useful tool for scientometric studies. We establish its effectiveness in separating scientists into influentials and mass producers; we demonstrate its robustness against self-citations, and its lack of correlation with traditional indices. Finally, we apply PI to rank prominent scientists in the areas of databases, networks and multimedia, exhibiting the strength of the index in fulfilling its design goal. In this paper we propose a new technique to semantically analyze knowledge flows across countries by using publication and citation data. We start with the identification of research topics produced by a given source country. Then, we collect papers, published by the authors outside the source country, citing the identified research topics. Finally, we group each set of citing papers separately to determine the scholarly impact of the identified research topics in the cited topics. The research topics are identified using our proposed topic model with distance matrix, an extension of the classic Latent Dirichlet Allocation model. We also present a case study to illustrate the use of our proposed techniques in the subject area Energy during 2004-2009 using the Scopus database. We compare the Japanese and Chinese papers that cite the scientific literature produced by researchers from the United States in order to show the difference in the use of the same knowledge. The results indicate that Japanese researchers focus on research areas such as efficient use of Photovoltaic, Energy Conversion and Superconductors (to produce low-cost renewable energy). In contrast with the Japanese researchers, Chinese researchers focus on the areas of Power Systems, Power Grids and Solar Cells production. Such analyses are useful for understanding the dynamics of the relevant knowledge flows across the nations. 
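The "penalty area" intuition behind PI can be sketched in a deliberately simplified form: the further an author's citation curve falls below the level of their most-cited paper, the larger the penalty. The exact PI/XPI definitions in the article differ; the function below only illustrates the intuition, and the citation lists are invented.

```python
def penalty_area(citations):
    """Illustrative 'penalty area': the gap between each paper's citations
    and the author's most-cited paper (a simplified stand-in for the
    article's actual PI/XPI definitions)."""
    cs = sorted(citations, reverse=True)
    top = cs[0] if cs else 0
    return sum(top - c for c in cs)

influential = [50, 48, 47, 45]        # almost all papers have high impact
mass_producer = [50, 5, 3, 2, 1, 0]   # many papers with little or no impact
print(penalty_area(influential), penalty_area(mass_producer))  # 10 239
```

A small area marks an "influential" profile, a large one a "mass producer" profile, which is the categorization the article aims at.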
In this study we tested the fruitfulness of advanced bibliometric methods for mapping subdomains in philosophy. We studied the development over time of the number of publications on free will and sorites, the two subdomains treated in the study. We applied the cocitation approach to map the most cited publications, authors and journals, and we mapped frequently occurring terms, using a term co-occurrence approach. Both subdomains show a strong increase of publications in Web of Science. When we decomposed the publications by faculty, we could see an increase of free will publications also in social sciences, medicine and natural sciences. The multidisciplinary character of free will research was reflected in the cocitation analysis and in the term co-occurrence analysis: we found clusters/groups of cocited publications, authors and journals, and of co-occurring terms, representing philosophy as well as non-philosophical fields, such as neuroscience and physics. The corresponding analyses of sorites publications displayed a structure consisting of research themes rather than fields. All in all, both philosophers involved in this study acknowledge the validity of the various networks presented. As this study shows, bibliometric mapping appears to provide an interesting tool for describing the cognitive orientation of a research field, not only in the natural and life sciences but also in philosophy. The empirical and theoretical justification of Gartner "hype curves" is a very relevant open question in the field of Technological Life Cycle analysis. The scope of the present paper is to introduce a simple model describing the growth of scientific/technological research impact, in the specific case where science is the main source of a new idea driving a technological development, leading to "hype-type" evolution curves. 
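A term co-occurrence map of the kind used above starts from simple pair counts: how often two terms appear in the same document. A minimal sketch with invented documents (the term sets below are illustrative, not the study's data):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how often each term pair occurs in the same document."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

docs = [
    {"free will", "neuroscience"},
    {"free will", "determinism", "neuroscience"},
    {"sorites", "vagueness"},
]
print(cooccurrence(docs)[("free will", "neuroscience")])  # 2
```

Clustering such a pair-count matrix is what yields the groups of co-occurring terms that the mapping approach visualizes.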
The main idea of the model is that, in a first stage, the growth of the scientific interest of a new specific field (as can be measured by publication numbers) basically follows the classical "logistic" growth curve. At a second stage, starting at a later trigger time, the technological development based on that scientific idea (as can be measured by patent deposits) can be described as the integral (in a mathematical sense) of the first curve, since technology is based on the overall accumulated scientific knowledge. The model is preliminarily tested through a bibliometric analysis of the publication and patent deposit rate for organic light emitting diodes scientific research and technology. In this paper we have conducted a study that covers computer science publications from 1936 to 2010 in order to analyze the evolution of women in computing research. We have considered the computing conferences and journals that are available in the Digital Bibliography and Library Project (DBLP) database, which contains more than 1.5 million papers and more than 4 million authorships, corresponding to about 4,000 journals, conferences and workshops. We analyze the participation of women as the authors of publications, productivity and its relationship with the average research life of women in comparison to that of men, the gender distribution of conference and journal authorships depending on different computer science topics, and authors' behavior as regards collaborating with one gender and/or the other. We also detail the method used to obtain and validate the data. The results of our study have led us to some interesting conclusions concerning various aspects of the evolution of female authorship in computing research. The structure and evolution of co-authorship networks have been extensively studied in the literature. However, the studies on the co-authorship network in a specific interdisciplinary field may be complementary to the mainstream of existing works. 
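The two-stage model just described can be sketched numerically: yearly publications follow a logistic curve, and patent activity accumulates the running integral of that curve after a trigger delay. The parameter values below are arbitrary illustrations, not fitted to the OLED data.

```python
import math

def logistic(t, K=1000.0, r=0.8, t0=10.0):
    """Yearly publication rate following a logistic growth curve."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Patent activity modelled as accumulated scientific knowledge: the running
# integral (here a simple cumulative sum) of the publication curve, started
# only after a trigger delay.
years = range(0, 30)
pubs = [logistic(t) for t in years]
trigger = 5
patents, total = [], 0.0
for t in years:
    if t >= trigger:
        total += logistic(t)
    patents.append(total)
print(round(pubs[29]), round(patents[29]))
```

The publication rate saturates at the carrying capacity K, while the cumulative patent curve keeps rising, which is the qualitative "hype-type" shape the model aims to reproduce.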
In this paper, the interdisciplinary field of "evolution of cooperation", which has been prevalent in the last decades as a promising scientific frontier, is analyzed by extracting its co-authorship network mainly from Web of Science. The results show that the development of this field is characterized by the growth of a giant component of its collaboration network. Originally formed by assembling a few local clusters, the giant component has gradually evolved from a small cluster to a structure of "chained-communities", and then to a small-world structure. Through examining the degree distributions and analyzing the vulnerability, we uncover that the giant component comprises the "elite", the "middle-class" and the "grassroots", with respect to the nodes' degrees and their functions in structuring the giant component. Furthermore, the elite and the middle-class constitute a robust cohesive-core, which underpins the modular network of the giant component. The overall results of this work may illuminate more endeavors on the collaboration network in other interdisciplinary fields. Despite increasing awareness of the need to trace the trajectory of innovation system research, so far little attention has been given to quantitative depiction of the evolution of this fast-moving research field. This paper uses CiteSpace to visually demonstrate intellectual structures and developments. The study uses citation analysis to detect and visualize disciplinary distributions, keyword co-word networks and journal cocitation networks, highly cited references, as well as highly cited authors to identify intellectual turning points, pivotal points and emerging trends in innovation system research from 1975 to 2012. This study uses several quantitative techniques to enable a multidimensional analysis of 47 key business journals by analyzing the scientific communication patterns and structural influences of these journals. 
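The growth of a giant component, as tracked above, reduces to finding the largest connected component of the co-authorship graph. The sketch below uses an iterative traversal and invented author labels:

```python
from collections import defaultdict

def giant_component(edges):
    """Size of the largest connected component of a co-authorship graph,
    given as a list of (author, author) edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), 0
    for node in adj:
        if node in seen:
            continue
        stack, size = [node], 0
        seen.add(node)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Two clusters joined by one bridging collaboration, plus an isolated pair.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("X", "Y")]
print(giant_component(edges))  # 5
```

Running this year by year on a growing edge list is one way to observe local clusters merging into a single giant component.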
Apart from using clustering techniques to establish research clusters in the Business domain, we apply a refined PageRank method by differentiating between the citation types to enable a cross-sectional evaluation of the selected journals. The results indicate that the five most influential journals are from Finance and Economics. The selected Finance journals are knowledge hubs and the selected Economics journals are knowledge sources when ISI's entire journal database is considered. However, within the Business domain, the selected Finance journals appear to be high impact knowledge hubs while the selected Economics journals appear to be high impact journals despite weak citation activity. All in all, such analyses are beneficial to scholars when selecting publication outlets to showcase their research, and to agencies such as the Financial Times and Bloomberg when selecting their journal baskets for their annual journal evaluation exercises. Academic evaluation committees have been increasingly receptive to using the number of published indexed articles, as well as citations, to evaluate the performance of scientists. It is, however, impossible to develop a stand-alone, objective numerical algorithm for the evaluation of academic activities, because any evaluation necessarily includes subjective preference statements. In a market, the market prices represent preference statements, but scientists work largely in a non-market context. I propose a numerical algorithm that serves to determine the distribution of reward money in Mexico's evaluation system, which uses relative prices of scientific goods and services as input. The relative prices would be determined by an evaluation committee. In this way, large evaluation systems (like Mexico's Sistema Nacional de Investigadores) could work semi-automatically, but not arbitrarily or superficially, to determine quantitatively the academic performance of scientists every few years. 
Data on 73 scientists from the Biology Institute of Mexico's National University are analyzed, and it is shown that the reward assignment and academic priorities depend heavily on those preferences. A maximum number of products or activities to be evaluated is recommended, to encourage quality over quantity. Modeling distributions of citations to scientific papers is crucial for understanding how science develops. However, there is considerable empirical controversy over which statistical model fits the citation distributions best. This paper is concerned with rigorous empirical detection of power-law behaviour in the distribution of citations received by the most highly cited scientific papers. We have used a large, novel data set on citations to scientific papers published between 1998 and 2002 drawn from Scopus. The power-law model is compared with a number of alternative models using a likelihood ratio test. We have found that the power-law hypothesis is rejected for around half of the Scopus fields of science. For these fields of science, the Yule, power-law with exponential cut-off and log-normal distributions seem to fit the data better than the pure power-law model. On the other hand, when the power-law hypothesis is not rejected, it is usually empirically indistinguishable from most of the alternative models. The pure power-law model seems to be the best model only for the most highly cited papers in "Physics and Astronomy". Overall, our results seem to support theories implying that the most highly cited scientific papers follow the Yule, power-law with exponential cut-off or log-normal distribution. Our findings also suggest that power laws in citation distributions, when present, account only for a very small fraction of the published papers (less than 1 % for most fields of science) and that the power-law scaling parameter (exponent) is substantially higher (from around 3.2 to around 4.7) than found in the older literature. 
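The first step of such a power-law analysis is a maximum-likelihood estimate of the scaling exponent for the tail above some threshold x_min. Below is a sketch of the continuous-case estimator (one ingredient of the Clauset-Shalizi-Newman recipe, which the full analysis combines with likelihood ratio tests), checked on a synthetic sample:

```python
import math
import random

def powerlaw_alpha(xs, xmin):
    """Continuous MLE estimate of the power-law exponent for x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)

# Synthetic sample with true exponent 3.5, drawn by inverse-transform sampling.
random.seed(1)
alpha_true, xmin = 3.5, 1.0
sample = [xmin * (1 - random.random()) ** (-1 / (alpha_true - 1)) for _ in range(50_000)]
print(round(powerlaw_alpha(sample, xmin), 1))
```

On real citation data, x_min itself must also be estimated, which is why power-law behaviour typically covers only the extreme tail of the distribution.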
The object of the present study is the evaluation of the research quality of the three Greek chemical engineering departments (Athens, Thessaloniki, Patras) by means of several advanced bibliometric indices calculated separately for each academic using a twofold approach, namely at department and academic rank level. This allows the ranking of the studied departments, but also sheds light on the distribution of the research activity among the various ranks. In addition, to assess the research profile and background of the current faculty of the Greek chemical engineering departments in an international context, their research output is compared with that of the chemical engineering department of the Massachusetts Institute of Technology (MIT). Dependency of the bibliometric indices on seniority is also investigated, conducting the bibliometric analysis using a common time basis for all academics, i.e., research performance during the last decade. Available data are also used to investigate the temporal progress of the research productivity. Finally, gender distribution among the academics of the various ranks is also studied to explore the gender balance in research. In general, bibliometrics demonstrate that the Patras department hosts academics of better quality, with higher scientific activity over the last decade, but the superiority of the MIT department over the Greek departments is also evident. Results also indicate that no common standards in hiring/promotion of academics are established between the departments. The negative impact of the European socio-economic crisis on the research productivity is also highlighted, while the university system suffers from unequal gender distribution with pronounced male dominance. 
Salant (J Polit Econ 77(4):545-558, 1969) complained that on many occasions he found the writing of his fellow economists "nearly incomprehensible," and made suggestions to improve economists' writing skills (and, by extension, those of natural and social scientists in general). Among other things, he argued that good writers tend to use shorter words. We call this "the Salant hypothesis," and use standard statistical techniques to test this claim by comparing the average length of words used by Nobel laureates in their banquet speeches. We find that Literature laureates tend to use shorter words than laureates in other disciplines, and the difference is statistically significant. These results support Salant's idea that words should be used efficiently. This includes using short words instead of longer ones whenever possible. In short, good writing is also "economical writing." Under a set of reasonable assumptions, it is shown that all manuscripts submitted to any journal will ultimately be published, either by the first journal or by one of the following journals to which a manuscript is resubmitted. This suggests that low-quality manuscripts may also be published, which further suggests that there may be too many journals. Between 1991 and 2010, 45 scientists were honored with Nobel prizes in Physiology or Medicine. It is shown that these 45 Nobel laureates are separated, on average, by at most 2.8 co-authorship steps from at least one cross-disciplinary broker, defined as a researcher who has published co-authored papers both in some biomedical discipline and in some non-biomedical discipline. 
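The word-length comparison behind the Salant hypothesis reduces to computing mean word lengths per text. A toy sketch with invented sentences (the real study compares banquet speeches and adds significance testing):

```python
def mean_word_length(text):
    """Average length of the alphabetic words in a text."""
    words = [w.strip(".,;:'\"!?()") for w in text.split()]
    words = [w for w in words if w.isalpha()]
    return sum(len(w) for w in words) / len(words)

plain = "Short words carry the point with ease and grace."
ornate = "Sesquipedalian circumlocutions obfuscate otherwise straightforward propositions."
print(mean_word_length(plain) < mean_word_length(ornate))  # True
```

A full test would compare the two group means with, e.g., a t-test rather than a single inequality.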
If Nobel laureates in Physiology or Medicine and their immediate collaborators can be regarded as forming the intuitive "center" of the biomedical sciences, then at least for this 20-year sample of Nobel laureates, the center of the biomedical sciences within the co-authorship graph of all of the sciences is closer to the edges of multiple non-biomedical disciplines than typical biomedical researchers are to each other. This study identifies the top individual contributors to 31 LIS journals from 2007 to 2012, both worldwide (all disciplines) and among four groups: LIS faculty in the US and Canada, LIS faculty in the UK, LIS faculty in other countries, and librarians worldwide. The distribution of authorship is highly skewed. Although more than 9,800 authors (86.4 %) each contributed no more than a single article over the six-year period, the top 50 authors (0.4 %) each contributed eight or more articles, with an average of 13.0. Together, the top 50 authors account for nearly 8 % of the LIS literature. Moreover, the most productive LIS faculty are concentrated in relatively few universities. Faculty in the natural sciences and LIS are more likely to be found among the top 50 authors than their overall contributions would suggest, while librarians, computer scientists and non-academic authors are underrepresented. Top authors are especially likely to publish in the Journal of Informetrics and Scientometrics. Among American LIS faculty, the list of the most prolific authors has changed substantially over time. Only three of the top 21 authors of the 1999-2004 period can be found on the current top-20 list. This study aims to explore the relationship between science and technology via analyzing cross citations between papers and patents in fuel cells. To calculate cross citation indicators, papers were retrieved from the WOS database and patent data from the USPTO during the period between 1991 and 2010, resulting in a total of 20,758 papers and 8112 patents. 
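Co-authorship steps like the 2.8 average above are shortest-path distances in the co-authorship graph, computable with breadth-first search. The sketch below uses hypothetical author labels:

```python
from collections import defaultdict, deque

def coauthor_steps(edges, source, target):
    """Shortest number of co-authorship steps between two researchers (BFS)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # no co-authorship path exists

# Laureate -> collaborator -> broker who also publishes outside biomedicine.
edges = [("laureate", "collaborator"), ("collaborator", "broker"), ("broker", "physicist")]
print(coauthor_steps(edges, "laureate", "broker"))  # 2
```

Averaging this distance from each laureate to the nearest cross-disciplinary broker is the kind of computation the reported 2.8 figure rests on.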
This study shows that there is a gradually increasing convergence between science and technology, particularly of science linkage in recent years. Papers citing patents are more time-sensitive than patents citing papers. Academic institutions are more likely to cite papers and patents published by other academic institutions. (c) 2015 Elsevier Ltd. All rights reserved. This study reveals the roles of three tie-generative mechanisms, namely, assortative mixing, preferential attachment, and triadic closure, in forming citation links in journals through the exponential random graph modeling approach. The study also adopts a longitudinal design to examine how the roles of the three mechanisms evolve over time. The data involve citation exchanges in Internet research among 680 journals in 12 subject areas from 2000 to 2013. Assortative mixing by discipline and publication foci is a significant tie-generative mechanism in journal citation networks. The magnitude of assortative mixing by discipline remains stable over time, whereas that by publication foci declines over time. Journals in Internet research demonstrate an increasing preference for influential journals to form citation links over time. The indegree of journals does not exert a significant impact on citation link formation among journals, whereas the outdegree of journals imposes a significantly negative impact on citation link formation among journals. Triadic closure is an important force that facilitates the formation of citation links among journals. The findings of this study improve our knowledge of the organizing principles that underlie journal citation networks and advance our understanding of the production process of scientific knowledge in Internet research. Journal impact factors (JIFs) are widely used and promoted but have important limitations. 
In particular, JIFs can be unduly influenced by individual highly cited articles and hence are inherently unstable. A logical way to reduce the impact of individual high citation counts is to use the geometric mean rather than the standard mean in JIF calculations. Based upon journal rankings 2004-2014 in 50 sub-categories within 5 broad categories, this study shows that journal rankings based on JIF variants tend to be more stable over time if the geometric mean is used rather than the standard mean. The same is true for JIF variants using Mendeley reader counts instead of citation counts. Thus, although the difference is not large, the geometric mean is recommended instead of the arithmetic mean for future JIF calculations. In addition, Mendeley readership-based JIF variants are as stable as those using Scopus citations, confirming the value of Mendeley readership as an academic impact indicator. Classically, unsupervised machine learning techniques are applied to data sets with a fixed number of attributes (variables). However, many problems encountered in the field of informetrics confront us with the need to extend these kinds of methods in a way such that they may be computed over a set of nonincreasingly ordered vectors of unequal lengths. Thus, in this paper, some new dissimilarity measures (metrics) are introduced and studied. This allows us to use, e.g., hierarchical clustering algorithms to determine an input data set's partition consisting of sets of producers that are homogeneous not only with respect to the quality of information resources, but also their quantity. The Italian National Scientific Qualification (ASN) was introduced in 2010 as part of a major reform of the national university system. 
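The geometric-mean JIF variant discussed above can be sketched directly. The ln(1 + c) offset used below is one common convention for handling zero citation counts and is an assumption here, as are the citation figures:

```python
import math

def arithmetic_jif(citations):
    """Standard (arithmetic-mean) impact factor over per-article citations."""
    return sum(citations) / len(citations)

def geometric_jif(citations):
    """Geometric-mean variant; the ln(1 + c) offset accommodates zeros."""
    return math.exp(sum(math.log(1 + c) for c in citations) / len(citations)) - 1

# One viral article inflates the arithmetic mean far more than the geometric one.
cites = [0, 1, 1, 2, 3, 400]
print(round(arithmetic_jif(cites), 1), round(geometric_jif(cites), 1))  # 67.8 4.2
```

Because a single outlier barely moves the geometric mean, rankings built on it fluctuate less from year to year, which is the stability result the study reports.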
Under the new regulation, the scientific qualification for a specific role (associate or full professor) and field of study is required to apply for a permanent professor position. The ASN is peculiar since it makes use of bibliometric indicators with associated thresholds as one of the parameters used to assess applicants. The first round of the ASN received 59,149 applications, and the results have been made publicly available for a short period of time, including the values of the quantitative indicators for each applicant. The availability of this wealth of information provides an opportunity to draw a fairly detailed picture of a nation-wide evaluation exercise, and to study the impact of the bibliometric indicators on the qualification results. In this paper, we provide a first account of the Italian National Scientific Qualification from a quantitative point of view. We show that significant differences exist among scientific disciplines, in particular with respect to the fraction of qualified applicants, that cannot be easily explained. Furthermore, we describe some issues related to the definition and use of the bibliometric indicators and the corresponding thresholds. Our analysis aims at drawing attention to potential problems that should be addressed by decision-makers in future rounds of the ASN. This study investigates how scientific performance in terms of publication rate is influenced by the gender, age and academic position of the researchers. Previous studies have shown that these factors are important variables when analysing scientific productivity at the individual level. What is new with our approach is that we have been able to identify the relative importance of the different factors based on regression analyses (OLS) of each major academic field. The study, involving almost 12,400 Norwegian university researchers, shows that academic position is more important than age and gender. 
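The OLS approach just described boils down to fitting a regression and reading off the share of variance explained (R-squared). A one-predictor sketch with hypothetical data (academic rank coded numerically against publications per year; the real study uses several predictors per field):

```python
def ols_r2(x, y):
    """Fit y = a + b*x by ordinary least squares; return R^2, the share
    of variance in y explained by the fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: academic rank coded 1-4 vs. publications per year.
rank = [1, 1, 2, 2, 3, 3, 4, 4]
pubs_per_year = [0.5, 1.5, 1.0, 2.5, 2.0, 4.0, 3.0, 6.0]
print(round(ols_r2(rank, pubs_per_year), 2))  # 0.62
```

An R-squared in the 0.135-0.19 range, as reported next, means most variation in publication output is left unexplained by position, age and gender.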
In the fields analysed, the regression model can explain 13.5-19 per cent of the variance in the publication output at the level of individuals. This also means that most of the variance in publication rate is due to other factors. The basic indicators of a researcher's productivity and impact are still the number of publications and their citation counts. These metrics are clear, straightforward, and easy to obtain. When a ranking of scholars is needed, for instance in grant, award, or promotion procedures, their use is the fastest and cheapest way of prioritizing some scientists over others. However, due to their nature, there is a danger of oversimplifying scientific achievements. Therefore, many other indicators have been proposed, including the usage of the PageRank algorithm known for the ranking of webpages and its modifications suited to citation networks. Nevertheless, this recursive method is computationally expensive and even if it has the advantage of favouring prestige over popularity, its application should be well justified, particularly when compared to the standard citation counts. In this study, we analyze three large datasets of computer science papers in the categories of artificial intelligence, software engineering, and theory and methods and apply 12 different ranking methods to the citation networks of authors. We compare the resulting rankings with self-compiled lists of outstanding researchers selected as frequent editorial board members of prestigious journals in the field and conclude that there is no evidence of PageRank-based methods outperforming simple citation counts. The first significant digit patterns arising from a mixture of uniform distributions with a random upper bound are revisited. A closed-form formula for its first significant digit distribution (FSD) is obtained. 
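A bare-bones PageRank over a citation graph, of the kind compared against plain citation counts above, can be sketched with power iteration. This is a toy version on an invented graph; for simplicity, dangling-node mass is simply dropped, which does not affect the ordering here:

```python
def pagerank(links, d=0.85, iters=100):
    """Basic power-iteration PageRank over a citation graph
    (mapping: node -> list of nodes it cites)."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    pr = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for u, outs in links.items():
            if outs:
                share = d * pr[u] / len(outs)
                for v in outs:
                    new[v] += share
        pr = new
    return pr

# Tiny citation graph: C is cited by both A and B, B only by A.
cites = {"A": ["B", "C"], "B": ["C"], "C": []}
pr = pagerank(cites)
print(pr["C"] > pr["B"] > pr["A"])  # True
```

On this tiny graph PageRank and raw citation counts agree; the study's point is that on large author networks the recursive method does not demonstrably improve on the counts.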
The one-parameter model of Rodriguez is recovered for an extended truncated Pareto mixing distribution. Considering additionally the truncated Erlang, gamma and Burr mixing distributions, and the generalized Benford law, for which another probabilistic derivation is offered, we study the fitting capabilities of the FSDs for various Benford-like data sets from scientific research. Based on the results, we propose the general use of a fine structure index for Benford's law in case the data is well fitted by the truncated Erlang member of the uniform random upper bound family of FSDs. There are two versions in the literature of counting co-author pairs. Whereas the first version leads to a two-dimensional (2-D) power function distribution, the other version shows three-dimensional (3-D) graphs that are fully rotatable, with their shapes visible in space from all possible points of view. As a result, these new 3-D computer graphs, called "Social Gestalts", deliver more comprehensive information about social network structures than simple 2-D power function distributions. The mathematical model of Social Gestalts and the corresponding methods for the 3-D visualization and animation of collaboration networks are presented in Part I of this paper. Fundamental findings in psychology/sociology and physics are used as a basis for the development of this model. The application of these new methods to male and to female networks is shown in Part II. After regression analysis the visualized Social Gestalts are nearly identical to the corresponding empirical distributions (R^2 > 0.99). The structures of female co-authorship networks differ markedly from the structures of the male co-authorship networks. 
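Benford-type analyses like the one above start from the empirical first-significant-digit frequencies, compared against the Benford probabilities log10(1 + 1/d). A sketch using powers of two, a classic Benford-conforming sequence:

```python
import math
from collections import Counter

def fsd_distribution(xs):
    """Empirical first-significant-digit frequencies of a data set."""
    digits = [int(str(abs(x)).lstrip("0.")[0]) for x in xs if x]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
data = [2 ** k for k in range(1, 101)]  # powers of two are famously Benford-like
emp = fsd_distribution(data)
print(abs(emp[1] - benford[1]) < 0.05)  # True
```

Mixture models like those in the paper replace the fixed Benford probabilities with parametric FSD families fitted to the data.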
For female co-author pairs' networks, an accentuation of productivity dissimilarities between the pairs becomes visible; for male co-author pairs' networks, by contrast, an accentuation of productivity similarities is expressed. The theoretical approach of the mathematical model of Social Gestalts and the corresponding methods for the 3-D visualization and animation of collaboration networks are presented in Part I. The application of these new methods to male and female networks is shown in Part II. After regression analysis the visualized Social Gestalts are nearly identical to the corresponding empirical distributions (R^2 > 0.99). The structures of female co-authorship networks differ markedly from the structures of the male co-authorship networks. For female co-author pairs' networks, an accentuation of productivity dissimilarities between the pairs becomes visible; for male co-author pairs' networks, by contrast, an accentuation of productivity similarities is expressed. Nearly a decade ago, the science community was introduced to the h-index, a proposed statistical measure of the collective impact of the publications of any individual researcher. Of course, any method of reducing a complex data set to a single number will necessarily have certain limitations and introduce certain biases. However, in this paper we point out that the definition of the h-index actually suffers from something far deeper: a hidden mathematical incompleteness intrinsic to its definition. In particular, we point out that one critical step within the definition of h has been missed until now, resulting in an index which only achieves its stated objectives under certain rather limited circumstances. For example, this incompleteness explains why the h-index ultimately has more utility in certain scientific subfields than others. 
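The standard h-index that this critique starts from is simple to compute: the largest h such that the author has h papers with at least h citations each.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cs = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cs, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The second example illustrates one of the known quirks the paper builds on: a single very highly cited paper barely moves h.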
In this paper, we expose the origin of this incompleteness and then also propose a method of completing the definition of h in a way which remains close to its original guiding principle. As a result, our "completed" h not only reduces to the usual h in cases where the h-index already achieves its objectives, but also extends the validity of the h-index into situations where it currently does not. (c) 2015 Elsevier Ltd. All rights reserved. In general, scientometric studies tend to focus on citations received from journals (incoming citations) and usually neglect references to journals (outgoing citations). The aim of this study is to suggest a new approach to the journal impact factor on a wider scale, i.e., from the viewpoint of citing journals. I studied how citations (references) given by JCR journals contribute to the 2-year and 5-year journal impact factors (JIF). To do so, data were obtained from the 2011 edition of JCR (Science Edition) available for universities in Spain, and the citing journal matrix for each journal was used. This matrix records the number of times articles published in other journals (cited journals) were cited in a given journal (citing journal) in 2011. The results showed that a set of 50 journals produced about 15% of all references that contributed to the 2-year JIF. Similarly, a set of 50 journals produced about 13% of all references that contributed to the 5-year JIF. A Bradford-like plot was obtained by plotting the cumulative number of references that contributed to the 2-year and 5-year JIF against the cumulative number of citing journals. The distribution of journals according to the number and percentage of references they contributed to the 2-year and 5-year JIF showed peaks. A rank-order distribution of references that contributed to the 2-year and 5-year JIF was obtained with a previously described empirical two-exponent equation. 
Based on the maximum contribution to the 2-year JIF of different 2-year rolling reference windows, the second rolling window (references to articles published 2 and 3 years before 2011) made the greatest contribution to impact in 41% of journals. (c) 2015 Elsevier Ltd. All rights reserved. Evaluative bibliometrics compares the citation impact of researchers, research groups and institutions with each other across time scales and disciplines. Both factors, discipline and period, have an influence on the citation count which is independent of the quality of the publication. Normalizing the citation impact of papers for these two factors started in the mid-1980s. Since then, a range of different methods have been presented for producing normalized citation impact scores. The current study uses a data set of over 50,000 records to test which of the methods so far presented correlate better with the assessment of papers by peers. The peer assessments come from F1000Prime, a post-publication peer review system of the biomedical literature. Of the normalized indicators, the current study involves not only cited-side indicators, such as the mean normalized citation score, but also citing-side indicators. As the results show, the correlations of the indicators with the peer assessments all turn out to be very similar. Since F1000 focuses on biomedicine, it is important that the results of this study are validated by other studies based on datasets from other disciplines or (ideally) based on multi-disciplinary datasets. (c) 2015 Elsevier Ltd. All rights reserved. At global and local levels, we are observing an increasing range and rate of disease outbreaks that show evidence of jumping from animals to humans, and from food to humans. Zoonotic infections (e.g. Hendra, swine flu, anthrax) affect animal health and can be deadly to humans. The increasing rate of outbreaks of infectious diseases transferring from animals to humans (i.e. 
zoonotic diseases) necessitates detailed understanding of the education, research and practice of animal health and its connection to human health. These emerging microbial threats underline the need to explore the evolutionary dynamics of zoonotic research across public health and animal health. This study investigates the collaboration network of different countries engaged in conducting zoonotic research. We explore the dynamics of this network from 1980 to 2012 based on a large scientific dataset derived from Scopus. In our analyses, we compare several properties of the network, including density, clustering coefficient, giant component and centrality measures, over time. We also map the network over different time intervals using VOSviewer. We analyzed 5182 publication records. We found the United States and the United Kingdom to be the most collaborative countries, working with 110 and 74 other countries in 1048 and 599 cases, respectively. Our results show increasingly close collaboration among scientists from the United States, several European countries including the United Kingdom, Italy, France, the Netherlands and Switzerland, China and Australia with scientists from other parts of the world. Ethanol obtained from the conversion of different types of biomass is a renewable source of fuel, and since 2010 it has been classified as an "advanced fuel" by the EPA due to its contribution to the reduction of the impacts of GHG emissions. Recent literature stresses the importance of the use of second-generation fuels to reduce the impacts of the direct and indirect use of land, mostly on agricultural prices. Although these demands constitute a clear cue for R&D activities, there is an impressive number of alternatives regarding different kinds of biomass, processes and byproducts: a complex matrix of technological opportunities and demands that generates a clear incentive for collaboration. 
This paper uses both a bibliometric and scientometric approach and the Innovation System (IS) literature, under the perspective of Social Network Analysis (SNA), to build Collaborative Networks (CNs) for second-generation (lignocellulosic) ethanol using the ISI Web of Science database. The adopted procedure is motivated by the fact that authors, countries and institutions related to bioenergy have incentives to share information in the process of forming partnerships, i.e., from a network point of view. The results show that the United States is in a better position than other countries, improving the role of the university in its IS, while China proves to be a great ally of the United States regarding the production of technology to produce lignocellulosic ethanol. Brazil, however, does not appear well placed in the network, despite being the second largest producer of first-generation ethanol in the world. The subscription prices of peer-reviewed journals have in the past not been closely related to scientific quality. This relationship has been further obscured by bundled e-licenses. The situation is different for Open Access (OA) journals that finance their operations via article processing charges (APCs). Due to competition and the fact that authors are often directly involved in making APC payments from their own or other limited funds, APC pricing has so far been sensitive to the quality and services offered by journals. We conducted a systematic survey of prices charged by OA journals indexed in Scopus and this revealed a moderate (0.40) correlation between the APCs and Source Normalized Impact per Paper values, a measure of citation rates. When weighted by article volumes, the correlations between quality and price were significantly higher (0.67). 
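A volume-weighted correlation of the kind reported above can be computed by weighting each journal's contribution to the means, variances and covariance by its article count. A minimal sketch; the journal figures below are invented for illustration:

```python
import math

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Hypothetical APC prices (USD), SNIP values, and article volumes for five journals.
apc  = [800, 1200, 1500, 2500, 3000]
snip = [0.6, 0.9, 1.1, 1.8, 2.4]
vol  = [50, 300, 120, 800, 400]
print(weighted_pearson(apc, snip, vol))
```

With equal weights this reduces to the ordinary Pearson correlation, which is why the unweighted (0.40) and weighted (0.67) figures are directly comparable.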
This would seem to indicate that while publishers to some extent take quality into account when pricing their journals, authors are even more sensitive to the relationship between price and quality in their choices of where to submit their manuscripts. The global landscape of scientific activity is changing and becoming more diverse, with emerging economies, particularly China, redrawing the contours of scientific research in the twenty-first century. Research publications, the most cherished output of science, provide robust evidence of this changing landscape. The global publication share of advanced scientific countries is decreasing, with a significant rise in the publication share of China and also of other emerging economies such as India, South Korea, and Brazil. Their publications, though, still lag in global reception as measured through citations. However, with increasing international collaboration and publishing in promising areas and high-impact journals, the citation reception of their papers is increasing. Indian publication growth is far behind that of China, whose growth has been dramatic. However, India's emergence is interesting: from being a leading country among developing economies in scientific publications until the early 1980s, her publication growth exhibited a sharp decline in the late 1980s. Only from 1995 onwards did India start making an assertion in the global publication race, and in some promising areas of high relevance, such as nanotechnology, her publication growth has been impressive. India to a large extent epitomises the scientific activity of emerging economies. Thus, through the lens of India's publication trend, the paper underscores the changing global landscape of science. To place India's publishing activity in proper context, the paper broadly examines the publication activity of some advanced OECD countries and the BRICKS (Brazil, Russia, India, China, South Korea and South Africa) countries. Implications of this study are discussed. 
Topic models are a well-known clustering approach for textual data, which offers promising applications in the bibliometric context for the purpose of discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide poorly descriptive metadata for the discovered clusters of publications, and they are not related to the other important metadata usually available with publications, such as authors' affiliations, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and to the post-processing of topic model results, with the aim of describing a field of research in depth over time. In particular, working on a selection of publications from the international statistical literature, we propose an approach that allows us to identify sophisticated topic descriptors, and we analyze the links between topics and their temporal evolution. Scientometric evaluation of nanoscience/nanotechnology requires complex search strategies and lengthy queries which retrieve massive amounts of information. In order to offer some insight based on the most frequently occurring terms, our research focused on a limited amount of data, collected on uniform principles. The prefix nano appears in many different compound words, thus offering a possibility for such assessment. The aim is to identify the scatter of nanoconcepts, among and within journals, as well as more generally, in the Web of Science (WOS). Ten principal journals were identified along with all unique nanoterms in article titles. Such terms occur on average in half of all titles. Terms were thoroughly investigated and mapped by lemmatization or stemming to the appropriate roots, i.e., nanoconcepts. The scatter of concepts follows the characteristics of power laws, especially Zipf's law, exhibiting a clear inversely proportional relationship between rank and frequency. 
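The inversely proportional rank-frequency relationship described above is commonly checked by fitting a line to log(frequency) against log(rank); a slope near -1 indicates Zipf-like behavior. A minimal sketch with invented concept counts (chosen here to follow f = 600/rank exactly):

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) vs log(rank).

    A slope near -1 suggests the data follow Zipf's law."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical occurrence counts of nanoconcepts in article titles (f = 600/rank).
counts = [600, 300, 200, 150, 120, 100]
print(round(zipf_slope(counts), 2))  # -1.0
```

Real concept counts will scatter around the line, so the fitted slope summarizes how closely the long tail of rare concepts follows the power law.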
The same three nanoconcepts occur most frequently in as many as seven journals. Two concepts occupy the first and second ranks in six journals. The same six concepts are the most frequently occurring in all ten journals as well as in the full WOS database, representing almost two thirds of all nanotitled articles in both instances. Subject categories do not play a decisive role. Frequency falls progressively, quickly producing a long tail of rare concepts. The drop is almost linear on a log scale. The existence of hundreds of different closed-form compound nanoterms has consequences for retrieval on Internet search engines (e.g. Google Scholar) which do not permit truncation. In this study we proposed using anomaly detection models to discover research trends. The application is illustrated by applying a rule-based anomaly detector (WSARE), typically used for biosurveillance purposes, to research trend analysis in social computing. Based on articles collected from the SCI-EXPANDED and CPCI-S databases from 2000 to 2013, we found that the number of social computing studies went up significantly in the past decade, with computer science and engineering among the most important subjects. The USA was the largest contributor of studies in this field, followed by China. According to anomalies detected by WSARE, social computing research gradually shifted from its traditional fields, such as computer science and engineering, to the fields of medicine and health, communication, etc. Various new subjects emerged in recent years, including sentiment analysis, crowdsourcing and e-health. We applied an interdisciplinary network evolution analysis to track changes in interdisciplinary collaboration, and found that most subject categories closely collaborate with the subjects of computer science and engineering. 
Our study revealed that anomaly detection models have high potential for mining hidden research trends and may provide useful tools for forecasting in other fields. In the context of scientometrics and bibliometrics, several research fields deal with bibliographic data. In this paper, we explore why the combination of online analytical processing (OLAP) analysis and information networks is an interesting issue. In Business Intelligence, OLAP is a technology supported by data warehousing systems. It provides tools for analyzing data according to multiple dimensions and multiple hierarchical levels. At the same time, several information networks (co-authors network, citations network, institutions network, etc.) can be built from bibliographic databases. Originally, OLAP was introduced to analyze structured data. However, in this paper, we ask whether, by combining OLAP and information networks, we can provide a new way of analyzing bibliographic data. OLAP should be able to handle information networks and also be useful for monitoring, browsing and analyzing the content and the structure of bibliographic networks. The goal of this survey paper is to review previous work on OLAP and information networks dealing with bibliographic data. We also propose a comparison between traditional OLAP and OLAP on information networks and discuss the challenges OLAP faces regarding bibliographic networks. Studies on publication and citation scores tend to focus mostly on frequently published and cited scholars. This paper contributes to advancing knowledge by simultaneously looking into both high- and low-performing scholars, including non-publishing scholars, and by focusing on factors increasing or impeding scholarly performance. 
To this end, two complementary sources of data are used: (1) data from the ISI Web of Science on publications and citations of scholars from 35 Canadian business schools, and (2) survey data on factors explaining the productivity and impact performance of these scholars. The analysis of the data reveals five scholar profiles: (i) non-publishing scholars; (ii) low-performing scholars; (iii) frequently publishing scholars; (iv) frequently cited scholars; and (v) high-impact frequently publishing scholars. Statistical modeling is then used to look into factors that explain why scholars fall into one of these performance configurations rather than another. Two major results emerge: first, scholars in the low-performing profile differ from those in the non-publishing profile only by being in top-tier universities and by having high levels of funding from research councils. Second, scholars who publish frequently and are frequently cited differ from those in the low-performing profile in many ways: they are full professors, they dedicate more time to their research activities, they receive all their research funding from research councils, and, finally, they are located in top-tier universities. The last part of the paper discusses policy implications for the development of research skills by university managers willing to increase the publication and citation scores of their faculty members. In this paper, we explore the longitudinal research collaboration network of 'mammography performance' over 30 years by creating and analysing a large collaboration network dataset using Scopus. The study of social networks using longitudinal data may provide new insights into how this collaborative research evolves over time as well as what types of actors influence the whole network over time. The methods and findings presented in this work aim to assist in identifying key actors in other research collaboration networks. 
In doing so, we apply a rank aggregation technique to centrality measures in order to derive a single ranking of influential actors. We argue that there is a strong correlation between the levels of degree and closeness centrality of an actor and its influence in the research collaboration network (at the macro/country level). Ethnobiology is a clearly interdisciplinary field, with several connections to other research approaches, such as studies examining traditional ecological knowledge (TEK). The central question investigated is whether Brazilian studies are disproportionately attached to the prefix "ethno" when compared to the profiles of other countries with high contributions to these scientific fields. I used a bibliometric review to investigate this question and discussed several outcomes of the resulting patterns. I retrieved 8470 articles, 6117 using keywords associated with TEK and 2954 using keywords associated with ethnobiology and related subfields. A unique scenario emerges only for Brazil, where there is a stronger attachment to the ethno prefix than in the rest of the world, which reflects the history of these scientific approaches and the context of scientific production. In this study, we compare the difference in impact between open access (OA) and non-open access (non-OA) articles. 1761 Nature Communications articles published from 1 January 2012 to 31 August 2013 are selected as our research objects, including 587 OA articles and 1174 non-OA articles. Citation data and daily updated article-level metrics data are harvested directly from the platform of nature.com. Data are analyzed from both static and temporal-dynamic perspectives. The OA citation advantage is confirmed, and the OA advantage also holds when extending the comparison from citations to article views and social media attention. 
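The rank aggregation over degree and closeness centralities described earlier can be sketched on a toy collaboration graph. A minimal stdlib-only illustration (the graph, country labels, and the choice of Borda counts as the aggregation rule are assumptions for this sketch, not the paper's exact method):

```python
from collections import deque

# Toy undirected country collaboration graph (adjacency lists; names illustrative).
graph = {
    "US": ["UK", "CN", "AU", "FR"],
    "UK": ["US", "FR", "IT"],
    "CN": ["US", "AU"],
    "AU": ["US", "CN"],
    "FR": ["US", "UK"],
    "IT": ["UK"],
}

def degree(g):
    """Degree centrality: number of direct collaborators."""
    return {v: len(nbrs) for v, nbrs in g.items()}

def closeness(g):
    """Closeness centrality: (n-1) / sum of shortest-path distances (BFS)."""
    out = {}
    for src in g:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        out[src] = (len(g) - 1) / sum(dist[v] for v in g if v != src)
    return out

def borda_aggregate(*measures):
    """Combine several centrality dicts into one ranking via Borda counts."""
    scores = {v: 0 for v in measures[0]}
    for measure in measures:
        ordered = sorted(measure, key=measure.get)  # ascending: worst gets 0 points
        for points, v in enumerate(ordered):
            scores[v] += points
    return sorted(scores, key=scores.get, reverse=True)

print(borda_aggregate(degree(graph), closeness(graph)))  # 'US' first, 'IT' last
```

Borda counting is one of several aggregation rules; any position-based rule that tolerates ties between the input rankings would serve the same purpose.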
More importantly, we find that OA papers not only have a clear advantage in total downloads, but also sustain steady downloads over a long period. For article downloads, non-OA papers attract attention for only a short period, whereas the advantage of OA papers persists for a much longer time. Being able to effectively measure similarity between patents in a complex patent citation network is a crucial task in understanding patent relatedness. In the past, techniques such as text mining and keyword analysis have been applied to patent similarity calculation. The drawback of these approaches is that they depend on the word choice and writing style of authors. Most existing graph-based approaches use common-neighbor-based measures, which only consider direct adjacency. In this work we propose new similarity measures for patents in a patent citation network using only the citation network structure. The proposed similarity measures leverage direct and indirect co-citation links between patents. A challenge arises when some patents receive a large number of citations and are thus considered more similar to many other patents in the network. To overcome this challenge, we propose a normalization technique to account for the case where some pairs are ranked very similar to each other simply because both are cited by many other patents. We validate our proposed similarity measures using US class codes for US patents and the well-known Jaccard similarity index. Experiments show that the proposed methods perform well when compared to the Jaccard similarity index. In order to efficiently allocate academic resources, an awareness of the properties of the underlying production function's returns to scale is of crucial importance. For instance, the question arises as to what extent an expansion of a university department's academic staff would be advisable in order to utilize increasing marginal gains of research production. 
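The Jaccard similarity index used above as a validation baseline for patent similarity is straightforward to compute over sets of citing patents. A minimal sketch; the patent IDs and the co-citation framing are hypothetical illustrations:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical sets of patents citing two focal patents (a co-citation view).
citing_p1 = {"US101", "US102", "US103", "US104"}
citing_p2 = {"US103", "US104", "US105"}
print(jaccard(citing_p1, citing_p2))  # 0.4
```

Because the denominator grows with the union, heavily cited patents do not automatically dominate the score, which is one reason Jaccard serves as a sensible baseline for the normalization problem described above.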
On the other hand, it is disputable whether an optimal university department size exists. Empirical studies covering these questions render various answers. In this paper, we analyse which properties of returns to scale the Business Administration research of universities in Germany exhibits. On the basis of research data from 2001 until 2009 provided by the Centre for Higher Education, and using Data Envelopment Analysis, we demonstrate that typically sized business schools show nearly constant returns to scale. Furthermore, we observe tendencies of decreasing returns to scale for large-sized business schools. Diverse robustness and sensitivity analyses confirm the validity of our empirical findings. This article analyzes "happiness studies" as an emerging field of inquiry throughout various scientific disciplines and research areas. Utilizing four operationalized search terms in the Web of Science: "happiness", "subjective well-being", "life satisfaction" and "positive affect", a dataset was created for empirical citation analysis. Combined with qualitative interpretations of the publications, our results show how happiness studies has developed over time, in what journals the citing papers have been published, and which authors and researchers are the most productive within this set. We also trace various trends in happiness studies, such as the social indicators movement, the introduction of positive psychology and various medical and clinical applications of happiness studies. We conclude that "happiness studies" has emerged in many different disciplinary contexts and has progressively been integrated and standardized. Moreover, beginning at the turn of the millennium, happiness studies has even begun to shape an autonomous field of inquiry, in which happiness becomes a key research problem in itself. 
Thus, rather than speaking of a distinct "happiness turn", our study shows that there have been many heterogeneous turns to happiness, originating in a number of different disciplines. Forward citations are widely recognized as a useful measure of the impact of patents upon subsequent technological developments. However, an inherent characteristic of forward citations is that they take time to accumulate. This makes them valuable for retrospective impact evaluations, but less helpful for prospective forecasting exercises. To overcome this, it would be desirable to have indicators that forecast future citations at the time a patent is issued. In this paper, we outline one such indicator, based on the size of the inventor teams associated with patents. We demonstrate that, on average, patents with eight or more co-inventors are cited significantly more frequently in their first 5 years than peer patents with fewer inventors. This result holds true across technologies, assignee types, and citation sources (examiner versus applicant), and after self-citations are accounted for. We hypothesize that inventor team size may reflect the amount of resources committed by an organization to a given innovation, with more researchers attached to innovations regarded as having particular promise or value. The aim of this study was to analyse the scientific productivity of the BRIC countries (Brazil, Russia, India and China) in viticulture and oenology through bibliometric analyses of articles in the Science Citation Index Expanded database for the period 1993-2012. A total of 1067 research articles were published in 363 domestic and international journals. We highlight substantial growth in published research papers during this period, particularly in China and Brazil over the last 5 years. 
Papers have been published in numerous journals across a number of subject areas; Revista Brasileira de Fruticultura and Pesquisa Agropecuaria Brasileira are the most productive journals among the BRIC countries. A social network analysis of collaboration between each of the four BRIC countries was also performed. Given the developments in modern science and technology, scientists need interdisciplinary knowledge and collaborations. In the National Natural Science Foundation of China (NSFC), more than 59% of individuals change their disciplinary application codes to pursue interdisciplinary applications for scientific funding. An algorithm that classifies interdisciplinary applications and calculates the diversity of individual research disciplines (DIRD) is proposed based on three-level disciplinary application codes. Using a sample of 37,330 unique individuals at the NSFC from 2000 to 2013, this research analyzed the DIRD of all sponsored individuals and found that DIRDs differ significantly among scientific departments, research areas, and universities. Sponsored individuals prefer not to engage in cross-research-field or interdisciplinary applications. In addition, top-class universities in China exhibit a stronger ability to carry out interdisciplinary research than do other universities. This thorough investigation of interdisciplinary applications in a scientific foundation provides new insights into managing scientific funding. There has been an increase in research published on information behavior in recent years, and this has been accompanied by an increase in its diversity and interaction with other fields, particularly information retrieval. The aims of this study are to determine which researchers have contributed to producing the current body of knowledge on this subject, and to describe its intellectual basis. A bibliometric and network analysis was applied to authorship and co-authorship as well as citation and co-citation. 
According to these analyses, there is a small number of authors who can be considered the most productive and who publish regularly, and a large number of transient ones. Other findings reveal a marked predominance of theoretical works, some examples of qualitative methodology that originate in other areas of social science, and a high incidence of research focused on user interaction with information retrieval systems and the information behavior of doctors. In order to distinguish the research focus of different Library and Information Science (LIS) research institutions in China, we use the Keyword Activity Index (KAI) to identify their institution-specific keywords. The KAI, whose idea is borrowed from the Activity Index, measures whether an institution has a comparative advantage in a particular topic according to its share in publications. In this study, a total of 65,653 papers from 19 core LIS journals in China during the period 2000-2013 are collected. The top 8 most prolific LIS research institutions in China are selected for further investigation of the utility of the KAI. Their institution-specific keywords are extracted based on the KAI values to represent their research focus and then clustered using co-word analysis; the research advantages of each institution are analyzed and compared according to these clusters. The reasons for their research advantages are analyzed based on their research function and research background. Previous studies have provided inconsistent evidence pertaining to the relationship between university-industry collaboration and university performance. This study's objective is to go beyond traditional viewpoints, which mostly confine university-industry collaboration within a separate channel, and to relate the overall channel characteristics of university-industry collaboration to university research performance. 
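An Activity-Index-style measure such as the KAI described above compares an institution's share of a keyword with that keyword's share in the whole field; a value above 1 signals a comparative advantage. A minimal sketch (the function signature and all counts are invented for illustration):

```python
def kai(inst_kw_papers, inst_total_papers, field_kw_papers, field_total_papers):
    """Keyword Activity Index: the institution's share of a keyword divided by
    the keyword's share in the whole field. KAI > 1 suggests the institution
    has a comparative advantage in that topic."""
    inst_share = inst_kw_papers / inst_total_papers
    field_share = field_kw_papers / field_total_papers
    return inst_share / field_share

# Illustrative counts: an institution used a keyword in 40 of its 500 papers,
# while the keyword appears in 600 of 30,000 papers field-wide.
print(kai(40, 500, 600, 30000))  # 4.0
```

Here the institution devotes 8% of its output to the topic against a 2% field-wide baseline, hence the ratio of 4, which would flag the keyword as institution-specific.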
In doing so, we define two collaboration strategies: collaboration breadth, which is the scope of different channels used, and collaboration depth, which is the extent to which universities engage deeply in each channel. Based on a comprehensive panel dataset of universities in mainland China covering 2009-2013, we find that collaboration breadth and collaboration depth have a linear and a curvilinear effect on academic research performance, respectively. Moreover, the interaction of collaboration breadth and depth shows a negative impact on academic research performance. ResearchGate is a social network site for academics to create their own profiles, list their publications, and interact with each other. Like Academia.edu, it provides a new way for scholars to disseminate their work and hence potentially changes the dynamics of informal scholarly communication. This article assesses whether ResearchGate usage and publication data broadly reflect existing academic hierarchies and whether individual countries are set to benefit or lose out from the site. The results show that rankings based on ResearchGate statistics correlate moderately well with other rankings of academic institutions, suggesting that ResearchGate use broadly reflects the traditional distribution of academic capital. Moreover, while Brazil, India, and some other countries seem to be disproportionately taking advantage of ResearchGate, academics in China, South Korea, and Russia may be missing opportunities to use ResearchGate to maximize the academic impact of their publications. Although there are a number of social networking services that specifically target scholars, little has been published about the actual practices and usage of these so-called academic social networking services (ASNSs). 
To fill this gap, we explore the populations of academics who engage in social activities using an ASNS; as an indicator of further engagement, we also determine their various motivations for joining a group in ASNSs. Using groups and their members in Mendeley as the platform for our case study, we obtained 146 participant responses from our online survey about users' common activities, usage habits, and motivations for joining groups. Our results show that (a) participants did not engage with social-based features as frequently and actively as they engaged with research-based features, and (b) users who joined more groups seemed to have a stronger motivation to increase their professional visibility and to contribute the research articles that they had read to the group reading list. Our results generate interesting insights into Mendeley's user populations, their activities, and their motivations relative to the social features of Mendeley. We also argue that further design of ASNSs is needed to take greater account of disciplinary differences in scholarly communication and to establish incentive mechanisms for encouraging user participation. This paper furthers the development of methods to distinguish truth from deception in textual data. We use rhetorical structure theory (RST) as the analytic framework to identify systematic differences between deceptive and truthful stories in terms of their coherence and structure. A sample of 36 elicited personal stories, self-ranked as truthful or deceptive, is manually analyzed by assigning RST discourse relations among each story's constituent parts. A vector space model (VSM) assesses each story's position in multidimensional RST space with respect to its distance from truthful and deceptive centers as measures of the story's level of deception and truthfulness. 
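The vector-space step described above, positioning a story relative to truthful and deceptive centers, can be sketched with centroid distances. A minimal illustration (the RST relation dimensions, story vectors, and the use of Euclidean distance are assumptions of this sketch, not the paper's exact parameterization):

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclid(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def deception_score(story, truthful, deceptive):
    """Positive when the story lies closer to the deceptive centroid
    than to the truthful one; negative in the opposite case."""
    return euclid(story, centroid(truthful)) - euclid(story, centroid(deceptive))

# Toy RST-relation count vectors, e.g. [Elaboration, Contrast, Evidence].
truthful_stories = [[5, 1, 3], [6, 2, 4]]
deceptive_stories = [[2, 4, 1], [1, 5, 0]]
new_story = [2, 4, 0]
print(deception_score(new_story, truthful_stories, deceptive_stories) > 0)  # True
```

The sign of the score then plays the role of the classifier, and its magnitude can be read as a confidence in the deception or truthfulness judgment.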
Ten human judges evaluate independently whether each story is deceptive and assign their confidence levels (360 evaluations total), producing measures of the expected human ability to recognize deception. As a robustness check, a test sample of 18 truthful stories (with 180 additional evaluations) is used to determine the reliability of our RST-VSM method in determining deception. The contribution lies in demonstrating discourse structure analysis as a significant method for automated deception detection and an effective complement to lexicosemantic analysis. The potential lies in developing novel discourse-based tools to alert information users to potential deception in computer-mediated texts. In recent years, technological innovation has re-ignited an interest in privacy as designers, policy makers, and users each strive to reconcile the advantages of technology with the new demands they pose for privacy. Driven by a classic approach to defining concepts, scholars have not been able to agree on a unified definition of privacy. This poses a barrier to those who seek to implement privacy through their decisions. A critical component of their work involves anticipating and responding to potential privacy risks. In choosing one definition over another, practitioners might miss nuanced contextual overlaps that bear on privacy and thus bias their subsequent decisions. For these practical endeavors, it is important to adopt an inclusive and rich definition. Such a definition should also be responsive to how those affected by decisions that might compromise privacy, namely citizens and technology users, conceive of privacy. The present paper applies a prototype perspective on privacy that acknowledges the fuzziness of concepts and goes on to develop such a definition in a series of empirical studies. 
The relevance of the privacy prototype is then explored as it applies to privacy theorists, practitioners, and methodologists, suggesting new avenues for future research. This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails, in terms of category sequences in click-through data, are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs themselves and the corresponding categories are captured by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to using the current context, predictions made for the same URL when previously accessed in different contexts by various users are also combined by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to existing content analysis from the perspective of users' behavior. The article reports a field study investigating the temporality of the information behavior of 44 grade 8 students from initiation to completion of their school inquiry-based history project. The conceptual framework for the study is Kuhlthau's 6-stage information-search process (ISP) model. The objective of the study is to test and extend ISP model concepts. As in other ISP model studies, our study measured the evolution of the feelings, thoughts, and actions of the study participants over the 3-month period of their class project. 
The unique feature of this study is the unlimited access the researchers had to a real-life history class, resulting in 12 separate measuring periods. We report 2 important findings of the study. First, through factor analysis, we determined 5 factors that define the temporality of completing an inquiry-based project for these grade 8 students. The second main finding is the importance of the students' consultations with their classmates, siblings, parents, and teachers in the construction of the knowledge necessary to complete their project. Query-biased summaries help users to identify which items returned by a search system should be read in full. In this article, we study the generation of query-biased summaries as a sentence ranking approach, and methods to evaluate their effectiveness. Using sentence-level relevance assessments from the TREC Novelty track, we gauge the benefits of query expansion to minimize the vocabulary mismatch problem between informational requests and sentence ranking methods. Our results from an intrinsic evaluation show that query expansion significantly improves the selection of short relevant sentences (5-13 words) by between 7% and 11%. However, query expansion does not lead to improvements for sentences of medium (14-20 words) and long (21-29 words) lengths. In a separate crowdsourcing study, rather than assessing sentences individually, we analyzed whether summaries composed of sentences ranked using query expansion were preferred over summaries not assisted by query expansion. We found that participants chose summaries aided by query expansion around 60% of the time over summaries using an unexpanded query. We conclude that query expansion techniques can benefit the selection of sentences for the construction of query-biased summaries at the summary level rather than at the sentence ranking level. User domain knowledge affects search behaviors and search success. 
Predicting a user's knowledge level from implicit evidence such as search behaviors could allow an adaptive information retrieval system to better personalize its interaction with users. This study examines whether user domain knowledge can be predicted from search behaviors by applying a regression modeling analysis method. We identify behavioral features that contribute most to a successful prediction model. A user experiment was conducted with 40 participants searching on task topics in the domain of genomics. Participant domain knowledge level was assessed based on the users' familiarity with and expertise in the search topics and their knowledge of MeSH (Medical Subject Headings) terms in the categories that corresponded to the search topics. Users' search behaviors, including querying behaviors, document selection behaviors, and general task interaction behaviors, were captured by logging software. Multiple regression analysis was run on the behavioral data using different variable selection methods. Four successful predictive models were identified, each involving a slightly different set of behavioral variables. The models were compared on model fit, model significance, and the contributions of individual predictors in each model. Each model was validated using the split sampling method. The final model highlights three behavioral variables as domain knowledge level predictors: the number of documents saved, the average query length, and the average ranking position of the documents opened. The results are discussed, study limitations are addressed, and future research directions are suggested. Using Scopus data, we construct a global map of science based on aggregated journal-journal citations from 1996-2012 (N of journals=20,554). This base map enables users to overlay downloads from Scopus interactively. 
For a single year (e.g., 2012), the results can be compared with mappings based on the Journal Citation Reports at the Web of Science (N=10,936). The Scopus maps are more detailed at both the local and global levels because of their greater coverage, including, for example, the arts and humanities. The base maps can be interactively overlaid with journal distributions in sets downloaded from Scopus, for example, for the purpose of portfolio analysis. Rao-Stirling diversity can be used as a measure of interdisciplinarity in the sets under study. Maps at the global and the local level, however, can be very different because of the different levels of aggregation involved. Two journals, for example, can both belong to the humanities in the global map, but participate in different specialty structures locally. The base map and interactive tools are available online (with instructions) at http://www.leydesdorff.net/scopus_ovl. Understanding the knowledge-diffusion networks of patent inventors can help governments and businesses effectively use their investment to stimulate commercial science and technology development. Such inventor networks are usually large and complex. This study proposes a multidimensional network analysis framework that utilizes Exponential Random Graph Models (ERGMs) to simultaneously model knowledge-sharing and knowledge-transfer processes, examine their interactions, and evaluate the impacts of network structures and public funding on knowledge-diffusion networks. Experiments are conducted on a longitudinal data set that covers 2 decades (1991-2010) of nanotechnology-related US Patent and Trademark Office (USPTO) patents. The results show that knowledge sharing and knowledge transfer are closely interrelated. 
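The Rao-Stirling diversity mentioned above for the portfolio overlays combines the balance of a journal set over subject categories with the distances between those categories. A minimal sketch follows; the category proportions and pairwise distances are invented for illustration.

```python
def rao_stirling(proportions, distance):
    # Rao-Stirling diversity: sum over pairs of categories of
    # p_i * p_j * d_ij, where p_i is the share of the set in
    # category i and d_ij the distance between categories i and j.
    n = len(proportions)
    return sum(
        proportions[i] * proportions[j] * distance[i][j]
        for i in range(n) for j in range(n) if i != j
    )

# Illustrative example: three subject categories with made-up
# pairwise distances (0 = identical, 1 = maximally distant).
d = [[0.0, 0.2, 0.9],
     [0.2, 0.0, 0.8],
     [0.9, 0.8, 0.0]]

balanced = rao_stirling([1/3, 1/3, 1/3], d)   # spread over all three
narrow   = rao_stirling([0.9, 0.1, 0.0], d)   # concentrated in one
print(balanced > narrow)  # True: the spread portfolio is more diverse
```

A set concentrated in one category, or spread only over very similar categories, scores low; a set balanced across distant categories scores high.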
Inventors with high degree centrality and boundary-spanning inventors play significant roles in the network, and National Science Foundation (NSF) public funding positively affects knowledge sharing despite representing a small fraction of overall funding and targeting upstream research topics. Since the adoption of faceted search in a small number of academic libraries in 2006, faceted search interfaces have gained popularity in academic and public libraries. This article clarifies whether faceted search improves the interactions between searchers and library catalogs and sheds light on ways that facets are used during a library search. To study searchers' behaviors in natural situations, we collected from the servers a data set with more than 1.5 million useful search logs. Logs were parsed, statistically analyzed, and manually studied using visualization tools to gain a general understanding of how facets are used in the search process. A user experiment with 24 subjects was conducted to further understand contextual information, such as the searchers' motivations and perceptions. The results indicate that most searchers were able to understand the concept of facets naturally and easily. Faceted search did not shorten search time, but it did improve search accuracy. Facets were used more for open-ended tasks and difficult tasks that require more effort to learn, investigate, and explore. Overall, the results wove a detailed "story" about the ways that people use facets and the ways that facets help people use library catalogs. Scholarly peer review is a complex collaborative activity that is increasingly supported by web-based systems, yet little is known about how reviewers and authors interact in such environments, how criticisms are conveyed, or how the systems may affect the interactions and use of language of reviewers and authors. We looked at one aspect of the interactions between reviewers and authors, the use of politeness in reviewers' comments. 
Drawing on Brown and Levinson's (1987) politeness theory, we analyzed how politeness strategies were employed by reviewers to mitigate their criticisms in an open peer-review process of a special track of a human-computer interaction conference. We found evidence of frequent use of politeness strategies and that open peer-review processes hold unique challenges and opportunities for using politeness strategies. Our findings revealed that less experienced researchers tended to express unmitigated criticism more often than did experienced researchers, and that reviewers tended to use more positive politeness strategies (e.g., compliments) toward less experienced authors. Based on our findings, we discuss implications for research communities and the design of peer-reviewing processes and the information systems that support them. Offensive Internet chats, particularly the child-exploiting type, tend to follow a documented psychological behavioral pattern. Researchers have identified some important stages in this pattern. The psychological stages broadly include befriending, information exchange, grooming, and approach. Similarities among the posts of a chat play an important role in differentiating and identifying these stages. In this article a novel similarity measure is constructed which yields high inter-post similarity among the chat posts within a particular behavioral stage and low inter-post similarity across different behavioral stages. A psychological stage dictionary is constructed from the corpus by mining the terms associated with each stage. The dictionary works as a background knowledge base to support the similarity measure. To compute the inter-post similarity, a modified sentence-similarity measure is used. The proposed measure gives improved recognition of inter-stage and intra-stage similarity among the chat posts compared with other types of similarity measures. 
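A dictionary-backed inter-post similarity of the kind described above might be sketched as follows. The stage dictionary, the sample posts, and the boost weight are all invented for illustration; they stand in for the article's corpus-mined dictionary and its modified sentence-similarity measure.

```python
# Hypothetical stage dictionary mapping terms to the psychological
# stage they are associated with (mined from a corpus in the study).
STAGE_DICT = {
    "friend": "befriending", "hobby": "befriending",
    "school": "information exchange", "age": "information exchange",
}

def post_similarity(post_a, post_b, boost=2.0):
    # Weighted term-overlap similarity between two chat posts:
    # shared terms found in the stage dictionary count extra,
    # pushing posts from the same stage closer together.
    a, b = set(post_a.lower().split()), set(post_b.lower().split())
    if not (a and b):
        return 0.0
    score = sum(boost if t in STAGE_DICT else 1.0 for t in (a & b))
    return score / max(len(a), len(b))

same_stage = post_similarity("what is your hobby friend",
                             "my hobby is games")
cross_stage = post_similarity("what is your hobby friend",
                              "what town is that")
print(same_stage > cross_stage)  # True: shared stage terms boost similarity
```

The resulting pairwise similarities could then feed a clustering step that groups posts by behavioral stage, as the article goes on to describe.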
The pairwise inter-post similarity is used for clustering chat posts into the psychological stages. Experimental results demonstrate that the new clustering method gives better results than some current clustering methods. The application of a recently developed method for measuring the level of specialization over time to a selection of library and information science (LIS) core journals suggests that the Journal of the Association for Information Science and Technology (JASIST) is slowly transforming into a specialty journal. The transformation seems to originate from a growing interest in bibliometric topics. This is evident from a longitudinal study (1990-2012) of the bibliometric coupling strength between Scientometrics and other LIS core journals (including JASIST). The cause of this gradual transformation is discussed, and possible explanations are analyzed. The present study reports on the information seeking processes in a visual context, referred to throughout as visual information seeking. This study synthesizes research across different, yet complementary, areas, each capable of contributing findings and understanding to visual information seeking. Methods previously applied for examining the visual information seeking process are reviewed, including interactive experiments, surveys, and various qualitative approaches. The methods and resulting findings are presented and structured according to generalized phases of existing information seeking models, which include the needs, actions, and assessments of users. A review of visual information needs focuses on need and thus query formulation; the review of user actions centers on search and browse behaviors and the observed trends; the section concludes with a survey of users' assessments of visual information as part of the interactive process. 
This separate examination, specific to a visual context, is significant; visual information can influence outcomes in an interactive process and presents variations in the types of needs, tasks, considerations, and decisions of users, as compared to information seeking in other contexts. The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish. International collaboration tends to result in more highly cited research and, partly as a result of this, many research funding schemes are specifically international in scope. Nevertheless, it is not clear whether this citation advantage is the result of higher quality research or due to other factors, such as a larger audience for the publications. 
To test whether the apparent advantage of internationally collaborative research may be due to additional interest in articles from the countries of the authors, this article assesses the extent to which the national affiliations of the authors of articles affect the national affiliations of their Mendeley readers. Based on English-language Web of Science articles in 10 fields from science, medicine, social science, and the humanities, the results of statistical models comparing author and reader affiliations suggest that, in most fields, Mendeley users are disproportionately readers of articles authored from within their own country. In addition, there are several cases in which Mendeley users from certain countries tend to ignore articles from specific other countries, although it is not clear whether this reflects national biases or different national specialisms within a field. In conclusion, research funders should not incentivize international collaboration on the basis that it is, in general, higher quality because its higher impact may be primarily due to its larger audience. Moreover, authors should guard against national biases in their reading to select only the best and most relevant publications to inform their research. Blogs that cite academic articles have emerged as a potential source of alternative impact metrics for the visibility of the blogged articles. Nevertheless, to evaluate more fully the value of blog citations, it is necessary to investigate whether research blogs focus on particular types of articles or give new perspectives on scientific discourse. Therefore, we studied the characteristics of peer-reviewed references in blogs and the typical content of blog posts to gain insight into bloggers' motivations. The sample consisted of 391 blog posts from 2010 to 2012 in Researchblogging.org's health category. The bloggers mostly cited recent research articles or reviews from top multidisciplinary and general medical journals. 
Using content analysis methods, we created a general classification scheme for blog post content with 10 major topic categories, each with several subcategories. The results suggest that health research bloggers rarely self-cite and that the vast majority of their blog posts (90%) include a general discussion of the issue covered in the article, with more than one quarter providing health-related advice based on the article(s) covered. These factors suggest a genuine attempt to engage with a wider, nonacademic audience. Nevertheless, almost 30% of the posts included some criticism of the issues being discussed. Social networking sites (SNSs) can encourage interaction among users. Existing research mainly focuses on the ways in which SNSs are used in libraries and on librarians' or users' attitudes towards these SNSs. This study focused on the flow of information via SNS interactions between librarians and users on library Facebook, Twitter, and Chinese Weibo sites, and developed an SNS user interaction type model based on these information flows. A mixed-method approach was employed combining quantitative data generated from the analysis of 1,753 posts sampled from 40 library SNSs and qualitative data from interviews with 10 librarians. Four types of interactions were identified: information/knowledge sharing, information dissemination, communication, and information gathering. The study found that SNSs were used primarily as channels for disseminating news and announcements about things currently happening in the library. Communication-type posts allowed open-ended questions and produced more replies. In Facebook posts, Chinese users generated fewer likes than English-speaking users. 
The comparison of data between Facebook-like and Twitter-like SNSs in different library settings suggested that libraries need to coordinate different types of SNSs, and take library settings and sociocultural environments into consideration in order to enhance and encourage user engagement and interaction. The current work applies a method for mapping the supply of new knowledge from public research organizations, in this case from Italian institutions at the level of regions and provinces (NUTS2 and NUTS3). Through the analysis of scientific production indexed in the Web of Science for the years 2006-2010, the new knowledge is classified in subject categories and mapped according to an algorithm for the reconciliation of authors' affiliations. Unlike other studies in the literature based on simple counting of publications, the present study adopts an indicator, Scientific Strength, which takes account of both the quantity of scientific production and its impact on the advancement of knowledge. The differences in the results that arise from the 2 approaches are examined. The results of works of this kind can inform public research policies, at national and local levels, as well as the localization strategies of research-based companies. In scientometrics, citer behavior is traditionally investigated using one of two main approaches. According to the normative point of view, the behavior of scientists is regulated by norms that make the detection of citation patterns useful for the interpretation of bibliometric measures. According to the constructivist perspective, citer behavior is influenced by other factors linked to the social and/or psychological sphere that do not allow any statistical inferences that are useful for the purposes of interpretation. An intermediate position supports normative theories in describing citer behavior with respect to high citation frequencies and constructivist theories with respect to low citation counts. 
In this paper, this idea was tested in a case study of the Italian sociology community. Italian sociology is characterized by an unusual organization into three political or ideological camps, and belonging to one camp can be considered a potentially strong constructivist reason to cite. An all-author co-citation analysis was performed to map the structure of the Italian sociology community and look for evidence of three camps. We did not expect to find evidence of this configuration in the co-citation map, because the map included authors with high citation counts, which are supposedly produced by normative-oriented behavior. The results confirmed this hypothesis: the clusters seemed to be divided by topic rather than by camp. Relevant scientific works were cited by the members of the entire community regardless of their membership in any particular camp. This study compares the range of disciplines of citing journal articles to determine how closely related journals assigned to the same Web of Science research area are. The frequency distribution of disciplines by citing articles provides a signature for a cited journal that permits it to be compared with other journals using similarity comparison techniques. As an initial exploration, citing discipline data for 40 high-impact-factor journals assigned to the information science and library science category of the Web of Science were compared across 5 time periods. Similarity relationships were determined using multidimensional scaling and hierarchical cluster analysis to compare the outcomes produced by the proposed citing discipline and established cocitation methods. The maps and clustering outcomes reveal that a number of journals in allied areas of the information science and library science category may not be very closely related to each other or may not be appropriately situated in the category studied. 
The citing-discipline similarity data produced outcomes similar to those of the cocitation data, but with some notable differences. Because the citing discipline method relies on a citing perspective different from cocitations, it may provide a complementary way to compare journal similarity that is less labor intensive than cocitation analysis. Existing computer technologies poorly support the ideation phase common to graphic design practice. Finding and indexing visual material to assist the process of ideation often fall to the designer, leading to user experiences that are less than ideal. To inform development of computer systems to assist graphic designers in the ideation phase of the design process, we conducted interviews with 15 professional graphic designers about their design process and visual information needs. Based on the study, we propose a set of requirements for an ideation-support system for graphic design. Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are twofold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pinpoint its potential weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. As a second contribution, inspired by the success of the current diversification strategies that exploit the relevance of the candidate documents to individual query aspects, we cast the diversification problem as a ranking aggregation problem. To this end, we propose to materialize the re-rankings of the candidate documents for each query aspect and then merge these rankings by adapting the score(-based) and rank(-based) aggregation methods. 
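The merging of per-aspect re-rankings just described can be illustrated with a simple rank-based (Borda-style) aggregation; the documents and aspect rankings below are hypothetical, and the article's actual adapted methods may differ.

```python
from collections import defaultdict

def borda_merge(rankings):
    # Rank-based aggregation (Borda count): each aspect-specific
    # ranking awards a document points inversely to its position,
    # and documents are re-ranked by their total points.
    points = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            points[doc] += n - pos
    return sorted(points, key=points.get, reverse=True)

# Hypothetical re-rankings of the same candidate documents for a
# query with three aspects.
aspect_rankings = [
    ["d1", "d2", "d3"],
    ["d3", "d1", "d2"],
    ["d1", "d3", "d2"],
]
print(borda_merge(aspect_rankings))  # ['d1', 'd3', 'd2']
```

A score-based variant (e.g., CombSUM) would instead sum the per-aspect relevance scores directly; both produce a single merged ranking from the materialized per-aspect lists.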
Our extensive experimental evaluations show that certain ranking aggregation methods are superior to existing explicit diversification strategies in terms of diversification effectiveness. Furthermore, these ranking aggregation methods have lower computational complexity than the state-of-the-art diversification strategies. We measure synergy for the Russian national, provincial, and regional innovation systems as reduction of uncertainty using mutual information among the 3 distributions of firm sizes, technological knowledge bases of firms, and geographical locations. Half a million firm-level records for 2011 were obtained from the Orbis database of Bureau van Dijk. The firm-level data were aggregated at the levels of 8 Federal Districts, the regional level of 83 Federal Subjects, and the single level of the Russian Federation. Not surprisingly, the knowledge base of the economy is concentrated in the Moscow region (22.8%) and Saint Petersburg (4.0%). Except in Moscow itself, high-tech manufacturing does not add synergy to any other unit at any of the various levels of geographical granularity; instead it disturbs regional coordination. Knowledge-intensive services (KIS; including laboratories) contribute to the synergy in all Federal Districts (except the North-Caucasian Federal District), but only in 30 of the 83 Federal Subjects. The synergy in KIS is concentrated in centers of administration. The knowledge-intensive services (which are often state affiliated) provide the backbone of an emerging knowledge-based economy at the level of Federal Districts, but the economy is otherwise not knowledge based (except for the Moscow region). This paper reviews information practices used by transnational migrants to become familiar with new urban surroundings. Drawing on interviews with 26 participants, all of whom had moved to New York City in the past 2 years, I analyze the interrelatedness of people, city space, and technology. 
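The mutual-information synergy measure used in the Russian innovation-systems study above can be sketched as the triple interaction term T = Hg + Ht + Ho - Hgt - Hgo - Hto + Hgto over the geography, technology, and size distributions, where negative values indicate a net reduction of uncertainty (synergy). The firm records below are invented for illustration; they are chosen so that no single dimension predicts another, yet the three together are fully coordinated.

```python
import math
from collections import Counter

def entropy(counts):
    # Shannon entropy (bits) of an empirical distribution.
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def synergy(firms):
    # T(G,T,O) = Hg + Ht + Ho - Hgt - Hgo - Hto + Hgto,
    # computed from (region, technology, size) firm records.
    def h(key):
        return entropy(Counter(key(f) for f in firms))
    hg = h(lambda f: f[0])
    ht = h(lambda f: f[1])
    ho = h(lambda f: f[2])
    hgt = h(lambda f: (f[0], f[1]))
    hgo = h(lambda f: (f[0], f[2]))
    hto = h(lambda f: (f[1], f[2]))
    hgto = h(lambda f: f)
    return hg + ht + ho - hgt - hgo - hto + hgto

# Invented firm records: (region, technology class, size class),
# arranged so any two dimensions together determine the third.
firms = [("A", "hi-tech", "small"), ("A", "services", "large"),
         ("B", "hi-tech", "large"), ("B", "services", "small")]
print(round(synergy(firms), 6))  # -1.0: one full bit of synergy
```

Aggregating such records at different geographical levels (districts versus subjects) changes the distributions entering the entropies, which is why synergy can differ across levels of granularity, as the study reports.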
By rooting my analysis in the experiences of transnational newcomers to New York, I direct my investigation both at library and information science (LIS) scholarship on transnational experience and at urban informatics as an area of study. The findings section first addresses participants' practices for becoming familiar with their surroundings as embedded in everyday routines, using Bergson's (1911) construct of habits as a means of organizing stimulus. I then develop an analysis of wandering, which emerged as an information practice used by participants to become familiar with their neighborhoods. Building on these themes, I suggest implications for human information behavior theory, arguing that LIS scholars should articulate the concepts of mobile and ubiquitous technologies more clearly and across a wider range of disciplines. Adverse selection occurs when a firm signs a contract with a potential worker whose key skills are not yet known at that time, which may lead the employer to make a poor decision. In this article, we study the example of adverse selection of reviewers when a potential referee, whose ability is his private information, faces a finite sequence of review processes for several scholarly journals, one after the other. The editor's problem is to design a system that guarantees that each manuscript is reviewed by a referee if and only if the referee's ability matches the review's complexity. As is typically the case in solving problems of adverse selection in agency theory, the journal editor offers a menu of contracts to the potential referee, from which the reviewer chooses the contract that is best for him given his ability. The optimal contract will be the one that provides the right incentives to match the complexity of the review and the ability of the reviewer. The payment of contracts is made through a proportional increment of the reviewer factor, which measures the importance of reviewers to their field. 
This paper reports findings from participatory design research aimed at uncovering how technological interventions can engage users in the domain of privacy. Our work was undertaken in the context of a new design concept, Privacy Trends, whose aspiration is to foster technology users' digital literacy regarding ongoing privacy risks and elucidate how such risks fit within existing social, organizational, and political systems, leading to a longer-term privacy concern. Our study reveals 2 challenges for privacy intervention design: the need to develop technology users' intrinsic motivations with the privacy domain and the importance of framing the concept of privacy within users' interests. Setting our study within a design context enables us to identify 4 design opportunities for fostering engagement with the privacy domain through technology design. This study looks at mean and cruel online behavior through the lens of design, with the goal of developing positive technologies for youth. Narrative inquiry was used as a research method, allowing two focus groups, one composed of teens and the other of undergraduate students, to map out 4 cyberbullying stories. Each cyberbullying story revealed 2 subplots: the story that is (as perceived by these participants) and the story that could be (if the participants' design recommendations were embedded in social media). The study resulted in a user-generated framework for designing affordances on social media sites to counter acts of cyberbullying. Seven emergent design themes are evident in the participants' cyberbullying narratives: design for hesitation, design for consequence, design for empathy, design for personal empowerment, design for fear, design for attention, and design for control and suppression. We conclude with a typological analysis of the values present in the participants' design recommendations, applying Cheng and Fleischman's values framework (2010). 
We describe an unsupervised, language-independent spelling correction search system. We compare the proposed approach with unsupervised and supervised algorithms. The described approach consistently outperforms other unsupervised efforts and nearly matches the performance of a current state-of-the-art supervised approach. Scholars have uncovered abundant data about the history of the term information, as well as some of its many combined phrases (e.g., information science, information retrieval, and information technology). Many other compounds that involve information seem, however, not to have a known origin yet. In this article, further information about the phrase information storage and retrieval is provided. Knowing the history of terms and their associated concepts is an important prescription against poor terminological phrasing and theoretical confusion. In this study, we analysed the statistical association between e-journal use and research output at the institution level in South Korea by performing comparative and diachronic analyses, as well as analyses by field. The datasets were compiled from four different sources: national reports on research output indicators in science fields, two publicly available statistics databases on higher education institutions, and e-journal usage statistics generated by 47 major publishers. Due to the different data sources utilized, a considerable number of missing values appeared in our datasets, and various mapping issues required corrections prior to the analysis. Two techniques for handling missing data were applied and the impact of each technique was discussed. In order to compile the institutional data by field, journals were first mapped, and then the statistics were summarized according to subject field. 
We observed that e-journal use exhibited stronger correlations with the number of publications and the times cited than with the number of undergraduates, graduates, or faculty members or the amount of research funds, and this was the case regardless of the NA handling method or author type. The difference between the maximum correlation of the amount of external research funding with the two average indicators and the corresponding correlation for e-journal use was not significant. Statistically, e-journal use accounted for the average times cited per article and the average JIF to a degree quite similar to that of external research funds. It was found that the number of e-journal articles used had a strong positive correlation (Pearson's correlation coefficient of r > 0.9, p < 0.05) with the number of articles published in SCI(E) journals and the times cited, regardless of the author type, NA handling method or time period. We also observed that the top-five institutions in South Korea, with respect to the number of publications in SCI(E) journals, generally maintained a balanced range of academic activities while producing significant research output and using published material. Finally, we confirmed that the association of e-journal use with the two quantitative research indicators is strongly positive, even in the analyses by field, with the exception of the Arts and Humanities. Cuban scientific output for the period 2003-2011 is analyzed using the Scopus database. Based on a set of bibliometric indicators, we try to shed light on the evolution of the volume of scientific output in Cuban and foreign journals, and its distribution and visibility by quartiles. Also analyzed are the citations per document received, broken down by language of publication and type of collaboration. 
The results reveal patterns and strategies of expansion in scientific communication that may be useful for academic and institutional decision-makers, suggesting means of amending editorial policy to improve scientific quality and the international diffusion of output. It is hoped that these results will spur debate about research policies and actions to be taken to enhance the quality of research. The main objective of this paper is to study the development and growth of scientific literature on women in science and higher education. A total of 1415 articles and reviews published between 1991 and 2012 were extracted from the Thomson Reuters Web of Science database. Standard bibliometric indicators and laws (e.g. Price's, Lotka's, and Bradford's laws) were applied to these data. In addition, the Gender Inequality Index (GII) was obtained for each country in order to rank them. The results suggest an upward trend not only in the number of papers but also in the number of authors per paper. However, this increase in the number of authors was not accompanied by greater international collaboration. The interest in gender differences in science extends to many authors (n = 3064), countries (n = 67), and research areas (n = 86). Data showed a high dispersion of the literature and a small set of core journals focused on the topic. Regarding the research areas, the area with the highest frequency of papers was Education and Educational Research. Finally, our results showed that countries with higher levels of inequality (higher GII values) tend to present higher relative values of scientific productivity in the field. Recent interest in university rankings has led to the development of several ranking systems at national and global levels. Global ranking systems tend to rely on internationally accessible bibliometric databases and reputation surveys to develop league tables at a global level. 
Given their access to and in-depth knowledge about local institutions, national ranking systems tend to include a more comprehensive set of indicators. The purpose of this study is to conduct a systematic comparison of national and global university ranking systems in terms of their indicators, coverage and ranking results. Our findings indicate that national rankings tend to include a larger number of indicators that primarily focus on educational and institutional parameters, whereas global ranking systems tend to have fewer indicators, mainly focusing on research performance. Rank similarity analysis between national rankings and global rankings filtered for each country suggests that, with the exception of a few instances, global rankings do not strongly predict the national rankings. This paper analyses the interrelationship between perceived journal reputation and its relevance for academics' work. Based on a survey of 705 members of the German Economic Association (GEA), we find a strong interrelationship between perceived journal reputation and relevance, where a journal's perceived relevance has a stronger effect on its reputation than vice versa. Moreover, past journal ratings conducted by the Handelsblatt and the GEA directly affect journals' reputation among German economists and indirectly also their perceived relevance, but the effect on reputation is more than twice as large as the effect on perceived relevance. In general, citations have a non-linear impact on perceived journal reputation and relevance. While the number of landmark articles published in a journal (as measured by the so-called H-index) increases the journal's reputation, an increase in the H-index even tends to decrease a journal's perceived relevance, as long as this is not simultaneously reflected in a higher Handelsblatt and/or GEA rating. This suggests that a journal's relevance is driven by average article quality, while reputation depends more on truly exceptional articles. 
We also identify significant differences in the views on journal relevance and reputation between different age groups. Quantitative and qualitative studies of scientific performance provide a measure of scientific productivity and represent a stimulus for improving research quality. Whatever the goal (e.g., hiring, firing, promoting or funding), such analyses may inform research agencies on directions for funding policies. In this article, we perform a data-driven assessment of the performance of top Brazilian computer science researchers considering three central dimensions: career length, number of students mentored, and volume of publications and citations. In addition, we analyze the researchers' publishing strategy, based upon their area of expertise and their focus on venues of different impact. Our findings demonstrate that it is necessary to go beyond counting publications to assess research quality and show the importance of considering the peculiarities of different areas of expertise while carrying out such an assessment. Social networks are said to have a positive impact on scientific development. Conventionally, it is argued that female and male researchers differ in access to and participation in networks and hence experience unequal career opportunities. Due to limited capacities of time and resources, as well as homophily, top-level scientists may structure their contacts to reduce problems of complexity and uncertainty. The outcomes of this structuring can be cohesive subgroups within networks of relations. Women in science might suffer exclusion from cliques because they are seen as dissimilar in the arena. The present paper aims to explore integration in and composition of scientific cliques. A three-step analysis is conducted: firstly, cliques are identified; secondly, overlap structures are examined; thirdly, group compositions are analysed in terms of other personal attributes of the researchers involved. 
Building on network data of female and male investigators, the article applies a comparative case study design including two cutting-edge research institutions from the German Excellence Initiative. The study contrasts a Cluster of Excellence with a Graduate School, and the corresponding formal with the informal networks. The results imply that the general hypothesis of unfavourably embedded female researchers cannot be supported. Although women are less integrated in scientific cliques, the majority are involved in an inner social circle which enables access to career-relevant network resources. The emergence of new networking research organisations is explained by the need to promote excellence in research and to facilitate the resolution of specific problems. This study focuses on a Spanish case, the Biomedical Research Networking Centres (CIBER), created through a partnership of research groups, without physical proximity, who work on common health-related issues. These structures are a great challenge for bibliometricians due to their heterogeneous composition and virtual nature. Therefore, the main objective of this paper is to assess different approaches based on addresses, funding acknowledgements and authors to explore which search strategy or combination is more effective in identifying CIBER publications. To this end, we downloaded all the Spanish publications from the Web of Science databases in the subject categories of Gastroenterology/Hepatology and Psychiatry during the period 2008-2011. Our results showed that, taken alone, the dataset based on addresses identified more than 60% of all potential CIBER publications. However, the best outcome was obtained by combining it with additional datasets based on funding acknowledgements and on authors, recovering more than 80% of all possible CIBER publications without losing accuracy. 
In terms of bibliometric performance, all the CIBER sets showed scores above the country average, thus proving the relevance of these virtual organisations. Finally, given the increasing importance of these structures and the fact that authors do not always mention their connection to CIBER, some recommendations are offered to develop clear policies on how, when and where to specify this relationship. Previous studies have reported the increased use of English as the "lingua franca" for academic purposes among non-Anglophone researchers. But despite data that confirm this trend, little is known about the reasons why researchers decide to publish their results in English rather than in their first language. The aim of this study is to determine the influence of researchers' scientific domain on their motivation to publish in English. The results are based on a large-scale survey of Spanish postdoctoral researchers at four different universities and one research centre, and reflect responses from 1717 researchers about their difficulties, motivations, attitudes and publication strategies. Researchers' publication experiences as corresponding authors of articles in English and in their first language are strongly related to their scientific domain. But surprisingly, Spanish researchers across all domains expressed a similar degree of motivation when they write research articles in English. They perceive a strong association between this language and the desire for their research to be recognized and rewarded. Our study also shows that the target scientific audience is a key factor in understanding the choice of publication language. The implications of our findings go beyond the field of linguistics and are relevant to studies of scientific productivity and visibility, the quality and impact of research, and research assessment policies. Over the past six decades, Modeling and Simulation (M&S) has been used as a method or tool in many disciplines. 
While there is no doubt that the emergence of modern M&S is highly connected with that of Computing and Systems science, there is no clear evidence of the contribution of M&S to those disciplines. Further, while there is a growing body of knowledge (BoK) in M&S, there is no easy way to identify it due to the multidisciplinary nature of M&S. In this paper, we examine whether M&S is its own discipline by performing content analysis of a BoK in Computer Science. Content analysis is a research methodology that aims to identify key concepts and relationships in a body of text through computational means. It can be applied to research articles in a BoK to identify the prominent topics and themes. It can also be used to explore the evolution of a BoK over time or to identify the contribution of one BoK to another. The contribution of this paper is twofold: (1) the establishment of M&S as its own discipline and the examination of its relationship with the sister disciplines of Computer Science and Systems Engineering over the last 60 years, and (2) the examination of the contribution of M&S to the sciences as represented in the Public Library of Science. This study aims to undertake a bibliometric investigation of the NASA Astrobiology Institute (NAI)-funded research that was published between 2008 and 2012 (by teams of Cooperative Agreement Notice Four and Five). For this purpose, the study creates an inventory of publications co-authored through NAI funding and investigates journal preferences, international and institutional collaboration, and citation behaviors of researchers to reach a better understanding of interdisciplinary and collaborative astrobiology research funded by the NAI. Using the NAI annual reports, 1210 peer-reviewed publications are analyzed. The following conclusions are drawn: (1) NAI researchers prefer publishing in high-impact multidisciplinary journals. 
(2) Astronomy and astrophysics are the categories in which NAI researchers most prefer to publish, based on Web of Science subject categories. (3) NAI is indeed a virtual institution; researchers collaborate with other researchers outside their organization and, in some cases, outside the U.S. (4) There are prominent scholars in the NAI co-author network, but none of them dominates astrobiology. The automatic construction of a term taxonomy can enhance our ability to express science mapping. In this paper, we introduce the definition of a weighted co-occurring word pair and a corresponding improved method of word co-occurrence analysis. An application and evaluation of this proposed method in library and information science is also discussed, including how to obtain expanded effective keywords, how to calculate the weights of keywords and their relations, and how to abstract hierarchical structures and other relations such as synonymy. A visualization tool and a prototype search system are designed for browsing the identified term taxonomy. Finally, we report an evaluation and comparison experiment. The experimental results show that the proposed method is effective in helping users perform semantic searches and expand their search results, and can meet the requirements of specific domains. This study draws on publication and citation data related to plant biotechnology from a 10-year (2004-2013) period to assess the research performance, impact, and collaboration of member states of the Association of Southeast Asian Nations (ASEAN). Plant biotechnology is one of the main areas of cooperation between ASEAN member states and among the research areas promoted to achieve regional food security and sustainable development. In general, findings indicate increased scientific output, influence, and overall collaboration of ASEAN countries in plant biotechnology over time. 
Research performance and collaboration (domestic, regional, and international) of the region in plant biotechnology are linked to the status of the economic development of each member country. Thailand produced the most publications of the ASEAN member states, while Singapore had the highest influence as indicated by its citation activity in plant biotechnology among the ASEAN countries. Domestic and international collaborations on plant biotechnology are numerous. Regional collaboration or partnership among ASEAN countries, however, was found to be very limited, which is a concern for the region's goal of economic integration and science and technology cooperation. More studies using bibliometric data analysis need to be conducted to understand plant biotechnology cooperation and knowledge flows between ASEAN countries. In this study, the accuracy of the Author ID (author identification) in the Scopus bibliographic database was evaluated. For this purpose, we adopted the KAKEN database as the source of "correct data". KAKEN is an open database and the biggest funding database in Japan, as it manages all the information for the largest public fund for academic researchers. In the KAKEN database, each researcher has a unique Researcher Number, which must be used when a proposal or annual report is submitted to the database. Thus, the concordance between each researcher and the associated Researcher Number is checked automatically and constantly. For this reason, we used this number to evaluate the Scopus Author ID. After matching bibliographic records between Scopus and KAKEN, we calculated the recall and precision of the Scopus Author ID for Japanese researchers. We found that recall and precision were around 98% and 99%, respectively. This result showed that the Author ID, though not perfectly accurate in terms of individual identification, was reliable enough to be used as a new tool for bibliometrics. 
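The recall and precision used in such an evaluation can be sketched as follows; the paper identifiers below are hypothetical stand-ins for matched Scopus and KAKEN records, not data from the study.

```python
# Sketch of the recall/precision evaluation (illustrative data): compare the
# papers grouped under one Scopus Author ID against the "correct" paper set
# for the same researcher taken from a ground-truth source such as KAKEN.
def recall_precision(predicted, truth):
    true_positives = len(predicted & truth)
    recall = true_positives / len(truth)         # share of the researcher's
                                                 # papers the Author ID found
    precision = true_positives / len(predicted)  # share of grouped papers
                                                 # that truly belong to them
    return recall, precision

scopus_author_id_papers = {"p1", "p2", "p3", "p4"}  # hypothetical records
kaken_researcher_papers = {"p1", "p2", "p3", "p5"}

recall, precision = recall_precision(scopus_author_id_papers, kaken_researcher_papers)
print(recall, precision)  # 0.75 0.75
```

Here three of the four grouped papers are correct and one of the researcher's true papers is missed, giving recall and precision of 0.75 each; the study reports values around 0.98 and 0.99 at national scale.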
We hope that academic researchers outside Japan will also evaluate the accuracy of the Scopus Author ID as a tool to uniquely identify individual researchers. In this paper we present and discuss the results of six enquiries into the first author's academic writing over the last 50 years. Our aim is to assess whether or not his academic writing style has changed with age, experience, and cognitive decline. The results of these studies suggest that the readability of textbook chapters written by Hartley has remained fairly stable for over 50 years, with the later chapters becoming easier to read. The format of the titles used for chapters and papers has also remained much the same, with an increase in the use of titles written in the form of questions. It also appears that the format of the chosen titles had no effect on citation rates, but that the papers that obtained the highest citation rates were written with colleagues rather than by Hartley alone. Finally, it is observed that Hartley's publication rate has remained much the same for over 50 years, but that this has been achieved at the expense of other academic activities. Bibliometrics is a relatively young and rapidly evolving discipline. Essential for this discipline are bibliometric databases and their information content concerning scientific publications and relevant citations. Databases are unfortunately affected by errors, whose main consequence is omitted citations, i.e., citations that should be ascribed to a certain (cited) paper but, for some reason, are lost. This paper studies the impact of omitted citations on the bibliometric statistics of the major Manufacturing journals. The methodology adopted is based on a recent automated algorithm, introduced in Franceschini et al. (J Am Soc Inf Sci Technol 64(10):2149-2156, 2013), which is applied to the Web of Science (WoS) and Scopus databases. 
Two important results of this analysis are that: (i) on average, the omitted-citation rate (p) of WoS is slightly higher than that of Scopus; and (ii) for both databases, p values do not change drastically from journal to journal and tend to decrease slightly with respect to the issue year of citing papers. Although it would seem that omitted citations do not represent a substantial problem, they may significantly affect indicators based on citation statistics. This paper analyses the effect of omitted citations on popular bibliometric indicators such as the average citations per paper and its most famous variant, the ISI Impact Factor, showing that journal classifications based on these indicators may lead to questionable discriminations. Alternative metrics are currently one of the most popular research topics in scientometric research. This paper provides an overview of research into three of the most important altmetrics: microblogging (Twitter), online reference managers (Mendeley and CiteULike) and blogging. The literature is discussed in relation to the possible use of altmetrics in research evaluation. Since research has been particularly interested in the correlation between altmetric counts and citation counts, this overview focuses on that correlation. For each altmetric, a meta-analysis is calculated for its correlation with traditional citation counts. As the results of the meta-analyses show, the correlation with traditional citations is negligible for microblogging counts (pooled r = 0.003), small for blog counts (pooled r = 0.12) and medium to large for bookmark counts from online reference managers (CiteULike pooled r = 0.23; Mendeley pooled r = 0.51). In this global information age, accessing, disseminating, and controlling information is an increasingly important aspect of human life. Often, these interests are expressed in the language of human rights: for example, rights to expression, privacy, and intellectual property. 
As the discipline concerned with "facilitating the effective communication of desired information between human generator and human user" (Belkin, 1975, p. 22), library and information science (LIS) has a central role in facilitating communication about human rights and ensuring respect for human rights in information services and systems. This paper surveys the literature at the intersection of LIS and human rights. To begin, an overview of human rights conventions and an introduction to human rights theory are provided. Then the intersections between LIS and human rights are considered. Three central areas of informational human rights (communication, privacy, and intellectual property) are discussed in detail. It is argued that communication rights in particular serve as a central linchpin in the system of human rights. This article provides the first historical analysis of the relationship between collaboration and scientific impact using three indicators of collaboration (number of authors, number of addresses, and number of countries) derived from articles published between 1900 and 2011. The results demonstrate that an increase in the number of authors leads to an increase in impact, from the beginning of the last century onward, and that this is not due simply to self-citations. A similar trend is also observed for the number of addresses and number of countries represented in the byline of an article. However, the constant inflation of collaboration since 1900 has resulted in diminishing citation returns: larger and more diverse (in terms of institutional and country affiliation) teams are necessary to realize higher impact. The article concludes with a discussion of the potential causes of the impact gain in citations of collaborative papers. 
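The kind of aggregation behind the collaboration-impact trend described above, grouping articles by team size and comparing mean citations, can be sketched with synthetic data; the article counts and citation values below are invented purely to illustrate diminishing returns, not taken from the study.

```python
# Illustrative sketch (synthetic data): group articles by number of authors
# and compare mean citations per group.
from statistics import mean
from collections import defaultdict

# (number_of_authors, citations) -- invented values for illustration only
articles = [(1, 3), (1, 5), (2, 8), (2, 10), (3, 11), (3, 13), (6, 15), (6, 16)]

by_team_size = defaultdict(list)
for n_authors, citations in articles:
    by_team_size[n_authors].append(citations)

mean_citations = {n: mean(c) for n, c in sorted(by_team_size.items())}
print(mean_citations)  # mean impact rises with team size...

# ...but the marginal gain per added author shrinks (diminishing returns)
sizes = sorted(mean_citations)
gains = [
    (mean_citations[b] - mean_citations[a]) / (b - a)
    for a, b in zip(sizes, sizes[1:])
]
print(gains)
```

In this toy dataset mean citations grow with every added author, yet the per-author gain falls at each step, mirroring the diminishing citation returns the article reports.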
This is a publisher ranking study based on a citation data grant from Elsevier, specifically, book titles cited in Scopus history journals (2007-2011) and matching metadata from WorldCat® (i.e., OCLC numbers, ISBN codes, publisher records, and library holding counts). Using both resources, we have created a unique relational database designed to compare citation counts to books with their international library holdings, or "libcitations", for scholarly book publishers. First, we construct a ranking of the top 500 publishers and explore descriptive statistics at the level of publisher type (university, commercial, other) and country of origin. We then identify the top 50 university presses and commercial houses based on total citations and mean citations per book (CPB). In a third analysis, we present a map of directed citation links between journals and book publishers. American and British presses/publishing houses tend to dominate the work of library collection managers and citing scholars; however, a number of specialist publishers from Europe are included. Distinct clusters in the directed citation map indicate a certain degree of regionalism and subject specialization, where some journals produced in languages other than English tend to cite books published by the same parent press. Bibliometric rankings convey only a small part of how the actual structure of the publishing field has evolved; hence, challenges lie ahead for developers of new citation indices for books and for bibliometricians interested in measuring book and publisher impacts. Thesauri and other types of controlled vocabularies are increasingly re-engineered into ontologies described using the Web Ontology Language (OWL), particularly in the life sciences. This has led some to perceive thesauri as ontologies once they are described using the syntax of OWL, while others have emphasized the need to re-engineer a vocabulary to use it as an ontology. 
This confusion is rooted in different perceptions of what ontologies are and how they differ from other types of vocabularies. In this article, we rigorously examine the structural differences and similarities between thesauri and meaning-defining ontologies described in OWL. Specifically, we conduct (a) a conceptual comparison of thesauri and ontologies, and (b) a comparison of a specific thesaurus and a specific ontology in the same subject field. Our results show that thesauri and ontologies need to be treated as 2 orthogonal kinds of models with superficially similar structures. An ontology is not a good thesaurus, nor is a thesaurus a good ontology. A thesaurus requires significant structural and other content changes to become an ontology, and vice versa. Much of the literature of information science and knowledge organization has accepted and built upon Elaine Svenonius's (2004) claim that paradigmatic relationships are those that are "context-free, definitional, and true in all possible worlds" (p. 583). At the same time, the literature demonstrates a common understanding that paradigmatic relations are the kinds of semantic relations used in thesauri and other knowledge organization systems (including equivalence relations, hierarchical relations, and associative relations). This understanding is problematic and harmful because it directs attention away from the empirical and contextual basis for knowledge-organizing systems. Whether A is a kind of X is certainly not context-free and definitional in the empirical sciences or in much everyday information. Semantic relations are theory-dependent and, in biology, for example, a scientific revolution has taken place in which many relations have changed following the new taxonomic paradigm named cladism. This biological example is not an exception, but the norm. Semantic relations, including paradigmatic relations, are not a priori but are dependent on subject knowledge, scientific findings, and paradigms. 
As long as information scientists and knowledge organizers isolate themselves from subject knowledge, knowledge organization cannot possibly progress. Calls for interdisciplinary collaboration have become increasingly common in the face of large-scale complex problems (including climate change, economic inequality, and education, among others); however, outcomes of such collaborations have been mixed, due, among other things, to the so-called translation problem in interdisciplinary research. This article presents a potential solution: an empirical approach to quantitatively measure both the degree and nature of differences among disciplinary tongues through the social and epistemic terms used (a research area we refer to as discourse epistemetrics), in a case study comparing dissertations in philosophy, psychology, and physics. Using a support-vector model of machine learning to classify disciplines based on relative frequencies of social and epistemic terms, we were able to markedly improve accuracy over a random selection baseline (distinguishing between disciplines with as high as 90% accuracy) as well as acquire sets of most indicative terms for each discipline by their relative presence or absence. These lists were then considered in light of findings of sociological and epistemological studies of disciplines and found to validate the approach's measure of social and epistemic disciplinary identities and contrasts. Based on the findings of our study, we conclude by considering the beneficiaries of research in this area, including bibliometricians, students, and science policy makers, among others, as well as laying out a research program that expands the number of disciplines, considers shifts in socio-epistemic identities over time and applies these methods to nonacademic epistemological communities (e.g., political groups). 
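The classification step in the discourse epistemetrics study above can be illustrated in miniature. The study used a support-vector model; the sketch below deliberately substitutes a much simpler nearest-centroid classifier over relative frequencies of a handful of invented epistemic terms, purely to show the shape of the approach (profiles of term frequencies separating disciplines), not the authors' implementation.

```python
# Simplified stand-in for the paper's SVM: classify texts by the relative
# frequencies of a small, invented vocabulary of epistemic terms, assigning
# each text to the nearest discipline centroid.
EPISTEMIC_TERMS = ["argue", "measure", "experiment", "prove", "observe"]

def term_profile(text):
    words = text.lower().split()
    return [words.count(t) / len(words) for t in EPISTEMIC_TERMS]

def centroid(profiles):
    return [sum(col) / len(col) for col in zip(*profiles)]

def classify(text, centroids):
    profile = term_profile(text)
    def dist(label):  # squared Euclidean distance to a discipline centroid
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))
    return min(centroids, key=dist)

# Tiny invented training snippets per "discipline"
training = {
    "philosophy": ["we argue that to prove a claim one must argue carefully",
                   "philosophers argue and prove little by experiment"],
    "physics": ["we measure the decay and observe the experiment outcome",
                "the experiment let us observe and measure the field"],
}
centroids = {label: centroid([term_profile(t) for t in texts])
             for label, texts in training.items()}

print(classify("we argue that the thesis holds", centroids))     # philosophy
print(classify("we observe and measure the sample", centroids))  # physics
```

Even this crude stand-in separates the two toy "disciplines", which is the intuition behind the paper's far stronger result that social and epistemic term frequencies distinguish real dissertations with up to 90% accuracy.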
This longitudinal study examined the search behavior of 10 students as they completed assigned exercises for an online professional course in expert searching. The research objective was to identify, describe, and hypothesize about features of the behavior that are indicative of procedural knowledge gained during guided instruction. Log data of search interactions were coded using a conceptual framework focused on components of search practice hypothesized to organize an expert searcher's attention during search. The coded data were analyzed using a measure of pointwise mutual information and state-transition analysis. Results of the study provide important insight for future investigation of domain-independent search expertise and for the design of systems that assist searchers in gaining expertise. Many open access journals have a reputation for being of low quality and being dishonest with regard to peer review and publishing costs. Such journals are labeled predatory journals. This study examines author profiles for some of these predatory journals as well as for groups of more well-recognized open access journals. We collect and analyze the publication record, citation count, and geographic location of authors from the various groups of journals. Statistical analyses verify that each group of journals has a distinct author population. Those who publish in predatory journals are, for the most part, young and inexperienced researchers from developing countries. We believe that economic and sociocultural conditions in these developing countries have contributed to the differences found in authorship between predatory and nonpredatory journals. This research examines interactions among members of an online breast cancer community, focusing on how information and social support were exchanged, how these exchanges influenced health decisions, and how the community was integrated into participants' everyday lives. 
This article is the result of a 2-year ethnography comprising online archives analysis, participant observation, and 31 interviews. In the course of the research, the findings revealed that not only did participants exchange valuable information and helpful social support, but there was often little separation between the two, with each overlaying the other throughout most interactions. Expressions of support permeated many informational messages and at the same time served as information to participants. This article argues that social support and information were inextricably connected within participant interactions and that social support is, itself, a form of information that impacts actions and emotional experiences, contributing to participants being able to make sense of their experiences and to move forward both physically and emotionally. This research builds on work in information science that looks at the ways in which people exchange information in informal environments and extends that research by drawing on conceptualizations of social support to exhibit the connections between social support and information. Characterization of waste recycling (WR) research has to start by defining the scope of this scientific area. Previous works and expert assessment recommend the adoption of an inclusive definition, with the aim of including all relevant uses of waste within this study. An ad hoc capture strategy has been designed and used to retrieve WR-related peer-reviewed journal papers from selected databases, and the information contained in their author keyword field has been thoroughly cleaned. Author keyword co-occurrence data have been used to build a similarity measure between keywords, combined with cluster analysis to reveal the main WR research being addressed by the scientific community.
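The keyword co-occurrence similarity underlying the WR mapping above can be sketched roughly as follows. The papers, the keywords, and the choice of association strength as the similarity measure are illustrative assumptions, not the study's exact method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists from WR-related papers.
papers = [
    ["compost", "soil", "organic waste"],
    ["compost", "soil", "fertilizer"],
    ["wastewater", "water reuse", "irrigation"],
    ["wastewater", "water reuse", "treatment"],
    ["compost", "organic waste"],
]

# Keyword frequencies and pairwise co-occurrence counts.
freq = Counter(k for kws in papers for k in kws)
co = Counter()
for kws in papers:
    for a, b in combinations(sorted(kws), 2):
        co[(a, b)] += 1

def assoc_strength(a, b):
    """Association strength: co-occurrences normalized by the product of frequencies."""
    a, b = sorted((a, b))
    return co[(a, b)] / (freq[a] * freq[b])

# Keywords that routinely appear together score higher and would cluster together.
print(assoc_strength("compost", "soil") > assoc_strength("compost", "wastewater"))
```

Clustering would then group keywords whose pairwise similarities are high, yielding the research areas described above.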
Results have been further analyzed using advanced visualization tools to determine which clusters formed strongly linked research areas that could set the main cognitive divisions of WR science. This process has been repeated with 2002 and 2012 data, and science maps reflecting the main research areas and clusters have been generated. Results show that WR mainly deals with the recovery of basic, widely used raw materials like water and fertile soil. Energy generation and waste management are other relevant fields that show an interesting evolution, revealing signs of growth in research, together with the emergence of sub-areas reflecting consolidating research specialties. We analyze Twitter as a potential alternative source of external links for webometric analysis, given its capacity to embed hyperlinks in tweets. Given the limitations on searching Twitter's public application programming interface (API), we used the Topsy search engine as a source for compiling tweets. To this end, we took a global sample of 200 universities and compiled all the tweets with hyperlinks to any of these institutions. Further link data were obtained from alternative sources (MajesticSEO and OpenSiteExplorer) in order to compare the results. Thereafter, various statistical tests were performed to determine the correlation between the indicators and the possibility of predicting external links from the collected tweets. The results indicate a high volume of tweets, although they are skewed by the performance of specific universities and countries. The data provided by Topsy correlated significantly with all link indicators, particularly with OpenSiteExplorer (r=0.769). Finally, prediction models do not provide optimum results because of high error rates.
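The correlation analysis reported above (e.g., r=0.769 against OpenSiteExplorer) rests on the Pearson coefficient, which can be computed directly. The per-university link counts below are invented for illustration.

```python
import math

# Hypothetical counts for a handful of universities: hyperlinks found in tweets
# (via Topsy) versus external links reported by a link-intelligence source.
tweet_links = [120, 45, 300, 80, 15, 210]
site_links  = [900, 400, 2100, 700, 100, 1600]

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(tweet_links, site_links)
print(round(r, 3))
```

A high r on such counts would support using tweet-embedded links as a proxy, though, as noted above, a strong correlation does not guarantee low prediction error for individual institutions.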
We conclude that the use of Twitter (via Topsy) as a source of hyperlinks to universities produces promising results due to its high correlation with link indicators, though limited by policies and culture regarding use and presence in social networks. Serendipity is not an easy word to define. Its meaning has been stretched to apply to experiences ranging from the mundane to the exceptional. Serendipity, however, is consistently associated with unexpected and positive personal, scholarly, scientific, organizational, and societal events and discoveries. Because diverse serendipitous experiences share a conceptual space, what lessons can we draw from an exploration of how serendipity unfolds and what may influence it? This article describes an investigation of work-related serendipity. Twelve professionals and academics from a variety of fields were interviewed. The core of the semi-structured interviews focused on participants' own work-related experiences that could be recalled and discussed in depth. This research validated and augmented prior research while consolidating previous models of serendipity into a single model of the process of serendipity, consisting of Trigger, Connection, Follow-up, and Valuable Outcome, plus an Unexpected Thread that runs through 1 or more of the first 4 elements. Together, the elements influence the Perception of Serendipity. Furthermore, this research identified factors relating to the individual and their environment that may facilitate the main elements of serendipity and further influence its perception. Much of the recent research on user contributions to electronic networks has focused on attracting and motivating participation. We, instead, investigate online community defections, their causes, and their impact through an empirical investigation of defections from Wikipedia.
Our research uses justice theory to determine the effects of injustice perceptions on contributor defections and draws on fairness heuristic theory to distinguish the relative effects of distributive injustice and procedural injustice. The results show that perceptions of injustice concerning collaboration outcomes (distributive injustice) raise contributor dissatisfaction, which in turn leads to defection. Perceptions of injustice concerning the process (procedural injustice), by comparison, have a direct impact on defections and exert a stronger influence on dissatisfaction than distributive injustice. Justice emerges as a hygiene factor adding to the level of dissatisfaction among dissatisfied contributors but not among satisfied contributors. The findings contribute to our understanding of collaborative knowledge creation by drawing attention to contributors' post hoc passive emotions and behaviors instead of predominantly investigating their initial prosharing behaviors. The work also has practical implications for community governance because it suggests how to sustain communities in the long term. This study investigates the reasons and factors associated with individuals' intentions to upload content on Wikipedia with a focus on the role of ego involvement and social norms. The study also compares the associations between the factors and uploading intentions of two different cultures, the United States and South Korea. Using data from surveys of college students from the two countries (n=249 and 173), structural equation modeling analyses revealed that ego involvement not only plays an essential role in explaining uploading intention but also functions as an antecedent of attitude and perceived behavioral control in both countries. However, the subjective norm was not significantly associated with uploading intention in either country, and the descriptive norm was a marginally significant factor in accounting for uploading intention in the United States. 
Moreover, it was found that attitude was significantly associated with uploading intention only in the United States, whereas perceived behavioral control was a significant factor in explaining uploading intention in both countries. These findings suggest that the effects of cultural differences, the impact of subjective norms in particular, may be weakened with the more self-oriented nature of Web 2.0 applications and social media such as Wikipedia. Theoretical implications are discussed. The BRICS countries (Brazil, Russia, India, China, and South Africa) are notable for their increasing participation in science and technology. The governments of these countries have been boosting their investments in research and development to become part of the group of nations doing research at a world-class level. This study investigates the development of the BRICS countries in the domain of top-cited papers (top 10% and 1% most frequently cited papers) between 1990 and 2010. To assess the extent to which these countries have become important players at the top level, we compare the BRICS countries with the top-performing countries worldwide. As the analyses of the (annual) growth rates show, with the exception of Russia, the BRICS countries have increased their output in terms of most frequently cited papers at a higher rate than the top-cited countries worldwide. By way of additional analysis, we generate coauthorship networks among authors of highly cited papers for 4 time points (1995, 2000, 2005, and 2010) to view changes in BRICS participation. Here, the results show that all BRICS countries succeeded in becoming part of this network, whereby Chinese collaboration activities focus on the US. This paper describes the population evolution of a scientific information web service during 2011-2012.
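The (annual) growth-rate comparison described above for top-cited output can be sketched as a compound annual growth rate over the study window. The paper counts below are invented for illustration, not the study's data.

```python
# Sketch of an annual growth-rate comparison: compound annual growth rate (CAGR)
# of top-cited paper counts per country. All figures are hypothetical.
counts_1990 = {"China": 50, "Brazil": 30, "World top": 5000}
counts_2010 = {"China": 2400, "Brazil": 300, "World top": 15000}

def cagr(first, last, years):
    """Compound annual growth rate between two counts over a span of years."""
    return (last / first) ** (1.0 / years) - 1.0

growth = {c: cagr(counts_1990[c], counts_2010[c], 20) for c in counts_1990}
for country, g in sorted(growth.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {g:.1%} per year")
```

On such numbers a country with a small base can show a far higher growth rate than the world's top producers even while its absolute output remains lower, which is the pattern reported above.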
Quarterly samples from December 2011 to December 2012 were extracted from Google Scholar Citations to analyse the number of members, the distribution of their bibliometric indicators, positions, institutional and country affiliations, and the labels used to describe their scientific activity. Results show that most of the users are young researchers at the start of their scientific careers, mainly from disciplines related to information sciences and technologies. Another important result is that this service grows in waves emanating from specific institutions and countries. This work concludes that this academic social network presents biases in its population distribution that make it unrepresentative of the real scientific population. In this paper, we assess whether quality survives the test of time in academia by comparing up to 80 years of academic journal article citations from two top journals, Econometrica and the American Economic Review. The research setting under analysis is analogous to a controlled real-world experiment in that it involves a homogeneous task (trying to publish in top journals) by individuals with a homogeneous job profile (academics) in a specific research environment (economics and econometrics). Comparing articles published concurrently in the same outlet at the same time (same issue) indicates that symbolic capital or power due to institutional affiliation or connection does seem to boost citation success at the beginning, giving those educated at or affiliated with leading universities an initial comparative advantage. Such advantage, however, does not hold in the long run: at a later stage, the publications of other researchers become as or even more successful. The many publicly available bibliographic repositories on the web provide great opportunities to study the scientific behaviors of scholars.
This paper aims to study the way we collaborate, model the dynamics of collaborations, and predict future collaborations among authors. We investigate the collaborations in three disciplines, physics, computer science, and information science, and different kinds of features that may influence the creation of collaborations. Path-based features are found to be particularly useful in predicting collaborations. Moreover, the combination of path-based and attribute-based features achieves almost the same performance as the combination of all features considered. Inspired by the findings, we propose an agent-based model to simulate the dynamics of collaborations. The model merges the ideas of network structure and node attributes by leveraging a random walk mechanism and interest similarity. Empirical results show that the model could reproduce a number of realistic and critical network statistics and patterns. We further apply the model to predict collaborations in an unsupervised manner and compare it with several state-of-the-art approaches. The proposed model achieves the best predictive performance compared with the random baseline and other approaches. The results suggest that both network structure and node attributes may play an important role in shaping the evolution of collaboration networks. The study establishes three synthetic indicators derived from academic traces (assignee traces T1, T2, and ST) and investigates their application in evaluating the technological performance of assignees. Patent data for the top 100 assignees in three fields, "Computer Hardware & Software", "Motors, Engines & Parts", and "Drugs & Medical", were retrieved from USPTO for further analysis. The results reveal that traces are indeed valid and useful indicators for measuring technological performance and providing detailed technical information about assignees and the industry.
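The path-based and structural features discussed above for collaboration prediction can be sketched with plain breadth-first search over a toy co-authorship graph; the graph and author labels are invented.

```python
from collections import deque

# Toy co-authorship graph (adjacency sets), a hypothetical stand-in for the
# collaboration networks described above.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def common_neighbors(u, v):
    """Structural feature: number of shared collaborators."""
    return len(graph[u] & graph[v])

def shortest_path_length(u, v):
    """Path-based feature: BFS distance between two authors."""
    seen, queue = {u}, deque([(u, 0)])
    while queue:
        node, d = queue.popleft()
        if node == v:
            return d
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

# Candidate pair (A, D): two hops apart with one shared collaborator, so a
# path-based score would rank it above the more distant pair (A, E).
print(common_neighbors("A", "D"), shortest_path_length("A", "D"), shortest_path_length("A", "E"))
```

A supervised predictor would feed such features, possibly alongside node attributes like shared interests, into a classifier; the agent-based model above instead uses random walks biased by interest similarity.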
In addition, we investigate the relationship between traces and three other indicators: patent citation counts, Current Impact Index, and patent h-index. In comparison with the three other indicators, traces demonstrate unique advantages and can be a good complement to patent citation analysis. The "Social Gestalt" model is a new parametric model visualizing 3-D graphs, using animation to show these graphs from different points of view. A visible 3-D graph image is the pattern that emerges, through self-organization, at the macro level of a system of co-authorships. Well-ordered 3-D computer graphs are totally rotatable and their shapes are visible from all possible points of view. The objectives of this paper are the description of several methods for three-dimensional modelling and animation and the application of these methods to two co-authorship networks selected for demonstration of varying 3-D graph images. This application of 3-D graph modelling and animation shows, for both the journal "NATURE" and the journal "Psychology of Women Quarterly", that at any time and independently of the manifold visible results of rotation, the empirical values nearly exactly match the theoretical distributions (called "Social Gestalts") obtained by regression analysis. In addition, the difference in shape between the 3-D graphs of "NATURE" and "Psychology of Women Quarterly" is explained. Organizational socialization is gaining momentum in business research, and statistical data show the importance of this topic for practitioners as well. In this study, the vast organizational socialization literature published over the past three decades is analysed using bibliometric methods in order to explore the scope of the field, detect current research priorities, and identify the most prominent papers and authors.
We identify thematically related clusters of research and show how the organizational socialization field has evolved through interconnected, yet distinct, subfields. Specifically, three distinct aspects have been emphasized at different time periods: (1) the organizational socialization tactics view in the 1980s; (2) newcomer proactivity, information seeking, and the uncertainty reduction process in the 1990s; and (3) a person-by-situation approach in the last decade, which is a mix of both. The implications for future organizational socialization research are presented and discussed. An extensive body of research indicated that the USA and China were the two largest producers in the nanoscience and nanotechnology field, and that, while China performed better than the USA in terms of quantity, it produced publications of lower quality. Yet no studies have investigated whether specific institutions are consistent with these conclusions. In this study, we identify two institutions (the National Center for Nanoscience and Technology (NCNST) from China and the University of California Los Angeles-California Nanosystems Institute (CNSI) from the USA) and compare their scientific research. Further, we develop and exploit a novel and updated dataset on paper co-authorship to assess their scientific research. Our analysis reveals that NCNST has many advantages with regard to author and paper quantities, growth rate, and the strength of collaborations, but loses its lead with respect to research quality. We do find that the collaboration networks of both NCNST and CNSI have small-world and scale-free properties. In addition, the analysis of knowledge networks shows that they have similar research interests or hotspots. Using statistical models, we test and discover that degree centrality has a significant inverted-U-shaped effect on scientific output and influence. However, we fail to find any significant effect of structural holes.
The growing influence of the idea of world-class universities and the associated phenomenon of international academic rankings are intriguing issues for contemporary comparative analyses of higher education. Although the Academic Ranking of World Universities (ARWU or the Shanghai ranking) was originally devised to assess the gap between Chinese universities and world-class universities, it has since been credited with roles in stimulating higher education change on many scales, from increasing the labor value of individual high-performing scholars to wholesale renovation of national university systems including mergers. This paper examines the response of the ARWU indicators and rankings to institutional mergers in general, and specifically analyses the universities of France that are engaged in a major amalgamation process motivated in part by a desire for higher international rankings. Zipf's law has intrigued people for a long time. This distribution models a certain type of statistical regularity observed in a text. George K. Zipf showed that, if a word is characterised by its frequency, then rank and frequency are not independent and approximately verify the relationship: rank × frequency ≈ constant. Various explanations have been advanced to explain this law. In this article, we discuss the Mandelbrot process, which comprises two very different approaches. In the first approach, Mandelbrot studies language generation as the transmission of a signal and bases it on information theory, using the entropy concept. In the second, geometric approach, he draws a parallel with fractal theory, where each word of the text is a sequence of characters framed by two separators, that is, a simple geometric pattern. This leads us to hypothesise that, since the statistical regularities observed have several possible explanations, Zipf's law carries other patterns.
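The rank-frequency relationship above can be checked numerically on any text: sort word frequencies in descending order, then fit the exponent by least squares in log-log space. The toy text below is illustrative only; a real analysis would use a full document.

```python
import math
from collections import Counter

# A toy check of Zipf's rank-frequency relationship: fit the exponent beta in
# rank**beta * frequency ~ constant by least squares on log rank vs log frequency.
text = ("the quick brown fox jumps over the lazy dog the fox the dog "
        "the the fox dog quick over")

freqs = sorted(Counter(text.split()).values(), reverse=True)
ranks = range(1, len(freqs) + 1)

# log f = log C - beta * log r  ->  the ordinary least-squares slope gives -beta.
xs = [math.log(r) for r in ranks]
ys = [math.log(f) for f in freqs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
beta = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(beta, 2))
```

For natural text the fitted exponent is typically near 1; under the generalized law discussed in this article, progressively degrading the text would be expected to push the fitted exponent above 1.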
To verify this hypothesis, we chose a text, which we modified and degraded in several successive stages. We called T_i the text degraded at step i. We then segmented T_i into words. We found that rank and frequency were not independent and approximately verified the relationship: rank^β_i × frequency ≈ constant, with β_i > 1 (1). The coefficient β_i increases with each step i. We call Eq. (1) the generalized Zipf law. We found statistical regularities in the deconstruction of the text. We notably observed a linear relationship between the entropy H_i and the amount of effort E_i of the various degraded texts T_i. To verify our assumptions, we degraded a text of approximately 200 pages. At each step, we calculated various parameters such as the entropy, the amount of effort, and the coefficient β_i. We observed an inter-textual relationship between entropy and the amount of effort. This paper therefore provides a proof of this relationship. Monographs and edited books are important in scholarly communication, especially in the Social Sciences and Humanities. An edited book is a collection of chapters written by different authors, gathered and harmonized by one or more editors. This article analyses the characteristics and collaboration patterns of edited books in the Social Sciences and Humanities as practiced in Flanders, the northern, Dutch-speaking part of Belgium. It is based upon a comprehensive set of 753 peer-reviewed edited books, of which at least one of the editors has a Flemish university affiliation, and the 12,913 chapters published therein. The article analyses various characteristics of edited books, i.e.
the distribution over publishers, the places of publication, language use, the presence of introductions and conclusions, the occurrence of co-editorship and co-authorship, and the number of unique authors and book chapters per volume. Almost half of the edited books are published by about 5% of the publishers. English is the dominant publication language for all places of publication. Writing a conclusion seems rather uncommon. All in all, about 90% of all volumes are co-edited. Edited books in the Social Sciences have a more diverse authorship than edited books in the Humanities. In general, the more co-authorship of articles occurs within a discipline, the more co-authorship occurs for book chapters, whereas the number of editors is independent of this trend. Recommendation systems have drawn an increasingly broad range of interest since the early 1990s. Recently, a search for the query "recommendation systems" on Google Scholar found over 32,000 documents. As the volume of the literature grows rapidly, a systematic review of the diverse research field and its current challenges becomes essential. This study surveys the literature of recommendation systems between 1992 and 2014. The overall structure of its intellectual landscape is illustrated in terms of thematic concentrations of co-cited references and emerging trends of bursting keywords and citations to references. Our review is based on two sets of bibliographic records retrieved from the Web of Science. The core dataset, obtained through a topic search, contains 2573 original research and review articles. The expanded dataset, consisting of 12,916 articles and reviews, was collected by citation expansion. We identified intellectual landscapes, landmark articles and bursting keywords of the domain in core and broader perspectives.
We found that a number of landmark studies in the 1980s and 1990s and techniques such as LDA, pLSI, and matrix factorization have tremendously influenced the development of recommendation systems research. Furthermore, our study reveals that the field of recommendation systems is still evolving and developing. Thematic trends in recommendation systems research reflect the development of a wide variety of information systems such as the World Wide Web and social media. Finally, collaborative filtering has been a dominant research concept in the field. Recent emerging topics focus on enhancing the effectiveness of recommendation systems by addressing diverse challenges. South Korea's innovation system has been transformed in tandem with rapid economic growth over the last three decades. In order to explore the evolution process of the innovation system in Korea, this study examines the trends and patterns in collaboration activities among the triple helix actors of university, industry, and government (UIG), using co-patent data. The triple helix framework is employed to analyze innovation dynamics within the networks of the bi- and trilateral relations embedded in patent collaborations. The analyses focus on how the triple helix dynamics have been shaped and transformed in the course of development of the innovation system. The results reveal that collaboration activities among UIG largely increased across three developmental phases from 1980 to 2012. In the early periods, strategic R&D alliances between the industry and government sectors were set up to strengthen enterprises' innovation capabilities. When the large Korean conglomerates (the chaebols) became a dominant driver of domestic innovation activities, the primary agents of the collaborations shifted from industry-government to industry-university.
The network analysis shows that university-industry collaboration has been the strongest within the triple helix in recent years, followed by industry-government relations and then UIG relations. The tripartite collaboration has emerged with the rise of entrepreneurial universities, but its network has remained rather weak and inactive. While Korea has experienced a transition from a statist model to a triple helix, the full-fledged triple helix model has not yet been established. The use of quantitative performance measures to evaluate the productivity, impact and quality of research has spread to almost all parts of public R&D systems, including Big Science, where traditional measures of technical reliability of instruments and user oversubscription have been joined by publication counts to assess scientific productivity. But such performance assessment has been shown to lead to absurdities, as the calculated average cost of single journal publications may easily reach hundreds of millions of dollars. In this article, the issue of productivity and impact is therefore further qualified by the use of additional measures such as the immediacy index as well as network analysis to evaluate qualitative aspects of the impact of contemporary Big Science labs. Connecting to previous work within what has been called "facilitymetrics", the article continues the search for relevant bibliometric measures of the performance of Big Science labs through a case study of a recently opened facility advertised as contributing to "breakthrough" research, using several additional measures and thus qualifying the topic of performance evaluation in contemporary Big Science beyond simple counts of publications, citations, and costs. Topic-based ranking of authors, papers and journals can serve as a vital tool for identifying authorities on a given topic within a particular domain. Existing methods that measure topic-based scholarly output are limited to homogeneous networks.
This study proposes a new informative metric called Topic-based Heterogeneous Rank (TH Rank), which measures the impact of a scholarly entity with respect to a given topic in a heterogeneous scholarly network containing authors, papers and journals. TH Rank calculates topic-dependent ranks for authors by considering the combined impact of the multiple factors that contribute to an author's level of prestige. Information retrieval serves as the test field, and articles about information retrieval published between 1956 and 2014 were extracted from the Web of Science. Initial results show that TH Rank can effectively identify the most prestigious authors, papers and journals related to a specific topic. The objective of this research is to examine the dynamic impact and diffusion patterns at the subfield level. Using a 15-year citation data set, this research reveals the characteristics of the subfields of computer science in terms of citation characteristics, citation link characteristics, network characteristics, and their dynamics. Through a set of indicators including incoming citations, number of citing areas, cited/citing ratios, self-citation ratios, PageRank, and betweenness centrality, the study finds that subfields such as Computer Science Applications, Software, Artificial Intelligence, and Information Systems possessed higher scientific trading impact. Moreover, it finds that Human-Computer Interaction, Computational Theory and Mathematics, and Computer Science Applications are among the subfields of computer science that gained the fastest growth in impact. Additionally, Engineering, Mathematics, and Decision Sciences form important knowledge channels with subfields in computer science. This paper provides a formal study on manuscript quality control in peer review. Within this analysis, a biased editor is defined operationally as an editor who exerts a higher (lower) level of quality control.
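The PageRank indicator used in the subfield analysis above can be sketched with a short power-iteration loop. The subfield citation graph below is invented, and this plain PageRank omits the topic weighting and heterogeneous node types that TH Rank adds.

```python
# Minimal PageRank by power iteration over a toy directed citation graph among
# invented "subfield" nodes; links[v] lists the nodes that v cites.
links = {
    "AI":       ["Software", "InfoSys"],
    "Software": ["AI"],
    "InfoSys":  ["AI", "Software"],
    "HCI":      ["AI", "InfoSys"],
}
nodes = list(links)
d, n = 0.85, len(nodes)          # damping factor and node count
rank = {v: 1.0 / n for v in nodes}

for _ in range(50):              # iterate until practically converged
    new = {v: (1 - d) / n for v in nodes}
    for v, outs in links.items():
        share = rank[v] / len(outs)
        for w in outs:
            new[w] += d * share  # each node passes its rank to cited nodes
    rank = new

best = max(rank, key=rank.get)
print(best, round(rank[best], 3))
```

In this toy graph the heavily cited node accumulates the highest rank, mirroring how subfields that receive many citation links score as high-impact "traders" above.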
Here we show that, if the editor is more biased than the manuscript's author, then the author undertakes the type of revision that the editor prefers instead of following his or her own opinion. Moreover, authors with a strong belief about the required level of quality control will be very motivated under editors who agree with them. By contrast, when authors do not undertake the revision type that the editor prefers, they will be very demotivated under editors who exert a different level of quality control, and more so as the associate editor is more biased. The effects of editors' bias on authors' satisfaction and motivation cause sorting in the authors who submit manuscripts to scholarly journals, and therefore match authors and journals with similar quality standards. This sorting decreases the demotivating effect that editors' bias has on some authors, so that bias becomes more effective at the peer review stage. Moreover, some journals will be forced to lower their quality standards in order to compete with journals of more biased editors. This paper also shows that, under fairly weak conditions, it is optimal for the Editor-in-Chief to assign manuscripts to an editor who exerts a level of quality control higher than the journal's standard, against the competing journal whose editor holds the journal's standard. The features of science and technology (S&T) systems change over time. Simultaneously, at an individual level, the characteristics of actors in these systems change concomitantly. In this study, the characteristics of doctorates in a changing S&T system are analyzed. This is performed by a series of cluster analyses on doctorates, with the goal of identifying shifting profiles, in strategic periods spanning three decades that represent milestones in an evolving S&T system.
A series of archetypal doctorate profiles is identified, including changes in the relative weights of each, along with a pattern of alternating convergence and divergence over time in the characteristics of these doctorates. Robotics technology holds significant promise for improving industrial automation and production lines, operating complex surgical procedures, performing space and security missions, and providing services to assist, educate and entertain humans. The emphasis of this paper is primarily on the scientific developments of robotics systems of innovation from a global perspective, identifying actors and institutions involved in developing and diffusing this innovative technology. This quantitative research is grounded in the tech mining research method, which combines content analysis, bibliometrics and text mining. The analysis measures the scientific performance of individual countries based on robotics-related scientific publications from the INSPEC database over the period 1995-2009. It discusses the role of academia, governmental institutions and firms in robotics scientific activities and further identifies the most prolific institutions involved in robotics research. The cross-country analysis sheds light on the evolution of robotics publication activities over time and reveals the relative technological specialization of individual countries in specific domains of robotics technology through the use of revealed technological advantage indices. The findings are particularly useful for science and technology policy makers and R&D strategists, presenting strengths and weaknesses of robotics innovation systems and existing and future scientific developments of robotics technology. A perennial challenge for basic research funding agencies is assessing the technological importance of their investments in the private sector.
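A revealed technological advantage (RTA) index of the kind used in the robotics analysis above compares a country's share of output in a domain with its overall share; values above 1 indicate relative specialization. A minimal sketch follows, with invented publication counts.

```python
# Revealed technological advantage: a country's share of world publications in a
# domain divided by its share of all publications. All counts are hypothetical.
pubs = {  # pubs[country][domain]
    "Japan":   {"industrial": 400, "service": 100},
    "USA":     {"industrial": 300, "service": 300},
    "Germany": {"industrial": 200, "service": 50},
}

def rta(country, domain):
    """RTA = (P_cd / P_c) / (P_d / P_total), where P_cd is the country's count
    in the domain, P_c its total, P_d the world's domain count, P_total the world total."""
    p_cd = pubs[country][domain]
    p_c = sum(pubs[country].values())
    p_d = sum(p[domain] for p in pubs.values())
    p_all = sum(sum(p.values()) for p in pubs.values())
    return (p_cd / p_c) / (p_d / p_all)

print(round(rta("Japan", "industrial"), 2), round(rta("USA", "service"), 2))
```

Here the invented data make Japan relatively specialized in industrial robotics and the USA in service robotics, the kind of contrast the cross-country analysis above reports.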
In large measure, this stems from difficulties in relating how private sector companies and technologies benefit from the major outputs of science research, such as papers, patents and conference proceedings. Here, we propose a data-mining procedure to assess the technological importance of patents supported by basic research funding beyond academic and public-sector entities. We applied this approach to patents partially funded by the Air Force Office of Scientific Research (AFOSR). Our procedure begins by identifying a large sample of AFOSR-funded patents and classifying their most recent patent assignees as listed on the US patent assignment database, where one can find records of patent rights being transferred between individuals or institutions. Next, the patents citing this sample of AFOSR-funded patents are mined and, again, we classify their associated assignees to estimate the downstream technological importance of basic research investments. Interestingly, while patents directly funded by AFOSR are modestly assigned to organizations in the private sector (roughly 20%), patents citing these AFOSR-funded patents are overwhelmingly assigned to the private sector (roughly 86%). Following data collection, we consider whether patterns emerging from assignee data of both AFOSR-funded patents and the patents citing AFOSR-funded patents provide insights into real-world examples of the impact of government-sponsored invention. As a case study, we investigated the most frequent assignee for patents citing our sample of AFOSR-funded patents: Digimarc Corporation. Examining the relationship between AFOSR-funded invention and Digimarc revealed that several highly cited patents were granted based on government-funded academic research in mathematics and signal processing. These patents became the kernel of a tech start-up founded by the inventors, Cognicity, which was later acquired by Digimarc.
These patents continue to contribute to the patent portfolio of this large technology service provider. We find that one can observe increased downstream effects of publicly funded research on the private sector. Previous researchers in citation analysis have often analyzed patent data from a single authority because of the availability of the data and the simplicity of analysis. Patent analysis, on the other hand, is used not only for filing and litigation, but also for technology trend analysis. However, global technology trends cannot be understood with the analysis of patent data issued by a single authority alone. In this paper, we propose the use of patents from multiple authorities and discuss the effect of bundling patent family information. We investigate the effect of patent families with cases from automobile drivetrain technology. Based on the results, we conclude that the use of multiple authorities' patent data bundled with patent family information can significantly improve the coverage and practicability of patent citation analysis. Scientific network analysis takes as input large amounts of bibliographical data that are often incomplete. This leads to the introduction of different measurement errors in the scientific networks, which, in turn, influence the results of scientific network analyses. Different authors have studied the effects of measurement error on the results of network analysis, but these studies mostly rely on data gathered by survey questionnaires or on the study of incomplete data that are modeled as random processes and emerge in unweighted undirected networks. This article aims at overcoming the limitations of these studies in three directions. First, we introduce measurement errors to network data following three well-known problems frequently present in bibliographic data: multiple authorship, homographs, and synonyms. 
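Bundling citations with patent family information, as proposed above, amounts to collapsing equivalent patents (the same invention granted by several authorities) into one family before counting; a minimal sketch (the family IDs and patent numbers are invented):

```python
def family_citation_count(citations, family_of):
    """Count distinct cited inventions after collapsing patents belonging to
    the same family; patents without a family entry count as singletons."""
    cited_families = {family_of.get(p, p) for p in citations}
    return len(cited_families)

# The same hypothetical drivetrain invention granted by three authorities.
family_of = {"US123": "F1", "EP456": "F1", "JP789": "F1", "US999": "F2"}
citations = ["US123", "EP456", "JP789", "US999", "KR555"]

raw = len(citations)                              # naive count: 5
bundled = family_citation_count(citations, family_of)  # family-level count: 3
```

Without bundling, the three grants of the same invention would be counted as three separate cited patents, inflating the citation count and fragmenting the citation graph across authorities.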
Second, we apply missing data mechanisms to the identified incomplete data sources in order to link the latter with the probability of their occurrence. Third, we apply the incomplete data sources to different types of scientific networks and study the effects of measurement error in both the weighted directed (i.e., citation) network and the weighted undirected (i.e., co-authorship) network. The results show that the most destructive incomplete data source is the problem of synonyms; it influences the accuracy and the robustness of the network structural measures the most. On the other hand, the multiple-authorship problem does not influence the results of network analysis at all. Despite much scholarly fascination with the question of whether great minds appear in cycles, together with some empirical evidence that historical cycles exist, prior studies mostly disregard the "great minds" hypothesis as it relates to scientists. Rather, researchers assume a linear relation based on the argument that science is allied with the development of technology. To probe this issue further, this study uses a ranking of over 5600 scientists based on the number of appearances in Google Books over a period of 200 years (1800-2000). The results point to several peak periods, particularly for scientists born in the 1850-1859, 1897-1906, or 1900-1909 periods, suggesting overall cycles of around 8 years and a positive trend in distinction that lasts around 100 years. Nevertheless, a non-parametric test to determine whether randomness can be rejected indicates that non-randomness is less apparent, although once we analyse the greatest minds overall, rejection is more likely. This paper investigates the scope and patterns of university-industry collaborations (UICs) in Chinese research-oriented universities. 
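The non-parametric randomness test applied to the great-minds series above is not named in the abstract; a common choice for this purpose is the Wald-Wolfowitz runs test, sketched here under that assumption:

```python
from statistics import median
import math

def runs_test_z(series):
    """Wald-Wolfowitz runs test: z-score for the number of runs of values
    above/below the median; |z| > 1.96 rejects randomness at the 5% level."""
    med = median(series)
    signs = [x > med for x in series if x != med]  # drop ties with the median
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n1, n2 = signs.count(True), signs.count(False)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(variance)

# A perfectly alternating (hence non-random-looking) toy series.
z = runs_test_z([1, 9, 1, 9, 1, 9, 1, 9, 1, 9])
```

A perfectly alternating series produces far more runs than chance would, so its z-score exceeds the 1.96 threshold and randomness is rejected; a genuinely random series would usually yield |z| below it.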
Based on publications in international journals co-authored by Chinese universities' academics and researchers from industry, and using bibliometric and latent cluster analysis, this study provides detailed results on the characteristics and clustering features from two aspects, namely diversified resources and academic influence. The results show that, although publications co-authored with industrial researchers account for only a small part of all publications of Chinese universities, the importance of cooperation with industry in academic research and scientific contribution has been strengthened in China. Meanwhile, the academic influence of co-authored publications is steadily improving, but is still in a stage of development in quantity rather than quality. The analyses demonstrate that there are significant differences between universities in the matching relationship between diversified resources and academic influence. Only a few UICs combine a high level of diversified resources with strong academic influence. Most UICs should attempt to maintain their diversified-resources advantage whilst also trying to enhance the academic influence of cooperation outcomes. In the scientific publication world, there is an increasing number of open access (OA) journals. Many OA journals are financed by the article processing charges (APCs) that they charge authors. There is considerable interest in the funding source of such APCs. In 255 health and life sciences OA journals that charge APCs (APC OA journals) and 183 health and life sciences OA journals that do not charge APCs (free OA journals), all indexed in the Thomson Reuters Web of Science, this study uses a bibliometric method to examine the relationship between two journal characteristics during 2009-2013: APCs and the percentage of published articles based on work that is supported by grants (grant-funded articles). 
According to the data collected, the percentage of grant-funded articles increases as the associated APCs increase. Average APCs of APC OA journals are higher in Europe and North America than elsewhere. The study also examined the ten countries with the largest numbers of scientific publications in the investigated OA journals. All ten countries had lower percentages of grant-funded articles in free OA journals than in APC OA and subscription journals. Of the ten countries, the six in Europe and North America have higher percentages of grant-funded articles in APC OA journals than in subscription journals. The other four countries, which have lower percentages of grant-funded articles in APC OA journals than in subscription journals, are in Asia and South America, where APC OA journals have low average APCs. This paper aims to perform a detailed scientometric and text-based analysis of the Computer Science (CS) research output of the 100 most productive institutions in India and in the world. The analytical characterization is based on research output data indexed in Scopus over the last 25-year period (1989-2013). Our computational analysis involves a two-dimensional approach combining standard scientometric methodology and text-based analysis. The scientometric characterization aims to assess CS domain research output in leading Indian institutions vis-à-vis the leading world institutions and to bring out the similarities and differences among them. It involves analysis along traditional scientometric indicators such as total output, citation-based impact assessment, co-authorship patterns, international collaboration levels, etc. The text-based characterization aims to identify the key research themes and their temporal trends for the two sets. 
The key contribution of the experimental work is an analytical characterization that identifies characteristic similarities and differences in the CS research landscape of Indian institutions vis-à-vis world institutions. Our study aims to analyse whether former feelings of happiness and/or physical appearance are significantly correlated with the subsequent observable research performance of scholars. To the best of our knowledge, neither has been analysed previously. To do so, we photographed 49 persons attending the 72nd annual conference of the German Academic Association for Business Research (VHB), which took place in Bremen in 2010. We interviewed them about their feelings of happiness. Later we asked students to evaluate the photographed persons' attractiveness, competence, trustworthiness, likeability and their feelings of happiness. To determine the academics' research performance we compiled a list of their recent journal publications, applying different journal weights and dividing publications by the number of authors. Regression analyses reveal significant relationships between feelings of happiness in 2010 and research performance in 2011/2012. Conversely, we cannot observe significant relationships between previous research performance and subsequently reported feelings of happiness. Even though at first glance one would not expect physical appearance to be relevant for research output, we find significant relationships. While previous studies show that scholars' evaluations of teaching are influenced by attractiveness, our results suggest that research performance is influenced not by attractiveness but especially by (perceived) trustworthiness. Our data also reveal a weakly significant correlation between scholars' perceived feelings of happiness and their reported feelings of happiness. 
The global positioning system (GPS) represents one of the most compelling success stories of technology transfer from defense laboratories and academia to the private sector. In this short report, we applied a quantitative analysis to identify landmark research contributions to GPS. This technique, reference publication year spectroscopy (RPYS), yielded key insights into early works that allowed for both the development and widespread use of GPS. In addition, using this approach to identify individual contributions of scientific excellence offers an opportunity to credit not only the research investigators, but also their corresponding affiliations and funding sources. Indeed, the findings from our analysis suggest that RPYS might serve as a powerful tool to substantiate the contribution of funding agencies, universities and institutes to research fields. We stress, however, that this method should not stand alone for such purposes, but should be wedded with the knowledge and experience of subject matter experts. Aiming to investigate the citation advantage of the author-pays model, the present communication compares the recognition of open access (OA) and toll access (TA) papers in author-pays OA journals in 2007-2011. This is the first large-scale study concentrating on all APC-funded OA journals published by Springer and Elsevier, the two largest publishers authorizing and embracing the model. According to the research findings, the number of OA papers has increased exponentially in recent years. The OA papers are also found to outperform the TA ones in impact, whether in annual comparisons or across disciplines. The annual OA citation advantages range from 21.36% for 2009 to 49.71% for 2008. Social Sciences and Humanities (with 3.14%) and Natural Sciences (with 35.95%) gain the lowest and the highest advantages, respectively. 
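The annual OA citation advantages quoted above are relative differences between the mean citation rates of OA and TA papers; a minimal sketch, with invented per-paper citation counts:

```python
def oa_citation_advantage(oa_citations, ta_citations):
    """Percentage by which the mean citations of OA papers exceed the mean
    citations of comparable TA papers."""
    oa_mean = sum(oa_citations) / len(oa_citations)
    ta_mean = sum(ta_citations) / len(ta_citations)
    return 100 * (oa_mean - ta_mean) / ta_mean

# Hypothetical per-paper citation counts for one publication year.
adv = oa_citation_advantage([12, 8, 10], [8, 6, 10])
```

With these toy counts the OA mean is 10 and the TA mean is 8, so the advantage is 25%; the study's annual figures (21.36% to 49.71%) are the same ratio computed over its much larger samples.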
The citation advantage can be attributed to the higher visibility of the OA articles, implying the popularity and usefulness of the OA author-pays model to their readership. It may also have roots in the selectivity of authors in choosing the author-pays outlet to publish their high-quality papers, signifying the overall prestige of the OA papers published under the model. Whatever the ultimate interpretation, i.e. correlation or causation, the OA citation advantage may encourage authors who are willing to support the OA movement while seeking to get published in well-established traditional journals. This may help approach the not-yet-achieved critical mass necessary to evaluate the success of the model. Typing two or three keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever-shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. 
A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language. This paper considers classical bibliographic databases based on the Boolean retrieval model (such as MEDLINE and PsycInfo). This model is challenged by modern search engines and information retrieval (IR) researchers, who often consider Boolean retrieval a less efficient approach. The paper examines this claim, argues for the continued value of Boolean systems, and suggests two further considerations: (a) the important role of human expertise in searching (expert searchers and "information literate" users) and (b) the role of library and information science and knowledge organization (KO) in the design and use of classical databases. An underlying issue is the kind of retrieval system for which one should aim. Warner's (2010) differentiation between the computer science traditions and an older library-oriented tradition seems important; the former aim to transform queries automatically into (ranked) sets of relevant documents, whereas the latter aims to increase the selection power of users. The Boolean retrieval model is valuable in providing users with the power to make informed searches and have full control over what is found and what is not. These issues may have significant implications for the maintenance of information science and KO as research fields as well as for the information profession as a profession in its own right. 
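The selection power of the Boolean model discussed above comes from evaluating exact set operations over postings lists, so the user fully controls what is found and what is not; a minimal sketch (the index and query are invented):

```python
# Inverted index: term -> set of document IDs (the postings list).
index = {
    "anxiety": {1, 2, 5},
    "children": {2, 3, 5},
    "drugs": {3, 4, 5},
}

# Boolean query: anxiety AND children NOT drugs.
# & is AND (intersection), | is OR (union), - is NOT (difference).
result = (index["anxiety"] & index["children"]) - index["drugs"]

# OR broadens the result set; the outcome is an exact, unranked set.
broad = index["anxiety"] | index["children"]
```

Unlike a ranked engine, nothing outside the specified set can appear and nothing inside it can be hidden, which is precisely the "full control" the abstract attributes to the model.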
The motivation for this study was to better understand academics' searching and sensemaking processes when solving exploratory tasks for which they lack pre-existing frames. We focus on "influence" tasks because, although they appear to be unfamiliar, they arise in much academic discourse, at least tacitly. We report the processes of academics at different levels of seniority when completing exploratory search tasks that involved identifying influential members of their academic community and "rising stars," and similarly for an unfamiliar academic community. Eleven think-aloud sessions followed by semi-structured interviews were conducted to investigate the roles of specific and general domain expertise in shaping information seeking and knowledge construction. Academics defined and completed the tasks through an iterative and interactive process of seeking and sensemaking, during which they constructed an understanding of their communities and determined qualities of "being influential". The Data/Frame Theory of Sensemaking was used to provide sensitising theoretical constructs. The study shows that both external and internal knowledge resources are essential to define a starting point or frame, make and support decisions, and experience satisfaction. Ill-defined or non-existent initial frames may cause unsubstantiated or arbitrary decisions, and feelings of uncertainty and lack of confidence. Information seeking in the workplace can vary substantially from one search to the next due to changes in the context of the search. Modeling these dynamic contextual effects is an important challenge facing the research community because it has the potential to lead to more responsive search systems. With this motivation, a study of software engineers was conducted to understand the role that contextual factors play in shaping their information-seeking behavior. 
Research was conducted in the field at a large technology company and comprised six unstructured interviews, a focus group, and 13 in-depth, semistructured interviews. Qualitative analysis revealed a set of contextual factors and related information behaviors. Results are formalized in the contextual model of source selection, the main contributions of which are the identification of two types of conditioning variables (requirements and constraints) that mediate between the contextual factors and source-selection decisions, and the articulation of dominant source-selection patterns. The study has implications for the design of context-sensitive search systems in this domain and may inform contextual approaches to information seeking in other professional domains. Patient portals have the potential to provide content that is specifically tailored to a patient's information needs based on diagnoses and other factors. In this work, we conducted a survey of 41 lung cancer patients at an outpatient lung cancer clinic at the medical center of the University of California, Los Angeles, to gain insight into these perceived information needs and opinions on the design of a portal to fulfill them. We found that patients requested access to information related to diagnosis and imaging, with more than half of the patients reporting that they did not anticipate an increase in anxiety due to access to medical record information via a portal. We also found that patient educational background did not lead to a significant difference in desires for explanations of reports and definitions of terms. This study demonstrates an improved conceptual foundation to support well-structured analysis of image topicality. First we present a conceptual framework for analyzing image topicality, explicating the layers, the perspectives, and the topical relevance relationships involved in modeling the topicality of art images. 
We adapt a generic relevance typology to image analysis by extending it with definitions and relationships specific to the visual art domain and integrating it with schemes of image-text relationships that are important for image subject indexing. We then apply the adapted typology to analyze the topical relevance relationships between 11 art images and 768 image tags assigned by art historians and librarians. The original contribution of our work is the topical structure analysis of image tags that allows the viewer to more easily grasp the content, context, and meaning of an image and quickly tune into aspects of interest; it could also guide both the indexer and the searcher to specify image tags/descriptors in a more systematic and precise manner and thus improve the match between the two parties. An additional contribution is systematically examining and integrating the variety of image-text relationships from a relevance perspective. The paper concludes with implications for relational indexing and social tagging. This paper describes a clustering and authorship attribution study over the State of the Union addresses from 1790 to 2014 (224 speeches delivered by 41 presidents). To define the style of each presidency, we applied a principal component analysis (PCA) based on part-of-speech (POS) frequencies. From Roosevelt (1934) onward, each president tends to have a distinctive style, whereas earlier presidents usually share some stylistic aspects with others. Applying an automatic classification based on the frequencies of all content-bearing word-types, we show that chronology tends to play a central role in forming clusters, a factor that is more important than political affiliation. Using the 300 most frequent word-types, we generate another clustering representation based on the style of each president. This second view shares similarities with the first one, but usually with more numerous and smaller clusters. 
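Frequency-based attribution of the kind described above can be sketched as nearest-profile matching over relative word-type frequencies; this toy version uses cosine similarity rather than the study's actual classifier, and the texts, vocabulary and author labels are invented:

```python
from collections import Counter
import math

def profile(text, vocab):
    """Relative frequencies of the chosen word-types in a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return [counts[w] / total for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def attribute(speech, profiles, vocab):
    """Assign the speech to the author whose profile is most similar."""
    p = profile(speech, vocab)
    return max(profiles, key=lambda name: cosine(p, profiles[name]))

vocab = ["the", "we", "shall", "must"]
profiles = {
    "A": profile("we shall we shall the", vocab),
    "B": profile("the must the must we", vocab),
}
guess = attribute("we shall overcome we shall", vocab=vocab, profiles=profiles)
```

An incorrect assignment under this scheme lands on the profile closest in frequency space, which is consistent with the study's observation that misattributed speeches go to stylistically and chronologically nearby presidents.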
Finally, an authorship attribution approach for each speech can reach a success rate of around 95.7% under some constraints. When an incorrect assignment is detected, the proposed author often belongs to the same party and has lived during roughly the same time period as the presumed author. A deeper analysis of some incorrect assignments reveals interesting reasons justifying difficult attributions. Deciding whether a claim is true or false often requires a deeper understanding of the evidence supporting and contradicting the claim. However, when presented with many evidence documents, users do not necessarily read and trust them uniformly. Psychologists and other researchers have shown that users tend to follow and agree with articles and sources that hold viewpoints similar to their own, a phenomenon known as confirmation bias. This suggests that when learning about a controversial topic, human biases and viewpoints about the topic may affect what is considered "trustworthy" or credible. It is an interesting challenge to build systems that can help users overcome this bias and help them decide the truthfulness of claims. In this article, we study various factors that enable humans to acquire additional information about controversial claims in an unbiased fashion. Specifically, we designed a user study to understand how presenting evidence with contrasting viewpoints and source expertise ratings affect how users learn from the evidence documents. We find that users do not seek contrasting viewpoints by themselves, but explicitly presenting contrasting evidence helps them get a well-rounded understanding of the topic. Furthermore, explicit knowledge of the credibility of the sources and the context in which the source provides the evidence document not only affects what users read but also whether they perceive the document to be credible. 
This study explores how the stability of users' preferences influences recommendation results and how this stability relates to the effectiveness of developing recommendation strategies. In this work, we propose an anchor-based hybrid filtering (AHF) approach to naturally measure and capture the user's preference stability for movie genres. That is, a pairwise genre-comparison preference process combined with genre-based fuzzy inference filtering was conducted in order to achieve effective interactive recommendations. To conduct this experiment, we recruited 30 users with different levels of preference stability for movie genres. The experimental results show that the proposed AHF approach can effectively capture the user's preferences and filter out undesired movie genres. In addition, this approach can give a more precise recommendation than one without the anchoring process, especially for users who have unstable preferences for movie genres. Our proposed approach achieves statistical significance and outperforms the baseline method for recommending users' favorite movies by more than 63% for the stable user group and 77% for the unstable group. The results suggest that the stability of users' preferences is a factor to be considered when developing effective recommendation strategies. This research describes a user-centered design method for creating nonspeech auditory feedback to enhance information interactions with a visual information system. It involves two studies. In the first, a user-centered sound design method is used, based on one originally applied for visually impaired users. Three panels of end users are employed to collaboratively and iteratively design the required nonspeech sounds. The method ensures that the sounds designed are not based on designers' personal or ad hoc choices and instead exploits the creativity of a user group as an application of participatory sound design. 
Based on the results of this study, recommendations are made for extending the sound design method to novel interfaces and sighted users. A second study involves a formative evaluation of the information system integrated with the designed auditory feedback. This evaluation confirms that the user-centered sound design method leads to the creation of auditory feedback that conveys meaningful information to users. The causal relation between research and economic growth is of particular importance for political support of science and technology as well as for academic purposes. This article revisits the causal relationship between research articles published and economic growth in Organisation for Economic Co-operation and Development (OECD) countries for the period 1981-2011, using bootstrap panel causality analysis, which accounts for cross-section dependency and heterogeneity across countries. Through its specific method and choice of country group, the article makes a contribution to the existing literature. Our empirical results support unidirectional causality running from research output (in terms of total number of articles published) to economic growth for the US, Finland, Hungary, and Mexico; the opposite causality from economic growth to research articles published for Canada, France, Italy, New Zealand, the UK, Austria, Israel, and Poland; and no causality for the rest of the countries. Our findings provide important policy implications for research policies and strategies for OECD countries. Blogs are readily available sources of opinions and sentiments that in turn could influence the opinions of blog readers. Previous studies have attempted to infer influence from blog features, but they have ignored the possible influence styles that describe the different ways in which influence is exerted. 
We propose a novel approach to analyzing bloggers' influence styles and using the influence styles as features to improve the performance of influence diffusion detection among linked bloggers. The proposed influence style (INFUSE) model describes bloggers' influence through their engagement style, persuasion style, and persona. Methods used include similarity analysis to detect the creating-sharing aspect of engagement style, subjectivity analysis to measure persuasion style, and sentiment analysis to identify persona style. We further extend the INFUSE model to detect influence diffusion among linked bloggers based on the bloggers' influence styles. The INFUSE model performed well with an average F1 score of 76% compared with the in-degree and sentiment-value baseline approaches. Previous studies have focused on the existence of influence among linked bloggers in detecting influence diffusion, but our INFUSE model is shown to provide a fine-grained description of the manner in which influence is diffused based on the bloggers' influence styles. The emergence of immersive documents, which allow the reader to perceive unreality as real, is foreseen. This new type of document will evolve from the combination of contemporary participatory, transmedia storytelling with pervasive computing technologies and multisensory interfaces. It is argued that a research program within library and information science is needed, to investigate new information behaviors associated with such documents, the new digital literacies needed to make effective use of them, and their place in the information communication chain. Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as data deluge and big data today characterize numerous (research) initiatives, much work is still needed in the direction of publishing data in order to make them effectively discoverable, available, and reusable by others. 
Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture. Lately, data journals have overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation. Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples. Furthermore, queries are selected by the researchers. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results based on this sample. Jurors were found through crowdsourcing, and data were collected using specialized software, the Relevance Assessment Tool (RAT). We found that although Google outperforms Bing for both query types, the difference in performance for informational queries was rather small. However, for navigational queries, Google found the correct answer in 95.3% of cases, whereas Bing only found the correct answer 76.6% of the time. We conclude that search engine performance on navigational queries is of great importance, because users in this case can clearly identify queries that have returned correct results. So, performance on this query type may contribute to explaining user satisfaction with search engines. Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. 
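Whether the navigational-query gap reported above (95.3% for Google versus 76.6% for Bing) is statistically meaningful can be checked with a two-proportion z-test; the abstract does not state which test the authors used, and the sketch below treats the two engines' samples as independent for simplicity:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for H0: the two success proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 1,000 navigational queries per engine; success counts taken from the
# reported rates (95.3% and 76.6%).
z = two_proportion_z(953, 1000, 766, 1000)
```

At this sample size the z-statistic is far above the 1.96 threshold, so the navigational-query difference is clearly significant at the 5% level, unlike the "rather small" informational-query gap.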
On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden it could add to the already intensive indexing process. This study proposes a method to automatically infer the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval. Crowdsourcing has emerged as a way to harvest social wisdom from thousands of volunteers to perform a series of tasks online. However, little research has been devoted to exploring the impact of various factors such as the content of a resource or crowdsourcing interface design on user tagging behavior. Although images' titles and descriptions are frequently available in image digital libraries, it is not clear whether they should be displayed to crowdworkers engaged in tagging. This paper focuses on offering insight to the curators of digital image libraries who face this dilemma by examining (i) how descriptions influence users in their tagging behavior and (ii) how this relates to (a) the nature of the tags, (b) the emergent folksonomy, and (c) the findability of the images in the tagging system. We compared two different methods for collecting image tags from Amazon's Mechanical Turk crowdworkers, with and without image descriptions. Several properties of the generated tags were examined from different perspectives: diversity, specificity, reusability, quality, similarity, descriptiveness, and so on. 
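Tag-set properties such as the diversity and reusability examined above reduce to simple ratios over the collected tag assignments; a minimal sketch with invented tags:

```python
def tag_metrics(assignments):
    """Diversity: distinct tags per assignment (1.0 = every tag is unique);
    reuse: mean number of assignments per distinct tag."""
    distinct = set(assignments)
    return {
        "diversity": len(distinct) / len(assignments),
        "reuse": len(assignments) / len(distinct),
    }

# Hypothetical tag assignments collected for one image under each condition.
with_desc = ["dog", "terrier", "park", "leash", "grass"]   # more specific
without_desc = ["dog", "dog", "animal", "dog", "animal"]   # more reused
m_with, m_without = tag_metrics(with_desc), tag_metrics(without_desc)
```

The toy numbers mirror the reported pattern: the with-description condition yields a more diverse, more specific tag set, while the without-description condition shows higher tag reuse.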
In addition, the study examined the impact of image descriptions on supporting users' information seeking with a tag cloud interface. The results showed that the properties of tags are affected by the crowdsourcing approach. Tags from the with-description condition are more diverse and more specific than tags from the without-description condition, while the latter has a higher tag reuse rate. A user study also revealed that different tag sets provided different support for search. Tags produced with descriptions shortened the path to the target results, whereas tags produced without descriptions increased user success in the search task. Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, whether it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful in performing polarity classification on Spanish tweets. We provide an evaluation from both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when large training sets are used to create a classifier, but this tendency is reversed when small training collections are used.
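To make the contrast concrete, the "pure lexical model" baseline that the abstract above compares against syntactic approaches can be sketched in a few lines. This is a toy illustration under invented assumptions: the lexicon and example tweets below are made up (and in English for readability), and this is not the authors' system or their Spanish data.

```python
# A minimal lexical polarity classifier: a toy illustration of the
# "pure lexical model" baseline, NOT the authors' system. The lexicon
# and example tweets are invented.
LEXICON = {"good": 1, "great": 1, "love": 1, "excellent": 1,
           "bad": -1, "awful": -1, "hate": -1, "terrible": -1}

def polarity(tweet: str) -> str:
    """Sum per-term polarity scores; no syntactic relations between
    words are considered, which is exactly the limitation that
    syntactic approaches try to address."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this great phone"))          # positive
print(polarity("awful battery and terrible apps"))  # negative
```

Because each term is scored in isolation, a tweet like "not good" is misclassified by such a model; capturing negation and other word-to-word relations is what the syntactic features in the study contribute.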
This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and to apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web. Little detailed information is available about who reads research articles or the contexts in which research articles are read. Using data about people who register in Mendeley as readers of articles, this article explores different types of users of Clinical Medicine, Engineering and Technology, Social Science, Physics, and Chemistry articles inside and outside academia. The majority of readers for all disciplines were PhD students, postgraduates, and postdocs, but other types of academics were also represented. In addition, many Clinical Medicine articles were read by medical professionals. The highest correlations between citations and Mendeley readership counts were found for types of users who often authored academic articles, except for associate professors in some sub-disciplines. This suggests that Mendeley readership can reflect usage similar to traditional citation impact, if the data are restricted to readers who are also authors, without the delay of impact measured by citation counts.
At the same time, Mendeley statistics can also reveal the hidden impact of some research articles, such as educational value for nonauthor users inside academia or the impact of research articles on practice for readers outside academia. We analyze access statistics of 150 blog entries and news articles for periods of up to 3 years. The access rate falls as an inverse power of the time passed since publication. The power law holds for periods of up to 1,000 days. The exponents are different for different blogs and are distributed between 0.6 and 3.2. We argue that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page, and subsequently moving further into the background. The other proposed explanations, which use a novelty factor that decays with time or some intricate theory of human dynamics, cannot explain all of the experimental observations. Archives are invaluable resources for those interested in understanding past activities and events. What makes archival collections valuable as evidence is that they are organized in a way that connects the materials to their creator(s), their associated activities, and recordkeeping systems. In order to create this organization, archivists engage in a complex process that involves arranging materials physically and intellectually. This arrangement focuses on documenting the original order of the materials and making it evident to secondary users. The time-consuming nature of this process has resulted in a massive backlog of unprocessed (and thus unavailable) collections, which represents an impediment to the investigative activities of scholars and researchers. To help alleviate this situation, this article presents a novel approach to archival arrangement, using tabletop computers and digitized images. The article explains the project design and implementation and discusses the evaluation results.
This work provides several major contributions to the field, including: a new system that allows archival collections to be arranged digitally, new methods and metrics for evaluating archival arrangements, a detailed analysis of the steps involved in archival arrangement and how they correlate with the final outcomes of the process, and a method for analyzing arrangements based on the topologies created by processing archivists. The use of social network sites offers many potential social benefits, but also raises privacy concerns and challenges for users. The trade-off users have to make between using sites such as Facebook to connect with their friends versus protecting their personal privacy is not well understood. Furthermore, very little behavioral research has focused on how personal privacy concerns are related to information disclosures made by one's friends. Our survey study of 116 Facebook users shows that engaging with friends through tagging activity and third-party application use is associated with higher levels of personal Facebook usage and a stronger emotional attachment to Facebook. However, users who have high levels of personal privacy concern and perceive a lack of effectiveness in Facebook's privacy policies tend to engage less frequently in tagging and app activities with friends, respectively. Our model and results illustrate the complexity of the trade-off between privacy concerns, engaging with friends through tagging and apps, and Facebook usage. In explicit collaborative search, two or more individuals coordinate their efforts toward a shared goal. Every day, Internet users with similar information needs have the potential to collaborate. However, online search is typically performed in solitude. Existing search systems do not promote explicit collaborations, and collaboration opportunities (collabportunities) are missed.
In this article, we describe a method to evaluate the feasibility of transforming these collabportunities into recommendations for explicit collaboration. We developed a technique called pseudocollaboration to evaluate the benefits and costs of collabportunities through simulations. We evaluate the performance of our method using three data sets: (a) data from single users' search sessions, (b) data with collaborative search sessions between pairs of searchers, and (c) logs from a large-scale search engine with search sessions of thousands of searchers. Our results establish when and how collabportunities would significantly help or hinder the search process versus searches conducted individually. The method that we describe has implications for the design and implementation of recommendation systems for explicit collaboration. It also connects system-mediated and user-mediated collaborative search, whereby the system evaluates the likely benefits of collaborating on a search task and helps searchers make more informed decisions on initiating and executing such a collaboration. Sentiment analysis mainly focuses on the study of opinions that express positive or negative sentiments. With the explosive growth of web documents, sentiment analysis is becoming a hot topic in both academic research and system design. Fine-grained sentiment analysis is traditionally solved as a 2-step strategy, which results in cascade errors. Although joint models, such as joint sentiment/topic and maximum entropy (MaxEnt)/latent Dirichlet allocation, have been proposed to tackle this problem, they focus on the joint learning of both aspects and sentiments. Thus, they are not appropriate for addressing the cascade errors of sentiment analysis at the sentence or subsentence level. In this article, we present a novel joint fine-grained sentiment analysis framework at the subsentence level based on Markov logic.
First, we divide the task into 2 separate stages (subjectivity classification and polarity classification). Then, the 2 stages are processed, each with different feature sets, which are implemented by local formulas in Markov logic. Finally, global formulas in Markov logic are adopted to realize the interactions of the 2 stages. The joint inference of subjectivity and polarity helps prevent cascade errors. Experiments on a Chinese sentiment data set show that our joint model brings significant improvements. There is increasing evidence that citations to Chinese research publications are rising sharply. A series of reasons have been highlighted in previous studies. This research explores another possibility: whether there is a clubbing effect in China's surge in research citations, in which a higher rate of internal citing takes place among influential Chinese researchers. Focusing on the most highly cited research articles in nanotechnology, we find that a larger proportion of Chinese nanotechnology research citations are localized within individual, institutional, and national networks within China. Both descriptive and statistical tests suggest that highly cited Chinese papers are more likely than similar U.S. papers to receive internal and localized citations. Tentative explanations and policy implications are discussed. We identified 7 theoretical models that have been used to explain technology adoption and use. We then examined the boundary conditions of these models of technology adoption when applied to the household context, using longitudinal empirical data from households regarding their purchase and use decisions related to household technologies. We conducted 2 studies, collecting 1,247 responses from U.S. households for the first study and 2,064 responses for the second. Those households that had adopted household technologies were surveyed regarding their use behavior.
Potential adopters (i.e., those who had not yet adopted) were surveyed regarding their purchase intentions. This allowed us to identify the most influential factors affecting a household's decision to adopt and use technologies. The results show that the model of adoption of technology in the household provided the richest explanation and best explained why households purchase and use technologies. Even with very well-documented rankings of universities, it is difficult for an individual university to reconstruct its position in the ranking. What determines whether a university places higher or lower in the ranking? Taking the example of ETH Zurich (ETHZ), the aim of this communication is to reconstruct how its high position (rank no. 1 in Europe in PP(top 10%)) in the Centre for Science and Technology Studies (CWTS) Leiden Ranking 2013 in the field of social sciences, arts and humanities came about. According to our analyses, the bibliometric indicator values of a university depend very strongly on weights that result in differing estimates of both the total number of a university's publications and the number of publications with a citation impact in the 90th percentile, or PP(top 10%). In addition, we examine the effect of weights at the level of individual publications. Based on the results, we offer recommendations for improving the Leiden Ranking (for example, the publication of sample calculations to increase transparency). Recent high-profile statements, criticisms, and boycotts organized against certain quantitative indicators (e.g., the DORA declaration) have brought the misuse of performance metrics to the center of attention. A key concern captured in these movements is that the metrics appear to carry authority even where established agents of quality control have explicitly outlined limits to their validity and reliability as measurement tools.
This raises a number of challenging questions for those readers of this journal who are implicated in questions of indicator production and, by extension, its effects. In this opinion piece we wish to engage critically with the question of how producers of indicators can come to terms with their role as (partly) responsible parties in the current age of evaluative bibliometrics. We do so through the illuminating case of the professional scientometrics community. A specific quality of the discussion about innovation indices (scoreboards) is that, more often than not, the subject is dealt with from a purely technical point of view. Such a narrow approach silently assumes that indices used as a policy tool are an accurate reflection of the phenomenon and should not be questioned, and also that the whole discussion concerning them should refer to methodological aspects and is best left to the statisticians. This author is of the opinion that an accurate evaluation of the value of indices as a policy tool requires considering the matter from a broader point of view and within the context in which such indices are generated and used. This article puts forward the thesis that progress in science and innovation policy studies depends on a diversity of issues, approaches and perspectives. If that is the case, maintaining thematic and methodological variety may be more important than creating coherent and closed analytical tools, i.e., indices. The advantage of indices is that they focus attention on those variables which are deemed to be key.
Among their disadvantages, however, are their highly abstract nature (in order to understand innovation-related phenomena, it is necessary to study them in tangible, composite forms); their tendency to skip unmeasurable determinants; their prior acceptance of definitions and concepts of innovation (instead of searching for them); their application of a single yardstick to diverse countries and regions; their assumption of linearity and causality in a complex and non-linear world; and the way they direct policy towards implementing indicators (rather than identifying and solving problems). It is suggested that the big data revolution will allow the emergence of new measurement tools that will replace innovation indices. We analyse a comprehensive panel dataset of economists working at Austrian, German, and Swiss universities and investigate how job mobility and the characteristics of other researchers working at the same university affect research productivity. On aggregate, we find no influence of these local research characteristics on the productivity of researchers once we control for their unobserved characteristics. This finding indicates that, with today's information, communication and travelling technologies, knowledge spillovers are globally available rather than dependent on physical co-presence. However, we find some evidence that high-productivity researchers could be more likely to benefit from local research characteristics. We present a new tool, Kampal (http://kampal.unizar.es), developed to help analyze the academic productivity of a research institution from the point of view of complex networks. We focus on two main aspects: paper production and funding by research grants. Thus, we define a network of researchers and suitable ways of describing their interaction, either by co-publication, project-collaboration, or a combination of both.
From the corresponding complex networks, we extract maps which encode the relevant information in graphical terms, and numerical parameters which encode the topological properties of the network. Thousands of these maps have been created, allowing us to study the similarities and differences between the co-publication and project-collaboration networks. Research funding plays a key role in current science, and thus has become an interesting aggregation level in scientometric analysis. In this work, we explore the funding ratios of 21 major countries/territories in social science based on 813,809 research articles collected from the Web of Science and indexed by the Social Sciences Citation Index, covering the period from 2009 to 2013. The results show that the funding ratios of the sample countries/territories in social science are far below those in natural science and some specific subjects (chemistry, engineering, physics, neurosciences). However, there is a positive correlation between them. The funding ratios of the People's Republic of China, Sweden and Japan rank in the top 3 (over 30 %). Generally, the funding ratios of the top 1 % and top 10 % highly cited articles are higher than those of the remaining articles, and in most cases, a high funding ratio for all articles is related to a high funding ratio for the highly cited articles. Technology Centres (TCs) are non-profit organisations created to contribute to the improvement of the productive sector, providing RTD support, especially for small and medium-sized enterprises (SMEs). Given TCs' main function, most authors present an industrial perspective of their performance.
However, bibliometric techniques can offer not only an overview of these centres, but also additional information about their features: the evolution of their publications, the degree of national and international collaboration, the Spanish institutional sectors and the main disciplines involved, the regional differences and their connections. In this article, Spanish TCs' documents downloaded from the Web of Science (2008-2012) are analysed, along with other indicators that can characterise these centres. The results show that national collaboration is important for TCs, and even more so when those links are local. This is in line with what other authors have stated, considering that geographical proximity is essential for knowledge transfer. Regarding the Spanish institutional sectors, the strongest relations are established with universities. For their part, firms have low participation in publications, although they show an upward trend over the years. Nevertheless, TCs' documents mainly address industry-related topics, in agreement with their primary mission as promoters of firms' innovation. Finally, as expected, differences between regions' performance are seen, explained in part by disparities between regional systems. Notwithstanding, top producers establish connections with regions without TCs, mainly collaborating on documents related to engineering, medicine and environmental topics. We describe the structural dynamics of two groups of scientists in relation to the independent simultaneous discovery (i.e., definition and application) of linear canonical transforms. This mathematical construct was built as the transfer kernel of paraxial optical systems by Prof. Stuart A. Collins, working in the ElectroScience Laboratory at Ohio State University. At roughly the same time, it was established as the integral kernel that represents the preservation of uncertainty in quantum mechanics by Prof. Marcos Moshinsky and his postdoctoral associate, Dr.
Christiane Quesne, at the Instituto de Física of the Universidad Nacional Autónoma de México. We are interested in the birth and parallel development of the two follower groups that formed around the two seminal articles, which for more than two decades did not know of or acknowledge each other. Each group had different motivations, purposes and applications, and worked in distinct professional environments. As we will show, the Moshinsky-Quesne article had been highly cited by his associates and students in Mexico and Europe by the time the importance of his work started to permeate various other, mostly theoretical, fields; Collins' paper took more time to be referenced, but later originated a vast following, notably among Chinese applied optical scientists. Through social network analysis we visualize the structure and development of these two major coauthoring groups, whose community dynamics show two distinct patterns of communication that illustrate the disparity in the diffusion of theoretical and technological research. Detecting the intellectual structure of a knowledge domain is valuable for tracking the dynamics of scientific research. Formal concept analysis (FCA) provides a new perspective for knowledge discovery and data mining. In this paper we introduce an FCA-based approach to detect the intellectual structure of library and information science (LIS). Our approach relies on the mathematical theory which formulates the understanding of a "concept" as a unit of extension (scholars) and intension (keywords) as a way of modelling the intellectual structure of a domain. By analyzing the papers published in sixteen prominent journals of the LIS domain from 2001 to 2013, the intellectual structure of LIS in the new century has been identified and visualized. Nine major research themes of LIS were detected, together with the core keywords and authors describing each theme.
The significant advantage of our approach is that the mathematical formulation produces a conceptual structure which automatically provides generalization and specialization relationships among the concepts. This provides additional information not available from other methods, especially when the shared interests of authors at different granularities are also visualized in the concept lattice. International academic awards are popular as incentives and rewards for academics all over the world, and have played a significant role in the performance evaluation of individuals and institutions. However, little is known about the relative importance of awards and the relationships between awards. This study aims to establish a comprehensive global map of important international academic awards, which visually presents the relative reputations of awards and the close or distant relationships between them. By surveying the reputations of 207 preselected awards, 90 important international academic awards with above-average reputations were identified. Then, based on the number of "awardees in common" (or "co-awardees") between every pair of these 90 awards, a network of co-awardees was built. Finally, using the mapping software VOSviewer, these 90 important international academic awards were mapped, taking the reputation scores as the weights of the awards and the network of co-awardees as the basis of the relationships between them. This paper uses a Bayesian hierarchical latent trait model, and data from eight different university ranking systems, to measure university quality. There are five contributions. First, I find that ratings tap a unidimensional, underlying trait of university quality. Second, by combining information from different systems, I obtain more accurate ratings than are currently available from any single source. And rather than dropping institutions that receive only a few ratings, the model simply uses whatever information is available.
Third, while most ratings focus on point estimates and their attendant ranks, I focus on the uncertainty in quality estimates, showing that the difference between universities ranked 50th and 100th, and 100th and 250th, is insignificant. Finally, by measuring the accuracy of each ranking system, as well as the degree of bias toward universities in particular countries, I am able to rank the rankings. It is shown that winners of the A. M. Turing Award or the John von Neumann Medal, both of which recognize achievement in computer science, are separated from some other A. M. Turing Award or John von Neumann Medal winner by at most 1.4 co-authorship steps on average, and from some cross-disciplinary broker, and hence from some discipline other than computer science, by at most 1.6 co-authorship steps on average. A. M. Turing Award and John von Neumann Medal recipients during this period are, therefore, on average closer in co-authorship terms to some other discipline than typical computer scientists are, on average, to each other. Scientometric data on the citation success of different publication types and genres in psychology publications are presented. The data refer to references that are cited in these scientific publications and that are documented in PSYNDEX, the exhaustive database of psychology publications from the German-speaking countries published either in German or in English. Firstly, the analyses refer to the references cited in publications of 2009 versus 2010 versus 2011. With reference to all cited references, the proportion of journal articles ranges from 57 to 61 %, of books from 22 to 24 %, and of book chapters from 14 to 15 %, with rather high stability across the three publication years analysed. Secondly, the analyses refer to the numbers of cited references from the German-speaking countries, which are also documented in PSYNDEX.
These constitute about 11 % of all cited references, indicating that nearly 90 % of the cited references are international and/or interdisciplinary publications not stemming from the German-speaking countries. The subsample shows proportions of journal articles, books, and chapters that are very similar to the percentages identified for all cited references. Thirdly, the analyses refer to the document type, scientific genre, and psychological sub-discipline of the most frequently cited references in the psychology publications. The frequency of top-cited references from books and book chapters is almost equal to that of journal articles; two-thirds of the top-cited references are non-empirical publications, and only one-third are empirical publications. Top-cited references stem particularly from clinical psychology, experimental psychology, as well as tests, testing and psychometrics. In summary, the results point to the fact that citation analyses which are limited to journal papers tend to neglect very high proportions of the references cited in scientific publications. Despite the enthusiasm for technology convergence seen over the last decade in society and the broad consensus on its considerable impact, there is neither any substantive evidence that technology convergence occurs overall nor any objective explanation of the domains where it may be found. By using patents filed with the KIPO from 1996 to 2010 and demonstrating trends based on co-classification analysis at the level of the entire technology domain, we elucidate the extent of technology convergence in a technological innovation system and its change in status over time. Furthermore, our paper uses network analysis based on patent data to identify the occurrence of technology convergence in terms of its technological domains.
Our findings are as follows: (1) the diffusion of technology convergence has been ongoing since the early 2000s; (2) technology convergence is evolving into a more complex and heterogeneous form; (3) convergent technology has a wider scope but requires more effort to develop than does non-convergent technology; and (4) there is evidence for the strong consistency of converged domains over time. These results support the numerous initiatives of governments and firms to promote technology convergence and illustrate its future form. Author-level bibliometric indicators are becoming a standard tool in research assessment. It is important to investigate what these indicators actually measure in order to assess their appropriateness for ranking scholars and for benchmarking average individual levels of performance. Seventeen author-level indicators were calculated for 512 researchers in Astronomy, Environmental Science, Philosophy and Public Health. Indicator scores and scholar rankings calculated in Web of Science (WoS) and Google Scholar (GS) were analyzed. The indexing policies of WoS and GS were found to have a direct effect on the amount of available bibliometric data; thus, indicator scores and rankings in WoS and GS differed, with correlations between 0.24 and 0.99. High correlations could be caused by scholars in bottom rank positions with a low number of publications and citations in both databases. The hg indicator produced scholar rankings with the highest level of agreement between WoS and GS and rankings with the least amount of variance. Expected average performance benchmarks were influenced by how the mean indicator value was calculated. Empirical validation of the aggregate mean h-index values against previous studies resulted in a very poor fit of the predicted average scores.
Rankings based on author-level indicators are influenced by (1) the coverage of papers and citations in the database, (2) how the indicators are calculated, and (3) the assessed discipline and seniority. Indicator rankings display the visibility of the scholar in the database, not their impact in the academic community compared to their peers. Extreme caution is advised when choosing indicators and benchmarks for scholar rankings. In this paper we investigate the BRIC and CIVETS economies from a new perspective, focusing on the analysis of one element linked to the knowledge economy. Concretely, we analyze the research-oriented repositories of those emerging markets as instruments that contribute to capturing and diffusing scientific knowledge. To this end, we carry out a competitive analysis that allows us to study the BRIC and CIVETS repositories using their respective participation shares in terms of content supply, web visibility and size. The results reveal the absolute leadership position that the BRIC economies have over the CIVETS economies in terms of disseminating scientific knowledge, as well as that this lead could weaken in the near future. Furthermore, two repositories linked to universities in Brazil and South Africa are the leaders in those emerging economies. Leadership position aside, India and China are well above the rest of the BRIC countries, while among the CIVETS countries Indonesia holds a relevant position. The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine their validity, precision and reliability.
To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimates based on direct queries and on empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160-165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities. The concept of open innovation has attracted considerable attention since Henry Chesbrough first coined it to capture the increasing reliance of firms on external sources of innovation. Although open innovation has flourished as a topic within innovation management research, it has also triggered debates about the coherence of the research endeavors pursued under this umbrella, including its theoretical foundations. In this paper, we aim to contribute to these debates through a bibliometric review of the first decade of open innovation research. We combine two techniques, bibliographic coupling and co-citation analysis, to (1) visualize the network of publications that explicitly use the label 'open innovation' and (2) arrive at distinct clusters of thematically related publications. Our findings illustrate that open innovation research builds principally on four related streams of prior research, whilst the bibliographic network of open innovation research reveals that seven thematic clusters have been pursued persistently. While such persistence is undoubtedly useful for arriving at in-depth and robust insights, the observed patterns also signal the absence of new, emerging themes. As such, 'open innovation' might benefit from applying its own ideas: sourcing concepts and models from a broader range of theoretical perspectives, as well as pursuing a broader range of topics, might introduce dynamics resulting in more impact and proliferation.
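The two bibliometric techniques combined in the open innovation review above have simple set-based definitions that can be sketched directly: bibliographic coupling counts the references two papers share, while co-citation counts the papers that cite two references together. The toy citation data below is invented for illustration and is not the review's corpus.

```python
# Toy citation data (invented): each paper maps to the set of
# references it cites.
refs = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3", "r4"},
    "p3": {"r5"},
}

def coupling_strength(a, b, refs):
    """Bibliographic coupling: two papers are related in proportion
    to the number of references they share."""
    return len(refs[a] & refs[b])

def cocitation_count(x, y, refs):
    """Co-citation: two references are related in proportion to the
    number of papers that cite them both."""
    return sum(1 for cited in refs.values() if x in cited and y in cited)

print(coupling_strength("p1", "p2", refs))  # 2 shared references
print(cocitation_count("r2", "r3", refs))   # cited together by 2 papers
```

Note the complementary perspectives: coupling links the citing papers (here the open innovation publications themselves), whereas co-citation links the cited works (the prior research streams), which is why combining the two can map both the field and its foundations.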
This paper examines the collaboration structures and dynamics of the co-authorship network of all Slovenian researchers. Its goal is to identify the key factors driving collaboration and the main differences in collaboration behavior across scientific fields and disciplines. Two approaches to modelling network dynamics are combined in this paper: the small-world model and the mechanism of preferential attachment, also known as the process of cumulative advantage. Stochastic-actor-based modelling of co-authorship network dynamics uses data for the complete longitudinal co-authorship networks for the entire Slovenian scientific community from 1996 to 2010. We confirmed the presence of clustering in all fields and disciplines. Preferential attachment is far more complex than a single global mechanism. There were two clear distinctions regarding collaboration within scientific fields and disciplines. One was that some fields had an internal national saturation inhibiting further collaboration. The second concerned the differential impact of collaboration with scientists from abroad on domestic collaboration. In the natural, technical, medical, and biotechnical sciences, this promotes collaboration within the Slovenian scientific community, while in the social sciences and humanities it inhibits internal collaboration. We revisit our recent study [Predicting results of the Research Excellence Framework using departmental h-index, Scientometrics, 2014, 102:2165-2180; arXiv:1411.1996], in which we attempted to predict outcomes of the UK's Research Excellence Framework (REF 2014) using the so-called departmental h-index. Here we report that our predictions failed to anticipate with any accuracy either overall REF outcomes or movements of individual institutions in the rankings relative to their positions in the previous Research Assessment Exercise (RAE 2008). 
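The departmental h-index used in the study above follows the standard h-index rule, applied to a department's pooled publications: the largest h such that h publications each have at least h citations. A minimal sketch (the citation counts below are invented for illustration):

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Pooled citation counts for a hypothetical department
print(h_index([25, 8, 5, 4, 3, 1]))  # prints 4
```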
Using (binomial) regression analysis, we run models using citation windows of one to ten years, with both annual and cumulative citations as dependent variables, and with both bibliometric and quality indicators (judgments of peers) as independent variables. The bibliometric variables are the Journal Impact Factor (JIF) of the publication medium, the numbers of authors and pages, and the statistical citedness of the references used within the paper. We find that the JIF has a larger influence on the citation impact of a publication than the quality (measured by judgments of peers). However, the number of pages and the quality of the references are less influential. The influence of the JIF peaks after three years and then declines (in most regression analyses), but remains higher than the influence of quality judgments even after ten years. These results point to a discrepancy between algorithmically based indicators and the qualitative judgments of experts: the latter seem less predictive of future citations than a combination of algorithmic constructs. The results of this study can contribute to the empirical specification of the relevance of a normative versus a constructivist theory of citation. (C) 2015 Elsevier Ltd. All rights reserved. In this paper we investigate economies of scale and specialization of European universities. The proposed approach builds on the notion that university production is a multi-input multi-output process, different from a standard production activity. The analyses are based on a database that integrates the main European universities' data on inputs and outputs with bibliometric data on publications, impact and collaborations. We pursue a cross-country perspective; we include subject mix and introduce a robust modeling of production trade-offs. 
Finally, we test the statistical significance of scale and specialization and find that they both have a significant impact on the efficiency of the Humboldt model. Nevertheless, confirming previous findings, specialization has no significant impact on the efficiency of the research model. (C) 2015 Elsevier Ltd. All rights reserved. Reference Publication Year Spectroscopy (RPYS) is a scientometric technique that effectively reveals punctuated peaks of historical scientific impact on a specified research field or technology. In many cases, a seminal discovery serves as the driving force underlying any given peak. Importantly, the results of RPYS analyses are represented on their own distinct scales, the bounds of which vary considerably across analyses. This makes comparing years of punctuated impact across multiple RPYS analyses problematic. In this paper, we propose a data transformation and visualization technique that resolves this challenge. Specifically, using a rank-order normalization procedure, we compress the results of multiple RPYS analyses into a single, consistent rank scale that clearly highlights years of punctuated impact across RPYS analyses. We suggest that rank transformation increases the effectiveness of this scientometric technique to reveal the scope of historical impact of seminal works by allowing researchers to simultaneously consider results from multiple RPYS analyses. (C) 2015 Elsevier Ltd. All rights reserved. The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications, including two vocabulary-based methods (keyword-based and Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with a keyword-based dictionary, and CRF with a Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. 
Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, among which CRF with the keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level. (C) 2015 Elsevier Ltd. All rights reserved. The citation behavior of Nobel Prize winning articles in physics published by selected Chinese Americans, discussed before (Liu, Y. X. & Rousseau, R. (2014). Journal of the American Society for Information Science and Technology, 65(2), 281-289), is analyzed using a unified Avrami-Weibull equation based on concepts of citation nuclei forming instantaneously and progressively, similar to the concepts involved in the theories of overall crystallization of a solid phase from a closed liquid system of fixed volume. It was found that: (1) initial concave and convex curvatures of plots of cumulative citations L(t) of individual articles against citation time t are associated with the generation of citations by instantaneous and progressive citation nucleation, respectively; (2) the time constant and exponent q in the unified relation are indicators that distinguish between L(t) plots with initial concave and convex curvatures for individual articles; (3) in cases of L(t) plots with initial convex curvature, the data may be described by the unified relation with q > 1 (i.e. when nuclei are formed progressively) and/or by a power-law relation; and (4) in some cases two citation regions of an L(t) plot follow different nucleation mechanisms, or the same mechanism with different values of the parameters of its equation. (C) 2015 Elsevier Ltd. All rights reserved. 
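The Avrami-Weibull-type growth just described can be sketched numerically. In one common form the cumulative citation curve is L(t) = C(1 - exp(-(t/θ)^q)); the labels C (saturation level) and θ (time constant) are my notation here, not necessarily the paper's. The exponent q then controls the initial curvature:

```python
import math

def cumulative_citations(t, C=100.0, theta=5.0, q=1.0):
    """Avrami-Weibull-type growth: L(t) = C * (1 - exp(-(t/theta)^q))."""
    return C * (1.0 - math.exp(-((t / theta) ** q)))

# Second difference near t = 0: negative -> initially concave
# (instantaneous nucleation, q <= 1); positive -> initially convex,
# S-shaped (progressive nucleation, q > 1).
for q in (0.5, 2.0):
    L = [cumulative_citations(t, q=q) for t in (0, 1, 2)]
    print(q, "convex" if L[2] - 2 * L[1] + L[0] > 0 else "concave")
```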
The increased interest in measuring the impact of research on areas of society beyond research itself has led, in scientometrics, to the investigation of altmetrics. Particular attention is paid here to targeted broad impact measurement: the aim is to discover the impact which a particular publication set has on specific user groups (outside research) by using altmetrics. This study used the Mendeley application programming interface (API) to download the Mendeley counts, broken down by different user types, for publications in a comprehensive F1000Prime data set. F1000Prime is a post-publication peer review system for papers from the biomedical area. As the F1000 papers are provided with tags from experts in this area (Faculty members) which characterise a paper more precisely (such as "good for teaching" or "new finding"), the interest of different user groups in specifically tagged papers could be investigated. This study's evaluation of the variously tagged F1000 papers provided interesting insights into the use of research papers by different user groups. The most interesting tag for altmetrics research is "good for teaching". This applies to papers which are well written and provide an overview of a topic. Papers with this tag can be expected to arouse interest among people who are hardly or not at all involved in research. The results of the regression models in this study do in fact show that lecturers, researchers at non-academic institutions, and others (such as librarians) have a special interest in such papers. A key article in a field, or a particularly well-written article that provides a good overview of a topic, will tend to be better received by people who are not closely involved in academic research. (C) 2015 Elsevier Ltd. All rights reserved. This paper analysed the citations of patents to science- and non-science-based references as an indicator of the linkage between technology and science. 
A review of the literature identified a variety of techniques for measuring science linkage (SL), with varying results. Therefore, this study aimed to compare the differences between science-based SL and non-science-based linkage (NSL). Patent data were collected from the United States Patent and Trademark Office database for the past two decades. Results showed a phenomenon of rapidly growing NSL of patents at different levels of technological fields and firms. In addition, field- and firm-specific differences in the linkages between science and technology were identified. This study analysed various types of SL performance of the top 20 firms in the Computers and Communications field and found that science-technology linkages were stronger in Lucent, Mitsubishi and Microsoft. It is worth noting that Texas Instruments (TI) was ranked thirteenth in science-based SL but third in Relative SL Ratio. Based on the Relative SL Ratio, TI's science-based SL was much higher than its NSL. (C) 2015 Elsevier Ltd. All rights reserved. The purpose of the study is to compare the performance of count regression models to those of linear and lognormal regression models in modelling count response variables in informetric studies. Identified count response variables in informetric studies include the number of authors, the number of references, the number of views, the number of downloads, and the number of citations received by an article. Also of a count nature are the number of links from and to a website. Data were collected from the United States Patent and Trademark Office (www.uspto.gov), an open access journal (www.informationr.net/ir/), Web of Science, and Maclean's magazine. The datasets were then used to compare the performance of linear and lognormal regression models with those of Poisson, negative binomial, and generalized Poisson regression models. 
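Model comparisons like the one above are typically decided with Akaike's Information Criterion and the Bayesian Information Criterion, computed from each fitted model's maximized log-likelihood; lower values win. A minimal sketch with invented log-likelihoods (not values from the study):

```python
import math

def aic(log_lik, k):
    """Akaike's Information Criterion: 2k - 2*lnL (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*lnL (lower is better)."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical fits on n = 500 articles: (model, log-likelihood, no. of parameters)
fits = [("poisson", -1850.2, 3), ("neg_binomial", -1610.7, 4), ("lognormal", -1640.3, 4)]
n = 500
best = min(fits, key=lambda m: aic(m[1], m[2]))
print(best[0])  # prints neg_binomial: the negative binomial wins on AIC here
```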
It was found that, due to overdispersion in most response variables, the negative binomial regression model often seems more appropriate for informetric datasets than the Poisson and generalized Poisson regression models. The regression analyses also showed that the linear regression model predicted negative values for five of the nine response variables modelled and, for all the response variables, performed worse than both the negative binomial and lognormal regression models when either Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) was used as the goodness-of-fit measure. The negative binomial regression model performed significantly better than the lognormal regression model for four of the response variables, while the lognormal regression model performed significantly better for two of the response variables; there was no significant difference in the performance of the two models for the remaining three. (C) 2015 Elsevier Ltd. All rights reserved. Assessing the research performance of multi-disciplinary institutions, where scientists belong to many fields, requires that the evaluators plan how to aggregate the performance measures of the various fields. Two methods of aggregation are possible, based on: (a) the performance of the individual scientists or (b) the performance of the scientific fields present in the institution. The appropriate choice depends on the evaluation context and the objectives for the particular measure. The two methods bring about differences in both performance scores and rankings. We quantify these differences through observation of the 2008-2012 scientific production of the entire research staff employed in the hard sciences in Italian universities (over 35,000 professors). 
Evaluators preparing an exercise must comprehend the differences illustrated, in order to correctly select the methodologies that will achieve the evaluation objectives. (C) 2015 Elsevier Ltd. All rights reserved. Although various citation-based indicators are commonly used to support research evaluation, there are ongoing controversies about their value. In response, they are often correlated with quality ratings or with other quantitative indicators in order to partly assess their validity. When correlations are calculated for sets of publications from multiple disciplines or years, however, the magnitude of the correlation coefficient may be reduced, masking the strength of the underlying correlation. This article uses simulations to systematically investigate the extent to which mixing years or disciplines reduces correlations. The results show that mixing two sets of articles with different correlation strengths can reduce the correlation for the combined set to substantially below the average of the two. Moreover, even mixing two sets of articles with the same correlation strength but different mean citation counts can substantially reduce the correlation for the combined set. The extent of the reduction in correlation also depends upon whether the articles assessed have been pre-selected for being high quality and whether the relationship between the quality ratings and citation counts is linear or exponential. The results underline the importance of using homogeneous data sets, but also help to interpret correlation coefficients when this is impossible. (C) 2015 Elsevier Ltd. All rights reserved. Dyads of journals related by citations can agglomerate into specialties through the mechanism of triadic closure. Using the Journal Citation Reports 2011, 2012, and 2013, we analyze triad formation as indicators of integration (specialty growth) and disintegration (restructuring). 
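The second mixing effect reported above, where two sets with identical within-set correlation but different mean citation counts lose correlation when pooled, is easy to reproduce with a deterministic toy example (the numbers below are invented):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two fields: identical quality ratings, perfect within-field correlation,
# but very different mean citation counts.
quality_a, cites_a = [1, 2, 3], [10, 20, 30]
quality_b, cites_b = [1, 2, 3], [110, 120, 130]
print(pearson_r(quality_a, cites_a))                        # r = 1 within each field
print(pearson_r(quality_a + quality_b, cites_a + cites_b))  # pooled r drops to about 0.16
```

The between-field variance in citation counts swamps the within-field covariance, so the pooled correlation collapses even though each field's correlation is perfect.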
The strongest integration is found among the large journals that report on studies in different scientific specialties, such as PLoS ONE, Nature Communications, Nature, and Science. This tendency toward large-scale integration has not yet stabilized. Using the Islands algorithm, we also distinguish 51 local maxima of integration. We zoom into the cited articles that carry the integration for: (i) a new development within high-energy physics and (ii) an emerging interface between the journals Applied Mathematical Modeling and the International Journal of Advanced Manufacturing Technology. In the first case, integration is brought about by a specific communication reaching across specialty boundaries, whereas in the second, the dyad of journals indicates an emerging interface between specialties. These results suggest that integration picks up substantive developments at the specialty level. An advantage of the bottom-up method is that no ex ante classification of journals is assumed in the dynamic analysis. (C) 2015 Elsevier Ltd. All rights reserved. The phenomenon of China's rise as an emerging scientific power has been well documented, yet the development of its social science is less explored. Utilizing up-to-date Social Science Citation Index (SSCI) publication data (1978-2013), this paper probes the patterns and dynamics of China's social science research via bibliometric analyses. Our research indicates that despite the national orientation of social science research and the linguistic obstacle of publishing for an international audience, China's publications in the SSCI dataset have been rising in terms of volume, world share, and global ranking. But China is not yet a major player in the arena of the social sciences, as evidenced by the number of Chinese journals indexed in SSCI and the lack of Olympic players. 
International publishing in China's social sciences is characterized by team research, but the research outputs are highly unbalanced at the regional and institutional levels. (C) 2015 Elsevier Ltd. All rights reserved. In recent years, the Web of Science Core Collection and Scopus databases have become primary sources for conducting studies that evaluate scientific investigations. Such studies require that duplicate records be excluded to avoid errors of overrepresentation. Along these lines, we identify duplicate records in Scopus and examine their origins. The methodology adopted consists of identifying journals with duplicate records in Scopus, selecting and downloading the journals' bibliographic records, and identifying and analyzing the duplicate records. Duplicate records are found when articles published in a journal are incorrectly mapped by Scopus both to this journal and to a different journal from the same publisher, and when there are journal title changes, orthographic differences in the presentation of a journal name, or journal name variants. In these last three cases, one bibliographic record of each duplicate is mapped to the Medline coverage of Scopus. Consequently, the identified duplicates, and the significant differences in the number of citations received by duplicate articles, may influence bibliometric studies. Thus, there is a need for rigorous quality control guidelines for database managers and editors to prevent the creation of duplicates. (C) 2015 Elsevier Ltd. All rights reserved. A frequently used measure of scientific (or research) collaboration is co-authorship in scholarly publications. Less frequently used are joint grant proposals and patents. Many scholars believe that the use of co-authorship as the sole measure of research collaboration is insufficient, because collaboration between researchers might not result in co-authorship. Collaborations involve informal communication (i.e., conversational exchange) between researchers. 
Using self-reports from 100 tenured/tenure-track faculty in the College of Engineering at the University of South Florida, researchers' networks are constructed from their communication relationships and also from collaborations in three areas: joint publications, joint grant proposals, and joint patents. The data collection (1) covers both researchers' in-progress and completed collaborative outputs, (2) yields a rating from the researchers on the importance of a relationship to them, and (3) obtains multiple types of relationship ties between researchers, allowing for the comparison of multiple networks. Exponential random graph model (ERGM) results show that the more researchers communicate, the more likely they are to produce collaborative outputs. Furthermore, we find that joint grant proposals tend to have mixed-gender teams and that researchers of the same race are more likely to publish together, but these demographic attributes have no additional explanatory power regarding joint patents. (C) 2015 Elsevier Ltd. All rights reserved. We studied the factors (recent versus older journals, publication types, electronic or print form, open or subscription access, funding, affiliation, language, and home country of publisher) that contributed to the growth of literature in Biomedical and Life Sciences as reflected in PubMed in the period 2004-2013. Only records indexed as journal articles were studied. A total of 7,364,633 journal articles were added to PubMed between 2004 and 2013 (a 48.9% increase from 2003). Recently launched journals showed the greatest increase in published articles, but older journals contributed the greatest number of articles. The observed growth was mainly attributable to articles to which no other PubMed publication type was assigned. Articles available in both print and electronic form increased substantially (61.1%). Both open (80.8%) and subscription access (54.7%) articles increased significantly. 
Funding from non-US government sources also contributed significantly (74.5%). Asian (114%) and European (34.9%) first-author affiliations increased at a higher rate than American ones (7.9%). English remained the predominant language of publication. USA- and England-based organizations published a gradually increasing body of literature. Open access, non-US government funding and Asian origin of the first author were the factors contributing to literature growth as depicted in PubMed. A better assignment of publication types is required. (C) 2015 Elsevier Ltd. All rights reserved. Decision makers relying on web search engines in concept mapping for decision support are confronted with limitations inherent in similarity measures of relatedness proximity between concept pairs. To cope with this challenge, this paper presents a research model for augmenting concept maps on the basis of a novel method of co-word analysis that utilizes webometric web counts to improve similarity measures. Technology assessment serves as a use case to demonstrate and validate our approach for a spectrum of information technologies. Results show that the yielded technology assessments are highly correlated with subjective expert assessments (n = 136; r > 0.879), suggesting that it is safe to generalize the research model to other applications. The contribution of this work is emphasized by the current growing attention to big data. (C) 2015 Elsevier Ltd. All rights reserved. An elite segment of the academic output gap between Denmark and Norway was examined using harmonic estimates of publication credit for contributions to Science and Nature in 2012 and 2013. Denmark still leads, but the gap narrowed in 2013 as Norway's credit increased 58% while Denmark's credit increased only 5.4%, even though Norway had 36% fewer, and Denmark 40% more, coauthor contributions than in 2012. 
Concurrently, the credit produced by the least productive half of the contributions rose more than tenfold, from 0.9% to 10.1%, for Norway, but dropped from 7.2% to 5.7% for Denmark. Overall, contributory inequality, as measured by the Gini coefficient, fell from 0.78 to 0.51 for Norway but rose from 0.63 to 0.68 for Denmark. Neither the narrowing of the gap nor the positive association between reduced contributory inequality and increased credit was detected by conventional metrics. Conventional metrics are confounded by equalizing bias (EqB), which favours small contributors at the expense of large contributors, and which carries an element of reverse meritocracy and systemic injustice into bibliometric performance assessment. EqB was corrected by using all relevant byline information from every coauthored publication in the source data. This approach demonstrates the feasibility of using EqB-corrected publication credit in gap assessment at the national level. (C) 2015 The Author. Published by Elsevier Ltd. A fundamental problem in citation analysis is the prediction of the long-term citation impact of recent publications. We propose a model to predict a probability distribution for the future number of citations of a publication. Two predictors are used: the impact factor of the journal in which a publication has appeared and the number of citations a publication has received one year after its appearance. The proposed model is based on quantile regression. We employ the model to predict the future number of citations of a large set of publications in the field of physics. Our analysis shows that both predictors (i.e., impact factor and early citations) contribute to the accurate prediction of long-term citation impact. We also analytically study the behavior of the quantile regression coefficients for high quantiles of the distribution of citations. This is done by linking the quantile regression approach to a quantile estimation technique from extreme value theory. 
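The Gini coefficient used above as the measure of contributory inequality can be computed directly from individual credit shares via the mean absolute difference; a minimal sketch (the sample shares are invented):

```python
def gini(values):
    """Gini coefficient: mean absolute difference over twice the mean
    (0 = perfectly equal, approaching 1 = fully concentrated)."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(a - b) for a in values for b in values) / (n * n)
    return mad / (2 * mean)

print(gini([1, 1, 1, 1]))             # 0.0: equal contributions
print(gini([0.9, 0.05, 0.03, 0.02]))  # about 0.66: highly concentrated credit
```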
Our work provides insight into the influence of the impact factor and early citations on the long-term citation impact of a publication, and it takes a step toward a methodology that can be used to assess research institutions based on their most recently published work. (C) 2015 Elsevier Ltd. All rights reserved. The production and impact of male and female authors in Poland over the period 1975-2014 have been studied. The method is based on a special property of Polish last names, namely that several popular last names have separate masculine (-ski, -cki) and feminine (-ska, -cka) forms. In this respect Polish differs from most other languages, in which the last name has only one form independent of gender. A set of 56,634 unique publications of authors bearing one of the 26 most popular -ski or -cki names was analyzed. Male dominance was observed over the entire studied period, yet it became systematically less pronounced over the period 1995-2014, especially in terms of production. (C) 2015 Elsevier Ltd. All rights reserved. This paper compares Fractional, Geometric, Arithmetic, Harmonic, and Network-Based schemes for allocating coauthorship credit. Each scheme is operationalized to be flexible in producing credit distributions by changing parameters, and to incorporate the special situation in which the first and corresponding authors are assigned equal credit. To test each scheme, empirical datasets from economics, marketing, psychology, chemistry, and medicine were collected, and the error with which each scheme approximates the empirical data was measured. Results show that the Harmonic scheme performs best overall, contrary to some claims of preceding studies in support of Harmonic or Network-Based models. The performance of a scheme, however, seems to depend heavily on the empirical datasets and the flexibility of the scheme, not on its innate features. This study suggests that comparisons of coauthorship credit allocation schemes should be treated with care. 
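Of the schemes compared above, harmonic counting has a simple closed form: the i-th of n authors receives (1/i) divided by the n-th harmonic number, so shares decline with byline position and sum to one. A minimal sketch of the basic scheme (without the equal first/corresponding author variant):

```python
def harmonic_credit(n_authors):
    """Harmonic counting: author i of n gets (1/i) / (1/1 + 1/2 + ... + 1/n)."""
    denom = sum(1.0 / j for j in range(1, n_authors + 1))
    return [(1.0 / i) / denom for i in range(1, n_authors + 1)]

credits = harmonic_credit(4)
print([round(c, 3) for c in credits])  # [0.48, 0.24, 0.16, 0.12]
print(round(sum(credits), 3))          # 1.0: shares sum to one
```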
(C) 2015 Elsevier Ltd. All rights reserved. A review is presented of the relation between information and entropy, focusing on two main issues: the similarity of the formal definitions of physical entropy, according to statistical mechanics, and of information, according to information theory; and the possible subjectivity of entropy considered as missing information. The paper updates the 1983 analysis of Shaw and Davis. The difference in the interpretations of information given respectively by Shannon and by Wiener, significant for the information sciences, receives particular consideration. Analysis of a range of material, from literary theory to thermodynamics, is used to draw out the issues. Emphasis is placed on recourse to the original sources, and on direct quotation, to attempt to overcome some of the misunderstandings and oversimplifications that have occurred with these topics. Although it is strongly related to entropy, information is neither identical with it, nor its opposite. Information is related to order and pattern, but also to disorder and randomness. The relations between information and the interesting complexity, which embodies both patterns and randomness, are worthy of attention. This article introduces the Multidimensional Research Assessment Matrix of scientific output. Its base notion holds that the choice of metrics to be applied in a research assessment process depends on the unit of assessment, the research dimension to be assessed, and the purposes and policy context of the assessment. An indicator may be highly useful within one assessment process, but less so in another. For instance, publication counts are useful tools to help discriminate between those staff members who are research active and those who are not, but are of little value if active scientists are to be compared with one another according to their research performance. 
This paper gives a systematic account of the potential usefulness and limitations of a set of 10 important metrics, including altmetrics, applied at the level of individual articles, individual researchers, research groups, and institutions. It presents a typology of research impact dimensions and indicates which metrics are the most appropriate to measure each dimension. It introduces the concept of a meta-analysis of the units under assessment, in which metrics are not used as tools to evaluate individual units but to reach policy inferences regarding the objectives and general setup of an assessment process. An extensive analysis of the presence of different altmetric indicators provided by Altmetric.com across scientific fields is presented, particularly focusing on their relationship with citations. Our results confirm that the presence and density of social media altmetric counts are still very low and not very frequent among scientific publications, with 15%-24% of the publications presenting some altmetric activity, concentrated among the most recent publications, although their presence is increasing over time. Publications from the social sciences, humanities, and the medical and life sciences show the highest presence of altmetrics, indicating their potential value and interest for these fields. The analysis of the relationships between altmetrics and citations confirms previously claimed positive correlations, but these are relatively weak, supporting the idea that altmetrics do not reflect the same kind of impact as citations. Also, altmetric counts do not always provide better filtering of highly-cited publications than journal citation scores. Altmetric scores (particularly mentions in blogs) are able to identify highly-cited publications with higher levels of precision than journal citation scores (JCS), but they have a lower level of recall. 
The value of altmetrics as a complementary tool of citation analysis is highlighted, although more research is suggested to disentangle the potential meaning and value of altmetric indicators for research evaluation. Here, we develop a theory of the relationship between the reviewer's effort and bias in peer review. From this theory, it follows that journal editors might employ biased reviewers because they shirk less. This creates an incentive for the editor to use monitoring mechanisms (e.g., associate editors supervising the peer review process) that mitigate the resulting bias in the reviewers' recommendations. The supervision of associate editors could encourage journal editors to employ more extreme reviewers. This theory helps to explain the presence of bias in peer review. To mitigate shirking by a reviewer, the journal editor may assign biased referees to generate information about the manuscript's quality and subject the reviewer's recommendations to supervision by a more aligned associate editor. With the increasing importance of social media in people's lives, more mobile applications have incorporated features to support social networking activities. These applications enable communication between people, using features such as chatting and blogging. There is, however, little consideration of the collaboration between people during information seeking. Mobile applications should support the seeking, sharing, confirming, and validating of information systematically to help users complete their tasks and fulfill their information needs. To support information seeking, especially collaboratively as a group, there is a need to understand people's social interaction behavior. Using tourism as a domain, we conducted a diary study to look into tourists' social interaction during information seeking. 
Here we present the social collaboration patterns between tourists and the people around them. Further, based on the diary study findings and current research, we describe a set of triggers that lead to collaboration for each step in the BIG6 information-seeking process. The success or failure of social media is highly dependent on the active participation of its users. In order to examine the influential factors that inspire dynamic and eager participation, this study investigates what motivates social media users to share their personal experiences, information, and social support with anonymous others. A variety of information-sharing activities in social media, including creating postings, photos, and videos in 5 different types of social media: Facebook, Twitter, Delicious, YouTube, and Flickr, were observed. Ten factors: enjoyment, self-efficacy, learning, personal gain, altruism, empathy, social engagement, community interest, reciprocity, and reputation, were tested to identify the motivations of social media users based on reviews of major motivation theories and models. Findings from this study indicate that all of the 10 motivations are influential in encouraging users' information sharing to some degree and strongly correlate with one another. At the same time, motivations differ across the 5 types of social media, given that they deliver different information content and serve different purposes. 
Understanding such differences in motivations could benefit social media developers and those organizations or institutes that would like to use social media to facilitate communication among their community members; appropriate types of social media could be chosen that would fit their own purposes and they could develop strategies that would encourage their members to contribute to their communities through social media. In this article we argue that a genre-theoretical approach to information history can contribute to our understanding of what has historically been conceived of as information, what sort of networks and activities triggered the production and use of information, and what forms information was presented and communicated in. Through 2 case studies we show how information and the genres used for communicating that information were perceived and used by the relevant agents involved with the genres. Based on the case studies, we conclude by discussing how the fields of information history and rhetorical genre theory can inform each other. The quality of online health information for consumers has been a critical issue that concerns all stakeholders in healthcare. To gain an understanding of how quality is evaluated, this systematic review examined 165 articles in which researchers evaluated the quality of consumer-oriented health information on the web against predefined criteria. It was found that studies typically evaluated quality in relation to the substance and formality of content, as well as to the design of technological platforms. Attention to design, particularly interactivity, privacy, and social and cultural appropriateness is on the rise, which suggests the permeation of a user-centered perspective into the evaluation of health information systems, and a growing recognition of the need to study these systems from a sociotechnical perspective. 
Researchers used many preexisting instruments to facilitate evaluation of the formality of content; however, only a few were used in multiple studies, and their validity was questioned. The quality of content (i.e., accuracy and completeness) was always evaluated using proprietary instruments constructed based on medical guidelines or textbooks. The evaluation results revealed that the quality of health information varied across medical domains and across websites, and that the overall quality remained problematic. Future research is needed to examine the quality of user-generated content and to explore opportunities offered by emerging new media that can facilitate the consumer evaluation of health information. Faced with highly competitive and dynamic environments, organizations are increasingly investing in technologies that provide them with new options for structuring work. At the same time, firms are increasingly dependent on employees' willingness and ability to make sense of novel tasks, problems, and rapidly changing situations. Yet, in spite of its importance, the impact of technology-enabled distributed work arrangements on sensemaking behavior is largely unknown. Sensemaking remains something that is perceived by many to be an idiosyncratic behavior that is, at best, loosely related to sociotechnical context and culture. Drawing on previous studies of cognitive dispositions (need for cognition, tendency for decisiveness, intolerance for ambiguity, and close-mindedness) and research on how technology-enabled distributed work arrangements affect interpersonal interaction, we theorize how workgroup geographic distribution interacts with individual cognitive differences to affect employees' willingness to engage in the core sociocognitive activities of sensemaking. 
Our results show that the consequences of individual tendencies can vary under different work arrangements, suggesting that managers seeking to facilitate sensemaking activities must make careful choices about the composition of distributed work groups, as well as how collaboration technologies can be used to encourage sensemaking behaviors. One of the most significant contemporary technological trends is institutional adoption and use of mobile and location-based systems and services. We argue that the notion of location as it manifests itself in location-based systems is being produced as an object of exchange. Here we are specifically concerned with what happens to institutional roles, power relationships, and decision-making processes when a particular type of information, that of the spatiotemporal location of people, is made into a technologically tradable object through the use of location-based systems. We examine the introduction of GPS (Global Positioning Systems) technologies by the California criminal justice system and the institution of parole for monitoring the movements of parolees, with consequences both for the everyday lives of these parolees and the work practices of their parole officers. We document the ways in which broad adoption of location-based and mobile technologies has the capacity to radically reconfigure the spatiotemporal arrangement of institutional processes. The presence of digital location traces creates new forms of institutional accountability, facilitates a shift in the understood relation between location and action, and necessitates new models of interpretation and sense making in practice. Measures of semantic relatedness have been used in a variety of applications in information retrieval and language technology, such as measuring document similarity and cohesion of text. 
Definitions of such measures have ranged from using distance-based calculations over WordNet or other taxonomies to statistical distributional metrics over document collections such as Wikipedia or the Web. Existing measures do not explicitly consider the domain associations of terms when calculating relatedness: This article demonstrates that domain matters. We construct a data set of pairs of terms with associated domain information and extract pairs that are scored nearly identical by a sample of existing semantic-relatedness measures. We show that human judgments reliably score those pairs containing terms from the same domain as significantly more related than cross-domain pairs, even though the semantic-relatedness measures assign the pairs similar scores. We provide further evidence for this result using a machine learning setting by demonstrating that domain is an informative feature when learning a metric. We conclude that existing relatedness measures do not account for domain in the same way or to the same extent as do human judges. In this article we examine 2 classic stochastic models of the accumulation of citations introduced by H.A. Simon and Derek John de Solla Price. These models each have 2 distinct aspects: growth, which is the introduction of new articles, and preferential attachment, which describes how established articles accumulate new citations. The attachment rules are the subtle portion of these models that supply the interesting explanatory power. Previous treatments included both aspects. Here we separate preferential attachment from the growth aspect of the model. This separation allows us to examine the results of the preferential attachment rules without confounding these with growth in the number of articles available to receive citations. We introduce the method using Markov chains. We show how to overcome the mathematical and computational complexity to obtain results. 
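The separation of preferential attachment from growth described above can be illustrated with a minimal simulation over a fixed set of articles. This is a sketch of a generic rich-get-richer rule (each new citation goes to article i with probability proportional to its current count plus an offset), not the exact Simon or Price variants compared in the article, and not the authors' Markov-chain computation:

```python
import random

def attach_citations(counts, n_new, offset=1.0, rng=None):
    """Distribute n_new citations over a FIXED set of articles (no growth):
    each citation goes to article i with probability proportional to
    counts[i] + offset. This is a generic preferential-attachment rule;
    the Simon and Price variants differ in their details."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    counts = list(counts)
    for _ in range(n_new):
        weights = [c + offset for c in counts]
        i = rng.choices(range(len(counts)), weights=weights)[0]
        counts[i] += 1
    return counts

# Ten articles, no initial citations: a rich-get-richer skew emerges
# without any growth in the number of articles.
final = attach_citations([0] * 10, 200)
print(sorted(final, reverse=True))
```

Because the article population is fixed, any skew in the final counts comes from the attachment rule alone, which is the point of separating the two aspects.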
A comparison of Simon's and Price's rules is computed in 3 Journal Citation Reports subject categories using articles published in the 1960s and allowed to accumulate citations to 1980. This comparison cannot be made through analysis of power laws. In 2014, Thomson Reuters published a list of the most highly cited researchers worldwide. Because the data are freely available for downloading and include the names of the researchers' institutions, we produced a ranking of the institutions on the basis of the number of highly cited researchers per institution. This ranking is intended to be a helpful amendment of other available institutional rankings. Whether one should use a public e-mail account (e.g., Gmail, Yahoo!) or an institutional one (e.g., @wsiz.rzeszow.pl, @medicine.ox.ac.uk) as an address for correspondence is an important aspect of scientific communication. Some authors consider that public e-mail services are unprofessional and insecure, whereas others say that, in a dynamically changing working environment, public e-mail addresses allow readers to contact authors long after they have changed their workplace. To shed light on this issue, we analyzed how often authors of scientific papers provided e-mail addresses that were either public or institution based. We selected from the Web of Science database 1,000 frequently cited and 1,000 infrequently cited articles (all of the latter were noncited articles) published in 2000, 2005, and 2010, and from these we analyzed 26,937 e-mail addresses. The results showed that approximately three fourths of these addresses were institutional, but there was an increasing trend toward using public e-mail addresses over the period studied. No significant differences were found between frequently and infrequently cited papers in this respect. Further research is now needed to assess the motivations and perceptions of scholars when it comes to their use of either public or institutional e-mail accounts. 
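The kind of public-versus-institutional classification used in the e-mail study above can be sketched by labeling each address by its domain. The provider list is illustrative, not the study's actual list:

```python
# Sketch: classify an e-mail address as "public" or "institutional"
# by its domain. The provider set below is an illustrative assumption.

PUBLIC_PROVIDERS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def classify_email(address):
    """Return 'public' if the domain is a known free provider,
    'institutional' otherwise."""
    domain = address.rsplit("@", 1)[-1].lower()
    return "public" if domain in PUBLIC_PROVIDERS else "institutional"

print(classify_email("author@gmail.com"))          # public
print(classify_email("author@medicine.ox.ac.uk"))  # institutional
```

In practice such a study would need a much longer curated provider list, since free providers vary by country.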
We present a novel routine, namely medlineR, based on the R language, that allows the user to match data from Medline/PubMed with records indexed in the ISI Web of Science (WoS) database. The matching allows exploiting the rich and controlled vocabulary of medical subject headings (MeSH) of Medline/PubMed with additional fields of WoS. The integration provides data (e.g., citation data, lists of cited references, lists of the addresses of authors' host organizations, WoS subject categories) to perform a variety of scientometric analyses. This brief communication describes medlineR, the method on which it relies, and the steps the user should follow to perform the matching across the two databases. To demonstrate the differences from Leydesdorff and Opthof (Journal of the American Society for Information Science and Technology, 64(5), 1076-1080), we conclude this article by testing the routine on the MeSH category Brugada syndrome. Rule-based and corpus-based machine translation (MT) have coexisted for more than 20 years. Recently, boundaries between the two paradigms have narrowed and hybrid approaches are gaining interest from both academia and businesses. However, since hybrid approaches involve the multidisciplinary interaction of linguists, computer scientists, engineers, and information specialists, understandably a number of issues exist. While statistical methods currently dominate research work in MT, most commercial MT systems are technically hybrid systems. The research community should investigate the benefits and questions surrounding the hybridization of MT systems more actively. This paper discusses various issues related to hybrid MT including its origins, architectures, achievements, and frustrations experienced in the community. It can be said that both rule-based and corpus-based MT systems have benefited from hybridization when effectively integrated. 
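The cross-database record matching that medlineR performs can be sketched in language-agnostic terms (the actual routine is written in R; the record layout and field names below are illustrative assumptions, not medlineR's schema). A common strategy is to match on DOI when available and fall back to a normalized title plus publication year:

```python
# Sketch: match bibliographic records across two databases by DOI,
# falling back to normalized title + year. Field names are illustrative.

def norm_title(title):
    """Lowercase and strip non-alphanumerics so minor formatting
    differences do not block a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def match_records(medline, wos):
    """Return (medline_record, wos_record) pairs for matched records."""
    wos_by_doi = {r["doi"]: r for r in wos if r.get("doi")}
    wos_by_key = {(norm_title(r["title"]), r["year"]): r for r in wos}
    matches = []
    for m in medline:
        hit = wos_by_doi.get(m.get("doi")) or \
              wos_by_key.get((norm_title(m["title"]), m["year"]))
        if hit:
            matches.append((m, hit))
    return matches

medline = [{"title": "A Study of X", "year": 2010, "doi": "10.1000/x1"}]
wos = [{"title": "A study of X", "year": 2010, "doi": "10.1000/x1"}]
print(len(match_records(medline, wos)))  # 1
```

Once matched, each pair can carry MeSH terms from the Medline side and citation fields from the WoS side, which is the integration the routine enables.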
In fact, many of the current rule/corpus-based MT approaches are already hybridized, since they do include statistics/rules at some point. This paper presents a bibliometric analysis of the Turkish software engineering (SE) community (researchers and institutions). The bibliometric analysis has been conducted based on the number of papers published in software-engineering-related venues and indexed in the Scopus academic search engine until the year 2014. According to the bibliometric analysis, the top-ranked institution is Middle East Technical University, and the top-ranked scholar is Ayse Basar Bener (formerly with Bogazici University and now with Ryerson University in Canada). The analysis reveals other important findings and presents a set of implications for the Turkish SE community and stakeholders (e.g., funding agencies and decision makers), such as the following: (1) Turkey produces only about 0.49 % of the worldwide SE knowledge, as measured by the number of papers in Scopus, which is unfortunately negligible. To take a more active role in the global SE community, the Turkish SE community has to increase its output. (2) We notice a lack of diversity in the general SE spectrum; e.g., we noticed very little focus on requirements engineering, software maintenance and evolution, and architecture. This denotes the need for further diversification of SE research topics in Turkey. (3) In total, 89 papers in the pool (30.8 % of the total) are internationally-authored SE papers. Having a good level of international collaboration is a good sign for the Turkish SE community. The highest international collaborations have been with researchers from the United States, Canada, and the Netherlands. (4) In general, the involvement of industry in SE research is low. 
All stakeholders (e.g., government, industry, and academia) should aim at increasing the level of industry-academia collaboration in the Turkish SE community. (5) Citations to Turkish SE papers are, in general, significantly lower than those of a set of three representative pools of SE papers. This is an area of concern which needs further investigation. (6) In general, there is a need to increase both the quantity and quality of Turkish SE papers on the global stage. The approach we use in this study could be replicated in other countries to provide insights and trends about SE research performance elsewhere. To understand the development of Ocean Engineering research both in China and worldwide, a bibliometric study based on the Science Citation Index-Expanded has been made. The research in this field has lasted for nearly half a century. The period from 2010 to 2014 was one of rapid development for China, whose annual output had become close to that of the USA by 2014. Though abundant in output, China did not perform equally well in terms of citation counts and h-index. Dalian University of Technology is the only Chinese institute that has ranked among the top ten productive institutes in the world. The journal China Ocean Engineering has published the majority of the articles produced by China, and it is the only journal published by China in the field of Ocean Engineering. To give insight into China's smaller impact in the field, this study compares China's collaborative relationships in detail at both the country and institute levels. This article aims to explore the evolution of social network research in marketing by analyzing the co-occurrence index and network structures of keywords. 
We find that the number of articles whose titles refer to social networks within 19 marketing journals and 9 UTD (UT Dallas list of top journals) management journals has increased significantly since 2010, and the number of keywords whose frequency is no less than two has also grown dramatically. The keyword network structures for 2010-2014 have become more dispersed, with most keywords' centralities between 0.32 and 0.63, and more keywords have strong relationships (higher cosine index) with "social networks" or "networks" than in 2001-2009. We also conclude that social network analysis has mainly been applied to study five subfields: relationships, diffusion, influence, customer analysis, and enterprise management. Since mobile internet, intelligent devices, new media, and digital technology are developing rapidly, social networks will be a powerful tool for studying the related research fields. This paper investigates Vietnam's scientific publications between 1996 and 2013 from the Scopus database, focusing on international collaboration. The total scientific output of the country increased by about 16 publications per year during 1996-2001 and then grew quickly, by about 20 % per year, during 2002-2013. However, the share of international collaboration was about 77 % of the total output. Biological and agricultural science and medicine dominated the total output, but 80-90 % of these publications are from international collaboration. In contrast, mathematics is the only field in which domestic output is larger than collaborative output. Japan is the largest collaborating country, followed by the United States, France, South Korea, and the United Kingdom. Analyzing the titles of publications with these collaborating countries, we found a high frequency of the words "Vietnam" or "Vietnamese". This result suggested that many study subjects of these research collaborations were from Vietnam. 
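The cosine index used above to measure the strength of association between keywords is typically defined as the co-occurrence count divided by the geometric mean of the two keywords' frequencies. A minimal sketch, with illustrative counts rather than the study's data:

```python
import math

# Sketch of the cosine index for keyword association:
# cosine(a, b) = cooccurrence(a, b) / sqrt(freq(a) * freq(b)).
# The counts below are illustrative.

def cosine_index(cooc, freq_a, freq_b):
    """Cosine index of two keywords given their co-occurrence count
    and individual occurrence frequencies."""
    return cooc / math.sqrt(freq_a * freq_b)

# Suppose "social networks" appears in 40 articles, another keyword in 10,
# and they co-occur in 8 articles:
print(round(cosine_index(8, 40, 10), 2))  # 0.4
```

The index ranges from 0 (never co-occur) to 1 (always co-occur), so a threshold on it separates strongly from weakly associated keyword pairs.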
Furthermore, the corresponding authors of these research collaborations are mainly from the collaborating countries, which suggests that these collaborations were mainly led by foreign authors. Although the total output increased quickly, especially the collaborative output, Vietnamese researchers should be aware of their low contribution to these collaborations. This paper presents a detailed scientometric and text-based analysis of Computer Science (CS) research output from Mexico during 1989-2014, indexed in Web of Science. The analytical characterization focuses on the origins and growth patterns of CS research in Mexico. In addition to computing the standard scientometric indicators of TP, TC, ACPP, HiCP, H-index, ICP patterns, etc., the major publication sources selected by Mexican computer scientists and the major funding agencies for CS research are also identified. The text-based analysis, on the other hand, focused on identifying the major research themes pursued by Mexican computer scientists and their trends. Mexico, ranking 35th in world CS research output during the mentioned period, is also unique in that 75 % of its total CS publications are produced by the top ten Mexican institutions alone. Similarly, Mexico has more ICP instances than the world average. The analysis presents a detailed characterization of these aspects. The purpose of this paper is to compare the quantity and quality of management articles published in international journals by authors from the four major regions of China. The results showed that both the quality and quantity of papers published in management journals from ML, HK, and TW greatly improved from 2003 to 2012. Publications from MC also increased. TW exceeded ML and HK in the accumulated factors. HK and TW had the highest mean impact factor. ML published the most papers, and the most frequently cited paper came from ML. However, publications from ML were unevenly distributed among the provinces. 
This paper presents a bibliometric analysis of articles from the Republic of Serbia in the period 2006-2013 that are indexed in the Thomson Reuters SCI-EXPANDED database. The analysis included 27,157 articles with at least one author from Serbia. The number of publications and their characteristics, collaboration patterns, and the impact of the analyzed articles on world science are discussed in the paper. The number of Serbian articles indexed by SCI-EXPANDED has increased considerably faster than the total number of articles in SCI-EXPANDED. Researchers from Serbia have been involved in research presented in several articles with significant impact on world science. Besides indicators which take into account the number of publications of a certain type and the number of citations, the analysis presented in this paper also takes authorship into consideration. The most cited of the analyzed articles have more than ten authors, but there were no first or corresponding authors from Serbia on these articles. Previous research (Chiang and Yang in Appl Econ 44(22):2827-2839, 2012) analyzed the growth of publications, the subject types, and the journal distributions of the financial risk literature from a bibliometric perspective for 1991 to 2009. Since 2008, with the growing incidence of financial risk, financial risk events have had an ever greater impact on the economy, for example, the Lehman Brothers Holdings Inc. bankruptcy, the Greek debt crisis, and the Latin American sovereign debt crisis. 
In this study, we extended previous research up to 2013 and investigate the features of the financial crisis literature based on bibliometric methods using: (1) TP: the number of "total articles" of an institution or a country; (2) SP: the number of "single country articles"; (3) CP: the number of "internationally collaborative articles"; (4) FP: the number of "first author articles"; and (5) RP: the number of "corresponding author articles". The distribution of journal articles was also examined utilizing Bradford's law and a citation model (Chiu and Ho in Scientometrics 63(1):3-23, 2005). Data were based on the Social Sciences Citation Index, from the Institute of Scientific Information Web of Science database. A total of 8485 entries from 1926 to 2013 were collected. This paper examined publication type and language, characteristics of article outputs, countries, subject categories and journals, and the frequency of title-words and keywords used. Meanwhile, the analysis indicated that the most relevant disciplines for the financial crisis subject category were economics, business finance, and political science. Information security has been a crucial issue in modern information management; thus cryptographic techniques have become indispensable to safeguard digital information assets as well as to defend against invasions of privacy in the modern information society, and are likely to have a far-reaching impact on national security policies. This paper demonstrates the intellectual development of cryptographic research based on quantifiable characteristics of scholarly publications over a decade of the present century (2001 to 2010). The study critically examines the publication growth, authorship patterns, collaboration trends, and predominant areas of research in cryptology. Rank lists of prolific contributors, productive institutions, and predominant countries have been compiled using the fractional counting method. 
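The five country-level indicators defined above (TP, SP, CP, FP, RP) can be computed mechanically from per-paper author data. A minimal sketch, where the record fields (`author_countries` in author order, `corresponding_country`) are illustrative assumptions rather than the study's actual schema:

```python
# Sketch: compute TP/SP/CP/FP/RP for one country from paper records.
# Field names are illustrative assumptions.

def country_indicators(papers, country):
    tp = sp = cp = fp = rp = 0
    for p in papers:
        countries = set(p["author_countries"])
        if country not in countries:
            continue
        tp += 1                                    # TP: total articles
        if countries == {country}:
            sp += 1                                # SP: single-country articles
        else:
            cp += 1                                # CP: internationally collaborative
        if p["author_countries"][0] == country:
            fp += 1                                # FP: first-author articles
        if p["corresponding_country"] == country:
            rp += 1                                # RP: corresponding-author articles
    return {"TP": tp, "SP": sp, "CP": cp, "FP": fp, "RP": rp}

papers = [
    {"author_countries": ["TR"], "corresponding_country": "TR"},
    {"author_countries": ["TR", "US"], "corresponding_country": "US"},
]
print(country_indicators(papers, "TR"))  # {'TP': 2, 'SP': 1, 'CP': 1, 'FP': 2, 'RP': 1}
```

Note that TP = SP + CP by construction, a useful consistency check when assembling such tables.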
Strenuous efforts have been made to compute the activity index (a performance indicator) of JOC, to determine the degree of collaboration in quantitative terms, to ascertain the collaboration density, and to test the empirical validation of Lotka's law in this scientific specialty. Major findings reveal that the performance of JOC in cryptographic research corresponds precisely to the growth of the world's publication activity (activity index = 1.1) over the decade; single-authored papers account for only 25 % and the average number of authors is 2.4 per paper; an increasing trend of multi-authored publications and a significant degree of collaboration (DC = 0.74) imply that cryptographers prefer to work in a highly collaborative manner; author productivity distribution data partially fit Lotka's law, when the value of alpha (the productivity parameter) is approximated as 2.35 (instead of 2) and the number of articles does not exceed two. While a large majority of collaborations (56 %) were formed across countries, this substantial amount of inter-country bilateral and multilateral collaboration signifies higher density or greater strength in the research network; most of the potential collaborators come from 10 institutions in 5 different countries; however, cryptographic research is dominated by the USA and Israel. More interestingly, the vast majority of the top-twenty ranked productive authors are affiliated with institutions in the USA and Israel; Yehuda Lindell is found to be the most prolific author, followed by Rosario Gennaro (USA), Tamir Tassa (Israel), Jonathan Katz (USA), etc.; Anglo-American institutions are more open than their overseas competitors; the University of California (six centers) is placed at the top of the productive institutions. 
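Two of the measures reported above have standard closed forms: the degree of collaboration is usually computed as DC = Nm / (Nm + Ns), the share of multi-authored papers, and a generalized Lotka's law states that the number of authors with x papers falls off as 1 / x**alpha relative to one-paper authors. A sketch, with counts chosen purely to illustrate the arithmetic (not the study's raw data):

```python
# Sketch of two standard bibliometric measures.

def degree_of_collaboration(multi_authored, single_authored):
    """Subramanyam's degree of collaboration: DC = Nm / (Nm + Ns)."""
    return multi_authored / (multi_authored + single_authored)

def lotka_share(x, alpha=2.0):
    """Expected share of authors with exactly x papers, relative to
    one-paper authors, under a generalized Lotka's law: f(x)/f(1) = 1/x**alpha."""
    return 1.0 / x ** alpha

# 74 multi-authored and 26 single-authored papers give DC = 0.74,
# matching the 25 % single-authored share reported above (approximately).
print(degree_of_collaboration(74, 26))      # 0.74
print(lotka_share(2, alpha=2.35))           # share of two-paper authors
```

With alpha near 2.35, roughly a fifth as many authors publish two papers as publish one, which is the steeper-than-classical drop-off the study reports.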
The study identifies distinct subject clusters (research streams), and author-assigned keyword frequencies revealed that cryptanalysis, discrete logarithm, elliptic curve, block cipher, provable security, cryptography, secure computation, oblivious transfer, public-key encryption, and zero-knowledge are the more prevalent and active topics of research in cryptology. The implications of the empirical results for the field are discussed thoroughly, and further analyses are proposed to visualize this assessment in a better way. In this study, our aim is to evaluate the global scientific publications produced in the field of food allergy over the past 14 years. The Web of Science was used as the primary database, consisting of both the Science Citation Index and the Social Sciences Citation Index. This analysis provides insight into the number of papers, journals, subjects, citations, and the overall trend of research. A total of 8327 articles published between 2001 and 2014 in over a hundred journals were considered for this study. There has been growing concern in the population as food allergies have been on an alarming rise over the past decade. To date, research has failed to find a solution for this issue other than complete avoidance of the food/ingredient in the diet. This bibliometric method would help researchers in this field to understand the landscape of food allergy research and set up future research directions. The purpose of this article is to explore the intellectual structure of the supply chain management area by employing a co-citation method. Data were collected (4652 articles and 145,242 references) from the Web of Science online database. Author productivity and 41 frequently cited articles were identified through citation and document co-citation analysis. 
The result of this study identified four core topics to supply chain management research: sustainable supply chain management, strategic competition, value of information, and development of supply chain management. This study illustrates the shift in the intellectual structure of SCM, indicating research priorities, such as sustainability in supply chain management, relationships among organisations, the bullwhip effect within the supply chain, and performance measurement in supply chain management. This study implies that managers should expend effort formulating effective performance measurements in supply chain management and share information within the supply chain to mitigate the bullwhip effect. In this study, we aim to evaluate the global scientific output of liposome research, and try to find an approach to quantitatively and qualitatively assess the current global research on liposome. Data were based on the science citation index expanded database, from the Institute of Scientific Information Web of Science database. Bibliometric method was used to analyze publication outputs, journals, countries/territories, institutions, authors, research areas, research hotspots and trends. Globally, there were 37,327 publications referring to liposome during 1995-2014. Liposome research experienced notable growth in the past two decades. The International Journal of Pharmaceutics published the largest number of liposome-related publications in the surveyed period. Major author clusters and research regions are located in the USA, Western Europe, and Asia. The USA was a leading contributor to liposome research with the largest number of publications. The Osaka Univ (Japan), Kyoto Univ (Japan), and Univ Texas (USA) were the three institutions with the largest number of liposome-related publications. Van Rooijen N (Netherlands) was a leading contributor to liposome research with the largest number of publications. 
Chemistry accounts for the largest number of publications in the research area of liposomes. A keyword analysis revealed that gene, drug delivery, cell, and cancer were the research hotspots in the study period. Nanotechnology, drug delivery, small interfering RNA, and cancer therapy received dramatically increased attention during the analyzed period, possibly signaling future research trends. The bibliometric method could quantitatively characterize the development of global liposome research. Taking the keywords of Chinese theses from 1986 to 2014 as the research material, we use basic concepts from big data theory to further study the keywords related to the oil and gas industry. We analyze the keyword frequencies of theses in the oil and gas industry and their co-occurrence frequency pairs, and then use the theory of mapping knowledge domains to visualize the keyword co-occurrence network in the petroleum industry, so as to further investigate the hot issues that the mapping reveals. The research shows that applied technology R&D (research and development) predominates in the oil and gas industry, featuring a high concentration together with a long-tail phenomenon (i.e., many studies focus on a wide variety of topics, and the overall scale of the research is large). This paper describes a study investigating the performance of Electromagnetic Fields (EMF) research using bibliometric analysis covering the period 2003-2013. The study focuses on the distribution and growth of publications across journals, titles, and fields over the period, and collaboration network patterns among scholars and scientists. A total of 1737 articles were gathered from the IEEE ICES EMF Database. Among these, 29,047 citations were reported across 432 journal titles. The most cited journal title, and the one with the greatest number of publications, was the journal Bioelectromagnetics. 
Most of the cited articles focused mainly on radiation risk and the biological effects of EMF. The fields of Engineering & Physics produced the highest number of articles, while Epidemiology journals showed the most outstanding performance across all fields. 95 % (1651) of the articles were identified as co-authored publications, indicating involvement in a collaborative network. Only 20 % (341) of the publications involved international collaboration, the majority of these among European-European and Europe-North American countries/regions. Bibliometrics are often used as key indicators when evaluating academic groups and individual researchers in biomedical research. Citation metrics, when used as indicators of research performance, require accurate benchmarking for homogeneous groups. This study describes the research performance of academic departments in the University of Toronto's Faculty of Medicine using article-level bibliometrics for scientific papers published from 2008 to 2012. Eligible publications of all academic faculty members were verified from each researcher's curriculum vitae and Web of Science (R) (Thomson Reuters). For 3792 researchers, we identified 26,845 unique papers with 79,502 authors published from 2008 to 2012. The overall mean citations per paper for the faculty was 17.35. The academic departments with the highest levels of collaboration and interdisciplinary research activity also had the highest research impact. The citation window for biomedical scientific papers was still active at 5 years after publication, indicating that the citation window for publications in biomedical research is active longer than previously thought, and this may hinder the reliable use of bibliometrics when evaluating recent scientific publications in biomedical research. Like all other disciplines, Library and Information Science also emphasizes research at the doctoral level. 
Since the first doctoral degree in 1957, the field grew slowly and accelerated after the IT revolution in India during the 1990s. Based on the INDCAT, Vidyanidhi, INFLIBNET and University News databases, 1058 doctoral theses were found to have been submitted at various universities in India during 1950-2012. The present study attempts to identify research trends in library management within Library and Information Science and to quantitatively analyze research activity in India, based on the doctoral theses awarded in the period 1950-2012. This work analysed 1058 PhD theses awarded by various universities in India during the last 63 years, drawing on the INDCAT, Vidyanidhi and INFLIBNET databases and University News data. Using the Science Citation Index and the Social Sciences Citation Index databases, we performed a bibliometric analysis of global studies on the digital elevation model (DEM) conducted between 1994 and 2013. We identified the authorial, institutional, international, categorical, and spatiotemporal patterns, as well as hotspots, in DEM-related research. The number of DEM-related publications has been continuously increasing since 1994. Geology, engineering, and physical geography were the most frequently used subject categories in DEM-related studies, whereas Geomorphology, International Journal of Remote Sensing, and IEEE Transactions on Geoscience and Remote Sensing were the most active journals in the field. Toutin, T., Favalli, M., and Hancock, G. R. were the most prolific authors in DEM-related research. Authors of DEM-related research were mostly located in the USA, the European Union, and East Asia. The USA, China, and the UK were the top three most productive countries. The Chinese Academy of Sciences, the largest contributor of single-institution and collaborative publications on DEM, holds a key position in collaboration networks.
The numbers of internationally and inter-institutionally collaborative articles are both increasing, with the former being more prevalent than the latter. Through keyword analysis, we observed several interesting terminology preferences, revealed the adoption of advanced technologies, and examined the patterns and underlying processes of geoscience. In summary, our study identified the quantitative, qualitative, temporal, and spatial characteristics, academic collaborations, and hotspots in scientific outputs, and provided an innovative method of revealing global trends in DEM-related research to guide future studies. In the 1980s, Morocco was viewed as a leading country for scientific research and output in North Africa. However, its scientific output underwent stagnation and even decline in the mid-2000s, relegating Morocco to the last rank among the Maghreb countries. Analysis of the effects of several major governmental decisions taken in the 2000s shows a clear impact on research activity. The major decisions that have had the most significant negative impact are: (1) the use of different scientific languages in schools and universities, (2) abolition of the State Doctorate, (3) the promotion system for professors, (4) the complex bureaucracy of project management, (5) the Voluntary Departure initiative, (6) administrative equivalence of the Master and PhD, (7) teaching pressure in universities, (8) ageing of professors and supporting administrative personnel, (9) preference given to expertise over research activity, and (10) the flight of PhD students. The tourism research literature has grown rapidly in the past few decades, yet there have been few efforts to map the panorama of global tourism research. In this work, a scientometric analysis was applied to evaluate the field's performance status and research front using knowledge domain visualization techniques.
The data consist of 17,413 tourism-themed publications retrieved from the Science Citation Index Expanded and Social Sciences Citation Index databases since 1900. The analysis covered publication outputs, document types, productive publications, frequently used languages, the countries/territories and institutions contributing English articles, and the co-citation network of references over the past decade. The cumulative record of tourism-related research has increased exponentially over the past 40 years, revealing that tourism research is in a stage of rapid development. Fourteen document types were identified, of which academic articles accounted for the largest proportion. The literature was published across a wide range of 2082 publications, of which Annals of Tourism Research and Tourism Management were in the vanguard of the most productive journals for tourism research. English is the most popular language of publication. The USA, UK and Australia were the most productive countries; New Zealand, Peoples R China, the UK and Australia formed the closest cooperation groups; and Austria, New Zealand, Japan and Peoples R China showed the most innovation. The most productive institutes included Hong Kong Polytechnic University, Griffith University, University of Queensland, University of Surrey and Texas A&M University. The co-citation cluster analysis reveals the distribution of popular topics and the representative literature of the past decade. Online tourism, consumer perceptions and behavioral intentions, tourism demand forecasting, and tourism destination competitiveness are the research frontiers of the tourism discipline. E-commerce (EC) is sweeping across the globe and has become one of the most important commercial activities. Accordingly, EC has also attracted academic research interest. Many research achievements have been gained in recent years.
This paper takes these achievements as its research object and collects 8488 research papers published in academic journals during 2000-2013 and included in the Web of Science database. Using text mining techniques, 68 terms are identified as the main keywords of the EC field. The scientific structure of EC is then mapped through multidimensional scaling, based upon the co-occurrence of the main terms in the academic journals. The results show that the EC domain is composed of three main fields: technology, management and customers. Furthermore, a knowledge graph based on the EC research network is visualized; it shows that EC research papers cover seven important subnets: internet, consumer behaviour, customer satisfaction, online shopping, reputation, Taiwan and knowledge management. In the present study, we performed a bibliometric analysis of published research papers related to the Mekong River during 1991-2012, based on the Science Citation Index Expanded. We investigated the nations, institutions, authors' research performance, citation counts, impact factors, journals and subject categories of the published papers. Citations per paper (CPP) was used as an indicator to assess the visibility of an article. The impact of countries, institutions, and authors was assessed by the h-index and the newly developed indicator TC2012. Results indicated that Vietnam had the most output and the most frequent partners, followed by Japan. The authors with high contributions were mainly from Japan. The three most active research fields on the Mekong River were environmental sciences, water resources and geoscience. The top five productive institutions were Chinese Acad Sci., Can Tho Univ., Cantho Univ., Yunnan Univ. and National Univ., while the top five institutions with high h-indices were Chinese Acad Sci., Can Tho Univ., National University Singapore, Nong Lam Univ. and International Rice Research Insti.
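The h-index used above has a simple operational definition: the largest h such that h of an entity's papers each have at least h citations. A minimal sketch, using made-up citation counts rather than data from the Mekong study:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:     # paper at this rank still clears the threshold
            h = rank
        else:
            break
    return h

# Illustrative citation counts for one author (not real data).
print(h_index([25, 8, 5, 3, 3, 1]))  # -> 3
```

Sorting descending and scanning until a paper's citation count falls below its rank is the standard O(n log n) way to compute the indicator.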
In addition, keyword analysis was applied to reveal research interests, which indicated a wide disparity in research focuses. Understanding food webs is important and useful for planning environmental conservation, management and restoration. However, research on food webs is not uniform globally; it tends to be concentrated in specific areas or ecosystem types, which hinders our understanding of food webs and ecosystem processes. This study examined trends in food web research over the past decades by analysing publication data from Web of Science; in particular, it focused on the ecosystem types studied, the countries in which the studies were done, and which countries collaborated on the studies. A total of 20,239 publications were examined. The results showed that research on food webs has increased dramatically since the 1990s. Most publications focused on aquatic ecosystems. North American and European countries contributed much more in terms of research productivity than those from Africa and South America. Collaboration among individual authors and countries has become increasingly intensive. The USA and Canada were consistently the top two productive countries and had the most frequent collaborations. Our study indicates that food webs from ecosystems other than aquatic ones, such as terrestrial ecosystems, also require more attention in the future, in particular those within countries of Africa and South America. In this paper, scientometric analysis and knowledge visualization technology were used to evaluate global scientific production and development trends in construction and building technology research for smart cities. All the data were collected from the Science Citation Index-Expanded (SCIE) database and Journal Citation Reports (JCR).
The published papers from the subject of construction and building technology, together with their journals, authors, countries and keywords spanning several research topics, show that architecture/building research grew rapidly over the past 30 years, and the trend continues in the recent smart cities era. The purposes of this study were to identify the journals in the field of construction and building technology in smart cities, make a comparative report on related research, and propose a quality evaluation of those journals. Based on JCR and SCI paper data, the journals related to construction and building technology in smart cities were assessed using ten metrics: languages, activity level, references, citation trends, main countries, leading institutes, cooperation trends, productive authors, author keywords and KeyWords Plus. The results indicate that all these factors have great significance and are related to the impact of a journal. The study also provides guidance to both evaluators and the study groups that assess journals when judging or selecting research outlets, as well as future perspectives on how to improve the impact of a paper or a journal. 2015 is the centennial of Einstein's General Relativity. On this occasion, we examine the General Relativity and Quantum Cosmology (GRQC) field of research by analysing 38,291 papers uploaded to the electronic archive arXiv.org from 2000 to 2012. We establish a map of the countries contributing to GRQC in 2012. We determine the main journals publishing GRQC papers and which countries publish in which journals. We find that more and more papers are written by groups of authors (rather than single authors), with more and more international collaborations. There are huge differences between countries: Russia is the country where most papers are written by single authors, whereas Canada is one of the countries where most papers involve international collaborations.
We also study author mobility, determining how some groups of authors spread worldwide over time across different countries. The largest mobility flows are between the USA and the UK and between the USA and Germany. The countries attracting the most GRQC authors are the Netherlands and Canada, whereas those undergoing a brain drain are Italy and India. There is little mobility between Europe and Asia, in contrast to the mobility between the USA and Asia. Severe acute respiratory syndrome (SARS) and human infection with H7N9 influenza are emerging infectious diseases with relatively high mortality. Epidemics of each began in China. By searching the Science Citation Index, this study analyzed the article literature on SARS and H7N9 influenza, particularly papers in the leading journals The Lancet, New England Journal of Medicine (NEJM), Nature and Science. The results show that the quantity and quality of SARS and H7N9 influenza literature from mainland China changed distinctly over the course of 10 years. Researchers from mainland China published 12 articles on H7N9 influenza in The Lancet, NEJM, Nature and Science, whereas mainland China had only 2 articles about SARS in the same journals. The literature reflects China's growing strength in the science and technology of emerging infectious diseases. An increasing number of quantitative and qualitative methods have been used for future-oriented technology analysis (FTA) to develop understanding of situations, enable creativity, engage experts, and provide interaction. FTA practitioners have frequently used one of these methods, or a suitable mixture of them, for their activities. Changing policy- and strategy-making contexts, as well as enabling technologies, have increased the need and the possibility for performing adaptive Foresight studies in order to improve decision making about the future and make better use of limited resources.
This study performs a scientometric analysis of the publications in the major FTA journals with the aim of understanding the dynamics of the use of Foresight methods over time. Among the other branches of FTA, including forecasting, futures, and technology assessment, special emphasis is given to Foresight as a systematic and inclusive way of exploring long-term futures, developing visions and formulating policies for action. The study aims at detecting the key Trends and Weak Signals regarding the use of existing methods and of emerging ones with potential uses for Foresight activities. Further implications are drawn from the generation of networks for quantitative and qualitative methods, which demonstrate the Foresight methods most frequently combined by researchers and practitioners. Where possible, the methods are also cross-fertilised with the key thematic areas to illustrate the relationships between the policy domains and industrial sectors covered by the scope of the study and methodological choice. This output is intended as a methodological guide for any researchers, practitioners or policy makers who might embark upon, or be involved in, a Foresight activity. Further outputs of the study include the identification of centres of excellence in the use of Foresight methods and of collaboration networks between countries, institutions and policy domains. Overall, the paper demonstrates how scientometric tools can be used to understand the dynamics of evolution in a research field. Thus, it provides an overview of the use of methods in Foresight and how Foresight is distinguished from other FTA activities; the evolutionary characteristics of methodological design and the factors influencing the choice of methods; and, finally, a discussion of the future potential of new cutting-edge approaches.
Objective: In recent years, with the abrupt growth in the amount of biomedical literature, many implicit laws and much new knowledge lie buried in the vast literature; text mining technology, applied to the biomedical field, can integrate and analyze massive biomedical literature data, obtaining valuable information that improves our understanding of biomedical phenomena. This paper discusses the state of research on text mining technology applied to the biomedical field over the past 10 years, in order to provide a reference for further studies. Methods: Biomedical text mining literature included in SCI from 2004 to 2013 was retrieved and filtered, and then analyzed from the perspectives of annual changes, regional distribution, research institutions, journal sources, research fields, keywords and so on. Results: The total amount of global biomedical text mining literature is on the rise; literature on named entity recognition, entity relation extraction, text categorization, text clustering, abbreviation extraction and co-occurrence analysis makes up a large percentage of it, and studies in the USA and the UK are in the leading position. Conclusion: Compared with other, more mature research topics, the application of text mining technology in biomedicine is still a relatively new research field worldwide; with constantly improving awareness of the field and deepening research in the area, a number of core research areas, core research institutes and core research fields have formed. Further research in this field will therefore inject new vitality into the development of biomedicine. In the past 15 years, haze has become one of the most pressing environmental problems in China.
This study examines China's performance in haze research using scientometric indicators such as China's global publication share, rank, growth rate and citation impact, and its publications in various areas and top journals, based on the last 15 years (2000-2014) of publication data obtained from the ISI Science Citation Index Expanded database. China's share of international collaborative papers, h-core papers and highly cited papers was also determined. The results indicated a steady rise in the number of published papers, the average citations per paper, and the presence of Chinese researchers and institutions in top journals. However, the impact and quality of the papers were far from satisfactory. There is still a large gap in impact between China and the developed (G7) countries in haze research. This study utilizes the social network analysis (SNA) technique to analyze and better understand the semantic and knowledge networks associated with the field of international trade, and more specifically international free trade agreements. The study builds on existing research in the field of international trade negotiation by using the SNA technique to construct, visualize, and investigate the networked knowledge infrastructure of international trade, analyzing 3074 publications, 1054 journal sources, 4047 authors, 1516 organizations, 87 countries and the keywords associated with this field of research. The network- and ego-level properties, such as degree centralities, density, components, structural holes, and degree distribution, suggest that the international trade co-authorship network analyzed is relatively fragmented. The knowledge and semantic networks exhibit a power-law distribution in which incoming nodes and links prefer to attach to nodes that are already well connected. The study also sheds light on emerging and fading themes in the domain.
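The network-level properties named above (degree, density, connected components) can be computed directly from an edge list without any graph library; a minimal sketch on a hypothetical co-authorship network, not the study's data:

```python
from collections import defaultdict

# Hypothetical co-authorship edges (pairs of author IDs), illustrative only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]

# Degree: number of co-authors per author.
degree = defaultdict(int)
adj = defaultdict(set)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
    adj[u].add(v)
    adj[v].add(u)

# Density of an undirected simple graph: fraction of possible ties present.
n, m = len(degree), len(edges)
density = 2 * m / (n * (n - 1))

# Connected components via depth-first search; many components with low
# density is one signature of a fragmented network.
seen, components = set(), 0
for node in adj:
    if node not in seen:
        components += 1
        stack = [node]
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(adj[x] - seen)

print(dict(degree), round(density, 2), components)
```

Here the toy network has density 0.4 and splits into two components ({A, B, C} and {D, E}), which is the kind of evidence behind a "relatively fragmented" verdict.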
The aim of this study is to conduct a retrospective bibliometric analysis of articles about traditional Chinese medicine (TCM) research in PubMed and to learn about the development and perspectives of TCM. A systematic bibliometric search was performed in the PubMed database covering related publications between January 1, 1995, and December 31, 2014. The numbers and types of articles, countries and languages of publication, and major journals were analyzed in accordance with bibliometric methodologies. The retrieval results were analyzed and described in the form of texts, tables, and graphs. A total of 42,192 articles were identified from the PubMed database, among which 43.56 % were published as original articles. The articles originated from 102 countries and territories. China ranked first with 20,121 articles, followed by the United States with 2207 articles. 57.74 % of the articles were published in English. 4364 articles were published by Zhongguo Zhong Yao Za Zhi, and complementary medicine was the most studied research area, involving 30,544 articles. The publication activity of TCM literature increased rapidly in the past 20 years, indicating enhanced attention to TCM and greater research input. In view of the great advances achieved in scientific studies, TCM will continue to play an important role in medical research. This study analyses the Indian research output on rabies, one of the most serious zoonotic diseases in India. The data for this study were downloaded from the PubMed database for the period 1950-2014. A total of 495 records were covered in the database during the study period. The literature growth, India's contribution relative to world literature output, prolific authors and their collaborative patterns, journal distribution, the most productive institutions and the geographical distribution are discussed in the study.
'Journal of the Association of Physicians of India (JAPI)' ranks first, having published 46 (9.29 %) papers during the study period. The study shows that 'Madhusudana SN' is the most productive author, with 57 (11.51 %) contributions. The study also shows the collaborative nature of rabies research in India. 'National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore' is the most productive institution in the field. Bradford's law of scattering does not apply to rabies research in India. We suggest that agencies involved in zoonotic disease research give priority to rabies research. Considering the disease burden and the increasing trend in mortality rate, the state and central governments should carry out joint research projects in this field. Research progress in membrane water treatment technology during 1985-2013 was analyzed through a bibliometric analysis of published papers in the ISI Web of Science Core Collection (Science Citation Index-Expanded, SCI-E) database and of patents in the Derwent Innovation Index database. The retrieved data (18,075 SCI papers and 19,182 patents) were statistically analyzed by time period, country, research institution, mainstream technology and application industry. The results indicated that the numbers of papers and patents on membrane water treatment technology showed an increasing trend. The United States and China published the most SCI papers and patents in this field, respectively. China has become one of the major countries in this field, but its research level should be further improved. The research levels of the United States, Germany and France were at the international leading level in this field. Japan published a small quantity of SCI papers, but possessed many patents and ranked in second place. Moreover, the top 10 patentees all belonged to Japanese companies or enterprises.
The mainstream technologies in the field of membrane water treatment included microfiltration, ultrafiltration, nanofiltration, reverse osmosis, electrodialysis, pervaporation and liquid membranes. Among these technologies, ultrafiltration, reverse osmosis and nanofiltration were the research hotspots in SCI papers, while reverse osmosis and liquid membranes were the research hotspots in patents. These mainstream technologies were mainly used for the treatment of drinking water and the desalination of seawater or brackish water, and it is necessary to promote the application of membrane technologies in industrial wastewater treatment. The benthic community ecology of marine and coastal habitats has recently been faced with the challenge of needing a predictive model to anticipate the responses of these natural communities to environmental impacts. This challenge forces the use of quantitative methods to conduct more predictive science. This work focuses on multivariate quantitative methods applied to community ecological problems. A survey was conducted in the Science Citation Index using combined keywords reflecting multivariate quantitative methods, benthic assemblages, and marine and coastal habitats. There has been analytical inertia in this research field: the most commonly used methods have not changed over the years, and novel methods developed inside and outside of ecology have not been included in the analytical toolkit of marine benthic ecologists. Methods that are increasing the predictive power of freshwater benthic ecology, such as machine learning, have not been used in the benthic community ecology of marine and coastal habitats. The aim of this work is to present a bibliometric analysis of the anticancer research literature based on data from the Web of Science. Anticancer drug research references published from 2000 to 2014 were used.
CiteSpace software was employed to generate knowledge maps of countries/institutions, cited authors, cited journals, co-words and cited references related to anticancer drug research. The results of this analysis indicated that the USA is the most productive country and the Chinese Acad. Sci. is the most productive institution in this field. Maeda H is the most influential author, leading the most highly cited author group. CANCER RES is the most cited journal, in which the most influential anticancer drug research articles were published. Mosmann's (1983) paper is a representative and symbolic reference, with the highest co-citation count of 146 (centrality 0.29). Five hot anticancer drug research topics were also identified: (1) chemotherapy drugs, (2) drug delivery, (3) bioscreening, (4) drug resistance research, and (5) enzyme inhibitor studies. The research frontiers identified included drug delivery with nanoparticles, controlled release, metabolism and so on. These analyses will be valuable for readers seeking an overall picture of anticancer research and of research trends during these years. The objective of this paper was to identify the intellectual profile of the Ebola research specialty and its behavior from its inception to 2014. This objective was met by chronologically mapping the information flows within the specialty, using bibliometric and citation data extracted from 1638 Ebola research documents in conjunction with HistCite to produce an algorithmic historiography representing a view of the Ebola specialty's intellectual profile and evolution. The present study was guided by the following research questions. What is the bibliometric profile of the Ebola specialty in terms of publication output and the impact of its authors, journals, institutions, countries, and years? What influential Ebola research has been produced since the disease's discovery, and how has the research evolved?
The most significant results show the Ebola specialty citation network to be a small-world and highly cohesive network. The network was found to be symmetrical in structure and segmented into four distinct cliques representing specific research focuses (i.e., uncovering divergent strains; immune responses and vaccines; Ebola's pathogenesis; and Ebola's molecular structure and physiology). Key authors and contributing journals were identified. The most substantial contributions to the specialty came from government and academia. The Ebola specialty had a slow publication output and oscillating citation activity for the first few decades, coinciding with several outbreaks. The greatest production of Ebola research articles occurred after 2000, along with exponential citation behavior. As an indigenous concept of China, guanxi has gained broad acceptance in the West without further translation or interpretation during the past 30 years, which makes it a fascinating topic to investigate. In order to discover the dissemination mechanism of the concept of guanxi, this study employs co-citation analysis and citation analysis to visualize the intellectual base and the evolution of guanxi studies; it also pinpoints the most productive authors, institutions and journals. The findings may provide some guidance for the dissemination of scientific knowledge. This study examines the effect of technology availability on traditional and evolving learning research output and trends using bibliometric tools of analysis. Exponential growth in education and learning research output occurred as of the first half of the 1990s, with the introduction of the World Wide Web. Rather than becoming an integral part of learning research, the support of network technologies for learning has grown into a research stream separate from traditional research areas such as formal and informal learning.
It is affiliated with the sciences, including the medical field, more strongly than with the natural home of educational studies, the social sciences. Keyword analysis indicates terms of broad interest, yet their occurrence in the research publications shows the divergence between the traditional learning and technology-based research streams. The community of technology-assisted learning research is undergoing evolution. We provide recommendations to promote a more cohesive research community, better able to navigate a borderless digital world where learning occurs formally and informally. This case study attempts to identify ways to achieve maximum citations. In this study, the top 50 medical institutes were selected through the Scopus database on the basis of the performance of their research output during 2003-2012. The main objective was to identify the factors affecting the citations of health professionals in India and how to make them aware of ways to increase their positive impact, which will help them improve their citation counts. The findings show that publications produced with international collaboration are published in international journals more easily and receive more citations than publications in Indian journals. A positive relationship between highly cited papers and international collaborative papers was established. India has also shown good growth in medical literature produced with international collaboration. A bibliometric analysis was conducted to assess the trend of microbial risk assessment publications indexed in Scopus from 1973 to 2015. The study analyzed the distribution of languages, countries, journals, author keywords, authorship patterns, and co-authorship relationships. An exponentially increasing trend (R-2 = 0.98) of 14.72 % growth in article production per year was seen from 1973 to 2015. Risk Anal., Int. J. Food Microbiol. and J.
Water Health published the most papers. The United States, with 618 articles (78.23 %), the Netherlands, with 151 articles (19.11 %), and Australia, with 135 articles (17.09 %), played active roles in publication. Ashbolt NJ (3.16 %) from the University of Alberta and Haas CN (3.16 %) from Drexel University were the most productive authors in this field. English was the dominant language of publication (94.18 %). The analysis of author keywords revealed that foods and drinking water are the most important environmental media related to the transmission of microbial contamination. The upward trend in the number of articles appears set to continue. Given the importance of microbial risk assessment, this trend reflects increasing attention to the issue among scientists. It is hoped that transferring the experience of developed countries in this field to less-developed countries may increase the number of studies in this area. This study could serve as a basis for a better understanding and development of research related to microbial risk assessment worldwide. This study examined subject and research method trends in the educational technology field from 2002 to 2014. Content analysis was applied to 1255 articles published in the BJET and ETR&D journals, using the Educational Technology Papers Classification Form. According to the results, learning approaches/theories and learning environments were the subjects most preferred by researchers. The most commonly used research methods were quantitative, qualitative, other (review or meta-analysis), and mixed methods, in that order. Researchers tended to use questionnaires, documents, and interviews as data collection tools. The most commonly preferred sample type was the purposive sample, undergraduate students were the most commonly chosen sample group, and the most common sample size was 31-100.
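An annual growth rate with an R² of the kind reported for the microbial risk assessment corpus is typically estimated with a log-linear least-squares fit of publication counts against year. A sketch on synthetic counts grown at exactly 15 % per year (the figures below are illustrative, not the study's data):

```python
import math

# Synthetic annual publication counts growing at exactly 15 % per year.
years = list(range(2000, 2010))
counts = [100 * (1.15 ** (y - 2000)) for y in years]

# Log-linear least squares: ln(count) = a + b*year; growth rate = e^b - 1.
xs = years
ys = [math.log(c) for c in counts]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
growth_rate = math.exp(b) - 1

# R^2 of the log-linear fit.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(round(growth_rate, 4), round(r2, 4))  # -> 0.15 1.0
```

With real, noisy counts the fitted R² drops below 1, which is exactly what a value like 0.98 summarizes: how well a constant-percentage growth model explains the observed trend.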
Frequencies, percentages, and tables were the most common presentation formats for data in quantitative studies, while qualitative studies most often employed content analysis. The relationship between information and complexity is analyzed using a detailed literature analysis. Complexity is a multifaceted concept, with no single agreed definition. There are numerous approaches to defining and measuring complexity and organization, all involving the idea of information. Conceptions of complexity, order, organization, and interesting order are inextricably intertwined with those of information. Shannon's formalism captures information's unpredictable creative contributions to organized complexity; a full understanding of information's relation to structure and order is still lacking. Conceptual investigations of this topic should enrich the theoretical basis of the information science discipline, and create fruitful links with other disciplines that study the concepts of information and complexity. This study explores the usefulness of full-text retrieval in assessing obliteration by incorporation (OBI) by comparing patterns of OBI and citation substitution across economics, management, and psychology for two concept catch phrases: bounded rationality and satisficing. Searches using each term are conducted in JSTOR and in selected additional full-text journal sources over the years 1987-2011. Two measures of OBI are used, one simply tallying the presence or absence of references to Simon's oeuvre (strict OBI) linked to the catch phrase and one counting only papers lacking any embedded reference as evidence of obliteration (lenient OBI). By either measure, OBI existed but varied across subject area, time period, and catch phrase. Economics had the highest strict OBI (82%) and lenient OBI (43%) for bounded rationality and the highest strict OBI (64%) for satisficing; all 3 subject areas were essentially tied for lenient OBI at about 30%.
Sixty-two percent of the articles for bounded rationality in psychology were retrieved only because the catch phrase occurred in a title in the article bibliography. OBI research can benefit from full-text searching; the main tradeoff is more detailed and nuanced evidence concerning OBI existence and trends versus increased noise in the retrieval. The 2004-2010 VQR (Research Quality Evaluation), completed in July 2013, was Italy's second national research assessment exercise. The VQR performance evaluation followed a pattern also seen in other nations, as it was based on a selected subset of products. In this work, we identify the exercise's methodological weaknesses and measure the distortions that result from them in the university performance rankings. First, we create a scenario in which we assume the efficient selection of the products to be submitted by the universities and, from this, simulate a set of rankings applying the precise VQR rating criteria. Next, we compare these VQR rankings with those that would derive from the application of more-appropriate bibliometrics. Finally, we extend the comparison to university rankings based on the entire scientific production for the period, as indexed in the Web of Science. Many studies (in information science) have looked at the growth of science. In this study, we reexamine the question of the growth of science. To do this we (a) use current data up to publication year 2012 and (b) analyze the data across all disciplines and also separately for the natural sciences and for the medical and health sciences. Furthermore, the data were analyzed with an advanced statistical technique, segmented regression analysis, which can identify specific segments with similar growth rates in the history of science.
The study is based on two different sets of bibliometric data: (a) the number of publications held as source items in the Web of Science (WoS, Thomson Reuters) per publication year and (b) the number of cited references in the publications of the source items per cited reference year. We looked at the rate at which science has grown since the mid-1600s. In our analysis of cited references we identified three essential growth phases in the development of science, each of which saw growth rates triple in comparison with the previous phase: from less than 1% up to the middle of the 18th century, to 2 to 3% up to the period between the two world wars, and 8 to 9% to 2010. In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and pseudorelevance feedback schemas and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations have been conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrated that the QE method based on this new structure outperforms the baseline. Understanding the information-seeking behavior of visually impaired users is essential to designing search interfaces that support them during their search tasks. In a previous article, we reported the information-seeking behavior of visually impaired users when performing complex search tasks on the web, and we examined the difficulties encountered when interacting with search interfaces via speech-based screen readers.
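The segmented regression idea from the growth-of-science study, identifying phases with distinct growth rates in a publication time series, can be sketched as a one-breakpoint search over log-linear fits. The series, breakpoint, and rates below are toy values, not the study's estimates:

```python
# One-breakpoint segmented regression sketch: try every split point and
# keep the one minimizing the total residual sum of squares (SSE) of two
# log-linear fits. Toy data only; not the study's actual estimates.
import math

def _ols_sse(xs, ys):
    """Simple OLS; returns (slope, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - slope * mx
    sse = sum((y - (a + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, sse

def one_breakpoint_segmented_fit(years, counts):
    """Returns (break_year, rate_left, rate_right) for log-linear segments."""
    xs = [float(y) for y in years]
    ys = [math.log(c) for c in counts]
    best = None
    for k in range(3, len(xs) - 3):          # at least 3 points per side
        sl, el = _ols_sse(xs[:k], ys[:k])
        sr, er = _ols_sse(xs[k:], ys[k:])
        if best is None or el + er < best[0]:
            best = (el + er, years[k], sl, sr)
    _, by, sl, sr = best
    return by, math.exp(sl) - 1, math.exp(sr) - 1

# Toy series: 1% annual growth until 1900, 8% afterwards.
years = list(range(1850, 1951))
counts, c = [], 100.0
for y in years:
    counts.append(c)
    c *= 1.01 if y < 1900 else 1.08

by, r1, r2 = one_breakpoint_segmented_fit(years, counts)
print(by)
print(round(r1, 2), round(r2, 2))  # → 0.01 0.08
```

A full segmented regression also estimates the number of breakpoints and their uncertainty; this sketch only recovers the per-phase growth rates for a single split.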
In this article, we use our previous findings to inform the design of a search interface to support visually impaired users for complex information seeking. We particularly focus on implementing TrailNote, a tool to support visually impaired searchers in managing the search process, and we also redesign the spelling-support mechanism using nonspeech sounds to address previously observed difficulties in interacting with this feature. To enhance the user experience, we have designed interface features to be technically accessible as well as usable with speech-based screen readers. We have evaluated the proposed interface with 12 visually impaired users and studied how they interacted with the interface components. Our findings show that the search interface was effective in supporting participants for complex information seeking and that the proposed interface features were accessible and usable with speech-based screen readers. This article reports on a field study of the information behavior of Grade 8 students researching an inquiry-based class history project. Kuhlthau's 7-stage Information Search Process (ISP) model forms the conceptual framework for the study. The aim of the study was to define an end game for the ISP model by answering the following question: How do the student participants' feelings, thoughts, and information behavior lead to the construction of new knowledge? Study findings tentatively indicate that knowledge construction results from an iterative process between the student and information, which can be divided into 3 phases. In the first phase, the students formulate questions from their previous knowledge to start knowledge construction; in the second phase, newly found topic information causes students to ask questions; and in the third phase, the students answer the questions asked by this newly found topic information. 
Based on these results and Kuhlthau's own ISP stage 7 assessment definition of the ISP model end game, we propose a model of knowledge construction inserted as an extra row in the ISP model framework. Online-community management is commonly presented as the facilitation of conversation and contributions, especially converting readers to contributors. However, the goal of many discussion communities is to produce a high-quality knowledge resource, whether to improve external task performance or to increase reputation and site traffic. What do moderation practices look like when the community is focused on the creation of a usable knowledge resource rather than facilitating an inclusive conversation? Under what conditions is this style of moderation likely to be successful? We present a case study from online gaming, Elitist Jerks, in which aggressive moderation is used to transform a conversational medium into a high-quality knowledge resource, using the strategy of open censorship. We present a content analysis of moderator comments regarding censored messages. Our analysis revealed differences in types of contributor mistakes and the severity of moderator actions: infractions that interfered with both conversation and resource quality were punished harshly, whereas a set of infractions that supported conversation but undermined resource quality were more respectfully removed. We describe a set of conditions under which moderators should intervene in the conversion of conversation to knowledge resource rather than the conversion of lurkers to contributors. The aim of this article is to explore the claim that communities of practice (CoPs) can be designed and managed. The concept of CoPs was originally developed as a social learning theory, and CoPs were defined by their informal emergent nature. This informal nature has been recognized to be of value to organizations, resulting in a desire to design CoPs.
In this article, the nature of CoPs is addressed by focusing on aspects of formality and informality in relationships and learning; CoPs are described as emergent and designed practices. Furthermore, it is questioned whether a designed CoP may realize the essential characteristics attributed to an emergent CoP. It is argued that it is crucial to recognize the informal nature of CoPs in order to either encourage them as informal phenomena or to use the concept of CoPs as inspiration for designing imitations of them. However, when attempting to design them, the original meaning of a CoP is lost, even though, in some cases, the consequences of such a design may be beneficial to organizations. Nevertheless, when not taking the nature of a CoP into account, a designed construct may have a negative impact on learning and knowing. With the rapid development of information and communication technologies, people are increasingly referring to web information to assist in their travel planning and decision making. Research shows that people conduct collaborative information searches while planning their travel activities online. However, little is known in depth about tourists' online collaborative search. This study examines tourists' collaborative information search behavior in detail, including their search stages, online search strategies, and information flow breakdowns. The data for analysis included pre- and postsearch questionnaires, web search and chat logs, and postsearch interviews. A model of tourist collaborative information retrieval was developed. The model identified collaborative planning, collaborative information searching, sharing of information, and collaborative decision making as four stages of tourists' collaborative search. 
The results show that tourists collaborated by planning their search strategies, dividing search tasks into subtasks and allocating workload, using search queries and URL links recommended by teammates, and discussing search results together. Related personal knowledge and experiences appeared important in trip planning and collaborative information search. During the collaborative search, tourists also encountered various information flow breakdowns in different search stages. These were classified and their effects on collaborative information search were reported. Implications for system design in support of collaborative information retrieval in travel contexts are also discussed. Dataspace systems constitute a recent data management approach that supports better cooperation among autonomous and heterogeneous data sources with which the user is initially unfamiliar. A central idea is to gradually increase the user's knowledge about the contents, structures, and semantics of the data sources in the dataspace. Without this knowledge, the user is not able to make sophisticated queries. The dataspace systems proposed so far are usually application specific. In contrast, our idea in this paper is to develop an application-independent extensible markup language (XML) dataspace system with versatile facilities. Unlike the other proposed dataspace systems, we show that it is possible to build an interface based on conventional visual tools with which the user can satisfy his or her sophisticated information needs. In our system, the user does not need to master programming techniques or XML syntax, which provides a good starting point for its declarative use. It is essential for research funding organizations to ensure both the validity and fairness of the grant approval procedure. The ex-ante peer evaluation (EXANTE) of N=8,496 grant applications submitted to the Austrian Science Fund from 1999 to 2009 was statistically analyzed.
For 1,689 funded research projects an ex-post peer evaluation (EXPOST) was also available; for the rest of the grant applications a multilevel missing data imputation approach was used to consider verification bias for the first time in peer-review research. Without imputation, the predictive validity of EXANTE was low (r=.26) but underestimated due to verification bias, and with imputation it was r=.49. That is, the decision-making procedure is capable of selecting the best research proposals for funding. In the EXANTE there were several potential biases (e.g., gender). With respect to the EXPOST there was only one real bias (discipline-specific and year-specific differential prediction). The novelty of this contribution is, first, the combining of theoretical concepts of validity and fairness with a missing data imputation approach to correct for verification bias and, second, multilevel modeling to test peer review-based funding decisions for both validity and fairness in terms of potential and real biases. Bias in peer review entails systematic prejudice that prevents accurate and objective assessment of scientific studies. The disparity between referees' opinions on the same paper typically makes it difficult to judge the paper's quality. This article presents a comprehensive study of peer review biases with regard to 2 aspects of referees: the static profiles (factual authority and self-reported confidence) and the dynamic behavioral context (the temporal ordering of reviews by a single reviewer), exploiting anonymized, real-world review reports of 2 different international conferences in information systems / computer science. Our work extends conventional bias research by considering multiple biases occurring simultaneously. Our findings show that the referees' static profiles are more dominant in peer review bias when compared to their dynamic behavioral context. 
Of the static profiles, self-reported confidence improved both conference fitness and impact-based bias reductions, while factual authority could only contribute to conference fitness-based bias reduction. Our results also clearly show that the reliability of referees' judgments varies along their static profiles and is contingent on the temporal interval between 2 consecutive reviews. In informetrics, journals have been used as a standard unit to analyze research impact, productivity, and scholarship. The increasing practice of interdisciplinary research challenges the effectiveness of journal-based assessments. The aim of this article is to highlight topics as a valuable unit of analysis. A set of topic-based approaches is applied to a data set on library and information science publications. Results show that topic-based approaches are capable of revealing the research dynamics, impact, and dissemination of the selected data set. The article also identifies a nonsignificant relationship between topic popularity and impact and argues for the need to use both variables in describing topic characteristics. Additionally, a flow map illustrates critical topic-level knowledge dissemination channels. The predictive power of the h-index has been shown to depend on citations to rather old publications. This has raised doubts about its usefulness for predicting future scientific achievements. Here, I investigate a variant that considers only recent publications and is therefore more useful in academic hiring processes and for the allocation of research resources. It is simply defined in analogy to the usual h-index, but takes into account only publications from recent years, and it can easily be determined from the ISI Web of Knowledge. The h-index was devised to represent a scholar's contributions to his field with respect to the number of publications and citations. It does not, however, take into consideration the scholar's position in the authorship list. 
I recommend that a new supplementary index to score academics, representing the relative contribution to the papers with impact, be reported alongside the h-index. I call this index the AP-index, and it is simply defined as the average position in which an academic appears in authorship lists, on articles that factor into that academic's h-index. Various mathematical models have been proposed in the recent literature for estimating the h-index using measures such as number of articles (P) and citations received (C). These models have previously been empirically tested by assuming a mathematical model and fixing the models' parameter values at some constant. The present study, from a statistical modeling viewpoint, investigates alternative distributions commonly used for this type of point data. The study shows that the typical assumptions for the parameters of the h-index mathematical models in such representations are not always realistic, with more suitable specifications being favorable. Prediction of the h-index is also demonstrated. This paper presents a scientometric analysis of research work done on the emerging area of 'Big Data' during recent years. Research on 'Big Data' started during the last few years and within a short span of time has gained tremendous momentum. It is now considered one of the most important emerging areas of research in computational sciences and related disciplines. We have analyzed the research output data on 'Big Data' during 2010-2014 indexed in both the Web of Knowledge and Scopus. The analysis comprehensively maps the parameters of total output, growth of output, authorship and country-level collaboration patterns, major contributors (countries, institutions and individuals), top publication sources, thematic trends and emerging themes in the field. The paper presents an elaborate, one-of-its-kind scientometric mapping of research on 'Big Data'.
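As a concrete illustration of the two indices discussed above, the standard h-index and the proposed AP-index (the academic's average authorship-list position over the h-core papers), here is a small sketch; the paper data is invented for the example:

```python
# h-index and AP-index sketch; paper records below are made up.

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def ap_index(papers):
    """Average author position on the papers forming the h-core.

    `papers` is a list of (citations, author_position) tuples for one
    academic; a lower AP means the academic tends to lead the h-core work."""
    h = h_index([c for c, _ in papers])
    core = sorted(papers, key=lambda p: p[0], reverse=True)[:h]
    return sum(pos for _, pos in core) / h

papers = [(25, 1), (18, 3), (12, 2), (9, 1), (3, 4)]
print(h_index([c for c, _ in papers]))  # → 4
print(ap_index(papers))                 # → 1.75
```

Here the h-core consists of the four papers with 25, 18, 12, and 9 citations, on which the academic holds positions 1, 3, 2, and 1, giving an AP-index of 1.75.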
Solar hydrogen generation is one of the new topics in the field of renewable energy. Recently, the rate of investigation into hydrogen generation has been growing dramatically in many countries. Many studies have been conducted on hydrogen generation from natural resources such as wind, solar, and coal. In this work we evaluated the global scientific production of solar hydrogen generation papers from 2001 to 2014 in all journals across the subject categories of the Science Citation Index compiled by the Institute for Scientific Information (ISI), Philadelphia, USA. 'Solar hydrogen generation' was used as the keyword to search titles, abstracts, and keywords. The published output analysis showed that research on hydrogen generation from the sun steadily increased over the past 14 years, and annual paper production in 2013 was about three times that of 2010. The number of papers considered in this research is 141, published from 2001 to date. There are clear distinctions among author keywords used in publications from the five highest-publishing countries in solar hydrogen studies: the USA, China, Australia, Germany, and India. Quantitative and qualitative analysis methods were used to evaluate the development of global scientific production in this research field. The analytical results provide several key findings and an overview of hydrogen production via solar hydrogen generation. The purpose of this paper is to measure the influence and impact of competitiveness research by identifying the 100 most cited articles in competitiveness published in academic journals indexed in the Web of Science database of the Institute for Scientific Information between 1980 and 2013. Using citation analysis, we investigated the number of citations made to the 100 most cited articles dealing with competitiveness during this 34-year period.
We also identified articles, authors, journals, institutions, and countries that have contributed most to the literature of competitiveness. Further, we determined in which categories of Web of Science these articles were published and the time distribution of their publication. Additionally, we investigated the level of competitiveness that has received the most attention, and the latest level of analysis in competitiveness research. We also explored the type of research design these articles used. Finally, we determined the most popular topics covered and the type of firm or industry/name of nation or region analyzed by these articles. The findings of this research provide a reliable basis for competitiveness researchers to better plan their studies and enhance the influence and impact of their research works. However, the most cited articles published in other databases and categories, and citations to these articles in other publications and resources, may deserve future research attention. The miniaturized analytical device or 'lab-on-a-chip' (LOC) has been the subject of increasing interest over the past two decades, especially in the fields of chemical analysis and biological analysis, which has resulted in an increase in the quality and number of published papers related to this topic. In this paper, the Science Citation Index Expanded and the Social Sciences Citation Index databases from the Institute for Scientific Information Web of Science were searched to identify the LOC-related works published from 2001 to 2013. In total, 3746 documents were found during this period. All collated documents were analysed based on the following parameters: publication year, document type, language, country, institution, author, journal, research area, number of citations, and international collaboration. The majority of works published on LOC included 'article' and 'review' document types.
The Lab on a Chip journal published the largest number of papers on this topic. The majority of LOC research was published in the English language and originated from the USA. The University of California System was the most productive institution and 'Chemistry' was the most popular research area. According to the citation count of publications, the top 10 most-cited articles and the top 10 most-cited review papers were identified and the characteristics of these papers were described. The present study quantitatively analyzes publications on genetic improvement in the period 2003-2013 using the Scopus database. A total of 3402 publications were observed: 63% on plant breeding, 33% on animal breeding, and 4% on other topics. A Chi-square test found significant growth of publications for both animal breeding and plant breeding, unlike the other category, which remained stable over the years. Bovine species had the most publications overall; among plants, wheat had the largest quantity, followed by rice, corn, and soybeans, among others. All correlations analyzed were positive and significant, both between publications and citations and between publication ratios and impact factor, demonstrating the importance of breeding publications for the impact factor of journals. In addition, several authors and journals have excelled, and countries such as the United States, China, India, and Brazil presented the largest quantities of publications in the field; in the particular case of Brazil, this is due to institutions such as EMBRAPA promoting the scientific development of genetic improvement. We propose that the ability of opinion leaders to influence ordinary farmers to adopt more sustainable practices in irrigation systems during extremely dry periods depends on the type of social network in which they are embedded.
We show that in disassortative networks, where influential people link preferentially to relatively disconnected ones, e.g., in large gravity-fed irrigation systems, opinion leaders can be important in encouraging the adoption of sustainable practices because they tend to present relatively high betweenness centrality scores, which indicates a greater ability to bridge clusters of otherwise disconnected people. In assortative networks, i.e., networks in which people tend to connect to similar individuals, on the other hand, the ability of leaders to influence other people does not seem to be directly related to their betweenness centrality score. We conclude that betweenness centrality is a source of influence only in disassortative networks that additionally present low clustering coefficients. This suggests that opinion leaders may be important to help bridge the informational gap between agencies and stakeholders in some, but not all, irrigation systems, in order to pave the way for the adoption of measures to cope with extreme droughts, like the one Brazil is currently experiencing. Based on papers indexed in the Web of Science, we investigated the publication activities of China, Japan, the USA, and the EU with special focus on the journals publishing the most papers of the four economies. Both the overall situation and activities in physics have been analyzed. The results show that world science is still led by the West, represented by the USA and EU. The USA and EU overlap most in their choice of journals and in their capability to publish in high-JIF journals. China and Japan publish heavily in local MISC journals, but Japan performs slightly better in publishing in MISC journals with relatively higher JIF values. Japan is closer to the West and shares least with China in selecting journals in which to publish. All four economies are most productive in physics, with overlaps in other fields. Similar results are found in publication activities in physics.
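The network measures underpinning the opinion-leader argument (betweenness centrality, degree assortativity, clustering) can be computed directly. The sketch below uses Brandes' algorithm on a toy hub-and-spoke network, which is perfectly disassortative, so the hub's high betweenness mirrors the bridging role attributed to opinion leaders in gravity-fed irrigation systems; the network itself is invented:

```python
# Betweenness centrality (Brandes' algorithm) and degree assortativity
# on a toy undirected graph given as {node: [neighbors]}.
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted betweenness centrality."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        queue = deque([s])
        while queue:                       # BFS shortest-path counting
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                       # dependency accumulation
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # undirected: halve

def degree_assortativity(adj):
    """Pearson correlation of endpoint degrees over all edge endpoints."""
    deg = {v: len(adj[v]) for v in adj}
    pairs = [(deg[u], deg[v]) for u in adj for v in adj[u]]
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy star network: one "opinion leader" L bridging four farmers.
star = {'L': ['a', 'b', 'c', 'd'],
        'a': ['L'], 'b': ['L'], 'c': ['L'], 'd': ['L']}
bc = betweenness(star)
r = degree_assortativity(star)
print(bc['L'], round(r, 6))  # → 6.0 -1.0
```

All 6 leaf pairs route through L, so its betweenness is 6, and every edge joins a degree-4 node to a degree-1 node, so assortativity is -1 (maximally disassortative).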
Some bibliometric research has been carried out in sport sciences, but compared with other disciplines there is still no intensive study at the macro level, especially on international collaboration. This study attempts to observe the status and trend of international collaboration in sport sciences at the macro level, and to examine its relationship with academic impact. 20,804 publications from 63 consistently issued journals belonging to the Sport Sciences category in the Web of Science database in 2000-2001 and 2010-2011 were analyzed. The main objects of analysis include co-authorship links of country pairs, the share of internationally co-authored publications, tendency and "affinity" in collaboration, and citation impact of international publications. Differences between countries and periods were observed. There is a rapid increase in the share of international collaboration in sport sciences; in some countries the share is even above two-thirds. Co-authorship networks imply some cultural, political, or geographical factors for collaboration, and their changes exhibit some new trends. Selected countries show a strong tendency toward collaboration, and internationally collaborated publications perform better than domestic ones in citation impact, though gaps between countries are narrowing. International collaboration has genuinely intensified in this field. European, especially Nordic, countries are very fond of collaboration and have gained outstanding performance as partners. It is meaningful to further explore the underlying motivation behind international collaboration in sport science research. Global industrialization is accelerating under the driving force of developing countries' rapid economic development. Water pollution is inevitably worsened due to lagging investment in basic treatment infrastructure.
Wastewater treatment cannot rely on just one treatment technique, so research in this field has attracted much attention to satisfy stringent recovery and emissions standards increasingly imposed on industrial wastewater. This case study is a bibliometric analysis conducted to evaluate industrial wastewater treatment research from 1991 to 2014, based on the Science Citation Index Expanded (SCIE) database. Journal of Chemical Technology and Biotechnology is the leading journal in this field, publishing 3.8% of articles over this period, followed by Journal of Hazardous Materials and Water Research, the latter of which has the highest impact factor and h-index of all journals in this field. India and the Chinese Academy of Sciences were the most productive country and institution, respectively, while the USA was the most internationally collaborative and had the highest h-index (82) of all countries. A new method named "word cluster analysis" was successfully applied to trace the research hotspots. Innovation in treatment methods is thought to relate to the growth in volume and increase in complexity of industrial wastewater, as well as to policy decisions in developing countries that encourage effective industrial wastewater treatment. Information security has become a central societal concern over the past two decades. Few studies have examined the information security research domain, and no literature has been found that has applied an objective, quantitative methodology. The central aim of the current research was to quantitatively describe the profile and evolution of the information security specialty. Bibliometric data extracted from 74,021 Scopus research records published from 1965 to 2015 were examined using impact and productivity measures as well as co-word and domain visualization techniques.
This scientometric study presents a comprehensive view of the information security specialty from several perspectives (e.g., temporal, seminal papers, institutions, sources, authors). After a long and steady period of growth (1965-2004), a period of exponential publication output occurred between 2005 and 2010. Among all the countries involved in information security research, the United States and China had the greatest impact, and China has surpassed the United States in terms of productivity. Information security as a specialty is largely populated by publications from the technical fields of computer science and engineering. Several research themes were found throughout the decades (e.g., cryptography and information security management and administration), and emergent research subspecialties appeared in later decades (e.g., intrusion detection, medical data security, steganography, wireless security). This study reduces the complexity of the specialty to controllable terms, supplies objective data for science policy making, identifies the salient bibliographic units, and uncovers growth patterns. It also serves as an information retrieval tool to identify important papers, authors, and institutions. Recently, many countries have shown considerable interest in gas hydrates as an alternative energy source and have conducted many studies on exploring, drilling, and producing gas hydrates. In this paper, we investigate international R&D trends in gas hydrates in order to provide research direction by applying text clustering to published papers and patent documents. In particular, our study examines research in the United States, where the most active research has been undertaken; Japan, a country that leads in technological development; and China, which has enormous resources of gas hydrates. This study delineates these countries' expertise and changing R&D trends in gas hydrates.
Our findings can provide insights for academic research and technical development regarding gas hydrates. This paper focuses on the growth and development of mobile technology research in terms of publication output as reflected in the Web of Science database. During 2000-2013 a total of 10,638 publications were published in the field. The average number of publications per year was 759.86, and the highest number, 1495, was published in 2013. Of the total publications, 9037 were produced by multiple authors and 1601 by single authors. Authors from the USA contributed the maximum number of publications compared to the other countries, and India ranked 16th in terms of productivity in this study period. The most prolific author is Kim, who contributed 42 publications, followed by another Kim with 36 publications. The Collaboration Index ranges from 3.67 (2000) to 4.57 (2009) with an average of 4.32 per joint-authored paper, which implies that research teams in the field of mobile technology typically comprise 3 to 5 members. The University of California System (USA) is the most productive institution with 243 publications, followed by the University of London (UK) with 149 publications, and the Florida State University System (USA) and National Chiao Tung University (Taiwan) with 88 publications. With the changes taking place in the operational environment of the air transport industry, airlines are now involved in a more dynamic and complex system. However, planning in a way that takes into account all the current factors is only achievable with an adequate comprehension of how current business models answer this challenge. This scenario has prompted many researchers to conduct studies related to business models and the air transport industry. This article maps the academic production in these two areas, to define the main periodicals, authors and geographic regions covered, based on a bibliometric study with qualitative indicators.
From 1990 to 2012, we found nearly 19 thousand articles in the ISI Web of Knowledge and then refined the selection of relevant articles to about 500, to which we applied the same qualitative indexes. The results indicate that in the past 5 years the number of scientific articles has grown at a reasonable pace and that the authors who have published the most articles on these themes come from research centers in the United Kingdom, United States and Taiwan. By employing the social network analysis technique, this study decomposed author and title keyword networks of the information technology management domain formed by 351 outlets, 914 institutions, 64 countries, 1913 authors, and thousands of keywords. The network- and ego-level properties, such as degree centralities, density, components, and degree distribution, suggest that the keyword network exhibits a power-law distribution: a few popular keywords or themes are frequently used by follow-on studies. The study sheds light on the emerging and fading themes in the domain. In light of the analysis, some important implications are discussed. Medical research is known to be the most important research, as it is directly related to the well-being of humans. The government of India spends a large amount of money on medical research through the ICMR by way of various research projects. In this paper the growth of the research output of the top 50 medical institutes of India has been observed in terms of national as well as international collaboration. It has been found that national research output has doubled and research with international collaboration has grown four times during 2003-2012. The overall growth is found to be consistent. The top 20 favorite countries of medical professionals in India have also been identified, with the number of publications published with each country during 2003-2012. We have analyzed in detail the physiology field in the Czech Republic and Hungary.
Apart from classic descriptive bibliometric analysis, we have also tried to compare research directions and topics inside the physiology field in the Czech Republic and Hungary with world trends. For this purpose we have employed bibliometric mapping using the computer program VOSviewer. In conclusion, the Czech physiology field is quickly growing and catching up with the world average in terms of publication output. The Hungarian physiology field is growing only moderately. The citation impact of publications of both countries is lagging behind the field standards. The international collaboration rate of the publications (co)authored by Czech researchers has been considerably lower than that of Hungarian or other comparable European researchers. VOSviewer mapping indicated that Czech and Hungarian authors have been involved in less than 25 % of the physiology topics. The research topics of both countries have been considerably different; the share of phrases with participation of both countries ranged from about 2 % in 1994 to 9 % in 2009. The analysis also indicates that the low citation impact of the Czech and Hungarian publications is not due to the selection of less-cited research topics by the domestic researchers. In this study, a bibliometric approach is applied to identify global trends related to waste management (WM). As a key topic for environmental protection and resource utilization, WM has received considerable attention in the scientific community over the past decades, as reflected by the vast number of peer-reviewed articles on WM that can be retrieved from the Web of Science database. The data retrieved covers the period from 1997 to 2014. Analyzed parameters included document type and publication output as well as distribution of countries, institutions, source titles, subject categories, and author keywords. The implications for current trends and recent hotspots were presented and discussed.
Furthermore, a contrastive analysis between the topics of concern in the industrialized and developing countries was carried out. Results of this case could provide a reference for the decision-making and policy of WM for governments to some extent and help in bringing out key elements of the theoretical and practical contributions so far as well as the future challenges the field must face. Tropical trees of the Calophyllum genus (Calophyllaceae) have chemical and biological importance as a potential source of secondary active metabolites which can lead to the development of new drugs. Research on this genus has been rising since 1992 due to the discovery of the anti-HIV properties of Calanolide A found in Calophyllum inophyllum leaves. This compound is the most important natural product for the potential development of new anti-HIV drugs and phytomedicines. The scientometric analysis (1953-2014) performed here revealed that the most studied species of the Calophyllum genus are C. inophyllum and C. brasiliense, distributed in the Asian and American continents, respectively. Current research on these species is carried out mainly in India and Brazil, respectively, where these species grow. Research on C. brasiliense is focused mainly on ecological, antiparasitic, and cytotoxic properties, and the isolation of new compounds. Chemical studies and biodiesel development are the main topics in the case of C. inophyllum. Text mining analysis revealed that coumarins and xanthones are the main secondary active metabolites responsible for most of the reported pharmacological properties, and are potential compounds for the treatment of leukemia and against the intracellular parasites causing American Trypanosomiasis and Leishmaniasis. On the other hand, C. inophyllum represents an important source for the development of 2nd-generation biodiesel. Medicinal and industrial applications of these species may drive sustainable forest plantations.
To our knowledge this is the first scientometric and text mining analysis of chemical and biomedical research on the Calophyllum genus, C. brasiliense and C. inophyllum. This paper presents a detailed chronological survey of papers published in the Web of Science category of dance from 1994 to 2013 based on the Arts & Humanities Citation Index (A&HCI). An analysis of research performance according to publication output and the distribution of words in article titles was carried out. Performances of authors, including total, single-author, first-author, and corresponding-author publications, were analyzed. The results indicated that the annual output of articles increased slightly. More document types were found in the A&HCI database than in other Web of Science databases. Dance Magazine published the most articles. Single-author articles were the most popular type of authorship. Editors were the dominant authors. "Ballet" is the main research topic in the dance field. During the last 30 years, the growth of the interpreting industry in China has been outstanding. Increasing economic and political collaboration has driven the demand for interpreters to bridge the linguistic and cultural divides that exist between China and the West. With the creation of master's and bachelor's degrees in interpreting and translation all over China, hundreds of graduates from various universities have since undertaken distinctly different career paths. Using an exhaustive corpus of master's theses and a combination of logistic regression and Targeted Maximum Likelihood Estimation to establish causalities, this paper focuses on some of the structural determinants of graduate students' career choices. The paper examines to what extent university affiliations, thesis advisors, research methodology and thesis content influence the choice to pursue an academic career.
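Logistic regression of the kind used in the career-choice study above models the probability of a binary outcome (e.g., entering academia) from thesis-level predictors. A hedged Python sketch with an entirely invented toy dataset (the feature names, data, and fitting settings are illustrative assumptions, not the paper's actual model or results):

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Plain batch-gradient-descent logistic regression; a leading 1.0
    is prepended to each row for the intercept term."""
    rows = [[1.0] + list(x) for x in X]
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability of the positive outcome for one feature vector."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: features are [top_university, empirical_thesis]; outcome 1 means
# the graduate entered academia (entirely invented for illustration).
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [0, 0, 1, 1, 1, 0, 0, 1]
w = fit_logistic(X, y)
```

In this toy data the fitted model assigns a higher academic-career probability to an empirical thesis from a non-top university than to a non-empirical thesis from a top university, mirroring the direction of the findings reported above.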
The research reveals that graduating from a top university makes students less likely to become academics, and studying under a top advisor does not necessarily increase an individual's chances of securing an academic post. By contrast, writers of empirical theses or ones that are about training are more likely to enter the academic sphere. This case study analyzes the scientific research landscape of the Islamic World in order to assess research productivity, scholarly impact and international collaborations across all Science and Technology (S&T) areas over the time period 2000-2011, using the Scopus database. While Turkey is clearly leading among the Islamic countries, Iran ranks 2nd in terms of publication output. All S&T subject areas show an annual increase in publications of more than 10 %. The highest percentage of publications of the Islamic World relative to the world falls into the area of Veterinary. Dentistry is the top area, with a 7 % share of the world's top 25 % publications with respect to citation count. This shows that the impact of the scientific research of the Islamic World is much lower compared to other developed nations. We also find that the top collaborators of the Islamic World are mainly within Islamic countries. The findings of this case study provide insight into the research landscape of the Islamic World and useful information to the scientific community as well as to technology and innovation policy makers. This study explores e-commerce (EC) research trends and forecasts by applying bibliometric analysis from 1996 to July 2015 with the topic "e-commerce" in the SSCI database. The bibliometric analytical technique is used to examine the topic in SSCI journals from 1996 to July 2015; we found 5429 articles on EC. This paper surveys and classifies EC articles into eight categories by distribution status in order to explore how EC research trends and applications have developed in this period.
In addition, the paper performs a Kolmogorov-Smirnov (K-S) test to verify the applicability of Lotka's Law. The study provides an EC roadmap to guide future research and abstracts the trend information so that EC researchers can save time browsing sources, since core knowledge is concentrated in the EC core categories. Among higher-quality publications, the "success breeds success" phenomenon is very common. Individual adoption of technology is crucial for the success of technology implementation and has thus attracted much attention from researchers. Recent advances in citation-based analysis have been suggested as being efficient for analyzing knowledge dissemination within scientific disciplines. This article presents a case that examines technology acceptance research through newly developed citation-based approaches, in particular main path analysis and edge-betweenness clustering analysis. Based on the citation network constructed from a total of 1555 journal articles from the period 1989 to 2014, the most critical 50 citations were identified and used as the basis to map the major knowledge flow in technology acceptance research. In addition, edge-betweenness-based clustering was used to classify the citation network into coherent groups. As a result, five distinct research fronts were identified, namely e-learning, mobile commerce, e-health, e-tourism, and technology post-acceptance research. This case study highlights the theoretical development trajectories and identifies the most active research fronts of technology acceptance research, providing a research-based platform for further scholarly discussions. Literature regarding translation studies has increased rapidly in recent decades, yet there have been few empirical studies investigating the research context of translation studies at the global level.
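Lotka's Law, which the K-S test above checks, states that the number of authors with x publications is proportional to 1/x^n (classically n = 2). A small Python sketch of the test statistic, with illustrative author counts (the real study's data and fitted exponent are not given here):

```python
def lotka_expected(n_exponent=2.0, x_max=10):
    """Theoretical Lotka proportions f(x) = C / x^n, normalized over 1..x_max."""
    raw = [1.0 / (x ** n_exponent) for x in range(1, x_max + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def ks_statistic(observed_counts):
    """K-S distance between the observed author-productivity distribution
    (observed_counts[i] = number of authors with i+1 papers) and Lotka's law."""
    total = sum(observed_counts)
    obs = [c / total for c in observed_counts]
    exp = lotka_expected(x_max=len(observed_counts))
    d, f_obs, f_exp = 0.0, 0.0, 0.0
    for o, e in zip(obs, exp):
        f_obs += o          # observed cumulative distribution
        f_exp += e          # theoretical cumulative distribution
        d = max(d, abs(f_obs - f_exp))
    return d
```

In practice the computed D is compared with a critical value at a chosen significance level; a small D means the productivity data are consistent with Lotka's inverse-square pattern.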
A bibliometric analysis was carried out in this research to probe the current status and the research themes of translation studies papers published between 2000 and 2015 in all journals indexed by the Web of Science database. Bibliometric methods and knowledge visualization technologies were employed to thoroughly investigate publication activities, geographic distributions, core literature, and the distinctive research areas of translation research. The study distinguishes three research areas in translation studies, namely, theoretical translation studies, translation and interpreting training, and descriptive translation studies. The dissemination of knowledge in these areas is realized by publication sources specializing in language and linguistics, applied linguistics and pragmatics, phonetics and acoustics, and translation and interpreting. The core literature in translation studies has been found to focus on linguistic theories, research methodology, theoretical models, interpreting, and new perspectives. This study provides researchers with several useful insights to better understand developments in translation studies. The paper highlights the role of medical education and research in the betterment of human lives and analyses the research output of the top 50 medical institutes in India during 2003-2012. It ranks these institutes on the basis of their research publications using the Scopus database and indicates which medical institute has contributed the most publications and which has contributed the fewest. The paper presents both qualitative and quantitative rankings of the top 50 medical institutes in India using total production, citations, h-index, etc. The aim of this article is to assess China's R&D position and status on salt lake resources with a method combining bibliometrics with social network analysis.
Patent data about the mining and usage of salt lake resources, harvested from the SciFinder database and ranging from 1991 to 2010, is analyzed in this paper. The results show that there has been rapid growth in patent publications regarding salt lake resources in the last 20 years, both at home and abroad. This is especially true with regard to the extraction and application of magnesium and potassium. China's R&D in this field is worthy of attention, because China owns considerable salt lake reserves. The status of R&D groups in China is assessed via network analysis, using both Ucinet and NetDraw software. We use separate records from the China Academic Journal Network Publishing Database (CAJD) and the Science Citation Index Expanded (SCIE), covering the time span from 2001 to 2010. A collaboration network is established, and its structure and attributes are analyzed with a view to assessing the R&D groups in this field. Results from these analyses demonstrate that China stands in a disadvantaged position in the implementation of related research and technology; several research groups have been formed to explore the mining of salt lake resources; collaboration is mostly still confined to the domestic scene; and cross-national collaboration has not yet started to grow. Finally, proposals are put forward for the formulation of an R&D strategy and for cultivating R&D groups. The aim of the paper is to evaluate the research productivity and performance of countries in the Middle East. The data was gathered from InCites, the research analytical tool of Thomson Reuters. The data was collected over a period of 33 years (1981 through 2013) with "global comparisons" as the dataset and "compare countries/territories" as the report name under "national comparisons".
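Collaboration-network analyses like the one above (performed there with Ucinet and NetDraw) typically start from simple structural measures such as network density and degree centrality. A minimal Python sketch on an invented toy network (the node labels are placeholders, not the study's actual institutions):

```python
def network_metrics(nodes, edges):
    """Density and normalized degree centrality for an undirected
    collaboration network given as a node list and (u, v) edge pairs."""
    n = len(nodes)
    degree = {v: 0 for v in nodes}
    for u, v in edges:                      # each edge raises both endpoints' degree
        degree[u] += 1
        degree[v] += 1
    # density: realized edges over the n*(n-1)/2 possible edges
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    # degree centrality: degree divided by the maximum possible degree (n - 1)
    centrality = {v: d / (n - 1) for v, d in degree.items()}
    return density, centrality

# Toy collaboration network with one hub institution (labels are illustrative).
nodes = ["InstA", "InstB", "InstC", "InstD"]
edges = [("InstA", "InstB"), ("InstA", "InstC"), ("InstA", "InstD")]
density, centrality = network_metrics(nodes, edges)
```

A star-shaped pattern like this toy example, with one hub and low overall density, is the kind of structure such studies read as collaboration being concentrated around a few dominant groups.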
The data was collected from 15 countries of the Middle East (as per InCites categorization), viz. Bahrain, Egypt, Iran, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Syria, Turkey, United Arab Emirates, and Yemen. Each country was assessed on the basis of six parameters: total number of Web of Science documents; total citation count; average citations per document; percentage of cited documents; impact relative to the world; and aggregate performance indicator. On all these parameters, Israel occupies the first position. The 2nd rank in terms of total Web of Science documents and total citation count is occupied by Turkey. Kuwait has the 2nd highest percentage of cited documents, and Lebanon occupies the 2nd rank in terms of relative impact (in comparison to the world). In terms of aggregate performance, Qatar ranks 2nd. This study aims to map the intellectual structure of social media research in China from 2006 to 2013. Bibliometric and co-word analysis were employed to reveal the characteristics and status of social media research in China. Data was collected from the China Academic Journals Full-text Database for the period 2006-2013. In the bibliometric analysis, descriptors of years, themes, subjects, institutions and authors were applied to obtain the research characteristics of social media. In the co-word analysis, hierarchical cluster analysis, strategic diagrams and social network analysis were adopted. The main results show that the number of CSSCI papers on social media (3178 in total) has risen yearly and exponentially. The most prominent and distinctive themes were microblog, blog, virtual community and social networking site. The most common subject was News and media, followed by Library, information and digital library, and Computer software and application. Wuhan University, Renmin University of China and Nanjing University ranked as the top three in number of publications. The distribution of the number of authors with different publication counts follows a power law.
Moreover, keyword frequencies also follow a power-law distribution. The core keywords include social media, traditional media, Internet, dissemination and user. There are ten research directions on social media in China, some of which are highly correlated. Generally, the relatively dispersive distribution of research topics suggests the imbalanced development of social media research in China. Some hot topics are well developed and tend to be mature, a few topics have great potential for further development, and many other topics are marginal and immature. This paper aims to understand how the knowledge field of production engineering interacts with the field of public affairs, which relates to government, public bodies and public policies for many sectors. It is structured as a bibliometric analysis and reveals a panorama of 188 papers from 2000 to 2012 that shows where and how these relationships between production engineering and public affairs occur. This study presents five different analyses to understand these possibilities: papers by year, papers by production engineering area, papers by author, papers by journal and papers by public function. The main findings of this research show that the leading areas where these interfaces occur in academic research are environment, sustainability, industry and organizations. Another relevant result is that the intersection of production engineering and the public sector has not yet figured as a core research topic for any individual researcher or group of researchers around the world. The main contribution of this paper to the academic community is its pioneering presentation of a view on the size and main characteristics of this interface through bibliometric analysis. To understand the history and research status of earthworm research, citation data was collected from the Science Citation Index Expanded for the period from 2000 to 2015.
Next, HistCite was used to analyse the yearly output, countries, institutions, journals, citation impact and citation relationships in the field. Results indicated that earthworm research output increased during the studied 16-year period. The country with the highest research output was the USA, while the institution with the highest research output was the Chinese Academy of Sciences. The majority of articles and Total Location Citation Score (TLCS) values came from developed countries. Developed countries have more research advantages in this field than developing countries. The top three output journals were Soil Biology and Biochemistry, Pedobiologia and Applied Soil Ecology. The top three TLCS journals were the same as the top output journals. Articles with higher TLCS scores had a greater impact on subsequent research and played an important role in research development. The patterns of tree-related stress research depended on cultivation status and were statistically highly significant in all analyses. Non-cultivated tree species were studied more, cited more often, by authors from differing countries, with emphasis on different tree processes, stress types and research areas, and published in different journals. From 2001 to 2014, 4128 articles in 586 different academic journals dealt with tree stress. A majority of journals published stress-related research either on cultivated or on non-cultivated tree species. The articles were cited 17 times on average, the five dominant journals being Acta Horticulturae, Tree Physiology, Trees-Structure and Function, Forest Ecology and Management and PLoS ONE. Research was published by authors from 109 countries, with authors from China, USA, Spain, Brazil and Italy being the most productive. International collaboration was present in 21 % of the articles. A total of 1141 tree species from 366 genera were studied.
The dominant species studied were Olea europaea, Malus x domestica, Pinus sylvestris, Prunus persica, and Picea abies. Around three-quarters of the articles were single-species studies. Water stress, followed by drought stress, salt stress, abiotic stress, and environmental stress, were the most studied types, with over 90 % of articles dealing with a single stress type. Physiological and ecophysiological research of trees exposed to stress dominated, followed by molecular biology and biochemistry, genetics, and ecology. Tree growth was the most studied process/activity, followed by photosynthesis, gene expression, stomatal conductance and water status. An increase in "-omics" type research was observed in recent years in cultivated tree research. A bibliometric analysis was performed in this work to evaluate the research publications on Central Asia from 1990 to 2014 based on the Science Citation Index and Social Science Citation Index databases. This study presented a comprehensive overview of the field from the aspects of major journals, subject categories, outstanding keywords, leading countries, institutions and authors, as well as research collaborations. It was identified that a total of 11,025 papers were published in 2356 journals and that Central Asia research had developed steadily over the past 25 years. Geosciences Multidisciplinary, Geochemistry and Geophysics, Paleontology, Environmental Sciences and Zoology were the most popular subject categories. Keywords analysis indicated that "Tien Shan", "climate change", "taxonomy", "new taxa", and "health care" were the topics that generated the most interest and concern. Besides, the temporal evolution of keywords revealed the rapid growth of "Central Asia Orogenic Belt" and "zircon U-Pb dating". According to the research forces analysis, the USA and the Russian Academy of Sciences were the leading contributors and had dominant positions in the collaboration networks.
This paper is a new attempt to improve understanding of the progress in Central Asia research. The findings of this study should help researchers improve their research performance. Bibliometrics is a research field that quantitatively studies bibliographic material. This study analyzes the academic research in economics developed in Latin America between 1994 and 2013. The article uses the Web of Science database to collect the information and provides several bibliometric indicators, including the total number of publications and citations, and the h-index. The results indicate that Brazil, Mexico, Chile, Argentina and Colombia are the only countries with a significant number of publications in economics in the Web of Science, although Costa Rica and Uruguay have considerable results in per capita terms. The annual evolution shows a significant increase during the last 5 years that seems set to continue, probably with the objective of reaching standards similar to those of the most competitive countries around the world. The results also show that development, agricultural and health economics are the most significant topics in the region. Ranking the research productivity of institutions periodically is of great necessity nowadays, as it can not only help in understanding the latest level of development of related fields but also contribute to finding gaps and improving quickly. In previous studies, the number of publications and the impact factors of journals are two widely used indexes to measure research productivity. However, impact factors do not always tally with the quality of a journal, which leads to biased measures of research productivity. Given this, a new journal rating adjusted publications (JRAP) index is constructed in this paper, based on the journal ratings of the ABS journal guide. It takes the quantity and the quality of publications into account at the same time.
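The JRAP index described above combines publication counts with ABS journal ratings; since the abstract does not spell out the exact weighting, the sketch below assumes the simplest variant, a sum of rating weights over an institution's publications (the journal names and the weight mapping are illustrative assumptions):

```python
# Hypothetical ABS-style rating weights (ratings run 1, 2, 3, 4, 4*); the
# actual JRAP weighting scheme is not given in the abstract, so this simple
# monotone mapping is an assumption for illustration only.
ABS_WEIGHTS = {"1": 1, "2": 2, "3": 3, "4": 4, "4*": 5}

def jrap(publications):
    """publications: list of (journal, abs_rating) pairs for one institution.
    Returns a quantity-and-quality score: the sum of rating weights, so an
    institution gains both from publishing more and from publishing higher."""
    return sum(ABS_WEIGHTS[rating] for _, rating in publications)

# Illustrative institution with three papers in differently rated journals.
score = jrap([("JournalA", "3"), ("JournalB", "4*"), ("JournalC", "1")])
```

Under this sketch, two institutions with the same paper count can receive very different scores, which is exactly the bias-correction the JRAP index is intended to provide over raw publication counts.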
Compared with impact factors, the academic journal guide provides a more authoritative and accurate measurement of journal quality. Experiments are conducted to rank Asia-Pacific institutions in the business and management area based on JRAP, and this is the first time that Asia-Pacific institutions have been ranked systematically. Rankings of institutions measured by three methods are also given. Comparing the results obtained by the three different ranking methods, the institutions in the top places remain the same, although their specific ranks differ. The results indicate that JRAP favors institutions that perform well in both paper quantity and journal quality. This study uses bibliometric methods to analyze the scientific literature of fracking. The Web of Science database, including the Science Citation Index, Social Sciences Citation Index and Conference Proceedings Citation Index-Science, was used to collect the data. The analysis done in the paper looks at the annual distribution of publications, countries, institutes, authors, journals and categories. Furthermore, key topics and highly cited papers were analyzed. The results show that fracking as a research term appears in Web of Science records from 1953 and its presence has been growing ever since, becoming a hot topic recently. The countries with most of the contributions have been the USA, China and Canada, whereas the Russian Academy of Sciences, the University of Oklahoma and Tohoku University were the three institutions with the most publications in fracking research. The publications have been concentrated in several journals, led by the Journal of Petroleum Technology, Neftyanoe Khozyaystvo and the International Journal of Rock Mechanics and Mining Sciences, and categorized mainly in Geosciences Multidisciplinary, Engineering Petroleum and Energy Fuels.
The study has identified that the terms used in fracking research can be divided into three main clusters, related to "drilling methods", "exploitation/extraction process" and "geoscience aspects". The highly cited papers in the period 1953-2013 were collected and analyzed, in order to show the papers with the highest impact in the fracking area. This study utilized co-word analysis to explore papers in the field of the Internet of Things (IoT) to examine the scientific development in the area. The research data were retrieved from the WoS database for the period between 2000 and 2014, consisting of 758 papers. Using co-word analysis, this study found 7 clusters that represent the intellectual structure of IoT, including 'IoT and Security', 'Middleware', 'RFID', 'Internet', 'Cloud computing', 'Wireless sensor networks' and '6LoWPAN'. To identify these intellectual structures, this study used a co-occurrence matrix based on Pearson's correlation coefficient to cluster the words using the hierarchical clustering technique. To visualize these intellectual structures, this study carried out a multidimensional scaling analysis, to which the PROXSCAL algorithm was applied. This study explores current collaboration trends between industry and academic institutions in fuel cells by examining collaborative papers and patents during the period 1991-2010. Papers and patents from industry-academia collaboration (IAC) are identified; their quantity, ratio, and origins are analyzed; and the differences in performance of these collaborative documents between academic institutions and industrial institutions are contrasted. This study finds that the quantities of industry-academia collaborative papers and patents increased annually in both academic institutions and industrial institutions. Countries with high production of papers and patents tend to produce more industry-academia collaborative papers and patents.
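The co-word procedure described above (a keyword co-occurrence matrix, Pearson correlations between keyword profiles, then hierarchical clustering on the resulting distances) can be sketched in a few lines of Python; the documents and keywords below are invented stand-ins, not the actual IoT corpus:

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0

def coword_correlations(docs, vocab):
    """Correlate keyword co-occurrence profiles; hierarchical clustering
    would then merge keywords using 1 - r as the distance."""
    # co[w][j] = number of documents containing both keyword w and vocab[j]
    co = {w: [sum(1 for d in docs if w in d and v in d) for v in vocab]
          for w in vocab}
    return {(w, v): pearson(co[w], co[v])
            for i, w in enumerate(vocab) for v in vocab[i + 1:]}

# Invented toy corpus: each document is represented by its set of keywords.
docs = [{"rfid", "security"}, {"rfid", "security"},
        {"cloud", "middleware"}, {"cloud", "middleware"}, {"rfid", "cloud"}]
corr = coword_correlations(docs, ["rfid", "security", "cloud", "middleware"])
```

Keywords whose profiles correlate strongly (here 'rfid' and 'security') end up in the same cluster, which is how thematic groups such as 'RFID' or 'Cloud computing' emerge from the raw co-occurrence counts.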
Industrial institutions with high patent output and academic institutions with high paper output are active participants in IAC paper collaborations. Only a few pairs of industry-academia alliances have taken an active part in IAC patent collaborations. Industry relies heavily on collaboration with academia in paper publishing, but not in patenting, while academic institutions rarely rely on industry collaboration for paper or patent productivity. Most approaches to patent citation network analysis are based on single-patent direct citation relations, which give an incomplete picture of the nature of knowledge flow between patent pairs and are therefore incapable of objectively evaluating patent value. In this paper, four types of patent citation networks (direct citation, indirect citation, coupling and co-citation networks) are combined, filtered and recomposed based on relational algebra. Then, a method based on the comprehensive patent citation (CPC) network for patent value evaluation is proposed, and an empirical study of optical disk technology related patents has been conducted based on this method. The empirical study was carried out in two steps: observation of network characteristics over the entire process (citation time lag and topological and graphical characteristics), and measurement verification by independent proxies of patent value (patent family and patent duration). Our results show that the CPC network retains the advantages of patent direct citation, and performs better on topological structure, graphical features, centrality distribution, citation lag and sensitivity than a direct citation network. Verification by patent family and maintenance data shows that the proposed method covers more valuable patents than the traditional method. Gender studies is a growing field in academe. It is intrinsically associated with feminism and political reforms, and has in Sweden enjoyed exclusive resources and legislated support.
The present study aims to characterize gender studies published by authors based in Sweden, and poses a number of hypotheses regarding its rate of growth, impact, and other bibliographical variables. To this end, publications concerning gender by authors based at Swedish universities were collected from a range of sources and compiled to form a population database of publications between 2000 and 2010. The results show which universities and disciplines the gender studies authors come from, and in which journals they are most frequently published. We also compare the proportion of gender studies to the entire body of publications from a number of countries, and show that in Sweden it has grown faster than other types of publications. A comparison between literatures that consider socially constructed gender or biological sex showed that the former is less cited and published in journals with lower impact factors than the latter. Our Swedish Gender Studies List population database, which also features an international, non-exhaustive comparison sample that is matched to the Swedish sample in certain respects, is made available for further scientific study of this literature, for example by enabling the extraction of random samples. Comment on the article 'Characteristics of gender studies publications: A bibliometric analysis based on a Swedish population database' by Therese Soderlund and Guy Madison (Scientometrics, 2015). From the position of relevant expertise within gender studies and bibliometrics, this text offers a critique of the study and some suggestions of alternative ways forward. It analyses (1) the object of study of the article (the terms used to denominate the field, and the keywords and methods used for sample selection), (2) technical issues and the question of language in relation to international citations and impact factor, and (3) the views presented in the article regarding gender studies and political ideology. 
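The co-word clustering pipeline used in the IoT study above (co-occurrence matrix, Pearson's correlation coefficient, hierarchical clustering) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the paper-by-keyword occurrence matrix and the keyword labels are invented, and SciPy's hierarchical clustering stands in for whatever implementation the authors used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical paper-by-keyword occurrence matrix (1 = keyword appears).
occ = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
])
keywords = ["RFID", "Middleware", "Cloud", "6LoWPAN"]  # illustrative labels

# Pearson correlation between keyword occurrence profiles (columns of occ).
corr = np.corrcoef(occ.T)

# Turn the similarity matrix into a distance matrix and cluster hierarchically.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
for kw, lab in zip(keywords, labels):
    print(kw, lab)
```

Keywords that tend to co-occur in the same papers end up in the same cluster; a multidimensional scaling step could then be applied to `dist` for visualization.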
The present study examines whether the research and development (R&D) performance creation process conforms to the stepwise chain structure of a typical R&D logic model, in the context of a national technology innovation R&D program. Based on a series of successive binary logistic regression models newly proposed in the present study, a sample of n = 929 completed government-sponsored R&D projects was analyzed empirically. Sensitivity analyses are summarized in which the performance creation success probability is predicted for some key R&D performance factors. In a previous study of patent classifications in nine material technologies for photovoltaic cells, Leydesdorff et al. (Scientometrics 102(1):629-651, 2015) reported cyclical patterns in the longitudinal development of Rao-Stirling diversity. We suggested that these cyclical patterns can be used to indicate technological life-cycles. Upon decomposition, however, the cycles are exclusively due to increases and decreases in the variety of the classifications, and not to disparity or technological distance, measured as (1 - cosine). A single frequency component can accordingly be shown in the periodogram. Furthermore, the cyclical patterns are associated with the numbers of inventors in the respective technologies. Sometimes increased variety leads to a boost in the number of inventors, but in early phases, when the technology is still under construction, it can also be the other way round. Since the development of the cycles thus seems independent of technological distances among the patents, visualization in terms of patent maps can be considered as addressing an analytically different set of research questions. An attempt is made to cluster journals from the complete Web of Science database by using bibliographic coupling similarities. Since the sparseness of the underlying similarity matrix proved inappropriate for this exercise, second-order similarities have been used. 
Only 0.12 % of the 8282 journals had to be removed from the classification as singletons. The quality at three hierarchical levels, with 6, 14 and 24 clusters, substantiated the applicability of this method. Cluster labelling was made on the basis of the roughly 70 subfields of the Leuven-Budapest subject-classification scheme, which also allowed comparison with the existing two-level journal classification system developed in Leuven. The further comparison with the 22-field classification system of the Essential Science Indicators does, however, reveal larger deviations. Research project evaluation and selection is mainly concerned with evaluating a number of research projects and then choosing some of them for implementation. It involves a complex multiple-expert, multiple-criteria decision-making process. Thus, this paper presents an effective method for evaluating and selecting research projects by using the recently developed evidential reasoning (ER) rule. The proposed ER rule based evaluation and selection method mainly includes (1) using belief structures to represent peer review information provided by multiple experts, (2) employing a confusion matrix for generating experts' reliabilities, (3) implementing utility-based information transformation to handle qualitative evaluation criteria with different evaluation grades, and (4) aggregating multiple experts' evaluation information on multiple criteria using the ER rule. An experimental study on the evaluation and selection of research proposals submitted to the National Science Foundation of China demonstrates the applicability and effectiveness of the proposed method. 
The results show that (1) the ER rule based method can provide consistent and informative support for making informed decisions, and (2) the reliabilities of the review information provided by different experts should be taken into account in a rational research project evaluation and selection process, as they have a significant influence on the selection of eligible projects for panel review. Due to the vast array of fields that use or have used institutional repositories, and the varying degrees of technology used, it is important to trace the development of the institutional repository field in order to understand the breadth and depth of the studies involved. This longitudinal study of the subject terms associated with published journal articles allows for a clearer understanding of the field of institutional repositories as it developed, evolved, and changed over time. This study's uniqueness lies in its longitudinal nature, and its use of information visualization, multidimensional scaling, and parallel coordinate analysis provides information regarding the field of institutional repositories. The multidimensional scaling and parallel coordinate analysis, in conjunction with temporal analysis, reveal that institutional repositories transitioned through several development phases. Future studies of institutional repositories will most likely discover evaluation tactics and potential guidelines, resulting in a need for additional case studies. Observations from the parallel coordinate analysis reveal three major themes. The first theme is the maturity of institutional repositories as a field over time, the second is the fluctuation and developmental status of institutional repositories until 2010-2013 (Period IV), and the third is the emergence of the discipline of information science and library science as a strong generator of institutional repository research. 
Through the visualization and temporal analysis, information was gained regarding the history, development, and future studies within institutional repositories. Scientific mapping has now become an important subject in the scientometrics field. Journal clustering can provide insights into both the internal relations among journals and the evolution trends of studies. In this paper, we apply the affinity propagation (AP) algorithm to scientific journal clustering. The AP algorithm identifies clusters by detecting their representative points through message passing among the data points. Compared with other clustering algorithms, it can provide representatives for each cluster and does not need the number of clusters to be pre-specified. Because the input of the AP algorithm is the similarity matrix among data points, it can be applied to various forms of data sets with different similarity metrics. In this paper, we extract the similarity matrices from the journal data sets in both the cross-citation view and the text view and use the AP algorithm to cluster the journals. Through empirical analysis, we conclude that the clustering results from these two single views are highly complementary. Therefore, we further combine text information with cross-citation information by using a simple averaging scheme and apply the AP algorithm to conduct multi-view clustering. The multi-view clustering strategy aims at obtaining refined clusters by integrating information from multiple views. With the text view and citation view integrated, experiments on the Web of Science journal data set verify that the AP algorithm obtains better clustering results as expected. 
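As a rough illustration of the approach just described, affinity propagation can be run directly on a precomputed similarity matrix. The journal-by-journal similarities below are invented (e.g. an average of cross-citation and text similarities), and scikit-learn's implementation is only a stand-in for the authors' setup, not their actual code.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical symmetric journal similarity matrix in [0, 1],
# e.g. a simple average of the citation-view and text-view similarities.
S = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
])

# "precomputed" tells AP to use S directly instead of computing
# similarities from feature vectors; the number of clusters is not given.
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(S)
print(labels)                        # cluster label per journal
print(ap.cluster_centers_indices_)   # exemplar (representative) journals
```

The exemplar indices are the "representative points" the abstract mentions: each cluster is summarized by one of its own members rather than by a synthetic centroid.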
This study explores the curvilinear (inverted U-shaped) association between three classical dimensions of co-authorship network centrality (degree, closeness and betweenness) and the research performance, in terms of g-index, of authors embedded in a co-authorship network, considering the formal rank of the authors as a moderator between network centrality and research performance. We use publication data from the ISI Web of Science (for the years 2002-2009), citation data obtained using the Publish or Perish software for the years 2010-2013, and CVs of faculty members. Using social network analysis techniques and Poisson regression, we explore our research questions in a domestic co-authorship network of 203 faculty members publishing in Chemistry and its sub-fields within a developing country, Pakistan. Our results reveal a curvilinear (inverted U-shaped) association of direct and distant co-authorship ties (degree centrality) with research performance, with formal rank having a positive moderating role for lower-ranked faculty. The distribution of the number of academic publications against citation count for papers published in the same year is remarkably similar from year to year. We characterise the shape of such distributions by a 'width' associated with fitting a log-normal to each distribution, and find the width to be approximately constant for publications published in different years. This similarity is not surprising: after all, why would papers in a given year be cited more than those in another year? Nevertheless, we show that simple citation models fail to capture this behaviour. We then provide a simple three-parameter citation network model which can reproduce the correct width over time. We use the citation network of papers from the hep-th section of arXiv to test our model. Our final model reproduces the data's observed 'width' when around 20 % of the citations in the model are made to recently published papers in the entire network ('global information'). 
The remaining 80 % of citations are made using the references from these papers' bibliographies ('local searches'). We note that this is consistent with other studies, though our motivation, to reproduce the above distribution over time, is very different. Finally, we find that, in the citation network model, varying the number of papers referenced by a new publication is important, as it alters the parameters in the model which are fitted to the data. This is not addressed in current models and needs further work. The novelty and salience of research topics are vital for the competitiveness of research institutions and the development of science and technology. In this study, two novel weighting methods were proposed to differentiate the emergence and salience of research topics. A methodology was constructed to measure and visualize the contributions of research institutions to emerging themes and salient ones. The methods were illustrated with data on ninety Chinese and American Library and Information Science research institutions, collected from the Engineering Compendex and China National Knowledge Infrastructure databases between 2001 and 2012. The contributions of the investigated research institutions to the emerging themes and salient ones were calculated and visualized with the Treemap technique. The institutions were further ranked by their contributions and categorized into four types. The findings can help research institutions evaluate the novelty and salience of their research topics, discover research fronts and hotspots, and promote their research development. Google Scholar, a widely used academic search engine, plays a major role in finding free full-text versions of articles. But little is known about the sources of full-text files in Google Scholar. 
The aim of the study was to find out about the sources of full-text items and to look at subject differences in terms of the number of versions, times cited, rate of open access availability and sources of full-text files. Three queries were created for each of the 277 minor subject categories of Scopus. The queries were searched in Google Scholar and the first ten hits for each query were analyzed. Citations and patents were excluded from the results and the time frame was limited to 2004-2014. Results showed that 61.1 % of articles were accessible in full text in Google Scholar; 80.8 % of full-text articles were publisher versions and 69.2 % of full-text articles were PDFs. There was a significant difference between the mean times cited of full-text items and non-full-text items. The highest rate of full-text availability for articles belonged to the life sciences (66.9 %). Publishers' websites were the main source of bibliographic information for non-full-text articles. For full-text articles, educational (edu, ac.xx etc.) and org domains were the top two sources of full-text files. ResearchGate was the top single website providing full-text files (10.5 % of full-text articles). Tenure decisions and university rankings are just two examples where interfield comparison of academic output is needed. There are differences in publication performance among fields when the number of papers is used as the quantity measure and the Journal Impact Factor is used as the quality measure. For example, it is well known that economics departments publish less than chemistry departments and that their journals have lower impact factors. But there is no consensus on the magnitude of the difference or the methodology for the adjustment. Every decision maker makes his or her own adjustment and uses a different formula. In this paper, we quantify the publication performance differences among nine academic fields by using data from 1417 departments in the United States. We use two quality measures. 
First, we weight the publications by the impact factor of the journals. Second, we consider only the publications in journals that are in the top quartile of their subject categories. We see that there are vast interfield differences in terms of the number of publications. Moreover, we find that the interfield differences are augmented when we consider the quality of the publications. Lastly, we rank the departments according to the quality of their graduate programs. We see that there are also huge differences among departments with graduate programs of comparable rank. Related records searching, now a common option within bibliographic databases, is applied to an individual result record as a secondary way of refining the retrieval set obtained from the primary subject search operation. In one approach, an individual result record is linked to other article records on the basis of the number of cited references they share in common, the theory being that two articles that cite many of the same sources are likely to be highly similar in subject content. Results of the secondary search are usually displayed in the order of each item's actual number of commonly shared references. In the present paper we suggest an improved way of ranking the results, employing statistical significance tests. We suggest two approaches, one involving a statistical test previously unknown in bibliometric circles, the binomial index of dispersion, and the other employing the more familiar centralized cosine measure; these turn out to produce nearly identical results. An example demonstrating the application of these measures, and contrasting them with the use of raw totals, is provided. In the example the results rankings are found to be only modestly (positively) correlated, suggesting that much information is lost to the user when raw totals alone are made the basis for ordering results. 
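The contrast between ranking by raw shared-reference counts and ranking by a normalized similarity can be seen in a small sketch. The papers and reference identifiers below are invented, and plain cosine similarity over binary reference vectors is used as a simple stand-in for the paper's centralized cosine measure and binomial index of dispersion.

```python
import math

def cosine(refs_a, refs_b):
    """Cosine similarity of two reference lists treated as binary vectors."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical seed record and candidate related records.
seed = ["r1", "r2", "r3", "r4"]
candidates = {
    "paper_x": ["r1", "r2", "r3", "r9"],                  # shares 3 refs
    "paper_y": ["r1", "r2", "r5", "r6", "r7", "r8"],      # shares 2 refs
    "paper_z": ["r2", "r4"],                               # shares 2 refs
}
ranked = sorted(candidates,
                key=lambda p: cosine(seed, candidates[p]),
                reverse=True)
print(ranked)
```

By raw totals, paper_y and paper_z are tied at two shared references; the normalized measure ranks the short, focused bibliography of paper_z above the long one of paper_y, which is the kind of distinction raw counts lose.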
In terms of technology evolution pathways, patents, articles and projects are the traditional analytical dimensions, with patent analysis being particularly prominent. Analysis results based on these traditional dimensions are used to identify the evolutionary stage according to the theory of the technology life cycle (TLC). However, the traditional TLC is insufficient to explain the inner driving force of technology evolution; it merely describes the process. Promoting the degree of ideality, one of the evolutionary principles in the framework of Teoriya Resheniya Izobretatelskikh Zadatch (TRIZ), is combined with patent and article analysis, and a novel three-dimensional analytical method is introduced. In a case study of one crucial and novel material technology, graphene, we find that it has attracted the attention of all types of organizations, but the development prospects of the graphene industry are not yet clear, and its potential abilities and applications should be explored in depth. Recruitment and professorial appointment procedures are crucial for the administration and management of universities and higher education institutions in order to guarantee a certain level of performance quality and reputation. The complementary use of quantitative and objective bibliometric analyses is meant to be an enhancement of the assessment of candidates and a possible antidote to subjective, discriminatory and corrupt practices. In this paper, we present the Vienna University bibliometric approach, offering a method which relies on a variety of basic indicators and further control parameters in order to address the multidimensionality of the problem and to foster comprehensibility. Our "top counts approach" allows an appointment committee to pick and choose from a portfolio of indicators according to the actual strategic alignment. Furthermore, control and additional data help to understand disciplinary publication habits, to unveil concealed aspects and to identify individual publication strategies of the candidates. 
Our approach has already been applied to 14 professorial appointment procedures (PAP) in the life sciences, earth and environmental sciences and social sciences, comprising 221 candidates in all. The usefulness of the bibliometric approach was confirmed by all heads of appointment committees in the life sciences. For the earth and environmental sciences as well as the social sciences, the usefulness was less obvious and sometimes questioned due to the low coverage of the candidates' publication output in the traditional citation data sources. A retrospective assessment of all hitherto performed PAP also showed an overlap, to a certain degree, between the committees' designated top candidates and the bibliometric top candidates. We present an analysis of sectoral collaboration for cardiovascular medical device development in South Africa over a 15-year period. The main objectives were to identify the nodes (organizations) and sectors that influence the behaviour of the cardiovascular device development network; to identify the types of collaboration that exist within the network; to quantify the extent of collaboration within the network; and finally to analyse the changes in overall collaboration over time. Collaboration across four sectors was considered: healthcare services, industry, universities, and lastly, science councils and facilities. Author affiliations, extracted from journal articles, were used to generate collaboration networks. Network metrics (degree centrality, betweenness centrality and graph densities) and network graphs were produced using network visualization software (UCINET) in order to identify the influential nodes and sectors, as well as to measure the extent of collaboration within and between sectors. The university and healthcare services sectors were found to make the largest contribution to the development of cardiovascular medical devices in South Africa. 
Collaboration between universities and healthcare service nodes was the most prevalent type of cross-sector collaboration. Universities were found to be potential key players in the transmission of information across the network, with greater potential than the remaining sectors to form new collaborations with isolated nodes, thereby enhancing device development activity. Foreign nodes played a role in connecting local nodes which would otherwise have been isolated. Overall, collaboration across sectors has increased over the 15-year period, but science councils and industry still have room to become more involved by partnering with the dominant sectors. Quantitative evaluation of citation data to support funding decisions has become widespread. For this purpose there exist many measures (indices), and while their properties have been well studied, there is little comprehensive experimental comparison of the ranking lists obtained when using different methods. A further problem of the existing studies is that the lack of available data about net citations prevents researchers from studying the effect of measuring scientific impact by net citations (all citations minus self-citations). In this paper we use simulated data to study factors that could potentially influence the degree of agreement between the rankings obtained when using different indices, with the emphasis on comparing the number of net citations per author to other, more established indices. 
We observe that researchers publishing papers with a large number of co-authors are systematically ranked higher when using the h-index or total citations (TC) instead of the number of citations per author (TCA); that researchers who publish a small proportion of papers which receive many citations, while the rest of their papers receive only a few citations, are systematically ranked higher when using TCA or TC instead of the h-index; and that authors with a lower proportion of self-citations are ranked higher under indices based on the number of net citations than under indices considering only the total citation count. Results are also verified and illustrated by analyzing a large dataset from the field of medical science in Slovenia for the period 1986-2007. Recent years have witnessed an increase of competition in science. While promoting the quality of research in many cases, intense competition among scientists can also trigger unethical scientific behaviors. To increase their total number of published papers, some authors even resort to software tools that are able to produce grammatical, but meaningless, scientific manuscripts. Because automatically generated papers can be mistaken for real papers, it becomes of paramount importance to develop means to identify these scientific frauds. In this paper, I devise a methodology to distinguish real manuscripts from those generated with SCIgen, an automatic paper generator. Upon modeling texts as complex networks (CN), it was possible to discriminate real from fake papers with at least 89 % accuracy. A systematic analysis of feature relevance revealed that accessibility and betweenness were useful in particular cases, even though the relevance depended upon the dataset. The successful application of the methods described here shows, as a proof of principle, that network features can be used to identify scientific gibberish papers. 
In addition, the CN-based approach can be combined in a straightforward fashion with traditional statistical language processing methods to improve the performance in identifying artificially generated papers. The increase in authorship of nuclear physics publications has been investigated using large statistical samples. Large collections of bibliographical metadata represent a very powerful tool for understanding past, present, and, perhaps, future research trends. This has been accomplished with data mining of the nuclear science references and the experimental nuclear reaction databases. The data analysis shows a strong anticorrelation between the authorship increase of experimental papers and the overall reduction of measurements due to closures of many small nuclear physics facilities. These findings suggest that article authorship is a very complex phenomenon, and the presently observed increase or "inflation" in authorship could be explained by adaptation to the changing research environment, in addition to evolving authorship rules that progressed over the years from very strict to lenient. The results of this study and their implications are discussed and conclusions presented. We examine the structure and dynamics of the network of international scientific collaborations within North Africa (Morocco, Algeria, Tunisia and Egypt) using both publishing and patent data. Results show that the region has undergone a sustained process of internationalization, which has translated into both an expansion of the network of scientific collaborations and a relative increase in the research output of international teams. At the same time we find a very limited degree of scientific integration at the regional level, i.e. within North Africa. Among the countries examined, Egypt seems to be the most active one in terms of size of research output as well as the number and variety of international collaborations. 
Moreover, Egypt is the most central node of the regional research network, and this centrality has grown considerably over time. This increased importance of Egypt as a regional research hub is associated with a remarkable increase in the centrality of Saudi Arabia within Egypt's research network. This holds across a variety of research fields as well as in terms of applied science (as shown by patent data). Overall, these results suggest that the region is undergoing a deep transformation in the structure and composition of scientific collaborations. Bibliometric methods or "analyses" are now firmly established as scientific specialties and are an integral part of research evaluation methodology, especially within the scientific and applied fields. The methods are used increasingly when studying various aspects of science and also in the way institutions and universities are ranked worldwide. A sufficient number of studies have now been completed, and with the resulting literature it is possible to analyse the bibliometric method by using its own methodology. The bibliometric literature in this study, which was extracted from the Web of Science, is divided into two parts using a method comparable to that of Jonkers et al. (Characteristics of bibliometrics articles in library and information sciences (LIS) and other journals, pp. 449-551, 2012): the publications either lie within the Information and Library Science (ILS) category or within the non-ILS category, which includes more applied, "subject"-based studies. The impact in the different groupings is judged by means of citation analysis using normalized data, and an almost linear increase can be observed from 1994 onwards in the non-ILS category. The implications for the dissemination and use of bibliometric methods in the different contexts are discussed. A keyword analysis identifies the most popular subjects covered by bibliometric analysis, and multidisciplinary articles are shown to have the highest impact. 
A noticeable shift is observed in the countries which contribute to the pool of bibliometric analysis, as well as a self-perpetuating effect in giving and taking references. By analyzing the complete 2003-2007 publication lists of two top-ranking political science departments in Germany, this study explores the publication patterns of German political scientists and also analyzes their citation and reference characteristics. Two main communication networks in the publication patterns of German political scientists are distinguished in this study. The significant local communication network covers monographs and regionally oriented journals that are mainly written in German. Its importance has slightly decreased over time. By contrast, the relatively smaller international one, which covers international publications in English, has slightly increased its volume. The younger political scientists have more internationally oriented publication behaviors, and thus would benefit in an evaluation from an international perspective. The average impact of WoS-indexed items in this study was found to be higher than the average impact in political science. In general, a growing degree of international orientation in this field can be expected as time elapses. The present paper attempts to explore the relationship between the Turkish academic and industry systems by mapping their relationships using web indicators. We used the top 100 Turkish universities and the top 10 Turkish companies in each of 10 industrial sectors in order to observe the performance of web impact indicators. The total page count metric was obtained through Google Turkey, and the pure link metrics were gathered from Open Site Explorer. The indicators obtained both for web presence and web visibility indicated that there are significant differences between the group of academic institutions and those related to companies within the web space of Turkey. 
However, this study is exploratory and should be replicated with a larger sample of both Turkish universities and companies in each sector. Likewise, a longitudinal study rather than a cross-sectional one would eliminate or smooth fluctuations of web data (especially URL mentions), allowing a more adequate understanding of the relations between Turkish institutions and their web impact to be reached. This investigation tries to determine whether, for highly visible journals, namely Nature, Science and Cell, articles with a short editorial delay generally receive more citations than those with a long editorial delay. Based on data for the period from 2005 to 2009, it is found that there is a clear, although statistically weak, tendency towards an inverse relation between editorial delay time and the number of received citations. Based on publications indexed in the Science Citation Index Expanded (SCIE) of Thomson Reuters, we explored China-Germany collaboration in physics from several perspectives, including publication profiles, collaboration effect, and active institutions and fields. We found that German researchers are more capable of publishing higher-quality papers than their Chinese counterparts. Both China and Germany benefit from collaboration in raising publication productivity. The collaboration helps improve Chinese researchers' citation impact and capability of publishing in higher-quality journals. Research capacities of German institutions are more evenly distributed than those of their Chinese counterparts. The Chinese institutions that are most active in collaborating with German counterparts are mainly those in leading positions in China, whereas those in disadvantageous situations are still isolated from the international community. Starting from the perspective of webometrics, this paper explores the improvement effect of institutional repositories (IRs) on their home institutions with respect to web presence and visibility. 
Taking 19 IRs from institutions affiliated to the Chinese Academy of Sciences (CAS) as study samples, we calculate the contribution of IRs to the webometric indicators of their home institutions in terms of four indicators: page counts, PDF counts, URL mention counts, and link counts. According to their open-access (OA) status, the IRs of CAS were divided into an OA group and a non-OA group, which were compared with respect to differences in the above indicators as well as browse counts and download counts. The results of the study show that: (1) IRs showed a relatively significant positive improvement with respect to Google page counts, Scholar page counts, and Google PDF counts, although the improvement effect with respect to Scholar PDF counts was almost nonexistent; (2) repositories presented a certain improvement effect with respect to URL mention counts, but the contribution of link counts was limited; and (3) OA repositories manifested noticeable advantages in terms of Google PDF counts, URL mention counts, and download counts. We conclude that IRs can improve the web presence and visibility of their home institutions, while OA IRs offer more benefits to their home institutions. Ranking journals is a longstanding problem and can be addressed quantitatively, qualitatively or using a combination of both approaches. In recent decades, the Impact Factor (i.e., the best-known quantitative approach) has been widely questioned, and other indices have thus been developed and become popular. Previous studies have reported strengths and weaknesses of each index, and devised meta-indices to rank journals in a certain field of study. However, the proposed meta-indices exhibit some intrinsic limitations: (1) the indices to be combined are not always chosen according to well-grounded principles; (2) combination methods are usually unweighted; and (3) some of the proposed meta-indices are parametric, which requires assuming a specific underlying data distribution. 
We propose a data-driven methodology that linearly combines an arbitrary number of indices to produce an aggregated ranking, using different techniques from statistics and machine learning to estimate the combining weights. We additionally consider correlations between indices and meta-indices, to quantitatively evaluate their differences. Finally, we empirically show that the considered meta-indices are also robust to significant perturbations of the values of the combined indices. Questions about gender differences in the workplace usually attract much attention-but often generate more heat than light. To examine gender differences in several facets of scientific productivity and impact, a quantitative, scientometric approach is employed. Analyzing a sample of industrial and organizational psychologists (N (authors) = 4234; N (publications) = 46,656), this study raises both questions and concerns about gender differences in research, by showing that female and male I-O psychologists differ with regard to publication output (fewer publications authored by female researchers), impact (heterogeneous, indicator-dependent gender differences), their publication career courses (male researchers' periods of active publishing last longer and show longer interruptions), and research interests (only marginal gender differences). In order to get a glimpse of future developments, we repeated all analyses with the student subsample and found nearly no gender differences, suggesting a more gender-balanced future. Thus, this study gives an overview of the status quo of gender differences in an entire psychological sub-discipline. Future research will have to examine whether these gender differences are volitional in nature or the manifestation of external constraints. Knowledge is a crucial asset in organizations and its diffusion and recombination processes can be affected by numerous factors. 
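The linear combination of indices proposed in the meta-index abstract above can be sketched in a few lines. The journal names, index values, and fixed weights below are illustrative assumptions (the study estimates the combining weights from data with statistical and machine-learning techniques):

```python
# Hypothetical journals with three index values each (e.g. IF-like,
# h5-like, SJR-like); all numbers are invented for illustration.
journals = {
    "J. Alpha": [2.1, 35.0, 1.20],
    "J. Beta":  [4.7, 52.0, 2.90],
    "J. Gamma": [1.3, 18.0, 0.70],
}
weights = [0.5, 0.3, 0.2]  # illustrative combining weights

def zscores(column):
    """Standardize one index across journals (population z-scores)."""
    mean = sum(column) / len(column)
    sd = (sum((x - mean) ** 2 for x in column) / len(column)) ** 0.5
    return [(x - mean) / sd for x in column]

names = list(journals)
cols = list(zip(*journals.values()))                      # one tuple per index
normed = list(zip(*(zscores(list(c)) for c in cols)))     # back to per-journal rows
meta = {n: sum(w * v for w, v in zip(weights, row))
        for n, row in zip(names, normed)}
ranking = sorted(meta, key=meta.get, reverse=True)
print(ranking)  # journals ordered by the aggregated meta-index
```

Standardizing each index before combining keeps a large-scale index (such as raw citation counts) from dominating the aggregate purely by magnitude.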
This study examines the influence of the status of individual researchers in social networks on the knowledge diffusion and recombination process. We contend that knowledge diversity, random diffusion, and parallel duplication are three primary factors characterizing diffusion paths in knowledge networks. Using multiple social network measures, we investigate how individuals in the respective institutional collaboration networks influence knowledge diffusion through scientific papers. Scientific publication data and citation data from six prolific institutions in China (Chinese Academy of Sciences) and the United States (University of California at Berkeley, University of Illinois, Massachusetts Institute of Technology, Northwestern University, and Georgia Institute of Technology) in the nanotechnology field over the interval 2000-2010 were used for empirical analysis, and the Cox regression model was leveraged to analyze the temporal relationships between knowledge diffusion and social network measures of researchers in these leading institutions. Results show that structural holes and degree centrality are the most effective measures to explain the knowledge diffusion process within these six institutions. Knowledge recombination is mainly achieved through parallel duplication within groups and recombination of diverse knowledge across different groups. The results are similar for all six institutions except for the Bonacich power and eigenvector measures, which may reflect cultural differences across countries and institutions. This article raises awareness of university-sponsored supplements in high-impact journals and the issues of this new practice. Based on a library user's complaint of a dead OpenURL link, this study examines dozens of articles from university-sponsored supplements in Science, Nature, and Cell Press journals. It compares their metadata with those of regular articles from the parent journals. 
It also compares how these supplements are indexed by comprehensive journal indexes (Academic Search Complete, Scopus, and Web of Science). It finds that various universities and research institutes in East Asia, mainly China, have been major sponsors of supplements of key journals in recent years. The issues accompanying this new practice include dead OpenURL linking, index irregularities, and self-congratulatory sponsors with a misled audience in East Asia. The media in China was so misled that it ranked one sponsored story among the world's top 10 news stories. The study questions the ethics of publishing university-sponsored supplements, and calls for standardization of metadata assignment, including DOIs, as well as the addition of a disclaimer to all supplement articles. In 2014 Thomson Reuters (TR, provider of the Web of Science, WoS) published a list of highly-cited researchers worldwide. This list includes those scientists who have published the most papers in their discipline that belong to the 1 % most-cited papers. Bornmann and Bauer (J Assoc Inf Sci Technol, in press) have presented a first evaluation in which the scientists are evaluated on the basis of their affiliations. In this short communication we would like to indicate how the TR data can be used to perform a meaningful country-specific evaluation. Germany serves as the example for the analysis. Bibliometric and "tech mining" studies depend on a crucial foundation-the search strategy used to retrieve relevant research publication records. Database searches for emerging technologies can be problematic in many respects, for example the rapid evolution of terminology, the use of common phraseology, or the extent of "legacy technology" terminology. Searching on such legacy terms may or may not pick up R&D pertaining to the emerging technology of interest. A challenge is to assess the relevance of legacy terminology in building an effective search model. 
Common-usage phraseology additionally confounds certain domains in which broader managerial, public interest, or other considerations are prominent. In contrast, searching for highly technical topics is relatively straightforward. In setting forth to analyze "Big Data," we confront all three challenges-emerging terminology, common usage phrasing, and intersecting legacy technologies. In response, we have devised a systematic methodology to help identify research relating to Big Data. This methodology uses complementary search approaches, starting with a Boolean search model and subsequently employs contingency term sets to further refine the selection. The four search approaches considered are: (1) core lexical query, (2) expanded lexical query, (3) specialized journal search, and (4) cited reference analysis. Of special note here is the use of a "Hit-Ratio" that helps distinguish Big Data elements from less relevant legacy technology terms. We believe that such a systematic search development positions us to do meaningful analyses of Big Data research patterns, connections, and trajectories. Moreover, we suggest that such a systematic search approach can help formulate more replicable searches with high recall and satisfactory precision for other emerging technology studies. It has become increasingly common to rely on the h index to assess scientists' contributions to their fields, and this is true in psychology. This metric is now used in many psychology departments and universities to make important decisions about hiring, promotions, raises, and awards. Yet, a growing body of research shows that there are gender differences in citations and h indices. We sought to draw attention to this literature, particularly in psychology. We describe the presence of a gender effect in h index in psychology and analyze why the effect is important to consider. 
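Several of the abstracts above turn on the h index, whose definition is mechanical enough to state as code. A minimal sketch of the standard definition (none of the studies' data are reproduced here; the example citation lists are invented):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:   # the paper at this rank still "supports" h = rank
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # -> 4
print(h_index([25, 8, 5, 3, 3]))   # -> 3
```

The second example illustrates why the h index cannot disentangle quality from quantity: one very highly cited paper barely moves it.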
To illustrate the importance of this effect, we translate the observed gender effect into a meaningful metric-that of salary-and show that the gender difference in h index could translate into significant financial costs for female faculty. A variety of factors are discussed that have been shown to give rise to gender differences in impact. We conclude that the h index, like many other metrics, may reflect systematic gender differences in academia, and we suggest using caution when relying on this metric to promote and reward academic psychologists. Science classification schemes (SCSs) are built to categorize scientific resources (e.g. research publications and research projects) into disciplines for effective research analytics and management. With the explosive growth of the number of scientific resources in distributed research institutions in recent years, effectively mapping different SCSs, especially heterogeneous SCSs that categorize different kinds of scientific resources, is becoming an increasingly challenging problem for facilitating information interoperability and networking scientific resources. To effectively realize heterogeneous SCS mapping, we design a novel multi-faceted method to measure the similarity between two classes based on three important facets, namely descriptors, individuals, and semantic neighborhood. Particularly, the proposed approach leverages a hybrid method combining statistical learning, semantic analysis and structure analysis for effective measurement, exploiting the symmetric Tversky index, the WordNet dictionary and the Hungarian Algorithm. The method has been evaluated on two main SCSs that need mapping for information management and policy-making in NSFC, and showed satisfactory results. Resolving the interoperability among heterogeneous SCSs enhances access to heterogeneous scientific resources and supports the development of appropriate research analytics policies. 
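The SCS-mapping abstract above relies on a symmetric Tversky set similarity between class descriptors. The sketch below uses one commonly cited symmetrization of Tversky's index (built from the smaller and larger set differences); the exact parameterization used in the study is not given here, and the descriptor sets are invented:

```python
def symmetric_tversky(x, y, alpha=0.8, beta=0.6):
    """A common symmetric variant of Tversky's set similarity.
    alpha balances the smaller vs. larger set difference; beta scales
    how much differences count against the shared part (assumed values)."""
    x, y = set(x), set(y)
    c = len(x & y)                   # shared descriptors
    a = min(len(x - y), len(y - x))  # smaller difference
    b = max(len(x - y), len(y - x))  # larger difference
    if c == a == b == 0:
        return 1.0                   # two empty sets: treat as identical
    return c / (c + beta * (alpha * a + (1 - alpha) * b))

# Toy descriptor sets for two classes from different schemes (hypothetical).
s1 = {"polymer", "catalysis", "materials"}
s2 = {"polymer", "materials", "synthesis", "coatings"}
print(symmetric_tversky(s1, s2))
```

Because the formula is built from min/max of the two set differences, swapping the arguments leaves the score unchanged, which is the point of the symmetric variant.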
The present study investigates to what extent basic-clinical collaboration and involvement in translational research improve the performance of researchers, in the particular setting of hospitals affiliated with the Spanish National Health System (NHS). We used a combination of quantitative science indicators and perception-based data obtained through a survey of researchers working at NHS hospitals. Although collaborating with clinical researchers and health care practitioners may increase the productivity of basic researchers working in clinical settings, the extent to which they are able to contribute to translational research is the factor that allows them to make a qualitative leap in their scientific production in highly ranked international scientific journals. Our results challenge the arguments by some authors that translational projects face more difficulty than basic proposals in being granted by funding agencies and in being published in high-impact journals. Although they are not conclusive, our results point towards the existence of a positive relationship between leadership and involvement in translational research. Basic-clinical collaboration and translational research should be an incentive for researchers, as they are likely to favour their performance. Hospitals will benefit from encouraging researchers and health care practitioners to collaborate in the framework of translational projects, as a way to improve not only individual, but also institutional research performance. Spanish hospitals should contribute to overcoming obstacles to translational research, through the full integration of basic researchers within the hospital setting and the definition of a research career path within the NHS. This study involved using three methods, namely keyword, bibliographic coupling, and co-citation analyses, for tracking the changes of research subjects in library and information science (LIS) during 4 periods (5 years each) between 1995 and 2014. 
We examined 580 highly cited LIS articles, and the results revealed that the two subjects "information seeking (IS) and information retrieval (IR)" and "bibliometrics" appeared in all 4 phases. However, a decreasing trend was observed in the percentage of articles related to IS and IR, whereas an increasing trend was identified in the percentage of articles focusing on bibliometrics. Particularly, in the 3rd phase (2005-2009), the proportion of articles on bibliometrics exceeded 80 %, indicating that bibliometrics became predominant. Combining various methods to explore research trends in certain disciplines facilitates a deeper understanding for researchers of the development of disciplines. An abstract construction for general weighted impact factors is introduced. We show that the classical weighted impact factors are particular cases of our model, but it can also be used for defining new impact measuring tools for other sources of information-as repositories of datasets-providing the mathematical support for a new family of altmetrics. Our aim is to show the main mathematical properties of this class of impact measuring tools, which hold as consequences of their mathematical structure and do not depend on the definition of any given index nowadays in use. In order to show the power of our approach in a well-known setting, we apply our construction to analyze the stability of the ordering induced in a list of journals by the 2-year impact factor (IF2). We study the change of this ordering when the criterion defining it is the numerical value of a new weighted impact factor, in which IF2 is used for defining the weights. We prove that, if we assume that the weight associated to a citing journal increases with its IF2, then the ordering given in the list by the new weighted impact factor coincides with the order defined by IF2. We give a quantitative bound for the errors committed. 
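The weighted-impact-factor construction discussed above makes a citation count more when the citing journal has a higher 2-year impact factor. A minimal sketch of the idea; all journal names, counts, and the specific weight function (citing IF normalized by the mean) are illustrative assumptions, not the study's construction:

```python
# Hypothetical data: citations received by a target journal, keyed by
# citing journal, counting only citations to items published in the
# two preceding years; the citing journals' own IF2 values are assumed.
citations_from = {"J. A": 40, "J. B": 25, "J. C": 10}
if2_of_citing = {"J. A": 3.2, "J. B": 1.1, "J. C": 0.4}
citable_items = 50  # items the target journal published in those two years

# Classical 2-year impact factor: every citation counts the same.
if2 = sum(citations_from.values()) / citable_items

# Weighted variant: each citation is scaled by a weight that increases
# with the citing journal's IF2 (here the IF2 itself, normalized by the
# mean so that weights average to 1 -- an illustrative choice).
mean_if2 = sum(if2_of_citing.values()) / len(if2_of_citing)
weighted_if2 = sum(n * (if2_of_citing[j] / mean_if2)
                   for j, n in citations_from.items()) / citable_items

print(if2, weighted_if2)
```

In this toy case the weighted value exceeds the classical one because most citations come from the highest-IF2 journal; the abstract's stability result concerns when such reweighting can and cannot reorder a journal list.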
We also show two examples of weighted impact factors defined by weights associated to the prestige of the citing journal for the fields of MATHEMATICS and MEDICINE, GENERAL AND INTERNAL, checking if they satisfy the "increasing behavior" mentioned above. An increase in the number of scientific publications in the last few years, which is directly proportional to the appearance of new journals, has made the researchers' job increasingly complex and extensive regarding the selection of bibliographic material to support their research. Not only is it a time-consuming task, it also requires suitable criteria, since researchers need to systematically select the most relevant works. Thus the objective of this paper is to propose a methodology, called Methodi Ordinatio, which presents criteria to select scientific articles. This methodology employs an adaptation of ProKnow-C for the selection of publications and the InOrdinatio, an index that ranks the selected works by relevance. This index combines the three main factors under evaluation in a paper: impact factor, year of publication and number of citations. When applying the equation, researchers identify among the selected works the most relevant ones for their bibliographic portfolio. As a practical application, a research sample on the theme of technology transfer models, comprising papers from 1990 to 2015, is provided. The results indicated that the methodology is efficient regarding the proposed objectives, and the most relevant papers on technology transfer models are presented. This work studies the trends of co-author numbers in Mathematics, Chemistry and Physics, from 1960 to 2010. Journal data are analyzed to show the common trends across the disciplines. Single-author papers decline continuously over the years. Multi-author papers rise and fall following one another in succession. 
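The InOrdinatio index from the Methodi Ordinatio abstract above combines impact factor, publication year, and citation count into one relevance score. A sketch using the form commonly reported for the index (IF/1000 plus a recency term weighted by alpha plus citations); verify the exact equation and the alpha parameter against the original methodology before use, and note that the papers below are invented:

```python
def in_ordinatio(impact_factor, pub_year, citations,
                 analysis_year=2015, alpha=9):
    """InOrdinatio as commonly stated for the Methodi Ordinatio.
    alpha (1..10) weights recency; high alpha penalizes old papers."""
    return (impact_factor / 1000
            + alpha * (10 - (analysis_year - pub_year))
            + citations)

# Hypothetical candidate papers: (id, impact factor, year, citations).
papers = [
    ("P1", 2.5, 2014, 12),
    ("P2", 4.0, 1995, 300),
    ("P3", 1.2, 2010, 45),
]
ranked = sorted(papers, key=lambda p: in_ordinatio(*p[1:]), reverse=True)
print([p[0] for p in ranked])
```

The division of IF by 1000 keeps citations and recency as the dominant factors, which matches the methodology's emphasis on relevance for a bibliographic portfolio.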
The data suggest that small-group collaborations in different disciplines go through the same stages at different times. Besides the common trends, the dramatic growth of the co-author number in Physics after 1990 is also discussed. This paper aims to analyze patterns of participation of higher education institutions (HEIs) in European Framework Programs (EU-FP) and their association with HEI characteristics, country and geographical effects. We have analyzed a sample of 2235 HEIs in 30 countries in Europe, derived from the European Tertiary Education Register (ETER), which has been matched with data on participations in EU-FPs in 2011 using the EUPRO database. Our findings identified (1) a high concentration of EU-FP participation in a small group of HEIs with high reputation; (2) the participation of non-doctorate awarding HEIs in EU-FPs is very limited despite the fact that they account for a significant share of tertiary student enrolments; (3) the number of participations tends to increase proportionally to organizational size, and is strongly influenced by international reputation; and (4) there is limited evidence of significant country effects in EU-FP participations, as well as of the impact of distance from Brussels. We interpret these results as an outcome of the close association between HEI reputation and the network structure of EU-FP participants. Over the past 20 years, researchers have contributed substantially to the empirical study of the determinants of academic research publication. All of these studies used a single equation to model the relationship between research publication (output) and research collaboration (input). In this modeling, research publication was assumed to be an endogenous variable and research collaboration an exogenous one. 
This study is the first to provide evidence of a two-way relationship between the input and the output, and thus contributes theoretically and empirically to the existing body of literature. This study found that research collaboration contributed to the advancement of research publication and vice versa. Therefore, the previous studies that assumed a unidirectional relationship produced biased estimates of the determinants of research publications. Socio-cognitive action reproduces and changes both social and cognitive structures. The analytical distinction between these dimensions of structure provides us with richer models of scientific development. In this study, I assume that (1) social structures organize expectations into belief structures that can be attributed to individuals and communities; (2) expectations are specified in scholarly literature; and (3) intellectually the sciences (disciplines, specialties) tend to self-organize as systems of rationalized expectations. Whereas social organizations remain localized, academic writings can circulate, and expectations can be stabilized and globalized using symbolically generalized codes of communication. The intellectual restructuring, however, remains latent as a second-order dynamics that can be accessed by participants only reflexively. Yet, the emerging "horizons of meaning" provide feedback to the historically developing organizations by constraining the possible future states as boundary conditions. I propose to model these possible future states using incursive and hyper-incursive equations from the computation of anticipatory systems. Simulations of these equations enable us to visualize the couplings among the historical-i.e., recursive-progression of social structures along trajectories, the evolutionary-i.e., hyper-incursive-development of systems of expectations at the regime level, and the incursive instantiations of expectations in actions, organizations, and texts. 
We investigate whether Nobel laureates' collaborative activities undergo a negative change following prize reception by using publication records of 198 Nobel laureates and analyzing their coauthorship patterns before and after the Nobel Prize. The results overall indicate less collaboration with new coauthors post award than pre award. Nobel laureates are more loyal to collaborations that started before the Prize: looking at coauthorship drop-out rates, we find that these differ significantly between coauthorships that started before the Prize and coauthorships after the Prize. We also find that the greater the intensity of pre-award cooperation and the longer the period of pre-award collaboration, the higher the probability of staying in the coauthor network after the award, implying a higher loyalty to the Nobel laureate. Aiming to explore the applicability of bookmarking data in measuring the scientific impact, the present study investigates the correlation between conventional impact indicators (i.e. impact factors and mean citations) and bookmarking metrics (mean bookmarks and percentage of bookmarked articles) at author and journal aggregation levels in library and information science (LIS) field. Applying the citation analysis method, it studies a purposeful sample of LIS articles indexed in SSCI during 2004-2012 and bookmarked in CiteULike. Data are collected via WoS, Journal Citation Report, and CiteULike. There is a positive, though weak, correlation between LIS authors' mean citations and their mean bookmarks, as well as a moderate to large correlation between LIS journals' impact factors on the one hand and on the other, their mean bookmarks, and the percentage of their bookmarked articles. 
Given the correlation between the citation- and bookmark-based indicators at author and journal levels, bookmarking data can be used as a complement to, but not a substitute for, the traditional indicators to reach a more inclusive evaluation of journals and authors. Recently, Scientometrics published a paper titled "Is there bias in editorial choice? Yes" (Moustafa 2015) in which some comments are given on our published paper in Nature titled "Is there fame bias in editorial choice?" (Mahian et al. 2015). Unfortunately, the author of the above-mentioned paper and many other readers might misunderstand the main aim of our correspondence. Here, we try to give some explanations to clarify the main goal of the analysis presented in the paper. Policy-makers working at the national and regional levels could find the territorial mapping of research productivity by field to be useful in informing both research and industrial policy. Research-based private companies could also use such mapping for efficient selection in localizing R&D activities and university research collaborations. In this work we apply a bibliometric methodology for ranking by research productivity: (i) the fields of research for each territory (region and province); and (ii) the territories for each scientific field. The analysis is based on the 2008-2012 scientific output indexed in the Web of Science, by all professors on staff at Italian universities. The population is over 36,000 professors, active in 192 fields and 9 disciplines. (C) 2015 Elsevier Ltd. All rights reserved. The enormous amount of biomedical natural-language text creates a daunting challenge: to discover novel and interesting patterns embedded in the text corpora that help biomedical professionals find new drugs and treatments. These patterns constitute entities such as genes, compounds, treatments, and side effects and their associations that spread across publications in different biomedical specialties. 
This paper proposes SemPathFinder to discover previously unknown relations in biomedical text. SemPathFinder overcomes the problems of Swanson's ABC model by using semantic path analysis to tell a story about plausible connections between biological terms. Storytelling-based semantic path analysis can be viewed as relation navigation for bio-entities that are semantically close to each other, and reveals insight into how a series of entity pairs is organized, and how it can be harnessed to explain seemingly unrelated connections. We apply SemPathFinder for two well-known use cases of Swanson's ABC model, and the experimental results show that SemPathFinder detects all intermediate terms except for one and also infers several interesting new hypotheses. (C) 2015 Elsevier Ltd. All rights reserved. Discipline-specific research evaluation exercises are typically carried out by panels of peers, known as expert panels. To the best of our knowledge, no methods are available to measure overlap in expertise between an expert panel and the units under evaluation. This paper explores bibliometric approaches to determine this overlap, using two research evaluations of the departments of Chemistry (2009) and Physics (2010) of the University of Antwerp as a test case. We explore the usefulness of overlay mapping on a global map of science (with Web of Science subject categories) to gauge overlap of expertise and introduce a set of methods to determine an entity's barycenter according to its publication output. Barycenters can be calculated starting from a similarity matrix of subject categories (N dimensions) or from a visualization thereof (2 dimensions). We compare the results of the N-dimensional method with those of two 2-dimensional ones (Kamada-Kawai maps and VOS maps) and find that they yield very similar results. The distance between barycenters is used as an indicator of expertise overlap. 
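In the 2-dimensional case, the barycenter computation described in the expert-panel abstract above reduces to a publication-weighted average of subject-category coordinates, with the distance between two barycenters serving as the overlap indicator. A sketch; the map coordinates, category names, and publication profiles are invented for illustration:

```python
import math

# Hypothetical 2-D coordinates of WoS subject categories on a science map.
category_xy = {
    "Phys. Condensed Matter": (0.0, 1.0),
    "Phys. Applied":          (0.5, 1.2),
    "Materials Science":      (1.5, 0.4),
}

def barycenter(pub_counts):
    """Publication-weighted average of category coordinates."""
    total = sum(pub_counts.values())
    x = sum(n * category_xy[c][0] for c, n in pub_counts.items()) / total
    y = sum(n * category_xy[c][1] for c, n in pub_counts.items()) / total
    return (x, y)

# Illustrative publication profiles of an expert panel and a department.
panel = {"Phys. Condensed Matter": 30, "Phys. Applied": 10}
dept = {"Phys. Condensed Matter": 20, "Phys. Applied": 40,
        "Materials Science": 40}

bx, by = barycenter(panel)
dx, dy = barycenter(dept)
distance = math.hypot(bx - dx, by - dy)  # smaller = closer expertise
print(distance)
```

The N-dimensional variant in the abstract is the same weighted average taken over a full similarity matrix of categories rather than over 2-D map coordinates.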
The results reveal that there is some discrepancy between the panel's and the groups' publications in both the Chemistry and the Physics departments. The panels were not as diverse as the groups that were assessed. The match between the Chemistry panel and the Department was better than that between the Physics panel and the Department. (C) 2015 Elsevier Ltd. All rights reserved. Individual research performance needs to be addressed by means of a diverse set of indicators capturing the multidimensional framework of science. In this context, Biplot methods emerge as powerful and reliable visualisation tools similar to a scatterplot but capturing the multivariate covariance structures among bibliometric indicators. In this paper, we introduce the Canonical Biplot technique to explore differences in the scientific performance of Spanish CSIC researchers, organised by field (Chemistry and Materials Science) and grouped by academic rank (research fellows and three types of full-time permanent scientists). This method enables us to build a Biplot where the groups of individuals are sorted out by the maximum discriminating power between the different indicators considered. Besides, as confidence intervals are displayed in the plot, statistical differences between groups are liable to be studied simultaneously. Since test hypotheses are sensitive to different sample size effects, sizes for some pairwise comparisons are computed. We have found two gradients: a primary gradient where scientists mainly differ in terms of age, production, number of collaborators, number of highly-cited papers and their position in the byline of the publications; and a second gradient, in which scientists with the same academic rank differ by sort of field. (C) 2015 Elsevier Ltd. All rights reserved. The exponential growth in the number of scientific papers makes it increasingly difficult for researchers to keep track of all the publications relevant to their work. 
Consequently, the attention that can be devoted to individual papers, measured by their citation counts, is bound to decay rapidly. In this work we make a thorough study of the life-cycle of papers in different disciplines. Typically, the citation rate of a paper increases up to a few years after its publication, reaches a peak and then decreases rapidly. This decay can be described by an exponential or a power law behavior, as in ultradiffusive processes, with exponential fitting better than power law for the majority of cases. The decay is also becoming faster over the years, signaling that nowadays papers are forgotten more quickly. However, when time is counted in terms of the number of published papers, the rate of decay of citations is fairly independent of the period considered. This indicates that the attention of scholars depends on the number of published items, and not on real time. (C) 2015 Elsevier Ltd. All rights reserved. Empirical evidence shows that co-authored publications achieve higher visibility and impact. The aim of the current work is to test for the existence of a similar correlation for Italian publications. We also verify if such correlation differs: (i) by subject category and macro-area; (ii) by document type; (iii) over the course of time. The results confirm world-level evidence, showing a consistent and significant linear growth in the citability of a publication with number of co-authors, in almost all subject categories. The effects are more remarkable in the fields of Social Sciences and Art & Humanities than in the Sciences - a finding not so obvious scrutinizing previous studies. Moreover, our results partly disavow the positive association between number of authors and prestige of the journal, as measured by its impact factor. (C) 2015 Elsevier Ltd. All rights reserved. The h-index is a celebrated indicator widely used to assess the quality of researchers and organizations. 
Empirical studies support the fact that the h-index is well correlated with other simple bibliometric indicators, such as the total number of publications N and the total number of citations C. In this paper we introduce a new formula h̃_w = h̃_w(N, C, C_max) as a representative predictive formula that functionally relates h to these aggregate indicators: N, C and the highest citation count C_max. The formula is based on the 'specific' assumption of geometrically distributed citations, but provides a good estimate of the h-index for the general case. To empirically evaluate the adequacy of the fit of the proposed formula h̃_w, an empirical study with 131 datasets (13,347 papers; 288,972 citations) was carried out. The overall fit (defined as the capacity of h̃_w to reproduce the true value of h for each single scientist) was remarkably accurate. The predicted value was within one of the actual value h for more than 60% of the datasets. We found, in approximately three cases out of four, an absolute error less than or equal to 2, and an average absolute error of only 1.9, for the whole sample of datasets. (C) 2015 Elsevier Ltd. All rights reserved. In this paper we evaluate citation networks of authors, publications and journals, constructed from the ISI Web of Science database (Computer Science categories). Our aim was to find a method with which to rank authors of scientific papers so that the most important occupy the top positions. We utilized a hand-made list of authors, each of whom has received an ACM Fellowship or has been awarded by an ACM SIG (Artificial Intelligence or Hardware categories). 
The developed method also included the adoption of the PageRank algorithm, which can be considered a measure of prestige, as well as other measures of significance (h-index, publication count, citation count, publication's author count), with these measures analyzed regarding their influence on the final rankings. Our main objective, to determine whether a better author ranking can be obtained using journal values, was achieved. The best of our author ranking systems was obtained by using journal impact values in PageRank, which was applied to a citation network of publications. The effectiveness of the ranking system was confirmed after calculations were carried out involving authors who were awarded after the final year used in our dataset or who were awarded in selected categories. (C) 2015 Elsevier Ltd. All rights reserved. Citation metrics are becoming pervasive in the quantitative evaluation of scholars, journals, and institutions. Hiring, promotion, and funding decisions increasingly rely on a variety of impact metrics that cannot disentangle quality from quantity of scientific output, and are biased by factors such as discipline and academic age. Biases affecting the evaluation of single papers are compounded when one aggregates citation-based metrics across an entire publication record. It is not trivial to compare the quality of two scholars who during their careers have published at different rates, in different disciplines, and in different periods of time. Here we evaluate a method based on the generation of a statistical baseline specifically tailored to the academic profile of each researcher. We demonstrate the effectiveness of the approach in decoupling the roles of quantity and quality of publications to explain how a certain level of impact is achieved. The method can be extended to simultaneously suppress any source of bias. 
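The PageRank step used in the author-ranking method above can be sketched as a plain power iteration over a citation network; the tiny network and the standard 0.85 damping factor are illustrative choices, not the paper's setup:

```python
def pagerank(links, damping=0.85, iters=100):
    """PageRank by power iteration; links[v] = list of nodes that v cites."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Illustrative citation network: paper -> papers it cites
net = {"A": ["B"], "B": ["C"], "C": [], "D": ["C", "B"]}
ranks = pagerank(net)
print(max(ranks, key=ranks.get))  # the paper with the highest prestige
```

In the paper's best-performing variant, journal impact values would additionally enter the computation as weights; exactly how they enter is a design choice of that method.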
As an illustration, we use it to capture the quality of the work of Nobel laureates irrespective of number of publications, academic age, and discipline, even when traditional metrics indicate low impact in absolute terms. The procedure is flexible enough to allow for the evaluation of, and fair comparison among, arbitrary collections of papers - scholar publication records, journals, and institutions; in fact, it extends a similar technique that was previously applied to the ranking of research units and countries in specific disciplines (Crespo, Ortuño-Ortín, & Ruiz-Castillo, 2012). We further apply the methodology to almost a million scholars and over six thousand journals to measure the impact that cannot be explained by the volume of publications alone. (C) 2015 Elsevier Ltd. All rights reserved. Scientific collaboration is one of the important drivers of research progress that supports researchers in the generation of novel ideas. Collaboration networks and their impact on scientific activities have thus already attracted some attention in the research community, but no work so far has studied possible factors which can influence the network positions of the researchers at the individual level. The objective of this paper is to investigate various characteristics and roles of the researchers occupying important positions in the collaboration network. For this purpose, we focus on the collaboration network among Canadian researchers during the period of 1996 to 2010 and employ multiple regression models to estimate the impact on network structure variables. Results highlight the crucial role of past productivity of the researchers along with their available funding in determining and improving their position in the co-authorship network. It is shown that researchers who have great influence on their local community do not necessarily publish high-quality work. 
We also find that highly productive researchers not only have more important connections but also play a critical role in connecting other researchers. Moreover, although mid-career scientists tend to collaborate more in tightly knit groups and on average have higher influence on their local community, our results specifically highlight the important role of young researchers who occupy mediatory positions in the network which enable them to connect different communities and fuel information transmission through the network. (C) 2015 Elsevier Ltd. All rights reserved. The main rationale behind career grants is helping top talent to develop into the next generation of leading scientists. Does career grant competition result in the selection of the best young talents? In this paper we investigate whether the selected applicants are indeed performing at the expected excellent level, something that is hardly investigated in the research literature. We investigate the predictive validity of grant decision-making, using a sample of 260 early career grant applications in three social science fields. We measure output and impact of the applicants about ten years after the application to find out whether the selected researchers perform ex post better than the non-successful ones. Overall, we find that predictive validity is low to moderate when comparing grantees with all non-successful applicants. Comparing grantees with the best-performing non-successful applicants, predictive validity is absent. This implies that the common belief that peers in selection panels are good at recognizing outstanding talents is incorrect. We also investigate the effects of the grants on careers and show that recipients of the grants do have a better career than the non-granted applicants. This makes the observed lack of predictive validity even more problematic. (C) 2015 Elsevier Ltd. All rights reserved. 
We provide three axiomatic characterizations of Egghe's g-index, which measures a researcher's scientific output based on the number of papers the researcher has published and the number of citations of each of the researcher's papers. We formulate six new axioms for indexes, namely, tail independence (TA), square monotonicity (SM), the cap condition (CC), strong square monotonicity (SSM), increasing marginal citations (IMC), and increasing marginal citations+ (IMC+). Along with the two well-known axioms T1 and T2 (Woeginger, 2008a), the g-index is characterized by (i) T1, T2, TA, SM, and CC, (ii) T1, T2, TA, SSM, and IMC, and (iii) T1, TA, SM, and IMC+. Two out of three characterizations are obtained by adding axioms to our new characterization of the class of indexes satisfying T1, T2, and TA, which are defined as generalizations of the g-index. Thus, the remaining four axioms in our first and second characterizations (SM, CC, SSM, and IMC) distinguish the original g-index from other related indexes in the class. Furthermore, the independence of our axioms and that of Woeginger's study is shown. (C) 2015 Elsevier Ltd. All rights reserved. National research impact indicators derived from citation counts are used by governments to help assess their national research performance and to identify the effect of funding or policy changes. Citation counts lag research by several years, however, and so their information is somewhat out of date. Some of this lag can be avoided by using readership counts from the social reference sharing site Mendeley because these accumulate more quickly than citations. This article introduces a method to calculate national research impact indicators from Mendeley, using citation counts from older time periods to partially compensate for international biases in Mendeley readership. A refinement to accommodate recent national changes in Mendeley uptake makes little difference, despite being theoretically more accurate. 
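For reference, Egghe's g-index characterized above is the largest g such that the g most cited papers together received at least g² citations; a minimal sketch:

```python
def g_index(citations):
    """Largest g such that the top g papers total at least g**2 citations."""
    total, g = 0, 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

print(g_index([10, 8, 5, 4, 3]))  # → 5 (cumulative total 30 >= 25)
```

Unlike the h-index, the g-index is sensitive to the full citation counts of the top papers, since they enter through the cumulative total.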
The Mendeley indicators obtained with these methods broadly reflect the results of similar calculations with citations, and seem to reflect impact trends about a year earlier. Nevertheless, the reasons for the differences between the indicators from the two data sources are unclear. (C) 2015 Elsevier Ltd. All rights reserved. Although software has helped researchers conduct research, little is known about the impact of software on science. To fill this gap, this article proposes an improved bootstrapping method to extract software entities from full-text papers and assess their impact on science. Evaluation results show that the proposed entity extraction system outperforms three baseline methods on extracting software entities from full-text papers. The proposed method is then used to learn software entities from all papers published in PLoS ONE in 2014. More than 2000 unique software entities were obtained, accounting for more than 20,000 mentions and more than 7000 citations. The paper finds that software is commonly used in the scientific community, yet remains substantially uncited. (C) 2015 Elsevier Ltd. All rights reserved. Bibliometric studies often rely on field-normalized citation impact indicators in order to make comparisons between scientific fields. We discuss the connection between field normalization and the choice of a counting method for handling publications with multiple co-authors. Our focus is on the choice between full counting and fractional counting. Based on an extensive theoretical and empirical analysis, we argue that properly field-normalized results cannot be obtained when full counting is used. Fractional counting does provide results that are properly field normalized. We therefore recommend the use of fractional counting in bibliometric studies that require field normalization, especially in studies at the level of countries and research organizations. We also compare different variants of fractional counting. 
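Author-level fractional counting, one of the variants compared above, credits each paper as 1/n to each of its n authors (or their units); a minimal sketch with purely illustrative affiliations:

```python
from collections import defaultdict

def fractional_counts(papers):
    """Each paper contributes 1/n_authors to each author's affiliation."""
    counts = defaultdict(float)
    for affiliations in papers:  # one affiliation per co-author
        share = 1.0 / len(affiliations)
        for aff in affiliations:
            counts[aff] += share
    return dict(counts)

papers = [["X", "X", "Y"], ["X"], ["Y", "Z"]]  # illustrative data
print(fractional_counts(papers))
```

Under full counting, X, Y and Z would score 2, 2 and 1 respectively; fractional counting shrinks the totals so that each paper distributes exactly one unit of credit.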
In general, it seems best to use either the author-level or the address-level variant of fractional counting. (C) 2015 Elsevier Ltd. All rights reserved. Governments sometimes need to analyse sets of research papers within a field in order to monitor progress, assess the effect of recent policy changes, or identify areas of excellence. They may compare the average citation impacts of the papers by dividing them by the world average for the field and year. Since citation data is highly skewed, however, simple averages may be too imprecise to robustly identify differences within, rather than across, fields. In response, this article introduces two new methods to identify national differences in average citation impact, one based on linear modelling for normalised data and the other using the geometric mean. Results from a sample of 26 Scopus fields between 2009 and 2015 show that geometric means are the most precise and so are recommended for smaller sample sizes, such as for individual fields. The regression method has the advantage of distinguishing between national contributions to internationally collaborative articles, but has substantially wider confidence intervals than the geometric mean, undermining its value for any except the largest sample sizes. (C) 2015 Elsevier Ltd. All rights reserved. In a previous publication, the journal sub-impact factor, denoted SIF, and derived sub-impact sequences were introduced. Their calculation included a discrete step. Now we adapt this scheme to include an interpolation procedure. A mathematical proof is given showing that anomalies that may happen in the discrete approach cannot occur in the interpolated approach. (C) 2015 Elsevier Ltd. All rights reserved. In the literature and on the Web we can readily find research excellence rankings for organizations and countries by either total number of highly-cited articles (HCAs) or by ratio of HCAs to total publications. Neither is an indicator of efficiency. 
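The geometric mean mentioned above damps the influence of a few very highly cited papers; a minimal sketch, assuming the common convention of adding 1 before the log transform to handle uncited papers:

```python
import math

def geometric_mean_citations(citations):
    """exp(mean(log(c + 1))) - 1; the +1 offset handles zero counts."""
    logs = [math.log(c + 1) for c in citations]
    return math.exp(sum(logs) / len(logs)) - 1

cites = [0, 1, 2, 4, 100]  # skewed, like real citation data
print(round(geometric_mean_citations(cites), 2))  # → 3.97
print(sum(cites) / len(cites))  # arithmetic mean 21.4, dragged up by the outlier
```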
In the current work we propose an indicator of efficiency, the number of HCAs per scientist, which can complement the productivity indicators based on impact of total output. We apply this indicator to measure excellence in the research of Italian universities as a whole, and in each field and discipline of the hard sciences. (C) 2015 Elsevier Ltd. All rights reserved. Author name disambiguation (AND) creates a daunting challenge in that disambiguation techniques often draw false conclusions when applied to incomplete or incorrect publication data. It becomes a more critical issue in the biomedical domain where PubMed articles are written by a wide range of researchers internationally. To tackle this issue, we create a carefully hand-crafted training set drawn from the entire PubMed collection by going through multiple iterations. We assess the quality of our training set by comparing it with a SCOPUS-based training set. In addition, to enhance the performance of the AND techniques, we propose a new set of publication features extracted by text mining techniques. The results of the experiments show that all four supervised learning techniques (Random Forest, C4.5, KNN, and SVM) with the new publication features (called NER model) achieve improved performance over the baseline and hybrid edit-distance models. (C) 2015 Elsevier Ltd. All rights reserved. Government-funded research institutes (GRIs) have played a pivotal role in national R&D in many countries. A prerequisite for achieving desired goals of GRIs with the limited R&D budget is to be able to effectively measure and compare R&D performance of GRIs. This paper proposes the bottom-up approach in which the performance of a GRI is measured based on the efficiency of its R&D projects. Data envelopment analysis (DEA) is employed to measure R&D efficiency of projects, and nonparametric statistical tests are run to measure and compare the R&D performance of GRIs. 
We apply the bottom-up DEA approach to the performance measurements of 10 Korean GRIs conducting a total of 1481 projects. The two alternatives for incorporating the relative importance of the output variables - the assurance region (AR) model and output integration - are also discussed. The proposed bottom-up approach can be used for formulating and implementing national R&D policy by effectively assessing the performance of GRIs. (C) 2015 Elsevier Ltd. All rights reserved. Research performance values are not certain. Performance indexes should therefore be accompanied by uncertainty measures, to establish whether the performance of a unit is truly outstanding and not the result of random fluctuations. In this work we focus on the evaluation of research institutions on the basis of average individual performance, where uncertainty is inversely related to the number of research staff. We utilize the funnel plot, a tool originally developed in meta-analysis, to measure and visualize the uncertainty in the performance values of research institutions. As an illustrative example, we apply the funnel plot to represent the uncertainty in the assessed research performance for Italian universities active in biochemistry. (C) 2015 Elsevier Ltd. All rights reserved. While modern science is characterized by exponential growth in the scientific literature, the increase in publication volume clearly does not reflect the expansion of the cognitive boundaries of science. Nevertheless, most of the metrics for assessing the vitality of science or for making funding and policy decisions are based on productivity. Similarly, the increasing level of knowledge production by large science teams, whose results often enjoy greater visibility, does not necessarily mean that "big science" leads to cognitive expansion. 
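The funnel plot mentioned above plots each institution's average performance against its staff size, with control limits that narrow as 1/√n; a minimal sketch with an illustrative overall mean and standard deviation:

```python
import math

def funnel_limits(mu, sigma, sizes, z=1.96):
    """95% control limits around the overall mean mu for staff sizes n."""
    return [(n, mu - z * sigma / math.sqrt(n), mu + z * sigma / math.sqrt(n))
            for n in sizes]

# Illustrative overall mean/sd of individual performance scores
for n, lo, hi in funnel_limits(mu=2.0, sigma=1.5, sizes=[5, 20, 80]):
    print(n, round(lo, 2), round(hi, 2))
```

An institution plotted outside its limits is then unlikely to differ from the overall mean by random fluctuation alone.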
Here we present a novel, big-data method to quantify the extents of cognitive domains of different bodies of scientific literature independently of publication volume, and apply it to 20 million articles published over 60-130 years in physics, astronomy, and biomedicine. The method is based on the lexical diversity of titles of fixed quotas of research articles. Owing to the large size of the quotas, the method overcomes the inherent stochasticity of article titles to achieve <1% precision. We show that the periods of cognitive growth do not necessarily coincide with the trends in publication volume. Furthermore, we show that the articles produced by larger teams cover significantly smaller cognitive territory than (the same quota of) articles from smaller teams. Our findings provide a new perspective on the role of small teams and individual researchers in expanding the cognitive boundaries of science. The proposed method of quantifying the extent of the cognitive territory can also be applied to study many other aspects of 'science of science.' (C) 2015 Elsevier Ltd. All rights reserved. This paper studies the assignment of responsibility to the participants in the case of coauthored scientific publications. In the conceptual part, we establish that one shortcoming of the full counting method is its incompatibility with the use of additively decomposable citation impact indicators. In the empirical part of the paper, we study the consequences of adopting the address-line fractional or multiplicative counting methods. For this purpose, we use a Web of Science dataset consisting of 3.6 million articles published in the 2005-2008 period, and classified into 5119 clusters. Our research units are the 500 universities in the 2013 edition of the CWTS Leiden Ranking. Citation impact is measured using the Mean Normalized Citation Score and the Top 10% indicators. The main findings are the following. 
Firstly, although a change of counting methods alters co-authorship and citation impact patterns, cardinal differences between co-authorship rates and between citation impact values are generally small. Nevertheless, such small differences generate considerable re-rankings between universities. Secondly, the universities that are more favored by the adoption of a fractional rather than a multiplicative approach are those with a large co-authorship rate for the citation distribution as a whole, a small co-authorship rate in the upper tail of this distribution, a large citation impact performance, and a large number of solo publications. (C) 2015 Elsevier Ltd. All rights reserved. The article presents a large-scale comparison of journal rankings based on seven impact measures: Impact Factor (2- and 5-year), SJR, IPP, SNIP, H index, and Article Influence Score. Three aspects of ranking stability in the 2007-2014 period were analyzed: temporal, cross-discipline, and cross-indicator. Impact measures based on five-year citation windows enable more stable journal rankings over time. Journal rankings based on the source-normalized indicator (SNIP) have the largest cross-discipline stability. Journals in the fields of social sciences and humanities have lower temporal and cross-discipline ranking stability compared to those in "hard" sciences. Although correlation coefficients indicate relatively high agreement among the rankings based on different indicators, variations in quartile and percentile ranks suggest different conclusions. WoS journals almost linearly improve their ranking positions in Scopus lists, while many high-impact journals covered by Scopus are not available in WoS. An important element of the ranking stability is the discriminability of impact measures. 
Beyond the segregation between the top and bottom ranked journals, our assessment of "quality" relies in most cases on a rather arguable assumption that a couple of citations more or less makes a big difference. (C) 2015 Elsevier Ltd. All rights reserved. The aim of this study is to compare PhD students' performance with respect to gender using a number of matching methods. The data consists of fine-grained information about PhD students at the Institute of Clinical Research at the University of Southern Denmark. Men and women are matched controlling for sub-disciplinary affiliation, education, year of enrolment and age. Publications and citations are identified in Web of Science. Our study shows that the average total number of publications is slightly higher for men than for women. Excluding the "other" group of publications from the analyses reveals that there is a negligible difference between men and women in terms of published articles. A substantial proportion of women are on maternity leave during the time period analysed and thus we would expect their productivity to be considerably lower. Similarly, we have found very little difference between the citation impact of men and women. We find matching methods to be a promising set of methods for evaluating productivity and impact of individuals from various sub-fields, universities and time periods as we are able to discard some of the underlying factors determining the results of analyses of gender differences in productivity and citation impact. (C) 2015 Elsevier Ltd. All rights reserved. The assessment of research topics according to their development stage can be used for different purposes, most importantly for decisions regarding the (financial) support of research groups and regions. In this work, we try to determine the influencing factors of emerging scientific topics during their early development stage. 
Documents in five pre-defined fields are analyzed with regard to the characteristics of the involved authors, their references and journals. With the help of an assignment to emerging and established topics, the publication behavior of documents in different development stages can be compared. Foremost, indicators can be derived that can help to identify publications in emerging topics in science at an early stage after publication. The results show that the field differences are so pronounced that they hamper generalization. The field-specific analysis, however, suggests that at least for some fields a pre-selection of emerging topics can be made. In technical fields, the involvement of larger groups of researchers is an apparent feature, while in medicine a contrary observation could be made. In addition, for the field of engineering we found that emerging topics are more often published in older but smaller journals, which indicates a high specialization of the publications. (C) 2015 Elsevier Ltd. All rights reserved. Over the last decade, the relationship between interdisciplinarity and scientific impact has been the focus of many bibliometric papers, with diverging results. This paper aims at contributing to this body of research, by analyzing the level of interdisciplinarity, computed with the Simpson Index, of the top 1% most highly cited papers and of papers with lower citation percentile ranks. Results show that the top 1% most cited papers exhibit higher levels of interdisciplinarity than papers in other citation rank classes and that this relationship is observed in more than 90% of NSF specialties. This suggests that interdisciplinary research plays a more important role in generating high-impact knowledge. (C) 2015 Elsevier Ltd. All rights reserved. In this study we combine the registered output of a whole university in the Netherlands with data retrieved from the Web of Science. 
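The Simpson Index used above scores a paper's interdisciplinarity as 1 − Σ pᵢ², where pᵢ is the share of its cited references falling in discipline i; a minimal sketch with illustrative counts:

```python
def simpson_diversity(discipline_counts):
    """1 - sum(p_i^2) over the disciplines of a paper's references."""
    total = sum(discipline_counts.values())
    return 1.0 - sum((c / total) ** 2 for c in discipline_counts.values())

mono = {"physics": 10}                         # all references in one field
mixed = {"physics": 4, "biology": 3, "cs": 3}  # spread across three fields
print(simpson_diversity(mono), round(simpson_diversity(mixed), 2))  # → 0.0 0.66
```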
The initial research question was: is it possible to show the impact of the university in its full breadth, taking into account the variety of disciplines covered in the research profile of the university? In order to answer this question, we analyzed the output of the university as registered in the CRIS system METIS, over the years 2004-2009. The registration covers a wide variety of scholarly outputs, and these are all taken into account in the analysis. In the study we conduct analyses on the coverage of the output of the university, both from the perspective of the output itself, towards the Web of Science ("external"), as well as from the Web of Science perspective itself ("internal"). This provides us with the necessary information to be able to draw clear conclusions on the validity of the usage of standard bibliometric methodologies in the research assessment of universities with such a research profile. The state is still a significant unit for innovation studies in the age of R&D globalization and innovation regionalization. Using the bibliometric method, this paper attempts to provide a comprehensive picture of national innovation studies based on data derived from the Web of Knowledge. In particular, we identify the most significant countries and institutions, major journals, seminal contributions and contributors, and clusters in the network of citations in the field of national innovation studies. The results are useful for understanding and promoting the field of national innovation studies. With the advances of all research fields, the volume of scientific literature has grown exponentially over the past decades, and the management and exploration of scientific literature is becoming an increasingly complicated task. It calls for a tool that combines scientific impacts and social focuses to visualize relevant papers from a specific research area and time period, and to find important and interesting papers. 
Therefore, we propose a graphical article-level metric (gALM), which captures the impact and popularity of papers from scientific and social aspects. These two dimensions are combined and visualized graphically as a circular map. The map is divided into sectors of papers belonging to a publication year, and each block represents a paper's journal citations by block size and its Mendeley readership by block color. In this graphical way, gALM provides a more intuitive comparison of large bodies of literature. In addition, we also design an online Web server, Science Navigation Map (SNM), which not only visualizes the gALM but also provides interactive features. Through an interactive visualization map of article-level metrics on scientific impact and social popularity in Mendeley, users can intuitively make a comparison of papers as well as explore and filter important and relevant papers by these metrics. We take the journal PLoS Biology as an example and visualize all the papers published in PLoS Biology between 2003 and 2014 by SNM. From this map, one can easily and intuitively find basic statistics of papers, such as the most cited papers and the most popular papers in Mendeley during a time period. SNM on the journal PLoS Biology is publicly available at http://www.linkscholar.org/plosbiology/. A framework is proposed for comparing different types of bibliometric indicators, introducing the notion of an Indicator Comparison Report. It provides a comprehensive overview of the main differences and similarities of indicators. The comparison shows both the strong points and the limitations of each of the indicators at stake, rather than over-promoting one indicator and ignoring the benefits of alternative constructs. It focuses on base notions, assumptions, and application contexts, which makes it more intelligible to non-experts. 
As an illustration, a comparison report is presented for the original and the modified source normalized impact per paper indicator of journal citation impact (SNIP). The importance of interdisciplinary research in accelerating the progress and commercialization of science is widely recognized, yet little is known about how academic research self-organizes towards interdisciplinarity. In this paper, we therefore explore the micro-level behavior of researchers as they venture into a promising space for interdisciplinary research, namely translational research, a bridge between basic and applied biomedical research. More specifically, we ask (1) whether the researchers who choose to engage in translational research have a strong scientific record, (2) how interdisciplinary research spanning basic and applied research influences the output of academic research, and (3) how disciplinary distance in interdisciplinary research contributes to reputational benefits of researchers. We find that for some types of collaboration, interdisciplinarity results in more highly cited research, while in others it does not, and look for explanations for this difference. Our results show that translational research draws higher citations when it involves university researchers from the most basic end of the disciplinary spectrum, and when its issues are directed at basic (rather than applied) research. The current study used citation data and relied on network analysis to determine centrality scores for 24 communication journals and the authors of their publications during the years 2007-2011. Scores were used to rank journals and authors across the discipline. The results of centrality rankings reveal that Journal of Communication, Communication Research, Communication Research Reports, Human Communication Research, and Communication Studies are the most central journals in the citation network. 
Across these 24 journals, the top 1% most central scholars are ranked based on the placement of their publications. An additional list ranks the 14 most central (1%) scholars who published in the five most central journals. These centrality rankings for the journals and authors are discussed in comparison to previous ranking methods. The results for the most central journals mirror the findings of other network analysis research relying on various citation data. However, the findings for author centrality rankings revealed that traditional methods (e.g., summing total publications) for ranking communication scholars yield drastically different results when compared to centrality rankings (incorporating breadth of publications across journals). Future attempts to situate prolific authors should consider the conceptual utility of relying on network analysis methods to analyze citation data. The limitations of this study are also discussed. This paper describes the process by which almost all authors of papers in the Web of Science (WoS) can be characterised by their sex and ethnicity or national background, based on their names. These are compared with two large databases of surnames and given names to determine to which of some 160 different ethnic groups they are most likely to belong. Since 2008 the authors of WoS papers are tagged with their addresses, and many have their given names if they appear on the paper, so the workforce composition of each country can be determined. Conversely, the current location of members of particular ethnic groups can be found. This will show the extent of a country's "brain drain", if any. Key results are shown for one subject area, and inter alia it appears that the majority of researchers of Indian origin who are active in lung cancer research are working in the USA. But East Asians (Chinese, Japanese and Koreans) tend to stay in their country of birth. 
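The name-matching step can be sketched as a lookup against surname and given-name tables; the tiny tables below are hypothetical stand-ins for the two large databases of names mentioned above:

```python
# Hypothetical miniature tables; the real databases cover some 160 groups.
SURNAME_GROUP = {"Zhang": "Chinese", "Patel": "Indian", "Suzuki": "Japanese"}
GIVEN_NAME_SEX = {"Priya": "female", "Hiroshi": "male", "Mei": "female"}

def classify_author(given_name, surname):
    """Best-effort (sex, ethnic group) guess from the lookup tables."""
    return (GIVEN_NAME_SEX.get(given_name, "unknown"),
            SURNAME_GROUP.get(surname, "unknown"))

print(classify_author("Priya", "Patel"))  # → ('female', 'Indian')
print(classify_author("Anna", "Novak"))   # → ('unknown', 'unknown')
```

Combined with the address tags available since 2008, such per-author labels can be aggregated into the workforce composition of each country.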
It is well known that women are underrepresented in the academic systems of many countries. Gender discrimination is one of the factors that could contribute to this phenomenon. This study considers a recent national academic recruitment campaign in Italy, examining whether women are subject to more or less bias than men. The findings show that no gender-related differences occur among the candidates who benefit from positive bias, while among those candidates affected by negative bias, the incidence of women is lower than that of men. Among the factors that determine success in a competition for an academic position, the number of the applicant's career years in the same university as the committee members assumes greater weight for male candidates than for females. Being of the same gender as the committee president is also a factor that assumes greater weight for male applicants. On the other hand, for female applicants, the presence of a full professor in the same university with the same family name as the candidate assumes greater weight than for male candidates. We take up the issue of performance differences between male and female researchers, and investigate the change of performance differences during the early career. In a previous paper it was shown that among starting researchers gendered performance differences seem small to non-existent (Van Arensbergen et al., 2012). If the differences do not occur in the early career anymore, they may emerge in a later period, or may remain absent. In this paper we use the same sample of male and female researchers, but now compare performance levels about 10 years later. We use various performance indicators: full/fractionally counted productivity, citation impact, and relative citation impact in terms of the share of papers in the top 10% highly cited papers. 
After the 10-year period, the productivity of male researchers has grown faster than that of female researchers, but the field-normalized (relative) citation impact indicators of male and female researchers remain about equal. Furthermore, performance data do explain to a certain extent why male careers in our sample develop much faster than female researchers' careers; but after controlling for performance differences, we find that gender is an important determinant too. Consequently, the process of hiring academic staff still remains biased. This article examines the stability of co-authorship network structures over time. The goal of the article is to analyse differences in the stability and size of groups of researchers that co-author with each other (core research groups) formed in disciplines from the natural and technical sciences on one hand and the social sciences and humanities on the other. The cores were obtained by a pre-specified blockmodeling procedure assuming a multi-core-semi-periphery-periphery structure. The stability of the obtained cores was measured with the Modified Adjusted Rand Index. The assumed structure was confirmed in all analysed disciplines. The average size of the cores obtained is higher in the second time period, and the average core size is greater in the natural and technical sciences than in the social sciences and humanities. There are no differences in average core stability between the natural and technical sciences and the social sciences and humanities. However, if the stability of cores is defined by the splitting of cores, and not also by the percentage of researchers who left the cores, the average stability of the cores is higher in disciplines from the scientific fields of Engineering sciences and technologies and Medical sciences than in disciplines of the Humanities, controlling for the networks' and disciplines' characteristics. 
The analysis was performed on disciplinary co-authorship networks of Slovenian researchers in two time periods (1991-2000 and 2001-2010). The number of citations that a patent receives is considered an important indicator of the quality and impact of the patent. However, a variety of methods and data sources can be used to calculate this measure. This paper evaluates similarities between citation indicators that differ in terms of (a) the patent office where the focal patent application is filed; (b) whether citations from offices other than the application office are considered; and (c) whether the presence of patent families is taken into account. We analyze the correlations between these different indicators and the overlap between patents identified as highly cited by the various measures. Our findings reveal that the resulting citation indicators differ substantially. Favoring one way of calculating a citation indicator over another has non-trivial consequences and, hence, should be given explicit consideration. Correcting for patent families, especially when using a broader definition (INPADOC), provides the most uniform results. Bibliometric methods are used in multiple fields for a variety of purposes, notably for research evaluation. Most bibliometric analyses have in common their data sources: Thomson Reuters' Web of Science (WoS) and Elsevier's Scopus. The objective of this research is to describe the journal coverage of those two databases and to assess whether some fields, publishing countries and languages are over- or underrepresented. To do this we compared the coverage of active scholarly journals in WoS (13,605 journals) and Scopus (20,346 journals) with Ulrich's extensive periodical directory (63,013 journals). Results indicate that the use of either WoS or Scopus for research evaluation may introduce biases that favor Natural Sciences and Engineering as well as Biomedical Research to the detriment of Social Sciences and Arts and Humanities. 
Similarly, English-language journals are overrepresented to the detriment of other languages. While both databases share these biases, their coverage differs substantially. As a consequence, the results of bibliometric analyses may vary depending on the database used. These results imply that in the context of comparative research evaluation, WoS and Scopus should be used with caution, especially when comparing different fields, institutions, countries or languages. The bibliometric community should continue its efforts to develop methods and indicators that include scientific output that is not covered in WoS or Scopus, such as field-specific and national citation indexes. In a "publish-or-perish culture", the ranking of scientific journals plays a central role in assessing performance in the current research environment. With a wide range of existing methods for deriving journal rankings, meta-rankings have gained popularity as a means of aggregating different information sources. In this paper, we propose a method to create a meta-ranking using heterogeneous journal rankings. Employing a parametric model for paired comparison data, we estimate quality scores for 58 journals in the OR/MS/POM community, which together with a shrinkage procedure allows for the identification of clusters of journals with similar quality. The use of paired comparisons provides a flexible framework for deriving an aggregated score while eliminating the problem of missing data. The introduction of citation analysis has caused adaptive responses in the scientific community and its journals, which have been widely discussed in the literature on the evaluation of scientific research. In this work, we deal with the problem of quantitatively measuring the importance of scientific journals when an impact factor is not available, as occurs for most journals in the humanities. We conduct a survey to investigate the editorial policies of such journals. 
We conclude that the 'selectivity of journals in their choice of papers for publication' and the 'journal diffusion' are sensitive and useful indicators, which can be used in conjunction with the classical impact indicators already available in order to evaluate the role of the journals. The study aims to investigate the relationships between consumption of e-journals distributed via the Elsevier ScienceDirect platform, publication (articles) and impact (citations) in a sample of 13 French universities, from 2003 to 2009. It adopts a value perspective, as it questions whether or not publication activity and impact are some kind of return driven by consumption. A bibliometric approach was used to explore the relations between these three variables. The analysis developed indicators inspired by the mathematical h-index technique. Results show that the relation between consumption, publication and citations depends on the discipline's profile, the intensity of research and the size of each institution. Moreover, although relations have been observed between the three variables, it is not possible to determine which variable comes first to explain the phenomena. The study concludes by showing strong correlations, which nevertheless do not lead to clear causal relations. The article provides practical implications: academic library managers who want to show the added value of their electronic journal collections can replicate the study's approach, as can policy makers who want to take e-journal usage into account as an informative tool to predict the importance of publication activity. This paper demonstrates the ways and degrees to which contemporary, U.S.-based, employed or retired ecologists aggregate into guild-like groups on the basis of their valuations of 15 professional traits. 
Principal components analysis of survey data from 904 Ecological Society of America respondents led to five emergent factors from the 15 traits: 'enjoying nature,' 'preserving nature,' 'questing for knowledge,' 'possessing epistemic expertise,' and 'accepting religious foundations for valuing nature.' Subsequent cluster analysis on these factors yielded four groups of respondents we designated as 'youthful relativists,' 'older naturalists,' 'scientific objectivists,' and 'optimistic traditionalists.' Surprisingly, the majority of respondents were negative about the 'enjoying nature' or 'preserving nature' factors, a matter for further exploration. Also, differential levels of doubt existed as to the maintenance of objectivity during the practices of research, and especially in participation in environmental issues. The paper discusses the trend of world literature on "International Business" in terms of the output of research publications as indexed in the Social Sciences Citation Index during the period from 2004 to 2013. A total of 3131 journals and 1623 papers were indexed on international business in the database during the 10-year study period. The average number of papers published per year was 162.30. The highest number of papers, i.e., 268 (16.513 %), was published in the year 2010. The authors Eden L and Causgil ST shared the top position, each having written the highest number of publications, i.e., 13 (0.801 %). The source title Journal of International Business Studies contained the highest number of publications, with 359 (22.12 %). The most popular research area is Business Economics, in which the highest number of publications, i.e., 1442 (88.848 %), was counted. The United States contributed the highest number of publications, i.e., 616 (37.954 %), among the total of 62 countries that contributed on the subject. The most productive institution was the University of Leeds, which contributed a total of 28 (1.1164 %) publications among the total of 513 organizations. 
Articles amounted to 1329 (81.885 %) of the literature on international business. The study will help researchers and authors, who can identify the most appropriate, influential journals in which to publish, as well as confirm the status of journals in which they have published (Hasan et al. in Proceedings of the fourth international conference of the digital libraries, 27-29 November 2013, New Delhi, India. TERI, New Delhi, pp 319-329, 2013). It will help professors, academicians and students, who can discover where to find the current reading list in their respective fields (Krishna and Kumar in SRELS J Inf Manag 41(2):229-234, 2004). Publishers and editors can determine journals' influence in the marketplace and review editorial functions (Chuang et al. in Scientometrics 87(3):551-562, 2011). Educational institutions and business groups can look into the trends and make appropriate policies related to international business on the basis of inferences drawn from the analysis. Administrators, policy makers and planners can track bibliometric and citation patterns to make strategic and funding decisions (Arora et al. in Curr Sci 104(3):307-315, 2013). Librarians and information analysts can support the selection or removal of journals from their collections, and determine how long to keep each journal in the collection before archiving it (Trivedi in Libr Philos Pract, 1-6, 2010). Technological research topics might enjoy dramatic increases in popularity, regardless of as-yet-unclear commercialization prospects. The article analyzes the example of graphene, an advanced material first demonstrated in 2004, which benefited from the visibility and expectations of policy makers, investors and R&D performers. The bibliometric analysis helps to better understand the initial era of ferment in the technology cycle, before graphene's technical and commercial feasibility was confirmed. 
It offers insights into the underlying dynamics that accompany the topic's emergence and the subsequent hype. Exponential growth in article counts is contrasted with decreasing citations per article and shares of highly-cited publications. The research field's growing complexity is demonstrated by decomposing the discourse into publications concerning the manufacturing of graphene, its characterization, and potential applications in the non-electronics areas of health, environment and energy. Activities of publication outlets are traced, with a small number of journals accounting for the majority of publications and citations, and gradual increases in graphene's presence in individual journals. International co-authorship patterns evolve over time, and while the network density and the average betweenness centrality of actors increase, the international concentration was found to follow a U-shaped pattern, initially promoting the field's openness, but later making it less accessible, so that only some researchers benefit from this "window of opportunity". The observed regularities follow a fashion-like pattern, with researchers joining the bandwagon to benefit from the topic's popularity. The timing of entry into an emerging research field is important for maximizing the scientific impact of researchers, institutions, journals and countries. This paper introduces a statistical and other analysis of peer reviewers in order to approach their "quality" through some quantification measure, thereby leading to some quality metrics. Peer reviewer reports for the Journal of the Serbian Chemical Society are examined. The text of each report first has to be adapted to word-counting software in order to avoid jargon-induced confusion when searching for word frequencies: e.g. C must be distinguished depending on whether it means Carbon or Celsius, etc. Thus, every report has to be carefully "rewritten". 
Thereafter, the quantity, variety and distribution of words are examined in each report and compared to the whole set. Two separate months, according to when reports came in, are distinguished in order to observe any possible hidden spurious effects. Coherence is found. An empirical distribution is searched for through a Zipf-Pareto rank-size law. It is observed that peer review reports are very far from usual texts in this respect. Deviations from the usual (first) Zipf's law are discussed. A theoretical suggestion for the "best (or worst) report", and by extension the "good (or bad) reviewer", within this context is provided from an entropy argument, through the concept of "distance to average" behavior. Another entropy-based measure also allows one to assess the journal's reviews (whence reviewers) for further comparison with other journals through their own reviewer reports. Using the example of the 'h-related' publication dataset created for a previous study on the literature of Hirsch-type measures (Zhang et al. in J Informetr 5(3):583-593, 2011) and updated for the present paper, we attempt to study the robustness and dynamic evolution of 'core documents'. Using two different methods that have been recently introduced in the context of the detection and analysis of emerging topics (Glanzel and Thijs in Scientometrics 91(2):399-416, 2012), we show that the applied methods prove both stable and representative. Furthermore, the evolution of the core-document network represents and follows the general trends of topic development, in terms of a focus shift from theory to application, in an adequate manner. This study aimed to monitor the status of scientific research in Japan, especially the diversity of scientific research, based on in-depth analyses of science maps. Analyses of six consecutive maps showed decreasing diversity in Japanese science relative to benchmarking countries. 
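The Zipf-Pareto rank-size analysis applied to the reviewer reports can be sketched as follows: count word frequencies, sort them by rank, and fit a power-law slope on log-log axes; for typical prose the slope is near -1 (the 'first' Zipf law), and a report whose slope deviates strongly from the corpus average would be a candidate outlier under the 'distance to average' idea. This is a stdlib-only sketch with an invented toy text, not the authors' processing pipeline.

```python
import math
from collections import Counter

def zipf_slope(text):
    """Estimate the rank-size exponent of a text.

    Fits log(frequency) = a + s * log(rank) by ordinary least squares
    over the ranked word frequencies and returns the slope s.
    Illustrative sketch only; real analyses need cleaned, sizable texts.
    """
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var  # slope of the log-log fit

# Invented toy text; frequencies fall with rank, so the slope is negative.
slope = zipf_slope("the cat sat on the mat the cat sat the the")
```

Comparing each report's slope (or an entropy of its word distribution) to the average across all reports gives one concrete realisation of the "distance to average" measure described above.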
We proposed a new concept, the Sci-GEO chart (representing geographical characteristics of research areas on a science map), that aims to classify research areas in terms of continuity and cognitive linkage. Based on the Sci-GEO chart, we classified research areas into four Sci-GEO types, i.e., small island, island, peninsula, and continent, and analysed their properties, such as size and transitions across Sci-GEO types. Our analyses showed that Japanese science puts more weight on continent-type research (stable and well-established research) compared with benchmarking countries, and the lack of diversity is attributable to low coverage of small-island-type research (research areas with active replacement). We also demonstrated that the Sci-GEO chart well represents the funding characteristics and strategy of two major competitive funding agencies in Japan, and the chart provides policymakers and government officials with baseline information for the planning of science and technology policy. Scientific productivity data (number of publications and h-indices), collected from the Web of Science (WoS) database for the period 2005-2010 for 13 countries of Southeast Europe (including Austria as the reference country) and for the 251 WoS categories, were grouped during data extraction into 41 fields of science (FoS) according to the Frascati manual classification (OECD in Revised field of science and technology (FoS) classification in the Frascati manual, pp. 1-12, 2007). The Scientific Performance QuaLity (SPQL) level has been defined and calculated for the 13 studied countries and for all FoS based on the established best fit of the linear dependence between P^(1/alpha) and the h-index. From these data the SPQL levels of the six major fields and overall country levels have been generated in a way which makes them dependent not on the quantity of scientific publication output, but on its quality, thus making them suitable for constructing conceivable science policies. 
Nevertheless, the general observed trend shows growth in the quality of scientific performance (SPQL) with scientific production output, though there are evident exceptions to this tendency in both positive and negative directions. The highest quality levels of scientific performance reached have been identified for the 41 FoS (subfields) and the 6 major FoS by the countries concerned. In recent decades China has witnessed an impressive improvement in science, and its scientific output has become the second largest in the world. From both quantitative and qualitative perspectives, this paper aims to explore China's comparative advantages in different academic disciplines. This paper employs two datasets: publications in all journals and publications in the top 5 % of journals by discipline. With the former dataset we investigate the comparative advantages of each academic discipline in terms of absolute output volume, and with the latter dataset we evaluate the scientific output published in prestigious venues. In contrast to the criticism stated in previous literature, this paper finds that the quality of China's research (represented by papers published in high-impact journals) is promising. Since 2006 the growth of scientific publications in China has been driven by papers published in English-language journals. The increasing visibility of Chinese science seems to be paving the way for its wider recognition and higher citation rates. As publications are the principal method of distributing research, journal editors serve as the gatekeepers of emerging knowledge. Here, we provide a "case-control study" to examine the role of editorial bias in the New England Journal of Medicine, a major medical journal, by investigating the author demographics of case reports that are under either editorial or meritorious selection. 
Our results indicate that editorial bias promoting the publication of authors from select high-performance countries is declining, although there is increasing editorial preference for university-based authors. These findings are relevant to efforts aiming to increase transparency in scientific publishing. Plagiarism represents a serious and growing problem in science, with only a fraction of such publications detected and retracted. Any initiative to deal efficiently with the problem of plagiarism would require a joint effort of academic publishers and editors. The most effective measure would be to establish a common plagiarism detection system, adopted by all peer-reviewed journals and major publishers, with automatic uploading and cross-checking of each newly submitted manuscript against both published material and all further and ongoing submissions. If adequately implemented, such a system would fully resolve the problem of multiple submissions, and would detect instances of plagiarism of unpublished material. A significant portion of scientific misconduct cases would be resolved, in most cases well before the publishing stage. This would greatly benefit the scientific community and science in general. The need for publishing retraction notices would also be diminished, thereby reducing publishing costs. Lastly, the system would probably act as a deterrent, passively contributing to a reduction in the frequency of plagiarism. Large, multi-institutional groups or collaborations of scientists are engaged in nuclear physics research projects, and the number of research facilities is dwindling. These collaborations have their own authorship rules, and they produce a large number of highly-cited papers. Multiple authorship of nuclear physics publications creates a problem with the assessment of an individual author's productivity relative to his/her colleagues and renders ineffective any performance metric based solely on annual publication and citation counts. 
Many institutions are increasingly relying on the total number of first-author papers; however, this approach becomes counterproductive for large research collaborations with an alphabetical order of authors. The concept of fractional authorship (dividing authorship credit among the coauthors of a paper) helps to clarify this issue by providing a more complete picture of research activities. In the present work, nuclear physics fractional and total authorships have been investigated using nuclear data mining techniques. Historic total and fractional authorship averages have been extracted from the Nuclear Science References database, and the current range of fractional contributions has been deduced. The results of this study and their implications are discussed and conclusions presented. One interesting phenomenon that emerges from the typical structure of social networks is the friendship paradox. It states that your friends have, on average, more friends than you do. Recent efforts have explored variations of it, with numerous implications for the dynamics of social networks. However, the friendship paradox and its variations consider only the topological structure of the networks and neglect many other characteristics that are correlated with node degree. In this article, we take the case of scientific collaborations to investigate whether a similar paradox also arises in terms of a researcher's scientific productivity as measured by her H-index. The H-index is a widely used metric in academia to capture both the quality and the quantity of a researcher's scientific output. It is likely that a researcher may use her coauthors' H-indexes as a way to infer whether her own H-index is adequate in her research area. Nevertheless, in this article, we show that the average H-index of a researcher's coauthors is usually higher than her own H-index. We present empirical evidence of this paradox and discuss some of its potential consequences. 
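The H-index variant of the friendship paradox described above is easy to state computationally: for each researcher, compare her H-index with the mean H-index of her coauthors. A minimal sketch with invented data (the names and H-indexes below are hypothetical; the paradox predicts the returned fraction is typically high in real coauthorship networks):

```python
def coauthor_hindex_paradox(h_index, coauthors):
    """Fraction of researchers whose coauthors' mean H-index beats their own.

    h_index maps researcher -> H-index; coauthors maps researcher ->
    list of coauthors. Hypothetical data only; illustrative sketch.
    """
    worse_off = 0
    for r, cs in coauthors.items():
        mean_h = sum(h_index[c] for c in cs) / len(cs)
        if mean_h > h_index[r]:
            worse_off += 1
    return worse_off / len(coauthors)

# Invented four-researcher collaboration network.
h = {"a": 2, "b": 10, "c": 4, "d": 6}
co = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b", "d"], "d": ["c"]}
frac = coauthor_hindex_paradox(h, co)
# Researchers "a" and "c" fall below their coauthors' mean, so frac = 0.5.
```

In the toy network only half the researchers are "worse off"; the empirical claim in the abstract is that this fraction is usually well above one half at scale.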
This article reviews theoretical and empirical studies on information credibility, with particular questions as to how scholars have conceptualized credibility, which is known as a multifaceted concept with underlying dimensions; how credibility has been operationalized and measured in empirical studies, especially in the web context; what are the important user characteristics that contribute to the variability of web credibility assessment; and how the process of web credibility assessment has been theorized. An agenda for future research on information credibility is also discussed. Trust is the most important characteristic of digital repositories designed to hold and deliver archival documents that have persistent value to stakeholders. In theoretical models of trust in information, the concept of trustworthiness is emerging as both fundamentally important and understudied, particularly in the domain of digital repositories. This article reports on a qualitative study designed to elicit from groups of end users components of trustworthiness and to assess their relative importance. The study draws on interview data from 3 focus groups with experienced users of the Washington State Digital Archives. Utilizing thematic analysis and micro-interlocutor analysis to examine a combination of interview transcripts and video recordings, the study provides a realistic picture of the strength and character of emergent themes that underpin the more general concept of trustworthiness. The study reinforces the centrality of trustworthiness at the individual document level, but calls into question the formulation of trustworthiness as a concept in Kelton, Fleischmann, and Wallace's (2008) Integrated Model of Trust in Information. Conclusions of research articles depend on bodies of data that cannot be included in articles themselves. To share this data is important for reasons of both transparency and reuse. 
Science, Technology, and Medicine journals have a role in facilitating sharing, but by what mechanism is not yet clear. The Journal Research Data (JoRD) Project was a JISC (Joint Information Systems Committee)-funded feasibility study on the potential for a central service on journal research data policies. The objectives of the study included identifying the current state of journal data sharing policies and investigating stakeholders' views and practices. The project confirmed that a large percentage of journals have no data sharing policy and that there are inconsistencies between those that are traceable. This state leaves authors unsure of whether they should share article related data and where and how to deposit those data. In the absence of a consolidated infrastructure to share data easily, a model journal data sharing policy was developed by comparing quantitative information from analyzing existing journal data policies with qualitative data collected from stakeholders. This article summarizes and outlines the process by which the model was developed and presents the model journal data sharing policy. Better knowledge of the habits and preferences of patients helps one understand why and how patients might need or want to access health services online and offline. Such knowledge provides a basis for designing systems for providing complementary health information. This article discusses how patients' conceptualizations of their health-information-related preferences, motivations, and needs are linked to the perceived role of medical records as an informational artifact. 
We identified seven subject positions: (P1) Hypothetically positive to e-health services generally, (P2) Positive to reading medical records due to implications, (P3) Positive to all Internet use including medical records online, (P4) Distrustful and wants to be in control of health treatment, (P5) Worried about health, (P6) Wants communication with health care professionals, and (P7) Do not understand their medical record. These subject positions can explain the worry and enthusiasm documented in earlier literature. The diversity of subject positions implies that health care information services should be planned with different subject positions in mind rather than a simple demographic group. Special attention needs to be given to finding flexible solutions that address the opportunities and worries of the identified subject positions. Although news websites are used by a large and increasing number of people, there is a lack of research within human-computer interaction regarding users' experience with this type of interactive technology. In the current research, existing measures of user-experience factors were identified and, using an online survey, answers to psychometric scales to measure website characteristics, need fulfillment, affective reactions, and constructs of technology acceptance and user experience were collected from regular users of news sites. A comprehensive user-experience model was formulated to explain acceptance and quality judgments of news sites. The main contribution of the current study is the application of influential models of user experience and technology acceptance to the domain of online news. By integrating both types of variable in a comprehensive model, the relationships between the types of variable are clarified both theoretically and empirically. Implications of the model for theory, further research, and system design are discussed. 
The most significant challenge in facilitating a professional virtual community (PVC) is maintaining a continuous supply of knowledge from members, especially because lurkers often make up a large portion of an online community. However, we still do not understand how knowledge-sharing intention (KSI) is formed across poster and lurker groups. Accordingly, this study seeks to provide a fuller understanding of the formation of behavioral intention in PVCs by decomposing the psychological formation of KSI and focusing on factors deemed likely to influence the KSI of posters and lurkers. This study's online survey of 177 posters and 246 lurkers from 3 PVCs demonstrated that enjoyment in helping others positively influenced posters' attitudes toward knowledge sharing, whereas reciprocity and technology adoption variables (perceived ease of use and compatibility) positively influenced lurkers' attitudes. Interpersonal trust and peer influence strongly affected the subjective norm of knowledge sharing in both groups, with posters emphasizing interpersonal trust and lurkers emphasizing peer influence. Furthermore, knowledge self-efficacy and resource availability enhanced the perceived behavioral control of knowledge sharing in both groups, with knowledge self-efficacy affecting posters the most and resource availability influencing lurkers the most. The results of this study have important implications for both research and practice. Online social shopping communities are transforming the way customers communicate and exchange product information with others. To date, the issue of customer participation in online social shopping communities has become an important but underexplored research area in the academic literature. In this study, we examined how online social interactions affect customer information contribution behavior. 
We also explored the moderating role of customer reputation in the relationships between observational learning and reinforcement learning on the one hand and customer information contribution behavior on the other. Analyses of panel data from 6,121 customers on an online social fashion platform revealed that both forms of learning are significant factors affecting customer information contribution behavior, and that reinforcement learning exhibits a stronger effect than observational learning. The results also showed that customer reputation has a significant negative moderating effect on the relationship between observational learning and customer information contribution behavior. This study not only enriches our theoretical understanding of information contribution behavior but also provides guidelines for online social shopping community administrators to better design their community features. Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems generally treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features derived from the blocks of text and their combinations is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. 
For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to that of a supervised method using manually tagged Tweets. Topic-specific structured Tweet sets are shown to help with query-dependent opinion retrieval. Managing the constant flow of incoming messages is a daily challenge faced by knowledge workers who use technologies such as e-mail and other digital communication tools. This study focuses on the most ubiquitous of these technologies, e-mail, and unobtrusively explores the ongoing inbox-management activities of thousands of users worldwide over a period of 8 months. The study describes the dynamics of these inboxes throughout the day and the week as users strive to handle incoming messages, read them, classify them, respond to them in a timely manner, and archive them for future reference, all while carrying out the daily tasks of knowledge workers. It then tests several hypotheses about the influence of specific inbox-management behaviors in mitigating the causes of e-mail overload, and proposes a continuous index that quantifies one of these inbox-management behaviors. This inbox clearing index (ICI) expands on the widely cited trichotomous classification of users into frequent filers, spring cleaners, and no filers, as suggested by Whittaker and Sidner (1996). We propose that the ICI allows shifting the focus from classifying users to characterizing a diversity of user behaviors and measuring the relationships between these behaviors and desired outcomes. Text classification (TC) is a core technique for text mining and information retrieval. It has been applied in many different research and industrial areas. 
Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is an important module for TC, and TC has peculiarities that differ from those of information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been applied to TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information through positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information, as well as traditional schemes such as tf-idf. Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. To build a system that predicts bursting hashtags, we explore different types of features and present machine-learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features. YouTube is a successful social network that people use to upload, watch, and comment on videos. We believe comments left on these videos can provide insight into user interests, but to this point they have not been used to map out a specific video community. Our study investigates whether and how user commenting behavior impacts the topology of the K-pop video community through analysis of co-commenting behavior on these videos. 
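As background for the term-weighting discussion above: the proposed log tf-TRR scheme is not fully specified here, so the sketch below shows only the traditional tf-idf baseline it is compared against. The toy corpus and its tokens are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term in each tokenized document by tf * idf,
    where tf is the raw term count in the document and
    idf = log(N / df), with N the corpus size and df the number
    of documents containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weighted

# hypothetical three-document corpus
docs = [["glass", "bowl", "glass"], ["bowl", "painting"], ["painting", "dog"]]
weights = tf_idf(docs)
```

Terms concentrated in few documents (high tf, low df) receive the largest weights; class-aware schemes such as the one proposed replace the document-frequency factor with class-distribution statistics.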
We apply a traditional author cocitation analysis to this behavior, in a process we refer to as co-comment analysis, to detect the topology of this community. This involves: a) an analysis of user co-comments to elicit the tendency toward user homophily within the community; b) an analysis of user co-comments, weighted by co-comment frequency, to detect user interests in the community; and c) an analysis of user co-comments, weighted by sentiment scores, to capture user opinions by polarity. The results indicate that users who comment on specific K-pop videos also tend to comment on topically similar YouTube videos. We also find that the number of comments made by users correlates with the degree of positivity of their comments. Conversely, users who comment negatively on K-pop videos are not inclined to form specific user groups, but rather present only their opinions individually. The pervasive availability of smartphones and their connected external sensors or wearable devices can provide a new public health data collection capability. Current research and commercial efforts have concentrated on sensor-based collection of health data for personal fitness and healthcare feedback purposes. However, to date there has not been a detailed investigation of how such smartphones and sensors can be utilized for public health data collection purposes. Public health data can be captured without infringing upon privacy, because the full detailed data of individuals are not needed; only anonymized, aggregated, de-identified, and non-unique data for an individual are required. For example, rather than details of physical activity, including the specific route taken, just the total caloric burn over a week or month could be submitted, thereby strongly reducing the risk of re-identification. 
In this paper we introduce, prototype, and evaluate a new type of public health information system to provide aggregate population health data capture and public health intervention capabilities by utilizing smartphone and sensor capabilities, while fully maintaining the anonymity and privacy of each individual. We consider in particular the key aspects of privacy, anonymity, and intervention capabilities of these emerging systems and provide a detailed evaluation of anonymity preservation characteristics. Despite increasing interest in and acknowledgment of the significance of video games, current descriptive practices are not sufficiently robust to support searching, browsing, and other access behaviors from diverse user groups. To address this issue, the Game Metadata Research Group at the University of Washington Information School, in collaboration with the Seattle Interactive Media Museum, worked to create a standardized metadata schema. This metadata schema was empirically evaluated using multiple approaches: collaborative review, schema testing, semi-structured user interviews, and a large-scale survey. Reviewing and testing the schema revealed issues and challenges in sourcing the metadata for particular elements, determining the level of granularity for data description, and describing digitally distributed games. The findings from user studies suggest that users value various subject and visual metadata, information about how games are related to each other, and data regarding game expansions/alterations such as additional content and networked features. The metadata schema was extensively revised based on the evaluation results, and we present the new element definitions from the revised schema in this article. This work will serve as a platform and catalyst for advances in the design and use of video game metadata. 
A large body of research has examined, from both the query side and the user-behavior side, the characteristics of medical- and health-related searches. One of the core issues in medical information retrieval (IR) is the diversity of tasks, which leads to a diversity of categories of information needs and queries. From the evaluation perspective, another related and challenging issue is the limited availability of appropriate test collections allowing the experimental validation of medically task-oriented IR techniques and systems. In this paper, we explore the peculiarities of TREC and CLEF medically oriented tasks and queries through the analysis of the differences and the similarities between queries across tasks, with respect to length, specificity, and clarity features, and then study their effect on retrieval performance. We show that, even for expert-oriented queries, the level of language specificity varies significantly across tasks, as does search difficulty. Additional findings highlight that query clarity factors are task-dependent and that query term specificity based on domain-specific terminology resources is not significantly linked to term rareness in the document collection. The lessons learned from our study could serve as starting points for the design of future task-based medical information retrieval frameworks. This study applied LDA (latent Dirichlet allocation) and regression analysis to conduct a lead-lag analysis to identify different topic evolution patterns between preprints and papers from arXiv and the Web of Science (WoS) in astrophysics over the last 20 years (1992-2011). Fifty topics in arXiv and WoS were generated using an LDA algorithm and then regression models were used to explain 4 types of topic growth patterns. Based on the slopes of the fitted equation curves, the paper redefines the topic trends and popularity. Results show that arXiv and WoS share similar topics in a given domain, but differ in evolution trends. 
Topics in WoS lose their popularity much earlier and their durations of popularity are shorter than those in arXiv. This work demonstrates that open-access preprints have a stronger growth tendency than traditional printed publications. Text mining has been widely used on multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that a user's opinion is influenced by that of his/her neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both the topic factor and the opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and F1 measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo. The question of which type of computer science (CS) publication, conference or journal, is likely to result in more citations for a published paper is addressed. A series of data sets are examined and joined in order to analyze the citations of over 195,000 conference papers and 108,000 journal papers. Two means of evaluating the citations of journals and conferences are explored: h(5) and average citations per paper; it was found that h(5) has certain biases that make it a difficult measure to use (despite it being the main measure used by Google Scholar). 
Results from the analysis show that CS, as a discipline, values conferences as a publication venue more highly than any other academic field of study. The analysis also shows that a small number of elite CS conferences have the highest average paper citation rate of any publication type, although overall, citation rates in conferences are no higher than in journals. It is also shown that the length of a paper is correlated with citation rate. In many scientific fields, the order of coauthors on a paper conveys information about each individual's contribution to a piece of joint work. We argue that in prior network analyses of coauthorship networks, the information on ordering has been insufficiently considered because ties between authors are typically symmetrized. This is basically the same as assuming that each coauthor has contributed equally to a paper. We introduce a solution to this problem by adopting a coauthorship credit allocation model proposed by Kim and Diesner (2014), which at its core conceptualizes coauthoring as a directed, weighted, and self-looped network. We test and validate our application of the adopted framework based on a sample of 861 authors who have published in the journal Psychometrika. The results suggest that this novel sociometric approach can complement traditional measures based on undirected networks and expand insights into coauthoring patterns such as the hierarchy of collaboration among scholars. As another form of validation, we also show how our approach accurately detects prominent scholars in the Psychometric Society, which is affiliated with the journal. This study employs a panel threshold regression model to test whether the patent h-index has a threshold effect on the relationship between patent citations and market value in the pharmaceutical industry. It aims to bridge the gap in extant research on this topic. 
This study demonstrates that the patent h-index has a triple threshold effect on the relationship between patent citations and market value. When the patent h-index is less than or equal to the lowest threshold, 4, there is a positive relationship between patent citations and market value. This study indicates that the first regime (where the patent h-index is less than or equal to 4) is optimal, because this is where the extent of the positive relationship between patent citations and market value is the greatest. In this brief contribution I argue that an apparent dichotomy between information behavior seen as the behavior of individuals and their respective information styles and information behavior considered as a social practice may be resolved by considering the under-researched corporeality of the human body, also known as embodiment, which is a fundamental aspect of any kind of behavior, including information behavior. Practice is inherently embodied too, which means embodiment can be utilized as a vantage point to seek conceptual grounding for the rather diverse range of theories and models in information behavior research. The challenge then is to articulate in what ways and on what levels a particular approach contributes to advancing information behavior research. Conceptual clarity would also help information behavior models and theories developed in library and information science become more accessible and hopefully also more relevant to researchers in cognate disciplines. I discuss how, given a certain number of articles and citations of these articles, the h-index and the g-index are affected by the level of concentration of the citations. This offers the opportunity for a comparison between these 2 indices from a new perspective. This study presents the calculation of odds and odds ratios for the comparison of the citation impact of universities in the Leiden Ranking. 
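The odds and odds-ratio calculation introduced above is simple enough to state directly. A minimal sketch follows; the shares of highly cited papers are hypothetical numbers chosen only for illustration, not figures from the Leiden Ranking itself.

```python
def odds(p):
    # odds corresponding to a probability p, e.g., the share of a
    # university's papers among the top 10% most highly cited
    return p / (1.0 - p)

def odds_ratio(p_a, p_b):
    # how many times larger the odds are for institution A than for B;
    # values above 1 favor A, below 1 favor B
    return odds(p_a) / odds(p_b)

# hypothetical highly-cited shares for two universities
ratio = odds_ratio(0.15, 0.10)
```

The appeal of the odds ratio here is that it compares a university against a competitor (or the average of several competitors) in one interpretable number.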
Odds and odds ratios can be used to measure the performance difference between a selected university and competing institutions, or the average of selected competitors, in a relatively simple but clear way. Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single gold standard method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance. Since its foundation in 2006, Twitter has enjoyed a meteoric rise in popularity, currently boasting over 500 million users. Its short text nature means that the service is open to a variety of different usage patterns, which have evolved rapidly in terms of user base and utilization. Prior work has categorized Twitter users, as well as studied the use of lists and re-tweets and how these can be used to infer user profiles and interests. 
The focus of this article is on studying why and how Twitter users mark tweets as favorites, a functionality whose usage is currently poorly understood but which has strong relevance for personalization and information access applications. Firstly, manual analysis and classification are carried out on a randomly chosen set of favorited tweets, which reveal different approaches to using this functionality (i.e., bookmarks, thanks, like, conversational, and self-promotion). Secondly, an automatic favorites classification approach is proposed, based on the categories established in the previous step. Our machine learning experiments demonstrate a high degree of success in matching human judgments in classifying favorites according to usage type. In conclusion, we discuss the purposes to which these data could be put, in the context of identifying users' patterns of interests. Encouraging users of social network sites (SNS) to actively provide personal information is vital if SNS are to prosper, but privacy concerns have hindered users from giving such information. Previous research dealing with privacy concerns has studied mostly worries about information misuse, focusing on the protection aspects of privacy. By adopting an interpersonal conception of privacy and communication privacy management theory, this study offers a new way of understanding privacy concerns by examining the social and presentational aspects of privacy. It examines privacy concerns in terms not only of others' misuse but also of others' misunderstanding, and personal information in terms not only of identity but also of self-presentational information. Furthermore, it investigates the ways in which information and social risks inherent in SNS influence privacy concerns. 
A structural equation modeling analysis of a cross-sectional survey of 396 Facebook users finds that longer usage does not alleviate the impact of information risk on either concern, that a greater proportion of offline friends among one's SNS friends aggravates the impact of social risk on both concerns, and that concerns about information misuse affect the provision only of identity information, whereas concerns about information misunderstanding affect the provision of both identity and self-presentational information. The aim of this study is to understand how teenagers use Internet forums to search for information. The activities of asking for and providing information in a forum were explored, and a set of messages extracted from a French forum targeting adolescents was analyzed. Results show that the messages initiating the threads are often requests for information. Teenagers mainly ask for peers' opinions on personal matters and specific verifiable information. The discussions following these requests take the form of an exchange of advice (question/answer) or a coconstruction of the final answer between the participants (with assessments of participants' responses, requests for explanations, etc.). The results suggest that discussion forums present different advantages for adolescents' information-seeking activities. The first is that this social medium allows finding specialized information on topics specific to this age group. The second is that the collaborative aspect of information seeking in a forum allows these adolescents to overcome difficulties commonly associated with the search process (making a precise request, evaluating a result). Microblogging is growing in popularity and significance. Although many researchers have attempted to explain why and how people use this new medium, previous studies have produced relatively inconclusive results. 
For instance, in most of these studies, microblogging has been considered a social networking activity; however, quantitative analyses of microblogging usage have shown that people use microblogging as an information-broadcasting platform. In this study, we identified the factors that drive microblogging and which of them lead to user satisfaction. We developed a theoretical framework and then empirically validated the factors and the emergent mechanisms (value evaluation processes). We empirically tested our research model using a sample of 230 microbloggers, and the results showed that content and technology gratifications are the two key factors that drive user satisfaction with microblogging. That is, it is the value of information dissemination rather than social networking that makes people feel satisfied with the use of microblogging. We believe that this study will generate interest among researchers in social media. The results also provide platform administrators with insights into how people use microblogging and why they are satisfied with the technology. Keyphrases represent the main topics a text is about. In this article, we introduce SemGraph, an unsupervised algorithm for extracting keyphrases from a collection of texts based on a semantic relationship graph. The main novelty of this algorithm is its ability to identify semantic relationships between words whose presence is statistically significant. Our method constructs a co-occurrence graph in which words appearing in the same document are linked, provided their presence in the collection is statistically significant with respect to a null model. Furthermore, the graph obtained is enriched with information from WordNet. We have used the most recent and standardized benchmark to evaluate the system's ability to detect the keyphrases that are part of the text. 
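A minimal sketch of the co-occurrence graph construction underlying SemGraph follows, with hypothetical documents. Note that the actual method additionally filters word pairs by statistical significance against a null model and enriches the graph with WordNet information, both of which this sketch omits.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(docs):
    """Undirected weighted word graph: the weight of edge (u, v) is the
    number of documents in which u and v co-occur. Edge keys are sorted
    word pairs so each undirected edge is stored once."""
    edges = defaultdict(int)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):
            edges[(u, v)] += 1
    return dict(edges)

# hypothetical tokenized documents
docs = [["glass", "vase", "bowl"], ["glass", "vase"], ["dog", "painting"]]
graph = cooccurrence_graph(docs)
```

In a full pipeline, graph centrality or clustering over such a structure is what surfaces candidate keyphrases.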
The result is a method that achieves an improvement of 5.3% and 7.28% in F measure over the two labeled sets of keyphrases used in the evaluation of SemEval-2010. Tag recommendation strategies that exploit term co-occurrence patterns with tags previously assigned to the target object have consistently produced state-of-the-art results. However, such techniques work only for objects with previously assigned tags. Here we focus on tag recommendation for objects with no tags, a variation of the well-known "cold start" problem. We start by evaluating state-of-the-art co-occurrence-based methods in cold start settings. Our results show that the effectiveness of these methods suffers in this situation. Moreover, we show that employing various automatic filtering strategies to generate an initial tag set that enables the use of co-occurrence patterns produces only marginal improvements. We then propose a new approach that exploits both positive and negative user feedback to iteratively select input tags along with a genetic programming strategy to learn the recommendation function. Our experimental results indicate that extending the methods to include user relevance feedback leads to gains in precision of up to 58% over the best baseline in cold start scenarios and gains of up to 43% over the best baseline in objects that contain some initial tags (i.e., no cold start). We also show that our best relevance-feedback-driven strategy performs well even in scenarios that lack user cooperation (i.e., users may refuse to provide feedback) and user reliability (i.e., users may provide the wrong feedback). Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels to summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. 
It relies on the formulation and generation of 2 types of heterogeneous objects, which correspond to documents and candidate phrases, using multilevel similarity information. CEDL is composed of 5 main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model where documents along with their cluster memberships are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases via a ranking scheme that combines multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL are demonstrated qualitatively and quantitatively with experiments using document databases from different application fields. Extant studies suggest implementing a business intelligence (BI) system is a costly, resource-intensive and complex undertaking. The literature draws attention to the critical success factors (CSFs) for implementation of BI systems. Leveraging case studies of seven large organizations and blending them with Yeoh and Koronios's (2010) BI CSFs framework, our empirical study gives evidence to support this notion of CSFs and provides a better contextual understanding of the CSFs in the BI implementation domain. Cross-case analysis suggests that organizational factors play the most crucial role in determining the success of a BI system implementation. Hence, BI stakeholders should prioritize the organizational dimension ahead of other factors. 
Our findings allow BI stakeholders to holistically understand the CSFs and the associated contextual issues that impact the implementation of BI systems. Vast amounts of information are exchanged and/or released daily. The sensitive nature of much of this information creates a serious privacy threat when documents are uncontrollably made available to untrusted third parties. In such cases, appropriate data protection measures should be undertaken by the responsible organization, especially under the umbrella of current legislation on data privacy. To do so, human experts are usually requested to redact or sanitize document contents. To relieve experts of this burdensome task, this paper presents a privacy model for document redaction/sanitization, which offers several advantages over other models available in the literature. Based on the well-established foundations of data semantics and information theory, our model provides a framework to develop and implement automated and inherently semantic redaction/sanitization tools. Moreover, contrary to ad-hoc redaction methods, our proposal provides a priori privacy guarantees which can be intuitively defined according to current legislation on data privacy. Empirical tests performed within the context of several use cases illustrate the applicability of our model and its ability to mimic the reasoning of human sanitizers. The organization of scientific papers typically follows a standardized pattern, the well-known IMRaD structure (introduction, methods, results, and discussion). Using the full text of 45,000 papers published in the PLoS series of journals as a case study, this paper investigates, from the viewpoint of bibliometrics, how references are distributed along the structure of scientific papers as well as the age of these cited references. 
Once the sections of articles are realigned to follow the IMRaD sequence, the position of cited references along the text of articles is invariant across all PLoS journals, with the introduction and discussion accounting for most of the references. It also provides evidence that the age of cited references varies by section, with older references being found in the methods and more recent references in the discussion. These results provide insight into the different roles citations have in the scholarly communication process. This study introduces a new proposal to refine the classification of the SCImago Journal and Country Rank (SJR) platform by using clustering techniques and an alternative combination of citation measures from an initial 18,891 SJR journal network. Thus, a journal-journal matrix including simultaneously fractionalized values of direct citation, cocitation, and coupling was symmetrized by cosine similarity and later transformed into distances before performing clustering. The results provided a new cluster-based subject structure comprising 290 clusters that emerge by executing Ward's clustering in two phases and using a mixed labeling procedure based on tf-idf scores of the original SJR category tags and significant words extracted from journal titles. In total, 13,716 SJR journals were classified using this new cluster-based scheme. Although more than 5,000 journals were omitted in the classification process, the method produced a consistent classification with a balanced structure of coherent and well-defined clusters, a moderated multiassignment of journals, and a softer concentration of journals over clusters than in the original SJR categories. New subject disciplines such as nanoscience and nanotechnology or social work were also detected, providing evidence of good performance of our approach in refining the journal classification and updating the subject classification structure. 
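The symmetrization step described above, cosine similarity transformed into distances ready for Ward's clustering, can be sketched as follows. The journal profiles are hypothetical stand-ins for the fractionalized direct-citation, cocitation, and coupling values the study combines.

```python
import math

def cosine(u, v):
    # cosine similarity between two journal citation profiles
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance_matrix(profiles):
    # symmetric (1 - cosine) distance matrix, suitable as input to a
    # hierarchical clustering routine such as Ward's method
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 - cosine(profiles[i], profiles[j])
    return d

# hypothetical journal-journal citation profiles (rows: journals)
profiles = [[3.0, 0.0, 1.0], [2.0, 0.0, 1.0], [0.0, 4.0, 2.0]]
D = cosine_distance_matrix(profiles)
```

Journals with proportionally similar citation profiles end up close in this distance space regardless of their absolute citation volumes, which is why cosine is a common choice before clustering.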
In theory, articles can attract readers on the social reference sharing site Mendeley before they can attract citations, so Mendeley altmetrics could provide early indications of article impact. This article investigates the influence of time on the number of Mendeley readers of an article through a theoretical discussion and an investigation into the relationship between counts of readers of, and citations to, 4 general library and information science (LIS) journals. For this discipline, it takes about 7 years for articles to attract as many Scopus citations as Mendeley readers, and after this the Spearman correlation between readers and citers is stable at about 0.6 for all years. This suggests that Mendeley readership counts may be useful impact indicators for both newer and older articles. The lack of dates for individual Mendeley article readers and an unknown bias toward more recent articles mean that readership data should be normalized individually by year, however, before making any comparisons between articles published in different years. Some major concerns of universities are to provide quality in higher education and enhance global competitiveness, thus ensuring a high global rank and an excellent performance evaluation. This article examines the Quacquarelli Symonds (QS) World University Ranking methodology, pointing to a drawback of using subjective, possibly biased, weightings to build a composite indicator (QS scores). We propose an alternative approach to creating QS scores, which is referred to as the composite I-distance indicator (CIDI) methodology. The main contribution is the proposal of a composite indicator weights correction based on the CIDI methodology. It leads to the improved stability and reduced uncertainty of the QS ranking system. The CIDI methodology is also applicable to other university rankings by proposing a specific statistical approach to creating a composite indicator. 
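The Spearman correlation between Mendeley readers and Scopus citers reported in the Mendeley study above is simply the Pearson correlation computed on ranks. A minimal pure-Python sketch follows; the reader and citation counts are hypothetical.

```python
import math

def average_ranks(xs):
    # rank values from 1, assigning tied values their average rank
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# hypothetical per-article Mendeley reader and Scopus citation counts
readers = [5, 12, 33, 40, 61]
citations = [2, 7, 15, 20, 44]
rho = spearman(readers, citations)
```

Because it operates on ranks, Spearman is robust to the skewed count distributions typical of both readership and citation data, which is one reason it is preferred over Pearson in such comparisons.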
Explaining how, why, and to what extent humans use information systems has been at the heart of the information systems (IS) discipline, and although successful models have emerged, mostly relying on social and cognitive psychology in their theoretical underpinnings, there are still cases that remain unexplainable. In this article, we scrutinize one of these cases, in which the continued use of technology cannot be explained by one of the most prominent traditional IS models to date. We analyze our qualitative case study by juxtaposing two theoretical lenses: a traditional IS perspective (i.e., the unified theory of acceptance and use model) versus an evolutionary psychology perspective (i.e., the four-drive model). We find that a more comprehensive understanding of continued IS usage is possible when an evolutionary psychology perspective is added to the existing models. Specifically, we propose three new concepts: evolved performance expectancy, evolved effort expectancy, and evolved social influence. We also demonstrate that, in some situations, cognitive and social constructs dominate, whereas in other situations, the evolution-dependent constructs associated with human nature take over. This brief communication presents preliminary findings on automated Twitter accounts distributing links to scientific articles deposited on the preprint repository arXiv. It discusses the implications of the presence of such bots from the perspective of social media metrics (altmetrics), where mentions of scholarly documents on Twitter have been suggested as a means of measuring impact that is both broader and timelier than citations. Our results show that automated Twitter accounts create a considerable number of tweets to scientific articles and that they behave differently than common social bots, which has critical implications for the use of raw tweet counts in research evaluation and assessment. 
We discuss some definitions of Twitter cyborgs and bots in scholarly communication and propose distinguishing between different levels of engagement, that is, differentiating between merely tweeting bibliographic information and discussing or commenting on the content of a scientific work. We discuss a real-world application of a recently proposed machine learning method for authorship verification. Authorship verification is considered an extremely difficult task in computational text classification, because it does not assume that the correct author of an anonymous text is included in the candidate authors available. To determine whether 2 documents have been written by the same author, the verification method discussed uses repeated feature subsampling and a pool of impostor authors. We use this technique to attribute a newly discovered Latin text from antiquity (the Compendiosa expositio) to Apuleius. This North African writer was one of the most important authors of the Roman Empire in the 2nd century and authored one of the world's first novels. This attribution has profound and wide-reaching cultural value, because it has been over a century since a new text by a major author from antiquity was discovered. This research therefore illustrates the rapidly growing potential of computational methods for studying the global textual heritage. Political sentiment analysis using social media, especially Twitter, has attracted wide interest in recent years. In such research, opinions about politicians are typically divided into positive, negative, or neutral. In our research, the goal is to mine political opinion from social media at a higher resolution by assessing statements of opinion related to the personality traits of politicians; this is an angle that has not yet been considered in social media research. 
A second goal is to contribute a novel retrieval-based approach for tracking public perception of personality using Gough and Heilbrun's Adjective Check List (ACL) of 110 terms describing key traits. This is in contrast to the typical lexical and machine-learning approaches used in sentiment analysis. High-precision search templates developed from the ACL were run on an 18-month span of Twitter posts mentioning Obama and Romney, retrieving more than half a million tweets. For example, the results indicated that Romney was perceived as more of an achiever and Obama was perceived as somewhat more friendly. The traits were also aggregated into 14 broad personality dimensions. For example, Obama rated far higher than Romney on the Moderation dimension and lower on the Machiavellianism dimension. The temporal variability of such perceptions was explored. Recent work by Evans, Cordova, and Sipole (2014) reveals that in the two months leading up to the 2012 election, female House candidates used the social media site Twitter more often than male candidates. Not only did female candidates tweet more often, but they also spent more time attacking their opponents and discussing important issues in American politics. In this article, we examine whether the female winners of those races acted differently than the male winners in the 2012 election, and whether they differed in their tweeting style during two months in the summer of 2013. Using a hand-coded content analysis of every tweet from each member of the U.S. House of Representatives in June and July of 2013, we show that women differ from their male colleagues in their frequency and type of tweeting, and note some key differences between the period during the election and the period after. This article suggests that context greatly affects representatives' Twitter style. Online discussion forums have become a popular medium for users to hold discussions with and seek information from other users having similar interests. 
A typical discussion thread consists of a sequence of posts contributed by multiple users. Each post in a thread serves a different purpose, providing different types of information, and thus may not be equally useful for all applications. Identifying the purpose and nature of each post in a discussion thread is thus an interesting research problem, as it can help in improving information extraction and intelligent assistance techniques. We study the problem of classifying a given post according to its purpose in the discussion thread and employ features based on the post's content, the structure of the thread, the behavior of the participating users, and sentiment analysis of the post's content. We evaluate our approach on two forum data sets belonging to different genres and achieve strong classification performance. We also analyze the relative importance of the different features used for the post classification task. Next, as a use case, we describe how post class information can help in thread retrieval by incorporating this information into a state-of-the-art thread retrieval model. In the context of online question-answering services, an intermediary clarifies the user's needs by eliciting additional information. This research proposes that these elicitations will depend on the type of question. In particular, this research explores the relationship between three constructs: question types, elicitations, and the fee that is paid for the answer. These relationships are explored for several different question typologies, including a new kind of question type that we call Identity. It is found that the kinds of clarifications that intermediaries elicit depend on the type of question in systematic ways. A practical implication is that interactive question-answering services, whether human or automated, can be steered to focus attention on the kinds of clarification that are evidently most needed for that question type. 
Further, it is found that certain question types, as well as the number of elicitations, are associated with higher fees. This means that it may be possible to define a pricing structure for question-answering services based on objective and predictable characteristics of the question, which would help to establish a rational market for this type of information service. The newly introduced Identity question type was found to be especially reliable in predicting elicitations and fees. Online communities are becoming an important tool in the communication and participation processes in our society. However, the most widespread applications are difficult to use for people with disabilities, or may involve some risks if no previous training has been undertaken. This work describes a novel social network for cognitively disabled people along with a clustering-based method for modeling the activity and socialization processes of its users in a noninvasive way. This closed social network, called Guremintza, is specifically designed for people with cognitive disabilities and provides the network administrators (e.g., social workers) with two types of reports: summary statistics of network usage and behavior patterns discovered by a data mining process. Experiments conducted in an initial stage of the network show that the discovered patterns are meaningful to the social workers, who find them useful in monitoring the progress of the users. This article explores access to information through an analysis of sources and strategies as part of workplace learning in a medical context in an African developing country. It focuses on information practices in everyday patient care by a team of senior and junior physicians in a university teaching hospital. A practice-oriented, interpretative case study approach, combining elements from activity theory, situated learning theory, and the communities of practice framework, was developed to form the theoretical basis for the study. 
The qualitative data from observations and interviews were analyzed with iterative coding techniques. The findings reveal that physicians' learning through everyday access to medical information is enacted by, embedded in, and sustained as a part of the work activity itself. The findings indicate a stable community of practice with traits of both local and general medical conventions, in which the value of used sources and strategies remains relatively uncontested, strongly based on formally and informally sanctioned and legitimized practices. Although the present study is particular and context specific, the results indicate a more generally plausible conclusion; the complementary nature of different information sources and strategies underscores that access to information happens in a context in which solitary sources alone make little difference. Large amounts of health-related information of different types are available on the web. In addition to authoritative health information sites maintained by government health departments and healthcare institutions, there are many social media sites carrying user-contributed information. This study sought to identify the types of drug information available on consumer-contributed drug review sites when compared with authoritative drug information websites. Content analysis was performed on the information available for nine drugs on three authoritative sites (RxList, eMC, and PDRhealth) as well as three drug review sites (WebMD, RateADrug, and PatientsLikeMe). The types of information found on authoritative sites but rarely on drug review sites include pharmacology, special population considerations, contraindications, and drug interactions. 
Types of information found only on drug review sites include drug efficacy, drug resistance experienced by long-term users, cost of drug in relation to insurance coverage, availability of generic forms, comparison with other similar drugs and with other versions of the drug, difficulty in using the drug, and advice on coping with side effects. Drug efficacy ratings by users were found to be different across the three sites. Side effects were vividly described in context, with user assessment of severity based on discomfort and effect on their lives. This article reports the findings of a qualitative research study that examined professional image users' knowledge of, and interest in using, content-based image retrieval (CBIR) systems in an attempt to clarify when and where CBIR methods might be applied. The research sought to determine the differences in the perceived usefulness of CBIR technologies among image user groups from several domains and explicate the reasons given regarding the utility of CBIR systems for their professional tasks. Twenty participants (archaeologists, architects, art historians, and artists), individuals who rely on images of cultural materials in the performance of their work, took part in the study. The findings of the study reveal that interest in CBIR methods varied among the different professional user communities. Individuals who showed an interest in these systems were primarily those concerned with the formal characteristics (i.e., color, shape, composition, and texture) of the images being sought. In contrast, those participants who expressed a strong interest in images of known items, images illustrating themes, and/or items from specific locations believe concept-based searches to be the most direct route. These image users did not see a practical application for CBIR systems in their current work routines. 
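A minimal sketch of the kind of formal-characteristic matching that CBIR systems perform (and that the concept-oriented participants in the study above found less useful): comparing images by coarse color histograms rather than by concepts. The toy "images", the 4-bin quantization, and the similarity function are illustrative assumptions, not any particular CBIR system's method:

```python
# Compare images by color composition via histogram intersection.
# Each "image" is just a list of RGB pixels; real systems work the same way
# at a much larger scale and with more features (shape, texture, etc.).
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize 0-255 RGB pixels into a coarse per-channel histogram."""
    hist = Counter()
    for r, g, b in pixels:
        hist[(r * bins // 256, g * bins // 256, b * bins // 256)] += 1
    return hist

def similarity(img_a, img_b):
    """Histogram intersection, normalized to [0, 1]."""
    ha, hb = color_histogram(img_a), color_histogram(img_b)
    overlap = sum(min(ha[k], hb[k]) for k in ha)
    return overlap / max(len(img_a), len(img_b))

reddish_1 = [(250, 10, 10), (240, 30, 20), (255, 0, 0), (230, 15, 25)]
reddish_2 = [(245, 20, 5), (235, 25, 30), (250, 5, 15), (225, 10, 40)]
bluish    = [(10, 20, 250), (30, 10, 240), (5, 25, 255), (20, 30, 235)]

print(similarity(reddish_1, reddish_2))  # high: similar color composition
print(similarity(reddish_1, bluish))     # low: different color composition
```

The example makes the study's divide concrete: such a measure can find another reddish image, but it cannot find "a painting with dogs", which is why participants seeking known items or themes preferred concept-based search.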
We propose a tag-based framework that simulates human abstractors' ability to select significant sentences, based on key concepts in a sentence as well as the semantic relations between key concepts, to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This helps select sentences that contain a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of the intrinsic evaluation show that the tag-based method is at least as effective as the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval. Major efforts have been devoted to ontology learning, that is, semiautomatic processes for the construction of domain ontologies from diverse sources of information. In the past few years, a research trend has focused on the construction of educational ontologies, that is, ontologies to be used for educational purposes. Identifying the terminology is crucial to building ontologies, and term extraction techniques allow the identification of domain-related terms from electronic resources. This paper presents LiTeWi, a novel method that combines current unsupervised term extraction approaches for creating educational ontologies for technology-supported learning systems from electronic textbooks. LiTeWi uses Wikipedia as an additional information source. 
Wikipedia contains more than 30 million articles covering the terminology of nearly every domain in 288 languages, which makes it an appropriate generic corpus for term extraction. Furthermore, given that its content is available in several languages, it promotes both domain and language independence. LiTeWi is aimed at teachers, who usually develop their didactic material from textbooks. To evaluate its performance, LiTeWi was tuned using a textbook on object-oriented programming and then tested with two textbooks from different domains, astronomy and molecular biology. This article addresses the problem of unsupervised decomposition of a multi-author text document: identifying the sentences written by each author, assuming the number of authors is unknown. An approach, BayesAD, is developed for solving this problem: apply a Bayesian segmentation algorithm, followed by a segment clustering algorithm. Results are presented from an empirical comparison between BayesAD and AK, a modified version of an approach published by Akiva and Koppel in 2013. BayesAD exhibited greater accuracy than AK in all experiments. However, BayesAD has a parameter that needs to be set and that had a nontrivial impact on accuracy. Developing an effective method for eliminating this need would be a fruitful direction for future work. When controlling for topic, the accuracy levels of BayesAD and AK were, in all but one case, worse than a baseline approach wherein one author was assumed to write all sentences in the input text document. Hence, room for improved solutions exists. A statistical analysis of full-text downloads of articles in Elsevier's ScienceDirect, covering all disciplines, reveals large differences in download frequencies, their skewness, and their correlation with Scopus-based citation counts, between disciplines, journals, and document types. Download counts tend to be 2 orders of magnitude higher and less skewed in their distribution than citations. 
A mathematical model based on the sum of two exponentials does not adequately capture monthly download counts. The degree of correlation at the article level within a journal is similar to that at the journal level in the discipline covered by that journal, suggesting that the differences between journals are, to a large extent, discipline specific. Although download and citation counts per article positively correlate in all studied journals, little overlap may exist between the set of articles at the top of the citation distribution and the set of the most frequently downloaded ones. Usage and citation leaks, bulk downloading, differences between reader and author populations in a subject field, the type of document or its content, differences in obsolescence patterns between downloads and citations, and the different functions of reading and citing in the research process all provide possible explanations for differences between download and citation distributions. Genius work, a notion proposed by Avramescu, refers to scientific articles whose citations grow exponentially over an extended period, for example, over 50 years. Such articles were defined as sleeping beauties by van Raan, who quantitatively studied the phenomenon of delayed recognition. However, the criteria adopted by van Raan are at times not applicable and may confer recognition prematurely. To remedy these deficiencies, this paper proposes two new criteria, which are applicable (but not limited) to exponential citation curves. We searched for genius work among the articles of Nobel Prize laureates during the period 1901-2012 on the Web of Science, finding 25 works of genius among 21,438 papers, including 10 sleeping beauties (by van Raan's criteria) and 15 non-sleeping beauties. 
By our new criteria, two findings were obtained through empirical analysis: (a) the awakening periods for genius work depend on the increase rate b in the exponential function, and (b) a lower b leads to a longer sleeping period. This paper proposes a diffusion model to measure the evolution of stakeholders' disaster perceptions by integrating a disaster message model, a stakeholder model, and a stakeholder memory model, which collectively describe the process of information flow. Simulation results show that the rate of forgetting has a significantly negative effect on stakeholders' perceptions and that an incremental increase in the number of affected individuals has a positive effect on the maximum level of stakeholders' perceptions, but a negative effect on the duration of stakeholders' perceptions. Additionally, a delay effect, a stagnation effect, and a cumulative effect exist in the evolution of stakeholders' perceptions. There is a spike at the beginning of the profile of stakeholders' perceptions in the Damped Exponential Model. An empirical test supports the validity of this model of stakeholders' disaster perceptions. Although the navigability of digital interfaces has long been discussed as a key determinant of the media effects of web use, existing scholarship has not yielded a clear conceptual understanding of navigability, nor of how to measure perceived navigability as an outcome. The present paper attempts to redress both and proposes that navigability be conceptually examined along three dimensions, namely, logic of structure, clarity of structure, and clarity of target. A 2x2x2 factorial between-subjects experiment (N=128) was conducted to examine the distinct contributions of these dimensions to perceptions of a nonprofit website. The results showed significant effects of logic of structure and clarity on perceived navigability, while logic of structure and content domain involvement affected attitudes toward the website. 
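The core dynamic of the perception-diffusion study above, incoming disaster messages accumulating against a forgetting rate, can be caricatured in a few lines. The discrete-time update rule, the message stream, and the rates below are illustrative assumptions, not the paper's actual three-part model:

```python
# Toy discrete-time perception dynamics: each step, forgetting erodes the
# current perception level before new disaster messages are absorbed.
def simulate(messages, forget_rate):
    perception, trajectory = 0.0, []
    for m in messages:
        perception = perception * (1 - forget_rate) + m
        trajectory.append(perception)
    return trajectory

messages = [5, 5, 5, 0, 0, 0, 0, 0]  # a burst of reports, then silence

slow_forgetters = simulate(messages, forget_rate=0.1)
fast_forgetters = simulate(messages, forget_rate=0.6)

# A higher forgetting rate lowers the peak perception level and shortens
# the decay tail after messages stop, echoing the study's negative effect
# of the rate of forgetting on stakeholders' perceptions.
print(max(slow_forgetters), max(fast_forgetters))
```

Even this caricature reproduces the qualitative shape the study describes: a rapid initial rise while messages arrive, a peak that depends on the forgetting rate, and a gradual decay once reporting stops.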
Main path analysis is a powerful tool for extracting the backbones of a directed network and has been applied widely in bibliometric studies. In contrast to the no-decay assumption of the traditional approach, this study proposes a novel technique that assumes the strength of knowledge decays as the knowledge contained in one document is passed on to another document down the citation chain. We propose three decay models, arithmetic decay, geometric decay, and harmonic decay, along with their theoretical properties. In general, the results of the proposed decay models depend largely on the local structure of a citation network, as opposed to the global structure in the traditional approach. Thus, the significance of citation links and associated documents that are overemphasized by the global structure in the traditional no-decay approach is treated more properly. For example, the traditional approach commonly assigns high value to documents that heavily reference others, such as review articles. In the geometric and harmonic decay models specifically, only truly significant review articles will be included in the resulting main paths. We demonstrate this new approach and its properties through the DNA literature citation network. All text is ephemeral. Some texts are more ephemeral than others. The web has proved to be among the most ephemeral and changeable of information vehicles. This research note revisits Koehler's original data set roughly 20 years after it was first collected. By late 2013, the number of URLs responding to a query had fallen to 1.6% of the original sample. A query of the 6 remaining URLs in February 2015 showed only 2 still responding. The Academic Ranking of World Universities (ARWU) uses six university performance indicators, including Alumni and Awards, the number of alumni and staff winning Nobel Prizes and Fields Medals. 
These two indicators have raised doubts about the reliability of the ranking method because they are difficult to cope with. Recently, a newsletter was published featuring a reduced ARWU ranking list that leaves out the Nobel Prize and Fields Medal indicators: the Alternative Ranking (Excluding Award Factor). We used uncertainty and sensitivity analyses to examine and compare the stability and confidence of the official ARWU ranking and the Alternative Ranking. The results indicate that if the ARWU ranking is reduced to the 4-indicator Alternative Ranking, it shows greater certainty and stability in ranking universities. Crowdsourcing has seen a substantial increase in interest from researchers and practitioners in recent years. As a new form of work facilitated by information technology, crowdsourcing calls for the development of new theoretical insights. Our focus in this article is on extra-role behavior, that is, employees' voluntary activities that are not part of their prescribed duties. Specifically, we explored how user interface design can help increase extra-role behavior among crowdsourcing workers. In a randomized experiment, we examined the joint effects of the presentation of a performance display to crowdsourcing workers and the personal attributes of these workers on the workers' likelihood to engage in extra-role behavior. The experimental setting included an image analysis task performed on an environmental monitoring website. We compared workers' behavior across the different experimental conditions and found that the interaction between the presence of a performance display and the workers' personality trait of curiosity has a significant impact on the likelihood of engaging in extra-role behavior. In particular, the presence of a performance display was associated with an increased likelihood of extra-role behavior among low-curiosity workers, whereas no change in extra-role behavior was observed among high-curiosity workers. 
Implications for design are discussed. The purpose of this study is to explore and compare citations and discipline distribution in journal articles and books in the field of information society. Through citation analysis, co-citation analysis, and social network analysis, this study highlights the major disciplines in the information society field, identifies the highly cited works and the relationships among them, and analyzes the multidisciplinary nature of the field. A total of 84 selected documents related to the study of the information society were collected. The Web of Science, including the Science Citation Index Expanded, Social Sciences Citation Index, Arts and Humanities Citation Index, and Book Citation Index, was used to search for citation and co-citation data from 2005 to 2012. A co-citation matrix was built and subject clusters were determined. Moreover, co-citation data acquired from a social network analysis tool, UCINET, were put through centrality analysis to explore the influence of each document in the field of information society. Conclusions are drawn based on these results. Despite much in-depth investigation of factors influencing coauthorship evolution in various scientific fields, our knowledge about how efficiency or creativity is linked to the longevity of collaborative relationships remains very limited. We explore what Nobel laureates' coauthorship patterns reveal about the nature of scientific collaborations, looking at the intensity and success of scientific collaborations across fields and across laureates' collaborative lifecycles in physics, chemistry, and physiology/medicine. We find that more collaboration with the same researcher is actually no better for advancing creativity: publications produced early in a sequence of repeated collaborations with a given coauthor tend to be published in better venues and cited more than papers that come later in the collaboration with the same coauthor. 
Our results indicate that scientific collaboration involves conceptual complementarities that may erode over a sequence of repeated interactions. With the advent of co-citation techniques in the 1960s, studies on citation shifted from simple quantifications of the number of citations per document to more complex analyses focusing on the relationships between citations. The present study seeks to map the scientific and cognitive structure of Brazilian stem cell publications. Brazilian publications from 2001 to 2010 were retrieved from the Web of Science database, and data on journal co-citations were processed with the help of VOSViewer software. The results indicated that Brazilian stem cell publications are characterised by a strong emphasis on co-cited journals in the fields of biochemistry and molecular biology, cell biology, and haematology. Such characteristics suggest that Brazilian stem cell research is oriented both towards the treatment of a variety of human diseases and injuries caused by accidents (research with stem cells) and towards understanding stem cells' mechanisms of division, differentiation, and self-renewal (research on stem cells), the most promising facet of stem cells. The progress in Brazilian stem cell research likely results from a combination of a federal law established in the 2000s, which regulates stem cell use and research, and the growth of Brazilian science. This paper studies the evaluation of research units that publish their output in several scientific fields. A possible solution relies on the prior normalization of the raw citations received by publications in all fields. In a second step, a citation indicator is applied to the units' field-normalized citation distributions. In this paper, we also study an alternative solution that begins by applying a size- and scale-independent citation impact indicator to the units' raw citation distributions in all fields. 
In a second step, the citation impact of any research unit is calculated as the average (weighted by publication output) of the citation impact that the unit achieves in each field. The two alternatives are confronted using the 500 universities in the 2013 edition of the CWTS Leiden Ranking, whose research output is evaluated according to two citation impact indicators with very different properties. We use a large Web of Science dataset consisting of 3.6 million articles published in the 2005-2008 period and a classification system distinguishing between 5,119 clusters. The two main findings are as follows. First, differences in production and citation practices between the 3,332 clusters with more than 250 publications account for 22.5% of the overall citation inequality. After the standard field-normalization procedure, in which cluster mean citations are used as normalization factors, this quantity is reduced to 4.3%. Second, the differences between the university rankings according to the two solutions to the all-sciences aggregation problem are of a small order of magnitude for both citation impact indicators. Diversification and fragmentation of scientific exploration bring an increasing need for integration, for example through interdisciplinary research. The field of nanoscience and nanotechnology appears to exhibit strong interdisciplinary characteristics. Our objective was to explore the structure of the field and ascertain how different research areas within this field reflect interdisciplinarity through citation patterns. The complex relations between the citing and cited articles were examined through schematic visualization. Examination of the WOS categories assigned to journals shows the scatter of nano studies across a wide range of research topics. We identified four distinctive groups of categories, each showing some detectable shared characteristics. Three alternative measures of similarity were employed to delineate these groups. 
These distinct groups enabled us to assess interdisciplinarity within the groups and the relationships between the groups. Some measurable level of interdisciplinarity exists in all groups. However, in one of the groups, certain categories of both citing and cited articles aggregate mostly within the framework of physics, chemistry, and materials. This may suggest that the nanosciences show characteristics of a distinct discipline. The similarity in citing articles is most evident inside the respective groups, though some subgroups within larger groups are also related to each other through the similarity of cited articles. Interdisciplinarity is increasingly widespread, and many technological frontiers and hotspots are emerging in intersecting research areas. Existing measurement indexes of interdisciplinarity are mostly based on the co-occurrence of authors, institutions, or references, and most focus on the tendency toward interdisciplinarity. This paper introduces a new measurement index, termed topic terms interdisciplinarity (TI), for interdisciplinary topic mining. Taking Information Science & Library Science (LIS) as a case study, this paper identifies interdisciplinary topics by calculating TI values together with Bet values, term frequency values, and others, and analyzes the evolution of interdisciplinary sciences based on social network analysis and time series analysis. It was found that the intersections with external disciplines and the pivots of internal topics for LIS can be identified by using TI and Bet values together. The research has shown that the TI value can identify interdisciplinary topic terms well and can serve as an efficient indicator for interdisciplinary analysis, complementary to other methods. Scientific research activity produces a "Matthew Effect" in resource allocation. 
Based on a data set in the life sciences field from the National Natural Science Foundation of China (NSFC) during the 11th Five-Year Plan (2006-2010), this paper makes an empirical study of how the Matthew Effect in funding allocation at the institutional and city levels impacts scientific research output. With Gini coefficient evaluation, descriptive statistical analysis, and a Poisson regression model, we found that there has been a rapid increase in the concentration of funding allocation among institutions and cities. Within a period of 5 years, the Gini coefficients of total funding, with institutions and cities as the units of measurement, increased from 0.61 and 0.74 to 0.67 and 0.79, respectively. However, this concentration in funding allocation did not result in significant additional benefits. Institutions awarded more funding did not produce the expected positive spillover effect on their scientific research output. Instead, an "inverted U-shape" pattern of decreasing returns to scale was discovered, under which there was a negative effect on internal scientific research activity in the majority of institutions with concentrated funding allocation. Meanwhile, the results show that Young Scholars projects under the NSFC produced high-level output. We conclude the study by discussing the possible reasons for the inverted U-shape pattern and its policy implications. We analyze the distributions of paper-paper and paper-patent citations and estimate the relationship between them, based on a 4763-paper sample among the top 100 researchers in the life and medical sciences fields in Japan. We find that paper-paper citations peak at a 4-year average, while the corresponding lag for paper-patent citations is 6 years. Moreover, we show that paper quality is important for being cited by a patent. 
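The Gini coefficients used above to quantify funding concentration follow the standard rank-based formulation (0 for a perfectly equal distribution, approaching 1 as funding concentrates in a few recipients). A minimal sketch, with illustrative names:

```python
def gini(values):
    """Gini coefficient of a non-negative distribution: 0 for perfect
    equality, approaching (n - 1)/n as all mass concentrates in one
    recipient. Uses the standard formula based on ranks in the sorted
    list: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * cum / (n * total) - (n + 1.0) / n
```

Applied to per-institution funding totals for two years, the change in the returned value tracks the increase in concentration reported above.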
An inverse U-shaped relationship exists between the research grant and research quality, whereas a U-shaped relationship exists between research quality and the total number of papers. Recently, a normalized version of the g-index has been proposed as an impact and concentration measure: the so-called s measure. The main problem with this measure is that, somewhat paradoxically, it may be that s = 1 even in the case in which citations are perfectly evenly spread across articles. We prove that the measure s can and should be improved by a different choice of normalizing term. This is done by defining the latter as a function of the citation count of the single most cited paper. The new index presented here does not suffer from insensitivity to citation transfers within the g-core. A new indirect indicator is introduced for the assessment of scientific publications. The proposed indicator takes into account both the direct and indirect impact of scientific publications and their age. The indicator builds on the concept of generations of citations and acts as a measure of the accumulated impact of each scientific publication. A number of cases are examined that demonstrate the way the indicator behaves under well-defined conditions in a paper-citation graph, such as when a paper is cited by a highly cited paper, when cycles exist, and when self-citations and chords are examined. Two new indicators for the assessment of authors are also proposed (the fa-index and fas-index) that utilize the values of the proposed indicator for the scientific publications included in the publication record of an author. Finally, a comparative study is presented of the proposed indices against a list of well-known direct (number of citations, mean number of citations, contemporary h-index) and indirect (PageRank, SCEAS) indicators. Scientific peer-review and publication systems incur a huge burden in terms of costs and time. 
Innovative alternatives have been proposed to improve the systems, but assessing their impact in experimental studies is not feasible at a systemic level. We developed an agent-based model by adopting a unified view of peer review and publication systems and calibrating it with empirical journal data in the biomedical and life sciences. We modeled researchers, research manuscripts, and scientific journals as agents. Researchers were characterized by their scientific level and resources, manuscripts by their scientific value, and journals by their reputation and acceptance or rejection thresholds. These state variables were used in submodels for various processes such as production of articles, submissions to target journals, in-house and external peer review, and resubmissions. We collected data for a sample of biomedical and life sciences journals regarding acceptance rates, resubmission patterns, and total number of published articles. We adjusted submodel parameters so that the agent-based model outputs fit these empirical data. We simulated 105 journals, 25,000 researchers and 410,000 manuscripts over 10 years. On average, 33,600 articles were published per year; 19 % of submitted manuscripts remained unpublished. The mean acceptance rate was 21 % after external peer review, and the rejection rate was 32 % after in-house review; 15 % of publications resulted from the first submission, 47 % from the second, and 20 % from the third. All decisions in the model were mainly driven by scientific value, whereas journal targeting and persistence in resubmission determined whether a manuscript would be published or abandoned after one or many rejections. This agent-based model may help in better understanding the determinants of the scientific publication and peer-review systems. It may also help in assessing and identifying the most promising alternative systems of peer review. 
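The core submission/resubmission dynamic of such a model can be illustrated with a deliberately tiny toy: manuscripts carry a scientific value, journals an acceptance threshold, and a rejected manuscript moves down the journal ladder. This is a sketch under simplifying assumptions, not the calibrated model from the study; all parameter names are illustrative.

```python
import random

def simulate_submissions(values, thresholds, noise=0.1, seed=0):
    """Toy sketch of the submission process: each manuscript has a
    scientific value in [0, 1]; journals are tried in decreasing
    order of acceptance threshold, and a noisy review score at or
    above the threshold means acceptance. Returns how many
    manuscripts end up published; the rest are abandoned."""
    rng = random.Random(seed)
    ladder = sorted(thresholds, reverse=True)  # most selective first
    published = 0
    for value in values:
        for threshold in ladder:  # resubmission after each rejection
            review = value + rng.gauss(0, noise)
            if review >= threshold:
                published += 1
                break
    return published
```

Even this caricature reproduces the qualitative finding above: scientific value mainly drives outcomes, while the length of the ladder (persistence in resubmission) decides whether weaker manuscripts are eventually published or abandoned.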
Science Parks are complex institutions that aim at promoting innovation and entrepreneurship at the local level. Their activities involve a large set of stakeholders, ranging from internal and external researchers to entrepreneurs, local public administration, and universities. As a consequence, their performance extends over a large set of dimensions that affect each other. This feature makes Science Parks particularly difficult to compare properly. However, evaluating their performance in a comparable way may be important for at least three reasons: (1) to identify best practices in each activity and allow a faster diffusion of these practices, (2) to inform potential entrepreneurs about which institutions better support the birth and first stages of start-ups, and (3) to guide public policies in the distribution of funds and incentives. The multidimensional nature of Science Parks raises the problem of aggregating performance into simple indexes that can be accessed by stakeholders willing to compare different structures on the basis of their own preferences. This paper exploits a new dataset on Italian Science Parks to provide a pilot study in this direction. In particular, we apply Choquet-integral-based Multi-Attribute Value Theory to elicit stakeholders' preferences over different dimensions of Science Park performance and construct a robust index that allows ranking them. This tool can be used to support the decision making of multiple stakeholders looking for the best (or worst) performers, and it accounts both for the subjective nature of the evaluation process and for the interactions among decision attributes. Although the present study employs only a limited number of respondents and performance measures, the procedure we present can be straightforwardly adapted to much richer environments. 
A bibliometric analysis of published geographical information system (GIS) research was performed to evaluate current research trends, quantitatively and qualitatively, over the period 1961-2010, based on the SCIE & SSCI databases. For articles referring to GIS, the analysis concentrated on scientific outputs, distribution of subject categories, source journals, international collaboration, geographic distribution of authors, temporal trends in keyword usage, and the relationship between GIS articles and computer numbers. The results showed that the growth of scientific outputs has exploded since 1991, with an increasing collaboration index, references, and citations. Environmental sciences, multidisciplinary geosciences, ecology, physical geography, water resources, geography, and remote sensing were the most frequently used subject categories, and IJGIS was the most productive journal in this field. The United States produced the most independent and collaborative articles, took a central position in the collaboration network, and had the greatest number of most prolific institutions. North America, Western Europe, and East Asia had major clusters of authors. A keyword analysis demonstrated that the integration of GIS and RS was a key development trend. Spatial analysis, model, land use, map, and landscape were research hotspots. Generally, GIS research was significantly correlated with the development of personal computers, and there was a statistically significant quadratic polynomial growth in GIS-related articles. This study will help readers to understand global trends in GIS research during the past 50 years. The purpose of this study was to determine whether authorship order, as measured by first author publications, citations to first author publications, and the first author h-index, plays a significant role in scholarly productivity. 
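The first author h-index mentioned above is the standard h-index computed over first-authored papers only. For reference, the underlying computation is simple; a minimal sketch:

```python
def h_index(citations):
    """h-index: the largest h such that at least h publications have
    at least h citations each. Restricting the input to first-authored
    papers gives a first author h-index of the kind discussed above."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h
```

For example, citation counts [10, 8, 5, 4, 3] yield an h-index of 4: four papers have at least 4 citations, but not five papers with at least 5.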
Scholarly productivity was assessed in this study with publications from 2011 to 2014 and citations to these publications as indexed by the Thomson Web of Science. Using a correlational design, a group-level analysis of 36 Ph.D.-granting departments of criminology and criminal justice revealed that ratings from a U.S. News & World Report (USN&WR) survey correlated significantly better with aggregate program first author publications than with aggregate program total publications, although citations to first author publications and the first author h-index failed to correlate significantly better with the USN&WR ratings than did citing articles to total publications and the total publication h-index, respectively. An individual-level correlational analysis of all 228 full professors from 44 programs offering a Ph.D. in criminology/criminal justice showed that time until promotion to full professor displayed a significantly stronger inverse correlation with the number of first author publications, the number of citations to first author publications, and the first author h-index than with the total number of publications, the number of citing articles to total publications, and the total publication h-index, respectively. Hence, at both the group and individual levels first author publications, and at the individual level citations to first author publications and the first author h-index, provided a better estimate of scholarly productivity than their respective total publication counterparts. This article aims to provide a systematic and comprehensive comparison of the coverage of the three major bibliometric databases: Google Scholar, Scopus and the Web of Science. Based on a sample of 146 senior academics in five broad disciplinary areas, we therefore provide both a longitudinal and a cross-disciplinary comparison of the three databases. 
Our longitudinal comparison of eight data points between 2013 and 2015 shows a consistent and reasonably stable quarterly growth for both publications and citations across the three databases. This suggests that all three databases provide sufficient stability of coverage to be used for more detailed cross-disciplinary comparisons. Our cross-disciplinary comparison of the three databases includes four key research metrics (publications, citations, the h-index, and hI,annual, an annualised individual h-index) and five major disciplines (Humanities, Social Sciences, Engineering, Sciences and Life Sciences). We show that both the data source and the specific metrics used change the conclusions that can be drawn from cross-disciplinary comparisons. With the advance of the internet and the fast updating of information, it is nowadays much easier to search for and acquire scientific publications. To identify high-quality articles in this ocean of papers, many ranking algorithms have been proposed. One of these methods is the famous PageRank algorithm, which was originally designed to rank web pages in online systems. In this paper, we introduce a preferential mechanism to the PageRank algorithm when aggregating resources from different nodes, to enhance the effect of similar nodes. The validation of the new method is performed on data from the American Physical Society journals. The results indicate that the similarity-preferential mechanism improves the performance of the PageRank algorithm in terms of ranking effectiveness, as well as robustness against malicious manipulations. Though our method is only applied to citation networks in this paper, it can naturally be used in many other real systems, such as designing search engines for the World Wide Web and revealing leadership in social networks. 
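For readers unfamiliar with the baseline being modified above, plain PageRank by power iteration can be sketched as follows. The similarity-preferential variant would reweight each outgoing edge by a node-similarity score; this sketch keeps the standard equal split per out-edge and is illustrative only.

```python
def pagerank(links, damping=0.85, iters=100):
    """Plain PageRank by power iteration on a directed graph given as
    {node: [nodes it cites]}. Each node's rank is split equally among
    the nodes it links to; the preferential mechanism in the paper
    would instead weight this split by node similarity."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:  # dangling node: spread its mass evenly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank
```

On a citation network, `links[paper]` is the list of papers it cites; highly and repeatedly cited papers accumulate rank.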
A work strand planned by the Science Europe Working Group on Research Policy and Programme Evaluation aims at mapping the state of affairs in data collection and data use at European funding and research organisations. In particular, the project identifies and proposes solutions for issues experienced by the Member Organisations (MOs) regarding the collection, standardisation and treatment of data related to the analysis and ex-post evaluation of activities funded or performed by MOs. This is implemented through a survey sent to the MOs. The survey was analysed with special attention to the particular needs of funding and performing organisations. On the basis of the results and the discussion among the work strand members and within the WG, we draw a preliminary set of conclusions to produce guidance on relevant topics, including: researcher and funding identification; the potential, properties and limitations of data and indicators used in the measurement and assessment of research output; classification systems used in science, including their various types; and issues of availability, confidentiality and harmonisation of data and indicators. Feedback from such discussions will be used to identify areas for further action by Science Europe. The paper provides an introduction to the recently completed project to derive a Research Core Dataset (RCD) for the German science system. In addition to explaining the rationale and background of the standardization project, it describes the workflow of the RCD project by focusing on the challenges, approaches and processes as well as the guiding principles. In this context, the paper also explains the peculiarities of the German science system and compares the project to other international standardization endeavours. The paper concludes with a short outlook on the potential chances and risks of the project. 
Recent developments in Scandinavia may be of interest in relation to developing an integrated European research information structure that could provide the basis for an improved knowledge base for research policy. This article describes how reliable bibliographic data in institutional or national research information systems have been developed in Denmark, Finland, Norway and Sweden, with performance-based institutional funding models as a driver. It also discusses in more general terms the limitations and potentials of using data from research information systems in bibliometric analysis and in social studies of science. This paper proposes an Ontology-Based Data Management (OBDM) approach to coordinate, integrate and maintain the data needed for Science, Technology and Innovation (STI) policy development. The OBDM approach is a form of information integration in which the global schema of the data is replaced by a conceptual model of the domain, formally specified through an ontology. Implemented in Sapientia, the ontology of multi-dimensional research assessment, it offers a transparent platform as the base for the assessment process. It enables one to define and specify in an unambiguous way the indicators on which the evaluation is based, and to track their evolution over time; it also allows analysis of the effects of the actual use of the indicators on the behavior of scholars, and the spotting of opportunistic behaviors; and it provides a monitoring system to track over time the changes in the established evaluation criteria and their consequences for the research system. It is argued that easier access to and a more transparent view of scientific-scholarly outcomes help to improve the understanding of basic science and the communication of research outcomes to the wider public. An OBDM approach could successfully contribute to solving some of the key issues in the integration of heterogeneous data for STI policies. 
Disclosure of personal information on social networks has been extensively researched in recent years from different perspectives, including the influence of demographic, personality, and social parameters on the extent and type of disclosure. However, although some of the most widespread uses of these networks nowadays are for professional, academic, and business purposes, a thorough investigation of professional information disclosure is still needed. This study's primary aim, therefore, is to conduct a systematic and comprehensive investigation into patterns of professional information disclosure and various factors involved on different types of social networks. To this end, a user survey was conducted. We focused specifically on Facebook and LinkedIn, the 2 diverse networks most widely used in Israel. Significant differences were found between these networks. For example, we found that on Facebook professional pride is a factor in professional information disclosure, whereas on LinkedIn, work seniority and income have a significant effect. Thus, our findings shed light on the attitudes and professional behavior of network members, leading to recommendations regarding advertising strategies and network-appropriate self-presentation, as well as approaches that companies might adopt according to the type of manpower required. In this article, we describe a conceptual model for video games and interactive media. Existing conceptual models such as the Functional Requirements for Bibliographic Records (FRBR) are not adequate to represent the unique descriptive attributes, levels of variance, and relationships among video games. Previous video game-specific models tend to focus on the development of video games and their technical aspects. Our model instead attempts to reflect how users such as game players, collectors, and scholars understand video games and the relationships among them. 
We specifically consider use cases of gamers, with the future intention of using this conceptual model as a foundation for developing a union catalog for various libraries and museums. In the process of developing the model, we encountered many challenges, including conceptual overlap with and divergence from FRBR, entity scoping, complex relationships among entities, and the question of how to model additional content for game expansions. Future work will focus on making this model interoperable with existing ontologies as well as on further understanding and description of content and relationships. University libraries provide access to thousands of journals and spend millions of dollars annually on electronic resources. With several commercial entities providing these electronic resources, the result can be siloed systems and processes for evaluating the cost and usage of these resources, making it difficult to provide meaningful analytics. In this research, we examine a subset of journals from a large research library using a web analytics approach, with the goal of developing a framework for the analysis of library subscriptions. This foundational approach is implemented by comparing the impact to the cost, titles, and usage for the subset of journals and by assessing the funding area. Overall, the results highlight the benefit of a web analytics evaluation framework for university libraries and the impact of classifying titles based on the funding area. Furthermore, they show the statistical difference in both use and cost among the various funding areas when ranked by cost, eliminating the outliers of heavily used and highly expensive journals. Future work includes refining this model for larger-scale analysis tying metrics to library organizational objectives and creating an online application to automate this analysis. Over the past few years, several major scientific fraud cases have shocked the scientific community. 
The number of retractions each year has also increased tremendously, especially in the biomedical field, and scientific misconduct accounts for more than half of those retractions. It is assumed that co-authors of retracted papers are affected by their colleagues' misconduct, and the aim of this study is to provide empirical evidence of the effect of retractions in biomedical research on co-authors' research careers. Using data from the Web of Science, we measured the productivity, impact, and collaboration of 1,123 co-authors of 293 retracted articles for a period of 5 years before and after the retraction. We found clear evidence that collaborators do suffer consequences of their colleagues' misconduct and that a retraction for fraud has greater consequences than a retraction for error. Our results also suggest that the extent of these consequences is closely linked to the ranking of co-authors on the retracted paper, being felt most strongly by first authors, followed by last authors, with a smaller impact on middle authors. Beginning with a short review of Public Library of Science (PLOS) journals, we focus on PLOS ONE and more specifically the contributions of Chinese authors to this journal. It is shown that their contribution is growing exponentially. In 2013 almost one fifth of all publications in this journal had at least one Chinese author. The average number of citations per publication is approximately the same for articles with a Chinese author and for articles without any Chinese coauthor. Using the odds ratio, we could not find arguments that Chinese authors in PLOS ONE excessively cite other Chinese contributions. The readability levels of books help identify suitable reading materials. Unfortunately, the majority of published books are assigned a readability level range, which is not useful to readers who look for books at a particular grade level. 
Existing readability formulas/analysis tools require at least an excerpt of a book to estimate its readability level, which is a severe constraint, since copyright laws prohibit book contents from being made publicly accessible. To alleviate this constraint, we have developed TRoLL, which relies on publicly accessible online book metadata, in addition to using a book's snippet, if one is available, to predict its readability level. Based on a multi-dimensional regression analysis, TRoLL determines the grade level of any book instantly, even without a sample of its text, and considers its topical suitability, which is unique. Furthermore, TRoLL is a significant contribution to the educational community, since its computed book readability levels can enrich K-12 readers' book selections and aid parents, teachers, and librarians in locating reading materials suitable for their K-12 readers, a task that can be time-consuming and frustrating and does not always yield a quality outcome. Empirical studies have verified the prediction accuracy of TRoLL and demonstrated its superiority over well-known readability formulas/analysis tools. Although citation counts are often used to evaluate the research impact of academic publications, they are problematic for books that aim for educational or cultural impact. To fill this gap, this article assesses whether a number of simple metrics derived from Amazon.com reviews of academic books could provide evidence of their impact. Based on a set of 2,739 academic monographs from 2008 and a set of 1,305 best-selling books in 15 Amazon.com academic subject categories, the existence of significant but low or moderate correlations between citations and numbers of reviews, combined with other evidence, suggests that online book reviews tend to reflect the wider popularity of a book rather than its academic impact, although there are substantial disciplinary differences. 
Metrics based on online reviews are therefore recommended for the evaluation of books that aim at a wide audience inside or outside academia when it is important to capture the broader impacts of educational or cultural activities and when they cannot be manipulated in advance of the evaluation. Numerous past studies have demonstrated the effectiveness of the relevance model (RM) for information retrieval (IR). This approach enables relevance or pseudo-relevance feedback to be incorporated within the language modeling framework of IR. In the traditional RM, the feedback information is used to improve the estimate of the query language model. In this article, we introduce an extension of RM in the setting of relevance feedback. Our method provides an additional way to incorporate feedback via the improvement of the document language models. Specifically, we make use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models. The context information is based on the words (unigrams or bigrams) appearing within a text window centered on query terms. Experiments on several Text REtrieval Conference (TREC) collections show that our context-dependent relevance model can improve retrieval performance over the baseline RM. Together with previous studies within the BM25 framework, our current study demonstrates that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model. The presented ontology-based model for indexing and retrieval combines the methods and experiences of traditional indexing languages with their cognitively interpreted entities and relationships with the strengths and possibilities of formal knowledge representation. 
The core component of the model uses inferences along the paths of typed relations between the entities of a knowledge representation to enable the determination of result sets in the context of retrieval processes. A proposal for a general, but condensed, inventory of typed relations is given. The entities are arranged in aspect-oriented facets to ensure a consistent hierarchical structure. The possible consequences for indexing and retrieval are discussed. Intellectual Structure (IS) is a bibliometric method that is widely applied in knowledge domain analysis and in science mapping. An intellectual structure consists of clusters of related documents ascribed to individual factors. Documents ascribed to a factor are generally associated with a common research theme. As such, the contents of documents ascribed to a factor are theorized to be similar to each other. This study shows that link-based relatedness implies content-based similarity. The intellectual structures of two research domains were derived from data sets retrieved from the Microsoft Academic Search database. The collection of documents ascribed to a factor is referred to as a factor-based document cluster, which is compared with content-based document clusters. All documents in an intellectual structure are re-clustered based on their content similarity, derived from the cosine of their vector forms encoded as term frequency-inverse document frequency (TF-IDF) weighted terms. The factor-based document clusters are then compared with the content-based clusters for congruity. We used the Rand index and the kappa coefficient to check the congruity between the factor-based and content-based document clusters. The kappa coefficient indicates that there is fair to moderate agreement between the clusters derived from these two different bases. We present a novel measure for ranking evaluation, called Twist. 
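The content-similarity step described above, comparing documents by the cosine of their TF-IDF vectors, can be sketched as follows. This is an illustrative stdlib-only implementation with a common idf variant (log of N over document frequency); the study's exact weighting may differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse TF-IDF vector
    (a dict term -> weight) per document, using raw term frequency
    and idf = log(N / document_frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Re-clustering on these pairwise similarities, and then scoring the agreement of the resulting partition with the factor-based one (e.g., via the Rand index or kappa), reproduces the comparison pipeline described above.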
It is a measure for informational intents, which handles both binary and graded relevance. Twist stems from the observation that searching is nowadays taken for granted: it is natural for users to assume that search engines are available and work well. As a consequence, users may take for granted the utility they gain from finding relevant documents, which is the focus of traditional measures. On the contrary, they may feel uneasy when the system returns nonrelevant documents, because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of Twist, which evaluates the effectiveness of a system from the point of view of the effort required of users to retrieve the desired information. We provide a formal definition of Twist, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, Twist is shown to grasp different aspects of system performance, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems. Several prominent scholars suggest that investigations of human information behavior or information needs, seeking, and uses rarely measure how received information is applied or its effects on the recipient, that is, its outcomes. This article explores this assertion via systematic analysis of studies published in journals between 1950 and 2012. Five time periods and four journals were sampled, including 1,391 journal articles, 915 of which were empirical studies. Based on these samples, the percentage of studies of information outcomes climbed from zero in the 1950s and 1960s to 8% in recent research reports. The barriers to studying information outcomes and possible future research on this topic are explored. 
In this article I investigate the shortcomings of exact string match-based author self-citation detection methods. The contributions of this study are twofold. First, I apply a fuzzy string matching algorithm for self-citation detection and benchmark this approach and other common methods of exclusively author name-based self-citation detection against a manually curated ground truth sample. Near full recall can be achieved with the proposed method while incurring only negligible precision loss. Second, I report some important observations from the results about the extent of latent self-citations and their characteristics, and give an example of the effect of improved self-citation detection on the document-level self-citation rate of real data. In-text frequency-weighted citation counting has been seen as a particularly promising solution to the well-known problem in citation analysis that all citations are treated equally, be they crucial to the citing paper or perfunctory. But what is a good weighting scheme? We compare 12 different in-text citation frequency-weighting schemes in the field of library and information science (LIS) and explore author citation impact patterns based on their performance in these schemes. Our results show that the ranks of authors vary widely with different weighting schemes that favor or are biased against common citation impact patterns: substantiated, applied, or noted. These variations separate LIS authors quite clearly into groups with these impact patterns. The hard upper and lower bounds for reasonable author ranks provided by consensus rank limits suggest that author citation ranks may be subject to something like an uncertainty principle. The ACL Anthology is a large collection of research papers in computational linguistics. Citation data were obtained using text extraction from a collection of PDF files, with significant manual postprocessing performed to clean up the results. 
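The general shape of in-text frequency-weighting can be illustrated with a few simple schemes. These are hypothetical examples of the kind of scheme involved, not the 12 schemes actually compared in the study, which the abstract does not specify.

```python
import math

def weighted_citation_count(mention_counts, scheme="linear"):
    """mention_counts[i] = number of times the cited work is mentioned
    in the text of citing paper i (each count >= 1). Illustrative
    schemes only:
      'unit'   -> classic counting: one credit per citing paper;
      'linear' -> credit equals the number of in-text mentions;
      'log'    -> damped credit of 1 + log2(mentions)."""
    if scheme == "unit":
        return float(len(mention_counts))
    if scheme == "linear":
        return float(sum(mention_counts))
    if scheme == "log":
        return sum(1 + math.log2(m) for m in mention_counts)
    raise ValueError("unknown scheme: %s" % scheme)
```

A work cited once in passing by three papers and a work discussed at length by one paper can swap ranks between the 'unit' and 'linear' schemes, which is exactly the kind of rank instability across schemes reported above.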
Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. The analysis includes general network statistics, PageRank, metrics across publication years and venues, the impact factor and h-index, as well as other measures. Normalization of citation scores using reference sets based on Web of Science subject categories (WCs) has become an established (best) practice in evaluative bibliometrics. For example, the Times Higher Education World University Rankings are, among other things, based on this operationalization. However, WCs were developed decades ago for the purpose of information retrieval and evolved incrementally with the database; the classification is machine-based and partially manually corrected. Using the WC information science & library science and the WCs attributed to journals in the field of science and technology studies, we show that WCs do not provide sufficient analytical clarity to carry out bibliometric normalization in evaluation practices because of indexer effects. Can the compliance with best practices be replaced with an ambition to develop best possible practices? New research questions can then be envisaged. We investigate the relationships between the citation impacts of scientific papers and the sources of funding that are acknowledged as having supported those publications. We examine several relationships potentially associated with funding, including first citation, total citations, and the chances of becoming highly cited. Furthermore, we explore the links between citations and types of funding by organization and also with combined measures of funding. In particular, we examine the relationship between funding intensity and funding variety and citation. 
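The PageRank measure used in the ACL Anthology network analysis above can be sketched with a plain power iteration. This is the generic textbook formulation, not the authors' code; the damping factor of 0.85 and the iteration count are conventional defaults:

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. links maps each paper to the papers it
    cites; papers that cite nothing (dangling nodes) redistribute their
    rank uniformly so that total rank mass is conserved."""
    nodes = set(links)
    for cited in links.values():
        nodes.update(cited)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in nodes}
        for p in nodes:
            out = links.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:  # dangling node: spread its rank over all nodes
                for q in nodes:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Two papers citing a third: the cited paper accumulates the most rank.
scores = pagerank({"A": ["C"], "B": ["C"]})
```

On a real citation network the node set would be the extracted papers and `links` their cleaned reference lists; the highest-ranked nodes are the "most central papers" the abstract refers to.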
Our empirical work focuses on six small advanced European economies, applying a zero-inflated negative binomial model to a set of more than 240,000 papers authored by researchers from these countries. We find that funding is not related to the first citation but is significantly related to the number of citations and top percentile citation impact. Additionally, we find that citation impact is positively related to funding variety and negatively related to funding intensity. Finally, there is an inverse relationship between the relative frequency of funding and citation impact. The results presented in the paper provide insights for the design of research programs and the structure of research funding and for the behavior and strategies of research and sponsoring organizations. Scientific progress is driven by important, infrequent discoveries that cannot be readily identified and quantified, which makes research assessment very difficult. Bibliometric indicators of important discoveries have been formulated using an empirical approach, based on the mathematical properties of the high-citation tail of the citation distribution. To investigate the theoretical basis of such formulations this study compares the US/European research performance ratios expressed in terms of highly cited papers and Nobel Prize-winning discoveries. The research performance ratio in terms of papers was studied from the citation distributions in the fields of chemistry, physics, and biochemistry and molecular biology. It varied as a function of the citation level. Selecting an appropriate high citation level, the ratios in terms of highly cited papers were compared with the corresponding ratios for Nobel Prize-winning discoveries in Chemistry, Physics, and Physiology or Medicine. 
Research performance ratios expressed in terms of highly cited papers and Nobel Prize-winning discoveries are reasonably similar, and suggest that the research success of the United States is almost 3 times that of Europe. A similar conclusion was obtained using articles published in Nature and Science. Using Web of Science data, portfolio analysis in terms of journal coverage can be projected onto a base map for units of analysis such as countries, cities, universities, and firms. The units of analysis under study can be compared statistically across the 10,000+ journals. The interdisciplinarity of the portfolios is measured using Rao-Stirling diversity or Zhang et al.'s improved measure D-2(3). At the country level we find regional differentiation (e.g., Latin American or Asian countries), but also a major divide between advanced and less-developed countries. Israel and Israeli cities outperform other nations and cities in terms of diversity. Universities appear to be specifically related to firms when a number of these units are exploratively compared. The instrument is relatively simple and straightforward, and one can generalize the application to any document set retrieved from the Web of Science (WoS). Further instruction is provided online at . Each year, researchers publish an immense number of scientific papers. While some receive many citations, others receive none. Here we investigate whether any of this variance can be explained by the choice of words in a paper's abstract. We find that doubling the word frequency of an average abstract increases citations by 0.70%. We also find that journals which publish papers whose abstracts are shorter and contain more frequently used words receive slightly more citations per paper. Specifically, adding a 5-letter word to an abstract decreases the number of citations by 0.02%. 
These results are consistent with the hypothesis that the style in which a paper's abstract is written bears some relation to its scientific impact. (C) 2015 The Authors. Published by Elsevier Ltd. We propose a new index to quantify SSRN downloads. Unlike the SSRN downloads rank, which is based on the total number of an author's SSRN downloads, our index also reflects the author's productivity by taking into account the download numbers for the papers. Our index is inspired by, but is not the same as, Hirsch's h-index for citations, which cannot be directly applied to SSRN downloads. We analyze data for about 30,000 authors and 367,000 papers. We find a simple empirical formula for the SSRN author rank via a Gaussian function of the log of the number of downloads. (C) 2015 Elsevier Ltd. All rights reserved. We propose a method to measure the effectiveness of the recruitment and turnover of professors, in terms of their research performance. The method presented is applied to the case of Italian universities over the period 2008-2012. The work then analyses the correlation between the indicators of effectiveness used, and between the indicators and the universities' overall research performance. In countries that conduct regular national assessment exercises, the evaluation of effectiveness in recruitment and turnover could complement the overall research assessments. In particular, monitoring such parameters could assist in deterring favoritism in countries exposed to such practices. (C) 2015 Elsevier Ltd. All rights reserved. The mean normalized citation score or crown indicator is a much studied bibliometric indicator that normalizes citation counts across fields. We examine the theoretical basis of the normalization method and, in particular, the determination of the expected number of citations. 
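For reference, Hirsch's h-index, which the SSRN download index above is inspired by (but, as the authors stress, is not the same as), can be computed as follows; this is an illustrative sketch of the standard definition:

```python
def h_index(counts):
    """Hirsch's h-index: the largest h such that at least h items each
    have at least h citations (or, in the SSRN setting, downloads)."""
    h = 0
    for i, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four items with >= 4 citations each
```

Applied directly to download counts this would over-reward raw volume, which is presumably why the authors adapt rather than adopt it.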
We observe a theoretical bias that raises the expected number of citations for low citation fields and lowers the expected number of citations for high citation fields when interdisciplinary publications are included. (C) 2015 Elsevier Ltd. All rights reserved. The importance of collaboration in research is widely accepted, as is the fact that articles with more authors tend to be more cited. Nevertheless, although previous studies have investigated whether the apparent advantage of collaboration varies by country, discipline, and number of co-authors, this study introduces a more fine-grained method to identify differences: the geometric Mean Normalized Citation Score (gMNCS). Based on comparisons between disciplines, years and countries for two million journal articles, the average citation impact of articles increases with the number of authors, even when international collaboration is excluded. This apparent advantage of collaboration varies substantially by discipline and country and changes a little over time. Against the trend, however, in Russia solo articles have more impact. Across the four broad disciplines examined, collaboration had by far the strongest association with impact in the arts and humanities. Although international comparisons are limited by the availability of systematic data for author country affiliations, the new indicator is the most precise yet and can give statistical evidence rather than estimates. (C) 2015 Elsevier Ltd. All rights reserved. A different number of citations can be expected for publications appearing in different subject categories and publication years. For this reason, the citation-based normalized indicator Mean Normalized Citation Score (MNCS) is used in bibliometrics. Mendeley is one of the most important sources of altmetrics data. Mendeley reader counts reflect the impact of publications in terms of readership. 
Since a significant influence of publication year and discipline has also been observed in the case of Mendeley reader counts, reader impact should not be estimated without normalization. In this study, all articles and reviews of the Web of Science core collection with a publication year of 2012 (and a DOI) are used to normalize their Mendeley reader counts. A new indicator that determines the normalized reader impact is obtained, the Mean Normalized Reader Score (MNRS), and compared with the MNCS. The MNRS enables us to compare the impact a paper has had on Mendeley across subject categories and publication years. Comparisons on the journal and university level show that the MNRS and MNCS correlate more strongly for 9601 journals than for 76 German universities. (C) 2015 Elsevier Ltd. All rights reserved. Several citation-based indicators, including patent h-index, have been introduced to evaluate the patenting activities of research organizations. However, variants developed to complement h-index have not been utilized yet in the domain of intellectual property management. The main purpose of this study is to propose new indices that can be used to evaluate the patenting activities of research and development (R&D) organizations, based on h-type complementary variants along with traditional indicators. Exploratory factor analysis (EFA) is used to identify those indices. By applying the proposed framework to pharmaceutical R&D organizations, which have their patents registered in the United States Patent and Trademark Office (USPTO), the following three indices are obtained: the forward citation, impact per unit time, and patent family factors. The ranking obtained from the new indices can represent the productive capacity of the qualified patent, patent commercialization speed, and patent commercialization effort of research organizations. 
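The normalization shared by the MNCS and MNRS indicators discussed above can be sketched as follows: each paper's count (citations for MNCS, Mendeley readers for MNRS) is divided by the mean count of its subject-category/publication-year reference set, and the normalized scores are averaged. This is a simplified illustration, not the authors' implementation; in practice the reference sets come from the whole database, whereas here they are built from the input papers themselves for brevity:

```python
from collections import defaultdict

def mean_normalized_score(papers):
    """papers: list of (subject_category, year, count) tuples.
    Each count is divided by the mean count of its (category, year)
    reference set; the normalized scores are then averaged."""
    by_set = defaultdict(list)
    for category, year, count in papers:
        by_set[(category, year)].append(count)
    expected = {k: sum(v) / len(v) for k, v in by_set.items()}
    scores = [
        count / expected[(category, year)]
        for category, year, count in papers
        if expected[(category, year)] > 0  # skip entirely uncited/unread sets
    ]
    return sum(scores) / len(scores)
```

A score of 1.0 means average impact for the field and year; the same formula yields MNCS or MNRS depending only on which count is fed in, which is what makes the two indicators directly comparable.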
The new proposed indices in this study are expected to contribute to the evaluation of the patenting activities of R&D organizations from various perspectives. (C) 2015 Elsevier Ltd. All rights reserved. In this paper, we try to answer two questions about any given scientific discipline: first, how important is each subfield and second, how does a specific subfield influence other subfields? We modify the well-known open-system Leontief Input-Output Analysis in economics into a closed-system analysis focusing on eigenvalues and eigenvectors and the effects of removing one subfield. We apply this method to the subfields of physics. This analysis has yielded some promising results for identifying important subfields (for example the field of statistical physics has large influence while it is not among the largest subfields) and describing their influences on each other (for example the subfield of mechanical control of atoms is not among the largest subfields cited by quantum mechanics, but our analysis suggests that these fields are strongly connected). This method is potentially applicable to more general systems that have input-output relations among their elements. (C) 2015 The Authors. Published by Elsevier Ltd. This study aims to shed light on the implementation of the digital object identifier (DOI) in the two most important multidisciplinary databases, namely Web of Science Core Collection and Scopus, within the last decade (2005-2014). The results show a generally increased percentage of items with DOI in all the disciplines in both databases, which provide very similar numbers and trends. While the percentage of citable items with a DOI has already reached 90% in the Sciences and the Social Sciences in 2014, it has remained much lower in the Arts & Humanities, exceeding 50% only since 2013. 
The observed values for Books and Proceedings are even lower despite the importance of these document types, particularly for the Social Sciences and the Arts & Humanities. The fact that there are still journals with a large number of items lacking DOIs in 2014 should be alarming for the corresponding editors and should give them reason to enhance the formal quality and visibility of their journals. Finally, scientists are also encouraged to review their publication strategies and to favour publication channels with established DOI assignments. (C) 2015 Elsevier Ltd. All rights reserved. When comparing the citation impact of nations, departments or other groups of researchers within individual fields, three approaches have been proposed: arithmetic means, geometric means, and percentage in the top X%. This article compares the precision of these statistics using 97 trillion experimentally simulated citation counts from 6875 sets of different parameters (although all having the same scale parameter) based upon the discretised lognormal distribution, with limits derived from 1000 repetitions for each parameter set. The results show that the geometric mean is the most precise, closely followed by the percentage of a country's articles in the top 50% most cited articles for a field, year and document type. Thus the geometric mean citation count is recommended for future citation-based comparisons between nations. The percentage of a country's articles in the top 1% most cited is a particularly imprecise indicator and is not recommended for international comparisons based on individual fields. Moreover, whereas standard confidence interval formulae for the geometric mean appear to be accurate, confidence interval formulae are less accurate and consistent for percentile indicators. These recommendations assume that the scale parameters of the samples are the same but the choice of indicator is complex and partly conceptual if they are not. (C) 2015 Elsevier Ltd. 
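The geometric mean citation indicator recommended above is commonly computed with an offset of 1 so that uncited papers can be included (the offset is a customary convention assumed here, not spelled out in the abstract). A minimal sketch contrasting it with the arithmetic mean on a skewed sample:

```python
import math

def arithmetic_mean(citations):
    return sum(citations) / len(citations)

def geometric_mean(citations):
    """exp(mean(ln(1 + c))) - 1; the offset of 1 accommodates uncited
    papers, for which ln(c) would be undefined."""
    return math.exp(sum(math.log(1 + c) for c in citations) / len(citations)) - 1

sample = [0, 1, 1, 2, 3, 5, 8, 200]  # one highly cited outlier
# The outlier dominates the arithmetic mean (27.5) but barely moves the
# geometric mean (about 4.2), which is why the latter is more precise
# (stable across samples) for heavy-tailed citation data.
```

This stability under outliers is the intuition behind the abstract's finding that the geometric mean is the most precise of the three indicator families.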
All rights reserved. An innovative study of Wikipedia biographical pages is presented. It is shown that the dates of some historical cataclysms may be reproduced from peculiarities of lifespan changes over time. The time dependence of the number of biographical pages related to a year has a broken linear trend in logarithmic scale, with a sudden change of the slope from 0.0006 to 0.008 per year near 1700 AD. Presumably, this reflects the emergence of new ways of information dissemination associated with printing of books and newspapers. The cultural or historical significance of a person is measured using the number of proper Wikipedia references. We divided human activity into nine categories using keyword search. They cover over 97% of the extracted data. Time dependencies of the shares of each category reveal the evolution of priorities or interests of mankind. Finally, the categories were merged into just two classes. We call them Personal and Public, introducing a new index of human priorities as the ratio of Personal to Public. Being quite constant during almost the entire history, the index shows a sharp jump at the end of the 20th century, mainly due to growth of the Sport and Art groups over all others. We consider this a kind of revolution. (C) 2015 Elsevier Ltd. All rights reserved. We introduce the author keyword coupling analysis (AKCA) method to visualize the field of information science (2006-2015). We then compare the AKCA method with the author bibliographic coupling analysis (ABCA) method in terms of first- and all-author citation counts. We obtain the following findings: (1) The AKCA method is a new and feasible method for visualizing a discipline's structure, and the ABCA and AKCA methods have their respective strengths and emphases. The relation within the ABCA method is based on the same references (knowledge base), whereas that within the AKCA method is based on the same keywords (lexical linguistic). 
The AKCA method appears to provide a less detailed picture, and more uneven sub-areas of a discipline structure. The relationships between authors are narrow and direct and feature multiple levels in AKCA. (2) All-author coupling provides a comprehensive picture; thus, a complete view of a discipline structure may require both first- and all-author coupling analyses. (3) Information science evolved continuously during the second decade of the World Wide Web. The KDA (knowledge domain analysis) camp became remarkably prominent, while the IR camp (information retrieval) experienced a further decline in hard IR research, and became significantly smaller; Patent analysis and Open Access emerged during this period. Mapping of Science and Bibliometric evaluation also experienced substantial growth. (C) 2015 Elsevier Ltd. All rights reserved. The search for correlates of scientific production is an important step toward the formulation of decision-making guidelines on academic and funding policy under a competitive system with continuously reduced budgets. Our goal here is to identify drivers of the scientific production of researchers working at the "Universidade Federal de Goias" (UFG), a medium-to-large public Brazilian University, focusing on the effects of teaching load and supervision of graduate and undergraduate students on scientific production of faculty members. We analyzed data for 1487 faculty members of UFG, including the total number of papers published between 2011-2013, a weighted-index of scientific production and the number of publications in high-ranked journals (according to a Brazilian system of journal ranking in different areas). 
These variables were regressed on gender, teaching load at undergraduate and graduate levels, number of supervised undergraduate, Master and Doctoral students, self-declared amount of time dedicated to research and outreach, year of doctoral graduation, year of hiring and the scientific production before doctoral graduation. Several regression models were used to model scientific production, including ordinary least-squares regression and hurdle negative binomial models. Although there are some differences among research areas, the most important explanatory variable was the publication record of the researcher before doctoral graduation, reinforcing the role of a solid academic formation in terms of research experience. Undergraduate and graduate teaching loads were negatively and positively correlated with scientific production, respectively. However, the relationship was much stronger for the latter than for the former. These correlates of scientific production provide guidelines for policy and management in universities, including criteria for balancing research and teaching loads, awarding fellowships and research grants, designing new policy for future hiring and creation of new graduate programs. (C) 2015 Elsevier Ltd. All rights reserved. The academic and research policy communities have seen a long debate concerning the merits of peer review and quantitative citation-based metrics in evaluation of research. Some have called for replacing peer review with use of metrics for some evaluation purposes, while others have called for the use of peer review informed by metrics. Whatever one's position, a key question is the extent to which peer review and quantitative metrics agree. In this paper we study the relation between the three journal metrics source normalized impact per paper (SNIP), raw impact per paper (RIP) and Journal Impact Factor (JIF) and human expert judgement. 
Using the journal rating system produced by the Excellence in Research for Australia (ERA) exercise, we examine the relationship over a set of more than 10,000 journals categorized into 27 subject areas. We analyze the relationship from the dimensions of correlation, distribution of the metrics over the rating tiers, and ROC analysis. Our results show that SNIP consistently has stronger agreement with the ERA rating, followed by RIP and then JIF along every dimension measured. The fact that SNIP has a stronger agreement than RIP demonstrates clearly that the increase in agreement is due to SNIP's database citation potential normalization factor. Our results suggest that SNIP may be a better choice than RIP or JIF in evaluation of journal quality in situations where agreement with expert judgment is an important consideration. (C) 2015 Elsevier Ltd. All rights reserved. Recent studies have shown that the Scopus bibliometric database is probably less accurate than one thinks. As a further evidence of this fact, this paper presents a structured collection of several weird typologies of database errors, which can therefore be classified as horrors. Some of them concern the incorrect indexing of so-called Online-First papers, duplicate publications, and the missing/incorrect indexing of references. A crucial point is that most of these errors could probably be avoided by adopting some basic data checking systems. Although this paper does not provide a quantitative and systematic analysis (which will be provided in a future publication), it can be easily understood that these errors can have serious consequences such as: (i) making it difficult or even impossible to retrieve some documents, and (ii) distorting bibliometric indicators/metrics relating to journals, individual scientists or research institutions. Our attention is focused on the Scopus database, although preliminary data show that the Web of Science database is far from being free from these errors. 
The tone of the paper is deliberately provocative, in order to emphasize the seriousness of these errors. (C) 2015 Elsevier Ltd. All rights reserved. This paper analyzes from an axiomatic point of view a recent proposal for counting citations: the value of a citation given by a paper is inversely proportional to the total number of papers it cites. This way of fractionally counting citations was suggested as a possible way to normalize citation counts between fields of research having different citation cultures. It belongs to the "citing-side" approach to normalization. We focus on the properties characterizing this way of counting citations when it comes to ranking authors. Our analysis is conducted within a formal framework that is more complex but also more realistic than the one usually adopted in most axiomatic analyses of this kind. (C) 2015 Elsevier Ltd. All rights reserved. We discuss, at the macro-level of nations, the contribution of research funding and rate of international collaboration to research performance, with important implications for the "science of science policy". In particular, we cross-correlate suitable measures of these quantities with a scientometric-based assessment of scientific success, studying both the average performance of nations and their temporal dynamics in the space defined by these variables during the last decade. We find significant differences among nations in terms of efficiency in turning (financial) input into bibliometrically measurable output, and we confirm that growth of international collaboration correlates positively with scientific success, with significant benefits brought by EU integration policies. Various geo-cultural clusters of nations naturally emerge from our analysis. We critically discuss the factors that potentially determine the observed patterns. (C) 2016 The Authors. Published by Elsevier Ltd. 
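The citing-side fractional counting analyzed in the axiomatic paper above, where each citation is worth the inverse of the citing paper's reference-list length, can be sketched as follows (an illustration of the counting rule, not of the axiomatic framework itself):

```python
def fractional_citation_counts(citing_refs):
    """Citing-side fractional counting: a citation given by a paper is
    worth 1 / (number of references in that paper), so papers from
    reference-heavy fields give less weight per citation.
    citing_refs maps each citing paper to its list of cited papers."""
    counts = {}
    for paper, refs in citing_refs.items():
        if not refs:
            continue
        weight = 1.0 / len(refs)
        for cited in refs:
            counts[cited] = counts.get(cited, 0.0) + weight
    return counts

# 'a' gets 1/2 from p1 and 1/1 from p2; 'b' gets 1/2 from p1.
counts = fractional_citation_counts({"p1": ["a", "b"], "p2": ["a"]})
```

Each citing paper distributes exactly one unit of credit over its references, which is what normalizes away differences in citation culture between fields.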
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Publication keywords have been widely utilized to reveal the knowledge structure of research domains. An important but under-addressed problem is the decision of which keywords should be retained as analysis objects after a great number of keywords are gathered from domain publications. In this paper, we discuss the problems with the traditional term frequency (TF) method and introduce two alternative methods: TF-inverse document frequency (TF-IDF) and TF-Keyword Activity Index (TF-KAI). These two methods take into account keyword discrimination by considering their frequency both in and out of the domain. To test their performance, the keywords they select in China's Digital Library domain are evaluated both qualitatively and quantitatively. The evaluation results show that the TF-KAI method performs the best: it can retain keywords that match expert selection much better and reveal the research specialization of the domain with more details. (C) 2016 Elsevier Ltd. All rights reserved. Rankings in higher education are largely used to summarize a huge amount of information into easily understandable numbers. They are also used by governments in order to allocate funding. Nevertheless, they are often criticized. One stream of criticism refers to the fact that rankings build up an ordinal order by considering only the mean of the distribution of indicators and not their variability. Using the micro-data from the Italian evaluation of the quality of research (VQR, Valutazione della Qualita della Ricerca), we examine whether differences in performance between departments with different positions in the ranking are distinguishable from random effects. We obtain a robust clustering of departments in a limited number of groups. The number of groups is in the range 3-7, while in most cases it is 4-6. 
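Of the two alternatives to raw term frequency discussed in the keyword-selection study above, TF-IDF is the easier to sketch (TF-KAI additionally requires keyword frequencies outside the domain, which a self-contained example cannot supply). A minimal illustration over publication keyword lists:

```python
import math
from collections import Counter

def tf_idf_scores(domain_docs):
    """TF-IDF over publication keyword lists: domain term frequency,
    discounted by the log inverse document frequency, so keywords that
    appear in nearly every publication score low."""
    n_docs = len(domain_docs)
    tf = Counter(kw for doc in domain_docs for kw in doc)
    df = Counter(kw for doc in domain_docs for kw in set(doc))
    return {kw: tf[kw] * math.log(n_docs / df[kw]) for kw in tf}

docs = [["digital library", "metadata"],
        ["digital library", "ontology"],
        ["digital library", "metadata", "ontology"]]
scores = tf_idf_scores(docs)
# "digital library" occurs in every document, so its IDF, and hence
# its score, is 0 despite having the highest raw term frequency.
```

This shows the discrimination effect the abstract describes: raw TF would rank the ubiquitous domain label first, whereas the weighted score demotes it in favor of more specific keywords.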
The implications of these findings for evaluation and research policy are explored. (C) 2016 Elsevier Ltd. All rights reserved. Several investigations into and approaches for categorizing academic journals/institutions/countries into different grades have been published in the past. To the best of our knowledge, most existing grading methods use either a weighted sum of quantitative indicators (including the case of one properly defined quantitative indicator) or quantified peer review results. Performance measurement is an important issue of concern for science and technology (S&T) management. In this paper we address this issue, leading to multi-level frontiers resulting from data envelopment analysis (DEA) models to grade selected countries/territories. We use research funding and researchers as input indicators, and take papers, citations and patents as output indicators. Our research results show that using DEA frontiers we can group countries/territories into different grades. These grades reflect the corresponding countries' levels of performance with respect to multiple inputs and outputs. Furthermore, we use papers, citations and patents, respectively, as a single output (with research funding and researchers as inputs) to show country/territory grade changes. In order to increase the insight into this approach, we also incorporate a simple value judgment (that the number of citations is more important than the number of papers) as prior information into the DEA models to study the resulting changes of these countries/territories' performance grades. (C) 2016 Elsevier Ltd. All rights reserved. This study uses cluster analysis as a tool for mapping diversity of publication patterns in the social sciences and humanities (SSH). 
By algorithmic clustering of 1828 senior authors affiliated with 16 disciplines at five universities in Flanders, Belgium, based on the similarity of their publication patterns during 2000-2011, we distinguish two broad publication styles, both of which are present within each discipline. We conclude that diversity in SSH publication patterns cuts across disciplinary boundaries. Cluster analysis shows promise for application in research evaluation for the SSH. (C) 2016 Elsevier Ltd. All rights reserved. Increasing interest in developing treatments for pancreatic cancer has led to a surge in publications in the field. Analyses of drug-research trends are needed to minimize risk in anti-cancer drug development. Here, we analyzed publications on anti-cancer drugs extracted from PubMed records and ClinicalTrials datasets. We conducted a drug cluster analysis by proposing the entity Dirichlet Multinomial Regression (eDMR) technique and in-depth network analysis of drug cluster and target proteins. The results show two distinct research clusters in both the ClinicalTrials dataset and the PubMed records. Specifically, various targets associated with anti-cancer drugs are investigated in new drug testing while the diverse chemicals are studied together with a standard therapeutic agent in the academic literature. In addition, our study confirms that drug research published in PubMed is preceded by clinical trials. Although we only evaluate drugs for pancreatic cancer in the present study, our method can be applied to drug-research trends of other diseases. (C) 2016 Elsevier Ltd. All rights reserved. This paper investigates the growth over time of the size of higher education institutions (HEIs), as measured by the number of academic staff, and its association with HEI and country attributes. 
We analyze a sample of 837 HEIs from 18 countries derived from the European Tertiary Education Register (ETER) and from the European Micro Data dataset (EUMIDA) for the years 2008 and 2012. Our analysis shows that (1) HEIs growth is largely proportional to their size, leading to a nearly log-normal distribution of size (Gibrat's law), even if small institutions tend to grow faster; (2) the growth of the number of students and HEI's reputation level positively influence HEI growth. Consequently (3) small HEIs need a lower level of reputation and less growth of students to continue growing over time, while only highly reputed HEIs are able to maintain a large size over time. Our results are relevant to understand the extent to which cumulative effects lead to a lasting concentration of resources in the HE system and whether public policies are able to redistribute resources based on merit. (C) 2016 Elsevier Ltd. All rights reserved. Modeling coauthorship networks helps to understand the emergence and propagation of thoughts in academic society. A random geometric graph is proposed to model coauthorship networks, the connection mechanism of which expresses the effects of the academic influences and homophily of authors, and the collaborations between research teams. Our analysis reveals that the modeled networks have a range of features of empirical coauthorship networks, namely, the degree distribution made up of a mixture Poisson distribution with a power-law tail, clear community structure, small-world, high clustering, and degree assortativity. Moreover, the underlying formulae of the tail and forepart of the degree distribution, and the tail of the scaling relation between local clustering coefficient and degree are derived for the modeled networks, and are also applicable to the empirical networks. (C) 2016 Elsevier Ltd. All rights reserved. 
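The proportional-growth process behind Gibrat's law, invoked above for HEI sizes, can be illustrated by simulation: multiplying sizes by independent random growth factors drives log-size toward a normal, hence size toward a log-normal, distribution. The parameter values below (initial size, volatility, horizon) are arbitrary assumptions for the sketch:

```python
import math
import random

def simulate_gibrat(n_institutions=2000, periods=50, sigma=0.1, seed=42):
    """Each period, every institution's size is multiplied by an
    independent random growth factor, i.e. growth rate is independent
    of current size (Gibrat's law)."""
    rng = random.Random(seed)
    sizes = [100.0] * n_institutions
    for _ in range(periods):
        sizes = [s * math.exp(rng.gauss(0.0, sigma)) for s in sizes]
    return sizes

sizes = simulate_gibrat()
# Sizes end up right-skewed (approximately log-normal): mean > median.
```

Because log-size is a sum of i.i.d. shocks, the central limit theorem produces the near log-normal size distribution the abstract reports for the ETER/EUMIDA sample.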
In this study we present an application which can be accessed via www.excellence-networks.net and which represents networks of scientific institutions worldwide. The application is based on papers (articles, reviews and conference papers) published between 2007 and 2011. It uses (network) data, on which the SCImago Institutions Ranking is based (Scopus data from Elsevier). Using this data, institutional networks have been estimated with statistical models (Bayesian multilevel logistic regression, BMLR) for a number of Scopus subject areas. Within single subject areas, we have investigated and visualized how successfully overall an institution (reference institution) has collaborated (compared to all the other institutions in a subject area), and with which other institutions (network institutions) a reference institution has collaborated particularly successfully. The "best paper rate" (statistically estimated) was used as an indicator for evaluating the collaboration success of an institution. This gives the proportion of highly cited papers from an institution, and is considered generally as an indicator for measuring impact in bibliometrics. (C) 2016 Elsevier Ltd. All rights reserved. Open access (OA) mandates are policies that require researchers to provide free, unrestricted access to their published research by including it in OA journals (gold OA) or depositing it in freely available disciplinary or institutional repositories (green OA). This study measures the degree of compliance with a Spanish government OA mandate 2.5 years after its implementation. A total of 58.4% of articles resulting from publicly funded research had at least one OA copy available 1 year after publication. Among these, 23.8% were in gold OA, 21.8% in green OA and 12.8% in gray OA, i.e., posted on websites and social networks. Most of the green OA articles were in 2 disciplinary repositories: arXiv and PubMed Central. 
Just 14.4% of the articles resulting from publicly funded research were available in institutional repositories, although more than 90% of the articles in the data set were the result of projects carried out at institutions that have such an archive. There is great potential for growth in green OA, because over two thirds of the articles that were not available as OA were published in journals whose publishers allow a preprint or a postprint copy to be deposited. Social media are becoming increasingly popular in scientific communication. A range of platforms, such as academic social networking sites (SNS), are geared specifically towards the academic community. Proponents of the altmetrics approach have pointed out that new media allow for new avenues of scientific impact assessment. Traditional impact measures based on bibliographic analysis have long been criticized for overlooking the relational dynamics of scientific impact. We therefore propose an application of social network analysis to researchers' interactions on an academic social networking site to generate potential new metrics of scientific impact. Based on a case study conducted among a sample of Swiss management scholars, we analyze how centrality measures derived from the participants' interactions on the academic SNS ResearchGate relate to traditional, offline impact indicators. We find that platform engagement, seniority, and publication impact contribute to members' indegree and eigenvector centrality on the platform, but less so to closeness or betweenness centrality. We conclude that a relational approach based on social network analyses of academic SNS, while subject to platform-specific dynamics, may add richness and differentiation to scientific impact assessment. The objective of this research was to investigate the institutional and individual factors that influence scientists' data-sharing behaviors across different scientific disciplines. 
Two theoretical perspectives, institutional theory, and theory of planned behavior, were employed in developing a research model that showed the complementary nature of the institutional and individual factors influencing scientists' data-sharing behaviors. This research used a survey method to examine to what extent those institutional and individual factors influence scientists' data-sharing behaviors in a range of scientific disciplines. A national survey (with 1,317 scientists in 43 disciplines) showed that regulative pressure by journals, normative pressure at a discipline level, and perceived career benefit and scholarly altruism at an individual level had significant positive relationships with data-sharing behaviors, and that perceived effort had a significant negative relationship. Regulative pressure by funding agencies and the availability of data repositories at a discipline level and perceived career risk at an individual level were not found to have any significant relationships with data-sharing behaviors. The music information retrieval (MIR) community has long understood the role of evaluation as a critical component for successful information retrieval systems. Over the past several years, it has also become evident that user-centered evaluation based on realistic tasks is essential for creating systems that are commercially marketable. Although user-oriented research has been increasing, the MIR field is still lacking in holistic, user-centered approaches to evaluating music services beyond measuring the performance of search or classification algorithms. In light of this need, we conducted a user study exploring how users evaluate their overall experience with existing popular commercial music services, asking about their interactions with the system as well as situational and personal characteristics. 
In this paper, we present a qualitative heuristic evaluation of commercial music services based on Jakob Nielsen's 10 usability heuristics for user interface design, and also discuss 8 additional criteria that may be used for the holistic evaluation of user experience in MIR systems. Finally, we recommend areas of future user research raised by trends and patterns that surfaced from this user study. Online collaborative projects have been utilized in a variety of ways over the past decade, such as bringing people together to build open source software or developing the world's largest free encyclopedia. Personal communication networks as a feature do not exist in all collaborative projects. It is currently unclear if a designer's decision to include a personal communication network in a collaborative project's structure affects outcome quality. In this study, I investigated Wikipedia's personal communication network and analyzed which Wikipedia editors are utilizing it and how they are connected to outcome quality. Evidence suggests that people who utilize these networks are more experienced in editing high quality articles and are more integrated in the community. Additionally, these individuals utilize the personal communication network for coordinating and perhaps mentoring editors who edit lower quality articles. The value of these networks is demonstrated by the characteristics of the users who use them. These findings indicate that designers of online collaborative projects can help improve the quality of outcomes in these projects by deciding to implement a personal communication network in their communities. The study sets out to explore the factors that influence the evaluation of information and the judgments made in the process of finding useful information in web search contexts. 
Based on a diary study of two assigned search tasks on Google and Google Scholar, factor analysis identified the core constructs of content, relevance, scope, and style, as well as informational and system ease of use, as influencing the judgment that useful information had been found. Differences were found in the participants' evaluation of information across the search tasks on Google and on Google Scholar when identified by the factors related to both content and ease of use. The findings from this study suggest how searchers might critically evaluate information, and the study identifies a relation between the user's involvement in the information interaction and the influences of perceived system ease of use and information design. This paper reports findings from a study designed to gain a broader understanding of sensemaking activities, using the Data/Frame Theory as the analytical framework. Although this theory is one of the dominant models of sensemaking, it has not been extensively tested with a range of sensemaking tasks. The tasks discussed here focused on making sense of structures rather than processes or narratives. Eleven researchers were asked to construct an understanding of how a scientific community in a particular domain is organized (e.g., people, relationships, contributions, factors) by exploring the concept of influence in academia. This topic was chosen because, although researchers frequently handle this type of task, it is unlikely that they have explicitly sought this type of information. We conducted a think-aloud study and semi-structured interviews with junior and senior researchers from the human-computer interaction (HCI) domain, asking them to identify current leaders and rising stars in both HCI and chemistry. Data were coded and analyzed using the Data/Frame Model to both test and extend the model.
Three themes emerged from the analysis: novices' and experts' sensemaking activity chains, constructing frames through indicators, and characteristics of structure tasks. We propose extensions to the Data/Frame Model to accommodate structure sensemaking. This research project explores, through a series of online surveys and a subsequent series of individual interviews, stakeholders' attitudes and practices regarding poetry published exclusively in web-based media. This article specifically examines the project's gathered data on creative writing faculty from North American institutions, who were surveyed and interviewed about online poetry publishing as both creators and consumers of the literary works. This study also explores creative writing faculty members' opinions about publishing in online literary publications with regard to career impact, including tenure and promotion. As online literary publishing disrupts what continues to be a very print-oriented practice, Rogers' diffusion of innovations provides a useful framework for exploring these issues. Because this project considers how innovations diffuse throughout a specific group of artists and scholars, and the information needs that emerge from these transformations, the concept of communities of practice also informed the data analysis. Research articles disseminate the knowledge produced by the scientific community. Access to this literature is crucial for researchers and the general public. Apparently, bibliogifts are available online for free from text-sharing platforms. However, little is known about such platforms. What is the size of the underlying digital libraries? What are the topics covered? Where do these documents originally come from? This article reports on a study of the Library Genesis platform (LibGen). The 25 million documents (42 terabytes) it hosts and distributes for free are mostly research articles, textbooks, and books in English.
The article collection stems from isolated, but massive, article uploads (71%) in line with a biblioleaks scenario, as well as from daily crowdsourcing (29%) by worldwide users of platforms such as Reddit Scholar and Sci-Hub. By relating the DOIs registered at CrossRef and those cached at LibGen, this study reveals that 36% of all DOI articles are available for free at LibGen. This figure is even higher (68%) for three major publishers: Elsevier, Springer, and Wiley. More research is needed to understand to what extent researchers and the general public have recourse to such text-sharing platforms and why. Social tagging is one of the most popular methods for collecting crowd-sourced information in galleries, libraries, archives, and museums (GLAMs). However, when the number of social tags grows rapidly, using them becomes problematic and, as a result, they are often left as simply big data that cannot be used for practical purposes. To revitalize the use of this crowd-sourced information, we propose using social tags to link and cluster artworks based on an experimental study using an online collection at the Gyeonggi Museum of Modern Art (GMoMA). We view social tagging as a folksonomy, where artworks are classified by keywords of the crowd's various interpretations and one artwork can belong to several different categories simultaneously. To leverage this strength of social tags, we used a clustering method called link communities to detect overlapping communities in a network of artworks constructed by computing similarities between all artwork pairs. We used this framework to identify semantic relationships and clusters of similar artworks. By comparing the clustering results with curators' manual classification results, we demonstrated the potential of social tagging data for automatically clustering artworks in a way that reflects the dynamic perspectives of crowds. 
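The tag-based clustering pipeline described above can be sketched in a few lines: build a network by linking artworks whose crowd-tag sets are sufficiently similar, then group the network into clusters. This is a toy with invented tags, and it uses plain connected components as a stand-in for the paper's link-communities step (which additionally allows overlapping clusters).

```python
from itertools import combinations

# hypothetical crowd tags for five artworks (the study used GMoMA's online collection)
tags = {
    "A": {"abstract", "geometry", "blue"},
    "B": {"abstract", "geometry", "red"},
    "C": {"portrait", "woman", "oil"},
    "D": {"portrait", "man", "oil"},
    "E": {"landscape", "mountain"},
}

def jaccard(s, t):
    """Similarity between two tag sets: shared tags over all tags."""
    return len(s & t) / len(s | t)

# link artwork pairs whose tag overlap exceeds a threshold
edges = [(a, b) for a, b in combinations(tags, 2) if jaccard(tags[a], tags[b]) >= 0.4]

def components(nodes, edges):
    """Crude clustering via connected components (union-find)."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)

clusters = components(set(tags), edges)
```

Here the two abstract works and the two portraits form separate clusters, while the landscape stays isolated, mirroring how shared crowd interpretations pull similar artworks together.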
The significant amount of time needed to prepare concise and cohesive texts is among the reasons why extensive texts proliferate in organizations. Extensive documents are more likely to have low cohesion among their various sections, which may lead the reader to perceive the information as being of low quality. This research addresses this issue by presenting a tool (Text Matrix) composed of procedures and algorithms coded in software, with the aim of analyzing cohesion between the sections of extensive documents. A design science research approach was applied to develop, test, and prove the usefulness of the Text Matrix. A total of 127 academic advisors and advisees were trained to use the Text Matrix and to apply it in developing the extensive texts of dissertations. An analysis of those users' perceptions, and of variations in network density across 41 extensive texts improved through the analyses performed by the proposed artifact, demonstrated the ease of operation and effectiveness of the Text Matrix in identifying parts of extensive texts associated with cohesion failures that could be connected by cohesion devices or even excluded from the text. The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolution over multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and evolve on longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationships between clusters. The HTEM is built on the new process for time-stamped documents, in which the timestamp is used to measure the dependencies among documents.
Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolution. It also outperforms the baseline model in terms of likelihood on held-out data. In this article, we propose a new approach for indexing biomedical documents based on a possibilistic network that carries out partial matching between documents and biomedical vocabulary. The main contribution of our approach is to deal with the imprecision and uncertainty of the indexing task using possibility theory. We enhance the estimation of the similarity between a document and a given concept using the two measures of possibility and necessity. Possibility estimates the extent to which a document is not similar to the concept, while necessity can provide confirmation that the document is similar to the concept. Our contribution also reduces the limitations of partial matching. Although the latter allows extracting from the document variants of terms other than those in dictionaries, it also generates irrelevant information. Our objective is to filter the index using the knowledge provided by the Unified Medical Language System (UMLS). Experiments were carried out on different corpora, showing encouraging results (the improvement rate is +26.37% in terms of mean average precision when compared with the baseline). Pseudo relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, the top-ranked documents are all assumed to be relevant and are therefore treated equally in the feedback process. However, the performance gain brought by each document differs, as shown in our preliminary experiments. Thus, it is more reasonable to predict the performance gain brought by each candidate feedback document in the process of PRF.
We define the quality level (QL) and then use this information to adjust the weights of feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption, and we go beyond merely selecting good documents for PRF. We propose a quality-based PRF framework in which two quality-based assumptions are introduced. In particular, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and significantly outperforms strong baselines. This article introduces a new source of evidence of the value of medical-related research: citations from clinical guidelines. These give evidence that research findings have been used to inform the day-to-day practice of medical staff. To identify whether citations from guidelines can give different information from that of traditional citation counts, this article assesses the extent to which references in clinical guidelines tend to be highly cited in the academic literature and highly read in Mendeley. Using evidence from the United Kingdom, we found that references associated with the UK's National Institute for Health and Clinical Excellence (NICE) guidelines tended to be substantially more cited than comparable articles, unless they had been published in the most recent 3 years. Citation counts also seemed to be stronger indicators than Mendeley readership altmetrics. Hence, although presence in guidelines may be particularly useful for highlighting the contributions of recently published articles, for older articles citation counts may already be sufficient to recognize their contributions to health in society.
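The quality-weighted feedback idea in the PRF abstract above can be sketched as follows. This is a hypothetical toy (invented documents, invented quality scores, simple relative term frequencies), not the RelPRF/ImpPRF estimators themselves, which learn quality from document-level features.

```python
from collections import Counter

def prf_term_weights(feedback_docs, quality):
    """Aggregate term weights from pseudo-relevant documents, scaling each
    document's contribution by its estimated quality level; uniform quality
    reproduces classic PRF, where all feedback documents count equally."""
    weights = Counter()
    for doc, q in zip(feedback_docs, quality):
        counts = Counter(doc.split())
        total = sum(counts.values())
        for term, c in counts.items():
            weights[term] += q * c / total  # quality-scaled relative frequency
    return weights

docs = ["jaguar car engine", "jaguar cat jungle", "car engine parts"]
uniform = prf_term_weights(docs, [1.0, 1.0, 1.0])   # classic PRF assumption
quality = prf_term_weights(docs, [1.0, 0.2, 1.0])   # second doc judged low quality
```

Down-weighting the off-topic document shrinks the weight of terms like "jungle" relative to terms supported by high-quality feedback documents, which is the intended effect of predicting per-document gain rather than assuming uniform relevance.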
Bibliometric analysis based on literature in the Web of Science (WOS) has become an increasingly popular method for visualizing the structure of scientific fields. Keywords Plus and Author Keywords are commonly selected as units of analysis, despite the limited research evidence demonstrating the effectiveness of Keywords Plus. This study was conceived to evaluate the efficacy of Keywords Plus as a parameter for capturing the content and scientific concepts presented in articles. Using scientific papers about patient adherence that were retrieved from WOS, a comparative assessment of Keywords Plus and Author Keywords was performed at the scientific field level and the document level, respectively. Our search yielded more Keywords Plus terms than Author Keywords, and the Keywords Plus terms were more broadly descriptive. Keywords Plus is as effective as Author Keywords in terms of bibliometric analysis investigating the knowledge structure of scientific fields, but it is less comprehensive in representing an article's content. Research data curation initiatives must support heterogeneous kinds of projects, data, and metadata. This article examines variability in data and metadata practices using institutions as the key theoretical concept. Institutions, in the sense used here, are stable patterns of human behavior that structure, legitimize, or delegitimize actions, relationships, and understandings within particular situations. Based on prior conceptualizations of institutions, a theoretical framework is presented that outlines 5 categories of institutional carriers for data practices: (a) norms and symbols, (b) intermediaries, (c) routines, (d) standards, and (e) material objects. These institutional carriers are central to understanding how scientific data and metadata practices originate, stabilize, evolve, and transfer. 
This institutional framework is applied to 3 case studies: the Center for Embedded Networked Sensing (CENS), the Long Term Ecological Research (LTER) network, and the University Corporation for Atmospheric Research (UCAR). These cases are used to illustrate how institutional support for data and metadata management is not uniform within a single organization or academic discipline. Instead, broad spectra of institutional configurations for managing data and metadata exist within and across disciplines and organizations. This research aimed to gain a detailed understanding of how genealogists and historians interact with, and make use of, finding aids in print and digital form. The study uses the lens of human information interaction to investigate finding aid use. Data were collected through a lab-based study of 32 experienced archives users who completed two tasks with each of two finding aids. Participants were able to carry out the tasks, but they were somewhat challenged by the structure of the finding aid and employed various techniques to cope. Their patterns of interaction differed by task type, and they reported higher rates of satisfaction, ease of use, and clarity for the assessment task than for the known-item task. Four common patterns of interaction were identified: top-down, bottom-up, interrogative, and opportunistic. Results show how users interact with finding aids and identify features that support and hinder use. This research examines process and performance in addition to outcomes. The results contribute to the archival science literature and also suggest ways to extend models of human information interaction. Generally, multicountry papers receive more citations than single-country ones. In this contribution, we examine whether this rule also applies to American scientists publishing in highly visible interdisciplinary journals.
Concretely, we compare the citations received by American scientists in Nature, Science, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS). It is shown that, statistically, American scientists publishing in Nature and Science do not benefit from international collaboration. This statement also holds for communicated submissions to PNAS, but not for direct or contributed submissions. Scientists continuously generate research data, but only a few of these data are published. If these data were accessible and reusable, researchers could examine them and generate new knowledge. Our purpose is to determine whether there is a relationship between the impact factor and the policies concerning open availability of raw research data in journals of the Information Science and Library Science (ISLS) subject category of the Web of Science database. We reviewed the policies related to public availability of papers and data sharing in the 85 journals included in the ISLS category of the Journal Citation Reports in 2012. The relationship between public availability of published data and the impact factor of journals was analysed through different statistical tests. The variable "statement of complementary material" was accepted in 50 % of the journals; 65 % of the journals support "reuse"; 67 % of the journals specified "storage in thematic or institutional repositories"; and "publication of the manuscript on a website" was accepted in 69 % of the journals. Thus, 50 % of the journals allow data to be deposited as supplementary material, and more than 60 % accept reuse, storage in repositories, and publication on websites. There is a clear positive relationship between being a top journal in the impact factor ranking of the JCR and having an open policy. This study investigates the evolution and structure of a national-scale co-publishing network in Korea from 1948 to 2011.
We analyzed more than 700,000 papers published by approximately 415,000 authors for temporal changes in productivity and network properties with a yearly resolution. The resulting statistical properties were compared to findings from previous studies of coauthorship networks at the national and discipline levels. Our results show that both the number of publications and the number of authors in Korea have grown exponentially over a 64-year time frame. Korean scholars have become more productive and collaborative. They now form a small-world-ish network in which most authors can connect with one another within an average of 5.33 degrees of separation. The increasingly skewed distribution and concentration of both productivity and the number of collaborators per author indicate that a relatively small group of individuals has accumulated a large number of opportunities for co-publishing. This implies a potential vulnerability for the network and its wider context: the graph would disintegrate into a multitude of smaller components, where the largest one would contain < 2 % of all authors, if approximately 15 % (57,724) of the most connected scholars left the network, e.g., due to retirement or promotion to higher-level administrative positions. It is sometimes pointed out that economic research is prone to move in cycles and to react to particular events such as crises and recessions. The present paper analyses this issue quantitatively by answering the research question of whether the economic literature on business cycles is correlated with movements and changes in actual economic activity. To tackle this question, a bibliometric analysis of key terms related to business cycle and crisis theory is performed. In a second step, these results are confronted with data on actual economic developments in order to investigate whether the theoretical literature follows trends and developments in economic data.
To determine the connection between economic activity and developments in the academic literature, a descriptive analysis is supplemented by econometric tests. In the short run, the VARs with cyclical fluctuations point to multiple cases where economic variables Granger-cause bibliometric ones. In the long run, the fractionally cointegrated VARs suggest that many bibliometric variables respond to economic shocks. In the multivariate framework, the Diebold-Mariano test shows that economic variables significantly improve the quality of forecasts of bibliometric indices. The paper also includes impulse-response function analysis for a quantitative assessment of the effects of economic variables on bibliometric ones. The results point towards a qualified confirmation of the hypothesis that business cycles and crises in economic variables affect discussions in the literature. Amid conflicting public pressure for greater access to higher education, budget reductions, and continuing backlash over increasing tuition and skyrocketing student debt, public universities have intensified efforts to improve organizational efficiency, effectiveness, and productivity. One strategic option is merging institutions of higher education to better utilize resources, reap cost savings, and increase scholarly outputs. Mergers and acquisitions more commonly occur in the business domain, and analysis specific to the higher education arena has been limited to this point. Our research examines the effects of a university merger on knowledge production in the form of faculty scholarly productivity. We use results of a continuing study of the merger of two state-funded higher education institutions, with quite different organizational cultures and research orientations, to explore merger impacts.
Using the extensive prior literature on job stress and associated person-organization fit, as well as social identity theory, we develop a model of predictors of post-merger research time allocation and associated productivity. We find lingering effects of pre-merger institutional affiliation, particularly for the faculty of the lower-status university, on post-merger job stress, organizational fit, and resulting research productivity. The results of our study advance practical approaches to mergers in higher education for policy makers and managers of higher education. Academic journal rankings are widely used for academic purposes, especially in the field of Economics. There are many procedures for constructing academic journal rankings. Some are based on citation analysis, while others are based on expert opinion. In this study, we introduce a methodological innovation for aggregating different performance measures to build an alternative ranking of journals in Economics. Our approach is based on a purely output-oriented Free Disposal Hull (FDH). We analyzed four indicators (Journal Impact Factor, Discounted Impact Factor, h-index, and Article Influence) for a set of 232 journals in Economics. The results allow us to reach two main conclusions. First, the ranking based on the FDH method seems to be consistent with other well-known reference rankings (i.e., KMS, Invariant, Ambitious, and Area Score). Second, the additional information provided by the FDH model may be used by an editorial board to formulate strategies to achieve its goals, for instance, improving a journal's score by comparing it with the scores of similar journals. A citation index measures the impact or quality of a research publication. Currently, all the standard journal citation indices measure the impact of individual research articles published in those journals based on citation counts, making them purely quantitative measures.
To address this, as our first contribution, we propose assigning weights to the edges of the citation network using three context-based quality factors: (1) sentiment analysis of the text surrounding the citation in the citing article, (2) self-citations, and (3) semantic similarity between the citing and cited articles. Prior approaches use the PageRank algorithm to compute citation scores. Being an iterative process, this is not essential for acyclic citation networks. As our second contribution, we propose a non-iterative graph-traversal-based approach, which uses the edge weights and the initial scores of the non-cited nodes to compute the citation indices by visiting the nodes in topologically sorted order. Experimental results show that the rankings of citation indices obtained by our approach improve on traditional citation-count-based ranks. Our rankings are also similar to those of PageRank-based methods, but our algorithm is simpler and 70 % more efficient. Lastly, we propose a new model for future reference, which computes the citation indices based on the solution of a system of linear inequalities, in which human experts' judgment is modeled by suitable linear constraints. Information visualization and data visualization are often viewed as similar but distinct domains, and they have drawn an increasingly broad range of interest from diverse sectors of academia and industry. This study systematically analyzes and compares the intellectual landscapes of the two domains between 2000 and 2014. The present study is based on bibliographic records retrieved from the Web of Science. Using a topic search and a citation expansion, we collected two sets of data in each domain. Then, we identified emerging trends and recent developments in information visualization and data visualization, captured in the intellectual landscapes, landmark articles, bursting keywords, and citation trends of the domains.
We found that both domains have computer engineering and applications as their shared grounds. Our study reveals that information visualization and data visualization scrutinized the algorithmic concepts underlying the domains in their early years. Successive literature citing the data sets focuses on applying information and data visualization techniques to biomedical research. Recent thematic trends in the fields reflect that they are also diverging from each other. In data visualization, emerging topics and new developments cover dimensionality reduction and applications of visual techniques to genomics. Information visualization research is scrutinizing cognitive and theoretical aspects. In conclusion, information visualization and data visualization have co-evolved, and at the same time both fields are developing distinctively with their own scientific interests. The aim of the present contribution is to merge bibliographic data for members of a bounded scientific community in order to derive a complete unified archive, with both top-international and nationally oriented production, as a new basis for carrying out network analysis on a unified co-authorship network. A two-step procedure is used to deal with the identification of duplicate records and author name disambiguation. Specifically, for the second step we drew heavily on a well-established unsupervised disambiguation method proposed in the literature, which follows a network-based approach and requires a restricted set of record attributes. Evidence from Italian academic statisticians was obtained by merging data from three bibliographic archives. Non-negligible differences were observed in the network results when comparing disambiguated and non-disambiguated data sets, especially in network measures at the individual level. We conducted a bibliometric analysis of research in or about South African National Parks, published between 2003 and 2013.
Our goal was to identify the major research topics, and to examine the role of in-house ("embedded") researchers in producing relevant knowledge and in leveraging additional benefits through collaboration with external researchers. The authorship of 1026 papers was highly collaborative, with the majority of papers (70 %) being contributed by external researchers. Research was concentrated in five of the 19 parks, and was biased towards animal and ecological process studies in savanna ecosystems. Researchers have mainly worked in older, larger, and arguably more aesthetically-appealing parks that are either close to hand or that provide subsidized accommodation to researchers, and that have established experimental setups or useful long-term data; smaller and more remote parks have received less research attention. Certain priority topics for management, such as degradation of freshwater ecosystems, global change, marine ecology, and socio-ecological dynamics have not received much attention, and are areas identified for growth. Embedded authors were found to be more highly connected and influential than external researchers, leveraging and connecting many research projects. We conclude that there are significant benefits to be gained for the management of protected areas through the maintenance of an embedded research capability. The accuracy of interdisciplinarity measurements is directly related to the quality of the underlying bibliographic data. Existing indicators of interdisciplinarity are not capable of reflecting the inaccuracies introduced by incorrect and incomplete records because correct and complete bibliographic data can rarely be obtained. This is the case for the Rao-Stirling index, which cannot handle references that are not categorized into disciplinary fields. We introduce a method that addresses this problem. It extends the Rao-Stirling index to acknowledge missing data by calculating its interval of uncertainty using computational optimization. 
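The Rao-Stirling index discussed above has a standard closed form, Δ = Σ_{i<j} 2 p_i p_j d_ij, where p_i is the share of an article's references assigned to field i and d_ij is the dissimilarity between fields i and j. A minimal sketch of the base index follows; it is not the authors' uncertainty-interval extension, which additionally optimizes over the possible field assignments of uncategorized references:

```python
def rao_stirling(proportions, distance):
    """Rao-Stirling diversity of one article's reference list.

    proportions: dict field -> share of categorized references in that
    field (shares sum to 1); distance: dict (field_i, field_j) ->
    dissimilarity in [0, 1], given in either pair order.
    """
    fields = list(proportions)
    total = 0.0
    for i, fi in enumerate(fields):
        for fj in fields[i + 1:]:
            d = distance.get((fi, fj), distance.get((fj, fi), 0.0))
            # each unordered pair contributes twice in the i != j sum
            total += 2 * proportions[fi] * proportions[fj] * d
    return total
```

References without a field category simply cannot enter `proportions`, which is exactly the gap the uncertainty-interval method described in the abstract is designed to bound.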
The evaluation of our method indicates that the uncertainty interval is not only useful for estimating the inaccuracy of interdisciplinarity measurements, but it also delivers slightly more accurate aggregated interdisciplinarity measurements than the Rao-Stirling index. In collaboration with the managers of a university, we conducted action research to gauge the adequacy of texts written by researchers in the doctoral programs of the institution as the input (content) for the university's distance learning program. For the analyses, bibliometric data were collected regarding the articles in question from the regional SciELO repository. This repository was chosen because the articles are mostly published in Portuguese, which is in the domain of the target readership of the distance learning project. It was observed that there was a need for an indicator that related the number of downloads of an article to the total number of downloads of the journal in which it was published. Thus, we propose the D-index, defined as the number of papers with download numbers ≥ d, as a useful index for characterizing the academic popularity (hits) of a journal. The first applications of the D-index in professional practice were positive, given its usefulness and practicality. The D-index aids the analysis of the download of an article, which is a fundamental event for subsequent reading, internalization, and learning. An analysis of this set of actions is essential in the context of regional repositories, as their mission also includes the dissemination of information to aid the learning and education of their readers. This review examines current approaches to leadership by dividing them into two major categories: those that treat leadership as a hierarchical system and those that treat leadership as a complex, flexible framework. 
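The D-index proposed above is phrased in h-index style on download counts. As a hypothetical reading of that definition (the exact formulation should be checked against the original paper), a journal has D-index D when D of its papers each have at least D downloads:

```python
def d_index(downloads):
    """h-type index on downloads: largest D such that D papers each
    have at least D downloads (assumed reading of the D-index)."""
    counts = sorted(downloads, reverse=True)
    d = 0
    # walk down the ranked list while the (d+1)-th paper still has
    # at least d+1 downloads
    while d < len(counts) and counts[d] >= d + 1:
        d += 1
    return d
```

The computation mirrors the h-index exactly, with download counts substituted for citation counts.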
The innovation of the paper is in using a bibliometric analysis to observe whether our results bear a resemblance to what is known in the literature about the different approaches to leadership so far. The data sources for the analyses were the Science and Social Science Citation Index Expanded databases and the World Catalog database. The main argument is that although transformational leadership still remains the most influential approach in this field of research, shared, complexity, and collective types of leadership are the approaches that show the next greatest intensity of research. A quantitative bibliometric analysis supports this suggestion. We argue that the reason for their popularity in the field lies in the modern structure of Western society, with its shift from the Industrial Era to the Knowledge Era shaped by democratization, globalization, and the growing complexity of modern society. After the fall of the USSR in 1990, there was a steady stagnation of Russian science for 15 years. Restoration started in 2006, after the government introduced new science policies with funding dependent on research assessment. As this paper shows, the trends of publication activity in Russia changed after that. At the same time, the number of annual scientometric publications in Russia increased sharply from dozens to hundreds in the period 2006-2014. In this paper, we consider whether these facts are related. We investigated the dynamics and structure of the flow of scientometric articles and revealed how it relates to the stages of the reform of Russian science. In the final part, we briefly review the most cited topics, including country research specialization, low citation rates, and the Matthew index. The aim of this paper is to review the new Russian scientometrics landscape and to explain the reasons why it has changed. 
In this study, the historical roots of tribology (a sub-field of mechanical engineering and materials science) are investigated using a newly developed scientometric method called "Reference Publication Year Spectroscopy" (RPYS). The study is based on cited references (n = 577,472) in tribology research publications (n = 24,086). The Science Citation Index Expanded (SCI-E) is used as the data source. The results show that RPYS has the potential to identify important publications: most of the publications identified in this study as highly cited (referenced) publications are landmark publications in the field of tribology. Active users of Twitter are often overwhelmed by the vast number of tweets. In this work we attempt to help users browse a large number of accumulated posts. We propose personalized word cloud generation as a means of navigation. Various past user activities, such as published tweets, retweets, and tweets seen but not retweeted, are leveraged for enhanced personalization of word clouds. The best personalization results are attained with the user's past retweets. However, users' own past tweets are not as useful as retweets for personalization. Negative preferences derived from tweets seen but not retweeted further enhance personalized word cloud generation. The ranking combination method outperforms the preranking approach and provides a general framework for combined ranking of various kinds of past user information for enhanced word cloud generation. To better capture subtle differences between generated word clouds, we propose evaluating word clouds with a mean average precision measure. This study investigated how people comprehend three-dimensional (3D) visualizations and what properties of such visualizations affect comprehension. Participants were asked to draw the face of a 3D visualization after it was cut in half. 
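The core of RPYS, as applied in the tribology study above, is a count of cited references by their publication year, with years that deviate strongly from their neighborhood pointing to landmark works. A minimal sketch, assuming the common variant that subtracts the median of a 5-year window centred on each year:

```python
from collections import Counter
from statistics import median

def rpys_spectrum(cited_years):
    """Reference Publication Year Spectroscopy (simplified).

    cited_years: iterable of publication years of cited references.
    Returns {year: deviation of that year's reference count from the
    median count of its 5-year window}; large positive values are peaks.
    """
    counts = Counter(cited_years)
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        window = [counts.get(year + k, 0) for k in range(-2, 3)]
        spectrum[year] = counts.get(year, 0) - median(window)
    return spectrum
```

In practice the peaks are then inspected manually: the individual highly referenced publications from a peak year are the candidate historical roots of the field.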
We videotaped the participants as they drew, erased, verbalized their thoughts, gestured, and moved about a two-dimensional paper representation of the 3D visualization. The video records were analyzed using a grounded theory approach to generate hypotheses relating comprehension difficulties to visualization properties. Our analysis of the results uncovered three properties that made problem solving more difficult for participants: (a) cuts that were at an angle in relation to at least one plane of reference, (b) nonplanar features contained in the 3D visualizations, including curved layers and v-shaped layers, and (c) mixed combinations of layers. In contrast, (a) cutting planes that were perpendicular or parallel to the 3D visualization diagram's planes of reference, (b) internal features that were flat/planar, and (c) homogeneous layers were easier to comprehend. This research has direct implications for the generation and use of 3D information visualizations in that it suggests design features to include and avoid. Collaborative Computer-Supported Argument Visualization (CCSAV) has often been proposed as an alternative to more conventional, mainstream platforms for online discussion (e.g., online forums and wikis). CCSAV tools require users to contribute to the creation of a joint artifact (an argument map) instead of contributing to a conversation. In this paper we empirically assess the effects of this fundamental design choice and show that the absence of conversational affordances and socially salient information in representation-centric tools is detrimental to users' collaboration experience. We report empirical findings from a study in which subjects using different collaborative platforms (a forum, an argumentation platform, and a socially augmented argumentation tool) were asked to discuss and predict the price of a commodity. 
By comparing users' experience across several metrics, we found evidence that collaborative performance decreases gradually as we remove conversational interaction and other types of socially salient information. We interpret these findings through theories developed in conversational analysis (common ground theory) and communities of practice, and discuss design implications. In particular, we propose balancing the trade-off between knowledge reification and participation in representation-centric tools with the provision of social feedback and functionalities supporting meaning negotiation. We investigated a topic-based navigation guidance system in the World Health Organization portal, compared the link connection network and the semantic connection network derived from the guidance system, analyzed the characteristics of the 2 networks from the perspective of node centrality (in_closeness, out_closeness, betweenness, in_degree, and out_degree), and provide suggestions to optimize and enhance the topic-based navigation guidance system. A mixed research method combining social network analysis, clustering analysis, and inferential analysis was used. The clustering analysis results for the link connection network were quite different from those for the semantic connection network. There were significant differences between the link connection network and the semantic network in terms of density and centrality. Inferential analysis results show that there were no strong correlations between the centrality of a node and its topic information characteristics. Suggestions for enhancing the navigation guidance system are discussed in detail. Future research directions, such as applying the same research method presented in this study to other similar public health portals, are also included. The purpose of this study was to examine user tags that describe digitized archival collections in the field of humanities. 
A collection of 8,310 tags from a digital portal (Nineteenth-Century Electronic Scholarship, NINES) was analyzed to find out what attributes of primary historical resources users described with tags. Tags were categorized to identify which tags describe the content of the resource, the resource itself, and subjective aspects (e.g., usage or emotion). The study's findings revealed that over half were content-related; tags representing opinion, usage context, or self-reference, however, reflected only a small percentage. The study further found that terms related to genre or physical format of a resource were frequently used in describing primary archival resources. It was also learned that nontextual resources had lower numbers of content-related tags and higher numbers of document-related tags than textual resources and bibliographic materials; moreover, textual resources tended to have more user-context-related tags than other resources. These findings help explain users' tagging behavior and resource interpretation in primary resources in the humanities. Such information provided through tags helps information professionals decide to what extent indexing archival and cultural resources should be done for resource description and discovery, and understand users' terminology. Analyzing hyperlink patterns has been a major research topic since the early days of the web. Numerous studies reported uncovering rich information and methodological advances. However, very few studies thus far examined hyperlinks in the rapidly developing sphere of social media. This paper reports a study that helps fill this gap. The study analyzed links originating from tweets to the websites of 3 types of organizations (government, education, and business). Data were collected over an 8-month period to observe the fluctuation and reliability of the individual data set. Hyperlink data from the general web (not social media sites) were also collected and compared with social media data. 
The study found that the 2 types of hyperlink data correlated significantly and that analyzing the 2 together can help organizations see their relative strength or weakness in the two platforms. The study also found that both types of inlink data correlated with offline measures of organizations' performance. Twitter data from a relatively short period were fairly reliable in estimating performance measures. The timelier nature of social media data as well as the date/time stamps on tweets make this type of data potentially more valuable than that from the general web. This paper focuses on exploring the usage patterns and regularities of co-employment of various popular tags and their relationships with the activeness of users and the interest level of resources in social tagging. A hypernetwork for social tagging is constructed in which a tagging action is expressed as a hyperedge and the user, resource, and tag are expressed as nodes. Quantitative measures for the constructed hypernetwork are defined, including the hyperdegree and its distribution, the excess average hyperdegree, and the hyperdegree conditional probability distribution. Using the data set from Delicious, an empirical study was conducted. The empirical results show that multiple individual tags and one or very few popular tags are generally employed together in one tagging action, and the usage patterns and regularities of tags with varying popularity are correlated to both user activity and resource interest. The empirical results are further discussed and explained from the perspectives of tag functions and motivations. Finally, suggestions regarding the usage of various popular tags for both tagging users and service providers of social tagging are given. The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. 
The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard CIDOC Conceptual Reference Model (CRM) ontology for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules for Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources, and empirical derivation of context-driven RE rules for the recognition of semantic relationships in phrases of unstructured text. Instant messaging (IM) has become increasingly prevalent in social life. However, the use of IM at work remains controversial, owing to its unclear benefits to organizations. In this study, we employ media performance theories and the conceptual model of communicative ecology to examine the impact of an IM tool on shaping guanxi (i.e., interpersonal relationship) networks in the workplace. Specifically, we propose that IM has the potential to enhance guanxi networks by improving communication quality and building interlocutors' mutual trust in the workplace. The conceptual model is validated with 253 survey responses collected from working professionals in China. 
The data indicate that communication quality and mutual trust contribute to the development of guanxi networks. The theoretical and practical implications of the findings are discussed. Using Luhmann's communication framework, we examine the interaction implications for kindergarten to Grade 2 students using mathematics applications on four types of tablet computers. Research questions included what content is communicated between the child and the tablet computer and how engaged children are in the interaction. We found that mathematics application developers have focused on creating applications for the practice of a priori knowledge, rather than on creating instructional applications. Results show preliminary evidence that child-tablet communication is generally successful, but this success comes at the cost of richer, multimodal interactions. Tablet computer application developers are being cautious in offering a variety of options for children to interact with the devices, and we suggest that there is scope for broadening the modes of communicative interaction. Online Q&A services are information sources where people identify their information need, formulate the need in natural language, and interact with one another to satisfy their needs. Even though online Q&A has grown considerably in popularity in recent years and impacted information-seeking behaviors, we still lack knowledge about what motivates people to ask a question in online Q&A environments. Yahoo! Answers and WikiAnswers were selected as the test beds in the study, and a sequential mixed method employing an Internet-based survey, a diary method, and interviews was used to investigate user motivations for asking a question in online Q&A services. Cognitive needs were found to be the most significant motivation driving people to ask a question. 
Yet, it was found that other motivational factors (e.g., tension-free needs) also played an important role in user motivations for asking a question, depending on the asker's contexts and situations. Understanding motivations for asking a question could provide a general framework for conceptualizing the different contexts and situations of information needs in online Q&A. The findings have several implications not only for developing better question-answering processes in online Q&A environments, but also for gaining insights into the broader understanding of online information-seeking behaviors. Although Mendeley bookmarking counts appear to correlate moderately with conventional citation metrics, it is not known whether academic publications are bookmarked in Mendeley in order to be read or not. Without this information, it is not possible to give a confident interpretation of altmetrics derived from Mendeley. In response, a survey of 860 Mendeley users shows that it is reasonable to use Mendeley bookmarking counts as an indication of readership, because most (55%) users with a Mendeley library had read or intended to read at least half of their bookmarked publications. This was true across all broad areas of scholarship except the arts and humanities (42%). About 85% of the respondents also declared that they bookmark articles in Mendeley in order to cite them in their publications, but some also bookmark articles for use in professional (50%), teaching (25%), and educational (13%) activities. Of course, it is likely that most readers do not record articles in Mendeley, and so these data do not represent all readers. In conclusion, Mendeley bookmark counts seem to be indicators of readership, leading to a combination of scholarly impact and wider professional impact. We have developed and tested an evidence-based method for early-stage identification of scientific discoveries. 
Scholarly publications are analyzed to track and trace breakthrough processes as well as their impact on world science. The focus in this study is on the incremental discovery of the ubiquitin-mediated proteolytic system in the late 1970s by a small international team of collaborating researchers. Analysis of their groundbreaking research articles, all produced within a relatively short period of time, and the network of citing articles shows the cumulative effects of the intense collaboration within a small group of researchers working on the same subject. Using bibliographic data from the Web of Science database and the PATSTAT patents database in combination with expert opinions shows that these discoveries accumulated into a new technology. These first findings suggest that potential breakthrough discoveries can be identified at a relatively early stage by careful analysis of publication and citation patterns. A key impact the Internet is having on university teaching involves the new choices being provided because of open educational content. Wikipedia is a clear example of these new options. It is a gigantic open repository of knowledge, and it can also be considered a platform that facilitates collaboration in knowledge creation and dissemination. Our research objective is to understand what the main factors are that influence the teaching uses of Wikipedia among university faculty. Based on a technology acceptance model, and using data from a survey sent to all faculty members of the Universitat Oberta de Catalunya, we analyze the relationships within the internal and external constructs of the model. We found that both the perception of colleagues' opinions about Wikipedia and the perceived quality of the information in Wikipedia play a central role. These two constructs have a significant direct impact on the perceived usefulness of Wikipedia. 
This perceived usefulness, mediated by the behavioral intention to use Wikipedia, affects the actual use of the encyclopedia. The degree to which an individual considers it important to participate in open collaborative environments and the Web 2.0 profile of the faculty members also play an important role in our model. Statistics are essential to many areas of research, and individual statistical techniques may change the ways in which problems are addressed as well as the types of problems that can be tackled. Hence, specific techniques may tend to generate high-impact findings within science. This article estimates the citation advantage of a technique by calculating the average citation rank of articles using it within the issue of the journal in which they were published. Applied to structural equation modeling (SEM) and 4 related techniques in 3 broad fields, the results show citation advantages that vary by technique and broad field. For example, SEM seems to be more influential in all broad fields than the 4 simpler methods, with one exception, and hence seems to be particularly worth adding to statistical curricula. In contrast, Pearson correlation apparently has the highest average impact in medicine but the least in psychology. In conclusion, the results suggest that the importance of a statistical technique may vary by discipline and that even simple techniques can help to generate high-impact research in some contexts. We studied steroid research from 1935 to 1965 that led to the discovery of the contraceptive pill and cortisone. Bibliometric and patent file searches indicate that the Syntex industrial laboratory located in Mexico and the Universidad Nacional Autonoma de Mexico (UNAM) produced about 54% of the relevant papers published in mainstream journals, which in turn generated over 80% of the citations and, in the case of Syntex, all industrial patents in the field between 1950 and 1965. 
This course of events, which was unprecedented at that time in a developing country, was interrupted when Syntex moved its research division to the US, leaving Mexico with a small but productive research group in the chemistry of natural products. The objective of this article is to further the study of journal interdisciplinarity, or, more generally, knowledge integration at the level of individual articles. Interdisciplinarity is operationalized by the diversity of subject fields assigned to cited items in the article's reference list. Subject fields and subfields were obtained from the Leuven-Budapest (ECOOM) subject-classification scheme, while disciplinary diversity was measured taking variety, balance, and disparity into account. As diversity measure we use a Hill-type true diversity in the sense of Jost and Leinster-Cobbold. The analysis is conducted in 3 steps. In the first part, the properties of this measure are discussed, and, on the basis of these properties it is shown that the measure has the potential to serve as an indicator of interdisciplinarity. In the second part the applicability of this indicator is shown using selected journals from several research fields ranging from mathematics to social sciences. Finally, the often-heard argument, namely, that interdisciplinary research exhibits larger visibility and impact, is studied on the basis of these selected journals. Yet, as only 7 journals, representing a total of 15,757 articles, are studied, albeit chosen to cover a large range of interdisciplinarity, further research is still needed. The rapidly shifting ideological terrain of computing has a profound impact on Social Informatics's critical and empirical analysis of computerization movements. 
As these movements incorporate many of the past critiques concerning social fit and situational context leveled against them by Social Informatics research, more subtle and more deeply ingrained modes of ideological practice have risen to support movements of computerization. Among these, the current emphasis on the promises of data and data analytics presents the most obvious ideological challenge. In order to reorient Social Informatics in relation to these new ideological challenges, Louis Althusser's theory of ideology is discussed, with its implications for Social Informatics considered. Among these implications, a changed relationship between Social Informatics's critical stance and its reliance on empirical methods is advanced. Addressed at a fundamental level, the practice of Social Informatics comes to be reoriented in a more distinctly reflective and ethical direction. Over the last two decades, emerging countries located outside North America and Europe have reshaped the global economy. These countries are also increasing their share of the world's scientific output. This paper analyzes the evolution of BRICS (Brazil, Russia, India, China and South Africa) and G-7 countries' international scientific collaboration, and compares it with high-technology economic exchanges between 1995-1997 and 2010-2012. Our results show that BRICS scientific activities are enhanced by their high-technology exports and, to a larger extent, by their international collaboration with G-7 countries which remains, over the period studied, at the core of the BRICS scientific collaboration network. However, while high-technology exports made by most BRICS countries to G-7 countries have increased over the studied period, both the intra-BRICS high-technology flows and the intra-BRICS scientific collaboration have remained very weak. 
Despite a high level of interest in quantifying the scientific output of established researchers, there has been less focus on quantifying the performance of junior researchers. The available metrics that quantify a scientist's research output all utilize citation information, which often takes a number of years to accrue and thus disadvantages newer researchers (e.g., graduate students, postdoctoral researchers, new professors). Based on this critical limitation of existing metrics, we created a new metric of scientific output, the zp-index, which remedies this issue by utilizing journal quality rather than citation count in calculating an index of scientific output. Additionally, the zp-index takes authorship position into account by allocating empirically derived weights to each authorship position, so that first-authorship publications receive more credit than later authorship positions (Study 1). Furthermore, the zp-index has predictive validity equal to that of the number of publications, but does a better job of discriminating researchers' scientific output and may provide different information than the publication count (Study 2). Therefore, use of the zp-index in conjunction with the number of publications can provide a more accurate assessment of a new scientist's academic achievements. This study investigates the importance of collegiality (i.e., good colleagues) and the quality of human capital investment in fostering the development and growth of stars in the field of economics, where stardom is measured by receipt of the John Bates Clark Medal, arguably the second-most prestigious award in economics. We provide a vignette as a foundation for both qualitative and quantitative analysis using Egghe's g-index. 
Our results indicate that three institutions, namely Chicago, Harvard and MIT, with secondary consideration to Princeton, generally rank highest in fostering the growth and development of stars in the field of economics. A webmaster's decision to link to a webpage can be interpreted as a "vote" for that webpage. But how far does the parallel between linking and voting extend? In this paper, we prove several "linking theorems" showing that link-based ranking tracks importance on the web in the limit as the number of webpages grows, given independence and minimal linking competence. The theorems are similar in spirit to the voting, or jury, theorem famously attributed to the 18th century mathematician Nicolas de Condorcet. We argue that the linking theorems provide a fundamental epistemological justification for link-based ranking on the web, analogous to the justification that Condorcet's theorems bestow on majority voting as a basic democratic procedure. The analogy extends to the practical limitations facing both kinds of result, in particular due to limited voting/linking independence. However, we argue, referring to the theoretical developments inspired by the jury theorem, that some of the pessimism expressed in the webometrics literature regarding the possibility of a "theory of linking" may be unjustified. The present study connects the two academic disciplines of webometrics in information science and epistemic democracy in political science by showing how they share a common structure. As such, it opens up new possibilities for theoretical cross-fertilization and interdisciplinary transference of concepts and results. In particular, we show how the relatively young field of webometrics can benefit from the extensive and sophisticated literature on the Condorcet jury theorem. 
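The classical Condorcet jury theorem that the linking theorems above parallel can be stated compactly: with n independent voters (n odd, to avoid ties), each correct with probability p, the probability that the majority is correct is

```latex
P_n \;=\; \sum_{k=\lfloor n/2 \rfloor + 1}^{n} \binom{n}{k}\, p^{k} (1-p)^{\,n-k},
\qquad
\lim_{n \to \infty} P_n = 1 \quad \text{whenever } p > \tfrac{1}{2}.
```

Symmetrically, P_n tends to 0 when p < 1/2. The linking analogue replaces voters with webmasters and votes with links, subject to the same independence caveats the abstract notes for both settings.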
Authors tend to attribute manuscript acceptance to their own ability to write quality papers and simultaneously to blame rejections on negative bias in peer review, displaying a self-serving attributional bias. Here, a formal model provides rational explanations for this self-serving bias in a Bayesian framework. For high-ability authors in a very active scientific field, the model predictions are: (1) Bayesian-rational authors are relatively overconfident about their likelihood of manuscript acceptance, whereas authors who play the role of referees have less confidence in manuscripts of other authors; (2) if the final disposition of his or her manuscript is acceptance, the Bayesian-rational author almost surely attributes this decision more to his or her own ability; (3) when the final disposition is rejection, the Bayesian-rational author almost surely attributes this decision more to negative bias in peer review; (4) some rational authors do not learn as much from the critical reviewers' comments in the case of rejection as they should from the journal editor's perspective. In order to validate the model predictions, we present results from a survey of 156 authors. The participants in the experimental study are authors of articles published in Scientometrics from 2000 to 2012.

As federal programs are held more accountable for their research investments, the National Institute of Environmental Health Sciences (NIEHS) has developed a new method to quantify the impact of our funded research on the scientific and broader communities. In this article we review traditional bibliometric analyses, address challenges associated with them, and describe a new bibliometric analysis method, the Automated Research Impact Assessment (ARIA). ARIA taps into a resource that has only rarely been used for bibliometric analyses: references cited in "important" research artifacts, such as policies, regulations, clinical guidelines, and expert panel reports. 
The approach includes new statistics that science managers can use to benchmark contributions to research by funding source. This new method provides the ability to conduct automated impact analyses of federal research that can be incorporated in program evaluations. We apply this method to several case studies to examine the impact of NIEHS-funded research.

Bibliometric methods are important tools for evaluating scientific production. A bibliometric study was used to evaluate research about credit risk and bankruptcy. Studies of credit support techniques are relevant to the economic and social order in this knowledge field. Therefore, the aim of the current study is to identify and describe the application of multivariate data analysis techniques to credit risk and bankruptcy scenarios. The data presented here were collected from publications indexed in Thomson Reuters' Web of Science database between 1968 and 2014. The results corroborate information in the literature and in previous bibliometric reviews, as well as highlight other indications regarding the construction and development of research fields. Since the 1990s, neural networks have become prominent due to their increased use as a study object in publications. However, both discriminant analysis (J Finance 23(4): 589-609, 1968. doi: 10.2307/2978933) and logistic regression (J Account Res 18(1): 109-131, 1980. doi: 10.2307/2490395) are still often used, a fact that shows the tendency to find articles using more than one technique, or hybrid models, artificial intelligence techniques, and complex computer systems. This field appears to be multidisciplinary, with journals and Web of Science categories involving the business and economics, operational research, management, mathematics, data processing, engineering, and statistics fields. Another relevant finding was the increased number of publications on this subject published right after the 2008 crisis. 
I challenge a finding reported recently in a paper by Sotudeh et al. (Scientometrics, 2015. doi: 10.1007/s11192-015-1607-5). The authors argue that there is a citation advantage for those who publish Author-Pay Open Access (Gold Open Access) in journals published by Springer and Elsevier. I argue that the alleged advantage that the authors report for journals in the social sciences and humanities is an artifact of their method. The findings reported about the life sciences, the health sciences, and the natural sciences, on the other hand, are robust. But my finding underscores the fact that epistemic cultures in the social sciences and humanities are different from those in the other fields.

With the incorporation of universities' third mission into their traditional teaching and academic research activities, there has been a burgeoning literature on how the third mission influences universities' academic research, though there is little research on its impact on universities' traditional mission of teaching. This study thus intends to strengthen our understanding of the relationships among universities' three missions by examining the relationship between university-industry collaboration and university teaching performance. Thanks to a unique combined dataset covering 61 universities from 2009 to 2013, empirical results indicate that collaboration channels have distinct effects on teaching performance. Specifically, there is an inverted U-shaped relationship between academic commercialization and teaching performance, while there is a U-shaped relationship between academic engagement and teaching performance. Academic commercialization and engagement yield a combined positive effect on teaching.

In this study, we combine the specialization scores for publications and patents (the latter is a new indicator of cross-disciplinary engagement) to achieve more comprehensive navigation of the innovation trajectory of a technology. 
The patent specialization score draws upon counterpart research publication indicator concepts to measure patent diversity. Two nano-based technologies, Nano-enabled drug delivery (NEDD) and Graphene, provide contrasting explorations of the behavior of this indicator, alongside research publication indicators. Results show distinctive patterns for the two technologies and for the respective publication and patent indicators. NEDD research, as evidenced by publication and citation patterns, engages highly diverse research fields. In contrast, NEDD development, as reflected in International Patent Classification (IPC) codes, concentrates on relatively closely associated fields. Graphene presents the opposite picture, with closely linked disciplines contributing to research, but much more diverse fields of application for its patents. We suggest that analyzing the field diversity of research publications and patents together, employing both specialization scores, can offer fruitful insights into innovation trajectories. Such information can contribute to technology and innovation management and policy for such emerging technologies.

The main objective of this paper is to examine the effect of various proximity dimensions (geographical, cognitive, institutional, organizational, social and economic) on academic scientific collaborations (SC). The data used to capture SC consist of a set of coauthored articles published between 2006 and 2010 by universities located in the EU-15, indexed by the Science Citation Index (SCI Expanded) of the ISI Web of Science database. We link these data to institution-level information provided by the EUMIDA dataset. Our final sample consists of 240,495 co-authored articles from 690 European universities that featured in both datasets. Additionally, we also retrieved data on regional R&D funding from Eurostat. 
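Estimations of this kind commonly follow a gravity-style specification; an illustrative form (the symbols here are ours, not the paper's):

```latex
C_{ij} \;=\; \beta_0\, S_i^{\beta_1}\, S_j^{\beta_2}\,
\exp\!\Big(-\sum_{k} \gamma_k\, d^{(k)}_{ij}\Big)
```

where C_ij is the number of co-authored articles between universities i and j, S_i and S_j are size measures (e.g., academic staff or output), and d^(k)_ij is the distance between the two institutions along proximity dimension k (geographical, cognitive, institutional, organizational, social, economic). Taking logarithms yields a linear-in-parameters model estimable with standard count-data or least-squares techniques.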
Based on the gravity equation, we estimate several econometric models using aggregated data from all disciplines as well as separate data for Chemistry and Chemical Engineering, Life Sciences, and Physics and Astronomy. Our results provide evidence of the substantial role of geographical, cognitive, institutional, social and economic distance in shaping scientific collaboration, while the effect of organizational proximity seems to be weaker. Some differences in the relevance of these factors arise at the discipline level.

This paper analyzes the impact of several influencing factors on the scientific production of researchers. Time-related statistical models for the period 1996 to 2010 are estimated to assess the impact of research funding and other determinant factors on the quantity and quality of the scientific output of individual funded researchers in the Canadian natural sciences and engineering. Results confirm a positive impact of funding on the quantity and quality of publications. In addition, the existence of the Matthew effect is partially confirmed, such that the rich get richer. Although a positive relation between career age and the rate of publications is observed, it is found that career age negatively affects the quality of works. Moreover, the results suggest that young researchers who work in large teams are more likely to produce high-quality publications. We also found that even though academic researchers produce a higher quantity of papers, it is researchers with industrial affiliations whose work is of higher quality. Finally, we observed that strategic, targeted, and high-priority funding programs lead to a higher quantity and quality of publications.

This study investigated the influence of the principle of least effort (PLE) introduced by Zipf in his 1949 work, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. 
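Zipf's rank-frequency law, the best-known concept from that 1949 work, predicts that the item at rank r occurs with frequency proportional to 1/r^s; a minimal sketch (illustrative):

```python
def zipf_expected_counts(n_ranks, total_tokens, s=1.0):
    """Expected token counts under Zipf's law: the item at rank r
    has frequency proportional to 1 / r**s (s = 1 in the classic form)."""
    weights = [1.0 / r**s for r in range(1, n_ranks + 1)]
    norm = sum(weights)
    return [total_tokens * w / norm for w in weights]
```

With s = 1, the second-ranked item is expected to appear half as often as the first, the third one-third as often, and so on.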
The influence of the PLE was measured by examining articles across various disciplines published between 1950 and 2013 that cited Zipf's original work. Our findings show that Zipf's law was the most influential concept embedded in the original work, with the PLE being the second most cited concept. Although the PLE was the focus of Zipf's 1949 book, its influence was much lower than that of Zipf's law. Furthermore, Zipf's law showed an increasing influence over time, whereas a decreasing influence was observed for the PLE. Of the 31 disciplines citing the PLE, library and information science articles cited it most, followed by psychology articles. Articles primarily cited the PLE to explain human behavior and language use. However, articles citing Zipf's law focused on its formula.

This paper presents a Sciento-text framework to characterize and assess the research performance of leading world institutions in fine-grained thematic areas. While most of the popular university research rankings rank universities either on their overall research performance or on a particular subject, we have tried to devise a system to identify strong research centres at a more fine-grained level of research themes within a subject. The computer science (CS) research output of more than 400 universities in the world is taken as the case in point to demonstrate the working of the framework. The Sciento-text framework comprises standard scientometric and text analytics components. First, every research paper in the data is classified into different thematic areas in a systematic manner, and then standard scientometric methodology is used to identify and assess the research strengths of different institutions in a particular research theme (say, Artificial Intelligence for the CS domain). The performance of the framework components is evaluated and the complete system is deployed on the Web at www.universityselectplus.com. 
The framework is extendable to other subject domains with little modification.

We analyze the reaction of academic communities to a particular urgent topic which abruptly arises as a scientific problem. To this end, we have chosen the disaster that occurred in 1986 in Chornobyl (Chernobyl), Ukraine, considered one of the most devastating nuclear power plant accidents in history. The academic response is evaluated using scientific-publication data concerning the disaster, drawing on the Scopus database to present the picture on an international scale and the bibliographic database "Ukrainika naukova" to consider it on a national level. We measured distributions of papers in different scientific fields, their growth rates, and properties of co-authorship networks. Elements of descriptive statistics and tools of complex network theory are used to highlight the interdisciplinary as well as international effects. Our analysis allows comparison of the contributions of the international community to Chornobyl-related research as well as of the integration of Ukraine into international research on this subject. Furthermore, content analysis of the titles and abstracts of the publications allowed detection of the most important terms used to describe Chornobyl-related problems.

Link analysis is highly effective in detecting relationships between different institutions, relationships that are stronger the greater their geographical proximity. We therefore decided to apply an interlinking analysis to a set of geographically dispersed research entities and to compare the results with the co-authorship patterns between these institutions in order to determine how, and if, these two techniques might reveal complementary insights. We set out to study the specific sector of public health in Spain, a country with a high degree of regional autonomy. 
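Comparing the two techniques ultimately reduces to correlating two paired series over the same institutions or institution pairs; a minimal sketch of the Pearson correlation used for such comparisons (variable names are ours, not the study's):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired observations, e.g. the
    interlinking score and the co-authored-article count for each
    pair of institutions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```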
We recorded all Spanish health entities (and their corresponding URLs) that belong to, and were hyperlinked from, the national government or any of the regional governments, gathering a total of 263 URLs. After considering their suitability for webometric analysis, interlinking scores between all valid URLs were obtained. In addition, the number of articles co-authored by each pair of institutions and the total scientific output per institution were retrieved from Scopus. Both the interlinking and co-authorship methods detect the existence of strong subnets of geographically distributed nodes (especially the Catalan entities) as well as their high connectivity with the main national network nodes (the subnet of nodes distributed according to dependence on the national government, in this case Spain). However, the resulting interlinking pattern shows only a low, though significant, correlation (r = 0.5) with scientific co-authorship patterns. The existence of institutions that are strongly interlinked but with limited scientific collaboration (and vice versa) reveals that links within this network do not accurately reflect existing scientific collaborations, due to inconsistent web content development.

This paper provides new insights into the effects of the enlargement of the European Union (EU) and European integration by investigating the issue of scientific collaboration within the new EU member states vis-a-vis the old EU member states. The question addressed is whether EU membership following the two enlargement waves of 2004 and 2007 has significantly increased the co-publication intensity of the new member states with other member countries. 
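Whether accession shifted co-publication intensity can be framed as a difference-in-differences contrast between new member states and a comparison group; a minimal sketch of the point estimate (illustrative, not the paper's estimation code):

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences point estimate: the pre-to-post change
    for the treated group (e.g., mean co-publication intensity of new
    EU member states) minus the change for the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)
```

In practice such estimates are obtained within a regression so that standard errors and covariates can be handled, but the identifying contrast is the same double difference.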
The empirical results based on data collected from the Web of Science database and difference-in-differences estimations point towards the conclusion that joining the EU has indeed had an additional positive impact on the co-publication intensity between the new and old member states and, in particular, within the new member states themselves. These results give tentative support for the success of the EU's science policies in achieving a common 'internal market' in research. We also find evidence for early anticipation effects of the consecutive EU accessions.

Examines scientometrically the trends in and the recent situation of research on and the teaching of the history of psychology in the German-speaking countries and compares the findings with the situation in other countries (mainly the United States) by means of the psychology databases PSYNDEX and PsycINFO. Declines in publications on the history of psychology are described scientometrically for both research communities since the 1990s. Some impulses are suggested for the future of research on and the teaching of the history of psychology. These include (1) the necessity and significance of an intensified use of quantitative, unobtrusive scientometric methods in historiography in times of digital "big data"; (2) the necessity of, and possibilities for, integrating qualitative and quantitative methodologies in historical research and teaching; (3) the reasonableness of interdisciplinary cooperation of specialist historians, scientometricians, and psychologists; (4) the meaningfulness of, and necessity to, explore, investigate, and teach more intensively the past and the problem history of psychology, as well as the understanding of the subject matter of psychology in its historical development in cultural contexts. The outlook on the future of such a more up-to-date research on and teaching of the history of psychology is, with some caution, positive. 
Comparative benchmarking with bibliometric indicators can be an aid in decision-making with regard to research management. This study aims to characterize scientific performance in a domain (Public Health) by the institutions of a country (Cuba), taking as reference world output and regional output (other Latin American centers) during the period 2003-2012. A new approach is used here to assess to what extent the leadership of a specific institution can change its citation impact. Cuba was found to have a high level of specialization and scientific leadership that does not match the low international visibility of Cuban institutions. This leading output appears mainly in non-collaborative papers, in national journals; publication in English is very scarce and the rate of international collaboration is very low. The Instituto de Medicina Tropical Pedro Kourí stands out, alone, as a national reference. Meanwhile, at the regional level, Latin American institutions deserving mention for their high autonomy in normalized citation would include the Universidad de Buenos Aires (ARG), Universidade Federal de Pelotas (BRA), Consejo Nacional de Investigaciones Científicas y Técnicas (ARG), Instituto Oswaldo Cruz (BRA) and the Centro de Pesquisas René Rachou (BRA). We identify a crucial aspect that can give rise to misinterpretations of data: a high share of leadership cannot be considered positive for institutions when it is mainly associated with a high proportion of non-collaborative papers and a very low level of performance. Because leadership might be questionable in some cases, we propose future studies to ensure a better interpretation of findings.

The taxonomy considered here would be a crucial preliminary step towards a theory of S&T indicators. Its construction is different from that of other taxonomies in the field of innovation, since it has a methodological character, not a quantitative one. 
It serves as a caveat against hasty decisions, potentially damaging to progress on the study of indicators.

Based on our involvement in numerous consortia and projects with colleagues from low- and middle-income countries (LMICs), as well as on our extensive fieldwork experience in the global South, we have a shared concern about the actual inclusion of LMIC colleagues and institutions in coproducing highly valuable and policy-relevant science. While capacity building is stated as a major goal in various international research projects, especially those involving partners from LMICs or focusing on research activities in these countries, we think that research from established groups and universities, particularly in member countries of the Organisation for Economic Cooperation and Development (OECD), receives disproportionately more interest and respect. With the present submission, we hope to feed the debate on the academic valorization of research performed by LMIC scholars. Though difficult to measure, this merits close scrutiny.

An influx of mechanisms for the collection of large sets of data has prompted widespread consideration of the impact that data analytic methods can have on a number of disciplines. Having an established record of the use of a unique mixture of empirical methods, the work of understanding and designing for user behavior is well situated to take advantage of the advances claimed by "big data" methods. Beyond any straightforward benefit of the use of large sets of data, such an increase in the scale of empirical evidence has far-reaching implications for the work of empirically guided design. We develop the concept of "peak empiricism" to explain the new role that large-scale data comes to play in design, one in which data become more than a simple empirical tool. 
In providing such an expansive empirical setting for design, big data weakens the subjective conditions necessary for empirical insight, pointing to a more performative approach to the relationship between a designer and his or her work. In this, the work of design is characterized as "thinking with" the data in a partnership that weakens not only any sense of empiricism but also the agentive foundations of a classical view of design work.

We examined whether the microblog comments given by people after reading a web document could be exploited to improve the accuracy of a web document summarization system. We examined the effect of social information (i.e., tweets) on the accuracy of the generated summaries by comparing the user preference for TBS (tweet-biased summary) with GS (generic summary). The result of a crowdsourcing-based evaluation shows that the user preference for TBS was significantly higher than for GS. We also took random samples of the documents to see the performance of the summaries in a traditional evaluation using ROUGE, in which, in general, TBS was also shown to be better than GS. We further analyzed the influence of the number of tweets pointing to a web document on summarization accuracy, finding a positive moderate correlation between the number of tweets pointing to a web document and the performance of the generated TBS as measured by user preference. The results show that incorporating social information into the summary generation process can improve the accuracy of the summary. The reason for people choosing one summary over another in a crowdsourcing-based evaluation is also presented in this article.

User studies in the music information retrieval (MIR) domain tend to be exploratory and qualitative in nature, involving a small number of users, which makes it difficult to derive broader implications for system design. 
In order to fill this gap, we conducted a large-scale user survey questioning various aspects of people's music information needs and behaviors. In particular, we investigated whether general music users' needs and behaviors have significantly changed over time by comparing our current survey results with a similar survey conducted in 2004. In this paper, we present the key findings from the survey data and discuss 4 emergent themes: (a) the shift in access and use of personal music collections; (b) the growing need for tools to support collaborative music seeking, listening, and sharing; (c) the importance of "visual" music experiences; and (d) the need for ontologies for providing rich contextual information. We conclude by making specific recommendations for improving the design of MIR systems and services.

Finding a place of interest (e.g., a restaurant, hotel, or attraction) is often related to a group information need; however, the actual multiparty collaboration in such searches has not been explored, and little is known about its significance and related practices. We surveyed 100 computer science students and found that 94% (of respondents) searched for places online; 87% had done so as part of a group. Search for a place by multiple active participants was experienced by 78%, with group sizes typically being 2 or 3. Search occurred in a range of settings with both desktop PCs and mobile devices. Difficulties were reported with coordinating tasks, sharing results, and making decisions. The results show that finding a place of interest is a quite different group-based search than other multiparty information-seeking activities. The results suggest that local search systems, their interfaces, and the devices that access them can be made more usable for collaborative search if they include support for coordination, sharing of results, and decision making. 
Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval at solving ranking problems by placing the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.

Recruiting qualified reviewers, though challenging, is crucial for ensuring a fair and robust scholarly peer review process. We conducted a survey of 307 reviewers of submissions to the International Conference on Human Factors in Computing Systems (CHI 2011) to gain a better understanding of their motivations for reviewing. We found that encouraging high-quality research, giving back to the research community, and finding out about new research were the top general motivations for reviewing. We further found that relevance of the submission to a reviewer's research and relevance to the reviewer's expertise were the strongest motivations for accepting a request to review, closely followed by a number of social factors. 
Gender and reviewing experience significantly affected some reviewing motivations, such as the desire for learning and for preparing for higher reviewing roles. We discuss implications of our findings for the design of future peer review processes and systems to support them.

In this article we use a stated choice experiment to study researcher preferences in the information sciences and to investigate the relative importance of different journal characteristics in convincing potential authors to submit to a particular journal. The analysis distinguishes high-quality from standard-quality articles and focuses on the question of whether communicating acceptance rates rather than rejection rates leads to different submission decisions. Our results show that a positive framing effect might be present when authors decide on submitting a high-quality article. No evidence of a framing effect is found when authors consider a standard-quality article. From a journal marketing perspective, this is important information for editors. Communicating acceptance rates rather than rejection rates might help to convince researchers to submit to their journal.

The increasing popularity of the Internet and social media is creating new and unique challenges for parents and adolescents regarding the boundaries between parental control and adolescent autonomy in virtual spaces. Drawing on developmental psychology and Communication Privacy Management (CPM) theory, we conduct a qualitative study to examine the tension between parental concern for adolescent online safety and teens' desire to independently regulate their own online experiences. 
Analysis of 12 parent-teen pairs revealed five distinct challenges: (a) increased teen autonomy and decreased parental control resulting from teens' direct and unmediated access to virtual spaces, (b) the shift in power to teens, who are often more knowledgeable about online spaces and technology, (c) the use of physical boundaries by parents as a means to control virtual spaces, (d) an increase in indirect boundary control strategies such as covert monitoring, and (e) the blurring of lines in virtual spaces between parents' teens and teens' friends.

Much of the recent research on digital data repositories has focused on assessing either the trustworthiness of the repository or quantifying the frequency of data reuse. Satisfaction with the data reuse experience, however, has not been widely studied. Drawing from the information systems and information science literature, we developed a model to examine the relationship between data quality and data reusers' satisfaction. Based on a survey of 1,480 journal article authors who cited Inter-University Consortium for Political and Social Research (ICPSR) data in papers published from 2008 to 2012, we found that several data quality attributes (completeness, accessibility, ease of operation, and credibility) had significant positive associations with data reusers' satisfaction. There was also a significant positive relationship between documentation quality and data reusers' satisfaction.

In the rapidly evolving technology world, blogs have become a popular genre of communication. Their potential influence on decision making is the focus of the present research. Based on the interplay between Social Judgment Theory and Framing Theory, this study investigates whether the information delivered by technology blogs is treated differently during investment decision making than information from traditional financial newspapers in digital form, while containing information cues and text framing. 
Using an online experiment with a 3 x 2 design, this research compares the influence of this trio of variables on the investment decisions of 236 participants. Results indicate a complex investment decision-making process differing according to the type of medium presented, the text framing, the information cues, and the decision maker's background.

Tags generated by domain experts reaching a consensus under social influence reflect the core concepts of the tagged resource. Such tags can act as navigational cues that enable users to discover meaningful and relevant information in a Web 2.0 environment. This is particularly critical for nonexperts trying to understand formal academic or scientific resources, also known as hard content. The goal of this study was to develop a novel one-bit comparison (OBC) metric and to assess in what circumstances a set of tags describing a hard-content resource is mature and representative. We compared OBC with the conventional Shannon entropy approach to determine performance when distinguishing tags generated by domain experts and nonexperts in the early and later stages under social influence. The results indicated that OBC can accurately distinguish mature tags generated by a strong expert consensus from other tags, and that it outperforms Shannon entropy. The findings support tag-based learning, and provide insights and tools for the design of applications involving tags, such as tag recommendation and tag-based organization.

Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors would not affect research findings too much. 
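Initial-based disambiguation collapses each author name to a surname-plus-initial key; a minimal sketch (the 'Surname, Given' input format is an assumption of this sketch):

```python
def initial_based_key(full_name, all_initials=False):
    """Collapse an author name to a surname-plus-initial(s) key,
    the initial-based disambiguation heuristic."""
    surname, _, given = full_name.partition(",")
    parts = given.split()
    if all_initials:
        initials = "".join(p[0] for p in parts)
    else:
        initials = parts[0][0] if parts else ""
    return f"{surname.strip().lower()} {initials.lower()}"
```

Two distinct authors such as "Kim, Jinseok" and "Kim, Jaehyun" map to the same key "kim j", which is precisely the kind of incorrect merging at issue.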
This paper tests that assumption by analyzing coauthorship networks from five academic fields (biology, computer science, nanoscience, neuroscience, and physics) and an interdisciplinary journal, PNAS. Name instances in the data sets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use the disambiguated data as a proxy for ground truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties of coauthorship networks: it deflates the number of unique authors, number of components, average shortest paths, clustering coefficient, and assortativity, while it inflates average productivity, density, average coauthor number per author, and largest component size. Also, on average, more than half of the top 10 productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentifications by initial-based disambiguation due to their common surname and given-name initials.

In authorship attribution, various distance-based metrics have been proposed to determine the most probable author of a disputed text. In this paradigm, a distance is computed between each author profile and the query text. These values are then employed only to rank the possible authors. In this article, we analyze their distribution and show that we can model it as a mixture of 2 Beta distributions. Based on this finding, we demonstrate how we can derive a more accurate probability that the closest author is, in fact, the real author. To evaluate this approach, we have chosen 4 authorship attribution methods (Burrows' Delta, Kullback-Leibler divergence, Labbe's intertextual distance, and naive Bayes). As the first test collection, we downloaded 224 State of the Union addresses (from 1790 to 2014) delivered by 41 U.S. presidents. The second test collection is formed by the Federalist Papers. 
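Burrows' Delta, the first of the 4 methods listed, scores each candidate author by the mean absolute difference in z-scored word frequencies; a minimal sketch (assumes every author profile records a relative frequency for each word of the query profile):

```python
from statistics import mean, stdev

def burrows_delta(query_freqs, author_profiles):
    """Burrows' Delta: z-score each word's relative frequency across
    the candidate author profiles, then score every author by the mean
    absolute z-score difference from the query text (lower = closer)."""
    words = list(query_freqs)
    stats = {}
    for w in words:
        vals = [profile[w] for profile in author_profiles.values()]
        stats[w] = (mean(vals), stdev(vals))

    def z(freq, w):
        mu, sd = stats[w]
        return (freq - mu) / sd if sd else 0.0

    return {author: mean(abs(z(profile[w], w) - z(query_freqs[w], w))
                         for w in words)
            for author, profile in author_profiles.items()}
```

Distance values like these are exactly the kind whose distribution the article models as a mixture of 2 Beta distributions.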
The evaluations indicate that the accuracy rate of some authorship decisions can be improved. The suggested method can signal that the proposed assignment should be interpreted as possible, without strong certainty. Being able to quantify the certainty associated with an authorship decision can be a useful component when important decisions must be taken. Disseminated across the world in more than 100 languages and viewed over 1 billion times, TED Talks is a successful example of web-based science communication. This study investigates the impact of TED Talks videos on YouKu, a Chinese video portal, and YouTube using 6 measures of impact: number of views; likes; dislikes; comments; bookmarks; and shares. In particular, we study the relationship between the topicality and impact of these videos. Findings demonstrate that topics vary greatly in terms of their impact: Topics on entertainment and psychology/philosophy receive more views and likes, whereas design/art and astronomy/biology/oceanography attract fewer comments and bookmarks. Moreover, we identify several topical differences between YouKu and YouTube users. Topics on global issues and technology are more popular on YouKu, whereas topics on entertainment and psychology/philosophy are more popular on YouTube. By analyzing the popularity distribution of videos and the audience characteristics of YouKu, we find that women are more interested in topics on education and psychology/philosophy, whereas men favor topics on technology and astronomy/biology/oceanography. We investigate the contributions of particular disciplines, countries, and academic departments to the literature of library and information science (LIS) using data for the articles published in 31 journals from 2007 to 2012. In particular, we examine the contributions of authors outside the United States, the United Kingdom, and Canada; faculty in departments other than LIS; and practicing librarians. 
Worldwide, faculty in LIS departments account for 31% of the journal literature; librarians, 23%; computer science faculty, 10%; and management faculty, 10%. The top contributing nations are the United States, the United Kingdom, Spain, China, Canada, and Taiwan. Within the United States and the United Kingdom, the current productivity of LIS departments is correlated with past productivity and with other measures of reputation and performance. More generally, the distribution of contributions is highly skewed. In the United States, five departments account for 27% of the articles contributed by LIS faculty; in the United Kingdom, four departments account for nearly two-thirds of the articles. This skewed distribution reinforces the possibility that high-status departments may gain a permanent advantage in the competition for students, faculty, journal space, and research funding. At the same time, concentrations of research-active faculty in particular departments may generate beneficial spillover effects. Images convey essential information in science, technology, engineering, and mathematics communication. Current guidelines on publishing recommend making images accessible to all readers. However, academic publishers do not always follow these guidelines and therefore fail to guarantee access by all readers to the visual content of academic articles. People with severe visual impairments cannot access the visual content of images unless a text alternative describing the images is provided. This study investigates the current use of texts commonly related to images in academic articles, such as captions and mentions, in order to assess their suitability as potential text alternatives to the images for readers who are blind or have severe low vision. A sample of 30 academic articles in the fields of biomedicine, computer science, and mathematics was analyzed and quantitative and qualitative data were collected about images and their related texts. 
We suggest a practical and sustainable solution that can foster the adoption of good accessibility practices by authors and publishers and facilitate their inclusion in regular publishing workflows. Omitted citations, i.e., missing links between a cited paper and the corresponding citing papers, are a consequence of several bibliometric-database errors. To reduce these errors, databases may undertake two actions: (1) improving the control of the (new) papers to be indexed, i.e., limiting the introduction of "new" dirty data, and (2) detecting and correcting errors in the papers already indexed by the database, i.e., cleaning "old" dirty data. The latter action is probably more complicated, as it requires the application of suitable error-detection procedures to a huge amount of data. Based on an extensive sample of scientific papers in the Engineering-Manufacturing field, this study focuses on old dirty data in the Scopus and WoS databases. For this purpose, a recent automated algorithm for estimating the omitted-citation rate of databases is applied to the same sample of papers, but in three different time sessions. A database's ability to clean the old dirty data is evaluated considering the variations in the omitted-citation rate from session to session. The major outcomes of this study are that: (1) both databases slowly correct old omitted citations, and (2) a small portion of initially corrected citations can, surprisingly, disappear from the databases over time. The study sought to explore the underlying factors that influence research collaboration in Library and Information Science (LIS) schools in South Africa. The population for the study consisted of 85 academic teaching staff employed by LIS schools in South African universities. A survey design was used to obtain data for the study, through a questionnaire containing open- and close-ended questions.
A total of 85 teaching staff in 10 LIS schools in South Africa were alerted, through email, to the location of the Web-based questionnaires, developed using the Stellarsurvey software. A total of 51 questionnaires were completed and returned for analysis. The findings suggest that factors such as networking, sharing of resources, enhancing productivity, educating students, overcoming intellectual isolation, and accomplishment of projects in a short time, as well as learning from peers, influenced research collaboration in LIS in South Africa. Factors that are likely to hinder effective collaboration in LIS research include bureaucracy, lack of funding, lack of time, as well as physical distance between researchers. The findings further suggest that even though there are drawbacks to collaboration, the majority of LIS researchers thought that collaboration is beneficial and should be encouraged. This article investigates the developments during the last decades in the use of languages, publication types and publication channels in the social sciences and humanities (SSH). The purpose is to develop an understanding of the processes of internationalization and to apply this understanding in a critical examination of two often-used general criteria in research evaluations in the SSH. One of them is that the coverage of a publication in Scopus or Web of Science is seen in itself as an expression of research quality and of internationalization. The other is that a specific international language, English, and a specific type of publication, journal articles, are perceived as supreme in a general hierarchy of languages and publication types. Simple distinctions based on these criteria are contrary to the heterogeneous publication patterns needed in the SSH to organize their research adequately, present their results properly, reach their audiences efficiently, and thereby fulfil their missions.
Research quality, internationalization, and societal relevance can be promoted in research assessment in the SSH without categorical hierarchies of publications. I demonstrate this using data on scholarly publishing in the SSH that go beyond the coverage of the commercial data sources, in order to give a more comprehensive representation of the field. It is important to identify the most appropriate statistical model for citation data in order to maximise the potential of future analyses as well as to shed light on the processes that may drive citations. This article assesses stopped sum models and some variants and compares them with two previously used models, the discretised lognormal and the negative binomial, using the Akaike Information Criterion (AIC). Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which were otherwise the best (with respect to AIC). However, very large standard errors were returned for some of these variant models, indicating the imprecision of the estimates and the impracticality of the approach. Hence, although the stopped sum variant models show some promise for citation analysis, they are only recommended when they fit better than the alternatives and have manageable standard errors. Nevertheless, their good fit to citation data gives evidence that two different, but related, processes may drive citations. Scientific workflows organize the assembly of specialized software into an overall data flow and are particularly well suited for multi-step analyses using different types of software tools. They are also favorable in terms of reusability, as previously designed workflows can be made publicly available through the myExperiment community and then used in other workflows. We here illustrate how scientific workflows, and the Taverna workbench in particular, can be used in bibliometrics.
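The stopped sum and discretised lognormal models above need numerical optimisation, but the AIC selection step itself is simple to sketch. The following hedged illustration compares two much simpler count models with closed-form maximum-likelihood estimates (Poisson and a geometric on {0, 1, 2, ...}); the citation counts are hypothetical, and this is not the article's model set:

```python
import math

def poisson_loglik(data, lam):
    # log P(K = k) = k*log(lam) - lam - log(k!)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

def geometric_loglik(data, p):
    # Geometric on {0, 1, 2, ...}: P(K = k) = (1 - p)^k * p
    return sum(k * math.log(1 - p) + math.log(p) for k in data)

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L); lower is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical citation counts: many zeros, one heavy-tail value.
citations = [0, 0, 0, 1, 1, 2, 3, 5, 8, 21]

mean = sum(citations) / len(citations)
aic_pois = aic(poisson_loglik(citations, mean), 1)              # MLE: lam = mean
aic_geom = aic(geometric_loglik(citations, 1 / (1 + mean)), 1)  # MLE: p = 1/(1+mean)

best = "geometric" if aic_geom < aic_pois else "poisson"
print(best)  # the lower-AIC model is preferred
```

The same comparison logic applies to the article's models; AIC simply trades fit (log-likelihood) against parameter count.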
We discuss the specific capabilities of Taverna that make this software a powerful tool in this field, such as automated data import via Web services, data extraction from XML by XPaths, and statistical analysis and visualization with R. The support of the latter is particularly relevant, as it allows integration of a number of recently developed R packages specifically for bibliometrics. Examples are used to illustrate the possibilities of Taverna in the fields of bibliometrics and scientometrics. Having combined data on Quebec scientists' funding and journal publication, this paper tests the effect of holding a research chair on a scientist's performance. The novelty of this paper is to use a matching technique to understand whether holding a research chair contributes to a better scientific performance. This method compares two different sets of regressions which are conducted on different data sets: one with all observations and another with only the observations of the matched scientists. A chair and a non-chair scientist are deemed matched with each other when they have the closest propensity score in terms of gender, research field, and amount of funding. The results show that holding a research chair is a significant scientific productivity determinant in the complete data set. However, when only matched scientists are kept in the data set, holding a Canada research chair has a significant positive effect on scientific performance but other types of chairs do not have a significant effect. In other words, in the case of two similar scientists in terms of gender, research funding, and research field, only holding a Canada research chair significantly affects scientific performance. This article examines the rise in co-authorship in the Social Sciences over a 34-year period.
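The chair/non-chair matching described above pairs each chair holder with the non-chair scientist whose propensity score is closest. Estimating the scores themselves requires a logistic regression on gender, field and funding, so this hedged sketch assumes the scores are already computed and shows only a greedy 1:1 nearest-neighbour matching step; all identifiers and score values are hypothetical:

```python
def match_nearest(chairs, non_chairs):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    chairs / non_chairs: dicts mapping scientist id -> propensity score.
    Each chair holder is paired with the still-unmatched non-chair
    scientist whose score is closest."""
    pairs = []
    available = dict(non_chairs)
    for cid, score in sorted(chairs.items(), key=lambda kv: kv[1]):
        best = min(available, key=lambda nid: abs(available[nid] - score))
        pairs.append((cid, best))
        del available[best]  # each control is used at most once
    return pairs

# Hypothetical propensity scores (would come from gender, field, funding).
chairs = {"c1": 0.81, "c2": 0.35}
non_chairs = {"n1": 0.30, "n2": 0.79, "n3": 0.55}

print(match_nearest(chairs, non_chairs))
```

The regressions in the study are then re-run on just the matched pairs, which is what isolates the Canada-research-chair effect.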
It investigates the development of co-authorship in different research fields and discusses how the methodological differences between these research fields, together with changes in academia, affect the tendency to co-author articles. The study is based on bibliographic data about 4.5 million peer-reviewed articles published in the period 1980-2013 and indexed in the 56 subject categories of the Web of Science's Social Science Citation Index. The results show a rise in the average number of authors and in the share of co-authored and internationally co-authored articles in the majority of the subject categories. However, the results also show that there are great disciplinary differences in the extent of the rises in co-authorship. The subject categories with a great share of internationally co-authored articles have generally experienced an increase in co-authorship, but increasing international collaboration is not the only factor influencing the rise in co-authorship. Hence, the most substantial rises have occurred in subject categories where the research is often based on the use of experiments, large data sets, statistical methods and/or team-production models. Funding is one of the crucial drivers of scientific activities. The increasing number of researchers and the limited financial resources have caused tight competition among scientists to secure research funding. On the other hand, it is now even harder for funding allocation organizations to select the most appropriate researchers. Indicators based on the number of publications and on citation counts are the most common methods in the literature for analyzing the performance of researchers. However, these indicators are highly correlated with the career age and reputation of the researchers, since they accumulate over time. This makes it almost impossible to evaluate the performance of a researcher based on the quantity and impact of his/her articles at the time of publication.
This article proposes an intelligent machine learning framework for scientific evaluation of researchers (iSEER). iSEER may help decision makers to better allocate the available funding to distinguished scientists by providing fair comparative results, regardless of the career age of the researchers. Our results show that iSEER performs well in predicting the performance of researchers with high accuracy, as well as in classifying them based on collaboration patterns, research performance, and efficiency. This paper presents an analysis of the scientific research output of the republics of former Yugoslavia for the period 1970-2014. Thomson Reuters' Web of Science database was used for data acquisition and 223,135 publications have been analyzed. The Yugoslav Wars were ethnic conflicts fought from 1991 to 1999 on the territory of former Yugoslavia which accompanied the breakup of the country; today each republic of former Yugoslavia is an independent country, as is the province of Kosovo. Results of the analysis are represented by four figures depicting relative publication share and four figures depicting normalized cooperation score for each former Yugoslav republic and the province of Kosovo, as well as by four figures depicting cooperation networks between the former Yugoslav republics and the province of Kosovo for the periods before the Yugoslav wars (from 1970 until 1990), during the wars (from 1991 until 1999), in the first decade after the wars (from 2000 until 2009), and in the last 5 years (from 2010 until 2014). The impact of the wars on scientific cooperation in the republics has been studied. The literature considers a journal's h-index as a comparative measure; however, the relation between the institutional-level h-index (IHI) and journal-related indices (JRI) has not been explored in previous studies at the meso level of research assessment.
This study applies a scientometric approach to meso-level data, examining association, functional relationships and correlations to evaluate the reliability of the IHI with respect to the JRI. For this purpose, data from the Web of Science, the Journal Citation Reports and times cited features were used. The unit of analysis was Malaysian engineering research, with a wider time span of 10 years of data (2001-2010) and a larger set of journals (1381). We explored the inter-correlation of the IHI with a set of eight JRI and applied principal component analysis, regression analysis, and correlation. At the institutional level, the component analysis and functional relationship of the cumulative impact factor and the cumulative 5-year impact factor yielded a stronger association with the IHI. The cumulative impact factor is a strong predictor for the IHI, followed by the cumulative 5-year impact factor. Correlation matrix results show that the average impact factor (AIF) is correlated with the immediacy index and Eigenfactor only. The AIF and the median impact factor (MIF) have no correlation with each other, and the IHI is correlated with all indices except the AIF and MIF. This study puts forward a better understanding of considering new impact indices at the meso level for performance evaluation purposes. This study aimed to investigate technological evolution from the perspective of US Patent Classification (USPC) reclassification. Similar to the revisions of the Dewey Decimal Classification, a commonly used library classification scheme, USPC reclassification takes the form of creating, abolishing or modifying USPC class schedules. The results showed that there exist significant differences among five types of patents based on USPC reclassification: Patents reclassified to Class 001 (classification undetermined), Patents with Technological Inter-field Mobilised Codes, Patents with Technological Intra-field Mobilised Codes, Patents with Abolished Codes, and Patents with Original Codes.
Patents reclassified to Class 001, mostly related to the topic of "Data processing", performed better than other patents in novelty, linkage to science, technological complexity and innovative scope. Patents with Inter-field Mobilised Codes, related to the topics of "Data processing: measuring, calibrating, or testing" and "Optical communications", involved broader technology topics but had a low speed of innovation. Patents with Intra-field Mobilised Codes, mostly in the Computers & Communications and Drugs & Medical fields, tended to have little novelty and a small innovative scope. Patents with Abolished Codes and patents with Original Codes performed similarly: their values on the patent indicators were low. It is suggested that future research extend the patent sample to subclasses or reclassified secondary USPCs in order to understand the technological evolution within a field in greater detail. This study used the US Patent Application Database to identify who files provisional applications in the United States. Preference ratios, use ratios, and provisional-to-nonprovisional application ratios were used to evaluate the filing behavior of applicants in filing provisional applications with respect to nonprovisional applications. Factors encouraging the filing of provisional applications include the possibility of obtaining an earlier filing date, a longer patent term, and an earlier promoting opportunity. Factors discouraging the filing of provisional applications include the eventual higher cost of filing nonprovisional applications and the additional requirements for foreign applicants to file patent applications in the United States. These factors are discussed in this paper to explain the filing behavior of applicants. Applicants from the United States, Israel, and Canada were more likely to file provisional applications than applicants from other countries.
We propose that the English ability of the applicants and additional requirements for foreign applicants might be the cause of this result. Applicants in the category of Drugs and Medical were more likely to file provisional applications than applicants in other categories. We propose that the possibility for obtaining an earlier filing date and a longer patent term might be the cause of this result. This paper aims to assess the diffusion and adoption of nanotechnology knowledge within the Turkish scientific community using social network analysis and bibliometrics. We retrieved a total of 10,062 records of nanotechnology papers authored by Turkish researchers between 2000 and 2011 from Web of Science and divided the data set into two 6-year periods. We analyzed the most prolific and collaborative authors and universities on individual, institutional and international levels based on their network properties (e.g., centrality) as well as the nanotechnology research topics studied most often by the Turkish researchers. We used co-word analysis and mapping to identify the major nanotechnology research fields in Turkey on the basis of the co-occurrence of words in the titles of papers. We found that nanotechnology research and development in Turkey is on the rise and its diffusion and adoption have increased tremendously thanks to the Turkish government's decision a decade ago identifying nanotechnology as a strategic field and providing constant support since then. Turkish researchers tend to collaborate within their own groups or universities and the overall connectedness of the network is thus low. Their publication and collaboration patterns conform to Lotka's law. They work mainly on nanotechnology applications in Materials Sciences, Chemistry and Physics, among others. This is commensurate, more or less, with the global trends in nanotechnology research and development. 
In bibliometrics, interdisciplinarity is often measured in terms of the "diversity" of research areas in the references that an article cites. The standard indicators used are borrowed mostly from other research areas, notably from ecology (biodiversity measures) and economics (concentration measures). This paper argues that while the measures used in biodiversity research have evolved over time, the interdisciplinarity indicators used in bibliometrics can be mapped to a subset of biodiversity measures from the first and second generations. We discuss the third generation of biodiversity measures and especially the Leinster-Cobbold diversity indices (LCDiv) (Leinster and Cobbold in Ecology 93(3):477-489, 2012). We present a case study based on a previously published dataset of interdisciplinarity study in the field of bio-nano science (Rafols and Meyer in Scientometrics 82(2):263-287, 2010). We replicate the findings of this study to show that the various interdisciplinarity measures are in fact special cases of the LCDiv. The paper discusses some interesting properties of the LCDiv which make them more appealing in the study of disciplinary diversity than the standard interdisciplinary diversity indicators. Author bibliographic coupling (ABC) is extended from the bibliographic coupling concept and holds the view that two authors with more common references are more related and have more similar research interests. This study aims to examine the association between author bibliographic coupling strength and citation exchange in eighteen subject areas. The results show that there is no significant difference in the associations found across the subject areas. The correlation is positive and significant between the two factors in all subject areas, although it is stronger in some subject areas, such as Biomedical Engineering, than in others. 
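The Leinster-Cobbold indices discussed above generalise the familiar diversity measures by incorporating a similarity matrix Z between categories. The following is a hedged sketch of the order-q formula for q != 1, D = (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)); the reference shares and similarity values are hypothetical. With Z as the identity and q = 2 it reduces to inverse-Simpson diversity, one of the earlier-generation measures the paper maps onto this family:

```python
def leinster_cobbold(p, Z, q):
    """Leinster-Cobbold diversity of order q (q != 1) with similarity
    matrix Z: D = (sum_i p_i * (Zp)_i**(q-1)) ** (1 / (1 - q))."""
    zp = [sum(Z[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    return sum(pi * zpi ** (q - 1) for pi, zpi in zip(p, zp)) ** (1 / (1 - q))

# Hypothetical reference shares of an article over three research areas.
p = [0.5, 0.3, 0.2]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Z = I, q = 2: reduces to inverse-Simpson diversity, 1 / sum(p_i^2).
print(leinster_cobbold(p, identity, 2))

# Treating areas 1 and 2 as closely related (similarity 0.8)
# lowers the effective interdisciplinary diversity.
similar = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(leinster_cobbold(p, similar, 2))
```

This similarity-sensitivity is the property that makes the third-generation measures appealing for disciplinary diversity: citing two nearly identical fields should count for less than citing two distant ones.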
For a closer investigation of the association between bibliographic coupling strength and citations exchanged between pairs of authors, and also of ABC networks, a sample of highly cited authors in one of the subfields of Information Science & Library Science, imetrics, was taken into account. The correlation is also highly significant among imetricians. This finding confirms Merton's norm of universalism versus constructivists' particularism. The investigation of thirty highly cited imetricians shows that Thelwall, M. is in strong bibliographic coupling and citation relationships with the majority of the authors in the network. He and Bar-Ilan have the strongest ABC and citation relationships in the network. Rousseau, R., Glanzel, W., Bornmann, L., Bar-Ilan, J., and Leydesdorff, L. are also in strong ABC relationships with each other as well as with other authors in the network. Advances in publication-level classification systems have demonstrated striking results by dealing properly with emergent, complex and interdisciplinary research areas, such as nanotechnology and nanocellulose. However, less attention has been paid to proposing a delineation method to retrieve relevant research areas on specific subjects. This study aims at proposing a procedure to delineate research areas, using nanocellulose as a case. We investigate how a bibliometric analysis could provide interesting insights into research about this sustainable nanomaterial. The research topics clustered by a publication-level classification system were used. The procedure involves an iterative process, which includes developing and cleaning a set of core publications regarding the subject and an analysis of the clusters they are associated with. Nanocellulose was selected as the subject of study, but the methodology may be applied to any other research area or topic. A discussion of each step of the procedure is provided.
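The author bibliographic coupling strength examined earlier has a very direct operationalisation: the number of cited references two authors' bodies of work share. A minimal sketch, with hypothetical reference keys:

```python
def abc_strength(refs_a, refs_b):
    """Author bibliographic coupling strength: the number of cited
    references the two authors' oeuvres have in common."""
    return len(set(refs_a) & set(refs_b))

# Hypothetical reference lists aggregated over each author's papers.
author_a = ["price1963", "garfield1955", "merton1968", "lotka1926"]
author_b = ["garfield1955", "merton1968", "zipf1949"]
author_c = ["nalimov1969", "zipf1949"]

print(abc_strength(author_a, author_b))  # 2 shared references
print(abc_strength(author_a, author_c))  # 0 shared references
```

In practice the raw count is often normalised (e.g., by the sizes of the two reference sets) before being correlated with citation exchange, as in the study above.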
The proposed delineation procedure enables us to retrieve relevant publications from research areas involving nanocellulose. Seventeen research topics were mapped and associated with current research challenges on nanocellulose. In this paper, we study the influence of path dependencies on the development of an emerging technology in a transitional economy. Our focus is the development of nanotechnology in Russia in the period between 1990 and 2012. By examining outputs, publication paths and collaboration patterns, we identify a series of factors that help to explain Russia's limited success in leveraging its ambitious national nanotechnology initiative. The analysis highlights four path-dependent tendencies of Russian nanotechnology research: publication pathways and the gatekeeping role of the Russian Academy of Sciences; increasing geographical and institutional centralisation of nanotechnology research; limited institutional diffusion; and patterns associated with the internationalisation of Russian research. We discuss policy implications related to path dependence, nanotechnology research in Russia and the broader reform of the Russian science system. The paper describes a method to combine the information on the number of citations and the relevance of the publishing journal (as measured by the Impact Factor or similar impact indicators) of a publication to rank it with respect to the world scientific production in the specific subfield. The linear or non-linear combination of the two indicators is represented on the scatter plot of the papers in the specific subfield in order to immediately visualize the effect of a change in weights. The final rank of the papers is therefore obtained by partitioning the two-dimensional space through linear or higher-order curves. The procedure is intuitive and versatile since it allows, after adjusting a few parameters, an automatic and calibrated assessment at the level of the subfield.
The derived evaluation is homogeneous among different scientific domains and can be used to assess the quality of research at the departmental (or higher) levels of aggregation. We apply this method, which is designed to be feasible on a scale typical of a national evaluation exercise and to be effective in terms of cost and time, to some instances of the Thomson Reuters Web of Science database and discuss the results in view of what was done recently in Italy for the Evaluation of Research Quality exercise 2004-2010. We show how the main limitations of the bibliometric methodology used in that context can be easily overcome. For academic book authors and the institutions assessing their research performance, the relevance of books is undisputed. In spite of this, the absence of comprehensive international databases covering the items and information needed for the assessment of this type of publication has urged several European countries to develop custom-built information systems for the registration of scholarly books, as well as weighting and funding allocation procedures. For the first time, these systems make the assessment of books as a research output feasible. The present paper summarizes the main features of the registration and/or assessment systems developed in five European countries/regions (Spain, Denmark, Flanders, Finland and Norway), focusing on the processes involved in the collection and processing of data on book publications, their weighting, as well as their application in the context of research assessment and funding. Despite many empirical studies having been carried out on examiner patent citations, few have scrutinized the obstacles to prior art searching when adding patent citations during patent prosecution at patent offices. This analysis takes advantage of the longitudinal gap between an International Search Report (ISR) as required by the Patent Cooperation Treaty (PCT) and subsequent national examination procedures.
We investigate whether several kinds of distance actually affect the probability that prior art is detected at the time of an ISR; this occurs much earlier than in national-phase examinations. Based on triadic PCT applications between 2002 and 2005 for the trilateral patent offices (the European Patent Office, the US Patent and Trademark Office, and the Japan Patent Office) and their family-level citations made by the trilateral offices, we find evidence that geographical distance negatively affects the probability of capture of prior patents in an ISR. In addition, the technological complexity of an application negatively affects the probability of capture, whereas the volume of forward citations of prior art affects it positively. These results demonstrate the presence of obstacles to searching at patent offices, and suggest ways to design work sharing by patent offices, such that the duplication of search costs arises only when patent office search horizons overlap. In this study, we explore the citedness of research data, its distribution over time and its relation to the availability of a digital object identifier (DOI) in the Thomson Reuters database Data Citation Index (DCI). We investigate whether cited research data "impacts" the (social) web, as reflected by altmetrics scores, and whether there is any relationship between the number of citations and the sum of altmetrics scores from various social media platforms. Three tools are used to collect altmetrics scores, namely PlumX, ImpactStory, and Altmetric.com, and the corresponding results are compared. We found that, of the three altmetrics tools, PlumX has the best coverage. Our experiments revealed that research data remain mostly uncited (about 85%), although there has been an increase in citing data sets published since 2008. The percentage of cited research data with a DOI in the DCI has decreased in recent years.
Only nine repositories are responsible for research data with DOIs and two or more citations. The number of cited research data with altmetrics "footprints" is even lower (4-9%) but shows a higher coverage of research data from the last decade. In our study, we also found no correlation between the number of citations and the total number of altmetrics scores. Yet certain data types (i.e., survey, aggregate data, and sequence data) are more often cited and also receive higher altmetrics scores. Additionally, we performed citation and altmetric analyses of all research data published between 2011 and 2013 in four different disciplines covered by the DCI. In general, these results correspond very well with the ones obtained for research data cited at least twice and also show low numbers both in citations and in altmetrics. Finally, we observed that there are disciplinary differences in the availability and extent of altmetrics scores. Diachronous studies of obsolescence have categorized articles into three general types: "flashes in the pan", "sleeping beauties" and "normal articles". These studies used either quartiles or averages to define thresholds on sleeping and awakening periods. However, such average- and quartile-based criteria are sometimes less effective in distinguishing "flashes in the pan" and "sleeping beauties" from normal articles due to the arbitrariness of the manner in which the thresholds are determined. In this investigation, we propose a vector for measuring the obsolescence of scientific articles as an alternative to these criteria. The obsolescence vector is designed as O = (G(s), A(-)), with G(s) as a parameter affecting the shape of citation curves and A(-) as a parameter detecting drastic fluctuation of citation curves. We collected 50,789 articles authored by Nobel laureates during 1900-2012. Applying our criteria to this dataset, we compared the obsolescence vector with the average- and quartile-based criteria.
Our findings show that the proposed obsolescence vector is different from and serves as an alternative to the average- and quartile-based criteria. The research area of scientometrics began during the second half of the nineteenth century. After decades of growth, the international field of scientometrics has become increasingly mature. This study intends to understand the evolution of the collaboration network in the field of scientometrics. The growth of the discipline is divided into three stages: the first period (1987-1996), the second period (1997-2006), and the third period (2007-2015). Macro-level, meso-level, and micro-level network measures across the time periods are compared. Macro-level analyses show that the degree distribution of the collaboration network in each time span is consistent with a power law, and that both the average degree and average distance steadily increase with time. From the meso-level perspective, the increase in the number of clusters in the collaboration networks suggests the emergence of more collaborative fields in scientometrics. Moreover, the growth of the size of primary clusters demonstrates the expansion of the research fields and the collaboration range. Micro-level structure analyses identify the authors/researchers with high performance in raw degree measure, degree centrality measure, and betweenness measure, all of which are dynamic across different time spans. From three dimensions (raw degree, degree centrality, and betweenness centrality), the collaboration dominators are identified for each time span. Charles Dotter has been described as the father of interventional radiology, a medical specialty born at the interface between radiology and cardiology. Before 1979, it was relatively difficult to find citations to a landmark paper that Dotter had first published in 1964, qualifying this study, from a scientometric perspective, as a sleeping beauty. Sleeping beauties are texts that suffer from delayed recognition. 
The present paper explores the Dotter case study's bibliometric characteristics while analyzing the usefulness of the Van Raan criteria for defining sleeping beauties in science. Citation network analysis using CitNetExplorer has proven helpful in identifying the "Prince" in this fairy tale. The duration of sleep is viewed here as a period of restlessness marked by science and social controversies that are often documented in publication databases using a wide range of bibliographic references. Hence the idea of introducing, alongside the sleeping beauty construct, the notion of "restless sleep". These observations should open new avenues in identifying sleeping beauties while nurturing scientific controversy studies revolving around the use of scientometric approaches. In this paper we investigate the problem of university classification and its relation to ranking practices in the policy context of an official evaluation of Romanian higher education institutions and their study programs. We first discuss the importance of research in the government-endorsed assessment process and analyze the evaluation methodology and the results it produced. Based on official documents and data we show that the Romanian classification of universities was implicitly hierarchical in its conception and therefore also produced hierarchical results due to its close association with the ranking of study programs and its heavy reliance on research outputs. Then, using a distinct dataset on the research performance of 1385 faculty members working in the fields of political science, sociology and marketing we further explore the differences between university categories. We find that our alternative assessment of research productivity, measured with the aid of Hirsch's (Proc Natl Acad Sci 102(46): 16569-16572, 2005) h-index and Egghe's (Scientometrics 69(1): 131-152, 2006) g-index, only provides empirical support for a dichotomous classification of Romanian institutions. 
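The h-index and g-index used in the Romanian study above have compact standard definitions on a researcher's list of per-paper citation counts. The following sketch is purely illustrative (it is not the study's own code): the h-index is the largest h such that h papers each have at least h citations, and the g-index is the largest g such that the g most cited papers together have at least g-squared citations.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations.

    Simple variant: g is capped at the number of papers.
    """
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g
```

For the citation profile [10, 8, 5, 4, 3], the h-index is 4 (four papers with at least 4 citations) while the g-index is 5 (the cumulative 30 citations exceed 5 squared), illustrating how the g-index rewards a few highly cited papers more than the h-index does.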
Research funding organizations invest substantial resources to monitor mission-relevant research findings to identify and support promising new lines of inquiry. To that end, we have been pursuing the development of tools to identify research publications that have a strong likelihood of driving new avenues of research. This paper describes our work towards incorporating multiple time-dependent and -independent features of publications into a model to identify candidate breakthrough papers as early as possible following publication. We used multiple random forest models to assess the ability of indicators to reliably distinguish a gold standard set of breakthrough publications as identified by subject matter experts from among a comparison group of similar Thomson Reuters Web of Science (TM) publications. These indicators were then tested for their predictive value in random forest models. Model parameter optimization and variable selection were used to construct a final model based on indicators that can be measured within 6 months post-publication; the final model had an estimated true positive rate of 0.77 and false positive rate of 0.01. While most technological positioning studies have traditionally compared firms' technological patent classes and portfolios, only a few of them adopted science mapping patent co-citation techniques, and none of these seeks to understand the impact of collective cognition on the technology structure of an entire industry. What is the firms' technological positioning landscape within a high collective cognition sector? What is the groups' technological positioning evolution? How do technology structures shift according to different economic scenarios? Through a strategic lens we contribute to the technology strategy literature by proposing an invention behavior map of automotive actors at the firm, group and industry levels. 
From the Derwent Innovation Index, about 581,000 patents, 1,309,356 citations and 1,287,594 co-citation relationships were collected, covering (a) the main 49 firm assignees of 1991-2013 and (b) the main 28 or 34 group assignees across three timespans: 1991-1997, 1998-2004, and 2005-2013. Results: (1) most of the companies are located close together, depicting the sector technology structure as highly dense; (2) the market leaders do not coincide with technology production leaders and do not necessarily occupy central technological positions; (3) the automotive groups vary considerably across the three timespans in terms of position and composition; (4) the market-leading groups occupy technologically remote positions during the economic growth timespan; (5) the sector technology structure is highly dense during growth, but strongly scattered and lacking technologically central actors after economic decline. Finally, strategic implications supporting decisions on central versus peripheral R&D positioning and on selecting M&A recombination partners are discussed. One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing's indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. 
We find that much, if not all, of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies. One of the most important requirements for building applicable models and meaningful indicators for the use of scientometrics at the micro and meso level is the correct identification and disambiguation of authors and institutes. Platforms with author registration, like ResearcherID or ORCID, provide high reliability but lower coverage, and now offer appropriate data sets for the development and testing of stochastic models describing the publication activity and citation impact of individual authors. This paper proposes a triangular model incorporating papers, citations and authors analogously to the dichotomous model used at higher levels of aggregation like countries or fields. This model is applied to a set of authors in any field of science identified by their ResearcherID. However, the main advantage of classical citation indicators for studying citation impact under conditional productivity turned out to be the main problem in this triangle: the possible heterogeneity of the collaborating authors results in low robustness. A merely technical solution to this problem would be fractional counting at three levels, but the conceptual issue, the different roles of co-authors causing this heterogeneity, will never be solved by any algorithm. This study is a bibliometric analysis of the highly complex research discipline Geography. In order to identify the most popular and most cited publication channels, to reveal publication strategies, and to analyse the discipline's coverage within publications, the three main data sources for citation analyses, namely Web of Science, Scopus and Google Scholar, have been utilized. 
This study is based on publication data collected for four individual evaluation exercises performed at the University of Vienna and related to four different subfields: Geoecology, Social and Economic Geography, Demography and Population Geography, and Economic Geography. The results show very heterogeneous and individual publication strategies, even in the same research fields. Monographs, journal articles and book chapters are the most cited document types. Differences between research fields more related to the natural sciences than to the social sciences are clearly visible, but less considerable when taking into account the higher number of co-authors. General publication strategies seem to be established for both the natural sciences and the social sciences, however, with significant differences. While in the natural sciences mainly publications in international peer-reviewed scientific journals matter, the focus in the social sciences is rather on book chapters, reports and monographs. Although an "iceberg citation model" is suggested, citation analyses for monographs, book chapters and reports should be conducted separately and should include complementary data sources, such as Google Scholar, in order to enhance the coverage and to improve the quality of the visibility and impact analyses. This is particularly important for social sciences related research within Geography. Geographical proximity, population, language and other cultural factors are given as reasons for international collaboration on scientific studies. These reasons are also used to analyze international trade in goods. Scientists exchange ideas in joint papers, so international collaboration on scientific studies can be labeled as international trade in ideas. In this paper, we take on this perspective and establish a link between international trade in goods and ideas. First, we define the export and import of ideas by using co-authorship patterns. 
Second, we compute metrics such as net exports for the international trade in ideas. Third, we compare and contrast goods and ideas by using these metrics. Lastly, we use international trade models such as the Heckscher-Ohlin model to analyze the factors that affect international trade in ideas. We find that the correlation between international trade metrics for goods and ideas is weak and even negative in general, but the correlation between some specific goods and ideas is positive and stronger. By using the Heckscher-Ohlin model, we explore reasons for comparative advantage in exporting ideas. The aim of this paper is to measure the relevance of the institutions in the academic community involved in creating and disseminating knowledge in the field of Management through their position in the collaboration network. This relevance is defined by an original and more comprehensive approach to the analysis of each institution's importance through degree centrality, as it includes scientific output, while at the same time taking into account the level of collaboration between institutions, as well as the impact of the publications in which each institution is involved. This approach enables us to draw up a ranking of the 103 leading institutions, as well as overcome some of the limitations of prior studies by considering the role each institution plays in the academic community, not only through its scientific output or citations but also through the relationships it forges with other institutions. Our findings confirm the existence of elite groups worldwide that collaborate with other minor institutions, whereas major institutions collaborate less with each other. Professors and associate professors ("professors") in full-time positions are key personnel in the scientific activity of university departments, both in conducting their own research and in their roles as project leaders and mentors to younger researchers. 
Typically, this group of personnel also contributes significantly to the publication output of the departments, although there are also major contributions by other staff (e.g. PhD-students, postdocs, guest researchers, students and retired personnel). Scientific productivity is, however, very skewed at the level of individuals, also for professors, where a small fraction of the professors typically accounts for a large share of the publications. In this study, we investigate how the productivity profile of a department (i.e. the level of symmetrical/asymmetrical productivity among professors) influences the citation impact of the department. The main focus is on contributions made by the most productive professors. The findings imply that the impact of the most productive professors differs by scientific field and by the degree of productivity skewness of their departments. Nevertheless, the overall impact of the most productive professors on their departments' citation impact is modest. Both end users and authors commonly evaluate scientific journals based on several popular journal metrics. Such metrics, in particular the "impact factor," carry crucial weight in terms of which journals authors choose for submitting scientific works as well as to which titles an institutional library subscribes. While previous research has focused on the value of journals in terms of "price per page," no study has investigated the relationship between common journal metrics and the price a journal advertises for an annual subscription. In the present study, we took a linear modeling approach using the Akaike information criterion to determine which journal metric (impact factor, Eigenfactor score, article influence score, total cites, or proportion of reviews) was the "best" predictor of the advertised annual subscription price for scientific journals. 
Examining three differing scientific fields (aquatic science, sociology, and immunology) and accounting for for-profit versus not-for-profit status, we found results to be field-dependent. Total cites was the best predicting metric for the annual advertised subscription price for aquatic science and immunology, while the Eigenfactor score was the best predictor for sociology. We hypothesize that the relationship with price varies with the differing magnitudes of citation flows in a field. Clear from our study was that no single measure of journal quality is universally applicable to determine subscription "value." There is growing interest in assessing the societal impacts of research, such as informing health policies and clinical practice, and contributing to improved health. Bibliometric approaches have long been used to assess knowledge outputs, but can they also help evaluate societal impacts? We aimed to see how far societal impacts could be traced by identifying key research articles in the psychiatry/neuroscience area and exploring their societal impact through analysing several generations of citing papers. Informed by a literature review of citation categorisation, we developed a prototype template to qualitatively assess a reference's importance to the citing paper and tested it on 96 papers. We refined the template for a pilot study to assess the importance of citations, including self-cites, to four key research articles. We then similarly assessed citations to those citing papers for which the key article was Central, i.e. it was very important to the message of the citing article. We applied a filter of three or more citation occasions in order to focus on the citing articles where the reference was most likely to be Central. We found the reference was Central for 4.4 % of citing research articles overall and ten times more frequently if the article contained three or more citation occasions. 
We created a citation stream of influence for each key paper across up to five generations of citations. We searched the Web of Science for citations to all Central papers and identified societal impacts, including international clinical guidelines citing papers across the generations. This article examines the extent to which existing network centrality measures can be used (1) as filters to identify a set of papers to start reading within a journal and (2) as article-level metrics to identify the relative importance of a paper within a journal. We represent a dataset of published papers in the Public Library of Science (PLOS) via a co-citation network and compute three established centrality metrics for each paper in the network: closeness, betweenness, and eigenvector. Our results show that the network of papers in a journal is scale-free and that eigenvector centrality (1) is an effective filter and article-level metric and (2) correlates well with citation counts within a given journal. However, closeness centrality is a poor filter because articles fit within a small range of citations. We also show that betweenness centrality is a poor filter for journals with a narrow focus and a good filter for multidisciplinary journals where communities of papers can be identified. The present research work shows the results of an analysis of the existing literature on one of the 'topics' currently attracting the greatest interest among scholars and researchers in the fields of strategic management and organization science, namely: organizational ambidexterity. 
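Two of the centrality metrics named in the PLOS co-citation study above have compact definitions that a short sketch can make concrete. The code below is purely illustrative (it is not the study's implementation, and the toy adjacency-dict graph is an assumption): eigenvector centrality via a shifted power iteration, which avoids oscillation on bipartite graphs, and closeness centrality via breadth-first search.

```python
from collections import deque

def eigenvector_centrality(adj, n_iter=200, tol=1e-9):
    """Eigenvector centrality on an undirected graph {node: set(neighbors)}.

    Uses shifted power iteration (A + I), which converges for any
    connected graph and shares eigenvectors with A.
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(n_iter):
        x_new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = max(x_new.values())
        x_new = {v: s / norm for v, s in x_new.items()}
        if max(abs(x_new[v] - x[v]) for v in nodes) < tol:
            return x_new
        x = x_new
    return x

def closeness_centrality(adj, v):
    """(n-1) / sum of shortest-path distances from v (connected graph assumed)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

On a star graph the hub dominates both measures, which mirrors the filtering idea in the abstract: high-eigenvector papers are the well-connected "start reading here" candidates, while closeness varies too little to discriminate.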
More precisely, and seeking to identify and visualize the intellectual structure or knowledge base of the research developed in relation to this construct, a decision was made to analyze a total of 283 research papers which appeared after the publication in the journal California Management Review in the summer of 1996 of the seminal work by Tushman and O'Reilly III entitled 'Ambidextrous Organizations: Managing Evolutionary and Revolutionary Change,' where these authors suggested that organizations need to explore and exploit simultaneously if they want to be ambidextrous. As for the methodology applied, it was based on the utilization of bibliometric techniques, particularly citation analysis, author co-citation analysis, and social network analysis. Under the Excellence Initiative, a number of Clusters of Excellence in Germany have been supported since 2006 and 2007, each including a limited number of cooperating institutions. The aim of the present study is to investigate whether support for Clusters of Excellence since 2006 and 2007 is reflected in bibliometric network data. For this purpose, a comparison is made between network data in the period before support started (2003-2005) and in the period after support started (2009-2011). For these two periods, a co-authorship network is generated (based on the funded institutions). This is based on publications which are among the 1 % most frequently cited publications in their respective fields and publication years and which have at least one author from Germany. As the results show, the outcomes this yields for the life sciences and the natural sciences differ from each other. Whereas the natural sciences display an effect of the establishment of Clusters of Excellence on the bibliometric networks, this was not true of the life sciences. 
After establishment of the Clusters of Excellence, the network in the natural sciences not only contained more institutions of a Cluster of Excellence, but these institutions were distributed across fewer bibliometric clusters in the network than before establishment. In other words, the structure of the Clusters of Excellence was better reflected in the network. This paper investigates the pattern of teaching and research performances, the relationship between them, and the convergence for Italian public HEIs in the period 2000-2010, by comparing different bootstrap robust non-parametric frontier estimators. Overall we find an efficiency improvement, mainly driven by research, whereas teaching efficiency increases only in the very first years of the sample period. We also ascertain a slightly positive relationship between research and teaching performances. Furthermore, we find that Italian HEIs converge in the observed period, although research and teaching do so at different paces. Our empirical findings are robust to alternative estimators and bootstrapped bias correction. We investigated the effect of international collaboration (in the form of international co-authorship) on the impact of publications of young universities (<50 years old), compared to that of renowned old universities (>100 years old). The following impact indicators are used in this study: (1) the 5-year citations per paper (CPP) data, (2) the international co-authorship rate, (3) the CPP differential between publications with and without international co-authorship, and (4) the difference between the percentage of internationally co-authored publications falling in the global top 10 % highly cited publications and the percentage of overall publications falling in the global top 10 % highly cited publications (Delta%(Top10%)). 
The increment of the 5-year (2010-2014) field weighted citation impact (FWCI) of internationally co-authored papers over the 5-year overall FWCI of the institutions in SciVal (R) is used as another indicator to eliminate the effect of discipline differences in citation rate. The results show that, for most top institutions, the difference between the citations per paper (CPP) for their publications with and without international co-authorship is positive, with an increase of up to 5.0 citations per paper over the period 1996-2003. Yet, some Asian institutions, by attracting many researchers with an international background and making these collaborating "external" authors internal researchers, have created a special kind of international collaboration that is not expressed in co-authorship, and the CPP gaps between publications with and without international co-authorship are relatively small (around 0-1 citations per paper increment) for these institutions. The top old institutions have higher CPP than young institutions, and higher annual research expenditures; while young universities have a higher relative CPP increment for the current 5-year period over the previous 5-year period. The Delta%(Top10%) for internationally co-authored publications is generally higher than that for all journal publications of the same institution. With the increase of the international co-authorship ratio, the mean geographical collaboration distance (MGCD, an indication of increased international co-authorship) of an institution based on the Leiden Ranking data also increases, and young institutions have a relatively higher CPP increment over the MGCD increment. International co-authorship makes a positive contribution to the FWCI of an institution, yet there is untapped potential to enhance the collaboration among young institutions. As the framework of scientific research, subject classification plays an important role in the development of science. 
In order to combine the development of science with the current expert subject-classification system and further give a more appropriate description of scientific output analysis at the subject level, we study the relationship between the natural science related sub-categories of the Chinese library classification using objective computerized scientometrics, and propose some modifications to the first two levels of subjects in the existing Chinese library classification system. Taking the Chinese Science Citation Database as our data source, this article studies the similarity of subjects based on journal coupling strength. We then try to set up an improved subject-classification system whose top categories rely on the Chinese library classification system and whose sub-categories are the ensemble clustering result based on the journal coupling measure. Further, in order to help identify and interpret the rationality of this improved classification system, we make use of some text mining methods, such as keyword recognition and topic detection, to explain the cause of similarity between some subjects from a semantic perspective. Our study shows that the improved subject-classification system constructed in this article not only conforms to previous experience and cognition but also incorporates knowledge of subject development. It is now generally accepted that institutions of higher education and research, largely publicly funded, need to be subjected to some benchmarking process or performance evaluation. Currently there are several international ranking exercises that rank institutions at the global level, using a variety of performance criteria such as research publication data, citations, awards, reputation surveys, etc. In these ranking exercises, the data are combined in specified ways to create an index which is then used to rank the institutions. These lists are generally limited to the top 500-1000 institutions in the world. 
Further, some criteria (e.g., the Nobel Prize), used in some of the ranking exercises, are not relevant for the large number of institutions that are in the medium range. In this paper we propose a multidimensional 'Quality-Quantity' Composite Index for a group of institutions using bibliometric data, that can be used for ranking and for decision making or policy purposes at the national or regional level. The index is applied here to rank Central Universities in India. The ranks obtained compare well with those obtained with the h-index and partially with the size-dependent Leiden ranking and University Ranking by Academic Performance. A generalized model for the index using other variables and variable weights is proposed. Many contemporary social and public health problems do not fit neatly into the research fields typically found in universities. With this in mind, researchers and funding agencies have devoted increasing attention to projects that span multiple disciplines. However, comparatively little attention has been paid to how these projects evolve over time. This relative neglect is in part attributable to a lack of theory on the dynamic nature of such projects. In this paper, we describe how research programs can move through various states of integration including disciplinarity, multidisciplinarity, interdisciplinarity and transdisciplinarity. We link this insight to computational techniques-topic models-to explore one of the most vibrant and pressing contemporary research areas-research on HIV/AIDS. Topic models of over 9000 abstracts from two prominent journals illustrate how research on HIV/AIDS has evolved from a high to a lower level of integration. The topic models motivate a more detailed historical analysis of HIV/AIDS research and, together, they highlight the dynamic nature of knowledge production. We conclude by discussing the role of computational social science in dynamic models of interdisciplinarity. 
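Topic models like the one applied to the HIV/AIDS abstracts above are commonly fitted with latent Dirichlet allocation (LDA). The following is a minimal collapsed Gibbs sampler, purely illustrative: the toy documents, hyperparameters, and two-topic setting are assumptions, and the study's actual model settings and corpus are not reproduced here.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=42):
    """Collapsed Gibbs sampling for a toy LDA; docs are lists of tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    widx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens per topic
    z = []                                     # topic assignment per token
    for d, doc in enumerate(docs):             # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][widx[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], widx[w]
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                # full conditional p(z = t | rest), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    return ndk, nkw, vocab
```

Tracking how the fitted doc-topic proportions mix or separate across time windows is the basic mechanism behind measuring the kind of integration dynamics the abstract describes; production analyses would use an established library rather than a hand-rolled sampler.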
This study provides an overview of the knowledge management literature from 1980 through 2014. We employ bibliometric and text mining analyses on a sample of the 500 most cited articles to examine the impact of factors such as the number of authors, references, pages, and keywords on the number of citations that they received. We also investigate major trends in the knowledge management literature, including the contribution of different countries, variations across publication years, and the identification of active research areas and major journal outlets. Our study serves as a resource for future studies by shedding light on how trends in knowledge management research have evolved over time and demonstrating the characteristics of the most cited articles in this literature. Specifically, our results reveal that the most cited articles are from the United States and the United Kingdom. The most prolific year in terms of the number of published articles is 2009 and in terms of the number of citations is 2012. We also found a positive relationship between the number of a publication's keywords, references, and pages and the number of citations that it has received. Finally, the Journal of Knowledge Management has the largest share in publishing the most cited articles in this field. In this work we address the challenge of how to identify those documents from a given set of texts that are most likely to have substantial impact in the future. To this end we develop a purely content-based methodology to rank a given set of documents, for example abstracts of scientific publications, according to their potential to generate impact as measured by the number of citations that the articles will receive in the future. We construct a bipartite network consisting of documents that are linked to the keywords and terms that they contain. We study recursive centrality measures for such networks that quantify how many different terms a document contains and how these terms are related to each other. 
From this we derive a novel indicator, document centrality, that is shown to be highly predictive of citation impact in six different case studies. We compare these results to findings from a multivariable regression model and from conventional network-based centrality measures to show that document centrality indeed offers a comparably high performance in identifying those articles that contain a large number of high-impact keywords. Our findings suggest that articles which conform to the mainstream within a given research field tend to receive higher numbers of citations than highly original and innovative articles. Bibliometric analyses depend on the quality of data sets and the author name disambiguation process (ANDP), which attributes author names on papers to real persons. Errors in a data set or the ANDP result in papers being attributed to the wrong person. These errors can potentially distort the results of analyses based on such data sets. However, the general impact of data set quality on bibliometric analysis is mostly unknown; such an assessment is costly due to the manual steps involved. This paper presents an overview of the data set qualities produced by different ANDPs and uses simulations to study the general impact of data set quality on different bibliometric analyses (author rankings and regression analysis with the number of papers as the dependent variable). The results show that rankings of authors are only valid on high quality data sets, which are typically not found directly in commercially available datasets. Both the mean and the individual per-person data set quality are important for valid ranking results. Regressions are not as influenced by the overall data set quality but instead by individual quality differences between authors. Different types of errors can potentially bias the regression results. 
The outcome of this study also shows the importance of reporting both overall and individual variation in data set quality, so that the validity of analyses based on these data sets can be assessed. Understanding the competitive environment of one's company is crucial for every manager. One tool to quantify the technological relationships between companies and to evaluate industry landscapes and knowledge transfer potential in collaborations is the technological distance. There are different methods and many different factors that impact the results and thus the conclusions that are drawn from distance calculation. Therefore, the present study derives guidelines for calculating and evaluating technological distances for three common methods, i.e. the Euclidean distance, the cosine angle and the min-complement distance. For this purpose, we identify factors that influence the results of technological distance calculation using simulation. Subsequently, we analyze technological distances of cross-industry collaborations in the field of electric mobility. Our findings show that a high level of detail is necessary to achieve insightful results. If the topic in scope of the analysis does not represent the core business of the companies, we recommend filters to focus on the respective topic. Another key suggestion is to compare the calculated results to a peer group in order to assess whether a distance should be interpreted as 'near' or 'far'. In this work, we propose a graphical representation of the empirical data of the impact factor rank-ordered distribution. The characteristics of the distribution can be directly visualized. Within the subject categories of the Journal Citation Reports, the impact factor rank-ordered distribution systematically presents clear evidence of the two-exponent behavior and the S-shaped decrease. The sharp convex decrease is related to the first exponent, which dictates the distribution of lower ranks. 
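The three distance methods named above can be illustrated on toy patent-class share vectors; the vectors below are made up for illustration, and the formulas are the standard ones for each measure.

```python
import numpy as np

# Hypothetical patent-class share vectors for two companies.
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.1, 0.4, 0.3, 0.2])

# Euclidean distance between the technology profiles.
euclidean = float(np.sqrt(np.sum((p - q) ** 2)))

# Cosine distance: 1 minus the cosine of the angle between the profiles.
cosine = float(1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

# Min-complement distance: 1 minus the overlap of the normalized profiles.
min_complement = float(1.0 - np.sum(np.minimum(p, q)))
```

All three are 0 for identical profiles, but they weight differences differently, which is why the guidelines above recommend comparing a result against a peer group rather than reading a raw value as 'near' or 'far'.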
The mild concave decrease is related to the second exponent, which dictates the distribution of higher ranks. The relevance of the Matthew effect is discussed. Okulicz-Kozaryn (Scientometrics 96:679-681, 2013) examined the readability issue in terms of the proportions of adjectives and adverbs in research articles. The results showed that natural scientists used the lowest proportion of adjectives and adverbs, while social scientists employed more adjectives and adverbs than natural scientists. Based on the findings, he argued for eliminating many of the adjectives and adverbs in academic writing for brevity and conciseness. However, adjectives and adverbs serve different functions in academic writing. Thus, the present study investigated the use of adjectives and adverbs separately with a much larger set of academic writing of various genres and a subsample of only research articles. The results indicated that the proportions of adjectives in natural science and applied science are higher than those in arts and humanities and social science, while the proportions of adverbs in natural science and applied science are lower than those in arts and humanities and social science. The results seemingly complement Okulicz-Kozaryn's (2013) findings. It is accordingly suggested that researchers in arts and humanities and social science should use fewer adverbs in academic writing. Issues concerning the readability and impact of articles are also discussed. This study draws on biotechnology publications by Chinese scientists indexed in the Science Citation Index Expanded from 1991 to 2014, using bibliometric and statistical methods to investigate the characteristics of research collaboration. A series of rules and statistical methods are applied jointly for data cleaning to ensure data accuracy. 
The major findings are as follows: Firstly, the Chinese Academy of Sciences, Tsinghua University, Zhejiang University, Fudan University and China Agricultural University are the top 5 most active research institutions in the field of biotechnology. Secondly, the collaboration pattern indicates that Chinese academic institutions are more focused on building national collaboration and tapping national expertise. Thirdly, both national and international collaboration degrees in this field are improving constantly. Tsinghua University keeps a stable growth trend in terms of international collaboration. Fourthly, Chinese academic institutions have extensive collaboration in various fields of biotechnology. National collaboration focuses more on the application fields of biotechnology. Bioinformatics-related subjects attach greater importance to international collaboration. Finally, we discuss the reasons for the above collaboration characteristics and the implications of this study for China's research management. Studying research productivity is a challenging task that is important for understanding how science evolves and crucial for agencies (and governments). In this context, we propose an approach for quantifying the scientific performance of a community (group of researchers) based on the similarity between its publication profile and a reference community's publication profile. Unlike most approaches that consider citation analysis, which requires access to the content of a publication, we only need the researchers' publication records. We investigate the similarity between communities and adopt a new metric named Volume Intensity. Our goal is to use Volume Intensity for measuring the internationality degree of a community. Our experimental results, using Computer Science graduate programs and including both real and random scenarios, show that we can use the publication profile as a performance indicator. 
One possible way of measuring the broad impact of research (societal impact) quantitatively is the use of alternative metrics (altmetrics). An important source of altmetrics is Twitter, which is a popular microblogging service. In bibliometrics, it is standard to normalize citations for cross-field comparisons. This study deals with the normalization of Twitter counts (TC). The problem with Twitter data is that many papers receive zero tweets or only one tweet. In order to restrict the impact analysis to only those journals producing a considerable Twitter impact, we defined the Twitter Index (TI) containing journals in which at least 80 % of the papers have at least 1 tweet each. For all papers in each TI journal, we calculated normalized Twitter percentiles (TP) which range from 0 (no impact) to 100 (highest impact). Thus, the highest impact accounts for the paper with the most tweets compared to the other papers in the journal. TP are proposed to be used for cross-field comparisons. We studied the field-independence of TP in comparison with TC. The results indicate that TP can validly be used particularly in biomedical and health sciences, life and earth sciences, mathematics and computer science, as well as physical sciences and engineering. In a first application of TP, we calculated percentiles for countries. The results show that Denmark, Finland, and Norway are the countries with the most tweeted papers (measured by TP). The aim of this paper is to extend our knowledge about the power-law relationship between citation-based performance and collaboration patterns for papers by analyzing its behavior at the level of a national science system. We analyzed 3012 Cuban articles on the Natural Sciences that received 17,295 citations. The number of articles published through collaboration accounted for 94 %. The collaborative articles accounted for 96 % of overall citations. 
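One simple way to realise the 0-100 within-journal percentile scheme described above is sketched below. The exact percentile definition used in the study may differ; this is a hypothetical minimal version in which a paper's percentile is the share of other papers in the same journal with fewer tweets.

```python
def twitter_percentiles(tweet_counts):
    # Percentile rank of each paper within one journal: 0 for the least
    # tweeted paper, 100 for a paper with more tweets than all others.
    n = len(tweet_counts)
    if n < 2:
        return [100.0] * n
    return [100.0 * sum(other < c for other in tweet_counts) / (n - 1)
            for c in tweet_counts]

# Four papers in one journal with 0, 1, 1 and 5 tweets:
tp = twitter_percentiles([0, 1, 1, 5])
```

Because percentiles are computed within each journal, the resulting values are comparable across fields with very different tweeting cultures, which is the point of the normalization.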
The citation-based performance and international collaboration patterns exhibit a power-law correlation with a scaling exponent of 1.22 +/- 0.08. Citations to a field's internationally collaborative articles in the Natural Sciences tended to increase 2^1.22, or 2.33, times each time the number of internationally collaborative papers doubled. The Matthew Effect is stronger for internationally collaborative papers than for domestic collaborative articles. By adopting a citation-based recursive ranking method for patents, the evolution of new fields of technology can be traced. Specifically, it is demonstrated that laser/inkjet printer technology emerged from the recombination of two existing technologies: sequential printing and static image production. The dynamics of the citations coming from the different "precursor" classes illuminate the mechanism of the emergence of new fields and make it possible to predict future technological development. For the patent network the optimal value of the PageRank damping factor is close to 0.5; the application of d = 0.85 leads to unacceptable ranking results. A study released by the Google Scholar team found an apparently increasing fraction of citations to old articles from studies published in the last 24 years (1990-2013). To verify this finding we conducted a complementary study using a different data source (Journal Citation Reports), metric (aggregate cited half-life), time span (2003-2013), and set of categories (53 Social Science subject categories and 167 Science subject categories). Although the results obtained confirm and reinforce the previous findings, the possible causes of this phenomenon remain unclear. We finally hypothesize that the "first page results syndrome", in conjunction with the fact that Google Scholar favours the most cited documents, suggests that the growing trend of citing old documents is partly caused by Google Scholar. 
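The doubling interpretation of the scaling exponent reported for the Cuban data above can be reproduced numerically. The data below are synthetic and noise-free, and the prefactor 3.0 is arbitrary; the point is that the exponent is the slope of the log-log regression line, and 2 raised to that slope gives the factor gained when the paper count doubles.

```python
import numpy as np

# Synthetic, noise-free data following C = a * N**1.22, where N is the number
# of internationally collaborative papers and C the citations they attract.
N = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
C = 3.0 * N ** 1.22

# The scaling exponent is the slope of the log-log regression line.
slope, intercept = np.polyfit(np.log(N), np.log(C), 1)
doubling_factor = 2 ** slope  # citations multiply by this when N doubles
```

With the exponent 1.22, the doubling factor is 2^1.22, which is approximately 2.33, matching the figure quoted in the abstract.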
Journal Citation Reports (JCR) is the main source of bibliometric indicators known by the scientific community. This paper presents the results of a study of the distributions of the first and second significant digits, according to Benford's law (BL), of the number of articles, citations, impact factors, half-life and immediacy index bibliometric indicators in journals indexed in the JCR Sciences and Social Sciences Editions from 2007 to 2014. We also performed the analysis by country of origin and by journal category, and we verified that the second digit has a better adherence to BL. The use of the second digit is important since it provides a more sound, complete and consistent analysis of the bibliometric indicators. The number of authors and their contributions in original research publications have drawn the attention of editors for years. Some journals specifically ask that the authors list their contributions on the title page. However, whether this requirement is useful remains unclear. The present study aimed to elucidate the author number trend over the past 10 years in four major gastroenterology journals. Four major journals in gastroenterology (Gastroenterology, Gut, Journal of Hepatology and Hepatology) between January 1, 2005 and December 31, 2014 were searched. We limited the papers to basic research. The average number of authors of papers published in each journal was calculated and any changes in author numbers over the search period were compared. We found that Gastroenterology and Gut began to require the authors to list contributions in 2010 and Journal of Hepatology and Hepatology did the same in 2012. We therefore also investigated whether authorship numbers changed in studies published in Gastroenterology and Gut before and after 2010, and before and after 2012 in Journal of Hepatology and Hepatology. There has been an increase in author numbers over the past 10 years. 
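The Benford's law expectations tested in the JCR digit study above follow directly from the standard logarithmic formulas, for both the first and the second significant digit:

```python
from math import log10

def benford_first(d):
    # Expected frequency of first significant digit d (1-9) under Benford's law.
    return log10(1 + 1 / d)

def benford_second(d):
    # Expected frequency of second significant digit d (0-9): marginalise over
    # all possible first digits.
    return sum(log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))
```

For example, benford_first(1) is about 0.301 while benford_second(0) is about 0.120; the second-digit distribution is much flatter than the first-digit one, which is relevant when comparing how well indicators adhere to BL on each digit.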
This number was significantly increased in the papers published between 2010 and 2014 compared with those between 2005 and 2009 (p < 0.05 in all four journals). Author contribution requirements have not deterred the trend of increasing author numbers. The number of authors in basic science papers is increasing; the new requirement of listing authorship contributions has not affected this trend. Reference Publication Year Spectroscopy (RPYS) and Multi-RPYS provide algorithmic approaches to reconstructing the intellectual histories of scientific fields. With this brief communication, we describe a technical advancement for developing research historiographies by introducing RPYS i/o, an online tool for performing standard RPYS and Multi-RPYS analyses interactively (at http://comins.leydesdorff.net/). The tool enables users to explore seminal works underlying a research field and to plot the influence of these seminal works over time. This suite of visualizations offers the potential to analyze and visualize the myriad temporal dynamics of scientific influence, such as citation classics, sleeping beauties and the dynamics of research fronts. We demonstrate the features of the tool by analyzing, as an example, the references in documents published in the journal Philosophy of Science. The Flesch Reading Ease measure is widely used to measure the difficulty of text in various disciplines, including Scientometrics. This letter argues that the measure is now outdated, used inappropriately, and unreliable. I argue that Polonioli (Scientometrics, 2016, this issue) provides no new evidence to show that Open Access is beneficial for the social sciences and the humanities. He raises three criticisms against my recent paper (Wray in Scientometrics 106(3): 1031-1035, 2016). Two of these criticisms fail to take account of the data I was working with. 
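The Flesch Reading Ease measure criticized above is computed from three text statistics using the standard published coefficients; the sample numbers in the usage line are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Standard Flesch Reading Ease formula; higher scores indicate easier text.
    return (206.835
            - 1.015 * (words / sentences)
            - 84.6 * (syllables / words))

# A hypothetical 100-word passage with 5 sentences and 150 syllables:
score = flesch_reading_ease(words=100, sentences=5, syllables=150)  # 59.635
```

Because the formula depends only on average sentence length and average syllables per word, it is blind to vocabulary familiarity and discourse structure, which is part of the criticism of applying it to scientific text.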
On the basis of those data, I could not draw conclusions about (1) the societal impact of research or (2) other publication models besides the traditional model and the Author-Pay Open Access model. He also claims that I do not take account of the costs associated with the status quo. I argue that he fails to take account of the value added by publishers of academic journals. Further, contrary to what he suggests, there is no evidence that Open Access publishing is making it easier for scholars in developing countries to contribute to scholarship. This study reports a positive correlation between impact factor and article number in scholarly journals. High impact journals publish more articles. Quality and quantity are positively correlated. This is a common trend in different disciplines, as revealed by empirical data. The correlation is obscured in a direct plot of article number versus impact factor. A plot of accumulated article number versus normalized rank of impact factor clearly demonstrates the correlation. The reasons behind this correlation are discussed. (C) 2016 Elsevier Ltd. All rights reserved. Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 different Scopus subject areas in seven different years, this article reports comparisons of the discretised lognormal and the hooked power law with citation data, adding 1 to citation counts in order to include zeros. 
The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and for virtually all subject areas for younger articles. Conversely, the discretised lognormal tends to fit best for arts, humanities, social science and engineering fields. The difference between the fits of the distributions is mostly small, however, and so either could reasonably be used for modelling citation data. For regression analyses, however, the best option is to use ordinary least squares regression applied to the natural logarithm of citation counts plus one, especially for sets of younger articles, because of the increased precision of the parameters. Journal classification systems play an important role in bibliometric analyses. The two most important bibliographic databases, Web of Science and Scopus, each provide a journal classification system. However, no study has systematically investigated the accuracy of these classification systems. To examine and compare the accuracy of journal classification systems, we define two criteria on the basis of direct citation relations between journals and categories. We use Criterion I to select journals that have weak connections with their assigned categories, and we use Criterion II to identify journals that are not assigned to categories with which they have strong connections. If a journal satisfies either of the two criteria, we conclude that its assignment to categories may be questionable. Accordingly, we identify all journals with questionable classifications in Web of Science and Scopus. Furthermore, we perform a more in-depth analysis for the field of Library and Information Science to assess whether our proposed criteria are appropriate and whether they yield meaningful results. 
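The regression recommendation above, ordinary least squares on the natural logarithm of citation counts plus one, can be sketched as follows. The citation counts and the single age covariate are invented for illustration; in practice any set of explanatory variables could be used.

```python
import numpy as np

# Invented citation counts (including a zero) and an illustrative covariate.
age = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
citations = np.array([0.0, 2.0, 5.0, 11.0, 20.0])

# The +1 offset keeps uncited articles in the regression.
y = np.log(citations + 1)
slope, intercept = np.polyfit(age, y, 1)

# Predictions can be mapped back to the citation scale if needed.
predicted = np.exp(slope * age + intercept) - 1
```

The log transform compresses the heavy right tail of citation counts, which is why the abstract reports more precise parameter estimates than fits of the raw distributions provide.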
It turns out that according to our citation-based criteria Web of Science performs significantly better than Scopus in terms of the accuracy of its journal classification system. In the work presented in this paper, we analyse ranking algorithms that can be applied to bibliographic citation networks and rank academic entities such as papers and authors. We evaluate how well these algorithms identify important and high-impact entities. The ranking algorithms are computed on the Microsoft Academic Search (MAS) and the ACM digital library citation databases. The MAS database contains 40 million papers and over 260 million citations that span multiple academic disciplines, while the ACM database contains 1.8 million papers from the computing literature and over 7 million citations. We evaluate the ranking algorithms by using a test data set of papers and authors that won renowned prizes at numerous computer science conferences. The results show that using citation counts is, in general, the best ranking metric to measure high impact. However, for certain tasks, such as ranking important papers or identifying high-impact authors, algorithms based on PageRank perform better. The Italian National Scientific Qualification (ASN) was introduced as a prerequisite for applying for tenured associate or full professor positions at state-recognized universities. The ASN is meant to attest that an individual has reached a suitable level of scientific maturity to apply for professorship positions. A five-member panel, appointed for each scientific discipline, is in charge of evaluating applicants by means of quantitative indicators of impact and productivity, and through an assessment of their research profile. Many concerns were raised about the appropriateness of the evaluation criteria, and in particular about the use of bibliometrics for the evaluation of individual researchers. 
Additional concerns were related to the perceived poor quality of the final evaluation reports. In this paper we assess the ASN in terms of the appropriateness of the applied methodology and the quality of the feedback provided to the applicants. We argue that the ASN is not fully compliant with the best practices for the use of bibliometric indicators for the evaluation of individual researchers; moreover, the quality of the final reports varies considerably across the panels, suggesting that measures should be put in place to prevent sloppy practices in future ASN rounds. In this paper, a novel method for analyzing media in Arabic using new quantitative characteristics is proposed. A sequence of newspaper daily issues is represented as histograms of occurrences of informative terms. The histograms' closeness is evaluated via a rank correlation coefficient by treating the terms as ordinal data consistent with their frequencies. A new characteristic is introduced to quantify the relationship of an issue with numerous earlier ones. A newspaper is represented as a time series of this characteristic's values, affected by the current social situation. The change points of this process may indicate fluctuations in the social behavior of the corresponding society, as is evident from changes in the linguistic content. Moreover, the similarity measure created by means of this characteristic makes it possible to accurately derive the groups of homogeneous issues without any additional information. The methodology is evaluated on sequential issues of an Egyptian newspaper, Al-Ahraam, and a Lebanese newspaper, Al-Akhbaar. The results demonstrate the high ability of the proposed approach to expose changes in the linguistic content and to connect them with changes in the structure of society and the relationships within it. The method can be suitably extended to media in any alphabetic language. 
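The histogram-closeness step described above, a rank correlation over term frequencies, can be sketched as follows. The two toy issues and their term counts are hypothetical, and Spearman's rho is used as the rank correlation coefficient, which may not be the exact coefficient used in the paper.

```python
def ranks(values):
    # Rank positions 1..n (this toy example has no ties).
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def spearman(x, y):
    # Spearman's rho via the classic squared-rank-difference formula.
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical term-frequency histograms for two newspaper issues, over the
# same vocabulary of informative terms.
issue_a = {"election": 40, "economy": 25, "sports": 10, "weather": 5}
issue_b = {"election": 35, "economy": 30, "sports": 12, "weather": 4}
vocab = sorted(issue_a)
rho = spearman([issue_a[t] for t in vocab], [issue_b[t] for t in vocab])
```

Because only the ordering of term frequencies matters, two issues of very different lengths can still be judged close if they emphasise the same terms in the same order.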
There is no agreement over which statistical distribution is most appropriate for modelling citation count data. This is important because if one distribution is accepted then the relative merits of different citation-based indicators, such as percentiles, arithmetic means and geometric means, can be more fully assessed. In response, this article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1. The citation counts from 23 Scopus subcategories were fitted to hooked power law and discretised lognormal distributions, but both distributions failed a Kolmogorov-Smirnov goodness of fit test in over three quarters of cases. The discretised lognormal distribution also seems to have the wrong shape for citation distributions, with too few zeros and not enough medium values for all subjects. The cause of the poor fits could be the impurity of the subject subcategories or the presence of interdisciplinary research. Although it is in theory possible to test for subject subcategory purity indirectly through a goodness of fit test with large enough sample sizes, it is probably not possible in practice. Hence it seems difficult to get conclusive evidence about the theoretically most appropriate statistical distribution. Citation is perhaps the most widely used metric to evaluate the scientific impact of papers. Various measures of the scientific impact of researchers and journals rely heavily on the citations of papers. Furthermore, in many practical applications, people may need to know not only the current citations of a paper, but also a prediction of its future citations. However, the complex heterogeneous temporal patterns of the citation dynamics make the prediction of future citations rather difficult. 
The existing state-of-the-art approaches use parametric methods that require long periods of data and perform poorly on some scientific disciplines. In this paper, we present a simple yet effective and robust data analytic method to predict the future citations of papers from a variety of disciplines. With rather short-term (e.g., 3 years after the paper is published) citation data, the proposed approach can give an accurate estimate of future citations, outperforming state-of-the-art prediction methods significantly. Extensive experiments confirm the robustness of the proposed approach across various journals of different disciplines. The use of science to understand its own structure is becoming popular, but understanding the organization of knowledge areas is still limited because some patterns are only discoverable with proper computational treatment of large-scale datasets. In this paper, we introduce a network-based methodology combined with text analytics to construct the taxonomy of science fields. The methodology is illustrated with application to two topics: complex networks (CN) and photonic crystals (PC). We built citation networks using data from the Web of Science and used a community detection algorithm for partitioning to obtain science maps of the fields considered. We also created an importance index for text analytics in order to obtain keywords that define the communities. A dendrogram of the relatedness among the subtopics was also obtained. Among the interesting patterns that emerged from the analysis, we highlight the identification of two well-defined communities in the PC area, which is consistent with the known existence of two distinct communities of researchers in the area: telecommunication engineers and physicists. With the methodology, it was also possible to assess the interdisciplinarity and time evolution of subtopics defined by the keywords. 
The automatic tools described here are potentially useful not only to provide an overview of scientific areas but also to assist scientists in performing systematic research on a specific topic. We introduce a new tool, the CitedReferencesExplorer (CRExplorer, www.crexplorer.net), which can be used to disambiguate and analyze the cited references (CRs) of a publication set downloaded from the Web of Science (WoS). The tool is especially suitable for identifying those publications which have been frequently cited by the researchers in a field and thereby studying, for example, the historical roots of a research field or topic. CRExplorer simplifies the identification of key publications by enabling the user to work with both a graph for identifying the most frequently cited reference publication years (RPYs) and the list of references for the RPYs which have been most frequently cited. A further focus of the program is on the standardization of CRs. One of the biggest problems in bibliometrics is that there are several variants of the same CR in the WoS. In this study, CRExplorer is used to study the CRs of all papers published in the Journal of Informetrics. The analyses focus on the most important papers published between 1980 and 1990. Global cities are defined, on the one hand, as the major command and control centres of the world economy and, on the other hand, as the most significant sites of the production of innovation. As command and control centres, they are home to the headquarters of the most powerful MNCs of the global economy, while as sites for the production of innovation they are supposed to be the most important sites of corporate research and development (R&D) activities. 
In this paper, we conduct a bibliometric analysis of data from the Scopus and Forbes 2000 databases to reveal the correlation between the characteristics of the above global city definitions. We explore which cities are the major control points of global corporate R&D (home city approach), and which cities are the most important sites of corporate R&D activities (host city approach). According to the home city approach we assign articles produced by companies to the cities where the decision-making headquarters are located (i.e. to cities that control the companies' R&D activities), while according to the host city approach we assign articles to the cities where the R&D activities are actually conducted. Given Sassen's global city concept, we expect global cities to be both the leading home cities and host cities. The results show that, in accordance with the global city concept, Tokyo, New York, London and Paris surpass other cities as command points of global corporate R&D (having 42 percent of companies' scientific articles). However, as sites where corporate R&D activities are actually conducted, New York and Tokyo form a category of their own (having 28 percent of the articles). The gap between San Jose and Boston and the global cities has consistently narrowed because the former are the leading centres of the fastest growing innovative industries (e.g. information technology and biotechnology) in the world economy, and important sites of international R&D activities within these industries. The emerging economies are singularly represented by Beijing; however, the position of the Chinese capital (i.e. the number of its companies' scientific articles) has been strengthening rapidly. A new methodology is proposed for comparing Google Scholar (GS) with other citation indexes. It focuses on the coverage and citation impact of sources, indexing speed, and data quality, including the effect of duplicate citation counts. 
The method compares GS with Elsevier's Scopus, and is applied to a limited set of articles published in 12 journals from six subject fields, so that its findings cannot be generalized to all journals or fields. The study is exploratory, and hypothesis-generating rather than hypothesis-testing. It confirms findings on source coverage and citation impact obtained in earlier studies. The ratio of GS to Scopus citations varies across subject fields between 1.0 and 4.0, while Open Access journals in the sample show higher ratios than their non-OA counterparts. The linear correlation between GS and Scopus citation counts at the article level is high: Pearson's R is in the range of 0.8-0.9. A median Scopus indexing delay of two months compared to GS is largely, though not exclusively, due to missing cited references in articles in press in Scopus. The effect of double citation counts in GS due to multiple citations with identical or substantially similar meta-data occurs in less than 2% of cases. Pros and cons of article-based and what are termed concept-based citation indexes are discussed. The paper investigates the theoretical response of h-type bibliometric indicators developed over the past decade when faced with the problem of manipulation through self-citation practices. An extreme self-citation scenario is used to test the theoretical resistance of the research performance metrics to strategic manipulation and to determine the magnitude of the impact that self-citations may induce on the indicators. The original h-index, eighteen selected variants, as well as traditional bibliometric indicators are considered. 
The results of the theoretical study indicate that while all indicators are vulnerable to manipulation, some of the h-index variants are more susceptible to the influence of strategic behavior than others: elite set indicators prove more resilient than the original h, while other variants, including most of those directly derived from the h-index, are shown to be less robust. Variants that take into account time constraints prove to be especially useful for detecting potential manipulation. As a practical tool which may aid further studies, the article offers a collection of functions to compute the h-index and several of its variants in the R language and environment for statistical computing. The aim of this study is to explore the network effects of the national and disciplinary community, and actor attribute effects, on the future performance of scientists in two fields of social sciences in Croatia. Based on publication data from 1992 to 2012, extracted from three databases, we used the co-authorship network from the period 1992-2001 for the specification of nine structural effects to predict individual performance in the 2002-2012 period. Employing auto-logistic actor attribute models allowed the inclusion of six actor attributes and the analysis of their effects simultaneously with the network effects. The results show that future performance is dependent on the national and disciplinary network both in the psychology field and in the sociology field. When controlling for actor attribute effects, these structural effects play a significant role only in sociology, where activity in the network is a negative predictor and having a tie with an actor who is going to be above average in productivity is a positive predictor of the outcome. Institution type in psychology, and age and previous productivity in sociology, are significant actor attribute effects. 
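The study above ships its indicator functions in R; purely as an illustration, the baseline h-index that all the variants build on can be computed like this in Python:

```python
def h_index(citations):
    # The largest h such that at least h papers have h or more citations each.
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h
```

For example, h_index([10, 8, 5, 4, 3]) is 4; inflating the two least-cited papers to 5 citations each (e.g. through self-citations) would lift h to 5, which is exactly the kind of strategic manipulation the abstract examines.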
We used log-odds to demonstrate the probabilities of the outcome for three prototypical egonet structures: open, closed and complex; with different numbers of alters with attribute. Specific directions for future research are identified. (C) 2016 Elsevier Ltd. All rights reserved. The ability to attract and retain talented professors is a distinctive competence of world-class universities and a source of competitive advantage. The ratio of top scientists to academic staff could therefore be an indicator of the competitive strength of the universities. This work identifies the Italian top scientists in over 200 fields by their research productivity. It then ranks the corresponding universities by the ratio of top scientists to overall faculty. Finally, it contrasts this list with the ranking list by average productivity of the overall faculty. The analysis is carried out at the field, discipline, and overall university levels. The paper also explores the secondary question of whether the ratio of top scientists to faculty is related to the size of the university. (C) 2016 Elsevier Ltd. All rights reserved. The volume of the existing research literature is now such that it can be difficult to find highly relevant information and to develop an understanding of how a scientific topic has evolved. Prior research on topic evolution has often leveraged refinements to Latent Dirichlet Allocation (LDA) to identify emerging topics. However, such methods do not answer the question of which studies contributed to the evolution of a topic. In this paper we show that meta-paths over a heterogeneous bibliographic network (consisting of papers, authors and venues) can be used to identify the network elements that made the greatest contributions to a topic. 
In particular, by adding derived edges that capture the contribution of papers, authors, and venues to a topic (using the PageRank algorithm), a restricted meta-path over the bibliographic network can be used to restrict the evolution of topics to the context of interest to a researcher. We use such restricted meta-paths to construct a topic evolution tree that can provide researchers with a web-based visualization of the evolution of a scientific topic in the context of interest to them. Compared to baseline networks without restrictions, we find that restricted networks provide more useful topic evolution trees. (C) 2016 Elsevier Ltd. All rights reserved. Although statistical models fit many citation data sets reasonably well, with the best fitting models being the hooked power law and the discretised lognormal distribution, the fits are rarely close. One possible reason is that there might be more uncited articles than would be predicted by any model if some articles are inherently uncitable. Using data from 23 different Scopus categories, this article tests the assumption that removing a proportion of uncited articles from a citation dataset allows statistical distributions to have much closer fits. It also introduces two new models, the zero inflated discretised lognormal distribution and the zero inflated hooked power law distribution, together with algorithms to fit them. In all 23 cases, the zero inflated version of the discretised lognormal distribution was an improvement on the standard version and in 15 out of 23 cases the zero inflated version of the hooked power law was an improvement on the standard version. Without zero inflation the discretised lognormal models fit the data better than the hooked power law distribution 6 out of 23 times and with it, the discretised lognormal models fit the data better than the hooked power law distribution 9 out of 23 times. Apparently uncitable articles seem to occur due to the presence of academic-related magazines in Scopus categories. 
In conclusion, future citation analysis and research indicators should take into account uncitable articles, and the best fitting distribution for sets of citation counts from a single subject and year is either the zero inflated discretised lognormal or zero inflated hooked power law. (C) 2016 Elsevier Ltd. All rights reserved. We propose a method to analyze public opinion about political issues online by automatically detecting polarity in Twitter data. Previous studies have focused on the polarity classification of individual tweets. However, to understand the direction of public opinion on a political issue, it is important to analyze the degree of polarity on the major topics at the center of the discussion in addition to the individual tweets. The first stage of the proposed method detects polarity in tweets using the Lasso and Ridge models of shrinkage regression. The models are beneficial in that the regression results provide sentiment scores for the terms that appear in tweets. The second stage identifies the major topics via a latent Dirichlet allocation (LDA) topic model and estimates the degree of polarity on the LDA topics using term sentiment scores. To the best of our knowledge, our study is the first to predict the polarities of public opinion on topics in this manner. We conducted an experiment on a mayoral election in Seoul, South Korea, and compared the total detection accuracy of the regression models with that of five support vector machine (SVM) models with different numbers of input terms selected by a feature selection algorithm. The results indicated that the performance of the Ridge model was approximately 7% higher on average than that of the SVM models. Additionally, the degree of polarity on the LDA topics estimated using the proposed method was compared with actual public opinion responses. The results showed that the polarity detection accuracy of the Lasso model was 83%, indicating that the proposed method was valid in most cases. 
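The zero inflated discretised lognormal distribution referred to above can be sketched as follows. This uses one common discretisation (the lognormal density evaluated at each integer and renormalised) with illustrative parameters; the article's actual fitting algorithms are not reproduced here:

```python
import math

def disc_lognormal_pmf(n_max, mu, sigma):
    """Discretised lognormal over 1..n_max: the lognormal density at
    each integer citation count, renormalised to sum to one (one
    common discretisation; a sketch, not the authors' exact method)."""
    w = [math.exp(-(math.log(n) - mu) ** 2 / (2 * sigma ** 2)) / n
         for n in range(1, n_max + 1)]
    total = sum(w)
    return [x / total for x in w]

def zero_inflated_pmf(pi0, n_max, mu, sigma):
    """Zero inflation: probability pi0 of being 'uncitable' (zero
    citations); the remaining mass follows the discretised lognormal."""
    body = disc_lognormal_pmf(n_max, mu, sigma)
    return [pi0] + [(1 - pi0) * p for p in body]

pmf = zero_inflated_pmf(0.3, 500, mu=1.0, sigma=1.2)  # invented parameters
print(abs(sum(pmf) - 1.0) < 1e-9)                     # → True
```

In practice pi0, mu, and sigma would be fitted to an observed citation distribution, for example by maximum likelihood.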
(C) 2016 Elsevier Ltd. All rights reserved. The arguments presented demonstrate that the Mean Normalized Citation Score (MNCS) and other size-independent indicators based on the ratio of citations to publications are not indicators of research performance. The article provides examples of the distortions when rankings by MNCS are compared to those based on indicators of productivity. The authors propose recommendations for the scientometric community to switch to ranking by research efficiency, instead of MNCS and other size-independent indicators. (C) 2016 Elsevier Ltd. All rights reserved. This article explores the current perception of Wikipedia in academia, focusing on both the reasons for its unpopularity among some and the reasons for its growing acceptance among others. First, the reasons that Wikipedia is still struggling to gain acceptance among many academics and higher education professionals are identified. These include common misconceptions about Wikipedia, doubts about its quality, uneasiness with the challenge that it poses to the traditional peer-review system, and a lack of career-enhancing motivations related to using Wikipedia. Second, the benefits of teaching with Wikipedia for educators, students, and the wider society, as discussed in the current teaching literature, are explored. Finally, the article presents an argument for using Wikipedia in a variety of ways to help students develop critical and academic writing skills. The introduction of new technologies and access to new information channels continue to change the way media studies researchers work and the questions they seek to answer. We investigate the current practices of media studies researchers and how these practices affect their research questions. Through the analysis of 27 interviews about the research practices of media studies researchers during a research project we developed a model of the activities in their research cycle. 
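The size-independence distortion attributed to the MNCS above can be made concrete with a toy computation (all numbers are invented for illustration):

```python
def mncs(citations, field_mean):
    """Mean Normalized Citation Score: the average of each paper's
    citations divided by the expected (field) citation rate."""
    return sum(c / field_mean for c in citations) / len(citations)

field_mean = 10.0                    # hypothetical field average
small_unit = [30]                    # one lucky, highly cited paper
large_unit = [15, 12, 18, 14, 16]    # consistently above-average output

print(mncs(small_unit, field_mean))  # → 3.0
print(mncs(large_unit, field_mean))  # → 1.5
```

By MNCS the one-paper unit ranks higher, although the larger unit produced five times the normalized output: this is the kind of distortion that motivates the authors' call for ranking by research efficiency instead.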
We find that information gathering and analysis activities dominate the research cycle. These activities influence the research outcomes as they determine how research questions asked by media studies researchers evolve. Specifically, we show how research questions are related to the availability and accessibility of data as well as new information sources for contextualization of the research topic. Our contribution is a comprehensive account of the overall research cycle of media studies researchers as well as specific aspects of the research cycle, i.e., information sources, information seeking challenges, and the development of research questions. This work confirms findings of previous work in this area using a previously unstudied group of researchers, as well as providing new details about how research questions evolve. Although the Internet has become a major source for accessing news, there is little research regarding users' experience with news sites. We conducted an experiment to test a comprehensive model of user experience with news sites that was developed previously by means of an online survey. Level of adoption (novel or adopted site) was controlled with a between-subjects manipulation. We collected participants' answers to psychometric scales at 2 times: after presentation of 5 screenshots of a news site and directly after 10 minutes of hands-on experience with the site. The model was extended with the prediction of users' satisfaction with news sites as a high-level design goal. A psychometric measure of trust in news providers was developed and added to the model to better predict people's intention to use particular news sites. The model presented in this article represents a theoretically founded, empirically tested basis for evaluating news websites, and it holds theoretical relevance to user-experience research in general. Finally, the findings and the model are applied to provide practical guidance in design prioritization. 
The main focus of this article is to examine whether sentiment analysis can be successfully used for event detection, that is, detecting significant events that occur in the world. Most solutions to this problem are typically based on increases or spikes in frequency of terms in social media. In our case, we explore whether sudden changes in the positivity or negativity that keywords are typically associated with can be exploited for this purpose. A data set that contains several million Twitter messages over a 1-month time span is presented and experimental results demonstrate that sentiment analysis can be successfully utilized for this purpose. Further experiments study the sensitivity of both frequency- and sentiment-based solutions to a number of parameters. Concretely, we show that the number of tweets that are used for event detection is an important factor, while the number of days used to extract token frequency or sentiment averages is not. Lastly, we present results focusing on detecting local events and conclude that all approaches are dependent on the level of coverage that such events receive in social media. Social media provide opportunities for policy makers to gauge public opinion. However, the large volumes and variety of expressions on social media have challenged traditional policy analysis and public sentiment assessment. In this article, we describe a framework for social-media-based public policy informatics and a system called iMood that addresses the needs for sentiment and network analyses of U.S. immigration and border security. iMood collects related messages on Twitter, extracts user sentiment and emotion, and constructs networks of the Twitter users, helping policy makers to identify opinion leaders, influential users, and community activists. We evaluated the sentiment, emotion, and network characteristics found in 909,035 tweets posted by over 300,000 users during three phases between May and November 2013. 
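The sentiment-based event detection idea described above, flagging sudden shifts in the sentiment a keyword is associated with, can be sketched generically. This is a standard deviation-from-rolling-average detector on invented data, not the article's exact method:

```python
import statistics

def detect_sentiment_events(daily_sentiment, window=7, threshold=2.0):
    """Flag day indices where a keyword's daily mean sentiment deviates
    from its recent average by more than `threshold` standard
    deviations (a generic sudden-shift sketch)."""
    events = []
    for i in range(window, len(daily_sentiment)):
        past = daily_sentiment[i - window:i]
        mu = statistics.mean(past)
        sd = statistics.stdev(past)
        if sd > 0 and abs(daily_sentiment[i] - mu) > threshold * sd:
            events.append(i)
    return events

# invented daily sentiment averages for one keyword; day 7 turns sharply negative
series = [0.1, 0.12, 0.11, 0.09, 0.1, 0.11, 0.1, -0.6, 0.1, 0.11]
print(detect_sentiment_events(series))  # → [7]
```

A frequency-based detector has the same shape, with daily term counts in place of daily sentiment averages.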
Statistical analyses reveal significant differences in emotion and sentiment among the 3 phases. The Twitter networks of the 3 phases also had significantly different relationship counts, network densities, and total influence scores from those of other phases. This research should contribute to developing a new framework and a new system for social-media-based public policy informatics, providing new empirical findings and data sets of sentiment and network analyses of U.S. immigration and border security, and demonstrating a general applicability to different domains. Most information retrieval (IR) systems consider relevance, usefulness, and quality of information objects (documents, queries) for evaluation, prediction, and recommendation, often ignoring the underlying search process of information seeking. This may leave out opportunities for making recommendations that analyze the search process and/or recommend alternative search processes instead of objects. To overcome this limitation, we investigated whether by analyzing a searcher's current processes we could forecast the likelihood of achieving a certain level of success with respect to search performance in the future. We propose a machine-learning-based method to dynamically evaluate and predict search performance several time-steps ahead at each given time point of the search process during an exploratory search task. Our prediction method uses a collection of features extracted from expression of information need and coverage of information. For testing, we used log data collected from 4 user studies that included 216 users (96 individuals and 60 pairs). Our results show 80-90% accuracy in prediction depending on the number of time-steps ahead. In effect, the work reported here provides a framework for evaluating search processes during exploratory search tasks and predicting search performance. Importantly, the proposed approach is based on user processes and is independent of any IR system. 
We introduce a new problem, identifying the type of relation that holds between a pair of similar items in a digital library. Being able to provide a reason why items are similar has applications in recommendation, personalization, and search. We investigate the problem within the context of Europeana, a large digital library containing items related to cultural heritage. A range of types of similarity in this collection was identified. A set of 1,500 pairs of items from the collection was annotated using crowdsourcing. A high intertagger agreement (average 71.5 Pearson correlation) was obtained and demonstrates that the task is well defined. We also present several approaches to automatically identifying the type of similarity. The best system applies linear regression and achieves a mean Pearson correlation of 71.3, close to human performance. The problem formulation and data set described here were used in a public evaluation exercise, the *SEM shared task on Semantic Textual Similarity. The task attracted the participation of 6 teams, who submitted 14 system runs. All annotations, evaluation scripts, and system runs are freely available. Automatic text classification (TC) continues to be a relevant research topic, and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well-known TC algorithms. 
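The Pearson correlation used above to compare system output against the crowdsourced annotations can be computed directly. The scores below are hypothetical, chosen only to exercise the formula:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists,
    e.g., human similarity ratings vs. system predictions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [0.9, 0.1, 0.5, 0.7, 0.3]     # hypothetical crowd ratings
system = [0.8, 0.2, 0.4, 0.9, 0.2]   # hypothetical system scores
print(round(pearson(gold, system), 2))  # → 0.91
```

Figures such as the reported 71.5 and 71.3 correspond to this coefficient expressed on a 0-100 scale.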
We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models. We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (SCITEX), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools. It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. 
This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable. The diversity of bibliometric indices today poses the challenge of exploiting the relationships among them. Our research uncovers the best core set of relevant indices for predicting other bibliometric indices. An added difficulty is to select the role of each variable, that is, which bibliometric indices are predictive variables and which are response variables. This results in a novel multioutput regression problem where the role of each variable (predictor or response) is unknown beforehand. We use Gaussian Bayesian networks to solve this problem and discover multivariate relationships among bibliometric indices. These networks are learnt by a genetic algorithm that looks for the optimal models that best predict bibliometric data. Results show that the optimal induced Gaussian Bayesian networks corroborate previous relationships between several indices, but also suggest new, previously unreported interactions. 
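The mutual reinforcement idea behind MutualRank, scores flowing between heterogeneous networks until they stabilize, can be sketched in miniature. This toy iteration couples only papers and venues on invented data; MutualRank itself also includes researchers, more elaborate edge weights, and damping:

```python
# Toy bibliographic data: paper -> venue, and paper -> cited papers.
papers = {"p1": "v1", "p2": "v1", "p3": "v2"}
cites = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}

paper_score = {p: 1.0 for p in papers}
venue_score = {v: 1.0 for v in set(papers.values())}

for _ in range(50):
    # Papers inherit score from citing papers and from their venue.
    new = {}
    for p in papers:
        incoming = sum(paper_score[q] for q, refs in cites.items() if p in refs)
        new[p] = incoming + venue_score[papers[p]]
    total = sum(new.values())
    paper_score = {p: s / total for p, s in new.items()}
    # Venues aggregate the (normalized) scores of their papers.
    vs = {v: 0.0 for v in venue_score}
    for p, v in papers.items():
        vs[v] += paper_score[p]
    venue_score = vs

best = max(paper_score, key=paper_score.get)
print(best)  # → p3
```

Here the twice-cited paper p3 ends up ranked highest, and its venue v2 benefits in turn; the inter-network coupling is what distinguishes this family of methods from single-network PageRank.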
An extended analysis of the best model illustrates that a set of 12 bibliometric indices can be accurately predicted using only a smaller predictive core subset composed of citations, g-index, q(2)-index, and h(r)-index. This research is performed using bibliometric data on Spanish full professors associated with the computer science area. The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation-based ranking algorithms and challenge the performance of citation-based retrieval systems. In this research, we utilize a two-step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author and then use that importance to estimate the publication importance for some selected articles. To evaluate this method, we designed a simulation experiment (random citation missing) to test the two-step citation analysis that we carried out with the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large-scale scientific digital library, from high-quality citation data to very poor quality data. The results show that a two-step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (first step) is exponentially increased when the quality of citation data decreases. The findings from this study can further enhance citation-based publication-ranking algorithms for real-world applications. The contribution of this article is twofold. First, we present Indexing by latent Dirichlet allocation (LDI), an automatic document indexing method. 
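The two-step idea above, falling back on author importance where a publication's citation data are missing, can be sketched as follows. The scores, papers, and blending rule are all invented for illustration; the actual study uses the ACM DL and more elaborate importance measures:

```python
# Hypothetical author-importance scores (step 1, e.g. citation-derived).
author_importance = {"alice": 0.9, "bob": 0.4, "carol": 0.6}
paper_authors = {"pA": ["alice", "bob"], "pB": ["bob"], "pC": ["carol"]}
paper_citations = {"pA": 25, "pB": None, "pC": None}  # None = missing data

def estimate_importance(paper, alpha=0.5):
    """Step 2: where citation counts are missing, use the mean
    importance of the paper's authors; otherwise blend author
    importance with (crudely rescaled) observed citations."""
    authors = paper_authors[paper]
    a = sum(author_importance[x] for x in authors) / len(authors)
    c = paper_citations[paper]
    if c is None:
        return a
    return alpha * a + (1 - alpha) * min(c / 100.0, 1.0)

for p in paper_authors:
    print(p, round(estimate_importance(p), 3))
```

The study's finding that author importance matters more as citation quality drops corresponds here to the missing-data branch being taken more often.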
Many ad hoc applications, or their variants with smoothing techniques suggested in LDA-based language modeling, can result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve document retrieval performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. EnM combines basic indexing models by assigning different weights and attempts to uncover the optimal weights to maximize the mean average precision. To solve the optimization problem, we propose an algorithm derived from the boosting method. The results of our computational experiments on benchmark data sets indicate that both the proposed approaches are viable options for document retrieval. As open-access (OA) publishing funded by article-processing charges (APCs) becomes more widely accepted, academic institutions need to be aware of the total cost of publication (TCP), comprising subscription costs plus APCs and additional administration costs. This study analyzes data from 23 UK institutions covering the period 2007-2014 to model the TCP. It shows a clear rise in centrally managed APC payments from 2012 onward, with payments projected to increase further. As well as evidencing the growing availability and acceptance of OA publishing, these trends reflect particular UK policy developments and funding arrangements intended to accelerate the move toward OA publishing (Gold OA). Although the mean value of APCs has been relatively stable, there has been considerable variation in APC prices paid by institutions since 2007. In particular, hybrid subscription/OA journals were consistently more expensive than fully OA journals. Most APCs were paid to large traditional commercial publishers who also received considerable subscription income. 
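The EnM combination and its objective, mean average precision, can be sketched concretely. The per-model document scores and the hand-picked weights below are invented; the paper learns the weights by a boosting-derived optimisation rather than fixing them:

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@k over the ranks k that hold a relevant doc."""
    hits, score = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

def ensemble_scores(model_scores, weights):
    """EnM-style combination: weighted sum of each basic indexing
    model's score for every document."""
    docs = set().union(*model_scores)
    return {d: sum(w * m.get(d, 0.0) for m, w in zip(model_scores, weights))
            for d in docs}

m1 = {"d1": 0.9, "d2": 0.2, "d3": 0.5}   # e.g. an LDA-based index (invented)
m2 = {"d1": 0.3, "d2": 0.8, "d3": 0.4}   # e.g. a term-based index (invented)
combined = ensemble_scores([m1, m2], [0.7, 0.3])
ranking = sorted(combined, key=combined.get, reverse=True)
print(average_precision(ranking, {"d1", "d3"}))  # → 1.0
```

Optimising the weights amounts to searching for the weight vector whose induced ranking maximises this AP averaged over a set of queries.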
New administrative costs reported by institutions varied considerably. The total cost of publication modeling shows that APCs are now a significant part of the TCP for academic institutions, in 2013 already constituting an average of 10% of the TCP (excluding administrative costs). This article explores the effects of health information technologies (HIT) in operating rooms (ORs). When functioning well, HIT are a boon to mankind. However, HIT in the OR also create hazards for patients for a number of interrelated reasons. We introduce 5 interrelated components of hazard situations for medical teams operating in the OR: complexity, overload/underload, inadequate individual training, inadequate training of medical teams, and overconfidence of surgeons. These components of hazard situations in the OR may negatively impact patient safety. We discuss implications, especially in terms of individuals and medical teams in the OR, as well as work substitution as a broader aspect of the potential dark side of health IT. In this opinion piece, we would like to present a short literature review of perceptions and reservations towards Wikipedia in academia, address the common questions about overall reliability of Wikipedia entries, review the actual practices of Wikipedia usage in academia, and conclude with possible scenarios for a peaceful coexistence. Because Wikipedia is a regular topic of JASIST publications (Lim, 2009; Meseguer-Artola, Aibar, Llados, Minguillon, & Lerga, ; Mesgari, Okoli, Mehdi, Nielsen, & Lanamaki, ; Okoli, Mehdi, Mesgari, Nielsen, & Lanamaki, ), we hope to start a useful discussion with the right audience. The purpose of this paper is to assess the degree of international visibility for the Romanian scientific social sciences journals included in the Institute for Scientific Information (ISI) database. 
By examining the national distribution of authors and the proportion of co-authorship within and outside Romania, the paper proposes the use of the Theil Index and its decomposition as a tool to assess international visibility. Although there are 10 ISI social sciences journals in Romania, the international visibility of these journals is relatively low; the number of foreign authors as a percentage of the total number of authors remains below 30 % for most journals. There is a high degree of geographic concentration for the foreign authors, as most come from two countries. Regression models also indicate that the number of authors from the same institution as the one that issues the journal significantly affects a journal's Impact Factor. The number of articles authored exclusively by mixed teams (including authors from the same institution that issues the journal and authors from abroad or authors from other Romanian institutions) as a percentage of the total number of articles published is extremely low (8 %). This suggests that the Impact Factor, when used as a measure of research quality for the Romanian social sciences journals, may create bias in the judgement of those interpreting the results of the Impact Factor rankings, favoring insularity at the expense of scientific collaboration. The advent of large data repositories and the necessity of distributed skillsets have led to a need to study the scientific collaboration network emerging around cyber-infrastructure-enabled repositories. To explore the impact of scientific collaboration and large-scale repositories in the field of genomics, we analyze coauthorship patterns in NCBI's big data repository GenBank using trace metadata from coauthorship of traditional publications and coauthorship of datasets. 
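The Theil index and its between-group decomposition proposed above can be sketched as follows. The author counts are invented and the grouping (Romania vs. abroad) is only illustrative of how the decomposition isolates concentration between countries:

```python
import math

def theil(xs):
    """Theil T index over positive quantities: 0 for a perfectly even
    distribution, up to ln(n) at maximal concentration."""
    total, n = sum(xs), len(xs)
    return sum((x / total) * math.log((x / total) * n) for x in xs if x > 0)

def theil_between(groups):
    """Between-group component of the decomposition: each group is a
    list of counts (e.g., authors per journal, grouped by country)."""
    grand_total = sum(sum(g) for g in groups)
    n = sum(len(g) for g in groups)
    t = 0.0
    for g in groups:
        share = sum(g) / grand_total
        if share > 0:
            t += share * math.log(share / (len(g) / n))
    return t

# hypothetical authors-per-journal counts, grouped by origin
romania = [40, 35, 30]
abroad = [5, 4]
print(round(theil([*romania, *abroad]), 3))       # → 0.273
print(round(theil_between([romania, abroad]), 3)) # → 0.267
```

Here almost all of the overall inequality is between the two groups, which is the signature of low international visibility in the paper's sense.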
We demonstrate that using complex network analysis to explore both networks independently and jointly provides a much richer description of the community, and addresses some of the methodological concerns discussed in previous literature regarding the use of coauthorship data to study scientific collaboration. Research collaboration is necessary, rewarding, and beneficial. Cohesion between team members is related to their collective efficiency. To assess collaboration processes and their eventual outcomes, agencies need innovative methods, and social network approaches are emerging as a useful analytical tool. We identified the research output and citation data of a network of 61 research groups formally engaged in publishing rare disease research between 2000 and 2013. We drew the collaboration networks for each year and computed the global and local measures throughout the period. Although global network measures remained steady over the whole period, the local and subgroup metrics revealed a growing cohesion between the teams. Transitivity and density showed little or no variation throughout the period. In contrast, the following points indicated an evolution towards greater network cohesion: the emergence of a giant component (which grew from just 30 % to reach 85 % of groups); the decreasing number of communities (following a tripling in the average number of members); the growing number of fully connected subgroups; and increasing average strength. Moreover, assortativity measures reveal that, after an initial period where subject affinity and a common geographical location played some role in favouring the connection between groups, the collaboration was driven in the final stages by other factors and complementarities. The Spanish research network on rare diseases has evolved towards growing cohesion, as revealed by local and subgroup metrics following social network analysis. 
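The giant-component measure cited above (the share of groups in the largest connected component, which grew from 30 % to 85 %) can be computed with a small union-find sketch on invented collaboration ties:

```python
def giant_component_fraction(nodes, edges):
    """Share of nodes in the largest connected component of an
    undirected graph, via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes)

groups = list(range(10))                     # ten hypothetical research groups
collabs = [(0, 1), (1, 2), (2, 3), (4, 5)]   # invented co-publication ties
print(giant_component_fraction(groups, collabs))  # → 0.4
```

Computing this fraction year by year over the co-publication ties yields exactly the kind of growth curve the study reports.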
Thomson Reuters' highly cited database (HiCite) (http://www.highlycited.com) is composed of the top researchers in several subspecialties belonging to the 22 Essential Science Indicators fields of the Web of Science. By analyzing the data collected, we are able to calculate several correlations in the data based upon select areas, view trends of changes in rank, percentage of contribution, and countries, and re-rank the organizations by new standards. The purpose of this is to refocus and highlight previously unaccounted-for but significant details that are historically ignored, such as economics, specialties, nationality, efficiency, and field, and to evaluate performance of the separate organizations by multiple factors, with an emphasis on the status of the United States of America. We employ data envelopment analysis on a series of experiments performed at Fermilab, one of the major high-energy physics laboratories in the world, in order to test their efficiency (as measured by publication and citation rates) in terms of variations of team size, number of teams per experiment, and completion time. We present the results and analyze them, focusing in particular on inherent connections between quantitative team composition and diversity, and discuss them in relation to other factors contributing to scientific production in a wider sense. Our results concur with the results of other studies across the sciences showing that smaller research teams are more productive, and with the conjecture of a curvilinear dependence between team size and efficiency. In modern knowledge societies, scientific research is crucial, but expensive and often publicly financed. However, with regard to scientific research success, some studies have found gender differences in favor of men. To explain this, it has been argued that female researchers collaborate less than male researchers, and the current study examines this argument scientometrically. 
A secondary data analysis was applied to the sample of a recent scientometric publication (Konig et al. in Scientometrics 105:1931-1952, 2015. doi: 10.1007/s11192-015-1646-y). The sample comprised 4234 (45 % female) industrial-organizational psychologists with their 46,656 publications (published from 1948 to 2013) and all of their approx. 100,000 algorithmically genderized collaborators (i.e., co-authors). Findings confirmed that (a) the majority of researchers' publications resulted from collaborations, and (b) their engagement in collaborations was related to their scientific success, although not as clearly as expected (and partly even negatively). However, there was no evidence that a lack of female collaboration causes females' lower scientific success. In fact, female researchers engage in more scientific collaborations. Our findings have important implications for science and society because they make gender differences in scientific success much harder to rationalize. Author co-citation analysis (ACA) is a well-known and frequently used method for mapping academic researchers and sketching the structure of a professional field according to co-citation relationships between authors in an article set. However, the subtlety of the resulting visualization is limited because ACA draws on author co-citation information alone. The proposed method, called modified author co-citation analysis (MACA), exploits author co-citation relationships together with each citation's publication time, publication carrier, and keywords to construct MACA-based co-citation matrices. According to the results of our experiments: (1) MACA produces clearer and more fine-grained clusterings; (2) involving more information in co-citation analysis improves visual acuity; (3) in the visualization of the co-citation network produced by MACA, points in different categories are farther apart, and points indicating authors in the same category are closer together. 
As a result, the proposed MACA is found to yield more detailed and subtle information about the analyzed knowledge domain than ACA. Funding acknowledgements found in scientific publications have been used to study the impact of funding on research since the 1970s. However, no broad-scale indexation of that paratextual element was done until 2008, when Thomson Reuters' Web of Science started to add funding acknowledgement information to its bibliographic records. As this new information provides a new dimension to bibliometric data that can be systematically exploited, it is important to understand the characteristics of these data and the underlying implications for their use. This paper analyses the presence and distribution of funding acknowledgement data covered in the Web of Science. Our results show that prior to 2009, funding acknowledgement coverage is extremely low and therefore not reliable. Since 2008, funding information has been collected mainly for publications indexed in the Science Citation Index Expanded; more recently (2015), inclusion of funding texts for publications indexed in the Social Science Citation Index has been implemented. Arts & Humanities Citation Index content is not indexed for funding acknowledgement data. Moreover, English-language publications are the most reliably covered. Finally, not all types of documents are equally covered for funding information indexation, and only articles and reviews show consistent coverage. The characterization of the funding acknowledgement information collected by Thomson Reuters can therefore help understand the possibilities offered by the data but also their limitations. Definitions for influence in bibliometrics are surveyed and expanded upon in this work. On data composed of the union of DBLP and CiteSeerX, approximately 6 million publications, a relatively small number of features are developed to describe the set, including loyalty and community longevity, two novel features. 
These features are successfully used to predict the influential set of papers in a series of machine learning experiments. The most predictive features are highlighted and discussed. We analyze whether the three "P" mechanisms of proximity effect, preferential attachment and path dependence concur within the evolutionary process of an inter-regional network. Using a unique database of China's technology transactions between regions, we show that proximity effect, preferential attachment and path dependence have coexisted in the evolutionary process of China's inter-regional network of technology transactions. In particular, the inter-regional relations positively and significantly correlate with the geographical and economic proximity matrix, all regions' three centrality values in the current year positively and significantly correlate with their centrality in the previous two years, and the inter-regional relations in the current year positively and significantly correlate with the corresponding relations in the previous two years. This paper contributes to the existing literature by identifying three evolutionary mechanisms of the inter-regional network. An interpretation is that the evolution of an inter-regional network is a very complex process, and a single mechanism, such as geographical proximity from the perspective of economic geography or preferential attachment from the perspective of network science, can explain only part of it. Although the contribution of scientometric literature to policies on academic science has been substantial, the literature has focused primarily on the production of scientific knowledge, whereas limited attention has been paid to the other critical mission of academic institutions, i.e., education or the production of scientists. To address this limitation and better inform policymakers, the current study proposes a new approach drawing on Ph.D. dissertation data, which we believe should open up a new avenue of scientometric research. 
Integrating dissertation data with more traditional types of scientometric data such as publications and careers, this study presents a case study of the Japanese science system investigating its transition since the 1970s. The aim of this paper is to collect the most-cited articles of the twenty-first century and to study how this group changed over time. Here the term "most-cited" is operationalized by considering yearly h-cores in the Web of Science. These h-cores are analysed in terms of authors, research areas, countries, institutions, journals and average number of authors per paper. We only consider publications of article or proceedings type. The research of some of the more prolific authors is on genetics and genomes, published in multidisciplinary journals such as Nature and Science, while the results show that writing a software tool for crystallography or molecular biology may help collect large numbers of citations. English is the language of all articles in any h-core. The core institutions are largely those best placed in most rankings of world universities. Some attention is given to the relation between h-core articles and the information sciences. We further introduce the notions of h-core scores and h-core score per publication, leading to new rankings of countries. We conclude by stating that the notions of h-cores and h-core scores provide a new perspective on leading countries, articles and scientists. Researchers have to operate in an increasingly competitive environment in which funding is becoming a scarce resource. Funding agencies are unable to experiment with their allocation policies since even small changes can have dramatic effects on academia. We present a Proposal-Evaluation-Grant System (PEGS) which allows us to simulate different research funding allocation policies. We implemented four Resource Allocation Strategies (RAS) entitled Communism, Lottery, Realistic, and Ideal. 
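A yearly h-core, as used in the study above, is the set of the h most-cited papers of a given year, where h is the h-index of that year's publication set. This is a minimal sketch of that standard definition, not the authors' own code; the citation counts are hypothetical:

```python
def h_core(citation_counts):
    """Return the h-core: the h most-cited items, where h is the
    largest number such that h items each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return ranked[:h]

# Hypothetical citation counts for one publication year
cites = [50, 18, 7, 6, 5, 2, 1]
core = h_core(cites)
print(len(core), core)  # h = 5; core = [50, 18, 7, 6, 5]
```

The h-core score of a country can then be obtained by counting its papers inside the core, which is how the new country rankings described above would follow from this building block.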
The results show that there is a strong effect of the RAS on the careers of the researchers. In addition, the PEGS investigated the influence of paper-writing skill and grant review errors. Scientific journals published in non-English languages may be less accessible to researchers worldwide. Most of them are not covered in international indexing and abstracting databases such as the Web of Science and Scopus, which can influence their impact. Scientific journals published by the Indonesian Agency for Agricultural Research and Development are a case in point, and their impact cannot be ascertained due to the absence of a tool that can assist in assessing the performance of the journals. To address this concern, this study aims to (a) assess the quality of Indonesian agricultural science journals; (b) determine how Indonesia-based agricultural science researchers assign and calibrate trust to the journals they use; (c) determine how Indonesia-based agricultural science researchers assess the usability of the journals they read; and (d) produce an internal ranking of Indonesian agricultural journals. The study has been designed as a combination of two approaches, namely a revealed preference and a stated preference study. The revealed preference study involves citation analysis of the nine journals sampled. The stated preference study gauges the trustworthiness and usability of these journals from the perspectives of the researchers who use them. The revealed preference study provides the Journal Quality Index, whereas the stated preference study provides the Journal Trust and Journal Usability Index. The study also provides an internal ranking and a comparison between indicators resulting from the revealed and stated preference studies. It is also observed that the Quality and Trust indices are well correlated and indicate a good model fit with the Overall Index. 
On the other hand, the Usability Index is negatively correlated and shows a much poorer model fit with the Overall Index. A longitudinal bibliometric analysis of publications indexed in Thomson Reuters' InCites and Elsevier's Scopus, and published from Persian Gulf States and neighbouring Middle East countries, shows clear effects of major political events during the past 35 years. Predictions made in 2006 by the US diplomat Richard N. Haass on political changes in the Middle East have come true in the Gulf States' national scientific research systems, to the extent that Iran has become in 2015 by far the leading country in the Persian Gulf, and South-East Asian countries including China, Malaysia and South Korea have become major scientific collaborators, displacing the USA and other large Western countries. But collaboration patterns among Persian Gulf States show no apparent relationship with differences in Islam denominations. In recent years there has been increasing concern about the rigor of laboratory research. Here we present the protocol for a study comparing the completeness of reporting of in vivo and in vitro research carried out in Nature Publishing Group journals before and after the introduction of a change in editorial policy (the introduction of a set of guidelines for reporting), and in similar research published in other journals in the same periods. We reply to the comment by Lundgren, Shildrick and Lawrence on our article on gender studies bibliometrics and argue that it does not challenge any of our main results. Their points of criticism were that we had not compiled exactly all scholarly gender production, that the gender studies field had changed during the period, and that the definition of the research area is vague; they also suggest that only gender studies scholars themselves are able to study the field. 
We maintain that constructive scientific critique should specify alternative methods, how they are expected to change the results and conclusions, and why that would be preferable. Without such stringency, it reduces to regressive lists of detail. Altmetrics or other indicators for the impact of academic outputs are often correlated with citation counts in order to help assess their value. Nevertheless, there are no guidelines about how to assess the strengths of the correlations found. This is a problem because the correlation strength affects the conclusions that should be drawn. In response, this article uses experimental simulations to assess the correlation strengths to be expected under various conditions. The results show that the correlation strength reflects not only the underlying degree of association but also the average magnitude of the numbers involved. Overall, the results suggest that due to the number of assumptions that must be made, in practice it will rarely be possible to make a realistic interpretation of the strength of a correlation coefficient. This article uses a bundle of bibliometric and text-mining techniques to provide a systematic assessment of the intellectual core of the Social Media-based innovation research field. The goal of this study is to identify the main research areas, understand the current state of development and suggest potential future directions by analysing co-citations from 155 papers published between 2003 and 2013 in the most influential academic journals. The main clusters have been identified, mapped, and labelled. The most active areas on this topic and the most influential and co-cited papers have been identified and described. Also, intra- and inter-cluster knowledge base diversity has been assessed by using indicators stemming from the domains of Information Theory and Biology. A t test has been performed to assess the significance of the inter-cluster diversity. 
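The point above about altmetric-citation correlations (that observed correlation strength tracks the average magnitude of the counts, not just the underlying association) can be illustrated with a small simulation. This is a sketch under assumed lognormal-mixed Poisson count models, not the article's actual simulation design:

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def poisson(rng, lam):
    # Knuth's method; adequate for the moderate rates used here
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulated_correlation(mean, n=5000, seed=7):
    """Citations and an altmetric count share one latent factor;
    only the average magnitude `mean` differs between runs."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        latent = min(rng.lognormvariate(0, 1), 20)  # shared association
        xs.append(poisson(rng, mean * latent))
        ys.append(poisson(rng, mean * latent))
    return pearson(xs, ys)

low = simulated_correlation(0.5)   # low-magnitude counts
high = simulated_correlation(20)   # high-magnitude counts
# Same underlying association, yet the higher-magnitude run yields a
# noticeably stronger observed correlation
print(round(low, 2), round(high, 2))
```

The mechanism is that Poisson noise dominates when counts are small, attenuating the observed correlation even though the latent association is identical in both runs.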
Five co-existing research streams shaping the research field under investigation have been identified and characterized. The paper summarizes the results of the recently completed project to derive a Research Core Dataset (RCD) for the German science system. It describes the basic principles and the architecture of the specification by introducing its main components and elements and by depicting the provisions with regard to aggregate and base data. In this context, the paper also explains the peculiarities of the German science system and the need for standardization given institutional heterogeneity and highly fragmented institutional reporting activities. The paper concludes with a short outlook on the potential chances and risks of the RCD to promote data integration and efficiency in reporting by research institutions in Germany. With increasing uptake among researchers, social media are finding their way into scholarly communication and, under the umbrella term altmetrics, are starting to be utilized in research evaluation. Fueled by technological possibilities and an increasing demand to demonstrate impact beyond the scientific community, altmetrics have received great attention as potential democratizers of the scientific reward system and indicators of societal impact. This paper focuses on the current challenges for altmetrics. Heterogeneity, data quality and particular dependencies are identified as the three major issues and discussed in detail with an emphasis on past developments in bibliometrics. The heterogeneity of altmetrics reflects the diversity of the acts and online events, most of which take place on social media platforms. This heterogeneity has made it difficult to establish a common definition or conceptual framework. Data quality issues become apparent in the lack of accuracy, consistency and replicability of various altmetrics, which is largely affected by the dynamic nature of social media events. 
Furthermore, altmetrics are shaped by technical possibilities and are particularly dependent on the availability of APIs and DOIs, strongly dependent on data providers and aggregators, and potentially influenced by the technical affordances of underlying platforms. In a knowledge-based economy, a good overview of the scientific and technological portfolio is essential for policy formation and driving knowledge transfer to industry and the broad public. In order to enhance open innovation, the Flemish public administration has created a Flanders research information portal that integrates information available from its data providers (research institutions, funding organizations, …) using the Common European Research Information Format standard. Although this standard allows for almost unlimited flexibility in modelling research information, it has limitations in terms of semantics when it comes to communicating with end-users. However, interoperability of research information is only meaningful when a well-defined semantics is used, and hence acts as leverage for the comparability of the information provided on the portal, including its derived services offered to the research community, research-driven organizations and policy makers. This paper describes the implementation of a business semantics tool that governs the meaning of the data concepts and classifications used for research information, in particular research funding, as a means to unambiguously exchange and interpret research data. We illustrate the usefulness of an Ontology-Based Data Management (OBDM) approach to develop an open information system, allowing for a deep level of interoperability among different databases, and accounting for additional dimensions of data quality compared to the standard dimensions of the OECD (Quality framework and guidelines for OECD statistical activities, OECD Publishing, Paris, 2011) Quality Framework. 
Recent advances in engineering and computer science provide promising tools to solve some of the crucial issues in data integration for Research and Innovation. Research performance indicators are broadly used, for a range of purposes. The scientific literature on research indicators has a strong methodological focus. There is no comprehensive overview or classification of the use of such indicators. In this paper we give such a classification of research indicator use. Using the journal Scientometrics as a starting point, we scrutinized recent journal literature on scientometrics, bibliometrics, research policy, research evaluation, and higher education in order to spot paragraphs or sections that mention indicator use. This led to a classification of research indicator use with 21 categories which can be grouped into five main categories. This paper details a unique data experiment carried out at the University of Amsterdam, Center for Digital Humanities. Data pertaining to monographs were collected from three autonomous resources, the Scopus Journal Index, WorldCat.org and Goodreads, and linked according to unique identifiers in a new Microsoft SQL database. The purpose of the experiment was to investigate co-varied metrics for a list of book titles based on their citation impact (from Scopus), presence in international libraries (WorldCat.org) and visibility as publicly reviewed items (Goodreads). The results of our data experiment highlighted current problems related to citation indices and the way that books are recorded by different citing authors. Our research further demonstrates the primary problem of matching book titles as 'cited objects' with book titles held in a union library catalog, given that books are always recorded distinctly in libraries if published as separate editions with different International Standard Book Numbers (ISBNs). 
Due to various 'matching' problems related to the ISBN, we suggest a new type of identifier, a 'Book Object Identifier', which would allow bibliometricians to recognize a book published in multiple formats and editions as 'one object' suitable for evaluation. The BOI standard would be most useful for books published in the same language, and would more easily support the integration of data from different types of book indexes. A researcher collaborating with many groups will normally have more papers (and thus higher citations and h-index) than a researcher spending all his/her time working alone or in a small group. While analyzing an author's research merit, it is therefore not enough to consider only the collective impact of the published papers; it is also necessary to quantify his/her share in the impact. For this quantification, here I propose the I-index, which is defined as an author's percentage share in the total citations that his/her papers have attracted. It is argued that this I-index does not directly depend on most of the subjective issues such as an author's influence, affiliation, seniority or career breaks. A simple application of the Central Limit Theorem shows that the scheme of equidistribution of credit among the coauthors of a paper will give us the most probable value of the I-index (with an associated small standard deviation which decreases with increasing h-index). I show that the total citations (N-c), the h-index and the I-index are three independent parameters (within their bounds), and together they give a comprehensive idea of an author's overall research performance. Even though there is a rich discussion in the literature about co-authorship practices, many of the existing studies do not offer a dynamic picture of co-authorship patterns and experiences across disciplines. To address the research gap, our study aims to explore several key dimensions of the social dynamics in co-authorship practices. 
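As defined above, the I-index is an author's percentage share of the total citations to his/her papers, with credit on each paper divided equally among its coauthors. This is a minimal sketch of that definition with a hypothetical publication record:

```python
def i_index(papers):
    """papers: list of (citations, n_authors) tuples for one author.
    Returns the author's percentage share of total citations,
    assuming equidistribution of credit among co-authors."""
    total = sum(c for c, _ in papers)
    share = sum(c / a for c, a in papers)
    return 100.0 * share / total

# Hypothetical record: (citations, number of authors) per paper
papers = [(40, 1), (30, 2), (30, 5)]
print(i_index(papers))  # (40 + 15 + 6) / 100 * 100 = 61.0
```

A sole author of every paper would score 100, while an author always embedded in large teams scores far lower for the same total citation count, which is exactly the distinction the index is meant to capture.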
In particular, we examine cohort differences in collaboration patterns across disciplines and cohort differences in negative collaboration experiences across disciplines. To conduct our analyses, we use data from a national survey of scholars and engineers in 108 top research universities. Our results indicate that the number of collaborators at one's own university is correlated with an increase in negative collaboration experiences, while an increase in collaborators at other universities is not correlated with an increase in negative collaboration experiences. In addition, we conclude that junior scholars are more likely to have negative collaboration experiences than their senior peers. This result holds even after controlling for gender and discipline. This exploratory study analyzes the networked structure of theories in the social sciences as represented by co-occurrences on the World Wide Web. For this, co-occurrences of communication science theories were retrieved from the Web and analyzed using social network analysis tools. Several network- and node-level properties were measured to examine the relationships of theories in terms of co-occurrences. Communication science theories were grouped into four clusters. The results shed some important light on the structural dynamics of communication science theories on the academic and social Web. Many countries are investing heavily in innovation in order to modernize their economies. A key step in this process is the development of academic research in innovation. This article analyzes the leading countries in innovation research between 1989 and 2013 from an academic perspective. The aim of the study is to identify the most relevant countries in this field and the leading trends that have occurred during the last years. The work also introduces a general perspective analyzing the research developed in several supranational regions. 
The main advantage of this contribution is that it gives a global overview of the current academic state of the art in the area. The analysis focuses on the most productive and influential countries in innovation research, classifying the results in periods of 5 years. The leading journals in the field are also studied individually, identifying the most productive countries in each of the journals. The results show that the publications of each country are biased by the country of origin of the journal. The USA and the UK are the leading countries in this field, with the UK being the most productive in per capita terms among the large countries. Our paper analyses 25 years of performance management research published in English-language journals included in the SSCI database, separating the business domain from the public sector. We used content analysis to show the relationships between the subfields of performance management and their evolution over time. Through a multiple correspondence analysis based on keywords we provide a framework to track this literature over the 25-year period. We conclude the paper with a discussion on future pathways in the performance management literature. Co-authorship networks have been used to study collaboration patterns in various fields, evaluate researchers and recommend policies. In their simplest form they are constructed by considering authors to be network nodes connected to each other if they published a paper together. We propose to further explore the same data by constructing a different network, in which nodes are articles linked to one another if they have a common author. For papers published in the fields of computer science and mathematics with affiliations to Romanian institutions, we show that this type of network reveals patterns of collaborative behavior and offers new insights about practices in the field. 
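The article-level network described above (papers as nodes, linked whenever two papers share an author) can be constructed with a few lines of code. This is a sketch, not the authors' implementation; the paper ids and author names are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def paper_network(papers):
    """papers: dict mapping paper id -> set of author names.
    Returns an undirected edge set linking papers that share an author."""
    by_author = defaultdict(list)
    for pid, authors in papers.items():
        for a in authors:
            by_author[a].append(pid)
    edges = set()
    for pids in by_author.values():
        # every pair of papers by the same author becomes an edge
        for p, q in combinations(sorted(pids), 2):
            edges.add((p, q))
    return edges

# Hypothetical bibliography: four papers, four authors
papers = {
    "P1": {"Ana", "Radu"},
    "P2": {"Radu", "Ioana"},
    "P3": {"Ioana"},
    "P4": {"Mihai"},
}
print(sorted(paper_network(papers)))  # [('P1', 'P2'), ('P2', 'P3')]
```

Note that a solo-authored paper by an otherwise unconnected author (P4 here) becomes an isolated node, which is one reason such networks end up smaller and denser than the corresponding co-authorship networks.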
We find that the proposed networks are smaller and denser than the co-authorship networks, have a better defined community structure, and directly represent the results of collaborative endeavors by focusing on the actual outcome, i.e., published papers. This paper aims to study the collaboration among researchers in a specific Italian funding program, the Projects of National Interest (PRIN), which supports academic research. The paper uses two approaches to study these dynamic complex networks: first it identifies the observed distribution of links among researchers in the four areas of interest (chemistry, physics, economics and sociology) through distribution models, then it uses a stochastic model to understand how the links change over time. The analysis is based on a large and unique dataset of 4322 researchers from 98 universities and research institutes that have been selected for PRIN allocation from 2000 to 2011. The originality of this work is that we have studied a competitive funding scheme through dynamic network analysis techniques. It is hard to detect important articles in a specific context. Information retrieval techniques based on full text search can be inaccurate in identifying main topics, and they are not able to provide an indication of the importance of an article. Generating a citation network is a good way to find the most popular articles, but this approach is not context aware. The text around a citation mark is generally a good summary of the referred article. So citation context analysis presents an opportunity to use the wisdom of the crowd for detecting important articles in a context-sensitive way. In this work, we analyze citation contexts to rank articles properly for a given topic. The proposed model uses citation contexts to create a directed and edge-labeled citation network based on the target topic. Then we apply common ranking algorithms to find important articles in this newly created network. 
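One "common ranking algorithm" that can be run on such a topic-filtered citation network is PageRank. The sketch below is a plain power-iteration implementation on a hypothetical edge list, not the authors' system (which additionally labels edges with citation-context information):

```python
def pagerank(edges, d=0.85, iters=50):
    """Simple PageRank by power iteration on a directed edge list
    [(citing, cited), ...]."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [dst for src, dst in edges if src == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes  # dangling nodes spread rank evenly
            for t in targets:
                new[t] += d * rank[n] / len(targets)
        rank = new
    return rank

# Hypothetical topic-filtered citation edges: (citing, cited)
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
rank = pagerank(edges)
print(max(rank, key=rank.get))  # "D": most heavily cited, directly and indirectly
```

In the edge direction chosen here rank flows from citing to cited papers, so the highest-ranked node is the article that accumulates the most citation weight within the topic.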
We show that this method successfully detects a good subset of the most prominent articles in a given topic. The biggest contribution of this approach is that we are able to identify important articles for a given search term even when these articles do not contain the search term. This technique can be used on other linked documents, including web pages, legal documents, and patents, as well as scientific papers. Universities and the members of their faculties, by means of open access, open education, and social media engagement, contribute to many publicly accessible resources of academic value, i.e., open scholarship. To encourage universities to contribute even more to open scholarship, in a more focused and sustainable way, the methodology of Open Scholarship Ranking (OSR) was constructed after a thorough examination and several adjustments based on the Berlin Principles on Ranking of Higher Education Institutions (hereinafter referred to as "the Berlin Principles"). The OSR has met most of the Berlin Principles, and new adjustments helped to improve its quality. A significant correlation has been observed between the OSR results of Chinese research universities and the results from existing comprehensive university rankings. The OSR provides an evaluation framework for universities' performance in open scholarship, and can be regarded as an acceptable way of ranking universities. Since the publication of the first academic journal in 1665, the number of academic journal titles has grown steadily. In 2001, Mabe and Amin studied the pattern of growth in the number of academic journals worldwide, identifying three key development periods between 1900 and 1996. These three episodes are from 1900 to 1944, from 1944 to 1978, and from 1978 to 1996. The compound annual growth rates for each episode are 3.30, 4.68 and 3.31 % respectively. 
In this research, we seek to validate these findings and extend previous work by analyzing journal growth patterns from 1986 to 2013. Our results show academic journals grew at an average rate of 4.7 % from 1986 to 2013, which is very similar to the growth rate during the Big Science period observed in the previous study. Our results also show that academic journals had an estimated 92 % Active rate and 8 % Inactive rate annually. Of all Active journals, approximately 43 % have high impact and reach the JCR or SJR databases, and 26 % have relatively higher impact and are thus collected in the JCR database. The comparison of Active/Inactive SJR and JCR journals suggests that lower-impact journals have a higher chance of becoming Inactive than higher-impact journals. With the wide use of the Internet in academic science, our results show, as expected, that the number of Print-Only journals is gradually decreasing while the number of Online-Only journals is increasing. The growth of Online-Only journals exceeded the growth of Print-Only journals in 2007, and the number of Online-Only journals exceeded the number of Print-Only journals in 2012. More than 30 % of Newly Created journals provide Open Access. It is suggested that we are experiencing the second journal boom in history and that Internet technology has changed the academic publication system. Faced with limited resources, scientists from around the world enter into collaborations to pool their resources to conduct research. Like everywhere else, international co-publishing in southern African countries is on the rise. The aim of this study was to document and analyse the level of scientific productivity, collaboration patterns, and scientists' experiences and attitudes towards South-South and South-North collaboration. 
We performed 105 interviews with scientists based at five southern African universities, namely the University of Malawi-Chancellor College, the National University of Science and Technology, the University of Botswana, the University of Zambia, and the University of Zimbabwe. We also traced 192 scientists from the various departments at these universities who had jointly published 623 scientific papers in the field of basic sciences in the period 1995-2014 in Web of Science journals. Our results show that in the majority of cases, funding from the North contributed substantially to increased scientific productivity and international co-authorship. The results also show that collaboration with southern scientists is valued equally to that with northern scientists, but for different reasons. We conclude that supporting international and national collaboration, including increased scientific mobility and strong scientific groups and networks, is a key factor for capacity building of research in southern African universities. Quantitative measurements of bibliometrics based on knowledge entities (i.e., keywords) improve competencies in tracking the structure and dynamic development of various scientific domains. Co-word networks (a content analysis technique and a type of knowledge network) are often employed to discern relationships among various scientific concepts in scholarly publications in order to reveal the development and evolution of scientific knowledge. In relation to evolutionary network analysis, different link prediction methods in network science can assist in the prediction of missing links and the modelling of network dynamics. These traditional methods (based on topological similarity scores and time series methods of link prediction) can be used to predict future co-occurrence trends among scientific concepts. 
This study attempted to build supervised learning models for link prediction in co-word networks using network topological similarity metrics and their temporal evolutionary information. In addition to exploring the underlying mechanism of temporal co-word network evolution, classification datasets containing links with both positive and negative labels were built. A set of topological metrics and their temporal evolutionary information were produced to describe instances of the classification datasets. Supervised classification methods were then applied to classify the links and accurately predict future associations among keywords. Time-series-based forecasting methods were used to predict the future values of topological evolution. Results of supervised link prediction by different classifiers showed that both static and dynamic information are valuable in predicting new links between concepts extracted from the scientific literature. This study addresses the problem of predicting the success of scientific collaborations from scholars' previous collaborations using machine learning techniques. As exploiting the collaboration network is essential in collaborator discovery systems, this article attempts to understand how to exploit the information embedded in collaboration networks. We use the link structure among scholars, and between scholars and concepts, to extract a set of features that are correlated with collaboration success and increase prediction performance. The effect of considering other aggregation methods, in addition to average and maximum, for computing the collaboration features from the features of the members is examined as well. A dataset extracted from Northwestern University's SciVal Expert is used for evaluating the proposed approach. 
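Topological similarity scores of the kind used as link-prediction features in the co-word study above include common neighbours, the Jaccard coefficient, and the Adamic-Adar index. This is a minimal sketch over a hypothetical keyword co-occurrence network; in the study, such scores (plus their temporal evolution) would be fed to a supervised classifier:

```python
import math

def neighbors(edges):
    """Build an adjacency map from an undirected edge list."""
    nbr = {}
    for u, v in edges:
        nbr.setdefault(u, set()).add(v)
        nbr.setdefault(v, set()).add(u)
    return nbr

def similarity_features(edges, u, v):
    """Topological similarity scores commonly used as link-prediction
    features: common neighbours, Jaccard, Adamic-Adar."""
    nbr = neighbors(edges)
    common = nbr[u] & nbr[v]
    union = nbr[u] | nbr[v]
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union),
        # degree-1 common neighbours are skipped to avoid log(1) = 0
        "adamic_adar": sum(1 / math.log(len(nbr[z]))
                           for z in common if len(nbr[z]) > 1),
    }

# Hypothetical keyword co-occurrence edges
edges = [("citation", "impact"), ("citation", "altmetrics"),
         ("impact", "altmetrics"), ("impact", "h-index"),
         ("altmetrics", "twitter")]
print(similarity_features(edges, "citation", "h-index"))
```

Each currently unlinked keyword pair yields one feature vector of this kind; labeling pairs by whether they co-occur in a later time window produces the positive/negative classification dataset the study describes.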
The results demonstrate the capability of the proposed collaboration features to increase prediction performance in combination with widely used features such as the h-index and average citation counts. Consequently, the introduced features are appropriate to incorporate in collaborator discovery systems. Scientific domain vocabularies play an important role in academic communication and research management. Confronted with the dramatic increase in new keywords, the continuous development of a domain vocabulary is important for a domain's long-term survival in the scientific context. Current methods based on either statistical or linguistic approaches can automatically generate vocabularies that consist of popular keywords, but they fail to capture high-quality standardized terms owing to the lack of human intervention. Manual methods make use of human knowledge, but they are both time-consuming and expensive. To overcome these deficiencies, this research proposes a novel social voting approach to constructing scientific domain vocabularies. It integrates automatic systems and human knowledge based on the theory of linguistic arbitrariness, and selects a widely accepted, standardized set of keywords through social voting. A social voting system has been implemented to aid scientific domain vocabulary construction at the National Natural Science Foundation of China. Two experiments were conducted to demonstrate the effectiveness and validity of the system. The results show that the domain vocabulary constructed using this system covers a wide range of areas within a discipline and facilitates the standardization of scientific terminology. 
Three kinds of criteria have been advanced for distinguishing sleeping beauties in science: average-based criteria, quartile-based criteria, and parameter-free criteria. On this basis, four rules are proposed that should be adhered to in distinguishing sleeping beauties: (1) early citations should be penalized; (2) the whole citation history should be taken into account; (3) the awakening time of a sleeping beauty should not vary over time; and (4) arbitrary thresholds on sleeping period or awakening intensity should be avoided. Our current societies increasingly rely on electronic repositories of collective knowledge. An archetype of these databases is the Web of Science (WoS), which stores scientific publications. In contrast to several other forms of knowledge, e.g., Wikipedia articles, a scientific paper does not change after its "birth". Nonetheless, from the moment a paper is published it exists within the evolving web of other papers; thus, its actual meaning to the reader changes. To track how scientific ideas (represented by groups of scientific papers) appear and evolve, we apply a novel combination of algorithms that explicitly allows papers to change their groups. We (1) identify the overlapping clusters of the undirected yearly co-citation networks of the WoS (1975-2008) and (2) match these yearly clusters (groups) to form group timelines. After visualizing the longest-lived groups of the entire data set, we assign topic labels to all groups. We find that across the entire WoS, multidisciplinarity is clearly over-represented among cutting-edge ideas. In addition, we provide detailed examples of papers that (1) change their topic labels and (2) move between groups. The existing papers on the economic impact of research output have focused on either a single country or a bloc of selected countries. The aim of this paper is to examine the effect of research output on economic growth in 169 countries for the period 1996-2013. 
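A well-known parameter-free criterion of the kind mentioned earlier is the "beauty coefficient" in the spirit of Ke et al. (2015), which compares a paper's yearly citation curve against a straight reference line drawn from publication year to the citation peak; the sketch below is a minimal reading of that idea, and the citation profiles are illustrative:

```python
def beauty_coefficient(citations):
    """Parameter-free 'beauty coefficient' in the spirit of Ke et al. (2015).

    citations[t] is the number of citations received t years after
    publication; t_m is the year of the citation peak. Papers whose
    citations stay far below the publication-to-peak reference line
    (long dormancy, late burst) score high.
    """
    t_m = max(range(len(citations)), key=lambda t: citations[t])
    if t_m == 0:
        return 0.0
    c0, cm = citations[0], citations[t_m]
    slope = (cm - c0) / t_m  # reference line from year 0 to the peak
    return sum((slope * t + c0 - citations[t]) / max(1, citations[t])
               for t in range(t_m + 1))

# A delayed-recognition profile scores far higher than steady growth.
sleeper = [0, 0, 0, 1, 0, 2, 30]   # long dormancy, late burst
steady  = [5, 10, 15, 20, 25, 30, 35]
print(beauty_coefficient(sleeper) > beauty_coefficient(steady))  # True
```

Note how this satisfies the four rules above: early citations enter the denominator (penalizing them), the sum runs over the whole pre-peak history, and no arbitrary dormancy or awakening threshold is imposed.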
A system GMM estimator, which accounts for endogeneity, unobserved effects, and small-sample bias, is employed to test the relationship. Within the neoclassical framework, we use a variety of indicators to proxy research performance, and several sensitivity analyses were also performed. Overall, the results show that research output has a positive impact on economic growth, irrespective of whether the sample covers developing or developed countries. The policy implications of the findings are detailed in the body of the paper. This paper investigates the social space of physics research institutions. Scientific capital, a well-known concept developed by Pierre Bourdieu, measures and assesses accumulated recognition and specific scientific power. The scientific capital of a physics research institution manifests itself as a reputation, a high-profile name in the field of physics, symbols of academic recognition, and scientific status. Using citation statistics from the Web of Science Core Collection and sociological data from the dedicated survey "The Monitoring of the Labor Market for Highly Qualified R&D Personnel", we construct the social space of Russian physics institutions. The analysis reveals the generalized grounds of this social space: the principles of visibility and scientific capital. The study highlights the internal differentiation of physics institutions into three groups ("major", "high energy", and "secondary" institutions). The social space of physics research institutions provides a map of the field of physics in Russia. This research may be a useful starting point for developing a more comprehensive study of the field of physics. Bibliometric analysis of the Egyptian literature on HCV provides the intelligence needed by decision makers and gives an insight into research productivity in this area. We built our database (HCVDBegy) on MS SQL Server by querying PubMed for "HCV and Egypt" with a time limit of 31 March 2013. 
Fifty-eight of the 716 records were excluded and the remaining 658 were divided into 22 domains. The analysis used data mining add-ins for Microsoft Excel, including association and regression algorithms. A fluctuation in the number of papers was noticed from 2004 to 2009, with a steady increase onward. Eighty-six percent of publications were the contribution of three or more authors. The top publishing bodies were the Faculties of Medicine of Cairo and Ain Shams Universities, the National Research Center, and the National Cancer Institute. Three Egyptian journals came out on top, whereas the other publishing journals were mainly from the USA. Few controlled clinical trials and meta-analyses were published. The HCV epidemiology, review article, and sequence analysis domains were the most cited. The forecasting model showed that the numbers of publications in 2013 and 2014 exceeded those forecast. A dependency network based on an association rule model of MeSH topics was also extensively analyzed. The number of publications showed a promising increase, which points to greater national awareness of the HCV problem. Clustering of MeSH terms revealed some hot topics. We recommend that PubMed alert authors to the challenges of author affiliations. The availability of HCVDBegy opens the door to further drill-down analysis for decision makers. Peer-review-based research assessment, as implemented in Australia, the United Kingdom, and some other countries, is a very costly exercise. We show that university rankings in economics based on long-run citation counts can be easily predicted using early citations. This would allow a research assessment to predict the relative long-run impact of articles published by a university immediately at the end of the evaluation period. We compare these citation-based university rankings with the rankings of the 2010 Excellence in Research assessment in Australia and the 2008 Research Assessment Exercise in the United Kingdom. 
Rank correlations are quite strong, but there are some differences between the rankings. However, if assessors are willing to use citation analysis to assess some disciplines, as is the case for the natural sciences and psychology in Australia, it seems reasonable to include economics in that set as well. Using life-cycle publication data on 9,368 economics PhD graduates from 127 U.S. institutions between 1987 and 1996, we compare the research productivity of male and female graduates, and how it correlates with macroeconomic conditions prior to starting graduate studies and with the availability of academic jobs at the time of graduation. We find that the availability of academic jobs is positively correlated with research productivity for both male and female graduates. Unfavorable employment conditions prior to starting graduate education are negatively correlated with female graduates' research productivity and positively correlated with male graduates' research productivity. This paper provides a citation network analysis of publications from the Academy of Management Journal, one of the key US-based journals in the field of Management. Our analysis covers all publications in the journal from 1958 to 2014; this represents the entire history of the journal up to the cut-off point of our study. The paper analyses the most published authors, most cited articles, most cited authors, top institutions, and the nationalities of authors most represented in the journal. A total of 2,304 articles containing 114,550 references were taken from the primary data source, the Web of Science. An analysis of the 114,550 citations was carried out using the Web of Science online analytics tool and Excel. Gephi, a data visualisation and manipulation tool, was used to provide a visual representation of the citation networks. Results indicate that the most published authors within AMJ throughout the journal's history are Ivancevich, Golembiewski and Hambrick. 
The three most cited authors within AMJ are Pfeffer, Porter and Thompson. The single most cited article is Pfeffer and Salancik's 1978 work "The External Control of Organizations: A Resource Dependence Perspective". A keyword analysis revealed that the most important terms used in the journal's history were 'Performance', 'Organization' and 'Work'. Results from this paper extend our previous citation analyses of key journals in the discipline of Higher Education to a new discipline, the field of Management. The paper provides evidence of how visual analyses can help to represent the citation "geography" of a journal over time. Altmetrics have gained momentum and are meant to overcome the shortcomings of citation-based metrics. In this regard, some light is shed on the dangers associated with the new "all-in-one" indicator, the altmetric score. The Frascati and Oslo manuals assemble scientific activities, technological activities, and their definitions in a generic manner, without attempting to propose a rigorous and cogent organization of the categories. Such uncertainties could possibly be overcome by attempting to formulate a coherent, holistic classification that retraces the indications of previous scholars concerning the broader characteristics of scientific discovery and technological innovation. From such an attempt, we gather the lesson that scholars of technological innovation and scientific progress must at all times be ready to reopen debate on the assertions that they have thus far formulated. De Marchi and Lorenzetti (Scientometrics 106(1):253-261, 2016) have recently argued that in fields where the journal impact factor (IF) is not calculated, such as the humanities, it is key to find other indicators that would allow the relevant community to assess the quality of scholarly journals and the research outputs published in them. 
The authors' suggestion is that information concerning the journal's rejection rate and the number of subscriptions sold is important and should be used for such assessment. The question addressed by the authors is very important, yet their proposed solutions are problematic. Here I point to some of these problems and illustrate them by considering as a case in point the field of philosophy. Specifically, here I argue for four main claims. First, even assuming that IF provides a reliable indicator of the quality of journals for the assessment of research outputs, De Marchi and Lorenzetti have failed to validate their suggested indicators and proxies. Second, it has not been clarified why, in absence of IF, other journal-based metrics that are currently available should not be used. Third, the relationship between IF and rejection rate is more complex than the authors suggest. Fourth, accepting the number of sold subscriptions as a proxy would result in discrimination against open access journals. The upshot of my analysis is that the question of how to assess journals and research outputs in the humanities is still far from resolved. Social media offers both opportunities and challenges in everyday life information seeking (ELIS). Despite their popularity, it is unclear whether the use of social media for ELIS heightens problematic outcomes, such as encountering too much information and finding irrelevant, conflicting, outdated, and noncredible information. In light of this gap, this study tested (a) whether the level of problematic informational outcomes varies with the use of social networking sites, microblogs, and social question and answer sites; (b) whether the problem level varies by gender and problem-solving styles; and (c) whether the aforementioned factors have significant interaction effects. An online questionnaire was used to survey 791 U.S. undergraduates. Irrelevant information was the top issue. 
The gender difference was statistically significant for conflicting information, which was more problematic for women. A multiway analysis of variance (ANOVA) indicated notable problem-solving style differences, especially on the Personal Control subscale, highlighting the importance of affective factors. It is noteworthy that although social media use had no significant main effect, there were significant interaction effects between microblog use and the Approach-Avoidance and Problem-Solving Confidence subscales. The impact of microblog use on ELIS outcomes therefore warrants further investigation. Five propositions are posited for further testing. Twitter has emerged as a popular source for sharing and delivering news information. In tweet messages, URLs to web resources and hashtags are often included. This study investigates the potential of hyperlinks and hashtags as topical clues and indicators for tweet messages. For this study, we crawled and analyzed about 1.5 million tweets over a 3-month period, covering any topic or subject. The findings of this study revealed a power law relationship between the rank and frequency of (a) the host names of URLs and (b) pairs of hashtags and URLs that appeared in the tweet messages. This study also discovered that the most popular URLs used in tweets come from news and media websites, and that a majority of the hyperlinked resources are news web pages. One implication of this study is that Twitter users are more active in sharing already published information than in producing new information. Finally, our investigation of hashtags for web resource indexing reveals that hashtags have the potential to be used as indexing terms for co-occurring URLs in the same tweet. We also discuss the implications of this study for web resource recommendation. Participating in online social, cultural, and political activities requires digital skill and knowledge. 
This study investigates how sustained student engagement in game design and social media use can attenuate the relations between socioeconomic factors and digital inequality among youth. This study of 242 middle and high school students participating in the Globaloria project shows that participation eliminates gender effects, and reduces parent education effects in home computer use. Further, students from schools with lower parent education show greater increases in frequency of school technology engagement. Globaloria participation also weakens the link between prior school achievement and advanced technology activities. Results offer evidence that school-based digital literacy programs can attenuate digital divide effects known to occur cross-sectionally in the general U.S. population. This study aims to identify the way researchers collaborate with other researchers in the course of the scientific research life cycle and provide information to the designers of e-Science and e-Research implementations. On the basis of in-depth interviews with and on-site observations of 24 scientists and a follow-up focus group interview in the field of bioscience/nanoscience and technology in Korea, we examined scientific collaboration using the framework of the scientific research life cycle. We attempt to explain the major motivations, characteristics of communication and information sharing, and barriers associated with scientists' research collaboration practices throughout the research life cycle. The findings identify several notable phenomena including motivating factors, the timing of collaboration formation, partner selection, communication methods, information-sharing practices, and barriers at each phase of the life cycle. We find that specific motivations were related to specific phases. The formation of collaboration was observed throughout the entire process, not only in the beginning phase of the cycle. 
For communication and information-sharing practices, scientists continue to favor traditional means of communication for security reasons. Barriers to collaboration throughout the phases included different priorities, competitive tensions, and a hierarchical culture among collaborators, whereas credit sharing was a barrier in the research product phase. Biochemistry is a highly funded research area that is typified by large research teams and is important for many areas of the life sciences. This article investigates the citation impact and Mendeley readership impact of biochemistry research from 2011 in the Web of Science according to the type of collaboration involved. Negative binomial regression models are used that incorporate, for the first time, the inclusion of specific countries within a team. The results show that, holding other factors constant, larger teams robustly associate with higher-impact research, but including additional departments has no effect and adding extra institutions tends to reduce the impact of research. Although international collaboration is apparently not advantageous in general, collaboration with the United States, and perhaps also with some other countries, seems to increase impact. In contrast, collaboration with some other nations seems to decrease impact, although both findings could be due to factors such as differing national proportions of excellent researchers. As a methodological implication, simpler statistical models would find international collaboration to be generally beneficial, so it is important to take specific countries into account when examining collaboration. Based on State of the Union addresses from 1790 to 2014 (225 speeches delivered by 42 presidents), this paper describes and evaluates different text representation strategies. To determine the most important words of a given text, the term frequencies (tf) or the tf-idf weighting scheme can be applied. 
Recently, latent Dirichlet allocation (LDA) has been proposed to identify the topics included in a corpus. As another strategy, this study proposes applying a vocabulary specificity measure (the Z-score) to determine the most significantly overused word-types or short sequences of them. Our experiments show that the simple term frequency measure is not able to discriminate between specific terms associated with a document or a set of texts. Using the tf-idf or LDA approach, the selection requires some arbitrary decisions. Based on the term-specificity measure (Z-score), the term selection has a clear theoretical basis. Moreover, the most significant sentences for each presidency can be determined. As another facet, we can visualize the dynamic evolution in the usage of selected terms together with their specificity measures. Finally, this technique can be employed to identify the most important lexical leaders, whose introduced terms are overused by the k following presidencies. Two dominant theoretical models for privacy, individual privacy preferences and context-dependent definitions of privacy, are often studied separately in information systems research. This paper unites these theories by examining how individual privacy preferences impact context-dependent privacy expectations. The paper theorizes that experience provides a bridge between individuals' general privacy attitudes and nuanced contextual factors. This leads to the hypothesis that, when making judgments about privacy expectations, individuals with less experience in a context rely more on individual preferences such as their generalized privacy beliefs, whereas individuals with more experience in a context are influenced by contextual factors and norms. To test this hypothesis, 1,925 American users of mobile applications made judgments about whether varied real-world scenarios involving data collection and use met their privacy expectations. 
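The Z-score specificity measure discussed earlier can be read as a binomial test: under the null model, a term occurs in one presidency's speeches at the same rate as in the whole corpus, and a large positive Z signals overuse. A minimal sketch, with hypothetical counts:

```python
import math

def z_score(term_count_doc, doc_len, term_count_corpus, corpus_len):
    """Binomial Z-score of a term's specificity to a document subset.

    p is the term's corpus-wide rate; the subset of doc_len tokens is
    treated as doc_len Bernoulli draws with success probability p.
    """
    p = term_count_corpus / corpus_len
    expected = doc_len * p
    sd = math.sqrt(doc_len * p * (1 - p))
    return (term_count_doc - expected) / sd if sd > 0 else 0.0

# Hypothetical counts: a term appears 40 times in one presidency's
# 20,000 tokens but only 60 times in a 1,000,000-token corpus.
print(round(z_score(40, 20_000, 60, 1_000_000), 2))  # 35.42
```

Unlike raw term frequency, this measure is near zero for common function words (whose subset rate matches the corpus rate), which is why the abstract argues it needs no arbitrary selection thresholds.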
Analysis of the data suggests that experience using mobile applications moderated the effect of individual preferences and contextual factors on privacy judgments. Experience changed the equation respondents used to assess whether data collection and use scenarios met their privacy expectations. Discovering the bridge between two dominant theoretical models enables future privacy research to consider both personal and contextual variables by taking differences in experience into account. An increasing number of tools are being developed to help academics interact with information, but little is known about the benefits of those tools for their users. This study evaluated academics' receptiveness to information proposed by a mobile app, the SerenA Notebook: information that is based on their inferred interests but does not relate directly to a prior recognized need. The evaluated app aimed to create the experience of serendipitous encounters: generating ideas and inspiring thoughts, and potentially triggering follow-up actions, by providing users with suggestions related to their work and leisure interests. We studied how 20 academics interacted with messages sent by the mobile app (3 per day over 10 consecutive days). The collected data sets were analyzed using thematic analysis. We found that contextual factors (location, activity, and focus) strongly influenced responses to the messages. Academics described some unsolicited information as interesting but irrelevant when they could not make immediate use of it. They highlighted filtering information, rather than finding information, as their major struggle. Some messages that were positively received acted as reminders of activities participants were meant to be doing but were postponing, or were relevant to activities ongoing at the time the information was received. Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. 
In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information. The dramatic increase in the amount of information stored on the web makes it more important to familiarize people with disabilities and elderly people with digital devices and applications and to adapt websites to enable their use by these users. Discapnet is a website mainly aimed at visually disabled people, and navigation is a challenging task for its users. In this context, system evaluation and problem detection become crucial aspects for enhancing user experience and may contribute greatly to diminishing the existing technological gap. This study proposes a system based on web-mining techniques that collects in-use information while the user is accessing the web (thus, being a noninvasive system). 
The proposed system models users in the wild and discovers navigation problems appearing in Discapnet; it can also be used for problem detection while new users are navigating the site. The system was tested, and its efficiency demonstrated, in an experiment involving supervised navigation, in which 82.6% of a set of disabled users were automatically labeled as having problems with the website. Accurate and detailed data recording is indispensable for documenting archeological projects and for subsequent information exchange. To prevent comprehension and accessibility issues in these cases, data infrastructures can be useful. The establishment of such data infrastructures requires a clear understanding of the business processes and information flows within the archeological domain. This study attempts to provide insights into how information is managed in Flemish archeological processes and how this management can be enhanced, through an exploratory study based on an analysis of the new Flemish Immovable Heritage Decree, informal interviews with Flemish archeological organizations, and the results of an international survey. Three main processes, in which certified archeologists and the Flemish Heritage agency are key actors, were identified. Multiple types of information, the majority of which contain a geographical component, are recorded, acquired, used, and exchanged. Geographical information systems (GIS) and geodatabases therefore appear to be valuable components of an archeology-specific data infrastructure. This is of interest because GIS are widely adopted in archeology and multiple Flemish archeological organizations are in favor of a government-provided exchange standard or database templates for data recording. Furthermore, free and open source software is preferred to ensure cost efficiency and customizability. 
The objective of this paper is to identify the dynamic structure of several time-dependent, discipline-level citation networks through a path-based method. A network data set is prepared that comprises 27 subjects and their citations, aggregated from more than 27,000 journals and proceedings indexed in the Scopus database. A maximum spanning tree method is employed to extract paths in the weighted, directed, and cyclic networks. This paper finds that subjects such as Medicine, Biochemistry, Chemistry, Materials Science, Physics, and Social Sciences are the ones with multiple branches in the spanning tree. This paper also finds that most paths connect science, technology, engineering, and mathematics (STEM) fields; two critical paths connecting STEM and non-STEM fields are the one from Mathematics to Decision Sciences and the one from Medicine to Social Sciences. Wikipedia is increasingly an important source of information for many. Hence, it is important to develop an understanding of how it is situated within society and the wider roles it is called on to perform. This article argues that one of these roles is as a depository of collective memory. Building on the work of Pentzold, I present a case study of the English Wikipedia article on the Vietnam War to demonstrate that the article, or more accurately its talk pages, provides a forum for the contestation of collective memory. I further argue that this function is one that should be supported by libraries as they position themselves within a rapidly changing digital world. Medical research is highly funded and often expensive, and so is particularly important to evaluate effectively. Nevertheless, citation counts may accrue too slowly for use in some formal and informal evaluations. It is therefore important to investigate whether alternative metrics could be used as substitutes. 
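The maximum spanning tree extraction mentioned earlier (for the subject-level citation network) can be sketched with Kruskal's algorithm on negated weights; this simplification ignores edge direction, which the study itself does not, and the subject names and weights below are illustrative:

```python
def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm for a maximum spanning tree: scan edges from
    heaviest to lightest and keep each one that does not close a cycle,
    using union-find to track connected components.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:            # keeps the growing forest acyclic
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Toy subject-level citation weights between fields.
edges = [(90, "Medicine", "Biochemistry"), (80, "Biochemistry", "Chemistry"),
         (10, "Medicine", "Chemistry"), (40, "Medicine", "Social Sciences")]
tree = maximum_spanning_tree(
    ["Medicine", "Biochemistry", "Chemistry", "Social Sciences"], edges)
print(sorted(w for w, _, _ in tree))  # [40, 80, 90]; the 10-weight edge closes a cycle
```

Paths in the resulting tree (e.g. Social Sciences to Medicine to Biochemistry) are then the dominant citation routes the paper traces between disciplines.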
This article assesses whether one such altmetric, Mendeley readership counts, correlates strongly with citation counts across all medical fields, whether the relationship is stronger if student readers are excluded, and whether readership counts are distributed similarly to citation counts. Based on a sample of 332,975 articles from 2009 in 45 medical fields in Scopus, citation counts correlated strongly (about 0.7; 78% of articles had at least one reader) with Mendeley readership counts (from the new version 1 application programming interface [API]) in almost all fields, with one minor exception, and the correlations tended to decrease slightly when student readers were excluded. Readership followed either a lognormal or a hooked power law distribution, whereas citations always followed a hooked power law, showing that the two may have underlying differences. Credit assignment to the multiple authors of a publication is a challenging task owing to the conventions followed in different areas of research. In this study, we present a review of different author credit-assignment schemas, which are designed mainly on the basis of author position and the total number of coauthors of the publication. We implemented, tested, and classified 15 author credit-assignment schemas into three types: linear, curve, and "other" assignment schemas. Further investigation and analysis revealed that most of the methods provide reasonable credit-assignment results, even though the credit-assignment distribution approaches differ considerably among the types. An evaluation of each schema based on PubMed articles published in 2013 shows that there are positive correlations among the different schemas and that the similarity of credit-assignment distributions can be derived from similar design principles that stress the number of coauthors or the author position, or consider both. 
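One representative schema per type named above (linear, curve, and the position-independent fractional schema, one of the "other" designs) can be sketched as follows; the 15 specific schemas the study tested are not reproduced here:

```python
def fractional(n):
    """'Other' type, position-independent: each of n coauthors gets 1/n."""
    return [1 / n] * n

def arithmetic(n):
    """Linear, position-based: raw weights n, n-1, ..., 1, normalized."""
    total = n * (n + 1) / 2
    return [(n - i) / total for i in range(n)]

def harmonic(n):
    """Curve type: the author in position i gets (1/i) / sum_k (1/k)."""
    h = sum(1 / k for k in range(1, n + 1))
    return [(1 / (i + 1)) / h for i in range(n)]

for schema in (fractional, arithmetic, harmonic):
    credits = schema(4)
    assert abs(sum(credits) - 1) < 1e-9  # each schema distributes 1 credit
    print(schema.__name__, [round(c, 3) for c in credits])
```

For four authors this prints equal shares of 0.25, the linear shares 0.4/0.3/0.2/0.1, and the harmonic shares 0.48/0.24/0.16/0.12, which illustrates how the curve type rewards the first author more steeply than the linear one.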
We provide a summary of the features of each credit-assignment schema to facilitate the selection of the appropriate one, depending on the different conditions required to meet diverse needs. The interests of researchers who engage with research synthesis methods (RSM) intersect with library and information science (LIS) research and practice. This intersection is described through a summary of conceptualizations of research synthesis in a diverse set of research fields and in the context of Swanson's (1986) discussion of undiscovered public knowledge. Through a selective literature review, research topics that intersect with LIS and RSM are outlined. Topics identified include open access, information retrieval, bias and research information ethics, referencing practices, citation patterns, and data science. Subsequently, bibliometrics and topic modeling are used to present a systematic overview of the visibility of RSM in LIS. This analysis indicates that RSM became visible in LIS in the 1980s. Overall, LIS research has drawn substantially from general and internal medicine, the field's own literature, and business, and is drawn on by the health and medical sciences, computing, and business. This analytical overview confirms that research synthesis is more visible in the health and medical literature within LIS, but suggests that LIS, as a meta-science, has the potential to make substantive contributions to a broader variety of fields on topics related to research synthesis methods. Spam has become an issue of concern in almost all areas where the Internet is involved, and many people today have become victims of spam from publishers and individual journals. We studied this phenomenon in the field of scholarly publishing from the perspective of a single author. We examined 1,024 such spam e-mails, asking him to submit an article to their journal, received by Marcin Kozak from publishers and journals over a period of 391 days. 
We collected the following information: where the request came from; publishing model applied; fees charged; inclusion or not in the Directory of Open Access Journals (DOAJ); and presence or not in Beall's (2014) listing of dubious journals. Our research showed that most of the publishers that sent e-mails inviting manuscripts were (i) using the open access model, (ii) using article-processing charges to fund their journal's operations, (iii) offering very short peer-review times, (iv) on Beall's list, and (v) misrepresenting the location of their headquarters. Some years ago, a letter of invitation to submit an article to a particular journal was considered a kind of distinction. Today, e-mails inviting submissions are generally spam, something that misleads young researchers and irritates experienced ones. This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text-mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures of the h-index were 0.738, 0.652, and 0.658 and those of text-mining were 0.501, 0.653, and 0.551, respectively. 
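Precision, recall, and F-measure values such as those reported above are computed from the overlap between a retrieved set and a gold-standard relevant set. The sketch below shows the standard definitions; how the study averaged these over cases may differ:

```python
def precision_recall_f1(retrieved, relevant):
    """Standard set-based retrieval metrics.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    F1        = harmonic mean of precision and recall.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```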
Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before. This article develops a preferential attachment-based mixture model of global Internet bandwidth and investigates it in the context of observed bandwidth distributions between 2002 and 2011. Our longitudinal analysis shows, among other things, that the bandwidth share distributions, and thus bandwidth differences, exhibit considerable path dependence, where country proportions of international bandwidth in 2011 can be substantially accounted for by a preferential attachment-based mixture of micro-level processes. Our preferential attachment model, consistent with empirical data, does not predict increasing concentration of bandwidth within top-ranked countries. We argue that recognizing the strong, but nuanced, historical inertia of bandwidth distributions is helpful in better discriminating among competing theoretical perspectives on the global digital divide as well as in clarifying policy discussions related to gaps between bandwidth-rich and bandwidth-poor countries. Understanding the structural change and evolution of networks for predicting their dynamics is one of the fundamental problems in network-related studies. In order to uncover the dynamic structural patterns of a network over time, it is vital to investigate the ways nodes behave at a local level. Thus, it is important to know the reasons why nodes stop a relationship or select a new partner, compared to other alternatives, for establishing a link. This study aims to understand the processes of network evolution by quantitatively examining the attachment behaviors of nodes in a real collaboration network, identifying the characteristics of the existing nodes that can influence their link formation process. 
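A preferential attachment process of the kind underlying the bandwidth mixture model above can be sketched as a simple urn simulation, in which each new unit of bandwidth goes to country i with probability proportional to its current amount. The exponent, seed, and initial amounts below are illustrative assumptions, not the paper's fitted model:

```python
import random

def simulate_shares(initial, steps, alpha=1.0, seed=42):
    """Allocate `steps` bandwidth units one at a time; each unit goes
    to country i with probability proportional to amounts[i] ** alpha
    (pure preferential attachment when alpha == 1). Returns the final
    share of each country."""
    rng = random.Random(seed)
    amounts = list(initial)
    for _ in range(steps):
        weights = [a ** alpha for a in amounts]
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                amounts[i] += 1
                break
    s = sum(amounts)
    return [a / s for a in amounts]
```

Because rich countries tend to get richer under this rule, the simulation exhibits the path dependence described above: early differences in the initial amounts persist in the final shares.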
To do so, different link formation or attachment processes such as cohesiveness, cumulative advantage, assortative mixing, and structural position are examined. The results indicate that structural position, the tendency to connect to nodes in a strategic intermediating position in the network, is the most effective of these processes in explaining the attachment behavior of nodes during the evolution of a collaboration network. Understanding these effective processes can help to predict more precisely how nodes' local structure, and consequently the overall network structure, change over time. This could help researchers, decision makers, or practitioners manage the nodes (agents) in their social or technical networks (systems) to reach their organizational goals. (C) 2016 Elsevier Ltd. All rights reserved. The Finnish publication channel quality ranking system was established in 2010. The system is expert-based: separate panels decide and update the rankings of a set of publication channels allocated to them. The aggregated rankings have a notable role in the allocation of public resources to universities. The purpose of this article is to analyze this national ranking system. The analysis is mainly based on two publicly available databases containing the publication source information and the actual national publication activity information. Using citation-based indicators and other available information with association rule mining, decision trees, and confusion matrices, it is shown that most of the expert-based rankings can be predicted and explained using automatically constructed reference models. Publication channels for which the Finnish expert-based rank is higher than the estimated one are mainly characterized by higher publication activity or a recent upgrade of the rank. Such findings emphasize the importance of openness of information in a ranking system, with its multifaceted evaluation. (C) 2016 The Authors. Published by Elsevier Ltd. 
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Braun, Glanzel, and Schubert (2006) recommended using the h-index as an alternative to the journal impact factor (IF) to qualify journals. In this paper, a Bayesian-based sensitivity analysis is performed with the aid of mathematical models to examine the behavior of the journal h-index under changes in the publication/citation counts of journals. Sensitivity of the h-index was most apparent for changes in the number of citations, revealing similar patterns of behavior for almost all models, independently of the field of research. In general, the h-index was found to be robust to changes in citations up to approximately the 25th percentile of the citation distribution, inflating its value afterwards. (C) 2016 Elsevier Ltd. All rights reserved. In social tagging systems such as Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users' activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points. (C) 2016 Elsevier Ltd. All rights reserved. 
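The h-index, whose sensitivity to citation changes is analysed in one of the abstracts above, is the largest h such that at least h papers each have at least h citations; a minimal computation:

```python
def h_index(citations):
    # Sort citation counts in descending order; the h-index is the
    # largest rank h at which the h-th paper still has >= h citations.
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h
```

This makes the sensitivity pattern described above easy to see: adding citations to papers far above or far below the rank-h boundary leaves h unchanged, while additions near the boundary can inflate it.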
Mapping the evolution of scientific fields has drawn much attention in recent years. Researchers have proposed various methods to describe, explain and predict different aspects of science. Network-based analysis has been widely used for knowledge networks, in order to track the changes of research topics and the spread of scientific ideas. Here we propose a novel approach for mapping science from the perspective of cross-field authors. Computer science is selected based on its interdisciplinary applications. We build a scientific network consisting of computer science conferences as nodes, and two conferences are linked if some authors have published papers at both conferences. The scientific fields are identified by a community detection algorithm. The results suggest that the proposed method, based on author overlap across fields, is effective in mapping science. (C) 2016 Elsevier Ltd. All rights reserved. Large-scale information, especially in the form of documents, is potentially useful for decision-making but intensifies the information overload problem. To cope with this problem, this paper proposes a method named RepExtract to extract a representative subset from large-scale documents. The extracted representative subset possesses three desirable features: high coverage of the content of the original document set, low redundancy within the extracted subset, and consistent distribution with the original set. Extensive experiments were conducted on benchmark datasets, demonstrating the superiority of RepExtract over the benchmark methods in terms of the three features above. A user study was also conducted by collecting human evaluations of different methods, and the results indicate that users can gain an understanding of large-scale documents precisely and efficiently through a representative subset extracted by the proposed method. (C) 2016 Elsevier Ltd. All rights reserved. 
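Two of the three features targeted by representative-subset extraction (high coverage, low redundancy) can be illustrated by a greedy max-coverage heuristic over token sets. This is a generic sketch of those two criteria, not the RepExtract algorithm itself:

```python
def greedy_representatives(docs, k):
    """docs: list of token sets. Greedily pick k document indices that
    maximise marginal coverage of the vocabulary. Each pick adds only
    tokens not yet covered, so redundancy is low by construction."""
    covered, chosen = set(), []
    for _ in range(min(k, len(docs))):
        best, gain = None, -1
        for i, d in enumerate(docs):
            if i in chosen:
                continue
            g = len(d - covered)  # marginal coverage gain of doc i
            if g > gain:
                best, gain = i, g
        chosen.append(best)
        covered |= docs[best]
    return chosen
```

The third feature (distributional consistency with the original set) would require an extra term comparing the subset's topic distribution with the corpus's, which this sketch omits.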
For the normalization of citation counts, two different kinds of methods are possible and used in bibliometrics: the cited-side and citing-side normalizations both of which can also be applied in the normalization of "Mendeley reader counts". Haunschild and Bornmann (2016a) introduced the paper-side normalization of reader counts (mean normalized reader score, MNRS) which is an adaptation of the cited-side normalization. Since the calculation of the MNRS needs further data besides data from Mendeley (a field-classification scheme, such as the Web of Science subject categories), we introduce here the reader-side normalization of reader counts which is an adaptation of the citing-side normalization and does not need further data from other sources, because self-assigned Mendeley disciplines are used. In this study, all articles and reviews of the Web of Science core collection with publication year 2012 (and a DOI) are used to normalize their Mendeley reader counts. The newly proposed indicator (mean discipline normalized reader score, MDNRS) is obtained, compared with the MNRS and bare reader counts, and studied theoretically and empirically. We find that: (i) normalization of Mendeley reader counts is necessary, (ii) the MDNRS is able to normalize Mendeley reader counts in several disciplines, and (iii) the MNRS is able to normalize Mendeley reader counts in all disciplines. This generally favorable result for the MNRS in all disciplines leads to the recommendation to prefer the MNRS over the MDNRS provided that the user has an external field-classification scheme at hand. (C) 2016 Elsevier Ltd. All rights reserved. Admittedly, despite the plethora of scientometric indices proposed to rank scientists, none of them can fully capture the performance and impact of a scientist, since each index quantifies only one or a few aspects of his/her multifarious performance. 
Therefore, the task of scientometric ranking can be seen as a multi-dimensional ranking problem, where the different indices comprise the dimensions. The application of the skyline operator then comes as a natural solution to the problem. In this article we apply the skyline operator to scientist ranking to identify those scientists whose performance cannot be surpassed by others with respect to all attributes. This technique can be used as a tool for short-listing distinguished researchers in the case of award nominations. (C) 2016 Elsevier Ltd. All rights reserved. A new data source providing the citation links of book publications, the Book Citation Index (BKCI), was recently released. A deeper understanding of the citation characteristics of book publications is needed before specific bibliometric indicators can be developed. In this study, the characteristics of citation distribution concentration in journal and book literature in the Web of Science Core Collection (WoS), and the differences in these characteristics across fields, levels of aggregation and citation periods, were probed to determine possible applications of this new data source for bibliometric studies. Even though the aggregation scheme is not sound for evaluation practices in books, aggregation matters much more for edited books in the sciences than for those in the social sciences and humanities. Journal articles have the least concentrated citation distribution in the sciences, while books play a more important role than other document types in the humanities. In the social sciences, both edited books and authored books have citation concentration distributions similar to journal articles. In addition, the Leimkuhler curves showed that citation window length (3 years vs. 9 years) does not significantly affect the citation concentrations of most document types in journal and book literature. (C) 2016 Elsevier Ltd. All rights reserved. 
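The skyline operator applied above returns exactly those scientists not dominated by anyone else: q dominates p if q is at least as good on every index and strictly better on at least one. A minimal sketch, assuming each scientist is a tuple of index values where larger is better:

```python
def skyline(points):
    """Return the points not dominated by any other point.
    q dominates p if q >= p on every dimension and q > p on at
    least one (a quadratic-time sketch; real skyline algorithms
    such as block-nested loops are more efficient)."""
    def dominates(q, p):
        return (all(a >= b for a, b in zip(q, p))
                and any(a > b for a, b in zip(q, p)))
    return [p for p in points if not any(dominates(q, p) for q in points)]
```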
Quantitative analysis of the scientific literature is a frequent task in bibliometrics. Several large online resources collect and disseminate bibliographic information, paving the way for broad analyses and statistics. The Europe PubMed Central (PMC) and its Web Services is one of these resources, providing a rich platform to retrieve information and metadata on scientific publications. However, a complete bibliometric analysis that involves gathering information and deriving statistics on an author, topic, or country is laborious when consuming Web Services on the command line or using low-level automation. In contrast, scientific workflow managers can integrate different types of software tools to automate multi-step processes. The Taverna workflow engine is a popular open-source scientific workflow manager, giving easy access to available Web Services. In this tutorial, we demonstrate how to design scientific workflows for bibliometric analyses in Taverna by integrating Europe PubMed Central Web Services and statistical analysis tools. To our knowledge, this is also the first time scientific workflow managers have been used to perform bibliometric analyses using these Web Services. (C) 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). As much effort has been made to accelerate the publication of research results, the number of papers per scientist is now much larger than before. In this context, how to identify the representative work of an individual researcher is an important yet difficult problem. Addressing it will help policy makers better evaluate the achievement and potential of researchers. So far, the representative work of a researcher is usually selected as his/her most highly cited paper or a paper published in a top journal. Here, we consider the representative work of a scientist as an important paper in his/her area of expertise. 
Accordingly, we propose a self-avoiding preferential diffusion process to generate a personalized ranking of papers for each scientist and identify their representative works. The citation data from the American Physical Society (APS) are used to validate our method. We find that the self-avoiding preferential diffusion method ranks the Nobel prize winning paper in each Nobel laureate's personal ranking list higher than the citation count and PageRank methods do, indicating the effectiveness of our method. Moreover, the robustness analysis shows that our method can rank the representative papers of scientists highly even if only partial citation data are available or spurious behaviors exist. The method is finally applied to revealing the research patterns (i.e. consistency-oriented or diversity-oriented) of different scientists, institutes and countries. (C) 2016 Elsevier Ltd. All rights reserved. The work applies the funnel plot methodology to measure and visualize uncertainty in the research performance of Italian universities in the science disciplines. The performance assessment is carried out at both the discipline and overall university level. The findings reveal that for most universities the citation-based indicator used gives insufficient statistical evidence to infer that their research productivity is inferior or superior to the average. This general observation is one that we could indeed expect in a higher education system that is essentially non-competitive. The question is whether the introduction of uncertainty in performance reporting, while technically sound, could weaken institutional motivation to work towards continuous improvement. (C) 2016 Elsevier Ltd. All rights reserved. Many different citation-based indicators are used by researchers and research evaluators to help evaluate the impact of scholarly outputs. 
Although the appropriateness of individual citation indicators depends in part on the statistical properties of citation counts, there is no universally agreed best-fitting statistical distribution against which to check them. The two current leading candidates are the discretised lognormal and the hooked or shifted power law. These have been mainly tested on sets of articles from a single field and year but these collections can include multiple specialisms that might dilute their properties. This article fits statistical distributions to 50 large subject-specific journals in the belief that individual journals can be purer than subject categories and may therefore give clearer findings. The results show that in most cases the discretised lognormal fits significantly better than the hooked power law, reversing previous findings for entire subcategories. This suggests that the discretised lognormal is the more appropriate distribution for modelling pure citation data. Thus, future analytical investigations of the properties of citation indicators can use the lognormal distribution to analyse their basic properties. This article also includes improved software for fitting the hooked power law. (C) 2016 Elsevier Ltd. All rights reserved. In this paper, a new field-normalized indicator is introduced, which is rooted in early insights in bibliometrics, and is compared with several established field-normalized indicators (e.g. the mean normalized citation score, MNCS, and indicators based on percentile approaches). Garfield (1979) emphasizes that bare citation counts from different fields cannot be compared for evaluative purposes, because the "citation potential" can vary significantly between the fields. 
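Comparing the discretised lognormal with the hooked power law, as above, amounts to comparing likelihoods under the two families. The sketch below evaluates unnormalised probability mass functions over a finite support so that log-likelihoods can be compared for given parameters; a grid search or proper optimiser (as in the improved software mentioned above) would supply the parameter estimates:

```python
import math

def loglik(counts, pmf_weights):
    # counts: observed positive values; pmf_weights: unnormalised
    # probability masses over the support 1..len(pmf_weights).
    total = sum(pmf_weights)
    logp = [math.log(w / total) for w in pmf_weights]
    return sum(logp[x - 1] for x in counts)

def discretised_lognormal(support, mu, sigma):
    # Unnormalised lognormal density evaluated at integer x
    # (citations are typically shifted by 1 so the support starts at 1).
    return [math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / x
            for x in support]

def hooked_power_law(support, B, alpha):
    # Unnormalised hooked (shifted) power law: (B + x) ** -alpha.
    return [(B + x) ** -alpha for x in support]
```

Whichever family achieves the higher maximised log-likelihood (or better information criterion, once parameter counts are penalised) is the better-fitting model for that journal's citation data.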
Garfield (1979) suggests that "the most accurate measure of citation potential is the average number of references per paper published in a given field." Based on this suggestion, the new indicator is basically defined as follows: the citation count of a focal paper is divided by the mean number of cited references in a field to normalize citations. The new indicator is called citation score normalized by cited references (CSNCR). The theoretical analysis of the CSNCR shows that it has the properties of consistency and homogeneous normalization. The close relation of the new indicator to the MNCS is discussed. The empirical comparison of the CSNCR with other field-normalized indicators shows that it is slightly less able to field-normalize citation counts than other cited-side normalized indicators (e.g. the MNCS), but its results are favorable compared to two citing-side indicator variants (SNCS indicators). Taken as a whole, the results of this study confirm the ability of established indicators to field-normalize citations. (C) 2016 Elsevier Ltd. All rights reserved. This paper introduces a systematic technology trend monitoring (TTM) methodology based on an analysis of bibliometric data. Among the key premises for developing the methodology are: (1) the increasing number of data sources addressing different phases of STI development, thus requiring a more holistic and integrated analysis; (2) the need for more customized clustering approaches, particularly for the purpose of identifying trends; and (3) augmenting the policy impact of trends through gathering future-oriented intelligence on emerging developments and potential disruptive changes. Thus, the TTM methodology developed combines and jointly analyzes different datasets to gain intelligence covering different phases of technological evolution, starting from the 'emergence' of a technology towards 'supporting' and 'solution' applications and more 'practical' business and market-oriented uses. 
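Under Garfield's suggestion, the CSNCR indicator described above reduces to a one-line normalization; a minimal sketch, with hypothetical field data:

```python
def csncr(citations, field_reference_counts):
    """Citation Score Normalized by Cited References: the focal
    paper's citation count divided by the mean number of cited
    references per paper in its field (the field's citation
    potential, in Garfield's sense)."""
    mean_refs = sum(field_reference_counts) / len(field_reference_counts)
    return citations / mean_refs
```

A CSNCR of 1.0 thus means the paper has received exactly as many citations as an average paper in its field gives out as references.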
Furthermore, the study presents a new algorithm for data clustering in order to overcome the weaknesses of readily available clustering tools for the purpose of identifying technology trends. The present study places the TTM activities into a wider policy context to make use of the outcomes for the purpose of Science, Technology and Innovation policy formulation and R&D strategy-making processes. The methodology developed is demonstrated in the domain of "semantic technologies". Journals are routinely evaluated by journal impact factors. However, more controversially, these same impact factors are often used to evaluate authors and groups as well. A more meaningful approach would be to use actual citation rates. Since in each journal there is a very highly skewed distribution of articles according to citation rates, there is little correlation between journal impact factor and the actual citation rate of articles from individual scientists or research groups. Simply stated, the journal impact factor does not successfully predict high citations in the future. In this paper, we propose the use of Peirce's measure of predictive success (Peirce in Science 4(93): 453-454, 1884) to see whether the use of journal impact factors to predict high citation rates is acceptable or not. It is seen that this measure is independent of Pearson's correlation (Seglen 1997) and gives a more quantitative refinement of the Type I and Type II classification of Smith (Financ Manag 133-149, 2004). The measures are used to examine the portfolios of some active scientists. It is clear that the journal impact factor is not effective in predicting future citations of successful authors. There are increasing demands on universities to operate transparently with regard to how resources are used and targets are met, putting them under growing pressure to clarify their position relative to other universities at national and international level. 
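Peirce's (1884) measure of predictive success, used above, can be computed from the 2x2 table of predicted versus actual outcomes (here, high citation or not) as the hit rate minus the false-alarm rate; a minimal sketch:

```python
def peirce_score(tp, fn, fp, tn):
    """Peirce's (1884) measure of predictive success (also known as
    the true skill statistic): hit rate minus false-alarm rate.
    Ranges from -1 to 1; 0 means the prediction has no skill.
    tp/fn/fp/tn follow the usual confusion-matrix layout."""
    return tp / (tp + fn) - fp / (fp + tn)
```

The false negatives and false positives here correspond to the Type II and Type I errors of the Smith classification mentioned above, which is why this score refines that dichotomy into a single quantitative measure.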
But there are several challenges associated with establishing their relative positioning. The first issue concerns measurement, or how to capture data that are relevant, pertinent and fit for purpose. Second, universities must obtain an overall indicator or a means of ordering that helps synthesize the different indicators; and, third, they must decide how to weight them. Those university rankings that attempt to address these questions are met with a degree of criticism for being subjective or inconsistent in their quantification of the indicators. The aim of the present work is to develop a procedure for synthesizing all of the indicators relating to the objective measurement of university R&D and innovation into a single or summary concept; in other words, to establish a procedure that does not require subjective criteria and that can be applied to both absolute and relativized indicators. This approach makes a dual contribution. First, a specific application, in this case for the Spanish university system, will be created to obtain a synthetic indicator for research activity. This will enable us to achieve an R&D and innovation ranking for Spanish universities. Second, the work makes a methodological contribution by using a new technique for synthesizing this type of indicator, namely Partial Least Squares (PLS). A fundamental problem in the field of the social studies of science is how to measure the patterns of international scientific collaboration to analyse the structure and evolution of scientific fields. This study confronts the problem by developing an allometric model of morphological changes in order to measure and analyse the relative growth of international research collaboration in comparison with domestic-only collaboration for fields of science. 
Statistical analysis, based on data of internationally co-authored papers from National Science Foundation (1997-2012 period), shows an acceleration (a disproportionate relative growth) of collaboration patterns in medical sciences, social sciences, geosciences, agricultural sciences, and psychology (predominantly applied fields). By contrast, some predominantly basic fields, including physics and mathematics, have lower levels of relative growth in international scientific collaboration. These characteristics of patterns of international research collaboration seem to be vital contributing factors for the evolution of the social dynamics and social construction of science. The main aim of this article is therefore to clarify the on-going evolution of scientific fields that might be driven by the plexus (interwoven combination of parts in a system) of research disciplines, which generates emerging research fields with high growth rates of international scientific collaboration. Exploring the topic hierarchy of a research field can help us better recognize its intellectual structure. This paper proposes a new method to automatically discover the topic hierarchy, in which the keyword network is constructed to represent topics and their relations, and then decomposed hierarchically into shells using the K-core decomposition method. Adjacent shells with similar morphology are merged into layers according to their density and clustering coefficient. In the keyword network of the digital library field in China, we discover four different layers. The basic layer contains 17 tightly-interconnected core concepts which form the knowledge base of the field. The middle layer contains 13 mediator concepts which are directly connected to technology concepts in the basic layer, showing the knowledge evolution of the field. The detail layer contains 65 concrete concepts which can be grouped into 13 clusters, indicating the research specializations of the field. 
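The K-core decomposition used above to split the keyword network into shells can be computed by repeatedly peeling minimum-degree nodes; a minimal sketch on a toy adjacency structure:

```python
def core_numbers(adj):
    """adj: dict mapping node -> set of neighbours (undirected).
    Returns each node's k-core number via the standard peeling
    algorithm: repeatedly remove a minimum-degree node, tracking the
    highest degree seen at removal time."""
    degree = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        v = min(alive, key=lambda u: degree[u])  # cheapest node to peel
        k = max(k, degree[v])
        core[v] = k
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
    return core
```

Grouping nodes by core number yields the shells described above; adjacent shells can then be merged into layers by comparing density and clustering coefficient.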
The marginal layer contains peripheral or isolated concepts. Funding bodies have tended to encourage collaborative research because it is generally more highly cited than sole-author research. But a higher mean citation rate for collaborative articles does not imply that collaborative researchers are in general more productive. This article assesses the extent to which research productivity varies with the number of collaborative partners for long-term researchers within three Web of Science subject areas: Information Science & Library Science, Communication and Medical Informatics. When using the whole number counting system, researchers who worked in groups of 2 or 3 were generally the most productive, in terms of producing the most papers and citations. However, when using fractional counting, researchers who worked in groups of 1 or 2 were generally the most productive. The findings need to be interpreted cautiously, however, because authors that produce few academic articles within a field may publish in other fields or leave academia and contribute to society in other ways. In journalistic publication, Betteridge's Law of Headlines stipulates that "Any headline that ends in a question mark can be answered by the word no." When applied to the titles of academic publications, the assertion is referred to as Hinchcliffe's Rule and denigrates the use of the question mark in titles as a "click-bait" marketing strategy. We examine the titles of all published articles in the year 2014 from five top-ranked and five mid-range journals in each of six academic fields (n = 7845). We describe the form of questions when they occur, and where a title poses a question that can be answered with a "yes" or "no" we note the article's substantive answer. We do not find support for the criticism lodged by Betteridge's Law and Hinchcliffe's Rule. Although patterns vary by discipline, titles with questions are posed infrequently overall. 
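The contrast drawn above between whole and fractional counting is simply a difference in how each paper's credit is shared among its authors; a minimal sketch:

```python
def count_output(papers, counting="whole"):
    """papers: list of author lists. Under whole counting, each
    author receives 1 per paper; under fractional counting, each
    author receives 1/n for an n-author paper."""
    totals = {}
    for authors in papers:
        share = 1.0 if counting == "whole" else 1.0 / len(authors)
        for a in authors:
            totals[a] = totals.get(a, 0.0) + share
    return totals
```

This also shows why the two systems can rank group sizes differently, as reported above: whole counting rewards every coauthorship fully, while fractional counting discounts papers with many partners.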
Further, most titles with questions do not pose yes/no questions. Finally, the few questions that are posed in yes/no terms are actually more often answered with a "yes" than with a "no." Concerns regarding click-bait questions in academic publications may, therefore, be unwarranted. The last years have been characterized by tremendous institutional change in the university sector induced by far-reaching Higher Education Reforms (e.g. Bologna). Building on loose-coupling theory, we hypothesize that smaller universities were better able to adapt to the Higher Education Reforms of the recent years, triggering a decline in the optimal size of universities in the reform period. Using a 12-year panel data set on the inputs and outputs of German universities, we find a tremendous decrease in optimal university size, which is driven by the decline in the optimal scale for the provision of teaching activities. Our results also suggest that this drop is due to the fact that the relatively higher administrative overheads of larger universities become an organizational liability in times of rapid institutional change. The present study analysed the readability of abstracts and full texts of the articles published in four journals of information science from 2003 to 2012. The results showed that the abstracts are very difficult to read in terms of readability indices such as FRE and SMOG. The results also showed that some of the readability scores of the abstracts and full texts changed over the examined decade, though the effect sizes were minuscule. Meanwhile, the readability scores were not significantly correlated with the number of citations. Although the readability of an academic text is secondary to the impact of the study, this does not mean that academic writers should pay no attention to readability. 
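The FRE index mentioned above follows Flesch's formula, 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words), where lower scores mean harder text. The syllable counter below is a crude vowel-group heuristic, so treat its scores as approximate:

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease with a naive syllable estimator: each run
    of vowels (including y) counts as one syllable, minimum one per
    word. Published tools use more careful syllable rules."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_words = max(1, len(words))
    n_syll = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
```

Short sentences of monosyllables score near the top of the scale, while the long, polysyllabic sentences typical of abstracts drive the score down, which is consistent with the finding above that abstracts are very difficult to read.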
On the contrary, it would be better if technical texts were more readable and clearer, once the knowledge or information has been accurately and academically conveyed. Considering that modern science is conducted primarily through a network of collaborators who organize themselves around key researchers, this research develops and tests a characterization and assessment method that recognizes the particular endogenous, or self-organizing, characteristics of research groups. Instead of establishing an ad-hoc unit of analysis and assuming an unspecified network structure, the proposed method uses knowledge footprints, based on backward citations, to measure and compare the performance/productivity of research groups. The method is demonstrated by ranking research groups in Physics, Applied Physics/Condensed Matter/Materials Science and Optics in the leading institutions in Mexico; the results show that the understanding of the scientific performance of an institution changes with a more careful accounting of the unit of analysis used in the assessment. Moreover, evaluations at the group level provide more accurate assessments, since they allow for appropriate comparisons within subfields of science. The proposed method could be used to better understand the self-organizing mechanisms of research groups and to better assess their performance. Partnership between the public and private sectors has been studied using different methodologies; among them, scientific articles offer an objective way to quantify and assess some of these public-private interactions. The present paper takes advantage of the funding acknowledgements (FA) section included in WoS articles written in English and studies some features of the funded research, such as impact and collaboration. For this purpose, articles with Spain in the address field are selected and retrieved (years 2008-2013), dividing them into two sets: articles with or without FA. 
Moreover, given the large volume of items, the study focuses on groups of articles from each area selected by stratified random sampling. Additionally, those items with a FA section are analysed to identify three types of funding sources: only public, only private, or both sectors. The results show differences between areas in terms of the presence of FA and the types of funding sectors. On the one hand, in general, articles funded by both the public and private sectors present the best impact, as well as the highest number of authors and organisations. On the other hand, there are important variations in impact and collaboration between areas depending on the types of funding sectors. Thus, items funded by both the public and private sectors show the highest significant impact in Clinical Medicine, Life Sciences and Physics, and also have greater international collaboration, in most areas, than articles funded only by the public sector. Finally, some limitations of this study are identified and some recommendations for funders and authors are offered. This paper discusses how to translate the well-confirmed phenomenon of increasing citation of older scientific literature into an argument for the persistent citation impact of older scientific journal articles. Since libraries purchase or subscribe to scientific journal articles in packages consisting of journal-years, the citation impact of past journal-years needs to be assessed separately from that of recent years. The simple and flexible model of Bouabid (Scientometrics 88:199-211, 2011. doi: 10.1007/s11192-011-0370-5), as applied to particular journal-years, is applied and assessed. How to make an effective international comparison of basic research capacity in a specific field is an urgent problem for a country monitoring its science and technology activities.
In this paper, we develop a composite index, the Basic Research Competitiveness Index (BR-CI), for evaluating countries' basic research performance against the world average level over time. For this purpose, three sub-indexes, namely the Activity Index (AcI), Attractive Index (AtI) and Efficiency Index (EI), are proposed, which are used respectively to assess countries' research efforts, impacts, and efficiency relative to the world average level in a particular science field over time. The proposed indicator system supports cross-time and cross-country comparison of research performance, quantifies basic research competitiveness relative to the world average level, and describes the competitive landscape among countries in the world. As a first application, this paper employs the four indicators, AcI, AtI, EI and BR-CI, to measure and compare the research performance of five leading countries in the biomass energy field. In short, the modelling and empirical study in this paper not only offers new references for fully exploring international competition in basic research in the biomass energy field, but also provides new perspectives and ideas for examining international competitiveness in basic research across science and technology. This study developed the Multi-Dimensional Research Agendas Inventory to measure the key factors associated with the process of research agenda setting. Research agendas reflect the preferences, strategies, influences and goals that guide researchers' decisions to investigate specific topics. The results of exploratory and confirmatory factor analyses indicated that the instrument has eight distinct dimensions: Scientific Ambition, Convergence, Divergence, Discovery, Conservative, Tolerance for Low Funding, Mentor Influence and Collaboration.
The model underlying the instrument exhibited a very good fit [χ²/df = 1.710; CFI = 0.961; PCFI = 0.791; RMSEA = 0.035; P(RMSEA ≤ 0.05) < 0.001], and the instrument itself was found to have excellent measuring properties (in terms of validity, reliability and sensitivity). Potential interpretations of the instrument and its implications for research and practice are also discussed in this article. An analysis of Twitter use in 116 conferences suggests that the service is used more extensively at PACS10 conferences (those devoted to the physics of elementary particles and fields) and PACS90 conferences (those devoted to geophysics, astronomy, and astrophysics) than at conferences in other fields of physics. Furthermore, Twitter is used in a qualitatively different manner. A possible reason for these differences is discussed. This study falls under a broad research agenda for measurement of U.S. government information, with a focus on federal statistics and data. It examines patterns in both production and scholarly use of National Center for Health Statistics (NCHS) publications, using a fixed set of NCHS publications as the benchmark. Of 563 publications issued between January 1, 2010 and August 31, 2015, 168 (29.9 %) were cited at least once between January 1, 2010 and November 1, 2015. There were a total of 20,550 cites to NCHS publications, with an average of 122 cites per cited publication. Fifty NCHS publications were cited at least 100 times, with Prevalence of Obesity in the United States, 2009-2010 (1669 cites); Deaths: Final Data for 2009 (927 cites); and Births: Final Data for 2007 (809 cites) as the top three cited publications. Overall, summary statistics for obesity, birth, and mortality were the most highly cited publications. This study is consistent with a previous citation analysis of federal statistics in identifying mortality as a predominant topic for production and use of NCHS publications.
It also highlights several topics, including obesity, that were cited with high frequency compared to their lower publishing coverage. Future studies under this research agenda will investigate non-scholarly uses of government information and remain focused on government statistics and data. Given trends in open government data, big data, and analytics, government statistics and data have been widely adopted in community activism and data journalism, but such use has yet to be analyzed in depth. Traditional bibliometric indicators aim at helping academic administrators or research investors measure the influence of publications. These indicators focus on how to quantify and compare the scientific output of researchers. However, little attention has been paid to the fact that bibliometric indicators can also be used to help scientists find valuable referential papers. In this paper, we propose three points to characterize valuable referential papers: first, valuable referential papers are always high-quality research; second, they are closely related to a considerable quantity of recent papers; third, they lead to hotspots which attract successive follow-up papers. We extract from the original citation network the critical subnetwork, which retains only the significant nodes and edges that meet the three preceding points. Then we present two indicators on the basis of the critical subnetwork. The experimental results demonstrate that papers recommended by our indicators are relatively new, and that our indicators have greater Spearman's rank correlation coefficients with the future citation count than other bibliometric indicators such as the raw citation count. This study aims to investigate different types and trends of information provided by research titles published in the applied linguistics journals from 1975 to 2015.
To this end, 428 research titles published in 63 issues of three applied linguistics journals, namely Modern Language Journal, Language Learning, and Foreign Language Annals, were extracted. The research titles were analyzed for the information they contained in terms of such categories as Method/Design, Result, Dataset, and Conclusion. The results revealed that from 1975 to 2015, the research titles of the three applied linguistics journals tended to provide the most information on the Method/Design of the studies. In addition, the research titles containing information on Topic, Result, and Dataset showed fluctuating rates in different time intervals. However, the research titles containing information about Conclusion had a falling rate in the three journals. The study concludes with some discussion of the results obtained. Based on bibliometric analysis, this paper identified certain characteristics of the literature related to river water quality assessment and simulation, in order to assist researchers in establishing future research directions. There were 3701 articles pertinent to river assessment and simulation indexed in the SCIE and SSCI databases from 2000 to 2014. Various publication characteristics were analyzed, such as countries, research organizations, subject categories, journals and keywords. Results showed there was a significant growth in total publications over the past 15 years. The USA took a leading position out of 104 countries/territories, followed by China and the UK. Similarly, the Chinese Academy of Sciences was the most significant contributor in this field of research. Environmental sciences and water resources were the top two most central subject categories, and the Journal of Hydrology was the most productive journal. Singh K. P. ranked first in terms of the comprehensive index among all core authors. Five clusters were identified in terms of keyword networks.
The temporal trend of keywords indicated that nutrients and eutrophication were the hot topics, and that SWAT was a widely accepted model for studying water quality over the past 15 years. This article analyses the development of effectiveness and efficiency of German business schools' research production between 2001 and 2009. The results suggest that effectiveness for most of the examined business schools increases initially. Subsequently, however, a declining trend can be observed. Similar tendencies hold for efficiency, even though they are slightly less pronounced. An analysis of the reasons for these observations reveals that the initial positive developments of effectiveness and of efficiency are mainly due to technology advances, whereas the subsequent decreases are basically a result of technology backwardness. In regard to different types of business schools, a strong relation between the reputation of a school and its research effectiveness becomes apparent. With reference to geographical regions, Western and Southern German business schools feature higher effectiveness than their Northern or Eastern counterparts do. This statement, however, is not valid in terms of efficiency. The formation, evolution, and dynamics of Industry-University-Research Institute (IUR) scientific collaborations in China have not been fully uncovered in the extant literature. This study seeks to fill this research gap based on a novel sample of the Chinese Academy of Sciences (CAS) from an ego-network perspective, and especially reveals the guiding role of government policies. By taking the inter-organizational scientific collaboration systems of the CAS with enterprises and universities as a proxy for IUR collaborations in China, we explore the dynamic evolution and characteristics of the IUR collaboration networks in China during the period from 1978 to 2015.
Our study reveals a simultaneous trend in accordance with the effects of the government's science and technology (S&T) policies on shaping the linkages among public research institutes, enterprises and universities during the last several decades. In particular, we find that S&T policies issued by the government may affect the dynamic evolution of the small-world structure in the scientific collaboration networks of public research institutes with enterprises and universities over time. This study enriches the empirical research on IUR collaborations in the context of China by examining the patterns of bilateral or trilateral collaborations between and among the CAS, industries and universities across the country, which not only contributes to understanding the dynamic evolution of China's IUR collaborations in the context of a series of government S&T policies but also helps deepen our understanding of the characteristics of China's national innovation system. Statistical methods play an important role in medical and dental research. Earlier studies have found that the current use of methods and statistical reporting leads to errors in interpreting results. This study aimed to compare statistical methods and reporting between dental articles and reports published in highly visible medical journals. We analyzed 200 papers published in 2010 in five dental journals and 240 papers published between 2007 and 2011 in the New England Journal of Medicine (NEJM) and the Lancet. We summarized the characteristics of the authors, classified the articles by study design type and reviewed the main strategy in the analysis of the primary research question. We also assessed the frequency with which the articles report various statistical methods. We then examined the differences between the dental and medical articles. The median number of authors was 5 in the dental articles versus 12 in the Lancet and NEJM articles.
International co-operation in the dental journals was lower than in the medical journals. The proportion of papers reporting "significant" results was 62.5 % in the dental journals and 48.3 % in the Lancet and the NEJM. The percentage frequencies of statistical procedures used in the two sets of articles indicate a broader use of statistical methods in the Lancet and the NEJM, and both journals were significantly more likely to use advanced statistical methods. Improving the application and presentation of statistical methods in dental articles is essential to meeting the current and future goals of dental research. A mathematical structure for defining multi-valued bibliometric indices is provided, with the aim of measuring the impact of general sources of information other than articles and journals, for example repositories of datasets. The aim of the model is to use several scalar indices at the same time to give a measure of the impact of a given source of information; that is, we construct vector-valued indices. We use the properties of these vector-valued indices to give a global answer to the problem of finding the optimal scalar index for measuring a particular aspect of the impact of an information source, depending on the criterion we want to fix for the evaluation of this impact. The main restrictions of our model are (1) it uses finite sets of scalar impact indices (altmetrics), and (2) these indices are assumed to be additive. The optimization procedure for finding the best tool for a fixed criterion is also presented. In particular, we show how to create an impact measure completely adapted to the policy of a specific research institution. Besides the spread of knowledge, publications are often related to promotions and academic progression, so timing is vital.
Among students in universities, there is a belief that a journal's high impact factor implies a fast publishing time in ecology journals, i.e. a short time between submission, acceptance and subsequent online posting on the journal's Web site. Here we tested this assumption, and we also examined whether a journal's charges, paper length and the number of papers published per year were related to publishing time, specifically the period between submission and online availability of the accepted manuscript. After a thorough survey of 29 ecology journals, we found that publishing time was negatively and significantly related to a journal's impact factor, negatively (but non-significantly) related to the number of papers published per year per journal, and positively (but also non-significantly) related to paper length. Publishing time also depended on journal identity, but there was large variation among journals in the time between manuscript submission, final acceptance and online posting. Several factors with a high degree of unpredictability and randomness are involved in the publication process, and here we found that journals with high impact factors publish papers faster than journals with low impact factors. Even with substantial publishing times, e.g. on average 167 days from submission to acceptance and 223 days to online posting, editorial delays in ecology journals are shorter than in journals of other disciplines/sciences. A journal's impact factor (IF) may be boosted by increasing self-citations. We aimed to determine the self-citation rate (SCR) of pediatric journals registered in the Journal Citation Reports (JCR) and to evaluate the effect of a journal's SCR on its IF. We found 117 journals categorized as pediatric journals by the JCR (as of 2013). The median and range of SCR, IF and corrected IF (IF without self-citations) were 9 % (0-30 %), 1.54 (0-6.35) and 1.37 (0-5.87), respectively.
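The two journal-level quantities just defined, the self-citation rate and the corrected IF (the IF recomputed with self-citations removed), follow directly from their definitions. A minimal sketch with hypothetical counts (the function names and the example journal are illustrative, not taken from the study):

```python
def self_citation_rate(self_cites, total_cites):
    """Fraction of a journal's citations that come from the journal itself."""
    return self_cites / total_cites if total_cites else 0.0

def corrected_if(total_cites, self_cites, citable_items):
    """Impact factor recomputed without self-citations.

    total_cites: citations in year Y to items published in years Y-1 and Y-2
    citable_items: number of citable items published in years Y-1 and Y-2
    """
    return (total_cites - self_cites) / citable_items

# Hypothetical journal: 400 citations, of which 36 are self-citations,
# to 260 citable items.
scr = self_citation_rate(36, 400)   # 0.09, matching the 9 % median SCR
cif = corrected_if(400, 36, 260)    # IF without self-citations
print(round(scr, 2), round(cif, 2))  # 0.09 1.4
```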
No differences were found between general and subspecialty journals in terms of SCR, IF or corrected IF. Spearman's rank correlation showed that IF was significantly and inversely correlated with SCR (r = -0.28, P = 0.002; R² = 0.08). There was a significant difference between IF and corrected IF among all journals (1.74 +/- 1.04 vs 1.59 +/- 0.98, P < 0.001). Self-citation is relatively rare in pediatric journals. Importantly, and unlike other fields of medicine, self-citation was found to be more prevalent in journals with a lower IF and also with a lower corrected IF. Using a 40-year (1975 to 2015) hiring dataset of 642 Library and Information Science (LIS) faculty members from 44 US universities, this research reveals the disciplinary characteristics of LIS through several key aspects including gender, rank, country, university, major, and research area. Results show that genders and ranks among LIS faculty members are evenly distributed; geographically, more than 90 % of LIS faculty members received doctoral degrees in the US; meanwhile, 60 % of LIS faculty received their Ph.D. in LIS, followed by Computer Science and Education; with regard to research interests, Human-Computer Interaction, Digital Librarianship, Knowledge Organization and Management, and Information Behavior are the most popular research areas among LIS faculty members. Through a series of dynamic analyses, this study shows that the educational background of LIS faculty members is becoming increasingly diverse; in addition, research areas such as Human-Computer Interaction, Social Network Analysis, Services for Children and Youth, Information Literacy, Information Ethics and Policy, and Data and Text Mining, Natural Language Processing, and Machine Learning have enjoyed increasing popularity. Predictive analyses are performed to discover trends in majors and research areas. Results show that the growth of LIS faculty members is linear. In addition, among faculty members' Ph.D.
majors, the share of LIS is decreasing while the share of Computer Science is growing; among faculty members' research areas, the share of Human-Computer Interaction is on the rise. Defining and measuring internationality as a function of the influence diffusion of scientific journals is an open problem. There exists no metric to rank journals based on the extent or scale of internationality. Measuring internationality is qualitative, vague, open to interpretation and limited by vested interests. With the tremendous increase in the number of journals in various fields and the unflinching desire of academics across the globe to publish in "international" journals, it has become an absolute necessity to evaluate, rank and categorize journals based on internationality. In the current work, the authors define internationality as a measure of influence that transcends geographic boundaries. The authors also raise concerns about unethical practices in the journal publication process whereby the scholarly influence of a select few is artificially boosted, primarily through editorial maneuvers. To counter the impact of such tactics, the authors propose a new method that defines and measures internationality by eliminating such local effects when computing the influence of journals. A new metric, the Non-Local Influence Quotient, is proposed as one such parameter for internationality computation, along with another novel metric, the Other-Citation Quotient, defined as the complement of the ratio of self-citations to total citations. In addition, SNIP and the international collaboration ratio are used as two other parameters. As these journal parameters are not readily available in one place, algorithms to scrape these metrics are written and documented as part of the current manuscript. The Cobb-Douglas production function is utilized as a model to compute the Journal Internationality Modeling Index.
The current work elucidates the metric acquisition algorithms while delivering arguments in favor of the suitability of the proposed model. The acquired data are corroborated by different supervised learning techniques. As part of future work, the authors present a bigger picture, a Reputation and Global Influence Score, which will be computed to facilitate the formation of clusters of journals of high, moderate and low internationality. Water security has been an emerging and rapidly developing new research area, and a bibliometric study of it is very helpful. By analyzing the data from all related items between 1998 and 2015 obtained from the Web of Science databases, we found that the publications in the overall scope, as well as in the various subjects, countries and journals, all matched logistic growth curves with large values of K (the maximum possible number of publications) and small values of b (related to the growth rate). The most promising subjects were environmental sciences and water resources, and Zipf's law of publication distribution across all subjects was satisfied. The USA had the most publications, whereas Canada had a greater latent capacity. The USA and the UK dominated the collaborative network. With "Water Science and Technology" as the most active journal, the Bradford scattering distribution of publications across journals was elucidated. The productivity of the authors roughly followed a Lotka distribution. Besides "water security" and "water safety", "climate change" was the hottest keyword. The co-word patterns revealed the wide mutual influences between water security and climate change. The absence of significant aging among the highly cited publications reflects the past, future and vitality of rapidly developing water security research. Our findings, drawn from a suite of bibliometric indicators, are instructive for the future studies of researchers, the strategies/policies of countries and the efforts of publishing organizations, altogether promoting global water security research.
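The logistic growth pattern just described, P(t) = K / (1 + a*exp(-b*t)) with ceiling K and growth-rate parameter b, can be fitted from cumulative publication counts. A minimal standard-library sketch, assuming a known ceiling K and using the usual log-linearisation ln(K/P - 1) = ln(a) - b*t (the data below are synthetic, not the study's; the study's actual fitting procedure is not specified in the abstract):

```python
import math

def fit_logistic(ts, ys, K):
    """Fit P(t) = K / (1 + a*exp(-b*t)) for a fixed ceiling K by
    linearising ln(K/P - 1) = ln(a) - b*t, then ordinary least squares."""
    zs = [math.log(K / y - 1.0) for y in ys]
    n = len(ts)
    mt = sum(ts) / n
    mz = sum(zs) / n
    slope = sum((t - mt) * (z - mz) for t, z in zip(ts, zs)) / \
            sum((t - mt) ** 2 for t in ts)
    b = -slope
    a = math.exp(mz + b * mt)  # intercept ln(a) = mz - slope*mt = mz + b*mt
    return a, b

# Synthetic cumulative publication counts generated with K=700, a=50, b=0.4.
ts = list(range(18))  # years since 1998
ys = [700 / (1 + 50 * math.exp(-0.4 * t)) for t in ts]
a, b = fit_logistic(ts, ys, K=700)
print(round(a, 1), round(b, 2))  # recovers a = 50.0, b = 0.4
```

With real, noisy counts, K itself would also have to be estimated (e.g. by a grid search over candidate ceilings), but the linearised fit above conveys the role of K and b in the abstract's claim.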
This review is a comprehensive quantitative analysis of the International Business literature whose focus is on national culture. The analysis relies on a broad range of bibliometric techniques such as productivity rankings, citation analysis (individual and cumulative), the study of collaborative research patterns, and analysis of the knowledge base. It provides insights on (1) faculty and institutional research productivity and performance; (2) the influence of articles, institutions, and scholars on the contents of the field and its research agenda; and (3) national and international collaborative research trends. The study also explores the body of literature that has exerted the greatest impact on the researched set of selected articles. Research on magnetic nanoparticles attracts scientists from broad disciplines including chemistry, physics, and biomedical science. It is a great challenge for scientists from different backgrounds to discover the development trends and research fronts that are embodied in publications from different disciplines. This article aims to portray the global research profile and detect research fronts of magnetic nanoparticles by taking advantage of scientometric approaches. A total of 13,464 publications regarding magnetic nanoparticles indexed by Web of Science during 2000-2015 were used for a detailed analysis of the global magnetic nanoparticles research performance. The 500 most-cited publications on magnetic nanoparticles were analyzed for their temporal-spatial distribution characteristics as well as co-citation networks and co-word networks to identify research fronts and development trends. This study revealed that 'block-copolymers' attracted the most attention in high-quality research on MNPs. Research on yadh-bound MNPs was among the hottest MNP topics. More recently, research on catalytic characteristics has emerged as a hot MNP topic.
There is considerable evidence that academics whose surname initials fall early in the alphabet have an advantage in publications, citations and other academic outcomes when they work in academic fields that order author names alphabetically. We analyze the distributions of full professors' surname initials in nine academic fields. We are unable to find the expected effect of alphabetization on academic careers. Academics with surname initials early in the alphabet are not more prevalent in alphabetic academic fields than in non-alphabetic ones, and academics at top departments are not more likely to have surname initials early in the alphabet than academics at lower-ranked departments. Cuba has developed a biopharmaceutical sector that involves some of the country's most relevant scientific institutions. Despite the severe constraints on resources resulting from the U.S. embargo, the results achieved by this sector have contributed to putting the country's health indicators at the same level as those of high-income nations. Recently, the creation of BioCubaFarma as a cluster of high-technology enterprises organized around a closed-cycle model has become one of the island's most relevant efforts to make biopharmaceuticals one of the country's leading export earners. The main aim of the current paper was to characterize BioCubaFarma through a battery of Scopus-based bibliometric indicators. A comparison with the most productive multinational pharmaceutical companies was made. Regression analysis of annual productivity, number of citations, scientific talent pool, innovative knowledge and other citation-based indicators was performed. Differences and similarities between BioCubaFarma and multinational companies in four Scopus subject categories related to this sector were identified.
The most productive and visible institutions within BioCubaFarma were also characterized. Qualified human resources, innovative knowledge, leadership, high specialization in the field of vaccine development and independence from international collaboration are strengths of the organization. However, it is still necessary to increase the number of articles published in highly visible journals with the aim of achieving a better citation-based performance. Moreover, increasing the contributions from less-productive institutions, publishing more clinical research in medical journals and collaborating more with universities and health institutions could also have positive benefits for BioCubaFarma's pipelines and portfolios. In comparison to the many dozens of articles reviewing and comparing (coverage of) the Web of Science, Scopus, and Google Scholar, the bibliometric research community has paid very little attention to Microsoft Academic Search (MAS). An important reason for the bibliometric community's lack of enthusiasm might have been that MAS coverage was fairly limited, and that almost no new coverage had been added since 2012. Recently, however, Microsoft introduced a new service, Microsoft Academic, built on content that the search engine Bing crawls from the web. This article assesses Microsoft Academic's coverage through a detailed comparison of the publication and citation record of a single academic across the four main citation databases: Google Scholar, Microsoft Academic, the Web of Science, and Scopus. Overall, this first small-scale case study suggests that the new incarnation of Microsoft Academic presents an excellent alternative for citation analysis. If our findings can be confirmed by larger-scale studies, Microsoft Academic might well turn out to combine the advantages of broader coverage, as displayed by Google Scholar, with the advantages of a more structured approach to data presentation, typical of Scopus and the Web of Science.
If so, the new Microsoft Academic service would truly be a phoenix arisen from the ashes. During the Italian research assessment exercise, the national agency ANVUR performed an experiment to assess agreement between grades attributed to journal articles by informed peer review (IR) and by bibliometrics. A sample of articles was evaluated by using both methods, and agreement was analyzed by weighted Cohen's kappas. ANVUR presented the results as indicating an overall "good" or "more than adequate" agreement. This paper re-examines the experiment's results according to the available statistical guidelines for interpreting kappa values, showing that the degree of agreement (always in the range 0.09-0.42) has to be interpreted, for all research fields, as unacceptable, poor, or, in a few cases, at most fair. The only notable exception, confirmed also by a statistical meta-analysis, was a moderate agreement for economics and statistics (Area 13) and its sub-fields. We show that the experiment protocol adopted in Area 13 was substantially modified with respect to all the other research fields, to the point that the results for economics and statistics have to be considered fatally flawed. The evidence of poor agreement supports the conclusion that IR and bibliometrics do not produce similar results, and that the adoption of both methods in the Italian research assessment possibly introduced systematic and unknown biases into its final results. The conclusion reached by ANVUR must be reversed: the available evidence does not at all justify the joint use of IR and bibliometrics within the same research assessment exercise. Strategic management remains a relatively young field of research that is dynamic and changing with the global business economy. Given the sheer importance of research in this field of business management, this paper aims to conduct a co-citation bibliometric analysis of strategic management research.
We map the authors and the most relevant approaches, as well as detailing the new theoretical perspectives in strategic management theory. The analysis uses three multivariate statistical analysis techniques in addition to the co-citation matrix to shed light on these issues. By incorporating all the citations included in the Science Citation Index and the Social Science Citation Index, we analyze co-citation patterns of the strategic management field during the period 1971-2014, identify six subfields (clusters) that constitute its intellectual structure, and investigate their mutual relationships. The main findings of the factor analysis suggest that there is a clear division between strategic entrepreneurship and corporate entrepreneurship. In addition, the concept of strategic behavior affects most strategic management research, as evidenced by the co-citation analysis. Future directions for the strategic management literature are debated, highlighting the importance of bringing more of a strategic entrepreneurship perspective, based on behavioral intentions, to the emerging research. As a relatively young and vibrant discipline, shareholder activism has evolved to become a critical element of corporate governance research. While both finance and law scholars have shown strong interest in shareholder activism, the heterogeneity in the research concentration of these two fields makes shareholder activism a fragmented discipline. We believe it would be interesting and insightful to conduct a citation-based analysis through a case study encompassing related articles on this topic published in both finance and law journals. We have adopted main path analysis to map out the evolution and research fronts of shareholder activism spanning over 30 years.
The results indicate that shareholder activism research has developed in several stages: discussions of its theoretical foundation, explorations of the "shareholder activism versus firm performance" correlation, the exceptions to the "one-share, one-vote" rule, the emergence of hedge fund activism research, and the recent "say-on-pay" campaigns. Edge-betweenness-based clustering was used to categorize the citation network into coherent groups, with the most popular themes for the period 2003-2013 including "pressure and monitoring from institutional investors", "shareholder activism in CSR and climate change", "say-on-pay movement and board responsiveness", and "shareholder voting and shareholder rights". The trends are also discussed herein. This case study should contribute to future studies in both the practical and academic arenas. Owing to their versatile characteristics and extensive applications, micro/nano-bubbles have attracted much research attention over the last half century. Researchers focus not merely on their physicochemical properties, but also on well-controlled generation methods and potential fields of application. It can be expected that the future prospects of micro/nano-bubble related research will be tremendous and that there will be even more to explore. In this case study, a bibliometric analysis was conducted to evaluate micro/nano-bubble related research from 1991 to 2014, based on the Science Citation Index EXPANDED database. Ultrasound in Medicine and Biology, with the highest h-index (56), is the leading journal in this field, publishing 6.9 % of articles over this period, followed by Langmuir and the Journal of the Acoustical Society of America. The USA and the University of Toronto (Canada) were the most productive country and institution, respectively, while the USA was also the most internationally collaborative and had the highest h-index (111) of all countries. 
A new method named "word cluster analysis" was successfully applied to trace the research hotspots. Innovation in detection methods and novel pathways for medical applications via micro/nano-bubbles are considered to relate to the increasing variety of diseases and cancers, as well as to the well-controlled generation of micro/nano-bubbles. This research aims to estimate the partnership ability of Scientometrics journal authors based on WoS data from 2001 to 2013 according to the partnership ability index (φ-index). The φ-index combines the number of co-authors and the number of times each of them acted as a co-author with a given author, in exactly the same way as Hirsch's h-index combines the number of publications and their citation rates. This indicator thus estimates how able an author is to build and sustain co-author partnerships. The research combines scientometric techniques with a social network analysis approach. For this study, a co-authorship map of Scientometrics was drawn for 2001-2013. The co-authorship map comprised 101 nodes. Centrality measures were estimated and, according to the closeness centrality measure, 56 authors were selected for estimating the φ-index. The results also showed that Glänzel and Moed ranked first and second, respectively, in all three centrality measures. Based on the estimated φ-index, Glänzel and Schubert ranked first and second, respectively. This research revealed in detail that the less scattered the co-authors of an author, the higher the φ-index. There is a growing need for businesses to pay attention to human well-being, providing people with the conditions necessary for meeting organizational goals, which depend on logistical, production, and supply chain processes functioning as a whole. 
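The φ-index construction described above can be sketched as an h-index taken over co-author collaboration counts; the counts below are hypothetical.

```python
def phi_index(coauthor_counts):
    """Partnership ability (phi) index: the largest k such that the
    author has k co-authors who each co-authored at least k papers
    with them -- an h-index over co-author collaboration counts.
    Sketch of the definition paraphrased in the abstract above."""
    counts = sorted(coauthor_counts, reverse=True)
    k = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            k = i
        else:
            break
    return k

# Hypothetical author: joint-paper counts with each of six co-authors.
print(phi_index([9, 6, 4, 4, 2, 1]))  # 4
```

As with the h-index, a few intense partnerships raise the index more than many one-off collaborations, which matches the finding that less scattered co-authorship yields a higher φ-index.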
In that light, the present work seeks to map existing academic publications regarding ergonomics within the field of logistics, and to assemble a bibliographic portfolio of the most relevant and applicable works in the opinion of the researchers. Toward this end, a bibliometric analysis was carried out with the goal of gaining insight into the most recurrent authors, articles, periodicals, and keywords in this field. This work is characterized as exploratory-descriptive research; in addition, it employs mixed qualitative and quantitative data analysis by way of the Knowledge Development Process-Constructivist intervention instrument (ProKnow-C). As a result, 15 relevant articles and 512 reference articles were selected, which went on to form the bibliographic portfolio. Upon completion of the bibliometric analysis of the articles and references composing the bibliographic portfolio, the following stood out: the academic journals International Journal of Industrial Ergonomics, Ergonomics, and Applied Ergonomics; the articles titled "Precision of measurements of physical workload during standardised manual handling. Part I: Surface electromyography of m. trapezius, m. infraspinatus and the forearm extensors" and "Ergonomic evaluation of complex work: A participative approach employing video-computer interaction, exemplified in a study of order picking"; the keywords "Human Factors", "Supply Chain", "Material Handling", "Logistics" and "Ergonomics"; and the most-cited authors: Neumann, W. P.; Marras, W. S.; Winkel, J.; Hansson, G.; Skerfving, S.; Mathiassen, S. E.; and Medbo, L. T. Because the results are unique to this particular case, they cannot be generalized to other contexts. The ProKnow-C process, however, is general and can be used in any context. 
Here, we carried out a scientometric analysis of the scientific literature published in the field of reproductive medicine (RM), an important and rapidly evolving branch of medicine. To this aim, we analysed the research papers published in RM in the period 1996-2011 and assessed bibliometric parameters (number of citable documents, cites per document, and H-index) and the extent of international collaboration among different geographical regions and countries. Further, we studied the links between RM and other disciplines in a journal co-citation study, and assessed keyword citation bursts. Finally, the social behaviour of authors involved in RM was analysed in a social network study. As a result, we found that marked differences still exist in the quantity and quality of RM output among regions and countries, likely due to differing economic efforts to sustain research. The most productive, Western Europe and Northern America (and, among countries, the USA and the UK), display a slow decrease in production, while Asia (and China in particular) is growing remarkably. Unfortunately, Eastern Europe, Latin America, and Africa still lag far behind. In addition, RM is directly connected with basic disciplines but not with other medical specialities, and the most cited keywords suggest a shift from basic to more applied research. Finally, the authors involved in RM form a small-world community, in which it is possible to identify the brokers of information flow. In conclusion, this analysis could contribute to the knowledge of scientific production in the RM field. Brazilian scientific output in the field of Neurosciences is analyzed based on articles indexed in Web of Science from 2006 to 2013 according to bibliometric indicators of production, collaboration, and impact, and keyword analysis. 
The growth rate of Brazilian scientific output is greater than that of global scientific production in the area, with a higher percentage of articles in English than other research areas in Brazil, and Brazilian neuroscientists prefer to publish their work in foreign journals. However, Portuguese-language papers were also observed in domestic journals, connected mainly to one research focus, Psychiatry. Modes of production in the area are also transdisciplinary when analyzed within the scope of research topics, which branch into issues related to basic and experimental research as well as clinical research. In addition, Brazilian Neurosciences output is highly concentrated in a small number of authors, regions, and particularly institutions, with most output coming from public universities in the southeastern and southern states. However, there is greater participation by the private sector than in other fields of knowledge (mainly private universities and hospitals). Interinstitutional collaboration occurs in 60.79 % of articles and international collaboration in 29.40 %. Brazil's main partners in international collaboration are the USA, Colombia, Argentina, and the UK. With regard to citations, the journals that most cite Brazilian Neurosciences are US, English, and English-language Dutch publications, but the citing authors are linked to institutions on all continents. We conclude that global reach and accelerated productivity growth do not translate into excellent impact. Further studies are thus suggested to determine why research is scarce in the northern and northeastern states. This study analyzed the characteristics of coactivity in the field of fuel cells at institutional and individual levels by examining Science Citation Index papers and U.S. patents between 1991 and 2010. The findings reveal that few coactive institutions or individuals adopt a balanced approach to publishing papers and filing patents. 
Substantial differences in productivity were observed between research institutions and companies. Research institutions focus on research and publications, whereas companies emphasize technology innovation and obtaining patents. Furthermore, the difference between research institutions and companies in productivity regarding the publishing of papers is less substantial than that regarding the filing of patents. Companies exhibit coactive performance superior to that of research institutions. To date, less care has been taken to quantitatively visualize the intellectual evolution of transport geography research than to qualitatively review this field. Based on large-scale literature data from the Thomson Reuters Web of Science as well as scientometric mapping analysis, this important research topic is analyzed with techniques from informetric domains to detect its developmental landscape. After data reduction and clean-up, 4840 articles published from 1982 to 2014 are identified, on which two network analyses are conducted: a bibliometric approach (i.e. co-occurrence and co-citation networks) and a complex network approach, utilizing C. Chen's CiteSpaceII, O. Persson's BibExcel, and ESRI's ArcGIS. Results illustrate the following: (1) the field moved from a rise (1960s-1970s), through stagnation (1980s-1990s), to a boom (since 1990); (2) the change of research frontiers and hot issues is either socially oriented or topic oriented; (3) its development owes a good deal to cooperative subnetworks (schools) of six academic communities: Urban Planning, Marxist Geography, Mobility Turn, New Economic Geography, Port Geography, and Time Geography; and (4) its research methods tend to be diversified and integrated, while its research perspective is inclined to be microcosmic and oriented to social hot issues. Finally, 23 documents are identified as playing a pivotal role in its knowledge evolution as an intellectual base. 
This study examines the structural patterns of international co-institutions and co-authors in Science Citation Index papers in the research domain of the Internet of Things (IoT). The study uses measures from the social network analysis method, including degree centrality, betweenness centrality, eigenvector centrality, and effectiveness, to investigate the effects of social networks. In addition, the study proposes a prediction model for assessing the semantic relevancy of research papers in the field of IoT (for a social science approach to semantic analysis, see Jung and Park, Gov Inf Q 32(3):353-358, 2015). For the analysis, 815 research papers were selected from the Web of Science database for the 1993-2015 period. Empirical analysis results identify China as the most central country, followed by the U.S., Spain, the U.K., and Sweden, in terms of the co-authored network. Similarly, the Chinese Academy of Sciences, the Beijing University of Posts and Telecommunications, and Shanghai Jiao Tong University were ranked first, third, and fourth, respectively, among the top five co-institutions. Northeastern University (U.S.) and the University of Surrey (U.K.) ranked second and fifth, respectively. A confusion matrix was used to validate the accuracy of the proposed model. The accuracy of the prediction model was 76.84 %, whereas recall for the model (the ability of a search to find all relevant items in the corpus) was 94.47 %. This study explores the evolution of institutional collaborations in articles published in the Strategic Management Journal (SMJ) between 1980 and 2014 via descriptive analysis and social network analysis. These analyses show that, in each sub-period, the number of institutions involved, as measured by papers published, increased significantly, and a significant number of new institutions participated in the strategic management community via the SMJ. However, a few institutions from the US dominated the field. 
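Accuracy and recall of the kind reported for the relevance-prediction model fall straight out of a binary confusion matrix; a minimal sketch with made-up counts, not the study's data.

```python
def accuracy_and_recall(tp, fp, fn, tn):
    """Accuracy and recall from binary confusion-matrix counts,
    as used to validate a relevance-prediction model. Toy sketch."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)  # share of relevant papers actually found
    return accuracy, recall

# Hypothetical counts: true/false positives, false/true negatives.
acc, rec = accuracy_and_recall(tp=85, fp=20, fn=5, tn=50)
print(f"accuracy={acc:.2%} recall={rec:.2%}")
```

A model can have high recall but lower accuracy, as in the study's 94.47 % versus 76.84 %, when it errs on the side of flagging papers as relevant.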
The collaboration network was weakly clustered, fragmented, and scattered, and the relationships among institutions were not close. International collaborations have been growing based on center-periphery, international trade, and social factors, rather than geographic proximity. An inclusive evaluation of the results, limitations, and suggestions for future research is provided. This study presents an analysis of environmental law research in the natural science and social science fields from a bibliometric perspective. Document type, publication language, annual output, and distributions of countries were quantitatively characterized and compared in the Science Citation Index Expanded (SCI-EXPANDED) and the Social Sciences Citation Index (SSCI). The citation history of highly cited articles and word analysis were used to examine research tendencies and "hotspots" in environmental law research. The results show that from 1992 to 2014 SCI-EXPANDED contained more environmental law research than SSCI in every year except 2011. The USA is the most productive country in both databases. Developing countries such as China, India, and Brazil are among the top 10 productive countries in SCI-EXPANDED, while in SSCI China is the only developing country among them. The USA had the most frequent collaborations with other countries in both SCI-EXPANDED and SSCI; collaborations were more frequent in SSCI than in SCI-EXPANDED. Word analysis reveals that "sustainability", "compliance", and "environmental management" are key issues in SSCI, while articles in SCI-EXPANDED pay more attention to "risk assessment", "recycling", "wastewater treatment", and "temperature". "China" is a key issue in research in both the natural science and social science fields, which indicates the rapid development of environmental law research in China as well as growing concern over China's environmental law issues. 
The development of environmentally friendly products is one of the key contemporary trends in the environmental management and planning field. Ecodesign is considered a practical mechanism for integrating environmental considerations throughout the life cycle of a product. Within this scope, the aim of this paper is to systematize the publications on ecodesign and to propose historical evolutionary phases of the area, considering important characteristics such as geographical distribution. To this end, a bibliometric analysis was performed by identifying key papers, authors, and journals that deal with the theme, and by tracing the history of the number of papers published. Among the results, a recent growth in publications was found, with a wide range of authors conducting research and publishing papers on the subject. The majority of research is conducted in European countries, especially France and the Nordic countries. Most journals that publish papers on ecodesign are from the environmental field, as opposed to those dealing with new product development, innovation, and project management. This work also identifies historical research phases; among the most recent, it is possible to notice efforts to link ecodesign with other areas of management, such as fuzzy methods, lean product development, and project management. Bibliometric indicators, used appropriately with other indicators, provide a valuable tool for the evaluation of scientific activity. This article reviews the scientific literature on crisis and tourism by means of these indicators, using sources provided by the Institute for Scientific Information. The aim is to demonstrate the consolidation of this field by designing a Thematic Level of the Field (TLF) that captures its continuity, collaboration, visibility, and impact, while identifying any trends and changes. 
Bibliographic citations have been obtained by examining information from the Web of Science, using the titles of articles from 1956 to 2013 from the Social Sciences Citation Index and Science Citation Index Expanded databases. The end product is the result of processing this information with different proximity measurements and the application of weighting factors, which leads to the creation of the TLF, classified according to different value-words. This paper looks at the patterns of collaborative scientific output during the first 10 years of the new millennium (2000-2009), based on Web of Science data for those Mexican institutions and departments known to be researching in Chemistry. Of the 17,109 papers retrieved, 76.9 % were in collaboration, with 34.3 % of these involving foreign institutions. We analysed the collaboration links with foreign partners using visualizations, and their dynamics by determining the combination and frequency of individual occurrences and by establishing the sequences found in our country co-authorship chains. Bilateral partnerships were the most common, predominantly with the USA and to a lesser extent with Spain. These two countries are also the protagonists of the most frequent trilateral co-authorships. Collaboration with other Latin American countries is infrequent and mainly bilateral. The number of partner countries increased from 75 to 92 from the first to the second quinquennium. With respect to the countries emerging in the second period, we find a greater occurrence and repetition of bilateral partnerships and a notable presence of Mexico's main industrialised partners in the corresponding chains. Similar numbers of journals were found for national and international collaborative papers, with important differences but also several overlaps. 
The subject range of journals was diverse in both cases, reflecting the interdisciplinary nature of the field, but with an important presence of titles specializing in subfields of Chemistry. Our findings provide new insight into the way countries interact and communicate when co-authoring with developing countries. This study used the social network analysis technique to decompose the structure of the author, institution, country, and journal networks of the work engagement domain. Using articles from the Web of Science database, 1406 publications published in SCI, SCI-E, SSCI, and A&HCI outlets between 1990 and 2015 were extracted and included in the analysis. Following an examination of this domain, we found that the network is relatively unfragmented, with a strong core that contains one large community of authors. This study also found evidence that a power-law distribution exists in the author and institutional collaboration networks, in which incoming nodes and links preferentially attach to nodes that are already well connected. The study also sheds light on the hidden collaboration networks among authors and countries within the work engagement domain. This study aimed to identify and analyze the characteristics of the highly cited articles in the Antarctic field using the Science Citation Index Expanded from 1900 to 2012. Articles that had been cited more than 100 times from publication to 2012 were assessed. The analyzed aspects covered the distribution of annual production, annual citations, journals, categories, countries/territories, institutions, authors, and research focuses and trends by words in titles, author keywords, and KeyWords Plus. A total of 852 highly cited articles were published from 1959 to 2011, with a mean of 181 citations per article. Two prominent journals, Nature and Science, led the 184 journals. 
Typically, exploration of the Antarctic requires multidisciplinary science and extensive collaboration. The USA, with the greatest manpower, took the lead among 48 countries, while the National Aeronautics and Space Administration (USA) and the British Antarctic Survey (UK) were the two most productive institutions. The European Project for Ice Coring in Antarctica community was active in Antarctic research. Moreover, a comprehensive analysis of keywords revealed that sea ice, the Southern Ocean, climate change, and ozone depletion were recent focuses and would receive more citations in the near future. In addition, citations in the first 3 years after publication (TC3), in 2012 (C2012), and from publication to 2012 (TC2012), as well as citations per year of each article (TCPY), were used to characterize the citation patterns and citation life of the most cited articles. Relying on the Science Citation Index Expanded and Social Science Citation Index databases, a bibliometric analysis of global studies on genetically modified foods (GMF) in the last 20 years was conducted. We explored the knowledge foundations, research areas, authorships, spatiotemporal patterns, and trends. GMF-related research has maintained stable growth, with established research teams and sufficient funds in recent years. GMF-related research is a young field with a newly established intellectual base, widely recognized as a bio-science field. Journal of Agricultural and Food Chemistry, European Food Research and Technology, and Food and Chemical Toxicology were the most active journals in the field. H. Akiyama, R. Teshima, A. Hino, and A. Cifuentes were the most prolific authors in GMF-related research. Both pro-GMF and anti-GMF countries, including the USA, the EU, and Japan, have pushed relevant research. With GMF considered a national non-traditional security issue, inter-institute collaborations were more visible than international ones. 
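The citation-window indicators above (TC3, C2012, TC2012, TCPY) can be sketched over a hypothetical year-by-year citation record; the three-year window is taken here as the publication year plus the following two years, an assumption of this sketch.

```python
def citation_windows(cites_by_year, pub_year, horizon=2012):
    """Citation-window indicators: citations in the first three years
    (TC3, assumed here to be pub_year through pub_year + 2), in the
    horizon year (C2012), total to the horizon (TC2012), and citations
    per year (TCPY). Toy sketch over a year -> citations mapping."""
    tc3 = sum(c for y, c in cites_by_year.items() if y < pub_year + 3)
    c_last = cites_by_year.get(horizon, 0)
    tc_total = sum(c for y, c in cites_by_year.items() if y <= horizon)
    tcpy = tc_total / (horizon - pub_year + 1)
    return tc3, c_last, tc_total, tcpy

# Hypothetical article published in 2005, tracked through 2012.
cites = {2005: 3, 2006: 10, 2007: 15, 2008: 20, 2012: 30}
print(citation_windows(cites, pub_year=2005))  # (28, 30, 78, 9.75)
```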
Through keyword analysis, we found that the focus of GMF-related research has shifted from a purely technological perspective to a combination of technology, food safety, and public acceptance. The categorization of Asperger syndrome has suffered criticism since its origins. Hence our decision to carry out a bibliometric study with the aim of analysing how the treatment of this topic has evolved, through a set of bibliometric indicators relating to variables of scientific production (authorship, research topics, areas, etc.) during the period 1990-2014. We used several databases (Medline, Inspec, Biosis Citation Index, SciELO Citation Index, Web of Science, Current Contents Connect), which allowed us to obtain a study sample of 3452 papers distributed across 574 journals. The results show a gradual increase in scientific production over the last 12 years, which is larger amongst English-speaking countries. The topic under analysis was covered more from a behavioral science perspective than from that of educational intervention. The conclusion is that a need exists to undertake studies which implement proposals for intervention with these students. We compared the scientific output of 16 countries from the Middle East during the period 1996-2014 to that of 27 countries from Western Europe and to the average world production, analyzing data year by year in order to find trends. Overall, our data show that during the period 1996-2014 Israel was the leading nation in the Middle East in terms of the total number of citations and of total citations per document, while Turkey and Iran were in the lead in terms of scientific documents produced and, together with Egypt and Saudi Arabia, were among the emerging countries in the Middle East in terms of scientific production. 
Israel has been slowly losing its relative (to the world) weight in scientific production over recent years, following a trend that mirrors West European countries, due to the rapid increase in scientific production and rising impact of newly emerging countries. Also, while four emerging countries (Iran, Turkey, Saudi Arabia, and Egypt) have been rising rapidly, the bottom countries were still under-performing compared to the world average. Our findings show that the Middle Eastern countries differed greatly in scientific production over time, that no common trend could be found among them, and that there was a profound imbalance in scientific performance, highlighting a wide divide between the top five and the other countries of the Middle East. International scientific collaboration is strategic for the growth of a country, in particular for developing countries. Among them, the five BRICS countries (Brazil, Russia, India, China, and South Africa) play a relevant role, also because they are joined in an association to foster mutual development. The present article studies the network of international scientific collaborations existing around the five BRICS. It does so by considering the number of co-authored scientific products with authorship shared between two different countries in a group of 70: the five BRICS plus 65 countries collaborating strongly with them. Absolute numbers of co-authored scientific products are arranged in a contingency table, and Probabilistic Affinity Indexes (chosen for their size independence) are then calculated. The indexes show the relative strength of inter-BRICS collaborations with respect to the network surrounding the five countries. Finally, the results are discussed and policy suggestions are offered. The International Ocean Discovery Program (IODP, 2013-2023) and its predecessors constitute the largest and longest-running international cooperative research program in earth science. 
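A size-independent Probabilistic Affinity Index of the kind described above can be sketched as the ratio of observed to expected co-publications; all counts below are hypothetical, not BRICS data.

```python
def probabilistic_affinity(cij, ci, cj, total):
    """Probabilistic Affinity Index sketch: observed co-publications
    between countries i and j, relative to what their overall
    collaboration volumes in the contingency table would predict.
    Values > 1 indicate a stronger-than-expected link; the measure is
    insensitive to country size."""
    return (cij * total) / (ci * cj)

# Hypothetical counts: i-j joint papers, i's total links, j's total
# links, and all pairwise links in the 70-country network.
print(probabilistic_affinity(cij=120, ci=2000, cj=1500, total=50000))  # 2.0
```

Because both numerator and denominator scale with output volume, a small country with few papers can still show a high affinity toward a specific partner.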
Such a program requires strong financial support from governments. Therefore, a study was carried out to compare research achievements among the main IODP members. All peer-reviewed publications related to IODP can be acquired through the database built by the American Geosciences Institute. Research directions and focuses among the United States, Japan, Germany, and P. R. China were compared using term frequency-inverse document frequency matrices, co-word, co-citation cluster, and overlay analysis methods. Publication numbers, collaboration networks, annual variation, and expedition achievements among the main IODP members were analyzed at the same time. Differences in ocean science research output among the main IODP members were also covered and compared with IODP outputs. The US has emerged as the clear leader of IODP. Japan, Germany, the UK, and France are close followers, and these countries also have their own research topics. P. R. China ranks sixth in publication numbers and tenth in first and corresponding authors. There is a large gap between P. R. China and countries such as the US and Japan, which lies in the percentage of expedition chief scientists, in basic research, and in activity around research hotspots. By contrast, P. R. China's output performance in ocean science at large is better: it ranks second in ocean science publications. Moreover, Japan shows a trend of catching up with the US, whereas P. R. China shows no sign of increase. Sustained fiscal effort is still necessary for P. R. China. Based on the original data of 100,275 SSCI-indexed papers in the field of economics in 2009-2014, this work applied scientometric and network analysis methods to study funding patterns: funding ratios, impacts, relationships among indices, and collaboration structures in major countries/territories. Results show that, despite the notable standing of economics, its global funding ratio in 2009-2014 is just 8.3 %, much lower than the social sciences average. 
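A term frequency-inverse document frequency matrix of the kind used above to compare research directions can be sketched over a toy corpus; the terms are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenized documents: term frequency
    within a document, damped by how many documents contain the term.
    Minimal sketch (no smoothing)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t])
         for t, c in Counter(doc).items()}
        for doc in docs
    ]

# Toy corpus: two members' (hypothetical) research-topic term lists.
corpus = [
    ["ocean", "drilling", "core"],
    ["ocean", "climate", "model"],
]
weights = tfidf(corpus)
# "ocean" appears in every document, so its IDF (and weight) is zero.
print(weights[0]["ocean"])  # 0.0
```

Terms shared by every member's output are down-weighted to zero, so the matrix highlights each country's distinctive research directions.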
Although the USA seems far ahead in economics innovation, the coverage of its science funding has not been widespread. By contrast, the funding ratio of China ranks highest, but the effect of its funding needs to be strengthened. We observe an approximate power-law relationship among three basic measures of funded economics papers: citations, total numbers, and h-indices. Collaboration in economics research presents a key structure of three main components: the USA as the central part, a core group in the Asia Pacific, and another core group in Europe. The collaboration pattern of continents is largely based on the connection between Europe and the USA. A bibliometric analysis based on the related articles in the Science Citation Index Expanded database was conducted to gain insight into global trends and hot issues in metal-organic frameworks (MOFs). Word clusters covering synthesis methods, MOF properties and potential applications, and some representative MOFs, with related supporting words in titles, author keywords, and abstracts, along with KeyWords Plus, were proposed to provide clues to the current research emphases. The Y index was introduced to assess publication characteristics related to the number of first-author and corresponding-author highly cited articles. The top eight classic articles with more than 1000 total citations from publication to the end of 2014 (TC2014 > 1000) and the top eight with more than 165 citations in 2014 (C2014 > 165) were selected and assessed regarding the distribution of outputs across journals, authors, and institutions, as well as their citation life cycles. Solvothermal (including hydrothermal) and diffusion (slow evaporation) methods were most often used to prepare MOFs. A series of representative MOFs, as well as the corresponding composites and films (membranes), aroused wide interest among researchers due to their excellent performance. 
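An approximate power-law relationship like the one reported among citations, paper counts, and h-indices can be checked by fitting a slope on log-log axes; the data below are synthetic, not the study's.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope on log-log axes: if y is approximately
    c * x**b, the fitted slope estimates the exponent b. Sketch."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))

# Synthetic data following y = 2 * x**1.5 recovers the exponent.
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]
print(round(loglog_slope(xs, ys), 3))  # 1.5
```

On real bibliometric data the points only approximately line up, so the fitted slope characterizes the relationship rather than proving an exact law.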
Among the various properties and potential applications of MOFs, adsorption (gas and liquid adsorption) took the lead, followed by catalysis (including photocatalysis), as a result of their ultrahigh porosity and, in some cases, intrinsic catalytic properties. The Y index analysis revealed that most highly cited articles in the MOF field were contributed by Yaghi, O. M. as corresponding author, who published 27 articles with TC2014 (number of citations from publication to the end of 2014) ≥ 100. Omar M. Yaghi, as corresponding (reprint) author, contributed the most classic articles, which dealt with synthesis strategies for MOFs with high porosity and high gas-storage capacity. The remaining classic articles concerned catalysis and drug delivery. These classic articles were published in four high-impact journals. The analyses of the citation life cycles of the classic articles with the highest TC2014 and C2014 can help researchers in MOF-related fields gain insight into their impact histories. This case study of the impact of publications in the area of Neurosciences and Mental Health was completed as part of an institutional analysis of health research activity at the University of Toronto. Our data show that selecting top researchers by total publication output favoured clinical research over all other research disciplines active in these subjects. The use of citation-rate-based measures broadened the research disciplines in the top group to include researchers in Public Health (the highest impact in the analysis), Commerce, and Basic Sciences. In addition, focusing on impact rather than output increased the participation of women in the top group. The number of female scientists increased from 20 to 31 % in the University of Toronto cohort when citations to publications were compared. Social network analysis showed that the top 100 researchers in both cohorts were highly collaborative, with several researchers forming bridges between individual clusters. 
There were two areas of research, neurodegeneration/movement disorders and cerebrovascular disease, represented by strong clusters in each analysis. The University of Toronto analysis identified two areas, neuro-oncology/neuro-development and mental health/schizophrenia, that were not represented in the global researcher networks. Information about the areas and relative strength of researcher collaborative networks will inform future strategic planning. Hantavirus, one of the deadliest viruses known to humans, hospitalizes tens of thousands of people each year in Asia, Europe and the Americas. Transmitted by infected rodents and their excreta, hantaviruses are identified as etiologic agents of two main types of disease: hemorrhagic fever with renal syndrome and hantavirus pulmonary syndrome, the latter having a fatality rate above 40 %. Although considerable research has been going on in this area for over two decades, bibliometric studies to gauge the state of research in this field have been rare. An analysis of 2631 articles on hantavirus, extracted from WoS databases for the period 1980-2014, indicated a progressive increase (R² = 0.93) in the number of papers over the years, with the majority of papers being published in the USA and Europe. About 95 % of papers were co-authored, and the most common arrangement was 4-6 authors per paper. Co-authorship has seen a steady increase (R² = 0.57) over the years. We apply research collaboration network analysis to investigate the best-connected authors in the field. The author-based networks have 49 components (connected clumps of nodes) with 7373 vertices (authors) and 49,747 edges (co-author associations) between them. The giant component (the largest component) is healthy, occupying 84.19 %, or 6208 vertices, with 47,117 edges between them. By using an edge-weight threshold, we drill down into the network to reveal bonded communities. 
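The component bookkeeping and edge-weight thresholding described in the hantavirus co-authorship analysis above can be sketched in pure Python; the toy graph and its weights are invented for illustration:

```python
from collections import defaultdict

# Toy weighted co-authorship list: (author, author, joint papers).
edges = [("a", "b", 5), ("b", "c", 1), ("c", "d", 4),
         ("e", "f", 3), ("f", "g", 1)]

def components(edge_list):
    """Connected components via breadth-first search."""
    adj = defaultdict(set)
    for u, v, _ in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        comp, frontier = set(), [node]
        while frontier:
            n = frontier.pop()
            if n in comp:
                continue
            comp.add(n)
            frontier.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = components(edges)
giant = max(comps, key=len)            # the giant component
giant_share = len(giant) / sum(len(c) for c in comps)

# Edge-weight threshold: keep only strong ties (here, >= 3 joint
# papers) to expose the bonded communities hidden in the network.
strong = [e for e in edges if e[2] >= 3]
communities = components(strong)
```

The giant-component share corresponds to the study's 84.19 % figure, and raising the weight threshold is what "drilling down" into bonded communities means here.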
We find three community hotspots: one led by researchers at the University of Helsinki, Finland; a second led by the Centers for Disease Control and Prevention, USA; and a third led by Hokkaido University, Japan. A significant correlation was found between an author's structural position in the network and research performance, further supporting the well-studied phenomenon that centrality affects research productivity. However, it was PageRank centrality that outperformed degree and betweenness centrality in the strength of its correlation with research performance. In 2014, the European Space Agency (ESA) celebrated its 50th anniversary. In this period, ESA has led European programs in space research and has coordinated the efforts of the national industries through different, successful programs. This paper reports the conclusions of a bibliometric analysis of ESA's scientific and technical bibliographic production. Input data covering the period from 1964 to 2014 have been collected from the Elsevier Scopus database. Using these data, a set of bibliometric indicators have been obtained to shed light on different aspects: productivity, collaboration, the impact of ESA's research on subsequent works, and information consumption patterns. The study has made use of the data analysis features provided by the Scopus database and of two bibliometric software tools, BibExcel and VOSviewer. These tools were used to carry out complementary analyses (co-citation, bibliographic coupling and clustering) and generate visualizations that support the exploration of the data. The current study applies the hIa metric of Harzing et al. (Scientometrics 99(3):811-821, 2014) to examine average faculty research performance across 5 Colleges in a single university. 
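The correlation between an author's centrality and research performance reported in the hantavirus network study is typically quantified with Spearman's rank correlation. A self-contained sketch on hypothetical author scores (the values are not from the study):

```python
# Hypothetical author scores: a centrality value (e.g., PageRank)
# and a research-performance measure (e.g., citation counts).
centrality = [0.31, 0.22, 0.15, 0.08, 0.05]
performance = [120, 95, 40, 52, 11]

def spearman(x, y):
    """Spearman rank correlation (assumes no ties, as in this toy data)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

rho = spearman(centrality, performance)
```

Comparing rho computed for PageRank against the same quantity computed for degree or betweenness is how one centrality measure can be said to "outperform" another in correlation strength.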
Average faculty performance for a range of common metrics of research publication, such as papers, citations and h-index, is presented to allow for a comparison of the degree to which the hIa can account for differences in publication patterns and career lengths in the current sample of faculty (N = 474). Scopus publication data for all faculty members across the 5 Colleges were collected and analyzed to evaluate the assertion that the hIa provides a more reliable metric for comparison between academics of different career lengths and academics researching in different disciplines. Comparison of the current results with those from the original work of Harzing et al. (Scientometrics 99(3):811-821, 2014) offers strong support for the usefulness of the hIa in qualitatively and quantitatively different academic environments. Results are discussed in relation to the potential value and appropriate use of the hIa metric. Radio frequency identification (RFID) is one of the most influential technologies of the twenty-first century. Today, RFID technology is being applied in a wide array of disciplines in science research and industrial projects. The significant impact of RFID is clearly visible in the rate of academic publications in the last few years. This article surveys the literature to evaluate the trend of RFID technology development based on academic publications from 2001 to 2014. Both bibliometric and content analyses are applied to examine this topic in SCI-Index and SSCI-Index documents. Based on the bibliometric technique, all 5159 existing RFID documents are investigated and several important factors are reviewed, including contributions by country, organization, funding agency, journal title, author, research area and Web of Science category. 
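As we understand the hIa of Harzing et al. (2014), each paper's citations are first divided by its number of authors, an h-index (hI,norm) is computed on the normalized counts, and the result is divided by academic age; this is a sketch under that assumption, with an invented publication record:

```python
# hIa sketch, assuming the definition in Harzing et al. (2014):
# hI,norm = h-index over author-normalized citation counts,
# hIa = hI,norm / academic age (years since first publication).
def h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    return sum(1 for i, c in enumerate(counts, start=1) if c >= i)

def hIa(papers, career_years):
    """papers: list of (citations, n_authors) tuples."""
    normalized = [cites / n_authors for cites, n_authors in papers]
    return h_index(normalized) / career_years

# Invented record: 6 papers over a 10-year career.
papers = [(100, 4), (60, 2), (30, 3), (12, 1), (8, 4), (2, 2)]
score = hIa(papers, career_years=10)
```

Dividing by both co-author count and career length is what lets the hIa compare academics across disciplines with different authorship norms and across different career lengths.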
Moreover, content analysis is applied to the top 100 most cited documents and, based on their contents, these top 100 documents are classified into four different categories, with each category divided into several sub-categories. This research aims to identify the best sources of the most cited RFID papers and to provide a comprehensive road map for future research and development in the field of RFID technology in both academic and industrial settings. Six key findings from this review are (1) the experimental method is the most popular research methodology, (2) RFID research has been a hot area of investigation but will branch out into related subset areas, (3) South East Asia is positioned to dominate this research space, (4) the focus of research up to now has been on technical issues rather than business and management issues, (5) research on RFID application domains will spread beyond supply chain and health care to a number of different areas, and (6) more research will be related to policy issues such as security and privacy. Information and communications technologies (ICTs) have enabled the rise of so-called Collaborative Consumption (CC): the peer-to-peer-based activity of obtaining, giving, or sharing access to goods and services, coordinated through community-based online services. CC has been expected to alleviate societal problems such as hyper-consumption, pollution, and poverty by lowering the cost of economic coordination within communities. However, beyond anecdotal evidence, there is a dearth of understanding of why people participate in CC. Therefore, in this article we investigate people's motivations to participate in CC. The study employs survey data (N = 168) gathered from people registered on a CC site. The results show that participation in CC is motivated by many factors, such as its sustainability, enjoyment of the activity, as well as economic gains. 
An interesting detail in the results is that sustainability is not directly associated with participation unless it is at the same time also associated with positive attitudes towards CC. This suggests that sustainability might only be an important factor for those people for whom ecological consumption is important. Furthermore, the results suggest that an attitude-behavior gap might exist in CC: people perceive the activity positively and say good things about it, but this good attitude does not necessarily translate into action. Previous studies have not fully investigated the role of source accessibility versus source quality in the selection of information sources. It remains unclear what their (relative) importance is. Three different models have been identified: (a) an exclusively accessibility-driven model, (b) a cost-benefit model in which both accessibility and quality are significant influences, and (c) an exclusively quality-driven model. Moreover, the conditions under which accessibility and quality are important are not well understood. The goal of our study is to shed more light on both issues by assessing the role of different dimensions of accessibility and quality and how their importance is affected by time pressure. We conducted a policy-capturing study in which 89 financial specialists participated. Each judged 20 scenarios in which the accessibility and quality of human information sources, as well as time pressure, were manipulated. Results showed that both accessibility and quality affect the likelihood of asking a human information source for information. Moreover, although the weights attached to physical accessibility and the source's perceived technical quality were indeed moderated by time pressure, in both conditions we find support for a cost-benefit model of information seeking, in which both accessibility and quality are significant influences. 
The increasing abundance of digital textual archives provides an opportunity for understanding human social systems. Yet the literature has not adequately considered the disparate social processes by which texts are produced. Drawing on communication theory, we identify three common processes by which documents might be detectably similar in their textual features: authors sharing subject matter, sharing goals, and sharing sources. We hypothesize that these processes produce distinct, detectable relationships between authors in different kinds of textual overlap. We develop a novel n-gram extraction technique to capture such signatures based on n-grams of different lengths. We test the hypothesis on a corpus where the author attributes are observable: the public statements of the members of the U.S. Congress. This article presents the first empirical finding that different social relationships are detectable through the structure of overlapping textual features. Our study has important implications for designing text modeling techniques to make sense of social phenomena from aggregate digital traces. This article describes an experimental study that examines the extent to which a group decision support system (GDSS), which allows team members to view other members' preference ratings, can encourage changes in individual preferences. We studied 22 four-person teams performing 2 hidden profile tasks, one simple and one complex, in a controlled setting. Transparency of the interactions, achieved through the visibility of ratings, influenced changes in participants' preferences as measured before, during and after the team discussion. Visibility of team scores could then offer an effective way to reach consensus, despite individual incumbent preferences. 
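The idea of using n-grams of different lengths to detect distinct kinds of textual overlap, as in the congressional-statements study above, can be illustrated with a simple Jaccard measure; this is a simplification for illustration, not the paper's extraction technique:

```python
# Toy overlap signatures: short n-grams suggest shared subject matter,
# long n-grams suggest shared sources or verbatim reuse. This is a
# simplification of the paper's n-gram extraction technique.
def ngrams(text, n):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(a, b, n):
    """Jaccard overlap of the two texts' n-gram sets."""
    x, y = ngrams(a, n), ngrams(b, n)
    return len(x & y) / len(x | y) if x | y else 0.0

s1 = "we must secure the border and support our troops"
s2 = "congress should secure the border and cut spending"
short = overlap(s1, s2, 2)   # bigram overlap: topical similarity
long_ = overlap(s1, s2, 4)   # 4-gram overlap: near-verbatim sharing
```

Here the two invented statements share a topic (high bigram overlap) but little verbatim text (low 4-gram overlap), the kind of length-dependent signature the study exploits.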
Changes between individuals' initial preferences and team preferences were found to be larger for members working on a complex task compared to a simple task, as were changes between individuals' prediscussion and postdiscussion preferences. Although prior studies established that the initial preferences of individual team members are rather sticky, this study reveals that individuals adjusted their initial preferences to reach a team consensus, as well as modified their preferences after team discussions. Despite the mixed results of earlier research on the impact of GDSS on efficient decision making, findings from this study suggest that in complex decision-making contexts, GDSS tools can be effective in enabling consensus building in groups. Recognizing negative and speculative information is highly relevant for sentiment analysis. This paper presents a machine-learning approach to automatically detect this kind of information in the review domain. The resulting system works in two steps: in the first pass, negation/speculation cues are identified, and in the second phase the full scope of these cues is determined. The system is trained and evaluated on the Simon Fraser University Review corpus, which is extensively used in opinion mining. The results show that the proposed method outstrips the baseline by as much as roughly 20% in negation cue detection and around 13% in scope recognition, both in terms of F1. In speculation, the performance obtained in the cue prediction phase is close to that obtained by a human rater carrying out the same task. In scope detection, the results are also promising and represent a substantial improvement on the baseline (up by roughly 10%). A detailed error analysis is also provided. The extrinsic evaluation shows that the correct identification of cues and scopes is vital for the task of sentiment analysis. 
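The two-pass cue-then-scope architecture of the negation/speculation system described above can be mimicked by a toy rule-based baseline (a small cue lexicon, with scope running from the cue to the next punctuation mark); the trained system in the paper is far more sophisticated than this sketch:

```python
import re

# Toy two-pass baseline (not the paper's trained system): pass 1 marks
# cues from a small lexicon; pass 2 extends each cue's scope to the
# next punctuation mark.
CUES = {"not", "never", "no", "without"}

def detect(sentence):
    tokens = re.findall(r"\w+'?\w*|[.,;!?]", sentence.lower())
    results = []
    for i, tok in enumerate(tokens):
        if tok in CUES:                     # pass 1: cue identification
            scope = []
            for t in tokens[i + 1:]:        # pass 2: scope resolution
                if t in {".", ",", ";", "!", "?"}:
                    break
                scope.append(t)
            results.append((tok, scope))
    return results

found = detect("The plot is not convincing, but the acting is great.")
```

Rule-based scope heuristics of roughly this kind are a common baseline in the negation-detection literature, which is what the reported F1 improvements are measured against.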
Software is increasingly crucial to scholarship, yet the visibility and usefulness of software in the scientific record are in question. Just as with data, the visibility of software in publications is related to incentives to share software in reusable ways, and so promote efficient science. In this article, we examine software in publications through content analysis of a random sample of 90 biology articles. We develop a coding scheme to identify software mentions and classify them according to their characteristics and ability to realize the functions of citations. Overall, we find diverse and problematic practices: only between 31% and 43% of mentions involve formal citations; informal mentions are very common, even in high impact factor journals and across different kinds of software. Software is frequently inaccessible: 15%-29% of packages cannot be found in any form, 90%-98% of specific versions are unavailable, and only 24%-40% of packages provide source code. Citations to publications are particularly poor at providing version information, whereas informal mentions are particularly poor at providing crediting information. We provide recommendations to improve the practice of software citation, highlighting recent nascent efforts. Software plays an increasingly important role in scientific practice; it deserves a clear and useful place in scholarly communication. A thesaurus contains a set of terms or features that may be used to represent recorded information, including prose documents or scientific data sets. The focus of this work is on the basic structural nature of a thesaurus itself, not on how people develop a thesaurus or how a thesaurus affects retrieval performance. Thesauri in this research are automatically developed in a simulation from sets of randomly or exhaustively generated documents. 
Each thesaurus is generated by the Thesaurus Generator software from a set of several hundred documents, and thousands of different document sets are used as input to the Thesaurus Generator, producing thousands of thesauri. Thus, thousands of thesauri are generated for each data point in the accompanying graphs. The characteristics of this large number of thesauri are studied so that the relationships between thesaurus parameters can be determined. Some rules governing these relationships are suggested, addressing factors such as tree height and width, the number of tree roots in thesauri, and the number of terms available for the vocabulary. How these parameters scale as vocabularies grow is also addressed. These results apply to various information systems that contain features with hierarchical relationships, including many thesauri and ontologies. We investigated a subject directory in the US Department of Agriculture-Economic Research Service portal. Parent-child relationships, related connections among the categories, and related connections among the subcategories in the subject directory were optimized using social network analysis. The optimization results were assessed by both density analysis and edge strength analysis methods. In addition, the results were evaluated by domain experts. From this study, it is recommended that four subcategories be switched from their original four categories into two different categories as a result of the parent-child relationship optimization. It is also recommended that 132 subcategories be moved to 40 subcategories and that eight categories be moved to two categories as a result of the related connection optimization. The findings show that optimization boosted the densities of the optimized categories, and the recommended connections of both the related categories and subcategories were stronger than the existing connections of the related categories and subcategories. 
This paper provides visual displays of the optimization analysis as well as suggestions to enhance the subject directory of this portal. Climate change, as a complex physical and social issue, has gained increasing attention in the natural as well as the social sciences. Climate change research has become more interdisciplinary and even transdisciplinary, as a typical Mode-2 science that is also dependent on an application context for its further development. We propose to approach interdisciplinarity as a co-construction of the knowledge base in the reference patterns and the programmatic focus in the editorials of the core journal of the climate-change sciences, Climatic Change, during the period 1977-2013. First, we analyze the knowledge base of the journal and map journal-journal relations on the basis of the references in the articles. Second, we follow the development of the programmatic focus by analyzing the semantics in the editorials. We argue that interdisciplinarity is a result of the co-construction between different agendas: the selection of publications into the knowledge base of the journal, and the adjustment of the programmatic focus to the political context in the editorials. Our results show a widening of the knowledge base from referencing the multidisciplinary journals Nature and Science to citing journals from specialist fields. The programmatic focus follows policy-oriented issues and incorporates public metaphors. We compare the network of aggregated journal-journal citation relations provided by the Journal Citation Reports (JCR) 2012 of the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) with similar data based on Scopus 2012. First, global and overlay maps were developed for the 2 sets separately. Using fuzzy-string matching and ISSN numbers, we were able to match 10,524 journal names between the 2 sets: 96.4% of the 10,936 journals contained in JCR, or 51.2% of the 20,554 journals covered by Scopus. 
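The combination of ISSN and fuzzy-string matching used to align the JCR and Scopus journal lists can be sketched with the standard library's SequenceMatcher; the records and the similarity threshold below are illustrative, not the study's actual parameters:

```python
from difflib import SequenceMatcher

# Sketch of aligning journal records across two databases: exact ISSN
# match first, then fuzzy title matching. Records and the 0.85
# threshold are illustrative.
jcr = [{"title": "J Informetr", "issn": "1751-1577"},
       {"title": "Scientometrics", "issn": "0138-9130"}]
scopus = [{"title": "Journal of Informetrics", "issn": "1751-1577"},
          {"title": "Scientometrics.", "issn": None}]

def match(a, b, threshold=0.85):
    if a["issn"] and a["issn"] == b["issn"]:
        return True                      # exact ISSN match
    ratio = SequenceMatcher(None, a["title"].lower(),
                            b["title"].lower()).ratio()
    return ratio >= threshold            # fuzzy title match

pairs = [(a["title"], b["title"])
         for a in jcr for b in scopus if match(a, b)]
```

ISSN matching catches records with abbreviated or variant titles, while fuzzy matching catches records lacking a shared ISSN but with near-identical titles.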
Network analysis was pursued on the set of journals shared between the 2 databases and the 2 sets of unique journals. Citations among the shared journals are more comprehensively covered in JCR than in Scopus, so the network in JCR is denser and more connected than in Scopus. The ranking of shared journals in terms of indegree (i.e., numbers of citing journals) or total citations is similar in both databases overall (Spearman rank correlation > 0.97), but some individual journals rank very differently. Journals that are unique to Scopus seem to be less important, as they cite shared journals rather than being cited by them, but the humanities are covered better in Scopus than in JCR. A time-dependent centrality metric for disciplinary coauthorship graphs, the Nobel number for a discipline, is introduced. A researcher's Nobel number for a given discipline in a given year is defined as the researcher's average coauthorship distance to that discipline's Nobel laureates in that year. Plotting Nobel numbers over several decades provides a quantitative as well as visual indication of a researcher's proximity to the intuitive center of a discipline as defined by recognized scientific achievement. It is shown that the Nobel number distributions for physics of several researchers, both within and outside of physics, are surprisingly flat over the five-decade span from 1951 to 2000. A model in which Nobel laureates are typically connected by short coauthorship paths, both intergenerationally and between subdisciplines, reproduces such flat Nobel number distributions. This study examines patterns of dynamic disciplinary knowledge production and diffusion. It uses a citation data set of Scopus-indexed journals and proceedings. The journal-level citation data set is aggregated into 27 subject areas, and these subjects are selected as the unit of analysis. 
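The Nobel number just defined, a researcher's average coauthorship distance to a discipline's laureates, reduces to breadth-first search over the coauthorship graph. A sketch on an invented graph (the author names and laureate labels are placeholders):

```python
from collections import deque

# Invented coauthorship graph; "laureate1"/"laureate2" stand in for a
# discipline's Nobel laureates in a given year.
coauthors = {
    "you": ["a"],
    "a": ["you", "laureate1", "b"],
    "b": ["a", "laureate2"],
    "laureate1": ["a"],
    "laureate2": ["b"],
}

def distance(graph, source, target):
    """Shortest coauthorship distance via breadth-first search."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, d = queue.popleft()
        if node == target:
            return d
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

laureates = ["laureate1", "laureate2"]
nobel_number = sum(distance(coauthors, "you", lau)
                   for lau in laureates) / len(laureates)
```

Recomputing this average for each year's set of laureates is what makes the metric time-dependent.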
A 3-step approach is employed: the first step examines disciplines' citation characteristics through scientific trading dimensions; the second step analyzes citation flows between pairs of disciplines; and the third step uses egocentric citation networks to assess individual disciplines' citation flow diversity through Shannon entropy. The results show that measured by scientific impact, the subjects of Chemical Engineering, Energy, and Environmental Science have the fastest growth. Furthermore, most subjects are carrying out more diversified knowledge trading practices by importing higher volumes of knowledge from a greater number of subjects. The study also finds that the growth rates of disciplinary citations align with the growth rates of global research and development (R&D) expenditures, thus providing evidence to support the impact of R&D expenditures on knowledge production. Having reflected on the theoretical tradition of previous information inequality research that treats society's information rich/poor as identical with its socioeconomic rich/poor, this study examines the informational structure of contemporary Chinese urban society through a cluster analysis of a sample of 3,361 urban residents measured by a holistic informational measurement developed around the concept of an individual's information world. 
It finds that, first, 4 groups, instead of a binary of haves versus have-nots, best characterize Chinese urban society informationally; second, the distribution of people among these groups conforms to a normal distribution, in striking contrast with the pyramid-shaped socioeconomic structure of Chinese society; third, although the demographic characteristics of these groups suggest a significant correlation between people's informational and socioeconomic statuses, the 2 are far from identical; fourth, although the 4 groups differ in all aspects investigated, they differ most notably in information assets and the range and type of materials they choose as their regular information resources; fifth, although the 4 groups vary significantly, each differs from the others in its own way. This study concludes that society's informational and socioeconomic structures are 2 related but distinctive structures, and that the informational structure is characterized by highly complicated textures of inequality. University libraries provide access to thousands of online journals and other content, spending millions of dollars annually on these electronic resources. Providing access to these online resources is costly, and it is difficult both to analyze the value of this content to the institution and to discern those journals that comparatively provide more value. In this research, we examine 1,510 journals from a large research university library, representing more than 40% of the university's annual subscription cost for electronic resources at the time of the study. We utilize a web analytics approach for the creation of a linear regression model to predict usage among these journals. We categorize metrics into two classes: global (journal focused) and local (institution dependent). Using 275 journals for our training set, our analysis shows that a combination of global and local metrics creates the strongest model for predicting full-text downloads. 
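A download-prediction regression combining a global (journal-focused) metric with a local (institution-dependent) one can be fitted by ordinary least squares; the variable names and synthetic data below are illustrative, not the study's actual metrics:

```python
import numpy as np

# Synthetic example: downloads driven by one global metric (an impact
# indicator) and one local metric (link-resolver clicks). Neither the
# metrics nor the coefficients come from the study.
impact = np.array([1.2, 3.4, 0.8, 5.1, 2.2, 4.0])           # global
clicks = np.array([30.0, 120.0, 15.0, 260.0, 80.0, 150.0])  # local
downloads = 50 * impact + 2 * clicks + 10                   # target

# Ordinary least squares with an intercept column
X = np.column_stack([np.ones_like(impact), impact, clicks])
coef, *_ = np.linalg.lstsq(X, downloads, rcond=None)
predicted = X @ coef
```

Fitting on a training subset and checking predicted against observed downloads on held-out journals mirrors the study's 275-journal training set and separate test set.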
Our linear regression model has an accuracy of more than 80% in predicting downloads for the 1,235 journals in our test set. The implication of these findings is that university libraries that use local metrics have better insight into the value of a journal and can therefore manage content costs more efficiently. Citation data for institutes are generally provided as numbers of citations or as relative citation rates (as, for example, in the Leiden Ranking). These numbers can then be compared between institutes. This study aims to present a new approach for the evaluation of citation data at the institutional level, based on regression models. As example data, the study includes all articles and reviews from the Web of Science for the publication year 2003 (n = 886,416 papers). The study is based on an in-house database of the Max Planck Society. The study investigates how much the expected number of citations for a publication changes if it contains the address of an institute. The calculation of expected values allows one, on the one hand, to investigate how the citation impact of an institute's papers compares with that of the total of all papers. On the other hand, the expected values for several institutes can be compared with one another or with a set of randomly selected publications. Besides the institutes, the regression models include factors that can be assumed to have a general influence on citation counts (e.g., the number of authors). As open access (OA) publication of research outputs becomes increasingly common and is mandated by institutions and research funders, it is important to understand the different aspects of the costs involved. This paper provides an early review of the administrative costs incurred by universities in making research outputs OA, either via publication in journals (Gold OA), involving payment of article-processing charges (APCs), or via deposit in repositories (Green OA). 
Using data from 29 UK institutions, it finds that the administrative time, as well as the cost incurred by universities, to make an article OA via the Gold route is over 2.5 times that of the Green route. Costs are then modeled at a national level using recent UK policy initiatives from Research Councils UK and the Higher Education Funding Councils' Research Excellence Framework as case studies. The study also demonstrates that the costs of complying with research funders' OA policies are considerably higher than where OA publication is left entirely to authors' discretion. Key target areas for future efficiencies in the business processes are identified and potential cost savings calculated. The analysis is designed to inform ongoing policy development at the institutional and national levels. Scientific disciplines are distinct not only in what they know but in how they know what they know, that is, in their epistemic cultures. There is a close relationship between the technologies that a field utilizes and sanctions and the process of inquiry, the character and meaning of corroborative data and evidence, and the kinds of models and theories developed in a field. As the machinery changes, epistemic practices also change. A case in point is how the epistemic practices of historians are reconfigured by the introduction of Geographic Information Systems (GIS). We argue that GIS mediates historical understanding and knowledge creation in at least three ways: (a) by allowing historians to bring new sets of data into the analysis; (b) by introducing novel questions, fresh insights, and new modes of analysis and reasoning, or discovering new answers to older questions; and (c) by providing new tools for historians to communicate with each other and with their audiences. We illustrate these mediations through a study of the historiography of the Budapest ghettos during World War II. 
Our study shows how GIS functionalities reveal hitherto unknown aspects of social life in the ghettos, while pushing certain other aspects into the background. Modern organizations often employ data scientists to improve business processes using diverse sets of data. Researchers and practitioners have both touted the benefits and warned of the drawbacks associated with data science and big data approaches, but few studies investigate how data science is carried out on the ground. In this paper, we first review the hype and criticisms surrounding data science and big data approaches. We then present the findings of semistructured interviews with 18 data analysts from various industries and organizational roles. Using qualitative coding techniques, we evaluated these interviews in light of the hype and criticisms surrounding data science in the popular discourse. We found that although the data analysts we interviewed were sensitive to both the allure and the potential pitfalls of data science, their motivations and evaluations of their work were more nuanced. We conclude by reflecting on the relationship between data analysts' work and the discourses around data science and big data, suggesting how future research can better account for the everyday practices of this profession. Dyslexic users often do not exhibit spelling and reading skills at a level required to perform effective search. To explore whether autocomplete functions reduce the impact of dyslexia on information searching, 20 participants with dyslexia and 20 controls solved 10 predefined tasks in the search engine Google. Eye-tracking and screen-capture documented the searches. There were no significant differences between the dyslexic students and the controls in time usage, number of queries, query lengths, or the use of the autocomplete function. However, participants with dyslexia made more misspellings and looked less at the screen and the autocomplete suggestions lists while entering the queries. 
The results indicate that although the autocomplete function supported the participants in the search process, a more extensive use of the autocomplete function would have reduced misspellings. Further, the search engine's high tolerance for spelling errors considerably reduced the effect of dyslexia and may be as important as the autocomplete function. This study focuses on the sharing of "happy" information: information that creates a sense of happiness within the individual sharing the information. We explore the range of factors motivating and impacting individuals' happy information-sharing behavior within a casual leisure context through 30 semistructured interviews. The findings reveal that the factors influencing individuals' happy information-sharing behavior are numerous and impact each other. Most individuals considered sharing happy information important to their friendships and relationships. In various contexts the act of sharing happy information was shown to enhance the sharer's happiness. An international survey of over 3,600 researchers examined how trustworthiness and quality are determined for making decisions on scholarly reading, citing, and publishing, and how scholars perceive changes in trust with new forms of scholarly communication. Although differences in determining the trustworthiness and authority of scholarly resources exist among age groups and fields of study, traditional methods and criteria remain important across the board. Peer review is considered the most important factor for determining the quality and trustworthiness of research. Researchers continue to read abstracts, check content for sound arguments and credible data, and rely on journal rankings when deciding whether to trust scholarly resources in reading, citing, or publishing. 
Social media outlets and open access publications are still often not trusted, although many researchers believe that open access has positive implications for research, especially if the open access journals are peer reviewed. One of the key challenges for innovation and technology-mediated knowledge collaboration within organizational settings is motivating contributors to share their knowledge. Drawing upon self-determination theory, we investigate 2 forms of motivation: internally driven (autonomous motivation) and externally driven (controlled motivation). Knowledge sharing could be viewed as a required in-role activity or as discretionary extra-role behavior. In this study, we examine the moderating effect of role perceptions on the relations between each of the two motivational constructs and knowledge sharing, paying particular attention to the affordances of the enabling information technology. An analysis of survey data from a wiki-based organizational encyclopedia in a large, multinational firm reveals that when contributors' motivation is externally driven, they are more likely to share knowledge if this activity is viewed as in-role behavior. However, when contributors' motivation is internally driven, they are more likely to participate in knowledge sharing when this activity is viewed as extra-role behavior. Theoretical and practical implications are discussed. Arguing that environmental sustainability is a growing concern for digital information systems and services, this article proposes a simple method for estimation of the energy and environmental costs of digital libraries and information services. It is shown that several factors contribute to the overall energy and environmental costs of information and communication technology (ICT) in general and digital information systems and services in particular. It is also shown that end-user energy costs play a key role in the overall environmental costs of a digital library or information service. 
It is argued that appropriate user research, transaction log analysis, user modeling, and better design and delivery of services can significantly reduce the user interaction time, and thus the environmental costs, of digital information systems and services, making them more sustainable. Systematic study of data sharing by citizen scientists will make a significant contribution to science because of the growing importance of aggregated data in data-intensive science. This article expands on the data sharing component of a paper presented at the 2013 ASIST conference. A three-phase project is reported. Conducted between 2011 and 2013 within an environmental voluntary group, the Australian Plants Society Victoria (APSV), the interviews of the first phase are the major data source. Because the project revealed the importance of data sharing with professional scientists, their views are included in the literature review where four themes are explored: lack of shared disciplinary culture, trust, responsibility and controlled access to data, and describing data to enable reuse. The findings, presented under these themes, revealed that, within APSV, sharing among members is mostly generous and uninhibited. Beyond APSV, when online repositories were involved, barriers came very strongly into play. Trust was weaker and barriers also included issues of data quality, data description, and ownership and control. The conclusion is that further investigation of these barriers, including the attitudes of professional scientists to using data contributed by citizen scientists, would indicate how more extensive and useful data sharing could be achieved. Social question & answer forums offer great learning opportunities, but students need to evaluate the credibility of answers to avoid being misled by untrustworthy sources. This critical evaluation may be beyond the capabilities of students from primary and secondary school. 
We conducted 2 studies to assess how students from primary, secondary, and undergraduate education perceive and use 2 relevant credibility cues in forums: the author's identity and the evidence used to support the answer. Students did not use these cues when they evaluated forums with a single answer (Experiment 1), but they more often recommended answers from self-reported experts than from users with a pseudonym when multiple sources were discussed in the forum (Experiment 2). This pattern of results suggests that multiple viewpoints increase students' attention to source features in forum messages. Experiment 2 also revealed that primary school students preferred personal experience as evidence in the messages, whereas undergraduate students preferred the inclusion of documentary sources. Thus, while children mimic the adult preference for expert sources in web forums, they treat source information in a rather superficial manner. To conclude, we outline possible mechanisms to understand how credibility assessment evolves across educational levels, and discuss potential implications for the educational curriculum in information literacy. A new information literacy test (ILT) for higher education was developed, tested, and validated. The ILT contains 40 multiple-choice questions (available in the Appendix) with four possible answers and follows the recommendations of information literacy (IL) standards for higher education. It assesses different levels of thinking skills and is intended to be freely available to educators, librarians, and higher education managers, as well as being applicable internationally for study programs in all scientific disciplines. Testing of the ILT was performed on a group of 536 university students. The overall test analysis confirmed the ILT's reliability and discrimination power as appropriate (Cronbach's alpha 0.74; Ferguson's delta 0.97). The students' average overall achievement was 66%, and IL increased with the year of study. 
The students were less successful in advanced database search strategies, which require a combination of knowledge, comprehension, and logic, and in topics related to intellectual property and ethics. A group of 163 students who took a second ILT assessment after participating in an IL-specific study course achieved an average posttest score of 78.6%, implying an average IL increase of 13.1%, with the most significant improvements in advanced search strategies (23.7%) and in intellectual property and ethics (12.8%). Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences, potentially adding hundreds of unnecessary words to the summary, or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests 2 causes: altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems. 
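The disaggregation operation described in the summarization study above, splitting one long sentence into several shorter ones while keeping all of its words, can be sketched in a few lines. A minimal illustration (the splitting rules and the `disaggregate` helper are assumptions for demonstration, not the authors' rule set):

```python
import re

def disaggregate(sentence):
    """Split one long sentence into several shorter ones on simple
    clause boundaries (semicolons and ', and '), keeping all words.
    A real rule set would also repair subjects and verb agreement."""
    clauses = []
    for part in re.split(r";\s*", sentence.rstrip(".")):
        clauses.extend(re.split(r",\s+and\s+", part))
    # Re-terminate each clause as its own sentence.
    # (Note: capitalize() also lowercases later words, so proper nouns
    # would need extra care in a real implementation.)
    return [c.strip().rstrip(",").capitalize() + "." for c in clauses if c.strip()]

long_sentence = ("The court reviewed the statute; the parties filed briefs, "
                 "and the judge issued a ruling.")
for s in disaggregate(long_sentence):
    print(s)
```

The selection algorithm then scores the three short sentences independently instead of paying for one 20-word sentence to obtain a single relevant clause.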
Finding software for reuse is a problem that programmers face. Reusing code that has been proven to work can increase any programmer's productivity, benefit corporate productivity, and increase the stability of software programs. This paper shows that fuzzy retrieval offers improved retrieval performance over typical Boolean retrieval. Various methods of fuzzy information retrieval implementation and their use for software reuse will be examined. A deeper explanation of the fundamentals of designing a fuzzy information retrieval system for software reuse is presented. Future research options and necessary data storage systems are explored. The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-a-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis. 
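The fuzzy retrieval idea in the software-reuse study above can be made concrete: index terms carry membership degrees in [0, 1] rather than binary flags, conjunction becomes a minimum, and documents are ranked by degree instead of being matched all-or-nothing. A toy sketch (the module names, membership degrees, and the 0.5 Boolean threshold are illustrative assumptions):

```python
# Each document assigns a membership degree in [0, 1] to each index term
# (hypothetical toy data for a small software component library).
docs = {
    "sort_module":  {"sorting": 0.9, "search": 0.2, "string": 0.1},
    "find_module":  {"sorting": 0.3, "search": 0.8, "string": 0.4},
    "regex_module": {"sorting": 0.0, "search": 0.6, "string": 0.9},
}

def fuzzy_and(doc, terms):
    # Fuzzy conjunction: minimum of the membership degrees.
    return min(doc.get(t, 0.0) for t in terms)

def boolean_and(doc, terms, threshold=0.5):
    # Crisp Boolean conjunction: 1 only if every term passes a threshold.
    return int(all(doc.get(t, 0.0) >= threshold for t in terms))

query = ["search", "string"]
ranked = sorted(docs, key=lambda d: fuzzy_and(docs[d], query), reverse=True)
print(ranked)                                         # graded fuzzy ranking
print({d: boolean_and(docs[d], query) for d in docs}) # all-or-nothing Boolean
```

Note how the Boolean query rejects `find_module` outright (its `string` degree falls below the threshold), while the fuzzy query still ranks it as a plausible second candidate; this graded behavior is the claimed advantage over crisp Boolean matching.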
Automated methods for the analysis, modeling, and visualization of large-scale scientometric data provide measures that enable the depiction of the state of world scientific development. We aimed to integrate minimum span clustering (MSC) and minimum spanning tree methods to cluster and visualize the global pattern of scientific publications (PSP) by analyzing aggregated Science Citation Index (SCI) data from 1994 to 2011. We hypothesized that PSP clustering is mainly affected by countries' geographic location, ethnicity, and level of economic development, as indicated in previous studies. Our results showed that the 100 countries with the highest rates of publications were decomposed into 12 PSP groups and that countries within a group tended to be geographically proximal, ethnically similar, or comparable in terms of economic status. Hubs and bridging nodes in each knowledge production group were identified. The performance of each group was evaluated across 16 knowledge domains based on their specialization, volume of publications, and relative impact. Awareness of the strengths and weaknesses of each group in various knowledge domains may have useful applications for examining scientific policies, adjusting the allocation of resources, and promoting international collaboration for future developments. This study examines whether an institution's research resources affect its centrality and relationships in international collaboration among 606 astronomical institutions worldwide. The findings support our theoretical hypotheses that an institution's research resources are positively related to its central position in the network. Astronomical institutions with superior resources, such as being equipped with international observational facilities and having substantial research manpower, tend to have more foreign partners (high degree centrality) and play an influential role (high betweenness centrality) in the international collaboration network. 
An institution becomes increasingly active in international collaborations as its research population expands. In terms of the relationship, which is captured by an actor institution's co-authorship preference for each partner in the network, the effect of research resources is not as significant as expected. We found that astronomical institutions do not necessarily co-author preferentially with partners that have better research resources. In addition, this study indicates that geographic closeness (or "geographic proximity") largely affects the occurrence of international collaboration. The investigated institutions apparently prefer partners from neighboring countries. This finding gives an indication of the phenomenon of regional homophily in the international collaboration network. The vast amount of scientific publications available online makes it easier for students and researchers to reuse text from other authors and makes it harder to check the originality of a given text. Reusing text without crediting the original authors is considered plagiarism. A number of studies have reported the prevalence of plagiarism in academia. As a consequence, numerous institutions and researchers are dedicated to devising systems to automate the process of checking for plagiarism. This work focuses on the problem of detecting text reuse in scientific papers. The contributions of this paper are twofold: (a) we survey the existing approaches for plagiarism detection based on content, based on content and structure, and based on citations and references; and (b) we compare content- and citation-based approaches with the goal of evaluating whether they are complementary and if their combination can improve the quality of the detection. We carry out experiments with real data sets of scientific papers and conclude that a combination of the methods can be beneficial. 
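The combination of content-based and citation-based detection evaluated in the plagiarism study above can be sketched as two independent similarity signals fused into one score. A minimal illustration (the cosine/Jaccard choices and the equal-weight fusion are assumptions for demonstration, not the paper's exact method):

```python
import math
from collections import Counter

def cosine_sim(text_a, text_b):
    # Content-based signal: cosine similarity of term-frequency vectors.
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def jaccard_sim(refs_a, refs_b):
    # Citation-based signal: overlap of the two papers' reference lists.
    a, b = set(refs_a), set(refs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_score(text_a, refs_a, text_b, refs_b, w=0.5):
    # Equal-weight linear fusion of the two signals (the weighting is an
    # assumption; any monotone combination would serve the illustration).
    return w * cosine_sim(text_a, text_b) + (1 - w) * jaccard_sim(refs_a, refs_b)

score = combined_score("reuse of text in papers", ["r1", "r2", "r3"],
                       "reuse of text in papers", ["r2", "r3", "r4"])
print(score)  # identical text (cosine 1.0), half-overlapping references (Jaccard 0.5) -> 0.75
```

The appeal of the combination is that the two signals fail differently: paraphrased reuse can defeat the content signal while leaving the shared references intact, and vice versa.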
As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions influenced by journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. We compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating the scholarly literature, stepping between journals and remembering their previous steps to different degrees: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that higher-order Markov models perform better and are more robust to the selection of journals. However, this performance gain comes at the cost of requiring more citation data over a longer time period. This study explores the extent to which authors with different impact and productivity levels cite journals, institutions, and other authors through an analysis of the scientific papers of 37,717 authors during 1990-2013. The results demonstrate that the core-scatter distribution of cited authors, institutions, and journals varies for authors in each impact and productivity class. All authors in the science network receive the majority of their credit from high-impact authors; however, this effect decreases as authors' impact levels decrease. Similarly, the proportion of citations that lower-impact authors make to each other increases as authors' impact levels decrease. 
High-impact authors, who have the highest degree of membership in the science network, publish fewer papers in comparison to highly productive authors. However, authors with the highest impact make both more references per paper and more citations to papers in the science network. This suggests that high-impact authors produce the most relevant work in the science network. Comparing practices by productivity level, authors receive the majority of their credit from highly productive authors, and authors cite highly productive authors more frequently than they cite less productive authors. The results of bibliometric studies provided by bibliometric research groups, for example, the Centre for Science and Technology Studies (CWTS) and the Institute for Research Information and Quality Assurance (iFQ), are often used in the process of research assessment. Their databases use Web of Science (WoS) citation data, which they match according to their own matching algorithms: in the case of CWTS, for standard usage in their studies, and in the case of iFQ, on an experimental basis. Because the problem of nonmatched citations in the WoS persists due to inaccuracies in the references or inaccuracies introduced in the data extraction process, it is important to ascertain how well these inaccuracies are rectified by these citation-matching algorithms. This article evaluates the algorithms of CWTS and iFQ in comparison to the WoS in a quantitative and a qualitative analysis. The analysis builds upon the method and the manually verified corpus of a previous study. The algorithm of CWTS performs best, closely followed by that of iFQ. The WoS algorithm still performs quite well (F1 score: 96.41%), but shows deficits in matching references containing inaccuracies. An additional problem is posed by incorrectly provided cited reference information in source articles by the WoS. 
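The F1 score used above to compare citation-matching algorithms is computed from precision and recall against a manually verified gold standard of reference links. A minimal sketch (the toy links are illustrative, not the study's corpus):

```python
def f1_score(true_links, predicted_links):
    """Precision/recall/F1 of a citation-matching algorithm, where each
    link is a (reference, matched paper ID) pair and true_links is the
    manually verified gold standard."""
    tp = len(true_links & predicted_links)   # correctly matched references
    precision = tp / len(predicted_links) if predicted_links else 0.0
    recall = tp / len(true_links) if true_links else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

gold = {("ref1", "p9"), ("ref2", "p4"), ("ref3", "p7"), ("ref4", "p2")}
algo = {("ref1", "p9"), ("ref2", "p4"), ("ref3", "p1"), ("ref4", "p2")}  # one wrong match
print(f1_score(gold, algo))  # 3 of 4 links correct -> 0.75
```

A reference left unmatched lowers recall, while a reference matched to the wrong paper lowers both precision and recall, which is why inaccurate reference strings depress the F1 of a matching algorithm twice over.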
The objective of this article is to determine whether academic collaboration is associated with the citation-based performance of articles that are published in management journals. We analyzed 127,812 articles published between 1988 and 2013 in 173 journals on the ISI Web of Science in the management category. Collaboration occurred in approximately 60% of all articles. A power-law relationship was found between citation-based performance and journal size and collaboration patterns. The number of citations expected by collaborative articles increases 2^1.89, or 3.7, times when the number of collaborative articles published in a journal doubles. The number of citations expected by noncollaborative articles increases only 2^1.35, or 2.55, times if a journal publishes double the number of noncollaborative articles. The Matthew effect is stronger for collaborative than for noncollaborative articles. Scale-independent indicators increase the confidence in the evaluation of the impact of the articles published in management journals. Twitter is used by a substantial minority of the populations of many countries to share short messages, sometimes including images. Nevertheless, despite some research into specific images, such as selfies, and a few news stories about specific tweeted photographs, little is known about the types of images that are routinely shared. In response, this article reports a content analysis of random samples of 800 images tweeted from the UK or USA during a week at the end of 2014. Although most images were photographs, a substantial minority were hybrid or layered image forms: phone screenshots, collages, captioned pictures, and pictures of text messages. About half were primarily of one or more people, including 10% that were selfies, but a wide variety of other things were also pictured. 
Some of the images were for advertising or to share a joke but in most cases the purpose of the tweet seemed to be to share the minutiae of daily lives, performing the function of chat or gossip, sometimes in innovative ways. Many computer users today value personalization but perceive it in conflict with their desire for privacy. They therefore tend not to disclose data that would be useful for personalization. We investigate how characteristics of the personalization provider influence users' attitudes towards personalization and their resulting disclosure behavior. We propose an integrative model that links these characteristics via privacy attitudes to actual disclosure behavior. Using the Elaboration Likelihood Model, we discuss in what way the influence of the manipulated provider characteristics is different for users engaging in different levels of elaboration (represented by the user characteristics of privacy concerns and self-efficacy). We find particularly that (a) reputation management is effective when users predominantly use the peripheral route (i.e., a low level of elaboration), but much less so when they predominantly use the central route (i.e., a high level of elaboration); (b) client-side personalization has a positive impact when users use either route; and (c) personalization in the cloud does not work well in either route. Managers and designers can use our results to instill more favorable privacy attitudes and increase disclosure, using different techniques that depend on each user's levels of privacy concerns and privacy self-efficacy. Compared to the early versions of smart phones, recent mobile devices have bigger screens that can present more web search results. Several previous studies have reported differences in user interaction between conventional desktop computer and mobile device-based web searches, so it is imperative to consider the differences in user behavior for web search engine interface design on mobile devices. 
However, it is still unknown how the diversification of screen sizes on hand-held devices affects how users search. In this article, we investigate search performance and behavior on three different small screen sizes: early smart phones, recent smart phones, and phablets. We found no significant difference with respect to the efficiency of carrying out tasks; however, participants exhibited different search behaviors: less eye movement within top links on the larger screen, fast reading with some hesitation before choosing a link on the medium screen, and frequent use of scrolling on the small screen. This result suggests that the presentation of web search results for each screen needs to take into account differences in search behavior. We suggest several ideas for presentation design for each screen size. The selection and retrieval of relevant information from the information universe on the web is becoming increasingly important in addressing information overload. It has also been recognized that geography is an important criterion of relevance, leading to the research area of geographic information retrieval. As users increasingly retrieve information in mobile situations, relevance is often related to geographic features in the real world as well as their representation in web documents. We present 2 methods for assessing geographic relevance (GR) of geographic entities in a mobile use context that include the 5 criteria of topicality, spatiotemporal proximity, directionality, cluster, and colocation. To determine the effectiveness and validity of these methods, we evaluate them through a user study conducted on the Amazon Mechanical Turk crowdsourcing platform. An analysis of relevance ranks for geographic entities in 3 scenarios produced by the 2 GR methods, 2 baseline methods, and human judgments collected in the experiment reveals that one of the GR methods produces ranks similar to those of human assessors. 
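A geographic relevance method of the kind described above combines several criteria into a single ranking score. A much-simplified sketch using only two of the five criteria, topicality and spatial proximity (the entities, coordinates, decay function, and weights are illustrative assumptions, not the paper's formulas):

```python
import math

def proximity(user_xy, entity_xy):
    # Spatial proximity criterion: decays with Euclidean distance
    # (toy planar coordinates; a real system would use geodesic distance).
    return 1.0 / (1.0 + math.dist(user_xy, entity_xy))

def gr_score(entity, user_xy, weights=(0.6, 0.4)):
    # Illustrative weighted combination of two criteria:
    # topicality (precomputed in [0, 1]) and spatial proximity.
    wt, wp = weights
    return wt * entity["topicality"] + wp * proximity(user_xy, entity["xy"])

entities = [
    {"name": "museum",  "topicality": 0.9, "xy": (2.0, 3.0)},
    {"name": "cafe",    "topicality": 0.4, "xy": (0.1, 0.1)},
    {"name": "station", "topicality": 0.7, "xy": (5.0, 5.0)},
]
user = (0.0, 0.0)
ranked = sorted(entities, key=lambda e: gr_score(e, user), reverse=True)
print([e["name"] for e in ranked])
```

Here the nearby but weakly topical cafe outranks the strongly topical but distant station, which is the kind of trade-off a mobile GR ranking has to make; the remaining criteria (directionality, cluster, colocation) would enter as additional weighted terms.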
Exploratory search is an increasingly important activity yet challenging for users. Although there is ample research into understanding exploration, most major information retrieval (IR) systems do not provide tailored and adaptive support for such tasks. One reason is the lack of empirical knowledge on how to distinguish exploratory and lookup search behaviors in IR systems. The goal of this article is to investigate how to separate the 2 types of tasks in an IR system using easily measurable behaviors. In this article, we first review characteristics of exploratory search behavior. We then report on a controlled study of 6 search tasks: 3 exploratory tasks (comparison, knowledge acquisition, planning) and 3 lookup tasks (fact-finding, navigational, question answering). The results are encouraging, showing that IR systems can distinguish the 2 search categories in the course of a search session. The most distinctive indicators that characterize exploratory search behaviors are query length, maximum scroll depth, and task completion time. However, 2 tasks are borderline and exhibit mixed characteristics. We assess the applicability of this finding by reporting on several classification experiments. Our results have valuable implications for designing tailored and adaptive IR systems. As a significant contextual factor in information search, topic knowledge has been gaining increased research attention. We report on a study of the relationship between information searchers' topic knowledge and their search behaviors, and on an attempt to predict searchers' topic knowledge from their behaviors during the search. Data were collected in a controlled laboratory experiment with 32 undergraduate journalism student participants, each searching on 4 tasks of different types. 
In general, behavioral variables were not found to have significant differences between users with high and low levels of topic knowledge, except for the mean first dwell time on search result pages. Several models were built to predict topic knowledge using behavioral variables calculated at 3 different stages of search episodes: the first query round, the middle point of the search, and the end point. It was found that a model using some search behaviors observed in the first query round led to satisfactory prediction results. The results suggest that early-session search behaviors can be used to predict users' topic knowledge levels, allowing personalization of search for users with different levels of topic knowledge, especially in order to assist users with low topic knowledge. Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machine classifiers) to work with the taxonomy. 
Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques. New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties, including researchers and the general public, focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full-text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features. Recent research in information studies suggests that the tradition of seeing the discipline as weak is still alive and kicking. This is a problem because the discourse of the weak discipline creates conceptual confusion in relation to interdisciplinarity. Considering the growth of the iSchools and what is assumed to be a major institutional redrawing of boundaries, there is a pressing need to conceptualize interdisciplinary practices and boundary work. 
This paper explores the "weak" discipline through a discourse analytical lens and identifies a myth. Perceiving the discipline as weak is part of a myth, fueled by the ideal of a unitary discipline; the ideal discipline has strong boundaries, and as long as the discourse continues to focus on a need for boundaries, the only available discourse is one that articulates the discipline as weak. Thus, the myth is a vicious circle that can be broken if weakness is no longer ascribed to the discipline by tradition. The paper offers an explanation of the workings of the myth so that its particular way of interpreting the world does not mislead us when theorizing interdisciplinarity. This is a conceptual paper, and the examples serve as an empirical backdrop to the conceptual argument. This article explores the cultural characteristics of three open access (OA)-friendly disciplines (physics, economics, and clinical medicine) and the ways in which those characteristics influence perceptions, motivations, and behaviors toward green OA. The empirical data are taken from two online surveys of European authors. Taking a domain analytic approach, the analysis draws on Becher and Trowler's (2001) and Whitley's (2000) theories to gain a deeper understanding of why OA repositories (OAR) play a particularly important role in the chosen disciplines. The surveys provided a unique opportunity to compare perceptions, motivations, and behaviors of researchers at the discipline level with the parent metadiscipline. It should be noted that participants were not drawn from a stratified sample of all the different subdisciplines that constitute each discipline, and therefore the generalizability of the findings to the discipline may be limited. The differential role of informal and formal communication in each of the three disciplines has shaped green OA practices. 
For physicists and economists, preprints are an essential feature of their respective OAR landscapes, whereas for clinical medicine researchers, final published articles have a central role. In comparing the disciplines with their parent metadisciplines there were some notable similarities and differences, which have methodological implications for studying research cultures. With the rapid development of social media, spontaneously user-generated content such as tweets and forum posts have become important materials for tracking people's opinions and sentiments online. A major hurdle for current state-of-the-art automatic methods for sentiment analysis is the fact that human communication often involves the use of sarcasm or irony, where the author means the opposite of what they say. Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. The lack of naturally occurring utterances labeled for sarcasm is one of the key problems for the development of machine-learning methods for sarcasm detection. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine-learning effectiveness for identifying sarcastic utterances and we compare the performance of machine-learning techniques and human judges on this task. This paper examines the central place of the list and the associated concept of an identifier within the scaffolding of contemporary institutional order. These terms are deliberately chosen to make strange and help unpack the constitutive capacity of information systems and information technology within and between contemporary organizations. 
We draw upon the substantial body of work by John Searle to help understand the place of lists and identifiers in the constitution of institutional order. To enable us to ground our discussion of the potentiality and problematic associated with lists, we describe a number of significant instances of list-making, situated particularly around the use of identifiers to refer to people, places, and products. The theorization developed allows us to better explain not only the significance imbued within lists and identifiers but also the key part they play in forming the institutional order. We also hint at the role such symbolic artifacts play within breakdowns in institutional order. In the humanities and social sciences, bibliometric methods for the assessment of research performance are (so far) less common. This study uses a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from Google Scholar (GS). In order to use GS for a bibliometric study, we developed procedures for the normalization of citation impact, building on the procedures of classical bibliometrics. In order to test the convergent validity of the normalized citation impact scores, we calculated normalized scores for a subset of the publications based on data from the Web of Science (WoS) and Scopus. Even if scores calculated with the help of GS and the WoS/Scopus are not identical for the different publication types (considered here), they are so similar that they result in the same assessment of the institute investigated in this study: For example, the institute's papers whose journals are covered in the WoS are cited at about an average rate (compared with the other papers in the journals). We investigate the citation distributions of the 500 universities in the 2013 edition of the Leiden Ranking produced by The Centre for Science and Technological Studies. 
We use a Web of Science data set consisting of 3.6 million articles published in 2003 to 2008 and classified into 5,119 clusters. The main findings are the following. First, the universality claim, according to which all university citation distributions, appropriately normalized, follow a single functional form, is not supported by the data. Second, the 500 university citation distributions are all highly skewed and very similar. Broadly speaking, university citation distributions appear to behave as if they differ by a relatively constant scale factor over a large, intermediate part of their support. Third, citation-impact differences between universities account for 3.85% of overall citation inequality. This percentage is greatly reduced when university citation distributions are normalized using their mean normalized citation scores (MNCSs) as normalization factors. Finally, regarding practical consequences, we only need a single explanatory model for the type of high skewness characterizing all university citation distributions, and the similarity of university citation distributions goes a long way in explaining the similarity of the university rankings obtained with the MNCS and the Top 10% indicator.

We prove that the Ochiai similarity of a co-occurrence matrix is equal to the cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co-occurrence matrices, because the similarity is then normalized twice and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co-occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and illustrated using multidimensional scaling and cluster dendrograms.
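The Ochiai/cosine equality stated above can be verified numerically. A minimal sketch with a hypothetical binary occurrence matrix (5 cases, 4 variables, echoing the didactic size mentioned above; the data themselves are invented):

```python
from math import sqrt

# Hypothetical binary occurrence matrix: rows = cases, columns = variables
X = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 1, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

n_vars = len(X[0])
# Co-occurrence matrix C = X^T X
C = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(n_vars)]
     for i in range(n_vars)]

def ochiai(C, i, j):
    # Ochiai coefficient computed from the co-occurrence matrix alone
    return C[i][j] / sqrt(C[i][i] * C[j][j])

def cosine(X, i, j):
    # Cosine similarity between columns i and j of the occurrence matrix
    col_i = [row[i] for row in X]
    col_j = [row[j] for row in X]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    return dot / sqrt(sum(a * a for a in col_i) * sum(b * b for b in col_j))

for i in range(n_vars):
    for j in range(n_vars):
        assert abs(ochiai(C, i, j) - cosine(X, i, j)) < 1e-12
print("Ochiai on co-occurrences equals cosine on occurrences")
```

The equality holds because the diagonal of C stores exactly the squared column norms of X, so applying the cosine to C again would divide by those norms twice, which is the double normalization warned about above.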
If the occurrence matrix is not available (such as in internet research or author cocitation analysis), using Ochiai for the normalization is preferable to using the cosine.

MELIBEA is a directory of institutional open-access policies for research output that uses a composite formula with eight weighted conditions to estimate the strength of open access (OA) mandates (registered in ROARMAP). We analyzed total Web of Science (WoS)-indexed publication output in the years 2011-2013 for 67 institutions in which OA was mandated to estimate the mandates' effectiveness: how well did the MELIBEA score and its individual conditions predict what percentage of the WoS-indexed articles is actually deposited in each institution's OA repository, and when? We found a small but significant positive correlation (0.18) between the MELIBEA strength score and deposit percentage. For three of the eight MELIBEA conditions (deposit timing, internal use, and opt-outs), one value of each was strongly associated with deposit percentage or latency ([a] immediate deposit required; [b] deposit required for performance evaluation; [c] unconditional opt-out allowed for the OA requirement but no opt-out for the deposit requirement). When we updated the initial values and weights of the MELIBEA formula to reflect the empirical association we had found, the score's predictive power for mandate effectiveness doubled (0.36). There are not yet enough OA mandates to test further mandate conditions that might contribute to mandate effectiveness, but the present findings already suggest that it would be productive for existing and future mandates to adopt the three identified conditions so as to maximize their effectiveness, and thereby the growth of OA.

The body of knowledge related to modeling and simulation (M&S) comes from a variety of constituents: (1) practitioners and users, (2) tool developers, and (3) theorists and methodologists.
Previous work has shown that categorizing M&S as a concentration in an existing, broader discipline is inadequate because it does not provide a uniform basis for research and education across all institutions. This article presents an approach for the classification of M&S as a scientific discipline and a framework for the ensuing analysis. The novelty of the approach lies in its application of machine-learning classification to documents containing unstructured text (e.g., publications, funding solicitations) from a variety of established and emerging disciplines related to modeling and simulation. We demonstrate that machine-learning classification models can be trained to accurately separate M&S from related disciplines using the abstracts of well-indexed research publication repositories. We evaluate the accuracy of our trained classifiers using cross-validation. Then, we demonstrate that our trained classifiers can effectively identify a set of previously unseen M&S funding solicitations and grant proposals. Finally, we use our approach to uncover new funding trends in M&S and support a uniform basis for education and research.

Newcomer nations, promoted by developmental states, have poured resources into nanotechnology development and have dramatically increased their nanoscience research influence, as measured by research citation. Some achieved these gains by producing significantly higher-impact papers rather than by simply producing more papers. Those nations gaining the most in relative strength did not build specializations in particular subfields, but instead diversified their nanotechnology research portfolios and emulated the global research mix. We show this using a panel dataset covering the nanotechnology research output of 63 countries over 12 years.
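Machine-learning classification of discipline abstracts, as described above, can be sketched with a toy text classifier. The snippet below uses multinomial Naive Bayes with add-one smoothing, which is only one of many possible model choices (the abstract does not name the classifiers used); the labels and toy "abstracts" are invented for illustration:

```python
from collections import Counter
from math import log

def train_nb(docs):
    """docs: list of (label, text). Returns per-class log priors and
    add-one-smoothed word log-probabilities (multinomial Naive Bayes)."""
    class_docs = Counter(lbl for lbl, _ in docs)
    word_counts = {lbl: Counter() for lbl in class_docs}
    for lbl, text in docs:
        word_counts[lbl].update(text.lower().split())
    vocab = set(w for c in word_counts.values() for w in c)
    model = {}
    for lbl, counts in word_counts.items():
        total = sum(counts.values())
        model[lbl] = (log(class_docs[lbl] / len(docs)),
                      {w: log((counts[w] + 1) / (total + len(vocab)))
                       for w in vocab},
                      log(1 / (total + len(vocab))))  # unseen-word fallback
    return model

def classify(model, text):
    def score(lbl):
        prior, word_lp, unk = model[lbl]
        return prior + sum(word_lp.get(w, unk) for w in text.lower().split())
    return max(model, key=score)

# Hypothetical training "abstracts" for M&S vs. another discipline
train = [("MS", "simulation model validation discrete event"),
         ("MS", "agent based simulation modeling framework"),
         ("other", "citation impact journal ranking"),
         ("other", "open access journal publishing")]
model = train_nb(train)
print(classify(model, "discrete event simulation model"))  # MS
```

In practice one would evaluate such a classifier with k-fold cross-validation, as the article describes, rather than on its training data.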
The inverse relationship between research specialization and impact is robust to several ways of measuring both variables, the introduction of controls for country identity, the volume of nanoscience research output (a proxy for a country's scientific capability) and home-country bias in citation, and various attempts to reweight and split the samples of countries and journals involved. The results are consistent with scientific advancement by newcomer nations being better accomplished through diversification than specialization.

The purpose of this paper is to analyze the effects of networks on research output and impact. The analysis was done using a database of 2150 Mexican engineers who have been members of the National System of Researchers. Results show that although there are several methods to measure centrality and structure in social network analysis theory, not all variables show the same impact on performance. Our results suggest that the centrality measures that show a positive effect on publications and citations are degree and closeness; betweenness is only significant for publications, and eigenvector has a negative effect on publications. Regarding the measures of network structure, our analysis suggests that both structural holes and density have a positive effect on research output and impact, confirming that there are networks in which closure and brokerage can benefit the performance of the members of the network.

In September 2015 Thomson Reuters published its Ranking of Innovative Universities (RIU). Covering 100 large research-intensive universities worldwide, Stanford University came in first, MIT second, and Harvard third. But how meaningful is this outcome? In this paper we take a critical view from a methodological perspective.
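The degree and closeness centrality measures found above to have positive effects can be computed from a co-authorship graph. A minimal sketch on a hypothetical adjacency list, assuming a connected, unweighted graph (the star-shaped network and function names are illustrative):

```python
from collections import deque

def degree_centrality(adj):
    """Normalized degree: neighbors divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """Closeness = (n - 1) / sum of shortest-path distances,
    with distances found by breadth-first search."""
    result = {}
    n = len(adj)
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        result[src] = (n - 1) / sum(d for v, d in dist.items() if v != src)
    return result

# Hypothetical co-authorship star: A collaborates with B, C and D
adj = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
print(degree_centrality(adj)["A"])     # 1.0
print(closeness_centrality(adj)["B"])  # 3 / (1 + 2 + 2) = 0.6
```

Betweenness and eigenvector centrality, also tested in the study, require more machinery (path counting and power iteration respectively) but follow the same graph representation.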
We focus our attention on the various types of metrics available, whether or not data redundancies are addressed, and whether metrics should be assembled into a single composite overall score. We address these issues in some detail by emphasizing one metric in particular: university-industry co-authored publications (UICs). We compare the RIU with three variants of our own University-Industry R&D Linkage Index, which we derived from the bibliometric analysis of 750 research universities worldwide. Our findings highlight conceptual and methodological problems with UIC-based data, as well as computational weaknesses of such university ranking systems. Avoiding choices between size-dependent and size-independent metrics, and between single-metric and multi-metric systems, we recommend an alternative 'scoreboard' approach in which: (1) metrics are not weighted into composite scores; (2) computational procedures and information sources are made more transparent; (3) size-dependent metrics are kept separate from size-independent metrics; and (4) UIC metrics are selected according to the type of proximity relationship between universities and industry.

We investigated whether applicants for or recipients of research productivity fellowships of the main research financing agency in Brazil (CNPq) would consider the most 'important products and indicators' of scientific/academic activity to be those also considered the least susceptible. We hypothesized that the perception of susceptibility and importance of productivity indicators would vary according to the fellowship level of the grantees. Seven hundred and two scientists participated in the study: 79 non-grantees and 623 recipients of research productivity fellowships in the area of biosciences.
The scientists were requested to score the importance of a series of indicators (i.e., total number of published articles, number of articles as first author, number of articles as last/corresponding author, H-index, books and others, totaling 39 variables) using a Likert scale. After completing the evaluation of the symbolic importance of all indicators, the scientists scored the 'susceptibility' of the same indicators. The most important products and indicators of productivity were also those considered the least susceptible. Local, national and international prizes, publications or grants were increasingly perceived as more important and less susceptible. Moreover, the symbolic magnitude of susceptibility and importance of the elements (indicators) of the curriculum varied according to the productivity fellowship level and gender of the grantee. Despite the observed differences, a consensus on the most important and least susceptible products and indicators could be established. Ultimate individual responsibility and international projection are common characteristics of the most important and least susceptible indicators of scientific productivity.

In this study we examined the institutions (and countries) the Nobel laureates of the three disciplines chemistry, physics and physiology/medicine were affiliated with (from 1994 to 2014) when they did the decisive research work. To be able to frame the results at that time point, we also looked at when the Nobel laureates obtained their Ph.D./M.D. and when they were awarded the Nobel Prize. We examined all 155 Nobel laureates of the last 21 years in physics, chemistry, and physiology/medicine. Results showed that the USA dominated as a country. Statistical analysis also revealed that only three institutions can boast a larger number of Nobelists at all three time points examined: UC Berkeley, Columbia University and the Massachusetts Institute of Technology (MIT).
Researcher mobility analysis made clear that most of the Nobel laureates were mobile, either after having obtained their Ph.D./M.D. or after writing the significant papers that were decisive for the Nobel Prize. Therefore, we distinguished different modes of mobility between countries and between institutions. In most cases, the researchers changed institutes/universities within one and the same country (the USA in first position, followed at a distance by the United Kingdom, Japan and Germany).

This article is focused on the government program for increasing the competitiveness of Russian universities, '5-top 100', started in 2013. The main aim of the program is to create the necessary conditions for Russian universities to be ranked in the top-quality world rankings. Fifteen universities were carefully selected among many other institutions. The university funding amount for the next year directly depends on the current year's results. We analyzed the results of universities participating and not participating in the program '5-top 100' for the years 2008-2014 within the university groups and with a reference group, as well as against world and country average values. Based on the analysis we can state that the program '5-top 100' enables participating universities not only to improve their current results. The program also helps universities in prioritizing their aims and enhancing their competitiveness at the world level, which is supported by their positions in world university rankings.

In the sociology of science much attention is dedicated to the study of scientific networks, especially to co-authorship and citations in publications. Other lines of research have investigated the advantages, limits, performance and difficulties of interdisciplinary research, which is increasingly advocated by the main lines of public research funding. This paper explores the dynamics of interdisciplinary research in Italy over 10 years of scientific collaboration on research projects.
Instead of looking at the output of research, i.e. publications, we analyse the original research proposals that have been funded by the Ministry of University and Research for a specific line of funding, the Research Projects of National Interest. In particular, we want to see how much interdisciplinary research has been conducted during the period under analysis and how changes in the overall amount of public funding might have affected disciplinary and interdisciplinary collaboration. We also want to cluster the similarities and differences in the amount of disciplinary and interdisciplinary collaboration across scientific disciplines, and see if they change over time. Finally, we want to see if interdisciplinary projects receive an increasing share of funding compared to their disciplinarily bounded counterparts. Our results indicate that while interdisciplinary research diminishes over the years, potentially responding to the contraction of public funding, research that cuts across disciplinary boundaries overall receives more funding than research confined within disciplinary boundaries. Furthermore, the clustering procedure does not indicate a clear and stable distinction between disciplines, but similar patterns of disciplinary and interdisciplinary collaboration are shown by disciplines with common epistemological frameworks, which share compatible epistemologies of scientific investigation. We conclude by reflecting upon the implications of our findings for research policies and practices and by discussing future research in this area.

We analyze the effect of High Energy Physics Large Collaboration articles, an important example of Big Science and well traceable in Web of Science, on the output and citation records at the country and institutional levels. Furthermore, the effect of these specific bibliometric data on two different university rankings, the SCIMAGO and the THE, is addressed.
The results suggest that these rankings may be significantly affected by this class of output, suggesting the necessity of a discussion about methodologies differentiating it from other outputs, as well as about the time range considered by the rankings.

Scientific impact evaluation is a long-standing problem in scientometrics. Graph-ranking methods are often employed to account for the collective diffusion process of scientific credit among researchers or their publications. One key issue, however, is still up in the air: what is the appropriate level for scientific credit diffusion, researcher level or paper level? In this paper, we tackle this problem via an anatomy of the credit diffusion mechanism underlying both researcher-level and paper-level graph-ranking methods. We find that researcher-level and paper-level credit diffusions are actually two aggregations of a fine-grained authorship-level credit diffusion. We further find that researcher-level graph-ranking methods may cause misallocation of scientific credit, but paper-level graph-ranking methods do not. Consequently, researcher-level methods often fail to identify researchers with high quality but low productivity. This finding indicates that scientific credit is fundamentally derived from "paper citing paper" rather than "researcher citing researcher". We empirically verify our findings using the American Physical Review publication dataset spanning over one century.

Knowledge communication plays a fundamental role in studies of the science of science. This paper aims to examine inter-specialty communication patterns within a discipline using author citation networks. Two metrics are designed: average knowledge flow and average shortest distance. They are used to identify the impact and diffusion characteristics of inter-specialty knowledge communication. We apply these metrics to an empirical data set of Chinese library and information science (CLIS) publications.
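Paper-level graph-ranking over a "paper citing paper" network is commonly instantiated as PageRank-style credit diffusion. A minimal power-iteration sketch on a hypothetical three-paper citation graph (the damping factor, iteration count and graph are illustrative assumptions, not the paper's exact method):

```python
def pagerank(links, d=0.85, iters=100):
    """links[p] = list of papers that p cites. Plain power iteration;
    dangling papers (no references) spread their mass uniformly."""
    nodes = list(links)
    n = len(nodes)
    rank = {p: 1 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in nodes}
        for p, cited in links.items():
            if cited:
                share = d * rank[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                for q in nodes:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Hypothetical citation graph: paper C is cited by both A and B
links = {"A": ["C"], "B": ["C"], "C": []}
r = pagerank(links)
assert r["C"] > r["A"] == r["B"]  # the cited paper accumulates the credit
```

Running the same diffusion on a researcher-level graph would aggregate these paper scores by author, which is exactly where the misallocation discussed above can arise.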
We find that the two metrics portray different aspects of knowledge communication in CLIS and conclude that indirect paths, the size of specialties, and the communication structure among specialties may lead to the differences.

This paper analyses the academic production of researchers awarded so-called 'productivity grants' by CNPq, the Brazilian research funding agency, in the field of production engineering in the period 2007-2009. Data were extracted from the résumés of 101 Brazilian researchers on the Lattes Platform. In relation to scientific production, productivity grant researchers presented superior performance compared with the other professors working in graduate programs in the production engineering area. There is a tendency towards an increase in the mean number of supervisions of master's students going from the highest aggregate level of productivity grants down to the beginner levels. In comparison with the other permanent professors in graduate programs, who do not hold productivity grants, the mean number of supervisions for both master's and doctoral students of the grant researchers is exactly the same as that of the 2PQ group. PQ researchers usually present high scientific production and low technical production, while DT researchers present low scientific production and high technical production. There seems to be logical coherence in the distribution of grants, at least with respect to the easily measurable progression criteria. However, there is some evidence that for criteria that are harder to assess, there may be some discrepancies.

In 2014 over 52,000 academics submitted more than 155,500 journal articles in 36 different disciplines for assessment in the UK's four-year Research Excellence Framework (the REF). In this paper the characteristics of the titles of these papers are assessed.
Although these varied considerably between the disciplines, the main findings were that: (i) the lengths of the titles increased with the number of authors in almost all disciplines; (ii) the use of colons and question marks tended to decline with increasing author numbers, although there were a few disciplines, such as economics, where the reverse was evident; (iii) papers published later in the four-year period tended to have more authors than those published earlier; and (iv) in some disciplines, the numbers of subsequent citations to papers were higher when the titles were shorter and when they employed colons, but lower when they used question marks.

Research productivity distributions exhibit heavy tails because it is common for a few researchers to accumulate the majority of the top publications and their corresponding citations. Measurements of this productivity are very sensitive to the field being analyzed and the distribution used. In particular, distributions such as the lognormal distribution seem to systematically underestimate the productivity of the top researchers. In this article, we propose the use of a (log)semi-nonparametric distribution (log-SNP) that nests the lognormal and captures the heavy tail of the productivity distribution through the introduction of new parameters linked to high-order moments. The application uses scientific production data on 140,971 researchers who have produced 253,634 publications in 18 fields of knowledge (O'Boyle and Aguinis in Pers Psychol 65(1):79-119, 2012) and publications in the field of finance of 330 academic institutions (Borokhovich et al. in J Finance 50(5):1691-1717, 1995), and shows that the log-SNP distribution outperforms the lognormal and provides more accurate measures for the high quantiles of the productivity distribution.

Usage data of scholarly articles provide a direct way to explore the usage preferences of users.
Using the "Usage Count" provided by the Web of Science platform, we collect and analyze the usage data of five journals in the field of Information Science and Library Science to investigate the usage patterns of scholarly articles on Web of Science. Our analysis finds that the distribution of usage follows a power law and that, according to the time distribution of usage, researchers prefer to use more recent papers. For older papers, citations play an important role in determining the usage count: highly cited old papers are more likely to be used even a long time after publication.

During the past 30 years there has been growing interest in global value chains (GVC) across various disciplines including economics, business & management, economic geography, operational research, computer science, engineering, and so forth. In order to further explore GVC research, this paper employs bibliometric analysis based on co-occurrence networks; specifically, it investigates the temporal evolution of discipline and keyword co-occurrences, as well as reference co-citation analysis, between 1995 and 2014, in order to uncover the evolution of disciplines and research fronts and identify the intellectual base of global value chains research.

Fields of science (FOS) can be used for the assessment of publishing patterns and scientific output. To this end, WOS JCR (Web of Science/Journal Citation Reports) subject categories are often mapped to Frascati-related OECD FOS (Organisation for Economic Co-operation and Development). Although WOS categories are widely employed, they reflect agriculture (one of six major FOS) less comprehensively. Other fields may benefit from agricultural WOS mapping.
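A power-law fit to usage counts, as reported above, can be sketched with the continuous maximum-likelihood (Hill) estimator, alpha = 1 + n / sum(ln(x_i / x_min)). The usage counts and the choice of x_min below are invented for illustration; a careful analysis would also estimate x_min and test goodness of fit:

```python
from math import log

def powerlaw_alpha(values, xmin):
    """Continuous MLE (Hill estimator) for the power-law exponent,
    using only the tail of values >= xmin."""
    tail = [x for x in values if x >= xmin]
    return 1 + len(tail) / sum(log(x / xmin) for x in tail)

# Hypothetical per-article usage counts on a platform
usage = [1, 1, 2, 2, 3, 5, 8, 13, 40, 120]
print(round(powerlaw_alpha(usage, xmin=2), 2))
```

The heavier the tail (i.e., the more a few articles dominate usage), the smaller the fitted exponent.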
The aim was to map all articles produced nationally (in Slovenia) by agricultural research groups over two decades to their corresponding journals and categories, in order to visualize the strength of links between the categories and the scatter of articles, based on WOS-linked raw data in the COBISS/SciMet portal (Co-operative Online Bibliographic System and Services/Science Metrics) and the national CRIS (Slovenian Current Research Information System). Agricultural groups are mapped into four subfields: Forestry and Wood Science, Plant Production, Animal Production, and Veterinary Science. Food science is included as either plant- or animal-product-related. On average, 50% of relevant articles are published outside the scope of journals mapped to WOS agricultural categories. The other half are mapped mostly to OECD Natural, Medical and Health Sciences, and Engineering and Technology. A few selected journals and principal categories account for an important part of all relevant documents (the core). Even many core journals/categories, as ascertained with power laws (Bradford's law), are not mapped to agriculture. Research evaluation based on these classifications may underestimate the multidisciplinary dimensions of agriculture, affecting its position among scientific fields and also subsequent funding if established on such rankings.

Aging is considered to be an important factor in a scholar's propensity to innovate, produce, and collaborate on high-quality work. Yet empirical studies in the area are rare and plagued with several limitations. As a result, we lack clear evidence on the relationship between aging and scholarly communication activities and impact. To this end, we study the complete publication profiles of more than 1000 authors across three fields (sociology, economics, and political science) to understand the relationship between aging, productivity, collaboration, and impact.
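The Bradford-style core/scatter analysis mentioned above can be sketched by ranking journals by article output and splitting them into zones that each carry roughly an equal share of the articles; the small core zone then contains the most productive journals. The journal counts and function below are hypothetical:

```python
def bradford_zones(journal_counts, n_zones=3):
    """journal_counts: {journal: n_articles}. Returns a list of zones
    (lists of journals), each holding roughly an equal share of articles,
    with journals ranked by productivity."""
    ranked = sorted(journal_counts, key=journal_counts.get, reverse=True)
    total = sum(journal_counts.values())
    target = total / n_zones
    zones, current, acc = [], [], 0
    for j in ranked:
        current.append(j)
        acc += journal_counts[j]
        # close a zone once its cumulative share is reached
        if acc >= target * (len(zones) + 1) and len(zones) < n_zones - 1:
            zones.append(current)
            current = []
    zones.append(current)
    return zones

# Hypothetical article counts per journal
counts = {"J1": 90, "J2": 40, "J3": 20, "J4": 15, "J5": 10,
          "J6": 8, "J7": 7, "J8": 5, "J9": 3, "J10": 2}
zones = bradford_zones(counts)
print([len(z) for z in zones])  # the core zone holds the fewest journals
```

A Bradford-like scatter shows up as zone sizes growing roughly geometrically from core to periphery.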
Furthermore, we analyze multiple operationalizations of aging to determine which is more closely related to observable changes in scholarly communication behavior. The study demonstrates that scholars remain highly productive across the life-span of the career (i.e., 40 years) and that productivity increases steeply until promotion to associate professor and then remains stable. Collaboration increases with age and has increased over time. A scholar's work obtains its highest impact directly around promotion and then decreases over time. Finally, our results suggest a statistically significant relationship between the rank of the scholar and productivity, collaboration, and impact. These results inform our understanding of the scientific workforce and the production of science.

Using co-authored publications between China and Korea in the Web of Science (WoS) during the one-year period of 2014, we evaluate the government stimulation program for collaboration between China and Korea. In particular, we apply dual approaches, full integer versus fractional counting, to collaborative publications in order to better examine both the patterns and contents of Sino-Korean collaboration networks in terms of individual countries and institutions. We first conduct a semi-automatic network analysis of Sino-Korean publications based on the full-integer counting method, and then compare our categorization with contextual rankings using the fractional technique; routines for fractional counting of WoS data are made available at http://www.leydesdorff.net/software/fraction. Increasing international collaboration leads paradoxically to lower numbers of publications and citations when fractional counting is used for performance measurement. However, integer counting is not an appropriate measure for evaluating the stimulation of collaborations. Both integer and fractional analytics can be used to identify important countries and institutions, but for different research questions.
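The integer-versus-fractional counting contrast described above can be sketched directly: integer (full) counting credits each participating country once per paper, while fractional counting divides each paper among its authors (author-level fractionation is one of several common choices; the papers below are hypothetical):

```python
from collections import Counter

def integer_counts(papers):
    """Each paper: list of author countries. Integer (full) counting:
    every participating country gets credit 1 per paper."""
    c = Counter()
    for countries in papers:
        for country in set(countries):
            c[country] += 1
    return c

def fractional_counts(papers):
    """Fractional counting: each author carries 1/n_authors of the paper."""
    c = Counter()
    for countries in papers:
        for country in countries:
            c[country] += 1 / len(countries)
    return c

# Hypothetical papers given as author-level country affiliations
papers = [["CN", "KR"], ["CN", "CN", "KR"], ["KR"]]
print(integer_counts(papers))     # CN: 2, KR: 3
print(fractional_counts(papers))  # CN: 1/2 + 2/3, KR: 1/2 + 1/3 + 1
```

This makes the paradox noted above concrete: adding foreign co-authors leaves a country's integer count unchanged but shrinks its fractional count per paper.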
We investigated some factors that can affect citation behavior in young scientific fields by using ethnobiology as a research model. In particular, we sought to assess the degree of insularity in the citations of scientific articles and whether this behavior varies across countries, continents and related areas of knowledge. In addition, we analyzed whether researchers cite more scientific articles or gray literature in their publications and whether there is variation in this behavior among different continents and areas of knowledge. We also assessed citation behavior considering open and closed access journals. Scientific articles from four journals related to ethnobiology were selected; two are open access journals, and two are closed access journals. Overall, we found a general lack of insularity, but the analysis by country revealed the existence of this phenomenon in Brazil, the United States, India, Mexico, Spain and Turkey. Contrary to what the scientometric literature indicates, the scientific articles that were published in closed access journals are cited more often than the scientific articles that were published in open access journals. This citation behavior may relate to the better establishment of this type of journal in the ethnobiology field, whose articles also cited gray literature at a lower rate.

In order to maintain food security and sustainability of production under climate change, interdisciplinary and international collaboration in research is essential. In the EU, knowledge hubs are important funding instruments for the development of an interconnected European Research Area. Here, network analysis was used to assess whether the pilot knowledge hub MACSUR has affected interdisciplinary collaboration, using co-authorship of peer-reviewed articles as a measure of collaboration.
The broad community of all authors identified as active in the field of agriculture and climate change became increasingly well connected over the period studied. Between knowledge hub members, changes in network parameters suggest an increase in collaborative interaction beyond that expected from network growth, and greater than that found in the broader community. Given that interdisciplinary networks often take several years to have an impact on research outputs, these changes within the relatively new MACSUR community provide evidence that the knowledge hub structure has been effective in stimulating collaboration. However, the analysis showed that knowledge hub partners were initially well connected, suggesting that the initiative may have gathered together researchers with particular resources or inclinations towards collaborative working. Long-term, consistent funding and ongoing reflection to improve networking structures may be necessary to sustain the early positive signs from MACSUR, to extend its success to a wider community of researchers, or to repeat it in less connected fields of science. Tackling complex challenges such as climate change will require research structures that can effectively support and utilise the diversity of talents beyond the already well-connected core of scientists at major research institutes. But network research shows that this core, well-connected group are vital brokers in achieving wider integration.

The very nature of scientific activity and information, which are meant to be shared, is the starting point in defining a scientific journal and the criteria according to which its value and role are determined. The authors aim at analysing some criteria that define the quality of scientific journals in terms of their visibility and impact.
The concept of open access for journals is analysed in terms of its advantages and disadvantages, since it differs greatly from subscription-based access, whether we talk about institutional or individual subscriptions. The authors are in favour of the concept of open access, considering that it gives a journal more visibility, on the condition that article processing charges are reduced. The essential condition for a journal to become renowned is to be as visible as possible. The concept of open access is beneficial and supports instruction through and for scientific research, regardless of educational level. The aim of this paper is to examine the modalities, specificities and bibliometric performance (percentage of citable documents, impact factor and immediacy index) of open access versus subscription-based access, as well as to investigate whether the use of the open access model determines an increase in journal quality, a study applied to the analysis of Hindawi Publishing Company journals and Multidisciplinary Digital Publishing Institute (MDPI) journals.

Japan's system of Grants-in-Aid for scientific research aims to promote creative and pioneering research across a wide spectrum of fields, ranging from the humanities and social sciences to the natural sciences. The grants, amounting to 191 billion JPY per year (in FY 2006), are the main source of competitive research funds for applicants belonging to national and public universities. In this paper, we examine this system by performing a statistical analysis of the relation between the awarding of grants and the attributes of applicants, including field, degree, position, and other covariates. We used Poisson and negative binomial distributions, their generalized linear models, and logistic regressions to incorporate covariates into the model. The analysis reveals interesting features of academic fields as well as of the attributes of the awarded applicants.
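The impact factor and immediacy index named above follow standard JCR-style definitions: the two-year impact factor divides the citations a journal receives in a year to its two preceding years' items by the citable items of those years, and the immediacy index divides same-year citations by same-year items. A sketch with hypothetical journal counts (function names and data are illustrative):

```python
def impact_factor(citations_by_year, items_by_year, year):
    """Two-year impact factor for `year`: citations received in `year`
    to items from the two preceding years, divided by the citable
    items published in those two years."""
    cites = citations_by_year.get(year - 1, 0) + citations_by_year.get(year - 2, 0)
    items = items_by_year.get(year - 1, 0) + items_by_year.get(year - 2, 0)
    return cites / items

def immediacy_index(citations_same_year, items_same_year):
    """Citations received in the publication year itself,
    divided by the items published that year."""
    return citations_same_year / items_same_year

# Hypothetical journal: keys are the publication years of the cited items,
# values are the citations received in 2015 to items of that year
cites_2015 = {2013: 120, 2014: 90, 2015: 15}
items = {2013: 60, 2014: 50, 2015: 55}
print(impact_factor(cites_2015, items, 2015))          # (120 + 90) / (60 + 50)
print(immediacy_index(cites_2015[2015], items[2015]))  # 15 / 55
```

Comparing these two indicators across OA and subscription journals is precisely the kind of analysis the study above applies to Hindawi and MDPI titles.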
Altmetrics is an emergent research area in which social media are used as a source of metrics to assess scholarly impact. In the last few years, interest in altmetrics has grown, giving rise to many questions regarding their potential benefits and challenges. This paper aims to address some of these questions. First, we provide an overview of the altmetrics landscape, comparing tool features, social media data sources, and social media events provided by altmetric aggregators. Second, we conduct a systematic review of the altmetrics literature. A total of 172 articles were analysed, revealing a steady rise in altmetrics research since 2011. Third, we analyse the results of over 80 studies from the altmetrics literature on two major research topics: cross-metric validation and coverage of altmetrics. An aggregated percentage coverage across studies of 11 data sources shows that Mendeley has the highest coverage, of about 59% across 15 studies. A meta-analysis across more than 40 cross-metric validation studies shows an overall weak correlation (ranging from 0.08 to 0.5) between altmetrics and citation counts, confirming that altmetrics do indeed measure a different kind of research impact, thus acting as a complement rather than a substitute for traditional metrics. Finally, we highlight open challenges and issues facing altmetrics and discuss future research areas.

Most scholarly journals have explicit copyright restrictions for authors outlining how published articles, or earlier manuscript versions of such articles, may be distributed on the open web. Empirical research on the development of open access (OA) is still scarce and methodologically fragmented, and research on the relationship between journal copyright restrictions and actual free online availability is non-existent.
In this study, the free availability of articles published in eight top journals within the field of Information Systems (IS) is analyzed by observing the availability of all articles published in the journals during 2010-2014 (1515 articles in total) through the use of Google and Google Scholar. The web locations and document versions of retrieved articles, for up to three OA copies per published article, were categorized manually. The web findings were contrasted with journal copyright information and augmented with citation data for each article. Around 60 % of all published articles were found to have an OA copy available. The findings suggest that copyright restrictions only weakly regulate actual author-side dissemination practice. The use of academic social networks (ASNs) for enabling online availability of research publications has grown increasingly popular, an avenue of research dissemination that most of the studied journal copyright agreements failed to explicitly accommodate. The manifold activities of Southeast Asian countries in establishing world-class universities have been observed for several years. In contrast, the substantial efforts of Arabian countries are barely noticed. As an illustrative example, the King Abdulaziz University has enormously increased the quantity and quality of its research, reflected in a growing number of articles and rising citation scores. This development implies a steadily improving position in rankings such as the Shanghai Ranking. The investment is not focused solely on research, however; education has profited as well. The improvement in science rests substantially on new academic staff from foreign countries who are experienced in high-level research, but the number of nationals, male and female, has risen significantly as well. The investment in research and education should be considered the starting point of a long-term strategy of economic development for coping with the foreseeable end of the oil boom. 
The effect of collaborators on institutions' scientific impact was examined for 81 institutions with different degrees of impact and collaboration. Collaborators, including both core and peripheral collaborators, not only cite each other more than non-collaborators do, but also cite each other faster, even when self-citations are ignored. Although high-impact institutions and more collaborative institutions receive more citations from their collaborators, the number of these citations seems to increase only up to a certain point. In this regard, for example, there is only a slight difference between the top and middle collaborative institutions; however, only a small fraction of collaborators do not cite back the papers of these two groups of institutions. The benefit of collaboration varies with the type of collaborators, institutions, papers, citers and the publication year of the cited documents. For example, the effect of collaboration decreases as an institution's level of impact increases. Hence, collaborating more does not directly imply obtaining higher impact. The discrepancies among various global university rankings drive us to compare and correlate their results. Thus, the 2015 results of six major global rankings were collected, compared and analyzed qualitatively and quantitatively using both the ranking orders and the scores of the top 100 universities. The selected six global rankings are: Academic Ranking of World Universities (ARWU), Quacquarelli Symonds World University Ranking (QS), Times Higher Education World University Ranking (THE), US News & World Report Best Global University Rankings (USNWR), National Taiwan University Ranking (NTU), and University Ranking by Academic Performance (URAP). Two indexes are used for comparison, namely the number of overlapping universities and Pearson's/Spearman's correlation coefficients between each pair of the studied six global rankings. 
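The two comparison indexes just described, overlap between top-N lists and rank correlation, can be sketched as follows. The "rankings" below are toy data invented for illustration, not the actual 2015 results, and the Spearman formula used assumes no tied ranks.

```python
# Index 1: number of universities appearing in both top-N lists.
def overlap(list_a, list_b):
    return len(set(list_a) & set(list_b))

# Index 2: Spearman's rank correlation via the classic d-squared formula (no ties).
def spearman(rank_a, rank_b):
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy top-4 lists from two hypothetical rankings.
ranking1 = ["MIT", "Stanford", "Harvard", "Oxford"]
ranking2 = ["Harvard", "MIT", "Stanford", "Cambridge"]
shared = overlap(ranking1, ranking2)
common = sorted(set(ranking1) & set(ranking2))
rho = spearman([ranking1.index(u) + 1 for u in common],
               [ranking2.index(u) + 1 for u in common])
```

Note the two indexes answer different questions: overlap measures whether the same institutions appear at all, while the correlation measures whether shared institutions are ordered similarly.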
The study is extended to investigate the intra-correlation of the ARWU results of the top 100 universities over a 5-year period (2011-2015), as well as the correlation of the ARWU overall score with its single indicators. The ranking results of the 49 universities that appeared in the top 100 in all six rankings are compared and discussed. With a careful analysis of the key performance indicators of these 49 universities, one can readily define the common features of a world-class university. The findings indicate that, although each ranking system applies a different methodology, there are moderate to high correlations among the studied six rankings. To see how the correlation behaves at different levels, the correlations are also conducted for the top 50 and the top 200 universities. The comparison indicates that the degree of correlation and the number of overlapping universities increase with an increase in list length. The results of URAP and NTU show the strongest correlation among the studied rankings. In short, a careful understanding of the various ranking methodologies is of utmost importance before any analysis, interpretation and usage of ranking results. The findings of the present study could inform policy makers at various levels in developing policies aimed at improving performance and thereby enhancing ranking position. The analysis of productivity in Higher Education Institutions (HEIs) at a European level reveals enormous differences in output per researcher across countries. This study develops a 5-step methodology that explicitly considers the quality of scientific output in EU universities and its specialisations to explain and decompose the differences in output per university researcher in terms of (a) differences in efficiency within each field of science (FOS), (b) differences in FOS specialisation of HEIs in each country, (c) differences in quality, and (d) differences in allocation of resources per researcher. 
The inefficiency levels estimated show that across the EU as a whole there is a substantial margin for increasing research output without having to spend more resources. There are also major differences between countries in terms of inefficiency. The main sources of heterogeneity in scientific output in the HEIs of the EU are the differences in resources allocated per researcher and, to a lesser extent, the differences in efficiency within each knowledge field. The differences in quality and in specialisation also play a smaller role in determining differences in output. The present study analyzes scientific publications on mass gatherings, characterizing the development of this emerging research field. We identified publications on mass gatherings, analyzing the scientific production and carrying out a co-citation analysis. We identified the works of reference that have laid the intellectual foundation for the field, as well as the main scientific disciplines and journals that have contributed to its development. We identified 278 documents that cited 7149 bibliographic references. The 2006-2010 period saw a dramatic increase in the number of works published. Papers on mass gatherings also appeared frequently in multidisciplinary journals of high visibility and impact. The co-citation analysis revealed the existence of five clusters or thematic nuclei in this area of research. One large cluster brings together different studies on the prevalence of infectious diseases associated with pilgrimages to Mecca, and another cluster focuses on planning and response for health services in the context of mass gatherings associated with sporting events. Different indicators help characterize the nature of this emerging field, in which we observe the absence of a stable research community, the recentness of the bibliographic citations, and a high concentration of publications on the topic, with no peripheral areas of investigation. 
The study of mass gatherings is an emerging area of research with a notably multidisciplinary nature. Given the relevance and incidence of mass gatherings in relation to population health, it is necessary to foster the conditions that favor the consolidation of the field as a topic of research. The new Norwegian system for calculating publication credits is examined. The new system was launched in response to criticism that the previous one penalized collaborative research. It turns out that adverse incentive problems emerge as a result of this system change. We show, with a simple case, that institutions will benefit (credit-wise) by adding more authors to a scientific publication. Even worse, the beneficial effect increases the more authors the paper has initially. Alternative cases indicate even stronger incentives for co-author maximization. There is great variation in research output across countries in terms of differences in the amount of published peer-reviewed literature. Besides determining the causal determinants of these differences, an important task of scientometric research is to make accurate predictions of countries' future research output. Building on previous research on the key drivers of differences in countries' research outputs, this study develops a model which includes sixteen macro-level predictors representing aspects of the research and economic system, of the political conditions, and of structural and cultural attributes of countries. In applying a machine learning procedure called boosted regression trees, the study demonstrates that these predictors are sufficient for making highly accurate forecasts of countries' research output across scientific disciplines. The study also shows that using a functionally flexible procedure like boosted regression trees can substantially increase the predictive power of the model when compared to traditional regression. 
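The core idea behind boosted regression trees can be shown in a toy sketch: fit a small tree to the current residuals, add a shrunken copy of its predictions, repeat. The version below uses depth-1 trees (stumps) under squared error on invented data; the actual study would use a full library implementation with sixteen predictors, not this miniature.

```python
import numpy as np

# Find the single split of x that best fits the residuals with two constants.
def best_stump(x, residual):
    best = None
    for split in np.unique(x)[:-1]:
        left, right = residual[x <= split], residual[x > split]
        pred = np.where(x <= split, left.mean(), right.mean())
        sse = float(((residual - pred) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, split, left.mean(), right.mean())
    return best[1:]                               # (split, left value, right value)

# Gradient boosting on squared error: each round fits a stump to the residuals.
def boost(x, y, rounds=80, lr=0.1):
    pred = np.full(len(y), y.mean())
    for _ in range(rounds):
        split, lv, rv = best_stump(x, y - pred)
        pred = pred + lr * np.where(x <= split, lv, rv)   # shrunken update
    return pred

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])      # a step function, awkward for linear regression
fitted = boost(x, y)
```

The step function here is exactly the kind of non-linear relation a traditional linear regression cannot capture but an ensemble of trees fits easily, which is the flexibility advantage the abstract refers to.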
Finally, the results obtained allow a different perspective on the functional forms of the relations between the predictors and the response variable. The "citation score" remains the most commonly used measure of academic impact, but it is also viewed as practically and conceptually limited. The aim of this case study was to test the feasibility of creating a "citation profile" for a single, frequently cited methods paper, the author's own publication on the conceptual framework for implementation fidelity. This was a proof-of-concept study that involved an analysis of the citations of a single publication. The analysis involved identifying all citing publications and recording not only how many times the key paper was cited within each citing publication, but also within which sections of that publication (e.g. Background, Methods, Results, etc.). The level of impact could be categorised as high, moderate or low. The key paper had been cited more than 400 times. Based on citation frequency within publications, it had a high impact in 25 % of citing publications (the key paper was cited three or more times) and a low impact in 58 % (the key paper was cited just once). There were 41 "high impact" publications based on the location of the citations, of which 35 (85 %) were also categorised as high impact by frequency. These results suggest that it is both possible and straightforward to categorise the level of impact of a key paper based on its "citation profile", i.e., the frequency with which the paper is cited within citing publications, thus adding depth and value to the citation metric. An important way in which medical research can translate into improved health outcomes is by motivating or influencing clinical trials that eventually lead to changes in clinical practice. Citations from clinical trials records to academic research may therefore serve as an early warning of the likely future influence of the cited articles. 
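The frequency side of the "citation profile" rule reduces to a small classifier. The thresholds below follow the abstract (three or more in-text citations is high, exactly one is low); treating the remaining case as moderate is an assumption for the uncategorised middle band, and the location-based dimension is not modelled here.

```python
# Classify the key paper's impact within one citing publication by how often it is cited there.
def impact_by_frequency(times_cited_within_publication):
    if times_cited_within_publication >= 3:
        return "high"          # cited three or more times in the citing paper
    if times_cited_within_publication == 1:
        return "low"           # cited just once
    return "moderate"          # assumed middle band (cited twice)

levels = [impact_by_frequency(n) for n in (1, 2, 4)]
```

The point of the profile is that this per-publication depth is invisible in a plain citation count, which would score all three cases identically.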
This paper partially assesses this hypothesis by testing whether prior articles referenced in ClinicalTrials.gov records are more highly cited than average for the publishing journal. The results from four high-profile general medical journals support the hypothesis, although there may not be a cause-and-effect relationship. Nevertheless, it is reasonable for researchers to use citations to their work from clinical trials records as evidence of the possible long-term impact of their research. The objective of this paper is to understand the relationship between the diffusion and mention of research papers on Twitter according to whether or not their authors are members of that micro-blogging service. To that end, 4166 articles from 76 Twitter users and 124 from non-Twitter users were analysed. Data on Twitter mentions were extracted from PlumX Analytics, information on each Twitter user was taken from the platform itself, and citations were collected from the Scopus public API. Results show that papers by Twitter users are tweeted 33 % more than documents by non-Twitter users. Among Twitter users, an increase in followers produces 30 % more tweets. No differences were found between the citation impact (i.e. number of citations) of papers authored by Twitter users and non-Twitter users. However, the number of followers indirectly influences citation impact. The main conclusion is that participation on Twitter affects the dissemination of research papers and, in consequence, indirectly favours the likelihood of academic outputs being cited. Scholarly articles are discussed and shared on social media, which generates altmetrics. Conversely, what is the impact of social media on the dissemination of scholarly articles, and how can it be measured? What are the visiting patterns? 
Investigating these issues, this study seeks to fill the research gap, specifically by exploring the dynamic visiting patterns directed by social media and examining the effects of social buzz on article visits. Using unique real referral data for 110 scholarly articles, updated daily over a 90-day period, this paper proposes a novel method of analysis. We find that visits from social media accumulate fast but decay rapidly. Twitter and Facebook are the two most important social referrers directing people to scholarly articles; the two are roughly equal and together account for over 95 % of the total social-referral-directed visits. There is synchronism between tweets and the visits they produce. Social media and open access are playing important roles in disseminating scholarly articles and promoting the public understanding of science, which is confirmed quantitatively for the first time with real data in this study. Confirmatory bias induces overconfidence in the sense that people believe more strongly than they should in their preferred hypotheses. This work describes a Bayesian-based formal model to study the effect of overconfidence about the causes of manuscript rejection due to confirmatory bias in peer review. In addition, we also present an online tool that helps authors to study their beliefs about the causes of rejection. This tool takes as input the author's self-evaluated probability of misinterpretation (i.e. confirmatory bias), the author's self-evaluated probability of perceiving a review signal correlated with the true cause of rejection, and a sequence of perceived review signals, and gives a prediction of whether there is overconfidence and wrongness in the author's belief that bias in peer review caused the rejection. 
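The Bayesian updating at the heart of such a model can be sketched generically: the author's belief that bias (rather than quality) caused the rejection is revised by each perceived review signal via Bayes' rule. All probabilities below are illustrative inputs, not values from the paper, and this sketch omits the model's misinterpretation parameter.

```python
# Update P(bias caused rejection) after a sequence of noisy review signals.
def belief_after_signals(prior_bias, p_sig_if_bias, p_sig_if_quality, signals):
    p = prior_bias
    for s in signals:                        # s is True if the signal "looks like" bias
        like_bias = p_sig_if_bias if s else 1 - p_sig_if_bias
        like_qual = p_sig_if_quality if s else 1 - p_sig_if_quality
        p = like_bias * p / (like_bias * p + like_qual * (1 - p))  # Bayes' rule
    return p

# Two bias-consistent signals push a neutral prior well above 0.5.
posterior = belief_after_signals(0.5, 0.8, 0.3, [True, True])
```

Overconfidence enters when the author's assumed signal reliabilities (here 0.8 vs 0.3) are more extreme than the signals warrant, inflating the posterior beyond what the evidence supports.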
We continue by discussing the effect of confirmatory bias in the editor-reviewer relationship in the peer review process and show that when the strength of informative signals about manuscript quality is sufficiently weak and the reviewer's confirmatory bias is sufficiently severe, there is a strong probability that the reviewer will erroneously identify the manuscript quality, making the editor less inclined to offer the potential reviewer any incentive to accept the invitation to review the manuscript. On this basis, we offer a theoretical explanation of current practices adopted to improve review performance (e.g., desk rejection). Citation analyses normally investigate the number of citations of publications (e.g. by people, institutions or journals), evaluating the times-cited information from bibliographic databases (such as Scopus or Web of Science). In recent years, however, a series of works have been published which undertake a change of perspective and are based on the evaluation of cited references. The cited references are the works cited in the publications that are used to calculate the times cited. Since these evaluations have led to important insights into science and into scientometric indicators, this paper presents an overview of methods based on cited references, along with examples of empirical results from studies. The investigation of references allows general statements to be made on the precision of citation analyses, and offers alternatives for the normalization of citation numbers in the framework of research evaluation using citation impact. Via the analysis of references, the historical roots of research areas or the works of decisive importance in an area can be determined. References allow quantitative statements on the interdisciplinarity of research units and the overall growth of science. 
The use of a selection of references from the publications of specific research areas makes it possible to measure citation impact in a target-oriented way (i.e. limited to those areas). As some empirical studies have shown, the identification of publications with high creative content seems possible via the analysis of cited references. The possibilities presented here for cited-reference analysis indicate the great potential of this data source. We assume that there are additional possibilities for its application in scientometrics. Information interventions that influence health behavior are a major element of the public health toolkit and an area of potential interest and investigation for library and information science (LIS) researchers. To explore the use of information as a concept within dominant public health behavior models and the manner in which information practices are handled therein, we undertook a scoping study. We scoped the use of information within core English-language health behavior textbooks and examined dominant models of health behavior for information practices. Index terms within these texts indicated a lack of common language around information-related concepts. Nine models/theories were discussed in a majority of the texts. These were grouped by model type and examined for information-related concepts/constructs. Information was framed as a thing or resource, and information practices were commonly included or implied. However, the lack of specificity regarding the definition of information, how it differs from knowledge, and how context affects information practices makes the exact role of information within health behavior models unclear. Although health information interventions may be grounded in behavioral theory, a limited understanding of the ways information works within people's lives hinders our ability to effectively use information to improve health. 
By the same token, information scientists should explore public health's interventionist approach. Sustained use of an information source is sometimes important for achieving an individual's long-term goals, such as learning and self-development. It is even more important for users of online health communities because health benefits usually come with sustained use. However, little is known about what retains a user. We interviewed 21 participants who had been using online diabetes communities in a sustained manner. Guided by self-determination theory, which posits that behaviors are sustained when they can satisfy basic human needs for autonomy, competence, and relatedness, we identified mechanisms that help satisfy these needs, and thus sustain users in online health communities. Autonomy-supportive mechanisms include being respected and supported as a unique individual, feeling free in making choices, and receiving meaningful rationales about others' decisions. Competence-cultivating mechanisms include seeking information, providing information, and exchanging information with others to construct knowledge. Mechanisms that cultivate relatedness include seeing similarities between oneself and peers, receiving responses from others, providing emotional support, and forming small underground groups for closer interactions. The results suggest that, like emotions, information and small group interactions also play a key role in retaining users. System design and community management strategies are discussed based on these mechanisms. The effects of distraction on completion scores generate a gap that is generally not taken into account in information behavior studies. This research investigated what happens if researchers de facto allow distractions to occur in a test situation. 
It examined the type and magnitude of the distractions that occurred, the effects distractions have on completion scores, and whether different distractions affect different test activities differently. In the research design, participants were randomly assigned to either a controlled environment or their natural environment. The results showed that whereas participants in the natural environment needed more time to complete the post-task questionnaire than their laboratory counterparts, they spent a similar amount of time on the tasks. Participants were able, and indeed willing, to limit the less urgent distractions in the interest of getting the tasks done. If they were interrupted by a human contact, however, the completion time for tasks increased significantly. Previous studies showed that distractions change information behavior. Yet the present results provide evidence that these changes do not always occur, and thus there needs to be a better demarcation of the limits within which distraction can be expected to change how people interact with information. Drawing on protection motivation theory and social capital theory, this study investigates teen and parental factors that determine teens' online privacy concerns, online privacy protection behaviors, and subsequent online information disclosure on social network sites. Using secondary data from a 2012 survey (N=622), the final well-fitting structural equation model revealed that teen online privacy concerns were primarily influenced by parental interpersonal trust and parental concerns about teens' online privacy, whereas teen privacy protection behaviors were primarily predicted by teens' cost-benefit appraisal of online interactions. In turn, teen online privacy concerns predicted increased privacy protection behaviors and lower teen information disclosure. 
Finally, restrictive and instructive parental mediation exerted differential influences on teens' privacy protection behaviors and online information disclosure. Location-based information can now be easily accessed anytime and anywhere using mobile devices. Common ways of presenting such information include lists, maps, and augmented reality (AR). Each of these interface types has its strengths and weaknesses, but few empirical evaluations have been conducted to compare them in terms of performance and perceptions of usability. In this paper, we investigate these issues using three interface types for searching and browsing location-based information across two task types: open and closed ended. The experimental study involved 180 participants who were issued an Android mobile phone preloaded with a specific interface and asked to perform a set of open- and closed-ended tasks using both searching and browsing approaches. The results suggest that the list interface performed best across all tasks in terms of completion times, whereas the AR interface ranked second and the map interface performed worst. Participants rated the list as best across most usability constructs but the map was rated better than the AR interface, even though the latter performed better. Implications of the work are discussed. Several studies have identified important factors for search success in online searches, but until now it has not been determined whether the influence of these factors varies during the search process. This study analyzes (a) whether search expertise, prior topic knowledge, topic interest, or flow experience during a search of the World Wide Web (WWW) influence success in finding relevant information and (b) whether the effects of these predictors vary during the course of the search process. 
Two different search tasks are investigated: The evaluating task focuses on the selection of relevant websites from a large number of potentially relevant sites, whereas the finding task focuses on the difficulty of finding information in the case of a lack of potentially relevant websites. Survival analysis is applied to data from a quasi-experiment. This analysis considers not only the question of whether information is found, but also when. Findings show that search expertise and flow explain success in the evaluation task; however, flow is only influential in the first phase of the search process. For the finding task, the predictors have no explanatory strength. Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independency assumption seems to be unrealistic from a natural language point of view, which considers that terms are related to each other. Therefore, such an assumption leads to two well-known problems in information retrieval (IR), namely, polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal-concepts, or word relationships, but such models are estimated using simple n-grams or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier). 
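The single word-based baseline such concept models are compared against is typically a query-likelihood language model with smoothing. Below is a generic Jelinek-Mercer-smoothed sketch with invented documents and an invented lambda, not the paper's actual baseline configuration.

```python
import math
from collections import Counter

# Score a document for a query: log of smoothed per-term probabilities,
# mixing the document model with the collection model (Jelinek-Mercer).
def query_log_likelihood(query, doc, collection, lam=0.5):
    d, c = Counter(doc), Counter(collection)
    return sum(math.log(lam * d[t] / len(doc) +
                        (1 - lam) * c[t] / len(collection))
               for t in query)

d1 = ["glass", "vase", "roman", "glass"]
d2 = ["painting", "dog", "oil"]
coll = d1 + d2                                   # toy two-document collection
s1 = query_log_likelihood(["glass"], d1, coll)
s2 = query_log_likelihood(["glass"], d2, coll)
```

Because terms are scored independently here, a document using only a synonym of a query term gets no credit, which is exactly the synonymy/polysemy gap that concept-based enrichment and semantic smoothing aim to close.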
The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich in compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of the index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. The top index word variants were categorized to estimate the appropriate query expansion ranges for the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of the test results was conducted. In the index of a historical newspaper collection, the occurrences of a word typically spread across many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation. The segment of companies providing storage services and hardware for end users and small businesses has been growing in the past few years. Cloud storage, personal network-attached storage (NAS), and external hard drives are more affordable than ever before, and one would think that backing up personal digital information is a straightforward process nowadays. Despite this, small group studies and corporate surveys show the opposite. 
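The query expansion by approximate string matching described above can be illustrated with Python's standard difflib: gather the index words most similar to a contemporary query term. The Finnish-like index words (including an OCR-style "vvapaus") are invented for illustration; the study itself used dedicated similarity methods over a real 1800s newspaper index, not difflib.

```python
import difflib

# Toy index: historical spellings, inflected forms, an OCR error, and unrelated words.
index_words = ["vapaus", "wapaus", "vvapaus", "vapauden",
               "vapautta", "talo", "sanomat"]

# Expand the contemporary query term with up to 30 similar index words
# (the study found ~30 variants per word gave the best improvement).
expansions = difflib.get_close_matches("vapaus", index_words, n=30, cutoff=0.6)
```

A loose similarity cutoff deliberately trades precision for coverage: it sweeps in historical spellings, inflections, and OCR errors at once, which matches the paper's finding that covering many variation types beats handling one type precisely.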
In this paper we present the results from a quantitative and qualitative survey of 319 participants about how they back up their personal computers and restore personal information in case of computer failures. The results show that the majority of users do manual, selective, and noncontinuous backups, rely on a set of planned and unplanned backups (as a consequence of other activities), have inadequate knowledge about possible solutions and implications of using known solutions, and so on. The study also reveals that around a fifth of all computers are not backed up, and a quarter of most important files and a third of most important folders at the time of the survey could not be (fully) restored in the event of computer failure. Based on the results, several implications for practice and research are presented. We present an analysis of data citation practices based on the Data Citation Index (DCI) (Thomson Reuters). This database launched in 2012 links data sets and data studies with citations received from the other citation indexes. The DCI harvests citations to research data from papers indexed in the Web of Science. It relies on the information provided by the data repository. The findings of this study show that data citation practices are far from common in most research fields. Some differences have been reported on the way researchers cite data: Although in the areas of science and engineering & technology data sets were the most cited, in the social sciences and arts & humanities data studies play a greater role. A total of 88.1% of the records have received no citation, but some repositories show very low uncitedness rates. Although data citation practices are rare in most fields, they have expanded in disciplines such as crystallography and genomics. 
We conclude by emphasizing the role that the DCI could play in encouraging the consistent, standardized citation of research data, a role that would enhance their value as a means of following the research process from data collection to publication. Almost all research evaluation exercises, by construction, deliver data at the level of departments and universities, not at the level of individuals. Yet the aggregate performance is the average of the performance of individual researchers. This paper explores the issue of the relative magnitude of variability in performance within departments and between departments. It exploits anonymized data at the individual level from one of the largest research evaluation exercises, the Italian VQR 2004-2010 (Valutazione della Qualita della Ricerca). If the variability between departments were much larger than the variability within departments, we would see evidence of a process of stratification, or vertical differentiation, arguably driven by competition and researcher mobility. The data show that the opposite pattern is at play. Scholars writing books that are widely used to support teaching in higher education may be undervalued because of a lack of evidence of teaching value. Although sales data may give credible evidence for textbooks, these data may poorly reflect educational uses of other types of books. As an alternative, this article proposes a method to search automatically for mentions of books in online academic course syllabi, based on Bing searches for syllabi mentioning a given book, filtering out false matches through an extensive set of rules. The method had an accuracy of over 90% based on manual checks of a sample of 2,600 results from the initial Bing searches. Over one third of the about 14,000 monographs checked had one or more academic syllabus mentions, with more in the arts and humanities (56%) and social sciences (52%). 
Low but significant correlations between syllabus mentions and citations across most fields, except the social sciences, suggest that books tend to have different levels of impact for teaching and research. In conclusion, the automatic syllabus search method gives a new way to estimate the educational utility of books in a way that sales data and citation counts cannot. F1000 recommendations were assessed as a potential data source for research evaluation, but the reasons for differences between F1000 Article Factor (FFa scores) and citations remain unexplored. By linking recommendations for 28,254 publications in F1000 with citations in Scopus, we investigated the effect of research level (basic, clinical, mixed) and article type on the internal consistency of assessments based on citations and FFa scores. The research level has little impact on the differences between the 2 evaluation tools, while article type has a big effect. These 2 measures differ significantly for 2 groups: (a) nonprimary research or evidence-based research are more highly cited but not highly recommended, while (b) translational research or transformative research are more highly recommended but have fewer citations. This can be expected, since citation activity is usually practiced by academic authors while the potential for scientific revolutions and the suitability for clinical practice of an article should be investigated from a practitioners' perspective. We conclude with a recommendation that the application of bibliometric approaches in research evaluation should consider the proportion of 3 types of publications: evidence-based research, transformative research, and translational research. The latter 2 types are more suitable for assessment through peer review. Little is known about what factors influence users' continued use of social cataloging sites. 
This study therefore examines the impacts of key factors from theories of information systems (IS) success and sense of community (SOC) on users' continuance intention in the social cataloging context. Data collected from an online survey of 323 social cataloging users provide empirical support for the research model. The findings indicate that both information quality (IQ) and system quality (SQ) are significant predictors of satisfaction and SOC, which in turn lead to users' intentions to continue using these sites. In addition, SOC was found to affect continuance intention not only directly, but also indirectly through satisfaction. Theoretically, this study draws attention to a largely unexplored but essential area of research in the social cataloging literature and provides a fundamental basis to understand the determinants of continued social cataloging usage. From a managerial perspective, the findings suggest that social cataloging service providers should constantly focus their efforts on the quality control of their contents and system, and the enhancement of SOC among their users. Scientists and managers using citation-based indicators to help evaluate research cannot evaluate recent articles because of the time needed for citations to accrue. Reading occurs before citing, however, and so it makes sense to count readers rather than citations for recent publications. To assess this, Mendeley readers and citations were obtained for articles from 2004 to late 2014 in five broad categories (agriculture, business, decision science, pharmacy, and the social sciences) and 50 subcategories. In these areas, citation counts tended to increase with every extra year since publication, and readership counts tended to increase faster initially but then stabilize after about 5 years. The correlation between citations and readers was also higher for longer time periods, stabilizing after about 5 years. 
Although there were substantial differences between broad fields and smaller differences between subfields, the results confirm the value of Mendeley reader counts as early scientific impact indicators. The evaluation of scientific research is crucial for both the academic community and society as a whole. Numerous bibliometric indices have been proposed for the ranking of research performance, mainly on an ad hoc basis. We introduce the novel class of Scientific Research Measures (SRMs) to rank scientists' research performance and provide a rigorous theoretical foundation for these measures. In contrast to many bibliometric indices, SRMs take into account the whole citation curve of the scientist, offer appealing structural properties, allow a finer ranking of scientists, correspond to specific features of different disciplines, research areas and seniorities, and include several bibliometric indices as special cases. Thus SRMs result in more accurate rankings than ad hoc bibliometric indices. We also introduce the further general class of Dual SRMs, which reflect the value of journals and permit the ranking of research institutions based on theoretically sound criteria, which has been a central theme in the scientific community over recent decades. An empirical application to the citation curves of 173 finance scholars shows that SRMs can be easily calibrated to actual citation curves and generate author rankings different from those produced by seven traditional bibliometric indices. Visualization of scientific results using networks has become popular in scientometric research. We provide base maps for Mendeley reader count data using the publication year 2012 from the Web of Science data. Example networks are shown and explained. The reader can use our base maps to visualize other results with VOSviewer. The proposed overlay maps are able to show the impact of publications in terms of readership data. 
The advantage of using our base maps is that users do not need to produce a network based on all the data (e.g., from 1 year); they can instead collect the Mendeley data for a single institution (or for journals or topics) and match them with the information we have already produced. Generation of such large-scale networks is still a demanding task despite the available computer power and digital data availability. Therefore, it is very useful to have base maps and create the network with the overlay technique. Users of research databases, such as CiteSeer(X), Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available, this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords for the authors of new papers. We create a data set of research papers, with their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions. 
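The neighbor-based idea in the keyword-prediction study above can be illustrated with a simple baseline: collect the keywords of the papers linked to a target paper in the citation graph and rank them by frequency. This is only a minimal sketch; the function names and toy data are invented, and the authors' actual methods are more elaborate.

```python
from collections import Counter

def predict_keywords(paper_id, citation_graph, known_keywords, top_n=5):
    """Rank candidate keywords for a paper by how often they occur
    among its neighbors (cited and citing papers) in the citation graph.

    citation_graph: dict mapping paper id -> set of neighboring paper ids
    known_keywords: dict mapping paper id -> list of keyword strings
    """
    counts = Counter()
    for neighbor in citation_graph.get(paper_id, set()):
        counts.update(known_keywords.get(neighbor, []))
    return [kw for kw, _ in counts.most_common(top_n)]

# Hypothetical toy data: paper "p1" is linked to "p2" and "p3"
graph = {"p1": {"p2", "p3"}}
keywords = {"p2": ["machine learning", "citation analysis"],
            "p3": ["citation analysis", "keyword extraction"]}
print(predict_keywords("p1", graph, keywords, top_n=2))
```

In this toy example "citation analysis" ranks first because it appears in both neighbors' keyword lists.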
In this communication I give a brief introduction to Valiant's probably approximately correct (PAC) theory, provide an extension that goes beyond Valiant's ideas (and beyond the domain for which this theory was meant), and come to an interpretation in terms of research evaluation. As such, PAC provides a framework for a theory of research evaluation. As a follow-up to the highly cited authors list published by Thomson Reuters in June 2014, we analyzed the top 1% most frequently cited papers published between 2002 and 2012 included in the Web of Science (WoS) subject category Information Science & Library Science. In all, 798 authors contributed to 305 top 1% publications; these authors were employed at 275 institutions. The authors at Harvard University contributed the largest number of papers, when the addresses are whole-number counted. However, Leiden University leads the ranking if fractional counting is used. Twenty-three of the 798 authors were also listed as most highly cited authors by Thomson Reuters in June 2014. Twelve of these 23 authors were involved in publishing 4 or more of the 305 papers under study. Analysis of coauthorship relations among the 798 highly cited scientists shows that coauthorships are based on common interests in a specific topic. Three topics were important between 2002 and 2012: (a) collection and exploitation of information in clinical practices; (b) use of the Internet in public communication and commerce; and (c) scientometrics. Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD structure of scientific articles. 
The presence of negative citations is also dependent on this structure. This methodology offers new perspectives for locating these discursive forms according to the rhetorical structure of scientific articles, and will lead to a better understanding of the use of citations in scientific articles. Based on a unique time-use survey of academic researchers in Japan, this study finds that research time decreases over the life cycle. The decrease in total hours worked and the increase in time spent on administrative tasks explain the decrease in research time. We also show that the decrease in research time partly explains why the research output of older researchers decreases. The results suggest that proper incentives and job designs for senior researchers may increase their research output. The purpose of this paper is to investigate the contributions of Information Systems & MIS articles to the electronic commerce literature. To achieve this, the papers published in the ten top journals are reviewed. This bibliometric study examines the extant literature on Information Systems & MIS and international business. We examine a sample of 853 articles published in ten leading management/business journals during the 23-year period from 1991 to 2014. The results provide a global perspective of the field, identifying the works that have had the greatest impact, the intellectual interconnections among authors and published papers, and the main research traditions or themes that have been explored in Information Systems & MIS studies. Structural and longitudinal analyses reveal changes in the intellectual structure of the field over time. The paper concludes with a discussion of the accumulated knowledge and suggestions for avenues of future research. In the current UK Research Excellence Framework (REF) and the Excellence in Research for Australia (ERA), societal impact measurements are inherent parts of the national evaluation systems. 
In this study, we deal with a relatively new form of societal impact measurement. Recently, Altmetric (a start-up providing publication-level metrics) started to make data available for publications that have been mentioned in policy documents. We regard this data source as an interesting possibility to specifically measure the (societal) impact of research. Using a comprehensive dataset with publications on climate change as an example, we study the usefulness of the new data source for impact measurement. Only 1.2% (n = 2341) of 191,276 publications on climate change in the dataset have at least one policy mention. We further reveal that papers published in Nature and Science, as well as papers from the areas "Earth and related environmental sciences" and "Social and economic geography", are especially relevant in the policy context. Given the low coverage of the climate change literature in policy documents, this study can only be a first attempt to study this new source of altmetrics data. Further empirical studies are necessary, because mentions in policy documents are of special interest when altmetrics data are used for targeted measurement of the broader impact of research. This study aimed to assess the mediating role of save, discussion, and recommendation measures in the relationship between visibility and citation in biomedical articles in 2009-2013. The path analysis method was used to assess the causal relationships between the variables in this descriptive correlational study. Systematic and random stratified methods were employed for sampling. The sample size was determined to be 1892 articles using Cochran's formula, and data were gathered using the PLOS altmetrics. The study's model fit indices showed that visibility influences citation both directly and indirectly through the mediating role of save. 
Discussion had a significant, negative role in the relationship between visibility and citation, and recommendation did not have any significant mediating role in this relationship. Among the social networks presenting altmetrics, it seems that networks such as Mendeley, which provide a basis for saving scientific articles, have an important and significant effect on the number of future citations through visibility metrics. By contrast, social networks discussing scientific findings have a negative effect on the future citation of articles through visibility metrics. This asserts that social networks based on save have an influential role as the basis of scientific interaction. In 2005 Hirsch introduced the h-index to evaluate the research output of researchers. This initiated a debate in the scientific community. Many researchers have evaluated the feasibility of the h-index in different scientific domains. Some found it successful, while others criticized the effectiveness of the h-index in the domains they evaluated. A decade after this proposal, Dienes critically evaluated the original h-index and claimed that the h-index lacks something intrinsic in its definition. Subsequently, Dienes introduced a conversion factor based on the entire community of one domain to complete the definition of the h-index. Dienes did not evaluate the conversion factor on actual data, but only proposed mathematical formulations. The aim of our research is to calculate that factor for the field of Mathematics; after computing the complete-h value for all the authors in this community, we compared our results with the original h-index and g-index values, using award winners as a benchmark. We found that complete-h contributes positively and shows comparatively better results than the h-index and g-index. 
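The h-index and g-index used as comparison baselines in the study above can be computed directly from an author's list of citation counts. The following is a minimal sketch of the standard definitions; the example citation record is invented.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations (Hirsch, 2005)."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations (Egghe)."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(cites, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

# Hypothetical citation record of one author
record = [10, 8, 5, 4, 3]
print(h_index(record))  # 4: four papers have at least 4 citations each
print(g_index(record))  # 5: cumulative sums 10, 18, 23, 27, 30 stay >= 1, 4, 9, 16, 25
```

Note that this simple g-index version is capped at the number of papers; some formulations pad the record with zero-citation papers to allow larger values.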
Among the top 1000 authors ranked according to these indices, 95 award winners were found with complete-h, 76 with the h-index, and 64 with the g-index. For research evaluation, publication lists need to be matched to entries in large bibliographic databases, such as Thomson Reuters Web of Science. This matching process is often done manually, making it very time consuming. This paper presents the use of character n-grams as an automated indicator to inform and ease the manual matching process. The similarity of two references was identified by calculating Salton's cosine for their common character n-grams. As a complementary and confirmatory measure, Kondrak's Levenshtein distance score, based on the character n-grams, is used to re-measure the similarity of the top matches resulting from Salton's cosine. These automated matches were compared to results from completely manual matching. Incorrect matches were examined in depth and possible solutions suggested. This method was applied to two independent datasets to validate the results and inferences drawn. For both datasets, Salton's score based on character n-grams proves to be a useful indicator to distinguish between correct and incorrect matches. The suggested method is compared with a baseline based on word unigrams. The accuracy of the character-based and word-based systems is 96.0% and 94.7%, respectively. Despite the small difference in accuracy, we observed that the character-based system provides more correct matches when the data contain abbreviations, mathematical expressions, or erroneous text. The aim of this study is to conduct a retrospective bibliometric analysis of articles about rehabilitation medicine using virtual reality technology. Bibliometrics is a subfield of scientometrics. It is an effective tool for evaluating research trends in different science fields. 
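The core similarity measure of the reference-matching study above (Salton's cosine over common character n-grams) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code; the reference strings are invented examples.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Frequency profile of overlapping character n-grams of a string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def salton_cosine(a, b, n=3):
    """Cosine similarity between the character n-gram profiles of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# Two differently formatted versions of the same (invented) reference
ref1 = "Salton G. Introduction to Modern Information Retrieval. 1983."
ref2 = "Salton, G. (1983) Introduction to modern information retrieval"
print(round(salton_cosine(ref1, ref2), 2))
```

A pair of references scoring near 1 is a likely match; in the paper, a character-based Levenshtein score is then used as a confirmatory check on the top candidates.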
A systematic bibliometric search was performed using three academic databases (PubMed, Scopus and Web of Science) between January 1, 1996, and December 31, 2015. Research outputs, countries, institutions, authors, major journals, cited articles, subject areas and hot research topics were analyzed using bibliometric methodologies. The retrieved results were analyzed and described in the form of text, tables, and graphics. A total of 15,191 articles were identified from the three academic databases, of which 48.32% were published as original articles. The articles originated from 101 countries and territories. The United States ranked first with 4522 articles, and the United Kingdom second with 1369 articles. Of the articles, 96.75% are published in English; 527 articles were published in Lecture Notes in Computer Science (including the subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Among research institutions, Eidgenössische Technische Hochschule Zürich ranked first with 208 articles. In the past 20 years, the research outcome of rehabilitation using virtual reality technology has increased substantially. This study provides a valuable reference for researchers to understand the overview and present situation of this field. The paper presents an analysis of the core-periphery structure and the transition dynamics of individuals between the core and periphery in collaboration networks of Slovenian researchers over 44 years. We observe the dynamics of individuals from three different aspects: the length of the presence in the core (strength), the length of intervals of permanent presence in the core (stability), and the presence in different time periods (balance). We use clustering and classification machine learning techniques in order to automatically group individuals with similar dynamics of behaviour into common classes. 
We study collaboration networks of Slovenian researchers based on their publication records. The data we used comprise about 18,000 researchers registered in Slovenian national databases of researchers, together with their publications from 1970 to 2013. This study describes the development and results of software designed to analyze millions of articles in the area of Transportation Engineering. This tool aims to support Transportation Planning activities by providing additional information about trends, references and technologies. In order to develop this software, techniques from scientometrics, bibliometrics and informetrics were employed with the support of tools from Computer Science, such as Artificial Intelligence, Data Mining and Natural Language Processing. The result of this study is a structured database that allows browsing the change of interest in different topics over the years in areas related to Transportation Engineering. When analyzing a given area, the database is capable of identifying which authors published works in that area, allowing the identification of specialists and related papers. In addition, the software responsible for creating this database is capable of performing the same analysis on academic corpora of other areas of study. Software plays an important role in the advancement of science. Software developers, users, and funding agencies have deep interests in the impact of software on science. This study investigates the use and impact of software by examining how software is mentioned and cited among 9548 articles published in PLOS ONE in 12 defined disciplines. Our results demonstrate that software is widely used in scientific research and that a substantial uncitedness of software exists across different disciplines. 
Findings also show that the practice of software citations varies noticeably at the discipline level and software that is free for academic use is more likely to receive citations than commercial software. This paper presents a timely analysis of participation in the 8th European Framework Programme for Research and Innovation (EU FP) Horizon 2020. Our dataset comprises the entire population of research organizations in Norway, enabling us to distinguish between non-applicants, non-successful applicants, and successful participants. We find it important to distinguish two stages of the participation process: the self-selection stage in which organizations decide whether they wish to apply for EU funding, and the second stage in which the European Commission selects the best applications for funding. Our econometric results indicate that the propensity to apply is enhanced by prior participation in EU FPs and the existence of complementary national funding schemes; further, that the probability of succeeding is strengthened by prior participation as well as the scientific reputation of the applicant organization. When research groups are evaluated by an expert panel, it is an open question how one can determine the match between panel and research groups. In this paper, we outline two quantitative approaches that determine the cognitive distance between evaluators and evaluees, based on the journals they have published in. We use example data from four research evaluations carried out between 2009 and 2014 at the University of Antwerp. While the barycenter approach is based on a journal map, the similarity-adapted publication vector (SAPV) approach is based on the full journal similarity matrix. Both approaches determine an entity's profile based on the journals in which it has published. Subsequently, we determine the Euclidean distance between the barycenter or SAPV profiles of two entities as an indicator of the cognitive distance between them. 
Using a bootstrapping approach, we determine confidence intervals for these distances. As such, the present article constitutes a refinement of a previous proposal that operates on the level of Web of Science subject categories. In most areas of computer science (CS), and in the software domain in particular, international conferences are as important as journals as a venue to disseminate research results. This has resulted in the creation of rankings to provide quality assessment of conferences (especially used for academic promotion purposes), like the well-known CORE ranking created by the Computing Research and Education Association of Australasia. In this paper we analyze 102 CORE-ranked conferences in the software area (covering all aspects of software engineering, programming languages, software architectures and the like) included in the DBLP dataset, an online reference for computer science bibliographic information. We define a suite of metrics focusing on the analysis of the co-authorship graph of the conferences, where authors are represented as nodes and co-authorship relationships as edges. Our aim is first to characterize the patterns and structure of the community of researchers in software conferences. We then examine whether these values depend on the quality rank of the conference, which would justify the existence of the different classes in the CORE ranking system. In recent years scholars have built maps of science by connecting the academic fields that cite each other, are cited together, or that cite a similar literature. But since scholars cannot always publish in the fields they cite, or that cite them, these science maps are only rough proxies for the potential of a scholar, organization, or country to enter a new academic field. 
Here we use a large dataset of scholarly publications disambiguated at the individual level to create a map of science, or research space, where links connect pairs of fields based on the probability that an individual has published in both of them. We find that the research space is a significantly more accurate predictor of the fields that individuals and organizations will enter in the future than citation-based science maps. At the country level, however, the research space and citation-based science maps are equally accurate. These findings show that data on career trajectories (the set of fields that individuals have previously published in) provide more accurate predictors of future research output for more focused units, such as individuals or organizations, than citation-based science maps. This work explores the distribution of citations for the publications of top scientists. A first objective is to find out whether the 80-20 Pareto rule applies, that is, whether 80% of the citations to a top scientist's work concern 20% of their publications. Observing that the rule does not apply, we also measure the dispersion of the citation distribution by means of the Gini coefficient. Further, we investigate the question of what share of a top scientist's publications go uncited. Finally, we study the relation between the dispersion of the citation distribution and the share of uncited publications. In addition to the overall level, the analyses are carried out at the field and discipline levels, to assess differences across them. This study proposes a method to automatically establish a narrow-sense knowledge structure for Chinese Library and Information Science (CLIS) using data from the Chinese Social Science Citation Index. The method applies multi-level clustering, using ontological ideas as theoretical guidance and ontology learning techniques as technical means. 
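The two dispersion measures used in the citation-distribution study above, the Gini coefficient and the share of uncited publications, can be computed as follows. The citation counts are an invented example; this is a sketch of the standard rank-based Gini formula, not the authors' implementation.

```python
def gini(values):
    """Gini coefficient of a non-negative distribution (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

def uncited_share(citations):
    """Fraction of publications with zero citations."""
    return sum(1 for c in citations if c == 0) / len(citations)

# Hypothetical citation counts for one scientist's publications
cites = [120, 40, 15, 7, 3, 1, 0, 0]
print(round(gini(cites), 2))   # 0.73: highly skewed distribution
print(uncited_share(cites))    # 0.25
```

A Gini value near 1 indicates that citations are concentrated on a few publications, which is the pattern the study reports for top scientists.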
The generated knowledge categories are checked for cohesion and coupling through hierarchical clustering analysis and multidimensional scaling analysis in order to verify the accuracy and rationality of the narrow-sense knowledge structure of CLIS. Finally, the narrow-sense knowledge structure is expanded to a broad sense. Using scholars as example objects, this study discusses the semantic associations between topic knowledge and the other academic objects in CLIS at the micro-, meso-, and macro-levels, so as to fully explore the broad-sense knowledge structure of CLIS for knowledge analysis and applications. Communication studies depend on information and communication technology (ICT) and the behavior of people using the technology. ICT enables individuals to transfer information quickly via various media. Social changes are occurring rapidly, and studies of them are growing in number. Thus, a tool to extract knowledge to comprehend the quickly changing dynamics of communication studies is required. We propose a subject-method topic network analysis method that integrates topic modeling analysis and network analysis to understand the state of communication studies. Our analysis focuses on the relationships between topics classified as subjects and methods. From the relationships, we examine the societal and perspective changes relative to emerging media technologies. We apply our method to all papers listed in the Journal Citation Reports Social Science Citation Index as communication studies between 1990 and 2014. The study results allow us to identify popular subjects, methods, and subject-method pairs in proportion and relation. We analyzed unbalanced international scientific collaboration as a cause of misleading information about a country's contribution to world scientific output. The ESI database (Thomson Reuters' InCites), covering the scientific production of 217 active countries in the period 2010-2014, was used. 
International collaboration results in a high percentage (33.1%) of double-counted world articles, thus affecting qualitative data such as citations, impact, and impact relative to the world. The countries were divided into three groups, according to their individual contribution to the world publications: Group I (24 countries, at least 1% each), representing 83.9% of the total double-counted world articles; Group II (40 countries, 0.1-0.99% each); and Group III, 153 countries (70.5%) with < 0.1% each and altogether 1.9% of the world output. Qualitative characteristics of each group were also analyzed: percentage of the country's GNP applied in R&D, proportion of scientists and engineers per million inhabitants, and Human Development Index. Average international collaboration rates were: Group I, 43.0%; Group II, 55.8%; and Group III, 85.2%. We concluded that very high and unbalanced international collaboration, as presented by many countries, misrepresents the importance of their scientific production and technological and social outputs. Furthermore, it jeopardizes qualitative outputs of the countries themselves, artificially increasing their scientific impact, affecting all fields and, therefore, the whole world. The data confirm that when dealing with the qualitative contribution of countries, it is necessary to take into consideration the level of international cooperation because, as seen here, it can, and in fact does, create a false impression of the real contribution of countries. The h-index, introduced by Hirsch in 2005, was used by Schubert in 2009 to assess single publications. In 2011, Bornmann, Schier, Marx, and Daniel confirmed that the h-index is effective when assessing papers in chemistry. Quite a few Hirsch-type indices originate from the h-index. Can these Hirsch-type indices also be effectively used for assessing single publications? Will they behave the same or differently? 
In this study, the research objects were 26 kinds of Hirsch-type indices (including the original h-index) and three traditional methods, a total of 29 indicators. Based on the original definitions of these indicators and our new explanations of generations (i.e., mixed, pure, and non-pure generations of citations), we defined or redefined 29 paper-level metrics, calculated their values to assess publications, considered the correlations between those indices and the h-index or Wu's w-index, and performed factor analysis to compare their effectiveness. It was found that a few Hirsch-type indices (i.e., the f-index, rational h-index, real h-index, j-index, hg-index, Woeginger's w-index, and tapered h-index) are highly correlated with the h-index but not close to Wu's w-index, while some other indices (i.e., the a-index, h(5,2)-index, q(2)-index, r-index, maxprod, e-index, p-index, and weighted h-index) have relatively low correlations with the h-index but are close to Wu's w-index. The normalized h-index and ph-ratio are obviously different from the other indices, and in most cases their correlation coefficients with the h-index or Wu's w-index are statistically non-significant (p > .05) or significantly negative (p < .01). We argue that indices which are neither too near to nor too far from the h-index could be much more promising than others. Research fronts represent areas of cutting-edge study in specific fields. They not only provide insights into current focuses and future trends, but also serve as crucial indicators for technology-related government policymaking. This study examined research fronts by using three citation window types (i.e., fixed citation windows, citing half-life, and sliding windows). Organic light-emitting diodes (OLEDs) were adopted as the research area in comparing the evolution and development of research fronts from the three citation windows. 
The bibliographic coupling method was applied to identify the research fronts by using 210 highly cited articles in OLED research. The results indicated that among the three citation windows, sliding windows returned the highest number of research fronts, hence exhibiting maximal effectiveness. Furthermore, regarding effectiveness in detecting emerging fronts, both fixed citation windows and citing half-life identified four emerging fronts, whereas sliding windows identified 11 emerging fronts, demonstrating optimal effectiveness. An experiment run in 2009 could not assess whether making monographs available in open access enhanced scholarly impact. This paper revisits the experiment, drawing on additional citation data and tweets. It attempts to answer the following research question: does open access have a positive influence on the number of citations and tweets a monograph receives, taking into account the influence of scholarly field and language? The correlation between monograph citations and tweets is also investigated. The numbers of citations and tweets measured in 2014 reveal a slight open access advantage, but the influence of language or subject should also be taken into account. However, Twitter usage and citation behaviour hardly overlap. Malaysia has three main ethnic communities: Chinese, Indians and Malays. At independence in 1957, the Chinese dominated commercial life, and this led to ethnic tensions and finally riots. As a result, in 1969 Malaysia introduced a "New Economic Policy" (NEP) to promote Malays in all areas of activity, and in particular to assist them to obtain basic and higher education. We examined the scientific outputs from Malaysia between 1982 and 2014 and classified the names of Malaysian researchers into one of these three groups and two others.
There was a major increase in Malay participation in research, rising from 20 % of researchers in 1982-1984 to 65 % in 2012-2014, with corresponding declines in the percentages of Chinese and Indian authors, although their absolute numbers have increased because Malaysian scientific output has grown so rapidly in the last 10 years. The huge increase in Malay researchers contrasts with their presence in the Malaysian population, which has remained stable at about 50 % since 1969. The objective of this research is to determine whether the reference to a country in the title, keywords or abstract of a publication can influence its visibility (measured by the impact factor of the publishing journal) and citability (measured by the citations received). The study is based on Italian scientific production indexed in the Web of Science over the period 2004-2011. The analysis is conducted by comparing the values of four impact indicators for two subsets: (1) the indexed publications with a country's name in the title, keywords or abstract; (2) the remainder of the population, with no country name. The results obtained both at the general level and by subject category show that publications with a country name systematically receive lower impact values, with the exception of a limited number of subject categories. Also, the incidence of highly-cited articles is lower for the first subset. A collaborative Ph.D. project, carried out by a doctoral candidate, is a type of collaboration between university and industry. Due to the importance of such projects, researchers have considered different ways to evaluate their success, with a focus on the outputs of these projects. However, what has been neglected is the other side of the coin: the inputs. The main aim of this study is to incorporate both the inputs and outputs of these projects into a more meaningful measure called efficiency.
A ratio of the weighted sum of outputs over the weighted sum of inputs identifies the efficiency of a Ph.D. project. The weights of the inputs and outputs can be identified using a multi-criteria decision-making (MCDM) method. Data on inputs and outputs are collected from 51 Ph.D. candidates who graduated from Eindhoven University of Technology. The weights are identified using a new MCDM method called the Best Worst Method (BWM). Because there may be differences between the opinions of Ph.D. candidates and supervisors on weighing the inputs and outputs, data for BWM are collected from both groups. Interestingly, the two perspectives yield different levels of efficiency because of these weight differences. Moreover, a comparison between the efficiency scores of these projects and their success scores reveals differences that may have significant implications. A sensitivity analysis reveals the inputs and outputs that contribute most. Science and technology policy academics and evaluators use co-authorship as a proxy for research collaboration despite knowing better. Anecdotally, we understand that an individual might be listed as an author on a particular publication for numerous reasons other than research collaboration. Yet because of the accessibility and other advantages of bibliometric data, co-authorship continues to be used as a proxy for research collaboration. In this study, a national (US) sample of academic researchers was asked about their relationships with their closest research collaborators, some with whom respondents reported having co-authored and some with whom respondents reported not co-authoring. The results suggest there are numerous dimensions of co-authorship, the most influential of which is informal and relational and has little (directly) to do with intellectual and/or other resource contributions. Implications for theory and practice are discussed.
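The efficiency ratio described above can be sketched as follows; the criteria names, values, and weights below are hypothetical stand-ins for BWM-derived weights, not data from the study.

```python
def efficiency(outputs, inputs, w_out, w_in):
    """Weighted sum of outputs over weighted sum of inputs."""
    num = sum(w_out[k] * v for k, v in outputs.items())
    den = sum(w_in[k] * v for k, v in inputs.items())
    return num / den

# Hypothetical criteria; each weight set sums to 1, as BWM weights do.
w_out = {"papers": 0.5, "patents": 0.3, "talks": 0.2}
w_in = {"funding": 0.6, "supervision_hours": 0.4}

outputs = {"papers": 4, "patents": 1, "talks": 6}
inputs = {"funding": 2.0, "supervision_hours": 3.0}
print(round(efficiency(outputs, inputs, w_out, w_in), 2))  # -> 1.46
```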
Generally, we advise academics and evaluators interested in tracking co-authorship as a proxy for collaboration to collect additional data beyond those available from popular bibliometric resources, because such information means better-informed modeling and better-informed policy and management decision making. Citation counts can be used as a proxy to study the scholarly communication of knowledge and the impact of research in academia. Previous research has addressed several important factors of citation counts. In this study, we aim to investigate whether there exist quantitative patterns behind citations, and thus provide a detailed analysis of the factors behind successful research. The study involves conducting quantitative analyses of how various features of a published scientific article, such as the author's quality, the journal's impact factor, and the publishing year, affect the number of citations. We carried out full-text searches in Google Scholar to obtain our data set on citation counts. The data set is then organized into panels and used to conduct the proposed analyses by employing negative binomial regression. Our results show that attributes such as the author's quality and the journal's impact factor make important contributions to an article's citations. In addition, an article's citation count depends not only on its own properties, as mentioned above, but also on the quality, as measured by the number of citations, of its cited articles. That is, the number of citations of a paper seems to be affected by the number of citations of the articles that the paper cites. This study provides statistical characteristics of how different features of an article affect the number of citations. In addition, it provides statistical evidence that the number of citations of a scientific article depends on the number of citations of the articles it cites.
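As a sketch of the modeling idea: negative binomial regression treats the citation count as an overdispersed count whose mean is linked to article features through a log link. The probability mass function and coefficients below are illustrative only, not estimates from the study.

```python
from math import comb, exp

def nb_pmf(y, r, p):
    # Negative binomial probability mass with dispersion r and success
    # probability p: an overdispersed alternative to the Poisson,
    # commonly used for citation counts.
    return comb(y + r - 1, y) * (p ** r) * ((1 - p) ** y)

def expected_citations(features, coefs, intercept):
    # Log-link mean, as in negative binomial regression:
    # mu = exp(b0 + sum_k b_k * x_k). All coefficients are hypothetical.
    return exp(intercept + sum(coefs[k] * v for k, v in features.items()))

coefs = {"journal_if": 0.4, "author_h": 0.05, "years_since_pub": 0.1}
mu = expected_citations(
    {"journal_if": 3.0, "author_h": 20, "years_since_pub": 5},
    coefs, intercept=-1.0)
print(round(mu, 1))  # exp(1.7), about 5.5
```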
The average journal impact factor (JIF) percentile is a novel bibliometric indicator introduced by Thomson Reuters. It is of great significance to study the characteristics of its data distribution and its relationship with other bibliometric indicators, in order to assess its usefulness as a new bibliometric indicator. The research began by analyzing the meaning of the average JIF percentile and comparing its statistical differences with the impact factor. Based upon factor analysis, the paper used multivariate regression and quantile regression to study the relationship between the average JIF percentile and other bibliometric indicators. Results showed that the average JIF percentile changed the statistical characteristics of the impact factor, e.g., improving the relative value of the impact factor, with a smaller coefficient of variation and a distribution closer to normal. Because it is a non-parametric transformation, it cannot be used to measure the relative gap between journals. The average JIF percentile had the highest regression coefficient with journal impact, followed by timeliness and lastly the citable items. The lower the average JIF percentile, the higher the elastic coefficient of journal impact. When the average JIF percentile was extremely high or extremely low, citable items were not correlated with it at all; when it was low, the elastic coefficient of timeliness was even higher. The average JIF percentile is not a proper indicator for multivariate journal evaluation; it has both the advantages and disadvantages of the impact factor, and thus the same limitations in application. Disingenuous manipulation of the impact factor is a significant way in which its fairness is harmed. Such behavior should be banned by effective means. In this paper, data mining techniques are used to solve this problem.
Firstly, ten features are collected into a feature set for nine normal journals and nine abnormal journals from 2005 to 2014. Then, three types of strong classification methods, k-nearest neighbor, decision tree, and support vector machine, are adopted to learn classification models. Moreover, eight algorithms are run on the data set to find suitable methods for detecting impact factor manipulation in our experiment. Finally, the two best-performing algorithms, with precisions higher than 85 %, are picked out and used to predict new journal samples. According to the results, random forest and one type of support vector machine are more suitable than k-nearest neighbor in this case of detecting abnormal journals. When those two methods are used to recognize another 90 journals across nine disciplines from 2007 to 2014, they are verified to be broadly applicable. Unfortunately, four journals are recognized to have been manipulated in some years. Therefore, in this paper, two data mining methods are shown to be intelligent and automatic ways for journal managers to detect and curb impact factor manipulation. This study compares Spanish and UK research in eight subject fields using a range of bibliometric and social media indicators. For each field, lists of Spanish and UK journal articles published in the year 2012 and their citation counts were extracted from Scopus. The software Webometric Analyst was then used to extract a range of altmetrics for these articles, including patent citations, online presentation mentions, online course syllabus mentions, Wikipedia mentions and Mendeley reader counts, and Altmetric.com was used to extract Twitter mentions. Results show that Mendeley is the altmetric source with the highest coverage, with 80 % of sampled articles having one or more Mendeley readers, followed by Twitter (34 %). The coverage of the remaining sources was lower than 3 %.
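The k-nearest-neighbor step in the manipulation-detection approach above can be sketched as follows; the two features, their values, and the labels are hypothetical (the actual study used ten features).

```python
from collections import Counter
from math import dist

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], x))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical journal features, e.g. (self-citation rate, IF growth rate).
train = [(0.05, 0.1), (0.07, 0.15), (0.40, 0.9), (0.35, 0.8)]
labels = ["normal", "normal", "abnormal", "abnormal"]
print(knn_predict(train, labels, (0.38, 0.85)))  # -> abnormal
```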
All of the indicators checked either have too little data or increase the overall difference between Spain and the UK, and so none can be suggested as alternatives to reduce the bias against Spain in traditional citation indexes. One of the most popular methods to measure the quality of a journal is the impact factor. However, this may not be the only criterion for evaluating a journal, and it has its own limitations. The present work gives an alternative model to evaluate the quality of a journal. To test this model, the present work used 17 popular journals in the Social Sciences, specifically in the areas of Economics, Political Science, and Sociology, published in India during 2012-2014. The test proved to be successful, and this model can be applied to any journal to assess its quality. Three approaches have been considered, namely, physical presentation, reference studies and citation analysis. The model, named journal quality point (JQP), is suggested as a feasible technique to evaluate the quality of a journal. Recently, we introduced the CitedReferencesExplorer (CRExplorer, at http://www.crexplorer.net). The program was primarily developed to identify those publications in a field, on a topic, or by a researcher which have been frequently cited. This Letter to the Editor describes the features of the new release of the CRExplorer. The prediction of the long-term impact of a scientific article is a challenging task, addressed by the bibliometrician by resorting to a proxy whose reliability increases with the breadth of the citation window. In the national research assessment exercises using metrics, the citation window is necessarily short, but in some cases is sufficient to advise the use of simple citations. For the Italian VQR 2011-2014, the choice was instead made to adopt a linear weighted combination of citations and journal metric percentiles, with weights differentiated by discipline and year.
Given the strategic importance of the exercise, whose results inform the allocation of a significant share of resources for the national academic system, we examined whether the predictive power of the proposed indicator is stronger than the simple citation count. The results show the opposite, for all disciplines in the sciences and a citation window above 2 years. The evolution of low-energy nuclear physics publications over the last 120 years has been analyzed using nuclear physics databases. An extensive study of Nuclear Science References, Experimental Nuclear Reaction Data (EXFOR), and Evaluated Nuclear Structure Data File (ENSDF) contents provides a unique picture of refereed and non-refereed nuclear physics references. Significant fractional contributions of non-refereed reports, private communications and conference proceedings in the EXFOR and ENSDF databases in the 1970s reflect extensive experimental campaigns and an insufficient number of research journals. This trend has been reversed in recent years because the number of measurements is much lower, while the number of journals is higher. In addition, nuclear physics results are mainly published in a limited number of journals, such as Physical Review C and Nuclear Physics A. In the present work, historic publication trends and averages have been extracted and analyzed using nuclear data mining techniques. The results of this study and their implications are discussed and conclusions presented. For the biomedical sciences, the Medical Subject Headings (MeSH) provide a rich feature which cannot currently be merged properly with widely used citing/cited data. Here, we provide methods and routines that make MeSH terms amenable to broader usage in the study of science indicators: using Web-of-Science (WoS) data, one can generate the matrix of citing versus cited documents; using PubMed/MEDLINE data, a matrix of the citing documents versus MeSH terms can be generated analogously.
The two matrices can also be reorganized into a 2-mode matrix of MeSH terms versus cited references. Using the abbreviated journal names in the references, one can, for example, address the question whether MeSH terms can be used as an alternative to WoS Subject Categories for the purpose of normalizing citation data. We explore the applicability of the routines in the case of a research program about the amyloid cascade hypothesis in Alzheimer's disease. One conclusion is that referenced journals provide archival structures, whereas MeSH terms mainly indicate variation (including novelty) at the research front. Furthermore, we explore the option of using the citing/cited matrix for main-path analysis as a by-product of the software. This work examines whether the macroeconomic divide between northern and southern Italy is also present at the level of higher education. The analysis confirms that the research performance in the sciences of professors in the south is on average lower than that of professors in the north, and that this gap does not show noticeable variation at the level of gender or academic rank. For the universities, the gap is still greater. The study analyzes some possible determinants of the gap, and provides some policy recommendations for its reduction. The literature on academic writing suggests that writing in pairs leads to more readable papers than writing alone. We wondered whether academic blog posts written alone or in pairs would vary in style. We collected a corpus of 104 posts published on the LSE Impact of the Social Sciences blog. We found no differences in average sentence length between single- and co-authored posts. However, the posts written in pairs were slightly less readable than the single-authored posts, which challenges the current view on the advantages of writing in pairs.
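The matrix reorganization described above reduces to a matrix product: given a citing-documents x MeSH-terms matrix A (from PubMed/MEDLINE) and a citing x cited matrix B (from WoS), the 2-mode matrix of MeSH terms versus cited references is A^T B. A toy sketch with hypothetical 0/1 data:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Hypothetical toy data: rows are citing documents.
# A: citing docs x MeSH terms (1 = term assigned), from PubMed/MEDLINE.
# B: citing docs x cited references (1 = reference cited), from WoS.
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

# 2-mode matrix: MeSH terms x cited references = A^T B.
# Entry (t, r) counts citing documents that both carry term t and cite r.
M = matmul(transpose(A), B)
print(M)  # -> [[1, 2, 1], [1, 1, 2]]
```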
Bibliometric indicators such as journal impact factors, h-indices, and total citation counts are algorithmic artifacts that can be used in research evaluation and management. These artifacts have no meaning by themselves, but receive their meaning from attributions in institutional practices. We distinguish four main stakeholders in these practices: (1) producers of bibliometric data and indicators; (2) bibliometricians who develop and test indicators; (3) research managers who apply the indicators; and (4) the scientists being evaluated, with potentially competing career interests. These different positions may lead to different and sometimes conflicting perspectives on the meaning and value of the indicators. The indicators can thus be considered boundary objects which are socially constructed in translations among these perspectives. This paper proposes an analytical clarification by listing an informed set of (sometimes unsolved) problems in bibliometrics, which can also shed light on the tension between simple but invalid indicators that are widely used (e.g., the h-index) and more sophisticated indicators that are not used or cannot be used in evaluation practices because they are not transparent for users, cannot be calculated, or are difficult to interpret. The accuracy of interdisciplinarity measurements depends on how well the data is used for this purpose and whether it can meaningfully inform about work that crosses disciplinary domains. At present, there are no ad hoc databases compiling information exclusively about interdisciplinary research, and those interested in assessing it have to reach out to existing databases that have been compiled for other purposes.
Karlovcec and Mladenic (Scientometrics 102:433-454, 2015) saw an opportunity in a national database that brings together information meant to be used for assessing the scientific performance of the Slovene academic community, which they used to obtain information that was then applied to measure interdisciplinarity. However, the context and purpose for which databases are produced have certain implications for their use. In their study, the authors overlooked the social and political context within which that specific database was created, is maintained and is used (the evaluation of research performance). This resulted in an incomplete interpretation of the results obtained and description of the current situation. This commentary addresses two aspects that warrant further consideration: one pertains to the limitations of the dataset itself and the measures used to overcome them, while the second pertains to the line of reasoning behind the integration and use of IDR measures in this study. This study aims to gain a better understanding of communication patterns in different publication types and the applicability of the Book Citation Index (BKCI) for building indicators for use in both informetrics studies and research evaluation. The authors investigated the differences not only in citation impact between journal and book literature, but also in citation patterns between edited books and their monographic authored counterparts. The complete 2005 volume of the Web of Science Core Collection database, including the three journal databases and the BKCI, has been processed as source documents. The results of this study show that books are more heterogeneous information sources and are addressed to more heterogeneous target groups than journals. Comparatively, the differences between edited and authored books in terms of citation impact are not as impressive as those between books and journals.
Advanced models and indicators which have been developed for periodicals also work for books, though with some limitations. Some say that world science has become more 'applied', or at least more 'application-oriented', in recent years. Replacing the ill-defined distinction between 'basic research' and 'applied research', we introduce 'research application orientation' domains as an alternative conceptual and analytical framework for examining research output growth patterns. To distinguish possible developmental trajectories we define three institutional domains: 'university', 'industry', and 'hospitals'. Our macro-level bibliometric analysis takes a closer look at general trends within and across some 750 of the world's largest research-intensive universities. To correct for database changes, our time-series analysis was applied to both a fixed journal set (the same research journals and conference proceedings over time) and a dynamic journal set (a changing set of publication outlets). We find that output growth in the 'hospital research orientation' has significantly outpaced the other two application domains, especially since 2006/2007. This happened mainly because of the introduction of new publication outlets in the WoS, but also partially because some universities, especially in China, seem to have become more visible in this domain. Our analytical approach needs further broadening and deepening to provide a more definitive answer as to whether hospitals and the medical sector are becoming increasingly dominant as a domain of scientific knowledge production and an environment for research applications. The present paper examines the relationships between the major research organizations in Germany. Special focus is given to the three research fields of natural sciences, engineering and technology, and medical and health sciences for the publication period 2007-2012.
The results provide insight not only into collaboration ties, but also into preference structures with regard to referencing and citation behavior and the citation impact of co-authored publications. The mean normalized citation rate and the PP(top 10 %) indicator show that inter-organizational co-authorship, like international co-authorship, is rewarded with higher citation impact compared with intra-organizational publications. The German Excellence Initiative started in 2006 as a public funding program of crucial importance for German universities. Since that time, several studies on different aspects of the program have been conducted, but there have been no analyses using bibliometric methods to measure the direct effects of funding, which is apparently due to the fact that publications resulting from the funding program are not publicly and comprehensively documented. This paper uses the concept of highly cited publications to measure excellent research, and explores two methodological approaches for attributing highly cited publications to the Excellence Initiative. To this end, the paper focuses on publications produced by the clusters of excellence (CoEs). The clusters of excellence constitute only one of three funding lines, but receive 60 % of the total funding of the excellence program and form the core research units of the Excellence Initiative. The highly cited publications of the CoEs are identified via self-selected lists of publications in the CoE renewal proposals and via a funding acknowledgement analysis. The validity of both data sources is analyzed comparatively. Based on the objectives of the Excellence Initiative, its effects at the level of funded clusters, universities and the overall German research system are explored.
The bibliometric analysis gives evidence that the funding program has succeeded in concentrating excellent research and fostering collaborations between universities and the non-university research sector, but has not caused massive changes to the overall German research system. The aim of this study was to provide a framework for evaluating bibliometric indicators as decision support tools from a decision-making perspective, and to examine the information value of early career publication rate as a predictor of future productivity. We used ROC analysis to evaluate a bibliometric indicator as a tool for binary decision making. The dataset consisted of 451 early career researchers in the mathematical sub-field of number theory. We investigated the effect of three different definitions of top performance groups (top 10, top 25, and top 50 %); the consequences of using different thresholds in the prediction models; and the added prediction value of information on early career research collaboration and publications in prestige journals. We conclude that early career productivity has information value in all tested decision scenarios, but future performance is more predictable if the definition of the high performance group is more exclusive. Estimated optimal decision thresholds using the Youden index indicated that the top 10 % decision scenario should use 7 articles, the top 25 % scenario should use 7 articles, and the top 50 % scenario should use 5 articles to minimize prediction errors. A comparative analysis between the decision thresholds provided by the Youden index, which takes consequences into consideration, and a method commonly used in evaluative bibliometrics, which does not take consequences into consideration when determining decision thresholds, indicated that the differences are trivial for the top 25 and top 50 % groups. However, a statistically significant difference between the methods was found for the top 10 % group.
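The Youden index used for threshold estimation above can be sketched as J = sensitivity + specificity - 1, maximized over candidate thresholds; the publication counts below are hypothetical, not data from the study.

```python
def youden_threshold(scores_pos, scores_neg, thresholds):
    """Pick the threshold maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, float("-inf")
    for t in thresholds:
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)  # true positive rate
        spec = sum(s < t for s in scores_neg) / len(scores_neg)   # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical early-career publication counts.
top = [9, 8, 7, 7, 6, 5]   # researchers who later became top performers
rest = [4, 3, 3, 2, 6, 1]  # the rest
t, j = youden_threshold(top, rest, thresholds=range(1, 11))
print(t, round(j, 2))  # -> 5 0.83
```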
Information on early career collaboration and publication strategies did not add any prediction value to the bibliometric indicator publication rate in any of the models. The key contributions of this research are the focus on consequences in terms of prediction errors and the notion of transforming uncertainty into risk when choosing decision thresholds in bibliometrically informed decision making. The significance of our results is discussed from the point of view of science policy and management. University rankings typically present their results as league tables, with more emphasis on final scores and positions than on clarifying why the universities are ranked as they are. Finding out the latter is often not possible, because final scores are based on weighted indicators where the raw data and the processing of these are not publicly available. In this study we use a sample of Scandinavian universities, explaining what causes the differences between them in the two most influential university rankings: Times Higher Education and the Shanghai ranking. The results show that differences may be attributed both to small variations on what we believe are unimportant indicators and to substantial variations on what we believe are important indicators. The overall aim of this paper is to provide a methodology that can be used to understand universities' different ranks in global university rankings. This paper presents an analysis of resource acquisition and profile development of institutional units within universities. We conceptualize resource acquisition as a two-level nested process, where units compete for external resources based on their credibility, but at the same time are granted faculty positions from the larger units (departments) to which they belong.
Our model implies that the growth of university units is constrained by the decisions of their parent department on the allocation of professorial positions, which represent the critical resource for most units' activities. In our field of study this allocation is largely based on educational activities, and therefore units with high scientific credibility are not necessarily able to grow, despite an increasing reliance on external funds. Our paper therefore sheds light on the implications that the dual funding system of European universities has for the development of units, while taking into account the interaction between institutional funding and third-party funding. Public institutes for testing and research called Kosetsushi constitute an important component of regional innovation policies in Japan. They are organized as a technology diffusion program to help small and medium-sized enterprises (SMEs) improve productivity through various technology transfer activities. Using comprehensive patent data, this study quantitatively evaluates the technology transfer activities of Kosetsushi from the perspective of sectoral innovation systems. The key findings can be summarized as follows. First, local SMEs' technological portfolios (the distribution of patents across technological fields) fit better with those of Kosetsushi than with those of local universities; this tendency is salient for manufacturing Kosetsushi. Second, Kosetsushi are more likely than local universities to collaborate on research with local SMEs; this tendency is also salient for manufacturing Kosetsushi. Third, in regions where SMEs' innovative activities concentrate in biotechnology, Kosetsushi are likely to engage in licensing, whereas in regions where SMEs' innovative activities concentrate in mechanical engineering, Kosetsushi are likely to engage in technical consultation.
Fourth, the successful commercialization of Kosetsushi patents relies on both an understanding of the technological needs of local SMEs and upgrading the scientific quality of Kosetsushi researchers. Policy and research implications are discussed. In a previous article, we explained the reasons why the MNCS and all similar per-publication citation indicators should not be used to measure research performance, whereas efficiency indicators (output to input) such as the FSS are valid indicators of performance. The problem frequently indicated in measuring efficiency indicators lies in the availability of input data. If we accept that such data are inaccessible, and instead resort to per-publication citation indicators, the question arises as to what extent institutional performance rankings by MNCS differ from those by FSS (and what effects such results could have on policy-makers, managers and other users of the rankings). Contrasting the 2008-2012 performance by MNCS and FSS of Italian universities in the Sciences, we try to answer that question at the field, discipline, and overall university level. We present the descriptive statistics of the shifts in rank, and the correlations of both scores and ranks. The analysis reveals strong correlations in many fields but weak correlations in others. The extent of rank shifts is never negligible: a number of universities shift from top to non-top quartile ranks. (C) 2016 Elsevier Ltd. All rights reserved. A number of journal classification systems have been developed in bibliometrics since the launch of the Citation Indices by the Institute for Scientific Information (ISI) in the 1960s. These systems are used to normalize citation counts with respect to field-specific citation patterns. The best known system is the so-called "Web-of-Science Subject Categories" (WCs). In other systems, papers are classified by algorithmic solutions.
Using the Journal Citation Reports 2014 of the Science Citation Index and the Social Science Citation Index (n of journals = 11,149), we examine options for developing a new system based on journal classifications into subject categories using aggregated journal-journal citation data. Combining routines in VOSviewer and Pajek, a tree-like classification is developed. At each level one can generate a map of science for all the journals subsumed under a category. Nine major fields are distinguished at the top level. Further decomposition of the social sciences is pursued for the sake of example, with a focus on journals in information science (LIS) and science studies (STS). The new classification system improves on alternative options by avoiding the problem of randomness in each run that has made algorithmic solutions hitherto irreproducible. Limitations of the new system are discussed (e.g., the classification of multi-disciplinary journals). The system's usefulness for field normalization in bibliometrics should be explored in future studies. (C) 2016 Elsevier Ltd. All rights reserved. This study estimates the development of hybrid open access (OA), i.e., articles published openly on the web within subscription-access journals. Included in the study are the five largest publishers of scholarly journals: Elsevier, Springer, Wiley-Blackwell, Taylor & Francis, and Sage. Since no central indexing or standardized metadata exists for identifying hybrid OA, an explorative bottom-up methodological approach was developed. The individual search and filtering features of each publisher website and the a priori availability of data were leveraged to the extent possible. The results indicate strong sustained growth in the volume of articles published as hybrid OA from 2007 (666 articles) to 2013 (13,994 articles). The share of hybrid articles was 3.8 % of total published articles for the period 2011-2013 for journals with at least one identified hybrid OA article.
Journals within the Scopus discipline categorization of Health and Life Sciences, in particular the field of Medicine, were found to be among the most frequent publishers of hybrid OA content. The study surfaces the many methodological challenges involved in obtaining metrics regarding hybrid OA, a growing business for journal publishers as science policy pressure mounts for reduced access barriers to research publications. (C) 2016 The Authors. Published by Elsevier Ltd. In the last decade, a growing number of studies have focused on the qualitative/quantitative analysis of bibliometric-database errors. Most of these studies relied on the identification and (manual) examination of relatively limited samples of errors. Using an automated procedure, we collected a large corpus of more than 10,000 errors in the two multidisciplinary databases Scopus and Web of Science (WoS), mainly including articles in the Engineering-Manufacturing field. Based on the manual examination of a portion (about 10%) of these errors, this paper provides a preliminary analysis and classification, identifying similarities and differences between Scopus and WoS. The analysis reveals interesting results, such as: (i) although Scopus seems more accurate than WoS, it tends to omit more papers from indexing, causing the loss of the relevant citations given/obtained, (ii) both databases have relatively serious problems in managing so-called Online-First articles, and (iii) there is a lack of correlation between the databases regarding the distribution of the errors across several error categories. The description is supported by practical examples concerning a variety of errors in the Scopus and WoS databases. (C) 2016 Elsevier Ltd. All rights reserved. Author co-citation analysis (ACA) has been widely used for identifying the subject disciplines of authors. Citations can reveal the explicit relationship between authors as well as their subject research fields.
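The co-citation counts that ACA builds on can be sketched as follows; two authors are co-cited whenever they appear together in one citing paper's reference list (the citing papers and cited authors below are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Each citing paper is represented by the set of authors it cites
citing_papers = [
    {"White", "McCain", "Small"},
    {"White", "McCain"},
    {"Small", "Garfield"},
    {"White", "Small", "Garfield"},
]

# Count every pair of authors that appears together in a reference list
cocitations = Counter()
for cited_authors in citing_papers:
    for pair in combinations(sorted(cited_authors), 2):
        cocitations[pair] += 1

print(cocitations.most_common(3))
```

The resulting pair counts form the symmetric co-citation matrix that ACA then clusters or maps to delineate authors' subject fields.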
However, previous studies have seldom considered citation contents, which convey useful implicit information on the authors, or the influence of the links between the authors' subject fields that can be revealed by taking citation locations into account. This study aims to reveal the implicit relationship in the authors' subject disciplines by considering both citation contents and proximity. To this end, we propose a new ACA method, called content- and proximity-based author co-citation analysis (CPACA). For the study, we extracted citation sentences and locations from full-text articles in the oncology field. The top 15 journals on oncology in Journal Citation Reports were selected, and 6,360 full-text articles from PubMed Central were collected. The results show that the proposed method enables the identification of distinct sub-fields of authors to represent authors' subject relatedness. (C) 2016 Elsevier Ltd. All rights reserved. In this study, we propose an unconnected component inclusion technique (UCIT) for patent citation analysis. Our method generates a cluster solution that includes unconnected and connected components of a direct citation network, enabling a more complete analysis of the technology fields. Case studies of Internet of Things-related technologies were conducted to test the effectiveness of our proposed method. We observed that UCIT increased the number of nodes especially in relatively small networks. Additionally, we analyzed how the clusters changed by adding unconnected patents to the citation network and identified four types of clustering phenomena. Our method can be used by patent officers, R&D managers, and policy makers when they want to understand the technology landscape better. (C) 2016 Elsevier Ltd. All rights reserved. Citation Delay (D), introduced by Wang et al. (2015), is a measure of the citation durability of articles, reflecting information on the entire citation lifetime.
The characteristics of the measure and its relationships to other article characteristics are examined in six different fields, using 15 years of citation data for the articles published in 2000 in these fields. D is approximately normally distributed and is less dependent on the subject field than the citation count is. Although articles with higher D (cited later) tend to gain more citations in their life-time, this relationship is not linear: the mean number of citations reaches a maximum at a certain value of D. Multiple regression analysis explaining D showed that articles with a higher Price index (i.e. citing more recent references) receive most of their citations relatively early, and that there is a weak tendency for articles containing more figures to be cited earlier and those containing more tables to be cited later. A seemingly contradictory result is found: more highly cited articles tend to have higher citation durability within individual journals, while high-impact journals tend to include articles with lower citation durability in higher proportions. (C) 2016 Elsevier Ltd. All rights reserved. Classifying publication venues as top-tier or non-top-tier is quite subjective and can be debatable at times. In this paper, we propose ConfAssist, a novel assisting framework for conference categorization that aims to address the limitations of existing systems and portals for venue classification. We start with the hypothesis that top-tier conferences are much more stable than other conferences and that the inherent dynamics of these two groups differ to a very large extent. We identify various features related to the stability of conferences that might help us separate a top-tier conference from the rest of the lot.
While there are many clear cases where expert agreement can be almost immediately achieved as to whether a conference is top-tier or not, there are equally many cases that can result in a conflict even among the experts. ConfAssist tries to serve as an aid in such cases by increasing the confidence of the experts in their decision. An analysis of 110 conferences from 22 subfields of computer science clearly favors our hypothesis, as the top-tier conferences are found to exhibit much smaller fluctuations in the stability-related features than the non-top-tier ones. We evaluate our hypothesis using systems based on conference categorization. For the evaluation, we conducted a human judgment survey with 28 domain experts. The results are impressive, with 85.18% classification accuracy. We also compare the dynamics of newly started conferences with those of older conferences to identify the initial signals of popularity. The system is applicable to any conference with at least 5 years of publication history. (C) 2016 Elsevier Ltd. All rights reserved. An earlier publication (Grossetti et al., 2014) established that we are witnessing a decreasing concentration of scientific activities within "world cities". Given that more and more cities and countries are contributing to the world production of knowledge, this article analyses the evolution of the world collaboration network at both the domestic and international levels during the 2000s. Using data from the Science Citation Index Expanded, scientific authors' addresses are geo-localized and grouped by urban areas. Our data suggest that interurban collaborations within countries increased together with international linkages. In most countries, domestic collaborations increased faster than international collaborations. Even among the top collaborating cities, sometimes referred to as "world cities", the share of domestic collaborations has gained momentum.
Our results suggest that, contrary to common beliefs about the globalization process, national systems of research have been strengthening during the 2000s. (C) 2016 Elsevier Ltd. All rights reserved. This paper explores a 7-stage cluster methodology as a process to identify appropriate indicators for the evaluation of individual researchers at a disciplinary and seniority level. Publication and citation data for 741 researchers from 4 disciplines were collected in Web of Science. Forty-four indicators of individual researcher performance were computed using the data. The clustering solution was supported by continued reference to the researchers' curricula vitae, an effect analysis and a risk analysis. Discipline-appropriate indicators were identified and used to divide the researchers into four groups: low, middle, high and extremely high performers. Seniority-specific indicators were not identified. The practical importance of the recommended discipline-appropriate indicators is a concern. Our study revealed several critical concerns that should be investigated in the application of statistics in research evaluation. The strength of the 7-stage cluster methodology is that it makes clear that in the evaluation of individual researchers, statistics cannot stand alone. The methodology is reliant on contextual information to verify the bibliometric values and cluster solution. It is important to do studies that investigate the usefulness of statistical evaluation methodologies to help us as a community learn more about the appropriateness of particular bibliometric indicators in the analysis of different researcher profiles. (C) 2016 Elsevier Ltd. All rights reserved. In this submission we introduce the notion of under-cited influential publications and show that these publications are like "wake-up switches" for significant follow-up research. To be considered under-cited influential, an article must meet three requirements.
One concerns the number of citations received (first-generation citations), while the other two take subsequent citation generations into account. In general terms these three conditions are: 1) the article is reasonably well-cited (a basic requirement for being influential); 2) the articles citing it are themselves rather highly cited (second-generation citations), so that the original article is influential in an indirect way (a more refined token of influence); 3) given condition two, the article received fewer citations than expected (it is under-cited). We claim that the phenomenon of under-cited influential publications is important and should receive more attention. Moreover, one may say that under-cited influential publications belong to the group of truly foundational scientific discoveries acting as promoters of influential research, as shown by significant follow-up research. (C) 2016 Elsevier Ltd. All rights reserved. With the ever-increasing scientific literature, improving the efficiency of searching bibliographic data has become an important issue. Because current bibliographic information retrieval systems offer little support for expressing complicated information needs, getting relevant bibliographic data is a demanding task. In this paper, we propose a visual graph query interface for bibliographic information retrieval. Through this interface, users can formulate bibliographic queries by interacting with a graph. Visual graph queries use a set of nodes with constraints and links among nodes to represent explicit and precise bibliographic information needs. The proposed visual graph query interface allows users to formulate several complex bibliographic queries (e.g., bibliographic coupling) that are not attainable in current major bibliographic information retrieval systems. In addition, the proposed interface requires fewer queries to complete everyday bibliographic search tasks. (C) 2016 Elsevier Ltd. All rights reserved.
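The three conditions for an under-cited influential publication listed earlier can be expressed as a simple screening function. The thresholds below are placeholders of our own; the authors' actual operationalization may differ:

```python
from statistics import mean

def is_under_cited_influential(first_gen, second_gen_counts, expected,
                               min_citations=20, second_gen_factor=2.0):
    """Screen an article against the three conditions.

    first_gen: number of citations the article received
    second_gen_counts: citation counts of each of its citing articles
    expected: expected citations for comparable articles (field/year baseline)
    min_citations and second_gen_factor are illustrative placeholder thresholds.
    """
    reasonably_well_cited = first_gen >= min_citations  # condition 1
    influential_indirectly = (bool(second_gen_counts) and
                              mean(second_gen_counts) >= second_gen_factor * first_gen)  # condition 2
    under_cited = first_gen < expected  # condition 3
    return reasonably_well_cited and influential_indirectly and under_cited

# 25 citations, but its citing articles average 87.5 citations and 60 were expected
print(is_under_cited_influential(25, [80, 120, 60, 90], expected=60))  # True under these placeholders
```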
Similarity measures are fundamental tools for identifying relationships within or across patent portfolios. Many bibliometric indicators are used to determine similarity measures; for example, bibliographic coupling, citation and co-citation, and co-word distribution. This paper aims to construct a hybrid similarity measure method based on multiple indicators to analyze patent portfolios. Two models are proposed: categorical similarity and semantic similarity. The categorical similarity model emphasizes international patent classifications (IPCs), while the semantic similarity model emphasizes textual elements. We introduce fuzzy set routines to translate the rough technical (sub-) categories of IPCs into defined numeric values, and we calculate the categorical similarities between patent portfolios using membership grade vectors. In parallel, we identify and highlight core terms in a 3-level tree structure and compute the semantic similarities by comparing the tree-based structures. A weighting model is designed to consider: 1) the bias that exists between the categorical and semantic similarities, and 2) the weighting or integrating strategy for a hybrid method. A case study to measure the technological similarities between selected firms in China's medical device industry is used to demonstrate the reliability of our method, and the results indicate the practical meaning of our method in a broad range of informetric applications. (C) 2016 Elsevier Ltd. All rights reserved. Speed and breadth have been suggested as two advantages of altmetrics over citation counts, since they might estimate impact immediately after publication and beyond the academic community of authors. In order to investigate the validity of these claims, we performed a fifteen-month longitudinal study of the evolution of bookmarks in Mendeley for a set of 3813 articles published in Library and Information Science in 2014.
Results show that 87.6% of the literature was bookmarked at least once by May 2016, whereas only 55% was cited. The correlation between bookmarks and citations was moderate, and the overlap between the most frequently bookmarked and the most frequently cited papers increased over time. A significant share of the bookmarks were made by students and professionals, although the shares of bookmarks made by different categories of users changed as time went by. Bookmarks made by users based in less wealthy nations also increased over time. The study is limited by the incomplete information provided by Mendeley regarding users' academic status and country of residence, the upgrades of the software used in data collection, and the fact that one year is a rather long publication period for a longitudinal study of a fast-changing feature like bookmarks. (C) 2016 Elsevier Ltd. All rights reserved. Measures of research productivity (e.g. peer-reviewed papers per researcher) are a fundamental part of bibliometric studies, but are often restricted by the properties of the data available. This paper addresses that fundamental issue and presents a detailed method for the estimation of productivity (peer-reviewed papers per researcher) based on data available in bibliographic databases (e.g. Web of Science and Scopus). The method can, for example, be used to estimate average productivity in different fields, and such field reference values can be used to produce field-adjusted production values. Being able to produce such field-adjusted production values could dramatically increase the relevance of bibliometric rankings and other bibliometric performance indicators. The results indicate that the estimations are reasonably stable given a sufficiently large data set. (C) 2016 Elsevier Ltd. All rights reserved. This paper studies the so-called abnormal phenomenon of delayed recognition in bibliometrics and focuses on the first step in quantitatively measuring this phenomenon.
As bibliometric analysis of a paper's recognition and influence is an uncertain and extended process, proper calculation of delayed recognition and "sleeping beauty" publications has limitations in current scientometric studies, such as restricted application of indicators, limited scope, and complex calculation methods. This study suggests a solution for depicting the citation delay phenomenon of individual papers that avoids dividing them into different periods, is applicable to all papers with various types of citation curves, and is easy to calculate. Notably, this approach advocates using an uneven weighted summation based on earlier and later citation years when analyzing an individual paper's citation data. It demonstrates that the intrinsic relation between the two independent indicators, citation delay and the Gs index, is based on the same logic of applying uneven weights to sum up yearly citations. This paper also recommends that simultaneous application of the new indicator D-a and final citation numbers can efficiently identify delayed recognition papers, and that the criterion for selecting papers can be adjusted by the value of a. (C) 2016 Elsevier Ltd. All rights reserved. A number of bibliometric studies have shown that many factors impact citation counts besides scientific quality. This paper used a large bibliometric dataset to investigate the impact of the different statistical properties of author-selected keywords and the network attributes of their co-occurrence networks on citation counts. Four statistical properties of author-selected keywords were considered: (i) Keyword growth (i.e., the relative increase or decrease in the presence statistics of an underlying keyword over a given period of time); (ii) Keyword diversity (i.e., the level of variety in a set of author-selected keywords); (iii) Number of keywords; and (iv) Percentage of new keywords.
This study also considered network centrality, a network attribute derived from the keyword co-occurrence network. Network centrality was calculated using the average of three basic network centrality measures: degree, closeness and betweenness centrality. A correlation and regression analysis showed that all of these factors had a significant positive relation with citation counts, except the percentage of new keywords, which had a significant negative relation. However, when the effects of four potential control variables (i.e., the number of article authors, the length of an article, the quality of the journal in which the article was published and the length of the title of an article) were controlled, only four variables related to author-selected keywords showed a significant relation with citation counts. Keyword growth, number of keywords and network centrality showed a positive relation with citation counts, whereas the percentage of new keywords showed a negative relation with citation counts. The implications of these findings are discussed in this article. (C) 2016 Elsevier Ltd. All rights reserved. The analysis of bibliometric networks, such as co-authorship, bibliographic coupling, and co-citation networks, has received a considerable amount of attention. Much less attention has been paid to the construction of these networks. We point out that different approaches can be taken to construct a bibliometric network. Normally the full counting approach is used, but we propose an alternative fractional counting approach. The basic idea of the fractional counting approach is that each action, such as co-authoring or citing a publication, should have equal weight, regardless of, for instance, the number of authors, citations, or references of a publication. We present two empirical analyses in which the full and fractional counting approaches yield very different results.
These analyses deal with co-authorship networks of universities and bibliographic coupling networks of journals. Based on theoretical considerations and on the empirical analyses, we conclude that for many purposes the fractional counting approach is preferable to the full counting one. (C) 2016 Elsevier Ltd. All rights reserved. The Scientific and Technological Research Council of Turkey (Tubitak) gives subsidies to researchers for their publications. Tubitak groups journals into subject categories, and gives equal subsidies to publications from journals with comparable standing. This formulation aims at interfield equality among journals. Unfortunately, interfield equality among journals does not necessarily lead to interfield equality among researchers, because there are interfield productivity differences. We show that chemists in prestigious Turkish universities on average receive 4.30 times more subsidies than economists. We also apply the subsidy formula to the publications of researchers from the world's most prestigious universities. In this case, the inequality between chemists and economists is less pronounced. (C) 2016 Elsevier Ltd. All rights reserved. Citations between scientific papers and related bibliometric indices, such as the h-index for authors and the impact factor for journals, are being increasingly used, often in controversial ways, as quantitative tools for research evaluation. Yet a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics.
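The contrast between full and fractional counting described above can be sketched for a co-authorship network of institutions. Under fractional counting each publication contributes a total link weight of 1, divided equally over its author pairs; under full counting every pair simply counts 1. The publications and affiliations below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Each publication lists the institution of each of its authors
publications = [
    ["Leiden", "Leiden", "CWTS"],
    ["Leiden", "Indiana"],
    ["CWTS", "Indiana", "Leiden", "Granada"],
]

full = Counter()
fractional = Counter()
for authors in publications:
    pairs = list(combinations(range(len(authors)), 2))
    for i, j in pairs:
        edge = tuple(sorted((authors[i], authors[j])))
        full[edge] += 1                     # full counting: every pair counts 1
        fractional[edge] += 1 / len(pairs)  # fractional: each publication sums to weight 1

print(full[("Indiana", "Leiden")], round(fractional[("Indiana", "Leiden")], 3))
```

Note that under fractional counting the total weight across all edges equals the number of publications, so a many-author paper cannot dominate the network the way it does under full counting.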
We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, is the best-performing metric overall in identifying the Milestone Letters. The lack of time bias in the new metric also makes it possible to compare papers of different ages on the same scale. We find that network-based metrics identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example, the World Wide Web and online social networks. An interactive Web platform where it is possible to view the ranking of the APS papers by rescaled PageRank is available at the address http://www.sciencenow.info. (C) 2016 Elsevier Ltd. All rights reserved. Bibliometricians face several issues when drawing and analyzing samples of citation records for their research. Drawing samples that are too small may make it difficult or impossible for studies to achieve their goals, while drawing samples that are too large may drain resources that could be better used for other purposes. This paper considers three common situations and offers advice for dealing with each. First, an entire population of records is available for an institution. We argue that, even though all records have been collected, the use of inferential statistics, significance testing, and confidence intervals is both common and desirable.
Second, because of limited resources or other factors, a sample of records needs to be drawn. We demonstrate how power analyses can be used to determine in advance how large the sample needs to be to achieve the study's goals. Third, the sample size may already be determined, either because the data have already been collected or because resources are limited. We show how power analyses can again be used to determine how large effects need to be in order to be detected as statistically significant. Such information can then help bibliometricians to develop reasonable expectations as to what their analysis can accomplish. While we focus on issues of interest to bibliometricians, our recommendations and procedures can easily be adapted for other fields of study. (C) 2015 Elsevier Ltd. All rights reserved. Although the information-seeking literature has tended to focus upon the selection and use of inanimate objects as information sources, this research follows the more recent trend of investigating how individuals evaluate and use interpersonal information sources. By drawing from the structural, relational, and cognitive elements of social capital theory to inform antecedents to information quality and source accessibility, a research model is developed and tested. For interpersonal information sources, information quality is the key determinant of source use. Perceptions of the information quality and accessibility of an interpersonal source are shown to be influenced by boundary spanning, transactive memory, and content type. Implications and prescriptions for future research are discussed. Keeping up to date with research developments is a central activity of academic researchers, but researchers face difficulties in managing the rapid growth of available scientific information. This study examined how researchers stay up to date, using the information journey model as a framework for analysis and investigating which dimensions influence information behaviors.
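The power-analysis approach to sample size described above can be approximated without a statistics package. The sketch below uses the standard normal approximation for comparing two group means at a standardized effect size d; it is a rough planning aid, not the exact t-test calculation:

```python
import math

def z_quantile(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF via bisection on the error function."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sample_size_two_means(effect_size, alpha=0.05, power=0.80):
    """Per-group n to detect standardized effect d (two-sided, normal approximation)."""
    z_a = z_quantile(1 - alpha / 2)
    z_b = z_quantile(power)
    return math.ceil(2 * ((z_a + z_b) / effect_size) ** 2)

# A medium effect (d = 0.5) at alpha = .05 and 80% power
print(sample_size_two_means(0.5))  # about 63 records per group
```

The exact t-based calculation gives a slightly larger n; the approximation is close enough to show whether a planned sample is in the right ballpark.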
We designed a 2-round study involving semistructured interviews and prototype testing with 61 researchers at 3 levels of seniority (PhD student to professor). Data were analyzed following a semistructured qualitative approach. Five key dimensions that influence information behaviors were identified: level of seniority, information sources, state of the project, level of familiarity, and how well defined the relevant community is. These dimensions are interrelated, and their values determine the flow of the information journey. Across all levels of professional expertise, researchers used similar hard (formal) sources to access content, while soft (interpersonal) sources were used to filter information. An important "pain point" that future information tools should address is helping researchers filter information at the point of need. This article contributes to the growing body of research that explores the significance of context in health information behavior. Specifically, through the lens of trust judgments, it demonstrates that gender is a determinant of the information evaluation process. A questionnaire-based survey collected data from adults regarding the factors that influence their judgment of the trustworthiness of online health information. Both men and women identified credibility, recommendation, ease of use, and brand as being of importance in their trust judgments. However, women also take into account style, while men eschew this for familiarity. In addition, men appear to be more concerned with the comprehensiveness and accuracy of the information, the ease with which they can access it, and its familiarity, whereas women demonstrate greater interest in cognition, such as the ease with which they can read and understand the information.
These gender differences are consistent with the demographic data, which suggest that: women consult more types of sources than men; men are more likely to be searching with respect to a long-standing health complaint; and women are more likely than men to use tablets in their health information seeking. Recommendations for further research to better inform practice are offered. Citations from patents to scientific publications provide useful evidence about the commercial impact of academic research, but automatically searchable databases are needed to exploit this connection for large-scale patent citation evaluations. Google covers multiple different international patent office databases but does not index patent citations or allow automatic searches. In response, this article introduces a semiautomatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192 science and engineering Scopus articles from every second year for the period 1996-2012. Although manual Google Patent searches give more results, especially for articles with many patent citations, the difference is not large enough to be a major problem. Within Biomedical Engineering, Biotechnology, and Pharmacology & Pharmaceutics, 7% to 10% of Scopus articles had at least one patent citation, but other fields had far fewer, so patent citation analysis is only relevant for a minority of publications. Low but positive correlations between Google Patent citations and Scopus citations across all fields suggest that traditional citation counts cannot substitute for patent citations when evaluating research. We analyze 18 million rows of Wi-Fi access logs collected over a 1-year period from over 120,000 anonymized users at an inner city shopping mall. The anonymized data set gathered from an opt-in system provides users' approximate physical location as well as web browsing and some search history.
Such data provide a unique opportunity to analyze the interaction between people's behavior in physical retail spaces and their web behavior, which serves as a proxy for their information needs. We found that (a) there is a weekly periodicity in users' visits to the mall; (b) people tend to visit similar mall locations and web content during their repeated visits to the mall; (c) around 60% of registered Wi-Fi users actively browse the web, and around 10% of them use Wi-Fi for accessing web search engines; (d) people are likely to spend a relatively constant amount of time browsing the web, while the duration of their visit may vary; (e) the physical spatial context has a small, but significant, influence on the web content that indoor users browse; and (f) accompanying users tend to access resources from the same web domains. Music Information Retrieval (MIR) evaluation has traditionally focused on system-centered approaches where components of MIR systems are evaluated against predefined data sets and golden answers (i.e., ground truth). There are two major limitations of such system-centered evaluation approaches: (a) the evaluation focuses on subtasks in music information retrieval, not on entire systems, and (b) users and their interactions with MIR systems are largely excluded. This article describes the first implementation of a holistic user-experience evaluation in MIR, the MIREX Grand Challenge, where complete MIR systems are evaluated, with user experience being the single overarching goal. It is the first time that complete MIR systems have been evaluated with end users in a realistic scenario. We present the design of the evaluation task, the evaluation criteria, a novel evaluation interface, and the data-collection platform. This is followed by an analysis of the results, reflection on the experience and lessons learned, and plans for future directions. Query classification is an important part of exploring the characteristics of web queries.
Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. Queries' temporal patterns are inherent time-series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search-volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach. Collaborative information seeking (CIS) is of growing importance in the information sciences and human-computer interaction (HCI) research communities. Current research has primarily focused on examining the social and interactional aspects of CIS in organizational or other settings and developing technical approaches to support CIS activities. As we continue to develop a better understanding of the interactional aspects of CIS, we also need to start examining the cognitive aspects of CIS. In particular, we need to understand CIS from a team cognition perspective. To examine how team cognition develops during CIS, we conducted a study using observations and interviews of student teams engaged in colocated CIS tasks in a laboratory setting. We found that a variety of awareness mechanisms play a key role in the development of team cognition during CIS. Specifically, we identify that search, information, and social methods of awareness are critical to developing team cognition during CIS.
We discuss why awareness is important for team cognition, how team cognition comprises both individual and team-level cognitive activities, and the importance of examining both interaction and cognition to truly understand team cognition. The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was conducted with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs. Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is with a term list; that is, the top-n words with the highest conditional probability within the topic. Other topic representations, such as textual and image labels, have also been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. 
Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, with topics presented in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation. Modern search systems often meet their users' information needs, but when the system fails, searchers struggle to formulate effective queries. Query suggestions may help, but research suggests these often go unused. Although much is known about how searchers scan results pages when assessing relevance, little is known about the processes searchers use when struggling to reformulate queries. Investigating how searchers overcome query difficulties, and how search systems help and hinder that process, requires enquiry into the cognitive procedures searchers use to select words for queries. The purpose of this paper is to investigate one cognitive process involved: semantic priming of words in memory. A framework for conceptualizing the role of semantic priming in search interaction is presented, along with results from two experiments that applied research methods from cognitive psychology to investigate word selection and subsequent search for selected words. The results show that word selection activates related words in memory and that looking for a selected word among related words is effortful. These findings suggest that semantic priming may play a role in the difficulties people experience when reformulating queries. Ideas for continued development of semantic priming methods and their use in future research are also presented. 
This article proposes, exemplifies, and validates the use of course-subject co-occurrence (CSCO) data to generate topic maps of an academic discipline. A CSCO event is when 2 course-subjects are taught in the same academic year by the same teacher. A total of 61,856 CSCO events were extracted from the 2010-11 directory of the American Association of Law Schools and used to visualize the structure of law school education in the United States. Different normalization, ordination (layout), and clustering algorithms were compared and the best performing algorithm of each type was used to generate the final map. Validation studies demonstrate that CSCO produces topic maps that are consistent with expert opinion and 4 other indicators of the topical similarity of law school course-subjects. This research is the first to use CSCO to produce a visualization of a domain. It is also the first to use an expanded, multi-part gold standard to evaluate the validity of domain maps and the intermediate steps in their creation. It is suggested that the framework used herein may be adopted for other studies that compare different inputs of a domain map in order to empirically derive the best maps as measured against extrinsic sources of topical similarity (gold standards). Using 3 years of the Journal Citation Reports (2011, 2012, and 2013), indicators of transitions in 2012 (between 2011 and 2013) were studied using methodologies based on entropy statistics. Changes can be indicated at the level of journals using the margin totals of entropy production along the row or column vectors, but also at the level of links among journals by importing the transition matrices into network analysis and visualization programs (and using community-finding algorithms). Seventy-four journals were flagged in terms of discontinuous changes in their citations, but 3,114 journals were involved in hot links. 
Most of these links are embedded in a main component; 78 clusters (containing 172 journals) were flagged as potential hot spots emerging at the network level. An additional finding was that PLoS ONE introduced a new communication dynamic into the database. The limitations of the methodology were elaborated using an example. The results of the study indicate where developments in the citation dynamics can be considered as significantly unexpected. This can be used as heuristic information, but what a hot spot in terms of the entropy statistics of aggregated citation relations means substantively can be expected to vary from case to case. This paper examines the use of scientometric overlay mapping as a tool of strategic intelligence to aid the governing of emerging technologies. We develop an integrative synthesis of different overlay mapping techniques and associated perspectives on technological emergence across geographical, social, and cognitive spaces. To do so, we longitudinally analyze (with publication and patent data) three case studies of emerging technologies in the medical domain. These are RNA interference (RNAi), human papillomavirus (HPV) testing technologies for cervical cancer, and thiopurine methyltransferase (TPMT) genetic testing. Given the flexibility (i.e., adaptability to different sources of data) and granularity (i.e., applicability across multiple levels of data aggregation) of overlay mapping techniques, we argue that these techniques can favor the integration and comparison of results from different contexts and cases, thus potentially functioning as a platform for distributed strategic intelligence for analysts and decision makers. The publication performance of 30 scientometricians is studied. 
The individuals are classified into 3 cohorts according to their manifested professional recognition, as Price medalists (Pm), members of the editorial board of Scientometrics and the Journal of Informetrics (Rw), and session chairs (Sc) at an International Society of Scientometrics and Informetrics (ISSI) conference. Several core impact indicators are calculated: h, g, pi, citation distribution score (CDS), percentage rank position (PRP), and weight of influence of papers (WIP10). The indices significantly correlate with each other. The mean value of the indices of the cohorts decreases in parallel with the decrease in professional recognition: Pm > Rw > Sc. The 30 scientometricians studied were clustered according to the core impact indices. The members in the clusters so obtained overlap only partly with the members in the cohorts formed by professional recognition. The Total Overlap is calculated by dividing the sum of the diagonal elements in the cohorts-clusters matrix by the total number of elements, multiplied by 100. The highest overlap (76.6%) was obtained with the g-index. Accordingly, the g-index seems to have the greatest discriminative power in the system studied. The cohorts-clusters method may be used for validating scientometric indicators. Brazilian scholarly output has rapidly increased, accompanied by the expansion of domestic collaborations. In this paper, we identify spatial patterns of collaboration in Brazil and measure the role of geographic proximity in determining the interaction among researchers. Using a database comprising more than one million researchers and seven million publications, we consolidated information on interregional research collaboration in terms of scientific coauthorship networks among 4,615 municipalities during the period between 1992 and 2009, which allowed us to analyze a range of data unprecedented in the literature. 
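As an illustration, the Total Overlap calculation defined above (sum of the diagonal elements of the cohorts-clusters matrix divided by the total number of elements, times 100) can be sketched in a few lines of Python; the matrix counts below are hypothetical, not the study's data:

```python
# Total Overlap: percentage agreement between recognition cohorts
# (rows) and index-based clusters (columns).
# Hypothetical counts for 30 individuals, not the study's data.
cohorts_clusters = [
    [8, 1, 1],  # Price medalists (Pm)
    [2, 7, 1],  # editorial board members (Rw)
    [1, 2, 7],  # session chairs (Sc)
]

def total_overlap(matrix):
    """Sum of diagonal elements divided by the total, times 100."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * diagonal / total

print(round(total_overlap(cohorts_clusters), 1))  # 22 of 30 agree -> 73.3
```

With 22 of the 30 hypothetical individuals on the diagonal, the overlap is 73.3%; a perfectly diagonal matrix yields 100%.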
The effects of geographic distance on collaboration were measured for different areas by estimating spatial interaction models. The main results provide strong evidence of geographic deconcentration of collaboration in recent years, with increased participation of authors in scientifically less traditional regions, such as south and northeast Brazil. Distance remains a significant factor in determining the intensity of knowledge flow in collaboration networks in Brazil, as an increase of 100 km between two researchers reduces the probability of collaboration by an average of 16%, and there is no evidence that the effect of distance has diminished over time, although the magnitude of such effects varies among networks of different areas. This paper describes and evaluates an unsupervised and effective authorship verification model called SPATIUM-L1. As features, we suggest using the 200 most frequent terms of the disputed text (isolated words and punctuation symbols). Applying a simple distance measure and a set of impostors, we can determine whether or not the disputed text was written by the proposed author. Moreover, based on a simple rule we can define when there is enough evidence to propose an answer or when the attribution scheme is unable to make a decision with a high degree of certainty. Evaluations based on 6 test collections (PAN CLEF 2014 evaluation campaign) indicate that SPATIUM-L1 usually appears in the top 3 best verification systems, and on an aggregate measure, presents the best performance. The suggested strategy can readily be adapted to different Indo-European languages (such as English, Dutch, Spanish, and Greek) or genres (essay, novel, review, and newspaper article). It has been observed that Southeast Asian countries and universities have ranked poorly in global research productivity and impact. The same is true for the field of language and linguistics. 
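A minimal sketch of a verification step in the spirit of SPATIUM-L1 (an L1 distance over the disputed text's most frequent terms, compared against impostor texts) follows; the tokenization, profile construction, and accept rule are simplifying assumptions rather than the authors' exact implementation:

```python
from collections import Counter

def top_term_profile(text, n=200):
    """Relative frequencies of the n most frequent tokens in a text
    (whitespace tokenization is a simplification)."""
    counts = Counter(text.lower().split()).most_common(n)
    total = sum(c for _, c in counts)
    return {t: c / total for t, c in counts}

def l1_distance(disputed, candidate):
    """L1 distance computed over the disputed text's top-term profile."""
    p_d = top_term_profile(disputed)
    p_c = top_term_profile(candidate)
    return sum(abs(f - p_c.get(t, 0.0)) for t, f in p_d.items())

def verify(disputed, candidate, impostors):
    """Accept the proposed author only if the candidate text is closer
    to the disputed text than every impostor text (simplified rule)."""
    d = l1_distance(disputed, candidate)
    return all(d < l1_distance(disputed, imp) for imp in impostors)
```

In the full method, the margin between the candidate distance and the impostor distances would also decide when no confident answer can be given.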
Some studies revealed that productivity and citation patterns in this field are lower compared to other fields of study. Thus, this study sought to examine the research performance of SEA countries and universities in the field of language and linguistics for efficient policy-making. The research performance of each SEA country and university was assessed through the Scopus database using the following bibliometric indicators: total number of publications (P), total number of citations excluding self-citations (C), citations per publication (CPP), percent of non-cited articles (%PNC), and h-index. Findings revealed that SEA countries have produced only about 2% of all published articles in language and linguistics and hold around a 1% share of overall worldwide field citations. Interestingly, both the SEA countries and universities exhibit a trend toward increasing their yearly research productivity and citations. However, research productivity and citations in the field of language and linguistics are dominated by selected universities in each country, particularly in Brunei, Singapore, Malaysia, and the Philippines. This study has implications for research policy-making and future studies. The academic elite possesses outstanding abilities in terms of knowledge innovation, while also producing a spillover effect on other researchers. This study takes micro-level data from projects under the Management Science Sector of the National Natural Science Foundation of China between 2006 and 2010 to define three categories of funded elite, distinguished young elite, and Cheung Kong scholars; it also examines the correlation between a researcher's "elite" status and his or her individual project output in order to explore the elite's spillover effect on the knowledge output of other project principal investigators within the organization. We found that the three categories of elites had more output while they generated mixed spillover effects on their institute researchers' output. 
We conclude by discussing the reasons and policy implications behind this phenomenon. The ranking of scientific journals is important because of the signal it sends to scientists about what is considered most vital for scientific progress. Existing ranking systems focus on measuring the influence of a scientific paper (citations); these rankings do not reward journals for publishing innovative work that builds on new ideas. We propose an alternative ranking based on the proclivity of journals to publish papers that build on new ideas, and we implement this ranking via a text-based analysis of all published biomedical papers dating back to 1946. In addition, we compare our neophilia ranking to citation-based (impact factor) rankings; this comparison shows that the two ranking approaches are distinct. Prior theoretical work suggests an active role for our neophilia index in science policy. Absent an explicit incentive to pursue novel science, scientists underinvest in innovative work because of a coordination problem: for work on a new idea to flourish, many scientists must decide to adopt it in their work. Rankings that are based purely on influence thus do not provide sufficient incentives for publishing innovative work. By contrast, adoption of the neophilia index as part of journal-ranking procedures by funding agencies and university administrators would provide an explicit incentive for journals to publish innovative work and thus help solve the coordination problem by increasing scientists' incentives to pursue innovative work. Citation information helps researchers observe the evolution of knowledge. In scientific publications, a review paper discusses a professional field and thus tends to have more citations than general papers do. This study investigated whether specific characteristics of review papers induce different results in citation-based analysis. 
From the Scopus database, we collected scientific publications in a specific research field, e-tourism, to examine the role of review papers in citation-based analysis. The dataset includes 1421 publications covering the period 1988-2015. Empirical statistics show that review papers' specific citation patterns influence citation analysis. First, in the main path analysis, the result expresses review papers' integrative role in linking papers from diverse perspectives toward a clear mainstream topic. Second, in a well-defined research context, review papers introduce bias in citation-based clustering analysis because the specific high citation pattern in review papers obfuscates the grouping process. When using citation information in analysis, scholars must consider the purpose of the study and treat review papers distinctly to avoid bias when using certain analysis methods and datasets. Development of accurate systems to assess academic research performance is an essential topic in national science agendas around the world. Quantitative elements such as scientometric rankings and indicators have contributed to measuring the prestige and excellence of universities, but more sophisticated computational tools are seldom exploited. We compare the evolution of Mexican scientific production in Scopus and the Web of Science, and analyze Mexico's scientific productivity in relation to the growth of the National Researchers System of Mexico. As a main analysis tool we introduce an artificial intelligence procedure based on self-organizing neural networks. The neural network technique proves to be a worthy scientometric data mining and visualization tool which automatically carries out multiparametric scientometric characterizations of the production profiles of the 50 most productive Mexican Higher Education Institutions (in the Scopus database). 
With this procedure we automatically identify and visually depict clusters of institutions that share similar bibliometric profiles in bidimensional maps. Four perspectives were represented in scientometric maps: productivity, impact, expected visibility, and excellence. Since each cluster of institutions represents a bibliometric pattern of institutional performance, the neural network helps locate various bibliometric profiles of academic production and identify groups of institutions which have similar patterns of performance. Also, scientometric maps allow for the identification of atypical behaviors (outliers) which are difficult to identify with classical tools, since they stand out not because of a disparate value in just one variable, but due to an uncommon combination of a set of indicator values. This study aimed to assess endocrinologic and metabolic research productivity in East Asia (i.e., China, Japan, and South Korea) and correlations between socioeconomic factors and endocrinologic and metabolic research productivity. Articles (except editorials, conference abstracts, letters, news, and corrections) published in 134 endocrinology and metabolism journals in 2005-2014 were screened with the Web of Science database. Total and annual numbers of articles, study designs, impact factors, citations, and articles in high-impact-factor journals were determined for China, Japan, and South Korea. Annual numbers of articles were related to socioeconomic factors for each country. In 2005-2014, there were 144,660 articles published in endocrinology and metabolism journals, of which 10,190, 9,470, and 3,124 were from Japan, China, and South Korea, respectively. Japan published the most randomized controlled trials, followed by China and South Korea, respectively. China had the most articles in high-impact-factor journals, followed by Japan and South Korea, respectively. South Korea had the highest average impact factor and number of citations. 
During the period studied, annual numbers of articles from China and South Korea increased remarkably (P < 0.05) but remained stable for Japan. Annual numbers of articles from China and South Korea were positively correlated with gross domestic product and expenditure on health care (P < 0.05). The increase in endocrinology and metabolism articles during 2005-2014 in China and South Korea was associated with improved socioeconomic conditions. China has made progress in scientific publication in the past decade; however, there is still room for improvement. With the rise in new energy industries, electrochemical energy storage, which plays an important supporting role, has attracted extensive attention from researchers all over the world. To trace the electrochemical energy storage development history, determine the research theme and evolution path, and predict the future development directions, this paper will use CitNetExplorer to draw citation chronology charts and study the development trends in this field by analysing data downloaded from the Web of Science database. The results indicate that the research in this field originated from the study on energy storage materials and gradually divided into two major fields: energy storage materials and applications after 2000. The research on the energy storage materials refers to activated carbon materials, carbon nanotubes, graphene, and mesoporous carbon materials. Energy storage applications mainly focus on power systems, new energy vehicles, and wind farm dispatch. For research on electrochemical energy storage materials, the industrialization of graphene may become a new trending topic, and the application research will turn to the construction of energy Internet systems in the future. This paper will provide a full map for the development of electrochemical energy storage and forecast the future research directions in this field. 
In the solar energy field, scientists publish numerous scientific articles every year. Some are highly-cited, while others may not even be cited. In this paper, we introduce two underlying scientific properties of a paper to explain the paper's probability of being highly-cited or un-cited: scientific relatedness and intellectual base. We utilize two main network techniques, knowledge element coupling network (co-occurrence-based) and paper citation network (citation-based) analyses, to measure scientific relatedness and intellectual base, respectively. Moreover, we conduct descriptive analyses of un-cited and highly-cited papers at the country, organization, and journal levels. Then we map knowledge element co-occurrence networks and paper citation networks to compare the network characteristics of un-cited and highly-cited papers. Further, we use article data in the solar energy field between 2004 and 2010 to examine our hypotheses. Findings from ordered logit models indicate that when the scientific relatedness of a paper is high, the paper is more likely to be un-cited and less likely to be highly-cited. A paper with a stronger intellectual base is more likely to be highly-cited and less likely to be un-cited. Overall, this paper provides important insights into the determinant factors of a paper's citation levels, helping researchers maximize the scientific impact of their efforts. Several scientometric impact indicators [total citations, h-, g-, and pi-index, percentage rank position (PRP), and weighted citation share (WCS)] of 190 elite papers of 15 members of the Hungarian Academy of Sciences active in three different fields were calculated. Of the indices, the PRP indicator proved to be independent of the citation practices in the fields. 
The PRP index of a journal paper, expressed as a percentage, equals [1 - (r - 1)/n] × 100, where r is the rank number of the paper by citation frequency within the publishing journal and n is the total number of papers in the journal. The sum of the PRP indices of the elite papers of a scientist may characterize his or her total publication performance. The size of the elite set of journal papers within the total was calculated by different methods. The h-index and g-index correspond to the size of the elite, i.e., the number of elite papers according to the h-statistics and g-statistics, respectively. The number of papers in the pi-set is equal to the square root of the total number of papers. The pi-index equals one hundredth of the citations to the pi-set papers. In the present paper the size of the elite set is determined as the number of papers in the h-set, g-set, or pi-set, as 10% of total papers, or as the number of papers cited 2, 3, or 5 times the mean citation rate (MCR) of the publishing journal. The pi-citation threshold model is presented to demonstrate how MCR and the distribution of citations over the papers may influence the size of the elite set and the corresponding pi-index. It was found that the conclusions on scientific performance drawn from the pi-index obtained from elite sets of different sizes are in good agreement. Current research information systems (CRISs) offer great opportunities for scientometric studies of institutional research outputs. However, many of these opportunities have not been explored in depth, especially for the analysis of intra-institutional research collaboration. In this paper, we propose a hybrid methodology to analyze research collaboration networks with an underlying institutional structure. The co-authorship network extracted from the institutional CRIS of the Faculty of Sciences, University of Novi Sad, Serbia, is analyzed using the proposed methodology. 
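The verbal PRP definition given above reduces to a one-line function; the sketch below assumes rank 1 denotes the most cited paper in the journal:

```python
def prp_index(rank, n_papers):
    """Percentage rank position of a paper: 100 * (1 - (rank - 1) / n),
    where rank is the paper's citation rank within its publishing
    journal (rank 1 = most cited) and n_papers the journal's paper count."""
    return 100.0 * (1.0 - (rank - 1) / n_papers)

print(prp_index(1, 200))    # most cited paper in the journal -> 100.0
print(prp_index(101, 200))  # mid-ranked paper -> 50.0
```

Summing such percentages over a scientist's elite papers gives the aggregate performance measure described above.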
The obtained results show that the organizational structure of the institution has a profound impact on both inter- and intra-institutional research collaboration. Moreover, researchers involved in inter-department collaborations tend to be drastically more productive (by all considered productivity measures), more collaborative (measured by the number of co-authorship relations), and more institutionally important (in terms of betweenness centrality in the co-authorship network) compared to those who collaborate only with colleagues from their own research departments. Finally, our results indicate that quantifying research productivity by the normal counting scheme and the Serbian research competency index is biased towards researchers from physics and chemistry research departments. The paper exploits a newly created dataset offering detailed bibliometric data and indicators on 251 subject categories for a large sample of universities in Europe, North America, and Asia. In particular, it addresses the controversial issue of the distance between Europe and the USA in research excellence (the so-called "transatlantic gap"). By building indicators of objective excellence (top 10% worldwide in publications and citations) and subjective excellence (top 10% in the distribution of the share of top journals out of total production at the university level), it is shown that European universities fail to achieve objective measures of global excellence, being competitive in only a few fields. The policy implications of this state of affairs are discussed. This paper investigates the association between two new variables and citations in papers. These variables are the abstract ratio (the sum of repetitions of keywords in the abstract divided by the abstract length) and the weight ratio (the frequency of a paper's keyword per journal). The data consist of 5875 papers from 12 journals in education: three journals from each SCImago quartile. 
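The abstract ratio and weight ratio just defined can be sketched as follows; the whitespace tokenization and the per-journal reading of the weight ratio are simplifying assumptions on my part, not the study's exact operationalization:

```python
def abstract_ratio(abstract, keywords):
    """Occurrences of the paper's keywords in the abstract divided by
    the abstract length in words (single-word keywords assumed)."""
    words = abstract.lower().split()
    hits = sum(words.count(k.lower()) for k in keywords)
    return hits / len(words)

def weight_ratio(keyword, journal_keyword_lists):
    """Share of a journal's papers that list the keyword; one possible
    reading of 'frequency of a paper's keyword per journal'."""
    kw = keyword.lower()
    hits = sum(1 for kws in journal_keyword_lists if kw in [k.lower() for k in kws])
    return hits / len(journal_keyword_lists)
```

For example, a keyword appearing twice in a seven-word abstract yields an abstract ratio of 2/7.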
The researchers used semi-continuous regression to model the data and measure the impact of the proposed variables on citations. The results revealed that both the abstract ratio and the weight ratio are statistically significant predictors of citations in scientific articles in education. Ranking scientific authors is an important but challenging task, mostly due to the dynamic nature of evolving scientific publications. The basic indicators of an author's productivity and impact are still the number of publications and the citation count (leading to popular metrics such as the h-index, g-index, etc.). The h-index and its popular variants are mostly effective in ranking highly-cited authors, and thus fail to resolve ties while ranking the medium-cited and low-cited authors who are the majority. Therefore, these metrics are ineffective at predicting the ability of promising young researchers at the beginning of their career. In this paper, we propose the C-3-index, which combines the effect of citations and collaborations of an author in a systematic way, using a weighted multi-layered network to rank authors. We conduct our experiments on a massive publication dataset of Computer Science and show that: (1) C-3-index is consistent over time, which is one of the fundamental characteristics of a ranking metric; (2) C-3-index is as efficient as the h-index and its variants at ranking highly-cited authors; (3) C-3-index can act as a conflict-resolution metric to break ties in the ranking of medium-cited and low-cited authors; (4) C-3-index can also be used to predict future achievers at the early stage of their career. Human-computer interaction (HCI) is a research field which engages different disciplines, interest groups and communities, and which has emerged in different countries at different times. 
To understand how the HCI research community has evolved in Brazil, this paper applies data and visual analytics to its main conference series, the Brazilian Symposium on Human Factors in Computing Systems, henceforth IHC. We have explored the metadata of all 340 full papers published in the 14 editions of IHC. Our goal was to investigate the evolution of the Brazilian HCI community so we can raise the level of "self-knowledge" and thus discuss strategies that can further help develop this research community. From our analysis, we could understand more deeply the authorship profile of our community and how it has changed over time, the evolution of co-authorship networks, the prominent institutions and states, the reference profile, and the research topics over time. We hope that this paper will contribute to inspiring other scientific communities to analyze themselves, and encourage their own discussions. 'Sleeping beauty' is a term used to describe a research article that has remained relatively uncited for several years and then suddenly blossoms. New technology now allows us to detect such articles more easily than before, and sleeping beauties can be found in numerous disciplines. In this article we describe three sleeping beauties that we have found in psychology: Stroop (J Exp Psychol 18:643-662, 1935), Maslow (Psychol Rev 50(4):370-396, 1943), and Simon (Psychol Rev 63(2):129-138, 1956). Much academic research is never cited and may be rarely read, indicating wasted effort from the authors, referees, and publishers. One reason that an article could be ignored is that its topic is, or appears to be, too obscure to be of wide interest, even if excellent scholarship produced it. This paper reports a word frequency analysis of 874,411 English article titles from 18 different Scopus natural, formal, life and health sciences categories 2009-2015 to assess the likelihood that research on obscure (rarely researched) topics is less cited. 
In all categories examined, unusual words in article titles are associated with below-average citation impact. Thus, researchers considering obscure topics may wish to reconsider, generalise their study, or choose a title that reflects the wider lessons that can be drawn. Authors should also consider including multiple concepts and purposes within their titles in order to attract a wider audience. Despite the importance and magnitude of biomedical research, little is known about its development and responsiveness to current health needs. Herein, we characterized the evolution of disease-specific biomedical research and assessed the alignment of research and translational efforts with disease burden. Publication patterns for approximately 2700 diseases indicated a fluid landscape of modern biomedical interests. In studying a subset of diseases with available data, overall measures of disease burden explained a large fraction of publication variance but only a small portion of NIH funding variance. In addition, discrete measures of mortality and morbidity differentially impacted NIH funding levels, research efforts, and the number of clinical trials in the US. Our findings not only scrutinize the relevance of our current biomedical enterprise, but may also serve as a resource for fostering strategies that adequately prepare the scientific community to address future health needs and promote accountability in the allocation of resources. This bibliometric analysis focuses on the general history of climate change research and, more specifically, on the discovery of the greenhouse effect. First, Reference Publication Year Spectroscopy (RPYS) is applied to a large publication set on climate change of 222,060 papers published between 1980 and 2014. The references cited therein were extracted and analyzed with regard to the publications which are cited most frequently. 
Second, a new method for establishing a more subject-specific publication set for applying RPYS (based on the co-citations of a marker reference) is proposed (RPYS-CO). The RPYS of the climate change literature focuses on the history of climate change research as a whole. We identified 35 highly cited publications across all disciplines, which include fundamental early scientific works of the nineteenth century (with a weak connection to climate change) and some cornerstones of science with a stronger connection to climate change. By using the Arrhenius (Philos Mag J Sci Ser 5(41):237-276, 1896) paper as an RPYS-CO marker paper, we selected only publications specifically discussing the discovery of the greenhouse effect and the role of carbon dioxide. Using different RPYS approaches in this study, we were able to identify the complete range of works of the celebrated icons as well as many lesser-known works relevant to the history of climate change research. The analyses confirmed the potential of the RPYS method for historical studies: seminal papers are detected on the basis of the references cited by the overall community, without any further assumptions. Book reviews play important roles in scholarly communication, especially in arts and humanities disciplines. By using Web of Science's Science Citation Index Expanded, Social Sciences Citation Index, and Arts & Humanities Citation Index, this study empirically probed the patterns and dynamics of book reviews within these three indexes during the past decade (2006-2015). We found that the absolute numbers of book reviews among all three indexes were relatively stable but the relative shares were decreasing. Book reviews were very common in arts and humanities, common in social sciences, but rare in natural sciences. Book reviews are mainly contributed by authors from developed economies such as the USA and the UK. By contrast, scholars from China and Japan are unlikely to contribute book reviews. 
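The core RPYS step described in the climate-change analysis above can be sketched in a few lines: count the cited references by their publication year and flag years whose counts stand out from a local median. This is a minimal illustrative sketch, not the authors' implementation; the window size and the median-deviation measure are my assumptions.

```python
from collections import Counter
from statistics import median

def rpys_spectrum(cited_years, window=5):
    """Reference Publication Year Spectroscopy (sketch).

    cited_years: publication years of all references cited by the
    analyzed publication set. Returns, per year, the number of cited
    references and its deviation from the median of a surrounding
    window; peaks in the deviation point to historically important
    years (and hence candidate seminal papers).
    """
    counts = Counter(cited_years)
    half = window // 2
    spectrum = {}
    for y in sorted(counts):
        neighborhood = [counts.get(y + d, 0) for d in range(-half, half + 1)]
        spectrum[y] = (counts[y], counts[y] - median(neighborhood))
    return spectrum
```

A year such as 1896 (the Arrhenius paper) would show up as a sharp positive deviation if many references in the set cite works published that year.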
Perpetuation of retracted publications is an ongoing and even increasing problem in the scientific community. In addition to the direct distortion of scientific credibility, the use of retracted findings for interpretation and discussion in subsequent publications poses the risk of drawing false and, for example in medical research, even harmful conclusions. One major contributor to this development is that many authors are not aware of the retraction status of a paper they cite. COPE guidelines state that the "retracted status should be indicated as clearly as possible", but this is definitely not true for many retracted publications. Likewise, databases do not consistently link retracted articles with the notice of retraction. Furthermore, many papers are deposited in the "original", i.e. pre-retraction, version on personal or institutional websites or online repositories. Similarly, printed "stock files" are obviously unaffected by a retraction. Clear identification of a retracted article using a watermark and in databases is a crucial step, while incorporation of an electronic "retraction check" in reference management software and during online submission is necessary to detect and avoid citing retracted literature. Solving this problem requires the close attention of everybody involved in the publishing process: authors, reviewers, and publishers. In this article, we compare publication and citation coverage of the new Microsoft Academic with all other major sources for bibliometric data: Google Scholar, Scopus, and the Web of Science, using a sample of 145 academics in five broad disciplinary areas: Life Sciences, Sciences, Engineering, Social Sciences, and Humanities. 
When using the more conservative linked citation counts for Microsoft Academic, this data source provides higher citation counts than both Scopus and the Web of Science for Engineering, the Social Sciences, and the Humanities, whereas citation counts for the Life Sciences and the Sciences are fairly similar across these three databases. Google Scholar still reports the highest citation counts for all disciplines. When using the more liberal estimated citation counts for Microsoft Academic, its average citation counts are higher than both Scopus and the Web of Science for all disciplines. For the Life Sciences, Microsoft Academic estimated citation counts are higher even than Google Scholar counts, whereas for the Sciences they are almost identical. For Engineering, Microsoft Academic estimated citation counts are 14% lower than Google Scholar citation counts, whereas for the Social Sciences the difference is 23%. Only for the Humanities are they substantially (69%) lower than Google Scholar citation counts. Overall, this first large-scale comparative study suggests that the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis. We therefore conclude that the Microsoft Academic Phoenix is undeniably growing wings; it might be ready to fly off and start its adult life in the field of research evaluation soon. Policymaking implies planning, and planning requires prediction, or at least some knowledge about the future. This contribution starts from the challenges of complexity, uncertainty, and agency, which preclude the prediction of social systems, especially where new knowledge (scientific discoveries, emergent technologies, and disruptive innovations) is involved as a radical game-changer. It is important to be aware of the fundamental critiques that approaches and fields such as Technology Assessment, the Forrester World Models, Economic Growth Theory, and the Linear Model of Innovation have received in the past decades. 
It is likewise important to appreciate the limitations and consequences these diagnoses pose for science, technology and innovation policy (STI policy). However, agent-based modeling and simulation now provide new options to address the challenges of planning and prediction in social systems. This paper will discuss these options for STI policy with a particular emphasis on the contribution of the social sciences, both in offering theoretical grounding and in providing empirical data. Fields such as Science and Technology Studies, Innovation Economics, and the Sociology of Knowledge/Science/Technology inform agent-based simulation models in such a way that realistic representations of STI policy worlds can be brought to the computer. These computational STI worlds allow scenario analysis, experimentation, policy modeling and testing prior to any policy implementation in the real world. This contribution will illustrate this for the area of STI policy using examples from the SKIN model. Agent-based simulation can help us shed light into the darkness of the future: not by predicting it, but by coping with the challenges of complexity, understanding the dynamics of the system under investigation, and finding potential access points for planning its future, offering "weak prediction". Modern science has become collaborative and digital. The Internet has supported the emergence of scientific digital platforms that globally connect programmers and users of novel digital scientific products such as scientific interactive software tools. These digital scientific innovations complement traditional text-based products like journal publications. This article focuses on the scientific impact of a platform's programming community that produces these digital scientific innovations. 
The article's main theoretical argument is that, beyond an individual's contribution efforts to these innovations, a new social structure affects their scientific recognition through citations of their tools in text-based publications. Taking a practice theory lens, we introduce the concept of a digital practice structure that emerges from the digital innovation work practice performed by programmers who jointly work on a tool. This digital practice creates dependence forces among the community members, in analogy to Newton's concept of gravity. Our model represents such dependencies in a spatial autocorrelative model. We empirically estimate this model using data from the programming community of nanoHUB, in which 477 nanotechnology tool programmers have contributed more than 715 million lines of code. Our results show that a programmer's contributions to digital innovations may have positive effects, while the digital practice structure creates negative dependency effects. Colloquially speaking, being surrounded by star performers can be harmful. Our findings suggest that modeling scientific impact needs to account for a scientist's contribution to programming communities that produce digital scientific innovations and for the digital work structures in which these contributions are embedded. Technology is a complex system, with technologies relating to each other in a space that can be mapped as a network. The technology network's structure can reveal properties of technologies and of human behavior, if it can be mapped accurately. Technology networks have been made from patent data using several measures of proximity. These measures, however, are influenced by factors of the patenting system that do not reflect technologies or their proximity. 
We introduce a method to precisely normalize out multiple impinging factors in patent data and extract the true signal of technological proximity by comparing the empirical proximity measures with what they would be in random situations that remove the impinging factors. With this method, we created technology networks using data from 3.9 million patents. After normalization, different measures of proximity became more correlated with each other, approaching a single dimension of technological proximity. The normalized technology networks were sparse, with few pairs of technology domains being significantly related. The normalized network corresponded with human behavior: we analyzed the patenting histories of 2.8 million inventors and found they were more likely to invent in two different technology domains if the pair was closely related in the technology network. We also analyzed the patents of 250,000 firms and found that, in contrast with inventors, firms' inventive activities were only modestly associated with the technology network; firms' portfolios combined pairs of technology domains about twice as often as inventors. These results suggest that controlling for impinging factors provides meaningful measures of technological proximity for patent-based mapping of the technology space, and that this map can be used to aid in technology innovation planning and management. There is increasing pressure on scholars to publish to further or sustain a career in academia. Governments and funding agencies are greedy for indicators based on scientific production to measure science output. But what exactly do we know about the relation between publication levels and advances in science? How do social dynamics and norms interfere with the quality of scientific production? Are there different regimes of scientific dynamics? 
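One simple way to realize the idea of comparing an empirical proximity measure with its value under random conditions, as described for the patent networks above, is to score each domain pair against a hypergeometric null model in which patents are assigned to domains at random while domain sizes are preserved. This is a sketch under my own simplifying assumptions, not the paper's exact normalization procedure.

```python
import math

def proximity_zscore(n_ij, n_i, n_j, n_total):
    """Normalized technological proximity (illustrative sketch).

    Compares the observed number n_ij of patents classified in both
    domains i and j with what random assignment would produce, given
    the domain sizes n_i, n_j and the total patent count n_total
    (hypergeometric null model). Large positive z-scores indicate
    domains that are genuinely related rather than merely large.
    """
    expected = n_i * n_j / n_total
    # Hypergeometric variance, symmetric in i and j.
    variance = expected * (1 - n_i / n_total) * (n_total - n_j) / (n_total - 1)
    if variance == 0:
        return 0.0
    return (n_ij - expected) / math.sqrt(variance)
```

A pair co-occurring exactly as often as chance predicts scores 0; significantly related pairs score well above it, which is what makes the resulting network sparse.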
The present study proposes some concepts for thinking about scientific dynamics, through modeling the relation between science policies and scholars' exploration-exploitation dilemmas. In passing, we analyze in detail the effects of the "publish or perish" policy, which turns out to have no significant effect on the development of emerging scientific fields, while having detrimental impacts on the quality of production in mature fields. Paradigms and revolutions are popular concepts in science studies and beyond, yet their meaning is notoriously vague and their existence is widely disputed. Drawing on recent developments in agent-based modeling and scientometric data, this paper offers a precise conceptualization of paradigms and their dynamics, as well as a number of hypotheses that could in principle be used to test for the existence of scientific revolutions in scientometric data. This paper presents a novel model of science funding that exploits the wisdom of the scientific crowd. Each researcher receives an equal, unconditional part of all available science funding on a yearly basis, but is required to individually donate to other scientists a given fraction of all they receive. Science funding thus moves from one scientist to the next in such a way that scientists who receive many donations must also redistribute the most. As the funding circulates through the scientific community, it is mathematically expected to converge on a funding distribution favored by the entire scientific community. This is achieved without any proposal submissions or reviews. The model furthermore funds scientists instead of projects, reducing much of the overhead and bias of the present grant peer review system. Model validation using large-scale citation data and funding records over the past 20 years shows that the proposed model could yield funding distributions similar to those of the NSF and NIH, and the model could potentially be fairer and more equitable. 
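The circulation dynamic of the crowd-funding model just described can be illustrated with a toy simulation. The base amount, donation fraction, and iterate-to-convergence approach below are my own assumptions for illustration; the paper's model is defined mathematically, not by this code.

```python
def circulate_funding(donation_prefs, base=100.0, fraction=0.5, years=200):
    """Self-organizing funding model (sketch of the idea in the text).

    donation_prefs[i][j] is the share of scientist i's mandatory
    donation that goes to scientist j (rows sum to 1, zero diagonal).
    Every year each scientist receives an equal base amount plus
    donations, and must pass on `fraction` of everything received.
    Returns the per-year funding each scientist keeps, which
    converges to a distribution weighted by community donations.
    """
    n = len(donation_prefs)
    received = [base] * n
    for _ in range(years):
        donated = [fraction * r for r in received]
        incoming = [
            sum(donated[i] * donation_prefs[i][j] for i in range(n))
            for j in range(n)
        ]
        received = [base + inc for inc in incoming]
    return [(1 - fraction) * r for r in received]
```

Note that total kept funding always equals the total base budget: the scheme only redistributes, so heavily endorsed scientists end up keeping more without any proposals or reviews.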
We discuss possible extensions of this approach as well as science policy implications. This article analyses the evolution in the number of authors of scientific publications in computer science (CS). The analysis is based on a framework that structures CS into 17 constituent areas, proposed by Wainer et al. (Commun ACM 56(8):67-63, 2013), so that indicators can be calculated for each one in order to make comparisons. We collected and mined over 200,000 article references from 81 conferences and journals in the considered CS areas, spanning a 60-year period (1954-2014). The main insight of this article is that all CS areas witness an increase in the average number of authors in every decade, with just one slight exception. We ordered the article references by number of authors, in ascending chronological order, and grouped them into decades. For each CS area, we provide a perspective on how many groups (1-author papers, 2-author papers and so on) must be considered to reach certain proportions of the total for that CS area, e.g., the 90th and 95th percentiles. Different CS areas require different numbers of groups to reach those percentiles. For all 17 CS areas, an analysis of the point in time at which publications with n + 1 authors overtake publications with n authors is presented. Finally, we analyse the average number of authors and its rate of increase. A Sleeping Beauty (SB) is a publication that goes unnoticed for a long time and then, almost suddenly, is awakened by a 'prince' (PR), attracting from then on a lot of attention in terms of citations. Although there are some studies on the SB and PR phenomena in the sciences, barely any research on this topic has been conducted in the social sciences, let alone in innovation studies. 
Based on 52,373 articles extracted from the Web of Science and using a new method that, compared with extant methods, selects SBs with the highest scientific impact, we found that, as in the sciences, SBs are rare in the field of innovation (<0.02%). In contrast with the sciences, the depth of sleep is relatively small, ranging from 7 to 17 years. All 8 SBs found, and the 37 corresponding princes, were published in highly renowned journals (e.g., Harvard Business Review, Journal of Management Studies, Organization Studies, Rand Journal of Economics, Research Policy). The explanations for the delayed recognition are associated with innovative methods, scientific resistance, and theoretical relatedness. The role of highly influential authors and self-awakening mechanisms were critical triggers for bringing SBs to scientific prominence. What is academic research efficiency, and what determines the differences between scholars' academic research efficiency? The literature on this topic has grown exponentially during the last decades. However, the divergence of the approaches used, the differences in the bundles of outputs and inputs considered to estimate the efficiency frontiers, and the differences in the predictors of efficiency variability among scholars considered in prior studies make it worthwhile to have an overview of the literature dedicated to this topic. Relying on a systematic review of empirical studies published between 1990 and 2012, this article proposes and discusses a framework which brings together a set of outputs and inputs related to academic research efficiency, and the individual, organizational, and contextual factors driving or hampering it. The ensuing results highlight several avenues which would help university administrators and policy makers to better foster academic research efficiency, and researchers to better channel their efforts in studying the phenomenon. 
This work proposes an entropy-based disciplinarity indicator (EBDI) which allows the classification of scientific journals into four classes: knowledge importer, knowledge exporter, disciplinary and interdisciplinary, with regard to the discipline(s) in which they are classified. Assuming that the set of references in the papers published in a journal represents a significant part of their knowledge basis, the diversity (measured with Shannon's entropy) and the ratio between internal and external (to the discipline in which the journal is classified) references can provide a measure of the disciplinarity/interdisciplinarity of the journal in the reference dimension. An analogous analysis can be applied to the set of citations received by the papers published in the journal. In this article, an entropy-based indicator for the measurement of the disciplinarity of scientific journals is developed, applied (to the cited and citing dimensions) and discussed. The indicator takes finite values and is found to be theoretically consistent when tested against two definitions for bibliometric indicators. The combination of disciplinarity values in the citing and cited dimensions permits the classification of journals according to their knowledge importing/exporting profile (separately, with regard to the social sciences or the sciences), providing a taxonomy of the role of journals according to their importing, exporting, interdisciplinary or specialized profile with regard to the subject category in which they are classified. The EBDI and the resulting taxonomy are proposed and tested for the set of journals in the LIS subject category in JCR 2013 and for the sets of journals in Andrology and Legal Medicine in JCR 2015. Evidence of concurrent validity with journal co-classification patterns is found in all three sets of journals. 
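As a rough illustration of how a Shannon-entropy-based disciplinarity score over a journal's references can be computed: the normalization below and the way the internal share is combined with the external spread are my own assumptions, not the EBDI formula from the paper.

```python
import math

def disciplinarity(internal_refs, external_refs_by_discipline):
    """Entropy-based disciplinarity score (sketch; not the exact EBDI).

    internal_refs: number of references staying inside the journal's
    own discipline. external_refs_by_discipline: counts of references
    going to each other discipline. A journal citing mostly its own
    discipline scores near 1 (disciplinary); one spreading references
    evenly across many disciplines scores near 0 (interdisciplinary).
    """
    total = internal_refs + sum(external_refs_by_discipline.values())
    if total == 0:
        return None
    shares = [internal_refs / total] + [
        n / total for n in external_refs_by_discipline.values() if n > 0
    ]
    entropy = -sum(p * math.log2(p) for p in shares if p > 0)
    max_entropy = math.log2(len(shares)) if len(shares) > 1 else 1.0
    return 1 - entropy / max_entropy
```

Applying the same function to the disciplines of citing papers gives the cited-dimension score, and the pair of values places a journal in the importer/exporter/disciplinary/interdisciplinary taxonomy described above.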
Maturity models have been increasingly adopted by researchers from various research areas in order to assess the strengths and weaknesses of a system and/or a process and to develop roadmaps for improvement. Additionally, they support the development of a process toward desirable goals, such as a set of resources or practices, resulting in a more mature organization or system. This study provides an overview of the literature on maturity models and their main features, employing bibliometric analyses of publications from 2004 through 2014 with the aim of identifying scientific gaps that would serve as guides for future research. The results of the analyses reveal that maturity models are widely applied in project management for process and/or system improvement, and that they can also be described as adaptable models usable in various research areas, as can be seen in the section on scientific gaps. Therefore, the main contribution of this paper is to make the evolution of this subject and its importance better known, thus encouraging the use of maturity models in future research seeking the improvement of current models. In recent decades, the internationalization of research activities has increased, as demonstrated by the phenomena of international scientific collaboration and international mobility of researchers. This paper investigates whether international scientific collaboration is explained by researchers' motivation as well as by their international migration. Using metadata from papers published in Nature and Science from 1989 to 2009, count data estimation was conducted. The results illustrate that researchers' international migration and motivation, shown by both synergy and difference effects between countries, explain international collaboration. This implies that international co-authorship in recent decades has been based on researchers' motivation as well as their networking. 
The positive result for synergy effects also means that pairs of countries with rich research environments tended to have more international collaboration, which may lead to the convergence of qualified research output in advanced scientific countries. Our findings also support the conclusion that researchers move to countries with better research environments, but networks created through international collaboration are not a factor in international migration. The relationship between international mobility and collaboration is confirmed as going in one direction, from mobility to collaboration. This study addresses an early case of an association between a local journal and a commercial publisher in Latin America striving to improve quality. The two journals examined are Archivos de Investigación Médica (AIM), 1970-1991, and its continuation as Archives of Medical Research (AMR), 1992-2014. The aim is to characterize and compare the publishing policies, patterns of scientific communication and bibliometric indicators developed under the two different types of publication: AIM as a source of local dissemination and the commercially circulated AMR. Publishing policies, production, and citations were identified in accordance with coverage by the Web of Science and Scopus indexes. The papers and citations were grouped into three categories according to the author's affiliation: local, regional, and external. This categorization resulted in different combinations of correlations between cited and citing papers, in addition to a distribution of collaborative production and citations by country, organized by continent. The comparison of results reveals two successful publishing projects; however, editorial practices create an irreconcilable tension between the AIM objectives as a regional journal and the AMR objectives as a mainstream journal, in accordance with dominant indicators of international competition. 
Some implications of this situation are discussed in the context of Latin American and Caribbean journals. Information networks, especially citation networks, have many proven and potential applications in scientometrics. Identification of the productivity of authors and journals is one of the prime concerns of analysts. While there are many indices to measure the productivity of an author or journal, there is no known index to determine productivity with respect to a particular research context. A network scientometric approach is devised to address the identification of contextual productivity. Work-author and Work-journal affiliations, modelled as two-mode networks, provide effective means to assess the productivity of authors and journals in a particular research context. In this work, weighted two-mode networks are created for the analysis of affiliation networks such that the weights reflect citation characteristics of the works in their original citation network. A set of network indices is proposed for the assessment of the contextual importance of authors and journals, illustrated in a case study of Biotechnology for Engineering. Online databases and digital libraries can use these indices to present insights about the most productive authors and journals along with the search results. Links between institutional academic performance and academic resources are of relevance for university managers, country officials and the public at large. This study aims to shed light on the issues using reliable data on research performance indicators as well as educational and resource indicators from research universities in Spain, Italy, Australia and Canada. The four countries selected for the study represent different academic traditions and belong to different geopolitical regions, yet they have relatively similar higher education systems in terms of student population, institutional resources and research production. 
Our study explores differences and similarities among them to better assess the performance of research universities from the four countries in a global context. The indicator set includes research production (number of indexed articles per year) and its quality (citation impact and number of highly cited papers), education production (full-time equivalent (FTE) student load and degree completions per year) and the resource base (annual ordinary expenditure and FTE number of faculty). We consider the raw indicators as well as a set of composite indicators normalised by measures of scale. Across the profile of universities in our complete sample, institutional size is the prime determinant of research production, with systematic differences in quality related to country, research intensity and resourcing level. Our data show that research universities allocate resources to research and education in country-specific and size-specific ways that are reflected in research performance. Citation and coauthor networks offer an insight into the dynamics of scientific progress. We can also view them as representations of a causal structure, a logical process captured in a graph. From a causal perspective, we can ask questions such as whether authors form groups primarily due to their prior shared interest, or if their favourite topics are 'contagious' and spread through co-authorship. Such networks have been widely studied by the artificial intelligence community, and recently a connection has been made to nonlocal correlations produced by entangled particles in quantum physics: the impact of latent hidden variables can be analyzed by the same algebraic geometric methodology that relies on a sequence of semidefinite programming (SDP) relaxations. Following this trail, we treat our sample coauthor network as a causal graph and, using SDP relaxations, rule out latent homophily, understood as prior shared interest alone, as the explanation for the observed pattern of collaboration. 
By introducing algebraic geometry to citation studies, we add a new tool to existing methods for the analysis of content-related social influences. The main purpose of this study is to reveal China's regional disparities in both research output and preferred research areas. We investigated the research outputs of all 31 regions (27 provinces and 4 municipalities) in mainland China. The investigated dataset was sourced from CNKI, one of China's largest domestic academic databases. To measure research preferences between regions, we used cosine distance rather than Euclidean distance. A clustering method was employed to classify the regions according to their similarity/disparity. In the end, six clusters were generated, each differing in research preferences. For example, Inner Mongolia in Cluster D is characterized by an emphasis on animal husbandry, while Hubei province in Cluster A is characterized by a wide range of research areas. Science is a societal process, built on widely accepted general rules which facilitate its development. Productive researchers are viewed from the perspective of a social network of their interpersonal relations. In this paper we address the performance of the Slovenian research community using bibliographic networks between the years 1970 and 2015, from various aspects which determine prolific science. We focus on basic determinants of research performance, including productivity, collaboration, internationality, and interdisciplinarity. For each of the determinants, we select a set of statistics and network measures to investigate its state in every year of the analyzed period. The analysis is based on high-quality data from manually curated information systems. We interpret the results by relating them to important historical events impacting Slovenia and to domestic expenditure for research and development. 
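The reason cosine distance suits the regional comparison described above is that it depends only on the mix of research areas, not on overall output volume, so a large and a small region with the same research profile are treated as identical. A minimal sketch (the area names in the usage are illustrative, not from the study):

```python
import math

def cosine_distance(profile_a, profile_b):
    """Cosine distance between two regional research-output profiles.

    Each profile maps research area -> publication count. Returns 0
    for identical mixes (regardless of scale) and 1 for profiles
    with no areas in common.
    """
    areas = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(k, 0) * profile_b.get(k, 0) for k in areas)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1 - dot / (norm_a * norm_b)
```

Feeding the pairwise distances into any standard clustering routine then groups regions by research preference, as in the six clusters reported.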
Our results clearly demonstrate causal relations between the performance of the research community and changes in wider society. Political and financial stability, together with consistent measurement of scientific productivity established soon after Slovenia gained independence from Yugoslavia in 1991, had a positive influence on all determinants. These were further leveraged by the foundation of the Slovenian research agency and by joining the EU and NATO. The publish-or-perish phenomenon, the negative impacts of the financial crisis in 2008-2014 and the reshaping of domestic expenditure for research and development after 2008 also have a clear response in the scientific community. In the paper, we also study researchers' career productivity cycles and present an analysis of career productivity for all registered researchers in Slovenia. This study aims to observe researchers' behavior in Iranian scientific databases to determine the research gaps and priorities in their field of research. Text mining and natural language processing techniques were used to identify what researchers are looking for and to analyze existing research works. In this paper, information about the behavior of researchers who work in the field of environmental science, and the existing research works in the Iranian scientific database, is processed. The search trends in all areas are evaluated by analyzing the users' search data. The trend analysis indicates that in the period from February 2013 to July 2015, the growth of researchers' requests in some environmental domains such as Industry, Training, Assessment, Material, Water and Pollution was 1.5 to 2 times greater than that of overall requests. A combination of the trend analysis and clustering of queries led to the shaping of four priority zones. Then, the research priorities for each environmental research area were determined. 
The results show that Training, Pollution, Rangeland, Management and Law are the environmental research domains with the most research gaps in Iran, while there is enough research in the Forest, Soil and Industry domains. Finally, we describe the steps for the implementation of a decision support system in environmental research management. Researchers, managers and policy makers can use this proposed "research demand and supply monitoring" (RDSM) system to make appropriate decisions and allocate their resources more efficiently. Tracing the trajectory of scientific fields has long been recognized as important by informaticians; nonetheless, little effort has been dedicated to understanding the evolution of the fast-moving research field of transport, quantitatively and qualitatively. This paper identifies intellectual turning points and emerging trends in the area of transport. Using bibliometric methods, co-keyword networks, journal co-citation networks, highly cited categories, and country and institute networks are detected, visualized and discussed. To conduct this analysis, all publications (35,712) in 23 top journals in the field of transport were extracted from the Institute for Scientific Information (Web of Science). The output of this article could be a valuable source for academics and practitioners working in the field of transport planning and for those who work in areas with a strong relationship to transport issues, including mathematics, economics, operations research, management and geography. We investigate the question of how long top scientists retain their stardom. We observe the research performance of all Italian professors in the sciences over three consecutive four-year periods between 2001 and 2012. The top scientists of the first period are identified on the basis of research productivity, and their performance is then tracked through time. 
The analyses demonstrate that more than a third of the nation's top scientists maintain this status over the three consecutive periods, with higher shares occurring in the life sciences and lower ones in engineering. Compared to males, females are less likely to maintain top status. There are also regional differences: top status is less likely to survive in southern Italy than in the north. Finally, we investigate the longevity of unproductive professors, and then check whether the career progress of the top and unproductive scientists is aligned with their respective performances. The results appear to have implications for national policies on academic recruitment and advancement. This paper reports the results of an analysis of patent citation and patent renewal data, advancing a log-linear relation between patent citations and patent value. A complementary analysis of firms' patent portfolios confirms that modelling the relation between citations and firm value benefits from the adoption of the log-linear form. Sustainable development (SD) was posited almost three decades ago by the World Commission on Environment and Development (WCED) as an integrated approach for addressing concerns regarding a number of environmental and socio-economic issues. To represent the knowledge structure and evolution of SD in the post-WCED era, this paper used CiteSpace to identify and visualize cited references and keyword networks, the distribution of categories and countries, and highly cited references relating to SD research. Two indicators embedded in CiteSpace were introduced to investigate intellectual turning points and pivotal points in order to outline the emerging trends, and furthermore, a new indicator (BC x CB) was developed and applied for keyword analysis. Our findings were as follows. First, the United States and UK occupy dominant positions in relation to SD studies in general, while China records the highest publication counts. 
Second, the concept of natural capital has contributed significantly to interpretations of SD, and the promising disciplinary frontiers detected are the materials category and the social sciences. Lastly, keyword analysis reveals the most valuable keywords under the BC x CB measure, and citation maps and visible hot research areas are revealed as well. Bibliographic coupling (BC) is an effective measure for estimating the similarity between two scholarly articles (i.e., the inter-article similarity between the two articles). It works on the out-link references of articles (i.e., those references cited by the articles), and is essential for relatedness analysis and topic clustering of scholarly articles. In this paper, we present a new BC measure, DescriptiveBC, which employs the titles of the out-link references to improve BC in two ways: given a target article a, DescriptiveBC provides more accurate information about how (based on numerical inter-article similarity) and why (based on textual descriptive terms) a scholarly article is related to a. Visualization of this information can support the identification, clustering, mapping, and navigation of the related evidence in scientific literature. Empirical evaluation justifies the contributions of DescriptiveBC. Release of the reference titles in each article is thus helpful for the dissemination of research findings in scientific literature, and DescriptiveBC can be incorporated into search engines for scholarly articles to help prospective researchers navigate the space of related articles online. When the meaning of key terms is incompatible across competing taxonomies, a revolution might occur in the field, by which the established taxonomy is replaced with another. Since the key term "impact" in scientometrics seems to be undergoing a taxonomic change, a revolution might be taking place at present: impact is no longer defined as impact on science alone (measured by citations), but on all sectors of society (e.g. 
economics, culture, or politics). In this Short Communication, we outline that the current revolution in scientometrics implies not only a broadening of the impact perspective, but also the devaluation of quality considerations in evaluative contexts. Impact might no longer be seen as a proxy for quality, but in its original sense: the simple resonance in some sectors of society. In the last three decades, several vernacular names of medicinal plants related to the names of manufactured drugs have been recognized in ethnobotanical surveys throughout Brazil. Medicalization may be the process primarily responsible for the rise of this type of vernacular name of Brazilian medicinal plants, differing for each geopolitical region of Brazil. We attempt to trace the regionalization of medicalization in the vernacular names of medicinal plants through ethnobotanical studies carried out in Brazil since the 1980s. Articles were consulted in nine journals published between 1980 and 2014. Richness estimation by Jackknife 1 and correspondence analysis by contingency tables were performed, both based on the occurrence of medicalized names collected in the surveys for each region. The South region presented the highest number of reported and estimated medicalized names, in addition to presenting the highest number of medicalized names in exclusive occurrence. The Northeast and Southeast regions presented a great similarity of medicalized names, probably due to the migration flows occurring between both regions over the twentieth century. To provide users with insight into the value and limits of world university rankings, a comparative analysis is conducted of five ranking systems: ARWU, Leiden, THE, QS and U-Multirank. 
It links these systems with one another at the level of individual institutions, and analyses the overlap in institutional coverage, geographical coverage, how indicators are calculated from raw data, the skewness of indicator distributions, and statistical correlations between indicators. Four secondary analyses are presented, investigating national academic systems and selected pairs of indicators. It is argued that current systems are still one-dimensional in the sense that they provide finalized, seemingly unrelated indicator values rather than offering a dataset and tools to observe patterns in multi-faceted data. By systematically comparing different systems, more insight is provided into how their institutional coverage, rating methods, selection of indicators and normalizations influence the ranking positions of given institutions. This paper proposes a multivariate model for the evaluation of international emergency medicine journals using the most widely used, efficient and representative productivity and citation indicators. This is a descriptive observational evaluation study based on a sample of 24 journals included in the emergency medicine category of the Journal Citation Reports 2015. The sample is evaluated on the basis of seven evaluation indicators: the Journal Citation Reports impact factor, three H-indices (Web of Science, Scimago and Google Scholar), the Scimago Journal Rank and two altmetric scores (3-month and any time). The bivariate correlations between the distributions of the evaluation indicators and a multivariate metric reliability index are calculated. The factorial structure of the indicators is explored and clusters of journals are defined. A factor score is assigned to each journal. The correlations between the seven evaluation indicators are high and statistically significant. The metric reliability of the multivariate analysis calculated using Cronbach's alpha is .97. 
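The reliability index just mentioned, Cronbach's alpha, has a standard closed form: alpha = k/(k-1) * (1 - sum of item variances / variance of totals) for k indicators. A minimal sketch, with invented journal-by-indicator scores (not the study's data):

```python
# Cronbach's alpha over a set of indicators scored on the same units
# (here, journals). The score matrix below is invented for illustration;
# the formula is the standard one.

def cronbach_alpha(items):
    """items: one list of scores per indicator, all over the same units."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total score per unit across all indicators.
    totals = [sum(item[j] for item in items) for j in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Three invented indicators that rise together across five journals,
# so the reliability comes out high:
scores = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 3, 5, 5],
]
print(round(cronbach_alpha(scores), 3))
```

Indicators that move together, as the seven journal indicators evidently do, push alpha toward 1, which is consistent with the .97 reported.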
A general factor explains 84.64% of the variance of the factor space of the seven evaluation indicators, representing the 'quality of EM journals' construct. The best-represented indicator, the one with the highest communality, is H_Index_WoS (h = .922), which is also the indicator with the greatest weight in the general factor (a = .96). The journals can be classified into four clusters according to their quality. A factor score is generated for each journal that could be used as a multivariate meta-index to evaluate journal quality and to define a ranking of journals. Global scientific research output has experienced continuous and rapid growth during the last 20 years. The spatial and temporal variations of international papers at the national and regional scales were analyzed by incorporating the remotely sensed nighttime light data from the Defense Meteorological Satellite Program's Operational Linescan System. The findings indicate that the publication of international-circulation scientific papers in most of the countries examined has experienced a trend of exponential increase, which can be positively correlated with nighttime light in those countries or regions. Furthermore, the developing countries have higher correlation coefficients than the developed countries. Thus, literal nighttime light data can potentially be used in the future to better predict the number of publications of the research of figurative 'luminaries' residing in developing countries. With the emergence of Web 2.0, an online platform which encourages the online creation of next-generation tools, communication has become a nigh-indispensable tool for researchers, allowing them to acquire, spread, and share research achievements, with a free flow of ideas online. At present, there is a growing number of studies on non-traditional evaluation indicators, but much less research has focused on software evaluation, especially for open source research software. 
This research focuses on the use of the open source project 'Depsy': we evaluated and analyzed data collected from online downloads of open source software. Altmetrics cannot be confined to traditional measurable indicators. The importance of the open source software used, and its position in the online community, is itself a strong measure of academic impact and success, one that is all too often overlooked in research. We can also conclude that the multiplexing of software online, through the citation of a citation, ultimately leads to an online peer-review system within the community, effectively developed and maintained through the open use of the software itself. Moreover, the benefits of such a system have only just begun to come to fruition, having a strong impact on academic research and on predicting research impact. The present study sought to examine the trend and impact of international collaboration in scientific research in Vietnam during the period after the introduction of a reform policy and the normalization of relations with the United States. Using Thomson Reuters' Web of Science data (2001-2015), we found that 77% of Vietnam's scientific output (n = 18,044 papers) involved international collaborations, with researchers from the United States and Japan being the most frequent partners. The proportion of international collaborations has decreased slightly over time, offset by an increased rate of domestic collaborations. The rate of growth in Vietnam's scientific output was 17% per annum, and three-quarters of the growth was associated with international collaborations rather than purely domestic production. Moreover, internationally coauthored papers received twice the average citations of domestic papers. Of note, papers with an overseas corresponding author had a higher citation rate than papers with a domestic corresponding author. 
These data suggest that the vast majority of scientific papers from Vietnam were attributable to international collaboration, and that this had a positive impact on the quality and visibility of Vietnamese science. The data also indicate that Vietnam is in the growth phase of building up research capacity. The contributions of leading scientists, such as Nobel Prize winners, often play an important role in the progress of mankind. In this article, we propose new indices to recognize foundational work in science. Based on case studies of publications by 2016 Nobel Prize winners, we make a distinction between two types of fundamental contributions. In a metaphoric way, we refer to them as directly igniting or sparking. Our work contains an important message for research evaluation. Besides short-term evaluations, it is also important to perform longer-term evaluations; otherwise, work of Nobel class may fall under the radar and not be rewarded according to its scientific value. It is further suggested that scientometric investigations should not overlook the transitional characteristics of scientific progress. What are the characteristics of scientific papers published in World War II, and what papers from World War II, if any, are highly cited today? This paper reports that 3767 publications from World War II have been cited at least 100 times since 1939-1945. The data show that the publication rates of scientific papers declined during World War II, only to increase rapidly after it. The USA was the most prolific source of scientific publications during the war, and Harvard University was the most dominant institute. In addition, there were five 'Sleeping Beauties', that is, papers that were published but rarely cited during the war and came into prominence at a much later date. This Letter to the Editor proposes to use the CSS method for classifying ranking results (e.g. from university rankings) into meaningful groups. 
This letter describes how a large number of citations for particular publications is pleasing but a low number is not, especially when the author thinks that some of the latter publications are just as important as, if not more important than, the former. If an author looks up his or her citations in Google Scholar, he or she may be in for a shock. One might assume that one would be pleased with the recognition given to some of them, as shown by the high number of citations, and disappointed by the lack of recognition given to others. Well, in my case, it is worse than that! I looked up the fate (in terms of the number of citations) of the 500 or so books and articles that I have published since 1964. Happily, some of these have been highly cited. But, to my surprise, some pieces that I felt had made major contributions were hardly cited at all. In this paper, a three-dimensional framework is proposed to see how Indian universities and research-focused institutions fare in the world of high-end research in terms of the excellence and diversity of their research base. At the country level, scholarly performance is broken down into three components: size, excellence and balance or evenness. A web application available in the public domain, which visualizes scientific excellence worldwide in several subject areas, is used. India has a presence in fifteen of twenty-two subject areas in which there are at least 50 institutes globally that have published more than 500 papers. It has no institution which can be counted at this level of size and excellence in seven areas: Arts and Humanities; Business, Management and Accounting; Health Professions; Neuroscience; Nursing; Psychology; and Social Sciences. India's research base is completely skewed towards the Physical Sciences and Engineering, with very little for the Biological Sciences and Medicine and virtually none in the Social Sciences and Arts and Humanities when excellence at the highest level is considered. 
Its performance is also benchmarked against three nations, namely Australia, The Netherlands and Taiwan, which are of similar size in terms of GDP and scientific output. It is seen that although India has the highest GDP among the four countries, its performance lags considerably behind. Even in terms of diversity, its performance is poor compared to the three comparator countries. This study had three objectives: to examine patterns of research collaboration in Ghana, to study the reasons why Ghanaian-affiliated researchers collaborate with others (both inside and outside Ghana), and to determine the roles of Ghanaian-affiliated researchers in collaborations. The methodology comprised a bibliometric analysis of articles in the Web of Science for the years 1990 to 2013, and an online survey of 190 Ghanaian-affiliated corresponding authors of articles. Collaboration increased from 73% in 1990-1997 to 93% in 2006-2013, and international collaboration from 49 to 73% over the same time. The public university and government sectors, together with the three most research-productive organisations in each sector, were found to be highly dependent on collaboration for research production. The online survey revealed that collaboration with researchers in three regions (within Ghana, within the rest of Africa, and outside Africa) was to a large extent initiated through existing personal or working relationships. Access to expertise and enhanced productivity were the main reasons why Ghanaian-affiliated researchers collaborated with others in these three regions. Collaborators within Ghana were largely involved in the collection of data or fieldwork. Collaborators from outside Africa played instrumental roles in providing resources and securing research funds. A 'Sleeping Beauty in Science' is a publication that goes unnoticed ('sleeps') for a long time and then, almost suddenly, attracts a lot of attention ('is awakened by a prince'). 
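One simple way to operationalize the Sleeping Beauty definition above is a threshold heuristic on yearly citation counts: a long stretch of near-zero citations followed by a sudden burst. The thresholds below are illustrative choices, not the criteria used in the studies discussed here.

```python
# Toy heuristic for flagging Sleeping Beauty candidates from yearly
# citation counts: a long 'sleep' (few citations) followed by a burst.
# Threshold values are invented for illustration.

def is_sleeping_beauty(yearly_citations, sleep_years=10, sleep_max=2, burst_min=20):
    """True if the first `sleep_years` years stay at or below `sleep_max`
    citations per year and some later year reaches `burst_min`."""
    if len(yearly_citations) <= sleep_years:
        return False
    sleep = yearly_citations[:sleep_years]
    later = yearly_citations[sleep_years:]
    return max(sleep) <= sleep_max and max(later) >= burst_min

dormant_then_famous = [0, 1, 0, 2, 1, 0, 1, 2, 1, 0, 5, 18, 25, 40]
steadily_cited = [5, 8, 12, 15, 14, 16, 18, 20, 22, 25, 24, 26, 28, 30]
print(is_sleeping_beauty(dormant_then_famous))  # True
print(is_sleeping_beauty(steadily_cited))       # False
```

More refined measures weight how deep and how long the sleep is, but any such rule reduces to comparing the dormant-period citation profile against the post-awakening one.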
In our foregoing study we found that roughly half of the Sleeping Beauties are application-oriented and are thus potential Sleeping Innovations. In this paper we investigate a new topic: Sleeping Beauties that are cited in patents. In this way we explore the existence of a dormitory of inventions. To our knowledge, this is the first study of its kind. We investigate the time lag between the publication of a Sleeping Beauty and its first citation by a patent. We find that patent citation may occur before or after the awakening, and that the depth of the sleep, i.e., the citation rate during the sleeping period, is not a predictor of the later scientific or technological impact of the Sleeping Beauty. A surprising finding is that Sleeping Beauties are cited in patents significantly more often than 'normal' papers. Inventor-author self-citation relations occur in only a small minority of the Sleeping Beauties that are cited in patents, but other types of inventor-author links occur more frequently. We develop a stepwise approach to explore the cognitive environment of Sleeping Beauties cited in patents. First, we analyze whether they deal with new topics by measuring the time-dependent evolution, in the entire scientific literature, of the number of papers related both to the precisely defined topics and to the broader research theme of the Sleeping Beauty during and after the sleeping time. Second, we focus on the awakening by analyzing the first group of papers that cites the Sleeping Beauty. Third, we create concept maps of the topic-related and the citing papers for a time period immediately following the awakening and for the most recent period. Finally, we make an extensive assessment of the cited and citing relations of the Sleeping Beauty. 
We find that tunable co-citation analysis is a powerful tool for discovering the prince(s) and other important application-oriented work directly related to the Sleeping Beauty, for instance papers written by authors who cite Sleeping Beauties both in the patents of which they are the inventors and in their scientific papers. In Brazil, the National Council for Scientific and Technological Development (CNPq) distributes productivity fellowships in research (RS) in recognition of individuals with outstanding productivity levels in their areas. The aim of this study is to evaluate the scientific production of the Brazilian Pharmacy area, one division of the Health Sciences Great Area of CNPq, considering the profile and productivity levels of RS fellows. The results showed that most of the 156 active RS fellows in 2015 were female, with a doctorate completed in the Southeast region (mainly at the University of São Paulo) and with research activities developed in the South and Southeast regions. Most of their work was published in journals classified as B1 and B2 in the Pharmacy Qualis by the Coordination for the Improvement of Higher Education Personnel (CAPES), with a high prevalence of publications in local journals and/or journals specialized in medicinal plants. Besides, they showed a strong dependence on advising, and productivity indexes were related to the category and level of the RS fellowship. The evolution of such data must be continually evaluated to determine the influence of CNPq productivity fellowships on the performance and stratification of researchers in the Pharmacy area in Brazil. The disciplinary structure of research on complex problems related to human activities is supported by the foundations of the social, life, and hard sciences. In this work, we looked at the development of scientific research in the field of biofuels, as a sustainable source of energy, searching for references regarding its scientific roots and social relevance. 
Scientific communications on biofuels published between 1998 and 2007 were analyzed using a combination of bibliometric methods and text mining techniques. This field of research was characterized as interdisciplinary, with marked social relevance. Our bibliometric analysis shows that, in this research subject, 132 different, interacting fields of knowledge overlap, with a dominance of Chemistry, Engineering and the Agricultural Sciences. Through the use of text mining techniques, this field was configured into three groups of Disciplinary Dimensions. The first and most influential group includes the Agricultural Sciences, Social Sciences, and Environmental Sciences. The second group, which gives the field its technological basis, includes Chemistry, Engineering, and Microbiology. The third group includes disciplines with emerging involvement in the field of biofuels: Biology and Biochemistry, Animal and Plant Sciences, Molecular Biology and Genetics, Economics, Material Sciences, Nanosciences and Nanotechnology, Geosciences, Physics, Humanities, Multidisciplinary Sciences, Mathematics, and Computer Sciences. This study suggests that the first group of Disciplinary Dimensions conforms to the elements that socially validate the progress of research in the field of biofuels. This study also proposes a metric that can be used to measure the interdisciplinarity and the social framing of any other research field. The aim of this paper is to explore the power-law relationship between citation-based performance (CBP) and co-authorship patterns for papers in management journals by analyzing its behavior according to the type of document (articles and reviews) and the number of pages of documents. We analyzed 36,241 papers that received 239,172 citations. The scaling exponent of CBP for articles was larger than for reviews. Citations to articles increased 2^1.67, or 3.18 times, each time the number of articles published in a year in management journals doubled. 
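The doubling factors quoted here follow directly from a power law C ~ N^alpha: doubling the paper count N multiplies citations by 2^alpha. A quick check of the two reported exponents:

```python
# Under a power law C ~ N**alpha, doubling output N multiplies
# citations C by 2**alpha. Checking the exponents reported for
# management journals (1.67 for articles, 1.29 for reviews):

def doubling_factor(alpha: float) -> float:
    """Factor by which citations grow when output doubles, given C ~ N**alpha."""
    return 2.0 ** alpha

print(round(doubling_factor(1.67), 2))  # articles: ~3.18
print(round(doubling_factor(1.29), 2))  # reviews:  ~2.45
```

An exponent above 1 means citations grow superlinearly with output, which is why articles (the larger exponent) accumulate citations faster than reviews as publication volume rises.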
Citations to reviews increased 2^1.29, or 2.45 times, each time the number of reviews published in a year in management journals doubled. The scaling exponent for the power-law relationship of citation-based performance according to the number of pages of papers was 1.44 +/- 0.05 for articles and 1.25 +/- 0.05 for reviews. Citations to articles increased faster than citations to reviews. The scaling exponent for the power law of citation-based performance for co-authored articles was higher than for single-authored articles. For reviews, the scaling exponent was the same for the relationship between citation-based performance and the number of reviews. Citations increased faster for single-authored reviews than for co-authored reviews. In this short communication, we provide an overview of a relatively new source of altmetrics data which could possibly be used for societal impact measurements in scientometrics. Recently, Altmetric, a start-up providing publication-level metrics, started to make available data for publications which have been mentioned in policy-related documents. Using data from Altmetric, we study how many papers indexed in the Web of Science (WoS) are mentioned in policy-related documents. We find that less than 0.5% of the papers published in different subject categories are mentioned at least once in policy-related documents. Based on our results, we recommend that the analysis of WoS publications with at least one policy-related mention be repeated regularly (annually) in order to check the usefulness of the data. Mentions in policy-related documents should not be used for impact measurement until new policy-related sites are tracked. This article investigates the scientific performance of Russia in the field of nanotechnology, focusing on production, impact and collaboration. An underlying multidisciplinary corpus of publications was extracted from the Science Citation Index Expanded database through relevant keywords. 
The various bibliometric findings are presented in a top-down sequence, starting with a comparative analysis of Russia and other selected countries, further scrutinizing the revitalization of science in universities and finally presenting some (possible) centers of excellence within the domestic scientific system. Focusing on the most highly cited nano papers, I frame the analysis not only in terms of percentages of world shares of publications, but also in terms of the proportions of top-1% and top-10% publications. It is shown that, among the comparison countries, Russia maximally increases its citation impact depending on its internationalization efforts and that, for example, the co-authorship between Russia and Australia in the top-10% layer, as well as between Russia and the UK in the top-1% layer, is above expectation. The implementation of the president's initiative "Strategy of Nanoindustry Development" and the role of governmental university-centered policy are discussed in light of the performed bibliometric study. In this paper we undertake a quantitative review of the existing literature on parks and incubators to identify their foundations from a longitudinal perspective. To do so, we searched records in the SSCI database from 1990 to 2015 and identified 318 citing documents, which we split into four periods of 5 years each, to identify the interactions and path dependence that exist between different foundations of research. We evaluate the evolution of the theoretical foundations of this research line, taking into account changes in citations over time. We also identify areas of future research closely connected with the theoretical foundations already identified. For this purpose, we used two bibliometric techniques, co-citation analysis and bibliographic coupling, which enable us to assess the thematic similarity between scientific publications based on overlaps in their referencing patterns. The author byline is an indispensable component of a scientific paper. 
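The two techniques named in the parks-and-incubators review, bibliographic coupling and co-citation analysis, both reduce to set operations on reference lists: coupling counts the references two papers share, while co-citation counts how often two papers are cited together by later papers. A minimal sketch over an invented three-paper corpus:

```python
# Bibliographic coupling: shared out-link references between two papers.
# Co-citation: number of later papers that cite both of two given works.
# The tiny corpus below is invented for illustration.

def bibliographic_coupling(refs_a: set, refs_b: set) -> int:
    """Number of references cited by both papers."""
    return len(refs_a & refs_b)

def cocitation(work_x: str, work_y: str, corpus: dict) -> int:
    """Number of corpus papers whose reference lists cite both x and y."""
    return sum(1 for refs in corpus.values() if work_x in refs and work_y in refs)

# Papers P1-P3 and the earlier works (A-D) each one cites:
corpus = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "B"},
}
print(bibliographic_coupling(corpus["P1"], corpus["P2"]))  # 2 shared refs: B, C
print(cocitation("A", "B", corpus))  # A and B co-cited by P1 and P3 -> 2
```

Coupling strength is fixed at publication time (it depends only on each paper's own references), whereas co-citation counts keep growing as new citing papers appear, which is why the review uses the two measures for different periods.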
Some journals have added contribution lists to each paper to provide detailed information on each author's role. Many papers have explored the byline and contribution lists separately. However, the relationship between the two remains unclear. We select three prominent general medical journals: the Journal of the American Medical Association (JAMA), Annals of Internal Medicine (Annals), and PLOS Medicine (PLOS). We analyze the relationship between the author byline and contribution lists using four indexes. Four main findings emerged. First, the number, forms, and names of contribution lists differed significantly among the three journals, although they all adopted the criteria of the International Committee of Medical Journal Editors. Second, a U-shaped relationship exists between the extent of contribution and author order: the participation levels in contribution lists were highest for first authors, followed by last and second authors, and then middle authors with the lowest levels. Third, regarding the consistency between author order in the contribution list and the byline, every contribution category has a high consistency in JAMA and Annals, while PLOS shows a low consistency in general. Fourth, the three journals have a similar distribution for the first authors in the contribution category; the first author in the byline contributes the highest proportion, followed by the middle and second authors, and then the last author with the lowest proportion. We also develop recommendations to modify academic and writing practice: implement structured cross-contribution lists, unify the formats and standards of contribution lists, draft author contribution criteria for the social sciences and humanities, and consider author contribution lists in scientific evaluation. We investigate economics PhDs minted at German, Austrian, and Swiss universities from 1991 to 2008. 
We find that cohort sizes increased overall, and that the share of PhDs who publish in a peer-reviewed journal within 6 years after graduation increased from 18% in 1991 to 46% in 2008. Publishing rates are heterogeneous across departments. Younger cohorts publish slightly more than older cohorts, but these publications are not significantly better in terms of quality. Publication productivity is highly skewed within and between departments. A key difference between PhDs from the German-speaking area and those from North America lies in their patterns of collaboration. Innovations in scholarly publishing have led to new possibilities for academic journals (e.g., open access), and provided scholars with a range of indicators that can be used to evaluate their characteristics and their impact. This study identifies and evaluates the journal characteristics reported in five databases: Ulrich's Periodicals Directory (Ulrichs), Journal Citation Reports (JCR), SCImago Journal & Country Rank (SJR), Google Scholar Metrics (GS), and Cabell's Periodical Directory (Cabells). It describes the 13 indicators (variables) that are available through these databases-scholarly impact, subject category, age, total articles, distribution medium, open access, peer review, acceptance rate, pricing, language, country, status, and issue frequency-and highlights the similarities and differences in the ways these indicators are defined and reported. The study also addresses the ways in which this kind of information can be used to better understand particular journals as well as the scholarly publishing system. A number of bibliometric studies point out that the role of conference publications in computer science differs from that in other traditional fields. 
Thus, it is interesting to identify the relative status of journal and conference publications in different subfields of computer science, based on citation rates categorised by the China Computer Federation (CCF) classifications and venue types. In this research, we construct a dataset containing over 100,000 papers recommended by the CCF catalogue, together with their citation information. We also investigate some other factors that often influence a paper's citation rate. An experimental study shows that the relative status of journals and conferences varies greatly across different subfields of computer science, and that the impact of different publication levels varies according to the citation rate. We also verify that the classification of a publication, the number of authors, the maximum h-index of all authors of a paper, and the average number of papers published by a venue have different effects on the citation rate, although the citation rate may have a different degree of correlation with each of these factors. Phytocompounds and herbal extracts have been utilized in Ayurveda, Siddha and Unani medicine for thousands of years for the treatment of various ailments. The success of herbal medicine strongly suggests the interaction of bio-active phytocompounds with crucial biochemical pathways in the human body without causing adverse effects. The increasing incidence of diseases like cancer has prompted the scientific world to focus intently on their pathophysiology and prevention, leading to accelerated research activity in the past three decades. This study aims at understanding the evolving global importance of herbal medicine and the quality of research against various cancers through scientometric analyses, by studying the output of research publications, followed by the contributions from various countries, research institutes, authors, scientific journals and areas of research. 
To visualize the research structure and dynamics, more than 5000 publications with Science Citation Index that appeared from 1984 to 2013 have been studied and compared for trends in publication growth along with the contributions from the various bibliometric parameters stated above. Analysis of the 'Web of Science' database made it evident that the bibliometric parameters concerned contributed substantially to projecting the overall scientific output in the field of herbal anticancer research, as reflected by the citation analysis and h-index data. It has been observed that the number of publications increased at a compound annual growth rate of 10.39% during the studied period. The evolving trend of research topics was visualised by drawing a keyword co-occurrence map for the field. I screen the academic literature for cases of misattribution of a cited author's gender. While such mistakes are overall not common, their frequency depends dramatically on the gender of the cited author. Female scholars are cited as if they were male more than ten times as often as the reverse occurs, probably revealing that citers are influenced by the gender-science stereotype. The gender of the citing author and the field of study appear to have only a limited effect. Motivated by applications in scientometrics, we study the occurrence of first significant digits in the Lavalette distribution and in the double Pareto distribution. We obtain modifications of Benford's law. When the exponents are small, significant deviations from Benford's law are observed; when the exponents are large, the two distributions conform with Benford's law. Both analytical and numerical results are presented. Scientometric data are described fairly well by these modifications. In this study, we quantitatively compared the impact of mission-oriented research grants and curiosity-driven grants on the diversity of research subjects in Japan. 
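The first-significant-digit analysis mentioned above compares empirical digit frequencies against Benford's law, p(d) = log10(1 + 1/d). A minimal numerical sketch, using a plain Pareto sample as an illustrative stand-in for the distributions studied (the exponent and sample are assumptions; the paper's analytical modifications are not reproduced here):

```python
import math
import random

def benford(d):
    """Benford's law: expected frequency of leading digit d."""
    return math.log10(1 + 1 / d)

def first_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

random.seed(0)
# Pareto-distributed sample with exponent alpha (illustrative stand-in
# for the Lavalette / double Pareto cases; the fit varies with alpha).
alpha = 1.0
sample = [random.paretovariate(alpha) for _ in range(100_000)]
freq = {d: 0 for d in range(1, 10)}
for x in sample:
    freq[first_digit(x)] += 1
for d in range(1, 10):
    print(d, round(freq[d] / len(sample), 3), round(benford(d), 3))
```

Printing the empirical frequency next to the Benford expectation for each digit makes the degree of conformity visible directly; varying `alpha` changes the fit.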
First, we examined data for Japanese principal investigators receiving research funding between 2000 and 2010 in the field of nanotechnology and materials science, and identified groups of researchers whose publication performance was positively affected by the mission-oriented grant, CREST. We then compared the effect of CREST with that of the curiosity-driven grant, KAKENHI. The analysis uses both propensity score matching and difference-in-differences (PSM-DID) methodologies. Our results show that for participants in the CREST program there was an increase in the number of publications of more than 10% per year, for periods of both 5 and 3 years after the funding ended, even though the observed average effect on citations was not statistically significant. Second, we evaluated the diversity of research subjects through analysis of the distribution of the classification codes applied to articles published between 1996 and 2013, utilizing the J-Global database, which has the finest granularity of categories among existing bibliographic scientific publication databases. Research subjects were better conserved under the mission-oriented program than under the curiosity-driven one, a finding contrary to the predictions of conventional theory. We also found that under mission-oriented funding, there was an increase in diversity in the sense of marginal utility. These findings should be of use in the "diversity-aware" design of programs for the funding of fundamental research. In this study, we investigated the evolution process and historical roots of citation analysis study by reference publication year spectroscopy (RPYS), which is an advanced research method recently introduced in the field of scientometrics. Through analyzing the publication year and citation frequency of cited references in a knowledge domain, RPYS can identify the citation peaks of such cited references. 
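The peak-detection step of RPYS can be sketched by counting cited references per publication year and measuring each year's deviation from the median count of a surrounding window, a heuristic close to the one implemented in tools such as CRExplorer. A minimal sketch with hypothetical cited-reference years (the data and window size are assumptions):

```python
from collections import Counter
from statistics import median

def rpys_deviation(ref_years, window=2):
    """For each cited-reference publication year, return the deviation
    of that year's count from the median of its surrounding window --
    a common peak-detection heuristic in RPYS."""
    counts = Counter(ref_years)
    dev = {}
    for y in range(min(counts), max(counts) + 1):
        neighborhood = [counts.get(y + k, 0) for k in range(-window, window + 1)]
        dev[y] = counts.get(y, 0) - median(neighborhood)
    return dev

# Hypothetical cited-reference years with surges in 1955 and 1963.
refs = [1950, 1952, 1955, 1955, 1955, 1958, 1960,
        1963, 1963, 1963, 1963, 1965]
dev = rpys_deviation(refs)
peaks = sorted(sorted(dev, key=dev.get, reverse=True)[:2])
print(peaks)  # → [1955, 1963]
```

Years whose citation counts rise well above their local median are the candidate "peak citation publication years"; here the two largest deviations recover the injected surges.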
We collected 2543 articles, including 56,392 references, regarding citation analysis from the SCI-E and SSCI databases between 1970 and July 2016 as our data source. Using the RPYS method as implemented in the CRExplorer program, the results showed that the peak citation publication years, in chronological order, were 1955, 1963, 1973, 1979, 1981, 1990, 2005 and 2008 in the field of citation analysis study. According to the overall distribution of peak citation publication years, the RPYS for citation analysis was divided into five time periods for ease of comparison in this paper: before 1900, 1901-1950, 1951-1970, 1971-2000, and 2001-2016. Before 1950, and especially during the 1900s-1950s, prior to the introduction of the citation analysis method, there were three rather high peak citation publication years; Lotka's law and Bradford's law laid the knowledge foundation for citation analysis. The 1950s-1970s were the formative period of citation analysis; among the three citation peaks in this period, Garfield (Science 122(3159): 1127-1128, 1955), Price (Von Der Studierstube Zur 7(3-6): 443-458, 1963; Science 149: 510-515, 1965), and Kessler (1963) established an important knowledge base for the formation of citation analysis study. 1971-2000 was a period of development for citation analysis; the Document Co-citation Analysis and Author Co-citation Analysis methods laid the foundation for the development of citation analysis study. Since 2000, citation analysis study has been expanding rapidly. Judging by the number of published papers and the number of highly cited references, Garfield E., White H. D., Small H., MacRoberts M. H., and Price D. D. have played an important role in promoting the evolution of citation analysis study. Digital object identifiers (DOIs) were launched in 1997 to facilitate the long-term access and identification of objects in digital environments. 
The objective of the present investigation is to assess the DOI availability of articles in biomedical journals indexed in the PubMed database and to complement this investigation with a geographical analysis of journals by country of publisher. Articles were randomly selected from PubMed using their PubMed identifiers and were downloaded and processed using custom-developed PHP (Hypertext Preprocessor) scripts. The first part of the analysis focuses on the period 1966-2015 (50 years). Of the 496,665 articles studied over this period, 201,055 have DOIs (40.48%). Results showed that the percentage of articles with DOIs began to increase for articles published in the 2000s, with spectacular growth in the years 2002-2003, then reached a peak in 2015. Data on countries showed that some countries gradually implemented DOIs over the period 1966 to 2015 (the United States, the United Kingdom, and the Netherlands), while some did not (Russia, the Czech Republic, and Romania). The second part of the analysis focuses on the year 2015 and includes 268,790 articles published in 2015, randomly selected to evaluate the current implementation of DOIs. In 2015, 86.42% of articles had DOIs. The geographical analysis of countries of publishers showed that some countries (Russia, Thailand, and Ukraine) still assigned few DOIs to articles in 2015. Thus, if the scientific community aims to increase the number and the usefulness of services rendered by DOIs, efforts must be made to generalize their use by all persons involved in scientific publication, particularly publishers. Comparing 5 publications from China that described knockdowns of the human TPD52L2 gene in human cancer cell lines identified unexpected similarities between these publications, flaws in experimental design, and mismatches between some described experiments and the reported results. Following communications with journal editors, two of these TPD52L2 publications have been retracted. 
One retraction notice stated that while the authors claimed that the data were original, the experiments had been out-sourced to a biotechnology company. Using search engine queries, automatic text-analysis, different similarity measures, and further visual inspection, we identified 48 examples of highly similar papers describing single gene knockdowns in 1-2 human cancer cell lines that were all published by investigators from China. The incorrect use of a particular TPD52L2 shRNA sequence as a negative or non-targeting control was identified in 30/48 (63%) of these publications, using a combination of Google Scholar searches and visual inspection. Overall, these results suggest that some publications describing the effects of single gene knockdowns in human cancer cell lines may include the results of experiments that were not performed by the authors. This has serious implications for the validity of such results, and for their application in future research. Scientific activity plays a major role in innovation for biomedicine and healthcare. For instance, fundamental research on disease pathologies and mechanisms can generate potential targets for drug therapy. This co-evolution is punctuated by papers which provide new perspectives and open new domains. Despite the relationship between scientific discovery and biomedical advancement, identifying these research milestones that truly impact biomedical innovation can be difficult and is largely based solely on the opinions of subject matter experts. Here, we consider whether a new class of citation algorithms that identify seminal scientific works in a field, Reference Publication Year Spectroscopy (RPYS) and multi-RPYS, can identify the connections between innovation (e.g., therapeutic treatments) and the foundational research underlying them. 
Specifically, we assess whether the results of these analytic techniques converge with expert opinions on research milestones driving biomedical innovation in the treatment of Basal Cell Carcinoma. Our results show that these algorithms successfully identify the majority of milestone papers detailed by experts (Wong and Dlugosz in J Investig Dermatol 134(e1):E18-E22, 2014), thereby validating the power of these algorithms to converge on independent opinions of seminal scientific works derived by subject matter experts. These advances offer an opportunity to identify scientific activities enabling innovation in biomedicine. In the present paper, we study the discovery of chemical element number 23, Erythronium/Vanadium (E/V), as an early example of the modern process of validating knowledge claims in México. We examined the published work between 1802 and 1832 of Andrés Manuel del Río (AMR) in the Royal Mining Seminar of México and contrasted the styles of argument and forms of certification between his teaching and experimental writings concerning his claim to the paternity of the E/V discovery. We also analyze the respective papers of European authors that replicated, rediscovered and certified AMR's finding. We use a combination of bibliometric, sociotechnical network and literary critical analysis in order to show that the certification of E/V spawned an emerging mode for producing and validating new knowledge in the American continent and particularly in México. In turn, this approach supports AMR's claim to the discovery of E/V from the production process of the lead brown ore in Zimapán, Mexico. In this study, we investigate the downloads behavior of readers for two well-known IEEE journals in the field of education, i.e., IEEE Transactions on Learning Technologies (TLT) and IEEE Transactions on Education (ToE). In our analysis, we found that articles in both journals are not downloaded rapidly in earlier months. 
The majority of articles reach 50% and 80% of their first-12-months download totals only in later months. Using linear regression analysis, we discovered that the cumulative first-12-months downloads of articles cannot be predicted well from earlier months' downloads; they can, however, be predicted more accurately using the cumulative download counts of later months. Moreover, we found that the average downloads of articles in both journals increase rapidly as soon as the articles are assigned to an issue. In the case of TLT, which follows a delayed open access policy, we observed that average downloads after open access increase marginally for 2 months, then decline and continue to progress more or less in a consistent manner for 2 years. In ToE, which does not follow such a policy, the average downloads decrease persistently. Acquiring an overview of an unfamiliar discipline and exploring relevant papers and journals is often a laborious task for researchers. In this paper we show how exploratory search can be supported on a large collection of academic papers to allow users to answer complex scientometric questions which traditional retrieval approaches do not support optimally. We use our ConceptCloud browser, which makes use of a combination of concept lattices and tag clouds, to visually present academic publication data (specifically, the ACM Digital Library) in a browsable format that facilitates exploratory search. We augment this dataset with semantic categories, obtained through automatic keyphrase extraction from papers' titles and abstracts, in order to provide the user with uniform keyphrases for the underlying data collection. We use the citations and references of papers to provide additional mechanisms for exploring relevant research by presenting aggregated reference and citation data not only for a single paper but also across topics, authors and journals, which is novel in our approach. 
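The download-prediction regressions described earlier reduce to fitting cumulative 12-month downloads against the cumulative downloads at an earlier cutoff month and comparing goodness of fit. A minimal ordinary-least-squares sketch with hypothetical per-article counts (not the TLT/ToE data):

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x, returning (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical per-article data: cumulative downloads at month 3 and
# month 9, versus cumulative downloads at month 12.
m3 = [10, 14, 8, 9, 12]
m9 = [60, 75, 50, 90, 68]
m12 = [80, 100, 66, 120, 90]
_, _, r2_early = fit_line(m3, m12)
_, _, r2_late = fit_line(m9, m12)
print(round(r2_early, 2), round(r2_late, 2))  # → 0.08 1.0
```

In this toy data the month-9 counts explain the 12-month totals almost perfectly while the month-3 counts barely do, mirroring the pattern the study reports for later versus earlier months.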
We conduct a user study to evaluate our approach in which we asked 34 participants, from different academic backgrounds with varying degrees of research experience, to answer a variety of scientometric questions using our ConceptCloud browser. Participants were able to answer complex scientometric questions using our ConceptCloud browser with a mean correctness of 73%, with the user's prior research experience having no statistically significant effect on the results. We investigate the relationship between article title characteristics and citations in economics using a large data set from Web of Science. Our results suggest that articles with a short title that also contains a non-alphanumeric character achieve a higher citation count. This paper provides a citation network analysis of the publications of the journal Higher Education from 1972 to 2014 inclusive. This represents nearly the entire history of the journal. It analyses the most published authors and the most cited articles, as well as the most cited authors. The analysis also covers the institutions and countries of origin with the highest numbers of publications. 2176 articles were taken from Web of Science™ as a source of primary data. These articles were found to have 68,009 references. Analysis was carried out using the Web of Science™ online analytics tool and Excel®. Gephi™, a data visualisation and manipulation package, was then used to provide visual representations of the associated citation networks. These representations were shown to constitute "terrains" of citations or "geographies of influence", effectively bringing empirical data to bear in support of Macfarlane's higher education research "archipelago". Nationality biases were observed between US and UK/European/Australian higher education journals. Results indicate that the most published authors throughout the journal's history are Meyer, Kember, Richardson, Enders and Prosser. 
Confirming earlier studies on UK and Australian journals, the five most cited authors are Entwistle, Clark, Marton, Biggs and Ramsden. The single most cited article is Clark's 1983 Higher education system: academic organization in cross-national perspective. The top publication years for the journal were 2012, 2009 and 2011. Results from this paper shed light on the evolving concerns of the journal and its readership, and provide a demonstration of a powerful way of analysing citation data. Journal maps and classifications for 11,359 journals listed in the combined Journal Citation Reports 2015 of the Science and Social Sciences Citation Indexes are provided at https://leydesdorff.github.io/journals/ and http://www.leydesdorff.net/jcr15. A routine using VOSviewer to integrate the journal maps and their hierarchical clusterings is also made available. In this short communication, we provide background on the journal mapping/clustering and an explanation of and instructions for the routine. We compare journal maps for 2015 with those for 2014 and show the delineations among fields and subfields to be sensitive to fluctuations. Labels for fields and sub-fields are not provided by the routine, but an analyst can add them for pragmatic or intellectual reasons. The routine provides a means of testing one's assumptions against a baseline without claiming authority; clusters of related journals can be visualized to understand communities. The routine is generic and can be used for any 1-mode network. Systems biology is a new field of biology that has great implications for agriculture, medicine, and sustainability. In this article we explore the contributions of Chinese authors to systems biology through analysis of the metadata of more than 9000 articles on systems biology. Our big-data approach includes scientometric analysis, GIS analysis, co-word network analysis, and comparative analysis. 
By 2013 China had become second in the number of publications on systems biology. Similar to previous studies of Chinese science, we find an unequal distribution of research power in China, favoring big cities and coastal cities. Overall, 75% of the articles in systems biology were published by scholars from universities, 15% by scholars from Chinese Academy of Sciences institutions, and 9% from other institutions. Many Chinese scholars' research topics are similar to those in the US, Japan, and Germany, but one salient difference is that traditional Chinese medicine is an important topic among Chinese systems biologists. 25% of Chinese systems biologists cooperate with scientists abroad, suggesting that they take advantage of the opening-up policy. From 2011 to 2013, the average impact factor of the journals that Chinese scholars publish in was generally lower than that of their counterparts in the US, but the trend points to a gradual increase in impact. International collaboration in science continues to grow at a remarkable rate, but little agreement exists about the dynamics of growth and organization at the discipline level. Some suggest that disciplines differ in their collaborative tendencies, reflecting their epistemic culture. This study examines collaborative patterns in six previously studied specialties to add new data and analyze patterns over time. Our findings show that a global network of collaboration continues to add new nations and new participants; since 1990, each specialty has added many new nations to its lists of collaborating partners. We also find that the scope of international collaboration is positively related to impact. Network characteristics for the six specialties are notable in that instead of reflecting underlying culture, they tend towards convergence at the global level. 
This observation suggests that the global level may represent next-order dynamics that feed back to the national and local levels (as subsystems) in a complex, networked hierarchy. We examine the number of citations in 10 highly cited retracted papers, and compare their pre- and post-retraction citation values. We offer some possible explanations for the continued citation of these retracted papers, and point out some of the risks that may be involved for the communities that continue to cite them. In general, retracted papers should not be cited, but often the fault lies with unclear publisher websites, the existence of pirate websites or sites that display copies of the unretracted version of the paper, or even the insistent citation of a retracted paper because the results remain valid, or because the authors (most likely) refuse to accept the retracted status of that paper, or continue to believe that the core findings of the study remain valid. This research proposes a framework for music mood classification that uses multiple and complementary information sources, namely, music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set of 18 mood categories show that combining lyrics and audio significantly outperformed systems using audio-only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, the examination of learning curves shows that the hybrid systems using lyrics and audio needed fewer training samples and shorter audio clips to achieve classification accuracies equal to or better than those of systems using lyrics or audio alone. Last but not least, performance comparisons reveal the relative importance of audio and lyric features across mood categories. 
User-generated content is one of the most interesting phenomena of current published media, as users are now able not only to consume, but also to produce content in a much faster and easier manner. However, such freedom also carries concerns about content quality. In this work, we propose an automatic framework to assess the quality of collaboratively generated content. Quality is addressed as a multidimensional concept, modeled as a combination of independent assessments, each regarding different quality dimensions. Accordingly, we adopt a machine-learning (ML)-based multiview approach to assess content quality. We perform a thorough analysis of our framework on two different domains: Question and Answer Forums and Collaborative Encyclopedias. This allowed us to better understand when and how the proposed multiview approach is able to provide accurate quality assessments. Our main contributions are: (a) a general ML multiview framework that takes advantage of different views of quality indicators; (b) the improvement (up to 30%) in quality assessment over the best state-of-the-art baseline methods; (c) a thorough feature and view analysis regarding impact, informativeness, and correlation, based on two distinct domains. Type 2 diabetes has grown increasingly prevalent over recent decades, now affecting nearly 400 million people worldwide; however, nearly half of these individuals have no idea they have it. Consumer health information behavior (CHIB), which encompasses people's health-related information needs as well as the ways in which they interact (or do not interact) with health-related information, plays an important role in people's ability to prevent, cope with, and successfully manage a serious chronic disease across time. In this mixed-method longitudinal study, the CHIB of 34 people with type 2 diabetes is explored with the goal of identifying the factors that motivate, demotivate, or impede their diabetes-related information seeking and use. 
The findings reveal that while these processes can be motivated by many different factors and can lead to important benefits, there are significant barriers (such as "incognizance," defined herein as having an information need that one is not aware of) that may demotivate or impede information seeking and use. The implications of these findings are discussed, focusing on how we might work toward preventing, identifying, and addressing incognizance among this population, ensuring they have the information they need when it can be of the most use to them. One advantage of crowds over traditional teams is that crowds enable the assembling of a large number of individuals to address problems. The literature is unclear, however, about the relationship between the size of a crowd and its impact on outcomes. To better understand the effects of crowd size, we conducted a study of retention and performance based on 4,317 articles in the WikiProject Film community. Our results suggest that crowds benefit from their size when they are diverse, experienced, and have low retention rates. Many people turn to their social networks to find information through the practice of question and answering. We believe it is necessary to use different answering strategies based on the type of question to accommodate different information needs. In this research, we propose the ASK taxonomy that categorizes questions posted on social networking sites into three types according to the nature of the questioner's inquiry: accuracy, social, or knowledge. To automatically decide which answering strategy to use, we develop a predictive model based on ASK question types using question features from the lexical, topical, contextual, and syntactic perspectives as well as answer features. 
By applying the classifier on an annotated data set, we present a comprehensive analysis to compare questions in terms of their word usage, topical interests, temporal and spatial restrictions, syntactic structure, and response characteristics. Our research results show that the three types of questions exhibit different characteristics in the way they are asked. Our automatic classification algorithm achieves an 83% correct labeling result, showing the value of the ASK taxonomy for the design of social question and answering systems. The cultural heritage sector has embraced social tagging as a way to increase both access to online content and user engagement with digital collections. In this article, we build on two current lines of research. (a) We use Waisda?, an existing labeling game, to add time-based annotations to content. (b) In this context, we investigate the role of experts in human-based computation (nichesourcing). We report on a small-scale experiment in which we applied Waisda? to content from film archives. We study the differences in the types of time-based tags between experts and novices for film clips in a crowdsourcing setting. The findings show high similarity in the number and type of tags (mostly factual). In the less frequent tags, however, experts used more domain-specific terms. We conclude that competitive games are not suited to eliciting real expert-level descriptions. We also confirm that providing guidelines, based on conceptual frameworks better suited to moving images in a time-based fashion, could increase the quality of the tags, thus allowing for the creation of more innovative tag-based services for online audiovisual heritage. The processes that authors use to publish their papers in journals can be analyzed in terms of field-specific practices. How they select targeted publications can influence competitive relationships among journals. 
In this paper, the author quantifies the publishing choices of a set of scholars to confirm this ecological perspective. The results indicate a strong focus on a small number of journals. A measure of author publishing choices was used to define four ecological characteristics: coverage, coreness, exclusivity, and journal overlap. Several types of journals indexed in the Information Science and Library Science section of the Journal Citation Reports are compared in terms of their ecological characteristics. The data show that some journals cover large numbers of authors, but compete with other journals in subcommunities. Some journals with author profiles similar to those of high-ranking journals lost potential submissions. Others with low coverage, high coreness, and high exclusivity were found to have groups of "fans" who used them for all of their submissions, but still exhibited a strong need to sustain their exclusivity. It is hoped that the method and results presented in this paper will provide useful information for editorial boards interested in managing their submissions according to author profiles. Impact is embedded in today's research culture, with increasing importance being placed on the value of research to society. In interdisciplinary and cross-sector projects, team members may hold distinct views on the types of impact they want to create. Set in the context of an interdisciplinary, cross-sector project comprising partners from academia, industry, and the nonprofit sector, our paper unpacks how these diverse project members understand impact. Our analysis shows that interdisciplinary projects offer a unique opportunity to create impact on a number of different levels. Moreover, it demonstrates that a lack of accountable design and collaboration practices can potentially hinder pathways to impact. 
Finally, we find that the interdisciplinary perspectives that such projects introduce encourage a rich gamut of sustainable outcomes that go beyond commercialization. Our findings support researchers working in these complex contexts in appreciating the opportunities and challenges involved in interdisciplinary cross-sector research while imparting strategies for overcoming these challenges. Since their beginnings, bibliographic information systems have displayed results in the form of long, textual lists. With the development of new data models and computer technologies, the need for new approaches to present and interact with bibliographic data has slowly been maturing. To investigate how this could be accomplished, a prototype system, FrbrVis, was designed to present work families within a bibliographic information system using information visualization. This paper reports on two user studies, a controlled and an observational experiment, that were carried out to assess the Functional Requirements for Bibliographic Records (FRBR)-based system against an existing system as well as to test four different hierarchical visual layouts. The results clearly show that FrbrVis offers better performance and user experience compared to the baseline system. The differences between the four hierarchical visualizations (Indented tree, Radial tree, Circlepack, and Sunburst) were, on the other hand, not as pronounced, but the Indented tree and Sunburst designs proved the most successful, in both performance and user perception. The paper therefore not only evaluates the application of a visual presentation of bibliographic work families, but also provides valuable results regarding the performance and user acceptance of individual hierarchical visualization techniques. In this paper we describe the creation and use of metadata on the early Arpanet as part of normal network function. 
By using the Arpanet Host-Host Protocol and its sockets as an entry point for studying the generation of metadata, we show that the development and function of key Arpanet infrastructure can be studied by examining the creation and stabilization of metadata. More specifically, we use the Host-Host Protocol's sockets as an example of something that, at the level of the network, functions as both network infrastructure and metadata simultaneously. By presenting the function of sockets in tandem with an overview of the Host-Host Protocol, we argue for the further integrated study of infrastructure and metadata. Finally, we reintroduce the concept of infradata to refer specifically to data that locate data throughout an infrastructure and are required by the infrastructure to function, separating them from established and stabilized standards. We argue for the future application of infradata as a concept for the study of histories and political economies of networks, bridging the largely library and information science (LIS) study of metadata with the largely science and technology studies (STS) domain of infrastructure. Recent works in the information science literature have presented cases of using patent databases and patent classification information to construct network maps of technology fields, which aim to aid in competitive intelligence analysis and innovation decision making. Constructing such a patent network requires a proper measure of the distance between different classes of patents in the patent classification systems. Despite the existence of various distance measures in the literature, it is unclear how to consistently assess and compare them, and which ones to select for constructing patent technology network maps. This ambiguity has limited the development and applications of such technology maps. 
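Among the families of distance measures such comparisons cover, co-occurrence-based measures are the simplest to illustrate: two patent classes are close when they tend to appear on the same patents. A minimal sketch using a Jaccard distance over hypothetical IPC-style class assignments (the patents and classes are illustrative; the study's 12 actual measures are not reproduced here):

```python
# Hypothetical patents, each tagged with IPC-style classes.
patents = [
    {"A61K", "C07D"},
    {"A61K", "C07D"},
    {"A61K", "G06F"},
    {"G06F", "H04L"},
    {"G06F", "H04L"},
]

def owners(cls):
    """Indices of patents that carry the given class."""
    return {i for i, p in enumerate(patents) if cls in p}

def jaccard_distance(c1, c2):
    """Co-occurrence-based distance: 1 - |intersection| / |union|
    of the sets of patents carrying each class."""
    a, b = owners(c1), owners(c2)
    return 1 - len(a & b) / len(a | b)

print(jaccard_distance("A61K", "C07D"))  # classes that often co-occur
print(jaccard_distance("A61K", "H04L"))  # classes that never co-occur
```

Applying any such pairwise distance across all class pairs yields the weighted network from which a technology map is drawn; the choice of measure changes the map's structure, which is exactly what the study compares.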
Herein, we propose to compare alternative distance measures and identify the superior ones by analyzing the differences and similarities in the structural properties of resulting patent network maps. Using United States patent data from 1976 to 2006 and the International Patent Classification (IPC) system, we compare 12 representative distance measures, which quantify interfield knowledge base proximity, field-crossing diversification likelihood or frequency of innovation agents, and co-occurrences of patent classes in the same patents. Our comparative analyses suggest that the patent technology network maps based on normalized coreference and inventor diversification likelihood measures are the best representatives. This investigation of new approaches to improving collaboration, user/librarian experiences, and sustainability for virtual reference services (VRS) reports findings from a grant project titled Cyber Synergy: Seeking Sustainability between Virtual Reference and Social Q&A Sites (Radford, Connaway, & Shah, 2014). In-depth telephone interviews with 50 VRS librarians included questions on collaboration, referral practices, and attitudes toward Social Question and Answer (SQA) services using the Critical Incident Technique (Flanagan, ). The Community of Practice (CoP) (Wenger, ; Davies, ) framework was found to be a useful conceptualization for understanding VRS professionals' approaches to their work. Findings indicate that participants usually refer questions from outside of their area of expertise to other librarians, but occasionally refer them to nonlibrarian experts. These referrals are made possible because participants believe that other VRS librarians are qualified and willing collaborators. Barriers to collaboration include not knowing appropriate librarians/experts for referral, inability to verify credentials, and perceived unwillingness to collaborate. 
Facilitators of collaboration include knowing appropriate collaborators who are qualified, as well as a willingness to refer. Answers from SQA services were perceived as less objective and authoritative, but participants were open to collaborating with nonlibrarian experts with confirmation of professional expertise or extensive knowledge. This study investigates online material published in reaction to a Science Magazine report showing the absence of peer review and editorial processes in a set of fee-charging open access journals in biology. Quantitative and qualitative textual analyses are combined to map conceptual relations in these reactions, and to explore how understandings of scholarly communication and publishing relate to specific conceptualizations of science and of the hedging of scientific knowledge. A discussion of the connection of trust and scientific knowledge and of the role of peer review for establishing and communicating this connection provides for the theoretical and topical framing. Special attention is paid to the pervasiveness of digital technologies in formal scholarly communication processes. Three dimensions of trust are traced in the material analyzed: (a) trust through personal experience and informal knowledge, (b) trust through organized, internal control, (c) trust through form. The article concludes by discussing how certain understandings of the conditions for trust in science are challenged by perceptions of possibilities for deceit in digital environments. The large multidisciplinary academic social website ResearchGate aims to help academics to connect with each other and to publicize their work. Despite its popularity, little is known about the age and discipline of the articles uploaded and viewed in the site and whether publication statistics from the site could be useful impact indicators. 
In response, this article assesses samples of ResearchGate articles uploaded at specific dates, comparing their views in the site to their Mendeley readers and Scopus-indexed citations. This analysis shows that ResearchGate is dominated by recent articles, which attract about three times as many views as older articles. ResearchGate has uneven coverage of scholarship, with the arts and humanities, health professions, and decision sciences poorly represented and some fields receiving twice as many views per article as others. View counts for uploaded articles have low to moderate positive correlations with both Scopus citations and Mendeley readers, which is consistent with them tending to reflect a wider audience than Scopus-publishing scholars. Hence, for articles uploaded to the site, view counts may give a genuinely new audience indicator. In this article, we propose a graph-based interactive bibliographic information retrieval system, GIBIR. GIBIR provides an effective way to retrieve bibliographic information. The system represents bibliographic information as networks and provides a form-based query interface. Users can develop their queries interactively by referencing the system-generated graph queries. Complex queries such as "papers on information retrieval, which were cited by John's papers that had been presented in SIGIR" can be effectively answered by the system. We evaluate the proposed system by developing another relational database-based bibliographic information retrieval system with the same interface and functions. Experiment results show that the proposed system executes the same queries much faster than the relational database-based system, and on average, our system reduced the execution time by 72% (for 3-node query), 89% (for 4-node query), and 99% (for 5-node query). 
The promise and challenge of information management in the humanities have garnered a great deal of attention and interest (Bulger et al., ; Freiman et al., ; Trace & Karadkar, ; University of Minnesota Libraries, ; Wilson & Patrick, ). Research libraries and archives, as well as groups from within the humanities disciplines themselves, are being tasked with providing robust support for information management practices, including helping to engage humanities scholars with appropriate digital technologies in ways that are sensitive to disciplinary-based cultures and practices. However, significant barriers impede this work, primarily because the infrastructure (services, tools, and collaborative networks) to support scholarly information management is still under development. Under the aegis of the Scholars Tracking Archival Resources (STAR) project we are studying how humanities scholars gather and manage primary source materials with a goal of developing software to support their information management practices. This article reports the findings from our interviews with 26 humanities scholars, in conjunction with a set of initial requirements for a mobile application that will support scholars in capturing documents, recreating the archival context, and uploading these documents to cloud storage for access and sharing from other devices. The need for new indicators on universities is growing enormously. Governments and decision makers at all levels are faced with the huge opportunities generated by the availability of new knowledge and information and, simultaneously, are pressed by tight budget constraints. University rankings, in particular, are attracting policy and media attention, but at the same time receive harsh methodological criticism. After summarizing the main criticisms of rankings, we describe 2 trends in the user requirements for indicators, namely granularity and cross-referencing. 
We then suggest that a change in the paradigm of the design and production of indicators is needed. The traditional approach is one that not only leverages the existing data but also requires heavy investment to integrate existing databases and to build up tailored indicators. We show, based on the European universities case, how the intelligent integration of existing data may lead to an open-linked data platform which permits the construction of new indicators. The power of the approach derives from the ability to combine heterogeneous sources of data to generate indicators that address a variety of user requirements without the need to design indicators on a custom basis. Information and communication technology (ICT) has increasingly important implications for our everyday lives, with the potential to both solve existing social problems and create new ones. This article focuses on one particular group of ICT professionals, computational modelers, and explores how these ICT professionals perceive their own societal responsibilities. Specifically, the article uses a mixed-method approach to look at the role of professional codes of ethics and explores the relationship between modelers' experiences with, and attitudes toward, codes of ethics and their values. Statistical analysis of survey data reveals a relationship between modelers' values and their attitudes and experiences related to codes of ethics. Thematic analysis of interviews with a subset of survey participants identifies two key themes: that modelers should be faithful to the reality and values of users and that codes of ethics should be built from the bottom up. One important implication of the research is that those who value universalism and benevolence may have a particular duty to act on their values and advocate for, and work to develop, a code of ethics. 
With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on a factor graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence improves recommendation performance considerably, and that our approach outperforms the baseline methods. This study investigates the interplay between online news, reader comments, and social networks to detect and characterize comments leading to the revelation of censored information. Censorship of identity occurs in different contexts-for example, the military censors the identity of personnel and the judiciary censors the identity of minors and victims. We address three objectives: (a) assess the relevance of identity censorship in the presence of user-generated comments, (b) understand the fashion of censorship circumvention (what people say and how), and (c) determine how comment analysis can aid in identifying decensorship and information leakage through comments. 
After examining 3,582 comments made on 48 articles containing obfuscated terms, we find that a systematic examination of comments can compromise identity censorship. We identify and categorize information leakage in comments indicative of knowledge of censored information that may result in information decensorship. We show that the majority of censored articles contained at least one comment leading to censorship circumvention. Online participation is becoming an increasingly common means for individuals to contribute to citizen science projects, yet such projects often rely on only a small fraction of participants to make the majority of contributions. Here, we investigate a means for influencing the performance of citizen scientists toward enhancing overall participation. Building on past social comparison research, we pair citizen scientists with a software-based virtual peer in an environmental monitoring project. Through a series of experiments in which virtual peers outperform, underperform, or perform similarly to human participants, we investigate the influence of their presence on citizen science participation. To offer insight into the psychological determinants of the response to this intervention, we propose a new dynamic model describing the bidirectional interaction between humans and virtual peers. Our results demonstrate that participant contribution can be enhanced through the presence of a virtual peer, creating a feedback loop where participants tend to increase or decrease their contribution in response to their peers' performance. By including virtual peers that systematically outperform the participants, we demonstrate a fourfold increase in their contribution to the citizen science project. Using Ellis's seminal model of information seeking as an example, this study demonstrates how the elaborations made to the original framework since the late 1980s have contributed to conceptual growth in information-seeking studies. 
To this end, nine key studies elaborating Ellis's model were scrutinized by conceptual analysis. The findings indicate that the elaborations are based on two main approaches: adding novel, context-specific components in the model and redefining and restructuring the components. The elaborations have contributed to conceptual growth in three major ways: first, by integrating formerly separate parts of knowledge; second, by generalizing and explaining lower abstraction-level knowledge through higher-level constructs; and third, by expanding knowledge by identifying new characteristics of the object of study, that is, information-seeking behavior. Further elaboration of Ellis's model toward a theory would require more focused attempts to test hypotheses in work-related environments in particular. Collaborative information-seeking (CIS) tasks, such as holiday planning, academic research, and medical/health information seeking, cannot be tackled without making sense of the task and the encountered information together with collaborators, that is, collaborative sensemaking. In CIS, collaborative sensemaking is an important but understudied aspect. A thorough understanding of collaborative sensemaking behavior in CIS tasks is essential to develop tools to support collaborative sensemaking activities in CIS. In this article, we investigate the general patterns and differences in collaborative sensemaking behavior in travel planning and topic research tasks using the data from 2 observational user studies. The results show the common stages of the collaborative sensemaking process and the differences in users' collaborative sensemaking strategies and activities between the 2 tasks. This comparative study enhances our understanding of the collaborative sensemaking process in CIS tasks and the differences in users' sensemaking behavior according to tasks, and describes implications for supporting collaborative sensemaking behavior in CIS tasks. 
Research on information technology (IT) adoption and use, one of the most mature streams of research in the information science and information systems literature, is primarily based on the intentionality framework. Behavioral intention (BI) to use an IT is considered the sole proximal determinant of IT adoption and use. Recently, researchers have discussed the limitations of BI and argued that behavioral expectation (BE) would be a better predictor of IT use. However, without a theoretical and empirical understanding of the determinants of BE, we remain limited in our comprehension of what factors promote greater IT use in organizations. Using the unified theory of acceptance and use of technology as the theoretical framework, we develop a model that posits 2 determinants (i.e., social influence and facilitating conditions) of BE and 4 moderators (i.e., gender, age, experience, and voluntariness of use) of the relationship between BE and its determinants. We argue that the cognitions underlying the formation of BI and BE differ. We found strong support for the proposed model in a longitudinal field study of 321 users of a new IT. We offer theoretical and practical IT implications of our findings. The increasing popularity of academic social networking sites (ASNSs) requires studies on the usage of ASNSs among scholars and evaluations of the effectiveness of these ASNSs. However, it is unclear whether current ASNSs have fulfilled their design goal, as scholars' actual online interactions on these platforms remain unexplored. To fill the gap, this article presents a study based on data collected from ResearchGate. Adopting a mixed-method design by conducting qualitative content analysis and statistical analysis on 1,128 posts collected from ResearchGate Q&A, we examine how scholars exchange information and resources, and how their practices vary across three distinct disciplines: library and information services, history of art, and astrophysics. 
Our results show that the effect of a questioner's intention (i.e., seeking information or discussion) is greater than disciplinary factors in some circumstances. Across the three disciplines, responses to questions provide various resources, including experts' contact details, citations, links to Wikipedia, images, and so on. We further discuss several implications of the understanding of scholarly information exchange and the design of better academic social networking interfaces, which should stimulate scholarly interactions by minimizing confusion, improving the clarity of questions, and promoting scholarly content management. In November 2014, the Nature Index (NI) was introduced (see http://www.natureindex.com) by the Nature Publishing Group (NPG). The NI comprises the primary research articles published in the past 12 months in a selection of reputable journals. Starting from two short comments on the NI (Haunschild & Bornmann, 2015a, 2015b), we undertake an empirical analysis of the NI using comprehensive country data. We investigate whether the huge efforts of computing the NI are justified and whether the size-dependent NI indicators should be complemented by size-independent variants. The analysis uses data from the Max Planck Digital Library in-house database (which is based on Web of Science data) and from the NPG. In the first step of the analysis, we correlate the NI with other metrics that are simpler to generate than the NI. The resulting large correlation coefficients point out that the NI produces similar results as simpler solutions. In the second step of the analysis, relative and size-independent variants of the NI are generated that should be additionally presented by the NPG. The size-dependent NI indicators favor large countries (or institutions) and the top-performing small countries (or institutions) do not come into the picture. 
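The size-dependent versus size-independent distinction discussed for the Nature Index can be made concrete with a toy calculation. The counts below are invented for illustration; they are not NPG, Web of Science, or Max Planck Digital Library figures.

```python
# A raw, size-dependent count favors the large producer, while a
# size-independent (relative) variant normalizes by total output.
# All numbers are hypothetical.
countries = {
    "Large country": (2000, 400000),  # (NI-style article count, total publications)
    "Small country": (150, 12000),
}

def relative_indicator(ni_count, total_pubs):
    """Size-independent variant: NI-style count per published article."""
    return ni_count / total_pubs

for name, (ni, total) in countries.items():
    print(f"{name}: raw={ni}, relative={relative_indicator(ni, total):.4f}")
```

On these invented figures the large country dominates the raw count (2000 vs. 150), while the small country leads the relative variant (0.0125 vs. 0.0050), which is exactly the reversal that a purely size-dependent indicator can hide.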
This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms-physical, cognitive, and computational-as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR. Domain experts are skilled in building a narrow ontology that reflects their subfield of expertise based on their work experience and personal beliefs. We call this type of ontology a single-viewpoint ontology. There can be a variety of such single-viewpoint ontologies that represent a wide spectrum of subfields and expert opinions on the domain. However, to have a complete formal vocabulary for the domain they need to be linked and unified into a multi-viewpoint model while having the subjective viewpoint statements marked and distinguished from the objectively true statements. In this study, we propose and implement a two-phase methodology for multi-viewpoint ontology construction by nonexpert users. 
The proposed methodology was implemented for the domain of the effect of diet on health. A large-scale crowdsourcing experiment was conducted with about 750 ontological statements to determine whether each of these statements is objectively true, viewpoint, or erroneous. Typically, in crowdsourcing experiments the workers are asked for their personal opinions on the given subject. However, in our case their ability to objectively assess others' opinions was examined as well. Our results show substantially higher accuracy in classification for the objective assessment approach compared to the results based on personal opinions. The article presents three advanced citation-based methods used to detect potential breakthrough articles among very highly cited articles. We approach the detection of such articles from three different perspectives in order to provide different typologies of breakthrough articles. In all three cases we use the hierarchical classification of scientific publications developed at CWTS based on direct citation relationships. We assume that such contextualized articles focus on similar research interests. We utilize the characteristics scores and scales (CSS) approach to partition citation distributions and implement a specific filtering algorithm to sort out potential highly-cited "followers," articles not considered breakthroughs. After invoking thresholds and filtering, three methods are explored: A very exclusive one where only the highest cited article in a micro-cluster is considered as a potential breakthrough article (M1); as well as two conceptually different methods, one that detects potential breakthrough articles among the 2% highest cited articles according to CSS (M2a), and finally a more restrictive version where, in addition to the CSS 2% filter, knowledge diffusion is also considered (M2b). 
The advanced citation-based methods are explored and evaluated using validated publication sets linked to different Danish funding instruments including centers of excellence. The idea of constructing science maps based on bibliographic data has intrigued researchers for decades, and various techniques have been developed to map the structure of research disciplines. Most science mapping studies use a single method. However, as research fields have various properties, a valid map of a field should actually be composed of a set of maps derived from a series of investigations using different methods. That leads to the question of what can be learned from a combination-triangulation-of these different science maps. In this paper we propose a method for triangulation, using the example of water science. We combine three different mapping approaches: journal-journal citation relations (JJCR), shared author keywords (SAK), and title word-cited reference co-occurrence (TWRC). Our results demonstrate that triangulation of JJCR, SAK, and TWRC produces a more comprehensive picture than each method applied individually. The outcomes from the three different approaches can be associated with each other and systematically interpreted to provide insights into the complex multidisciplinary structure of the field of water research. We performed an exploratory case study to understand how subject indexing performed by television production staff using a semicontrolled vocabulary affects indexing quality. In the study we used triangulation, combining tag analysis and semistructured interviews, with production staff of the Norwegian Broadcasting Corporation. The main findings reveal incomplete indexing of TV programs and their parts, in addition to low indexing consistency and uneven indexing exhaustivity. The informants expressed low motivation and a high level of uncertainty regarding the task. 
Internal guidelines and high domain knowledge among the indexers do not form a sufficient basis for creating quality and consistency in the vocabulary. The challenges that are revealed in the terminological analysis, combined with low indexing knowledge and lack of motivation, will create difficulties in the retrieval phase. This investigation examines perceptions of normality emerging from two distinct studies of information behavior associated with life disrupting health symptoms and theorizes the search for normality in the context of sense making theory. Study I explored the experiences of women striving to make sense of symptoms associated with menopause; Study II examined posts from two online discussion groups for people with symptoms of obsessive compulsive disorder. Joint data analysis demonstrates that normality was initially perceived as the absence of illness. A breakdown in perceived normality because of disruptive symptoms created gaps and discontinuities in understanding. As participants interacted with information about the experiences of health-challenged peers, socially constructed notions of normality emerged. This was internalized as a "new normal." Findings demonstrate normality as an element of sense making that changes and develops over time, and experiential information and social contexts as central to health-related sense making. Re-establishing perceptions of normality, as experienced by health-challenged peers, was an important element of sense making. This investigation provides nuanced insight into notions of normality, extends understanding of social processes involved in sense making, and represents the first theorizing of and model development for normality within the information science and sense making literature. Individual academics and research evaluators often need to assess the value of published research. 
Although citation counts are a recognized indicator of scholarly impact, alternative data is needed to provide evidence of other types of impact, including within education and wider society. Wikipedia is a logical choice for both of these because the role of a general encyclopaedia is to be an understandable repository of facts about a diverse array of topics and hence it may cite research to support its claims. To test whether Wikipedia could provide new evidence about the impact of scholarly research, this article counted citations to 302,328 articles and 18,735 monographs in English indexed by Scopus in the period 2005 to 2012. The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields. In contrast, a third of monographs have at least one citation from Wikipedia, with the most in the arts and humanities. Hence, Wikipedia citations can provide extra impact evidence for academic monographs. Nevertheless, the results may be relatively easily manipulated and so Wikipedia is not recommended for evaluations affecting stakeholder interests. In this article we describe another problem with journal impact factors by showing that one journal's impact factor is dependent on other journals' publication delays. The proposed theoretical model predicts a monotonically decreasing function of the impact factor as a function of publication delay, on condition that the citation curve of the journal is monotone increasing during the publication window used in the calculation of the journal impact factor; otherwise, this function has a reversed U shape. Our findings based on simulations are verified by examining three journals in the information sciences: the Journal of Informetrics, Scientometrics, and the Journal of the Association for Information Science and Technology. 
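The delay mechanism described for journal impact factors can be sketched with the standard two-year impact-factor definition. The notation below is ours and is only a simplified illustration, not necessarily the authors' exact model.

```latex
% Two-year impact factor of journal J in year y: citations received in y
% to items published in the two preceding years, per citable item.
\[
  \mathrm{IF}_y(J) \;=\; \frac{C_y(y-1) + C_y(y-2)}{P_{y-1} + P_{y-2}}
\]
% If citing journals publish with delay d, a citation written at time t
% is only counted at t + d, so the numerator integrates a shifted
% citation curve c over the one-year counting window:
\[
  C_y^{(d)} \;=\; \int_{y}^{y+1} c(t - d)\,\mathrm{d}t .
\]
% When c is monotone increasing over the counting window, the shifted
% integral -- and hence the impact factor -- decreases monotonically in d;
% otherwise the dependence on d can take the reversed-U shape reported.
```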
Thomson Reuters's Web of Science (WoS) began systematically collecting acknowledgment information in August 2008. Since then, bibliometric analysis of funding acknowledgment (FA) has been growing and has aroused intense interest and attention from both academia and policy makers. Examining the distribution of FA by citation index database, by language, and by acknowledgment type, we noted coverage limitations and potential biases in each analysis. We argue that despite its great value, bibliometric analysis of FA should be used with caution. Important emerging measures of academic impact are article download and citation rates. Yet little is known about the influences on these and ways in which academics might manage this approach to dissemination. Three groups of papers by academics in a center for speech-language science (available through a university repository) were compared. The first group of target papers were blogged, and the blogs were systematically tweeted. The second group of connected control papers were nonblogged papers that we carefully matched for author, topic, and year of publication. The third group were papers by different staff members on a variety of topics-Unrelated Control Papers. The results suggest an effect of social media on download rate, which was limited not just to Target Papers but also generalized to Connected Control Papers. Unrelated Control Papers showed no increase over the same amount of time (main effect of time, F(1,27) = 55.6, p < .001; significant Group × Time interaction, F(2,27) = 7.9, p = .002). The effect on citation rates was less clear but followed the same trend. The only predictor of the 2015 citation rate was downloads after blogging (r = 0.450, p = .012). These preliminary results suggest that promotion of academic articles via social media may enhance download and citation rate and that this has implications for impact strategies. 
Tags (keywords freely assigned by users to describe web content) have become highly popular on Web 2.0 applications, because of the strong stimuli and the ease with which users can create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim at assisting users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Regardless of the numerous and diversified previous studies on tag recommendation, to our knowledge, no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues. The article presents a conceptual framework for distinguishing different sorts of heterogeneous digital materials. The hypothesis is that a wide range of heterogeneous data resources can be characterized and classified due to their particular configurations of hypertext features such as scripts, links, interactive processes, and time scalings, and that the hypertext configuration is a major but not sole source of the messiness of big data. The notion of hypertext will be revalidated, placed at the center of the interpretation of networked digital media, and used in the analysis of the fast-growing amounts of heterogeneous digital collections, assemblages, and corpora. The introduction summarizes the wider background of a fast-changing data landscape. 
Personalized search approaches tailor search results to users' current interests, so as to improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: we investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, the query level and the word level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from the behavior of a particular user and from the behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight. We study the news reading behavior of several hundred thousand users on 65 highly visited news sites. We focus on a specific phenomenon: users reading several articles related to a particular news development, which we call story-focused reading. Our goal is to understand the effect of story-focused reading on user engagement and how news sites can support this phenomenon. 
We found that most users focus on stories that interest them and that even casual news readers engage in story-focused reading. During story-focused reading, users spend more time reading and a larger number of news sites are involved. In addition, readers employ different strategies to find articles related to a story. We also analyze how news sites promote story-focused reading by looking at how they link their articles to related content published by them or by other sources. The results show that providing links to related content leads to higher user engagement, and that this is the case even for links to external sites. We also show that the performance of links can be affected by their type, their position, and how many of them are present within an article. In today's digital age, daily reading may be becoming digital reading. To understand this possible shift from reading print media to reading digital media, we investigated reading behavior for 11 media and reading preferences between print and digital in different circumstances. In August 2012, an online survey was used to inquire about the reading behavior and preferences of 1,755 participants, ranging in age from 18 to 69 years. The participants comprised equal numbers of men and women from five age brackets. Our main finding was that approximately 70% of total reading time was spent on digital media and that preferences favored print media. Cluster analysis of reading time by media was used to categorize respondents into eight clusters, and a second cluster analysis on stated preference (digital or print) yielded six clusters. The correspondence analysis between reading behavior clusters and preference clusters revealed a mismatch between reading behavior and stated preference for either print or digital media. In this article we evaluate context-aware recommendation systems for information re-finding by knowledge workers. 
We identify 4 criteria that are relevant for evaluating the quality of knowledge worker support: context relevance, document relevance, prediction of user action, and diversity of the suggestions. We compare 3 different context-aware recommendation methods for information re-finding in a writing support task. The first method uses contextual prefiltering and content-based recommendation (CBR), the second uses the just-in-time information retrieval paradigm (JITIR), and the third is a novel network-based recommendation system where context is part of the recommendation model (CIA). We found that each method has its own strengths: CBR is strong at context relevance, JITIR captures document relevance well, and CIA achieves the best result at predicting user action. Weaknesses include that CBR depends on a manual source to determine the context, and that the JITIR context query can fail when the textual content is not sufficient. We conclude that to truly support a knowledge worker, all 4 evaluation criteria are important. In light of that conclusion, we argue that the network-based approach offered by CIA has the highest robustness and flexibility for context-aware information recommendation. Understanding the relative statures of channels for disseminating knowledge is of practical interest to both generators and consumers of knowledge flows. For generators, stature can influence the attractiveness of alternative dissemination routes and the deliberations of those who assess generator performance. For knowledge consumers, channel stature may influence the knowledge content to which they are exposed. This study introduces a novel approach to conceptualizing and measuring the stature of knowledge-dissemination channels: the power-impact (PI) technique. 
It is a flexible technique with 3 complementary variants, giving holistic insights about channel stature by accounting for both the attraction of knowledge generators to a distribution channel and the degree to which knowledge consumers choose to use a channel's knowledge content. Each PI variant is expressed in terms of multiple parameters, permitting customization of stature evaluation to suit its user's preferences. In the spirit of analytics, each PI variant is driven by objective evidence of actual behaviors. The PI technique is based on 2 building blocks: (a) the power that channels have for attracting the results of generators' knowledge work, and (b) the impact that channel contents exhibit on prospective recipients. The feasibility and functionality of the PI-technique design are demonstrated by applying it to solve a problem of journal stature evaluation for the information-systems discipline. The vast majority of current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, besides being very expensive and having scalability problems, these solutions may not benefit from eventual manual corrections, as those corrections may be lost whenever the entire repository has to be disambiguated again. In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce improved results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method, specially designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method. 
Data reuse refers to the secondary use of data, not for its original purpose but for studying new problems. Although reusing data might not yet be the norm in every discipline, the benefits of reusing shared data have been asserted by a number of researchers, and data reuse has been a major concern in many disciplines. Assessing data for trustworthiness becomes important in data reuse with the growth in data creation, because of the lack of standards for ensuring data quality and the potential harm from using poor-quality data. This research explores the many facets of data reusers' trust in data generated by other researchers, focusing on the trust judgment process and the influential factors that determine reusers' trust. The author took an interpretive qualitative approach, using in-depth semistructured interviews as the primary research method. The study results suggest different stages of trust development associated with the process of data reuse. Data reusers' trust may remain the same throughout their experiences, but it can also be formed, lost, declined, and recovered during their data reuse experiences. These various stages reflect the dynamic nature of trust. The design of peer-review support systems is shaped by the policies that define and govern the process of peer review. An important component of these is the set of policies that deal with anonymity: the rules that govern the concealment and transparency of information related to the identities of the various stakeholders (authors, reviewers, editors, and others) involved in the peer-review process. Anonymity policies have been a subject of debate for several decades within scholarly communities. Because of widespread criticism of traditional peer-review processes, a variety of new peer-review processes have emerged that manage the trade-offs between disclosure and concealment of identities in different ways. 
Based on an analysis of policies and guidelines for authors and reviewers provided by publication venues, we developed a framework for understanding how the disclosure and concealment of identities are managed. We discuss the appropriate role of information technology and computer support for the peer-review process within that framework. Goodreads is an Amazon-owned book-based social web site for members to share, review, and rate books, and to connect with other readers. Goodreads has tens of millions of book reviews, recommendations, and ratings that may help librarians and readers to select relevant books. This article describes a first investigation of the properties of Goodreads users, using a random sample of 50,000 members. The results suggest that about three quarters of members with a public profile are female, and that there is little difference between male and female users in patterns of behavior, except for females registering more books and rating them less positively. Goodreads librarians and super-users engage extensively with most features of the site. The absence of strong correlations between book-based and social usage statistics (e.g., numbers of friends, followers, books, reviews, and ratings) suggests that members choose their own individual balance of social and book activities and rarely neglect one at the expense of the other. Goodreads is therefore neither primarily a book-based website nor primarily a social network site but a genuine hybrid, social navigation site. In 1965, Price foresaw the day when a citation-based taxonomy of science and technology would be delineated and correspondingly used for science policy. A taxonomy needs to be comprehensive and accurate if it is to be useful for policy making, especially now that policy makers are utilizing citation-based indicators to evaluate people, institutions, and laboratories. Determining the accuracy of a taxonomy, however, remains a challenge. 
Previous work on the accuracy of partition solutions is sparse, and the results of those studies, although useful, have not been definitive. In this study we compare the accuracies of topic-level taxonomies based on the clustering of documents using direct citation, bibliographic coupling, and co-citation. Using a set of new gold standards (articles with at least 100 references), we find that direct citation is better at concentrating references than either bibliographic coupling or co-citation. Under the assumption that higher concentrations of references denote more accurate clusters, direct citation thus provides a more accurate representation of the taxonomy of scientific and technical knowledge than either bibliographic coupling or co-citation. We also find that discipline-level taxonomies based on journal schema are highly inaccurate compared to topic-level taxonomies, and recommend against their use. This article contributes to the development of methods for analysing research funding systems by exploring the robustness and comparability of emerging approaches to generate funding landscapes useful for policy making. We use a novel data set of manually extracted and coded data on the funding acknowledgements of 7,510 publications representing UK cancer research in the year 2011 and compare these "reference data" with funding data provided by Web of Science (WoS) and MEDLINE/PubMed. Findings show high recall (around 93%) of WoS funding data. By contrast, MEDLINE/PubMed data retrieved less than half of the UK cancer publications acknowledging at least one funder. Conversely, both databases have high precision (above 90%): that is, few publications with no acknowledgment of funders are incorrectly identified as having funding data. Nonetheless, funders acknowledged in UK cancer publications were not correctly listed by MEDLINE/PubMed and WoS in around 75% and 32% of cases, respectively. 
Reference data on the UK cancer research funding system are used as a case study to demonstrate the utility of funding data for strategic intelligence applications (e.g., mapping of the funding landscape and co-funding activity, comparison of funders' research portfolios). In this article, we show that the dramatic increase in the number of research articles indexed in the Web of Science database impacts the commonly observed distributions of citations within these articles. First, we document that the growing number of physics articles in recent years is attributable to existing journals publishing more and more articles, rather than to more new journals coming into being as happens in computer science. Second, even though the references in more recent articles generally cover a longer time span, the newer articles are cited more frequently than the older ones if the uneven article growth is not corrected for. Nevertheless, despite this change in the distribution of citations, the citation behavior of scientists does not seem to have changed. Induced by "big data," "topic modeling" has become an attractive alternative to mapping co-words in terms of co-occurrences and co-absences using network techniques. Does topic modeling provide an alternative for co-word mapping in research practices using moderately sized document collections? We return to the word/document matrix, using first a single text with a strong argument ("The Leiden Manifesto") and then upscaling to a sample of moderate size (n = 5687) to study the pros and cons of the two approaches in terms of the resulting possibilities for making semantic maps that can serve an argument. The results from co-word mapping (using two different routines) versus topic modeling are significantly uncorrelated. Whereas components in the co-word maps can easily be designated, the topic models provide sets of words that are organized very differently. 
In these samples, the topic models seem to reveal similarities other than semantic ones (e.g., linguistic ones). In other words, topic modeling does not replace co-word mapping in small and medium-sized sets; but the paper leaves open the possibility that topic modeling would work well for the semantic mapping of large sets. In recent years, the relationship between collaboration among scientists and the citation impact of papers has been frequently investigated. Most of the studies show that the two variables are closely related: an increasing collaboration activity (measured in terms of number of authors, number of affiliations, and number of countries) is associated with an increased citation impact. However, it is not clear whether the increased citation impact is based on the higher quality of papers that profit from more than one scientist giving expert input, or on other (citation-specific) factors. Thus, the current study addresses this question by using two comprehensive data sets of publications (in the biomedical area) including quality assessments by experts (F1000Prime member scores) and citation data for the publications. The study is based on more than 15,000 papers. Robust regression models are used to investigate the relationship between number of authors, number of affiliations, and number of countries, respectively, and citation impact, controlling for the papers' quality (measured by F1000Prime expert ratings). The results indicate that the effect of collaboration activities on impact is largely independent of the papers' quality. The citation advantage is apparently not quality related; citation-specific factors (e.g., self-citations) seem to be important here. The characterization of scholarly communication is dominated by citation-based measures. In this paper we propose several metrics to describe different facets of open access and open research. 
We discuss measures to represent the public availability of articles along with their archival location, licenses, access costs, and supporting information. Calculations illustrating these new metrics are presented using the authors' publications. We argue that explicit measurement of openness is necessary for a holistic description of research outputs. Hutchins, Yuan, Anderson, and Santangelo (2015) proposed the Relative Citation Ratio (RCR) as a new field-normalized impact indicator. This study investigates the RCR by correlating it, on the level of single publications, with established field-normalized indicators and assessments of the publications by peers. We find that the RCR correlates highly with established field-normalized indicators, but the correlation between RCR and peer assessments is only low to medium. Patents are the main source of data on innovation. Since most innovative activity happens outside of the patenting system, and since patents and innovations have different quality, complexity, and impact on each market, unweighted sums of patents and proxies are an imperfect indicator of a country's innovative activity. I generate two very simple indices of innovation (one dependent on the size of a country, and another that normalizes for country size), based on weighting patents and exports by a complexity measure. Each index captures the technological complexity of innovations inside and outside the intellectual property rights system. I empirically analyze the rankings of these innovation indices, and contrast the results with technological development, GDP, and the existing mainstream innovation index. (C) 2016 Elsevier Ltd. All rights reserved. We assemble a massive sample of 180,000 CVs of Brazilian academic researchers of all disciplines from the Lattes platform. From the CVs we gather information on key variables related to the researchers and their publications. 
We find males are more productive in terms of quantity of publications, but the effect of gender on research impact is mixed across individual groups of subject areas. For all fields of science, holding a PhD from abroad increases the chance for a researcher to publish in journals of higher impact. We also find that the more years a researcher takes to finish his or her doctorate, the more likely he or she is to publish less thereafter, although in outlets of higher impact. The data also support the existence of an inverted U-shaped function relating research age and productivity. (C) 2016 Elsevier Ltd. All rights reserved. In this paper, we propose a new criterion for choosing between a pair of classification systems of science that assign publications (or journals) to a set of clusters. Consider the standard target (cited-side) normalization procedure, in which cluster mean citations are used as normalization factors. We recommend system A over system B whenever the standard normalization procedure based on system A performs better than the standard normalization procedure based on system B. Performance is assessed in terms of two double tests, one graphical and one numerical, that use both classification systems for evaluation purposes. In addition, a pair of classification systems is compared using a third, independent classification system for evaluation purposes. We illustrate this strategy by comparing a Web of Science journal-level classification system, consisting of 236 journal subject categories, with two publication-level algorithmically constructed classification systems consisting of 1363 and 5119 clusters. There are two main findings. Firstly, the second publication-level system is found to dominate the first. Secondly, the publication-level system at the highest granularity level and the Web of Science journal-level system are found to be non-comparable. Nevertheless, we find reasons to recommend the publication-level option. 
(C) 2016 Elsevier Ltd. All rights reserved. A procedure for identifying discoveries in the biomedical sciences is described that makes use of citation context information, or more precisely citing sentences, drawn from the PubMed Central database. The procedure focuses on the use of specific terms in the citing sentences and the joint appearance of cited references. After a manual screening process to remove non-discoveries, a list of over 100 discoveries and their associated articles is compiled and characterized by subject matter and by type of discovery. The phenomenon of multiple discovery is shown to play an important role. The onset and timing of recognition of the articles are studied by comparing the number of citing sentences with and without discovery terms, and show both early onset and delays in recognition. A comparative analysis of the vocabularies of the discovery and non-discovery sentences reveals the types of words and concepts that scientists associate with discoveries. A machine learning application is used to efficiently extend the list. Implications of the findings for understanding the nature and justification of scientific discoveries are discussed. (C) 2016 Elsevier Ltd. All rights reserved. Whether patent citations indicate knowledge linkage is still a controversial issue, which is very important for the widespread use of the patent citation analysis method. We hypothesize that there exists technological knowledge linkage between patents and their citations, and that the linkage can be detected by measuring text similarities between them. To test the hypothesis, we selected citing-cited patent pairs as the observation group and patent pairs without a citing-cited relationship as the control group. Using the vector space model (VSM) with the WF-IDF weighting method, we calculated text similarity values for the two groups. 
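The pairwise text-similarity comparison just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain sublinear TF-IDF as a stand-in for the paper's WF-IDF weighting, and the patent texts and function names are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sublinear TF-IDF vectors (a stand-in for WF-IDF weighting).

    Each document is represented as a sparse dict {term: weight}, with
    weight = (1 + ln(tf)) * ln(N / df), where tf is the term frequency in
    the document and df the number of documents containing the term.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (1 + math.log(c)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In the study's setup, similarity values of citing-cited pairs would be computed with `cosine` and compared against those of non-citing-cited control pairs.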
By comparing text similarity values between the two groups, we validate that in the vast majority of cases the text similarity values of citing-cited pairs are much higher than those of non-citing-cited pairs. A study in the nanotechnology field shows the same results, although patents in the same technological area are more relevant to each other than patents in different technological areas. Furthermore, by comparing text similarities between applicant and examiner citing-cited pairs, the results show that in more cases examiner citations indicate knowledge linkage somewhat better than applicant citations. Examiner citations can be regarded not only as a supplement to applicant citations but also as pointing to the more important technological background and the prior art closely related to the patents. Compared to applicant citations, examiner citations are a good indicator of knowledge linkage rather than an incomplete and noisy one. In short, the results suggest that patent citations can indeed indicate knowledge linkage, and that examiner citations are likely to indicate knowledge linkage somewhat better than applicant citations, especially for the patent claims component. Therefore, we accept the hypothesis that patent citations can indicate knowledge linkage, which is the basic assumption of the patent citation analysis method. (C) 2016 Elsevier Ltd. All rights reserved. Acknowledgments are one of many conventions by which researchers publicly bestow recognition on individuals, organizations, and institutions that contributed in some way to the work that led to publication. Combining data on both co-authors and acknowledged individuals, the present study analyses disciplinary differences in researchers' credit attribution practices in a collaborative context. Our results show that the important differences traditionally observed between disciplines in terms of team size are greatly reduced when acknowledgees are taken into account. 
Broadening the measurement of collaboration beyond co-authorship, by including individuals credited in the acknowledgements, allows for an assessment of collaboration practices and team work that might be closer to the reality of contemporary research, especially in the social sciences and humanities. (C) 2016 Elsevier Ltd. All rights reserved. Gender differences in collaborative research have received little attention when compared with the growing importance that women hold in academia and research. Unsurprisingly, most bibliometric databases strongly lack directly available information by gender. Although empirically based network approaches are often used in the study of research collaboration, studies of the influence of gender dissimilarities on the resulting topological outcomes are still scarce. Here, networks of scientific subjects are used to characterize patterns that might be associated with five categories of authorship built on the basis of gender. We find sufficient evidence that gender imbalance in scientific authorship brings a peculiar trait to the networks induced from papers published in Web of Science (WoS) indexed journals of Economics over the period 2010-2015 and having at least one author affiliated with a Portuguese institution. Our results show the emergence of a specific pattern when the network of co-occurring subjects is induced from a set of papers exclusively authored by men. Such a male-exclusive authorship condition is found to be solely responsible for the emergence of that particular shape in the network structure. This peculiar trait might facilitate future network analysis of research collaboration and interdisciplinarity. (C) 2016 Elsevier Ltd. All rights reserved. Output resulting from institutional collaboration has been widely used to create performance indicators, but focusing on research guarantors has recently provided a way to recognize the salient role of certain scientific actors. 
This paper elaborates on this approach to characterize the performance of an institution as guarantor, based not only on its guarantor output but also on the importance of the institutions with which it collaborates. Accepting that guarantorship implies in some way an acknowledgement of a prominent role on the part of the collaborating institutions, and that this recognition is more important the more important the collaborating institutions are, the paper describes two approaches to measuring this acknowledgement and discusses their effectiveness in helping to recognize prominent scientific actors, using a case study in the Library and Information Science field. The results show a high assortativity in scientific collaboration relationships, confirming the original hypothesis that important institutions tend to grant prestigious institutions the recognition of their relevance. (C) 2016 Elsevier Ltd. All rights reserved. Prior investigations have offered contrasting results on a troubling question: whether the alphabetical ordering of bylines confers citation advantages on those authors whose surnames put them first in the list. Previous studies analyzed the surname effect at the publication level, i.e., whether papers with a first author early in the alphabet attract more citations than papers with a first author late in the alphabet. We adopt instead a different approach, analyzing the surname effect on citability at the individual level, i.e., whether authors with alphabetically earlier surnames turn out to be more cited. Examining the question at both the overall and discipline levels, the analysis finds no evidence whatsoever that alphabetically earlier surnames gain an advantage. The same lack of evidence occurs for the subpopulation of scientists with very high publication rates, where alphabetical advantage might be expected to gain more ground. The field of observation consists of 14,467 scientists in the sciences. (C) 2016 Elsevier Ltd. All rights reserved. 
Although altmetrics and other web-based alternative indicators are now commonplace in publishers' websites, they can be difficult for research evaluators to use because of the time or expense of the data, the need to benchmark in order to assess their values, the high proportion of zeros in some alternative indicators, and the time taken to calculate multiple complex indicators. These problems are addressed here by (a) a field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS) that allows simple confidence limits to be calculated and is similar to a proposal of Lundberg, (b) field normalisation formulae for the proportion of cited articles in a set, the Equalised Mean-based Normalised Proportion Cited (EMNPC) and the Mean-based Normalised Proportion Cited (MNPC), to deal with mostly uncited data sets, (c) a sampling strategy to minimise data collection costs, and (d) free unified software to gather the raw data, implement the sampling strategy, and calculate the indicator formulae and confidence limits. The approach is demonstrated (but not fully tested) by comparing the Scopus citations, Mendeley readers and Wikipedia mentions of research funded by Wellcome, NIH, and MRC in three large fields for 2013-2016. Within the results, statistically significant differences in both citation counts and Mendeley reader counts were found even for sets of articles that were less than six months old. Mendeley reader counts were more precise than Scopus citations for the most recent articles and all three funders could be demonstrated to have an impact in Wikipedia that was significantly above the world average. (C) 2016 Elsevier Ltd. All rights reserved. The main objective of this paper is to empirically test whether the identification of highly cited documents through Google Scholar is feasible and reliable. 
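The MNLCS normalisation in (a) above can be illustrated with a small sketch. This is a simplified reading under stated assumptions, not the authors' software: it assumes a single field and year, and the function name and the exact form (group mean of ln(1 + c) divided by the world mean of the same transform, so that 1.0 marks the world average) are our interpretation of the formula's description.

```python
import math
from statistics import mean

def mnlcs(group_citations, world_citations):
    """Mean Normalised Log-transformed Citation Score (sketch).

    Each citation count c is transformed as ln(1 + c) to dampen the
    skewness of citation data; the mean transformed score of the group
    is divided by the mean transformed score of the world (field) set
    for the same field and year. A value of 1.0 means world average.
    """
    group_mean = mean(math.log(1 + c) for c in group_citations)
    world_mean = mean(math.log(1 + c) for c in world_citations)
    return group_mean / world_mean
```

Because the log-transformed scores are far less skewed than raw counts, simple confidence limits based on their standard error become reasonable, which is the practical appeal the abstract notes.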
To this end, we carried out a longitudinal analysis (1950-2013), running a generic query (filtered only by year of publication) to minimise the effects of academic search engine optimisation. This gave us a final sample of 64,000 documents (1,000 per year). The strong correlation between a document's citations and its position in the search results (r = 0.67) led us to conclude that Google Scholar is able to identify highly cited papers effectively. This, combined with Google Scholar's unique coverage (no restrictions on document type and source), makes the academic search engine an invaluable tool for bibliometric research relating to the identification of the most influential scientific documents. We find evidence, however, that Google Scholar ranks documents whose language (or geographical web domain) matches the user's interface language higher than could be expected based on citations. Nonetheless, this language effect and other factors related to Google Scholar's operation, i.e., the proper identification of versions and the date of publication, have only an incidental impact. They do not compromise the ability of Google Scholar to identify the highly cited papers. (C) 2016 Elsevier Ltd. All rights reserved. Using percentile shares, one can visualize and analyze the skewness in bibliometric data across disciplines and over time. The resulting figures can be intuitively interpreted and are more suitable for detailed analysis of the effects of independent and control variables on distributions than regression analysis. We show this by using percentile shares to analyze so-called "factors influencing citation impact" (FICs; e.g., the impact factor of the publishing journal) across years and disciplines. All articles (n = 2,961,789) covered by WoS in 1990 (n = 637,301), 2000 (n = 919,485), and 2010 (n = 1,405,003) are used. 
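The percentile-share computation just described can be sketched as follows. This is a minimal illustration, not the study's code: the function name and the tie handling (ties at the threshold are not split) are our simplifying assumptions.

```python
def top_share(citations, fraction=0.10):
    """Fraction of all citations held by the top `fraction` of papers.

    Sketch of a percentile share: rank papers by citation count in
    descending order, keep the top 10% (by default), and report the
    share of total citations those papers account for.
    """
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

Computing this share per discipline and per year yields directly comparable skewness figures without any regression modeling.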
In 2010, nearly half of the citation impact is accounted for by the 10% most-frequently cited papers; the skewness is largest in the humanities (68.5% in the top-10% layer) and lowest in agricultural sciences (40.6%). The comparison of the effects of the different FICs (the number of cited references, number of authors, number of pages, and JIF) on citation impact shows that the JIF indeed has the strongest correlation with the citation scores. However, the correlation between FICs and citation impact is lower if citations are normalized instead of using raw citation counts. (C) 2016 Elsevier Ltd. All rights reserved. metaknowledge is a full-featured Python package for computational research in information science, network analysis, and science of science. It is optimized to scale efficiently for analyzing very large datasets, and is designed to integrate well with reproducible and open research workflows. It currently accepts raw data from the Web of Science, Scopus, PubMed, ProQuest Dissertations and Theses, and select funding agencies. It processes these raw data inputs and outputs a variety of datasets for quantitative analysis, including time series methods, Standard and Multi Reference Publication Year Spectroscopy, computational text analysis (e.g. topic modeling, burst analysis), and network analysis (including multi-mode, multi-level, and longitudinal networks). This article motivates the use of metaknowledge and explains its design and core functionality. (C) 2016 Elsevier Ltd. All rights reserved. In today's complex academic environment the process of performance evaluation of scholars is becoming increasingly difficult. Evaluation committees often need to search in several repositories in order to deliver their evaluation summary report for an individual. However, it is extremely difficult to infer performance indicators that pertain to the evolution and the dynamics of a scholar.
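The percentile-share idea described above, e.g. the share of citation impact held by the top-10% most-cited papers, reduces to a short computation; the toy field below is illustrative:

```python
def top_share(citations, top_fraction=0.10):
    """Share of all citations held by the top `top_fraction`
    most-cited papers in a set (a percentile share)."""
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

# Toy field of 20 papers with a skewed citation distribution.
field = [120, 60, 15, 9, 7, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 0, 0, 0, 0]
share = top_share(field)
```

Computing this share per discipline and per year, as the abstract does, makes the skewness directly comparable across fields without fitting any regression model.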
In this paper we propose a novel computational methodology based on unsupervised machine learning that can act as an important tool in the hands of evaluation committees of individual scholars. The suggested methodology compiles a list of several key performance indicators (features) for each scholar and monitors them over time. All these indicators are used in a clustering framework which groups the scholars into categories by automatically discovering the optimal number of clusters using clustering validity metrics. A profile of each scholar can then be inferred through the labeling of the clusters with the performance indicators used. These labels can ultimately act as the main profile characteristics of the individuals that belong to that cluster. Our empirical analysis places emphasis on the "rising stars" who demonstrate the biggest improvement over time across all of the key performance indicators (KPIs), and can also be employed for the profiling of scholar groups. Published by Elsevier Ltd. In this paper we present "citation success index", a metric for comparing the citation capacity of pairs of journals. Citation success index is the probability that a random paper in one journal has more citations than a random paper in another journal (50% means the two journals do equally well). Unlike the journal impact factor (IF), the citation success index depends on the broadness and the shape of citation distributions. Furthermore, it is insensitive to sporadic highly-cited papers that affect the IF. Nevertheless, we show, based on 16,000 journals containing 2.4 million articles, that the citation success index is a relatively tight function of the ratio of IFs of journals being compared. This is due to the fact that journals with the same IF have quite similar citation distributions. The citation success index grows slowly as a function of IF ratio.
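The clustering step described above, grouping scholars by their KPIs while choosing the number of clusters with a validity metric, can be sketched with a tiny hand-rolled k-means and silhouette score; the function names and toy KPI values are illustrative, not the paper's implementation:

```python
import random
import statistics

def kmeans_1d(values, k, iters=50, seed=0):
    """Tiny 1-D k-means: returns a cluster label for each value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centers[j] = statistics.mean(members)
    return labels

def silhouette(values, labels):
    """Mean silhouette width: (b - a) / max(a, b) averaged over points."""
    if len(set(labels)) < 2:
        return -1.0  # degenerate single-cluster labeling
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for w, l in zip(values, labels) if l == labels[i]]
        a = sum(own) / max(1, len(own) - 1)   # mean distance within own cluster
        others = set(labels) - {labels[i]}
        b = min(statistics.mean([abs(v - w) for w, l in zip(values, labels) if l == o])
                for o in others)              # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.mean(scores)

# Toy composite KPI scores for 12 scholars, forming two obvious groups.
kpi = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 5.0, 5.2, 4.9, 5.1, 5.3, 4.8]
best_k = max(range(2, 5), key=lambda k: silhouette(kpi, kmeans_1d(kpi, k)))
```

The validity metric (here silhouette width, one common choice) selects the number of categories automatically, after which the clusters can be labeled with the KPIs that characterise them, as the abstract describes.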
It is substantial (>90%) only when the ratio of IFs exceeds 6, whereas a factor of two difference in IF values translates into a modest advantage for the journal with higher IF (index of 70%). We facilitate the wider adoption of this metric by providing an online calculator that takes as input parameters only the IFs of the pair of journals. (C) 2016 Elsevier Ltd. All rights reserved. Neuroscience or Neural Science is a very active and interdisciplinary field that seeks to understand the brain and the nervous system. In spite of important advances made in recent decades, women are still underrepresented in neuroscience research output as a consequence of gender inequality in science overall. This study carries out a scientometric analysis of the 30 neuroscience journals (2009-2010) with the highest impact in the Web of Science database (Thomson Reuters) in order to quantitatively examine the current contribution of women in neuroscientific production, their pattern of research collaboration, scientific content, and the analysis of scientific impact from a gender perspective. From a total of 66,937 authorships, gender could be identified in 53,351 (79.7%) of them. Results revealed that 67.1% of the authorships corresponded to men and 32.9% to women. In relative terms, women tend to be concentrated in the first position of the authorship by-line (which could be a reflection of new female incorporations into neuroscience research publishing their first studies), and much less in the last (senior) position. This double pattern suggests that age probably plays a role in (partly) explaining gender asymmetry, both in science in general and in neuroscience in particular. (C) 2016 Elsevier Ltd. All rights reserved. In this study, we explore paper characteristics that facilitate the knowledge flow from science to technology by using the patent-to-paper citation data. 
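The citation success index defined above can be computed directly from two journals' citation counts by comparing all pairs of papers; the toy counts below are illustrative:

```python
def citation_success_index(journal_a, journal_b):
    """Probability that a random paper from journal_a has more citations
    than a random paper from journal_b; ties count half (a sketch of the
    pairwise comparison, not the authors' estimator)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in journal_a for b in journal_b)
    return wins / (len(journal_a) * len(journal_b))

# Toy citation counts for two journals of different impact.
high_if = [12, 8, 30, 5, 9, 14, 7, 11]
low_if = [2, 0, 5, 1, 3, 4, 1, 2]
csi = citation_success_index(high_if, low_if)
```

By construction the index is 50% when a journal is compared with itself, and it depends on the full shape of the two citation distributions rather than on their means, which is the property the abstract contrasts with the IF.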
The linear growth trajectory of the number of patent citations to a scientific paper over time is used to measure the dynamism of its utilization for technology applications. The citation data used were obtained from the USPTO database based on two 5-year citation windows, 2001-2005 and 2009-2013. The former included patent citations to the publications in the Thomson Reuters Web of Science in 1998, and the latter included those in 2006. Only the publications in the top ten most frequently cited subject categories in the Web of Science were selected. By using growth modeling, we have found that the mean slope of the trajectory is significant. Moreover, the paper citation count, the ranking factor of the journal in which the paper was published, whether the paper is an industrial publication, and whether it is a review article have been identified to exert significant effects on the growth of the citation of scientific literature by patented inventions. Some policy implications are discussed. (C) 2016 Elsevier Ltd. All rights reserved. In this contribution we study partial orders in the set of zero-sum arrays. Concretely, these partial orders relate to local and global hierarchy and dominance theories. The exact relation between hierarchy and dominance curves is explained. Based on this investigation we design a new approach for measuring dominance or stated otherwise, power structures, in networks. A new type of Lorenz curve to measure dominance or power is proposed, and used to illustrate intrinsic characteristics of networks. The new curves, referred to as D-curves are partly concave and partly convex. As such they do not satisfy Dalton's transfer principle. Most importantly, this article introduces a framework to compare different power structures as a whole. It is shown that D-curves have several properties making them suitable to measure dominance. If dominance and being a subordinate are reversed, the dominance structure in a network is also reversed. 
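The cumulative curves used to picture dominance in zero-sum arrays can be illustrated with the partial sums of a decreasingly sorted array; this is a simplified Lorenz-like sketch under that one assumption, not the paper's exact D-curve construction:

```python
def dominance_curve(zero_sum):
    """Partial sums of a decreasingly sorted zero-sum array: a
    Lorenz-like curve for picturing dominance (simplified sketch)."""
    ranked = sorted(zero_sum, reverse=True)
    curve, running = [], 0.0
    for v in ranked:
        running += v
        curve.append(running)
    return curve

# Toy zero-sum "dominance" array: positive entries dominate,
# negative entries are subordinate; the values sum to zero.
arr = [3, 1, 0, -1, -3]
curve = dominance_curve(arr)
```

The curve rises while the sorted entries are positive and falls back to zero at the end, so reversing the roles of dominating and subordinate nodes (negating the array) reverses the pictured structure, mirroring the reversal property stated in the abstract.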
(C) 2016 Elsevier Ltd. All rights reserved. Measuring the contribution of each author of a multi-author paper has been a long-standing concern. As a possible solution to this, we propose a list of intellectual activities and logistic support activities that might be involved in the production of a research paper. We then develop a quantitative approach to estimate an author's relative intellectual contribution to a published work. An author's relative intellectual contribution is calculated as the percent contribution of an author to each intellectual activity involved in the production of the paper multiplied by a weighting factor for each intellectual activity. The relative intellectual contribution calculated in this way can be used to determine the position of an author in the author list of a paper. Second, a corrected citation index for each author, called the T-index, can be calculated by multiplying the relative intellectual contribution by the total citations received by a paper. The proposed approach can be used to measure the impact of an author of a multi-authored paper in a more accurate way than either giving each author full credit or dividing credit equally. Our proposal not only resolves the long-standing concern for the fair distribution of each author's credit depending on his/her contribution, but it will also, hopefully, discourage the addition of non-contributing authors to a paper. (C) 2017 Elsevier Ltd. All rights reserved. This article discusses the metrics used in the national research evaluation in Poland for the period 2009-2012. The Polish system uses mostly parametric assessments to make the evaluation more objective and independent from its peers. We have analysed data on one million research outcomes and assessment results of 962 scientific units in the period 2009-2012. Our study aims to determine how much data the research funding system needs to proceed with evaluation.
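The T-index scheme described above reduces to two small formulas; the activity names, weights, and shares below are hypothetical, used only to show the arithmetic:

```python
def relative_contribution(activity_shares, weights):
    """Author's relative intellectual contribution: the author's percent
    share of each intellectual activity times that activity's weighting
    factor (weights sum to 1)."""
    return sum(share * w for share, w in zip(activity_shares, weights))

def t_index(activity_shares, weights, total_citations):
    """Corrected per-author citation credit for one paper (a sketch):
    relative intellectual contribution times the paper's total citations."""
    return relative_contribution(activity_shares, weights) * total_citations

# Hypothetical weights for four intellectual activities
# (idea, design, analysis, writing) and one author's share of each.
weights = [0.3, 0.2, 0.3, 0.2]
author_shares = [0.50, 0.25, 0.40, 0.60]
credit = t_index(author_shares, weights, total_citations=40)
```

Repeating the calculation for every coauthor yields credits that are neither full nor equal shares, which is the intermediate position the abstract argues for.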
We have used correlation analysis, multivariate logistic regression models and decision trees to show which metrics of the evaluation played a major role in the final results. Our analysis revealed that many metrics taken into account in the evaluation are closely correlated. We have found that in the Polish system, not all the collected data are necessary to achieve the main goal of the system, namely the categorization of scientific units in terms of their research performance. Our findings highlight the fact that there is a high correlation between performance in terms of publications and the scientific potential of a given scientific unit. We conclude with recommendations and a suggestion of a transition from a system in which the scientific units report all their metrics to a system in which they show only the most important metrics that meet the requirements of excellence in research. (C) 2017 Elsevier Ltd. All rights reserved. The rising trend of coauthored academic works obscures the credit assignment that is the basis for decisions on funding and career advancement. In this paper, a simple model based on the assumption of an unvarying "author ability" is introduced. With this assumption, the weight of author contributions to a body of coauthored work can be statistically estimated. The method is tested on a set of more than five hundred authors in a coauthor network from the CiteSeerX database. The ranking obtained agrees fairly well with that given by total fractional citation counts for an author, but noticeable differences exist. (C) 2017 Elsevier Ltd. All rights reserved. Young scholars in academia often seek to work in collaboration with top researchers in their field in pursuit of a successful career. While success in academia can be defined differently, everyone agrees that training with a well-known researcher can help lead to an efficacious career.
This study aims to investigate whether collaborating with established scientists does, in fact, improve junior scholars' chances of success. If not, what makes young scientists soar in their academic careers? We investigate this question by analyzing the effect of collaboration with a known star on the success of a young scholar. The results suggest that working with leading experts can lead to a successful career, but that it is not the only way. Researchers who were not fortunate enough to start their career with an elite researcher could still succeed through hard work and passion. These findings emerged from analyses of two discrete sets of well-known scholars and their effect on the careers of newcomers, suggesting their strength and validity. (C) 2017 Elsevier Ltd. All rights reserved. The paper provides an empirical examination of how research productivity distributions differ across scientific fields and disciplines. Productivity is measured using the FSS indicator, which embeds both quantity and impact of output. The population studied consists of over 31,000 scientists in 180 fields (10 aggregate disciplines) of a national research system. The Characteristic Scores and Scale technique is used to investigate the distribution patterns for the different fields and disciplines. Research productivity distributions are found to be asymmetrical at the field level, although the degree of skewness varies substantially among the fields within the aggregate disciplines. We also examine whether the field productivity distributions show a fractal nature, which turns out to be the exception more than the rule. For the disciplines, by contrast, the partitions of the distributions show skewed patterns that are highly similar. (C) 2017 Elsevier Ltd. All rights reserved. Increasingly complex competitive environments drive corporations in almost all industries to conduct omnibearing innovation activities to enhance their technological innovation capability and international competitiveness.
Against this background, we propose subject-action-object (SAO) based morphological analysis to identify technology opportunities by detecting prioritized combinations within the morphology matrix. SAO structures emphasize the key concepts with provision of diverse technology information based on semantic relationships. The combination of SAO semantic structures can support the establishment of a matrix, which consists of two dimensions: compositions and properties of technology. Novel indicators are then used to evaluate the technological feasibility of each new configuration under a customized analysis, and prioritized combinations with high scores can be identified. We apply this method to the case of dye-sensitized solar cells (DSSCs) in patent documents. The approach holds promise to strengthen information support systems for commercial enterprises in technical innovation and market innovation activities. We believe the analysis can be adapted well to fit other technologies, especially in their emerging stage. Firms participating in a patent transaction network demonstrate interdependence and mutually influence one another. The characteristics of such a network structure demonstrate a complex overall configuration. This study proposes a dynamic perspective for investigating the structure of a patent transaction network. By using network analysis, this study defines the structural configuration of a patent transaction network by measuring centralization, centrality, and linkage distribution. Data from patent transactions of the flat panel display sector from 1976 to 2012 have been examined to evaluate their networking. The results show that the structural configuration of a patent transaction network shows significant stratification patterns in terms of a given firm's technology exportation and brokerage capabilities, but also operates as a complex system.
This analysis provides insights into patent transaction networks, and also addresses policy implications for firms and authorities interested in market competition or governance. Having a new technology opportunity is a significant variable that can lead to dominance in a competitive market. In that context, accurately understanding the state of development of technology convergence and forecasting promising technology convergence can determine the success of a firm. However, previous studies have mainly focused on examining the convergence paths taken in the past or the current state of convergence rather than projecting the future trends of convergence. In addition, few studies have dealt with multi-technology convergence by taking a pairwise-analysis approach. Therefore, this research aimed to propose a forecasting methodology for multi-technology convergence, which is more realistic than pairwise convergence, based on a patent-citation analysis, a dependency-structure matrix, and a neural-network analysis. The suggested methodology enables both researchers and practitioners in the convergence field to plan their technology development by forecasting the technology combination that will occur in the future. This study was performed to discuss an R&D investment planning method based on the technology spillover among R&D fields, from the point of view of technology convergence. The empirical analysis focused on a particular R&D group, such as university departments and specialized research institutes, since local technology combinations are more effective than distant combinations at creating a new technology, according to previous research. In addition, worldwide technology competition is increasing, and with the recent convergence of various technologies and industries, strategies for R&D selection and resource allocation of particular R&D groups are becoming increasingly important.
The empirical analysis uses a modified Decision Making Trial and Evaluation Laboratory method combined with information on patent citations to resolve the latent problems of the existing model, using as an empirical example the case of the Korea Institute of Geoscience and Mineral Resources (KIGAM), specialized in the geology and resources development R&D area. Through the empirical analysis, the KIGAM's current R&D investment status is considered, and a reasonable R&D investment plan is suggested from the perspective of technology spillover. By using this framework, the magnitude of technology spillover from the R&D investment planning within a particular R&D group can be measured based on objective quantitative data, and the current R&D investment can be compared with recent global trends. This contribution studies the technological capabilities of Central and Eastern European (CEE) economies based on priority filings for the period of 1980-2009. From a global perspective, the indicators suggest a division of labour in technological activities among world regions whereby Europe, Latin America and the former USSR are specializing in sectors losing technological dynamism (Chemicals and Mechanical Engineering) while North America, the Middle East (especially Israel) and Asia Pacific are increasingly specializing in Electrical Engineering, a sector with significant technological opportunities. Regarding priority filings, CEE reduced its technological activities drastically after 1990. The recovery of CEE economies regarding technological capabilities is unfolding very slowly. The results speak for the ability of CEE countries to contribute to a limited number of fields with growing technological opportunities. The technological profile of the CEE region will more likely than not complicate the technology upgrading process towards activities at the technological frontier.
The novelty of a patent may be seen as those patterns that distinguish it from other patents and scientific literature. Its understanding may serve many purposes, both in scientometric research and in the management of technological information. While many methods exist that deal with a patent's meta-information like citation networks or co-classification analysis, the analysis of novelty in the full text of a patent is still at the beginning of research and in practice a time-consuming manual task. The question we pose is whether computer-based text mining methods are able to identify those elements of such a patent that make it novel from a technological and application/market perspective. For this purpose we introduce and operationalize the concept of near environment analysis and use a three-step text mining approach on one of the patents nominated as finalist in the 2012 European Inventor Award contest. We demonstrate that such an approach is able to single out, content-wise in a near environment, the novelty of the patent. The method can also be used for other patents and, with adaptation of the near environment analysis, for scientific literature. From the perspective of science-based innovation, this study introduces measures of both scientific linkage (technology-science correlation index) and technological innovation capabilities (relative growth rate, relative patent position and revealed technological advantage) to compare and analyze the international competitiveness of solar energy technologies among the United States, the European Union, Japan, China and South Korea, based on the solar energy technologies-related patents in the European Patent Office Worldwide Patent Statistical Database.
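One of the indicators named above, revealed technological advantage (RTA), has a standard form that is easy to compute: a country's share of patenting in a field relative to its share of all patenting. The country names and patent counts below are hypothetical:

```python
def rta(patents, country, field):
    """Revealed technological advantage (standard form): the country's
    share of a field divided by its share of all patenting.
    RTA > 1 signals specialisation in that field."""
    country_field = patents[country][field]
    country_total = sum(patents[country].values())
    world_field = sum(p[field] for p in patents.values())
    world_total = sum(sum(p.values()) for p in patents.values())
    return (country_field / country_total) / (world_field / world_total)

# Toy counts of solar-energy patents by country and sub-field.
patents = {
    "A": {"photovoltaic": 80, "thermal": 20},
    "B": {"photovoltaic": 30, "thermal": 70},
}
spec = rta(patents, "A", "photovoltaic")
```

Country "A" concentrates its patenting in photovoltaics more than the world does on average, so its RTA exceeds 1 there, while country "B" falls below 1 for the same field.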
After making international comparisons of their technological development and innovation paradigm, we find that there are different innovation characteristics of various technology fields within the solar energy industry and then propose some relevant policy recommendations for latecomers to implement catch-up strategies. The results show that the leading countries and regions of the solar energy industry such as the United States and the European Union focus mainly on science-based innovation, while Japan and latecomers like China and South Korea pay more attention to technology-based innovation. In addition, those two fields within the solar energy industry present opposite innovation characteristics: solar photovoltaic technologies, especially thin film and organic cells, present strong technological innovation capabilities with high scientific linkage, while solar thermal technologies show strong technological innovation capabilities with low scientific linkage. Additive manufacturing (AM) or 3D printing includes techniques capable of manufacturing regular and irregular shapes for small batches of customized products. The ability to customize unusual shapes makes the process particularly suitable for prosthetic products used in biomedical applications. AM adoption in the field of biomedical applications (called bio-AM in this research) has seen significant growth over the last few years. This research develops an Intellectual Property (IP) analytical methodology to explore the portfolios and evolution of patents, as well as their relevance to Taiwan's Ministry of Science and Technology (MOST) research projects in the bio-AM domain. Specifically, global and domestic IP portfolios for bio-AM innovations are studied using the proposed method. First, the domain documents (of US patents and MOST projects) are collected from a global patent database and MOST project database.
The key term frequency counts and technical clustering analysis of the collected documents are derived. The key terms and appearance frequencies in documents form the basis for document clustering and similarity analysis. The ontology of bio-AM is constructed based on the clustering results. Finally, the patents and projects in the adjusted clusters are subject to evolution analysis using concept lattice analysis. This research provides a computer-supported IP evolution analysis system, based on the developed algorithms, for the decision support of IP and R&D strategic planning. This paper describes an empirical study to identify factors of patent description and examination that affect the outcome of later challenges to patent validity. Using ternary dummy variables based on the change in the scope of protection before and after Japanese invalidation trials as the dependent variable, the study compares models estimated by ordered and binary logistic regression. The standardized length of the first part of the description explaining the prior art, and the second part explaining the details of the invention, were found to relate to the "maximum" and "minimum" survival of patents, respectively. Other factors were also identified. Different modes of survival are discussed. Models on the likelihood of a challenge to patent validity were also estimated. The variables related to patent validity were compared to those related to the likelihood of a challenge. This revealed factors affected by selection biases based on the opponent's decision to make the challenge. Factors including the first and second parts of the description were not affected by selection biases and were thus confirmed to be related to patent validity. Factors that not only affect patent validity but also affect an opponent's decision to challenge patent validity are also discussed.
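The key-term similarity analysis described above is typically based on a vector-space measure such as cosine similarity over term-frequency counts; the documents and key terms below are hypothetical:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents represented as
    key-term frequency counts (the usual vector-space measure)."""
    terms = set(doc_a) | set(doc_b)
    dot = sum(doc_a[t] * doc_b[t] for t in terms)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in doc_a.values()))
    norm_b = math.sqrt(sum(v * v for v in doc_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical key-term counts for a bio-AM patent and a project abstract.
patent = Counter({"printing": 5, "implant": 3, "titanium": 2})
project = Counter({"printing": 4, "implant": 2, "scaffold": 3})
sim = cosine_similarity(patent, project)
```

A pairwise similarity matrix built this way can feed the document clustering that the ontology construction step relies on.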
Whether it be for countries to improve the ability to undertake independent innovation or for enterprises to enhance their international competitiveness, tracing historical progression and forecasting future trends of technology evolution is essential for formulating technology strategies and policies. In this paper, we apply co-classification analysis to reveal the technical evolution process of a certain technical field, use co-word analysis to extract implicit or unknown patterns and topics, and employ main path analysis to discover significant clues about technology hotspots and development prospects. We illustrate this hybrid approach with 3D printing, referring to various technologies and processes used to synthesize a three-dimensional object. Results show how our method offers technical insights and traces technology evolution pathways, and then helps decision-makers guide technology development. Patents and relevant topics are gaining momentum in economic analysis and scientific research with the rapid growth of global intellectual property filings. However, the corresponding increase in patent research publications seems unspectacular, especially under the category of information science and library science. This paper provides a retrospective of the existing studies on patents collected from the Web of Science and characterizes the current situation by performing a series of bibliometric analyses. Prominent authors and institutions from mainland China, Taiwan and Belgium have carried out various studies on patents separately or jointly. Topics involved in 884 journal papers are reclassified from the perspectives of the development, application and analysis of patents based on the results of keyword co-occurrence and typical publications in each stage.
The final, and novel, part of this study was a sentence-by-sentence analysis of the conclusive and citing ideas of recent publications, tracing problems and potential researchable topics and indicating that patent research still has considerable room to grow. Despite their important position in the research environment, there is a growing theoretical uncertainty concerning what research metrics indicate (e.g., quality, impact, attention). Here we utilize the same tools used to study latent traits like Intelligence and Personality to get a quantitative understanding of what over 20 common research metrics indicate about the papers they represent. The sample is all of the 32,962 papers PLoS published in 2014, with results suggesting that there are at least two important underlying factors, which could generally be described as Scientific Attention/Discussion (citations), General Attention/Discussion (views, tweets), and potentially Media Attention/Discussion (media mentions). The General Attention metric is correlated about .50 with both the Academic and Media factors, though the Academic and Media attention are only correlated with each other below .05. The overall best indicator of the dataset was the total lifetime views on the paper, which is also probably the easiest to game. The results indicate the need for funding bodies to decide what they value and how to measure it (e.g., types of attention, quality). In this paper, we study the spatial characteristics of a sample of 2605 highly productive economists, and a subsample of 332 economists with outstanding productivity. Individual productivity is measured in terms of a quality index that weights the number of publications up to 2007 in four journal classes. We analyze the following four issues. (1) The "funneling effect" towards the US and the clustering of scholars in the top US institutions. (2) The high degree of collective inbreeding in the training of elite members.
(3) The partition of those born in a given country into brain drain (who work in a country different from their country of origin), brain circulation (who study and/or work abroad followed by a return to the home country), and stayers (whose entire academic career takes place in their country of origin). We also study the partition of the economists working in 2007 in a given geographical area into nationals (stayers plus brain circulation) and migrants (brain drained from other countries). (4) Finally, we estimate the research output in different geographical areas in two instances: when we classify researchers by the institution where they work in 2007, or by their country of origin. Context of altmetrics data is essential for further understanding the value of altmetrics beyond raw counts. Two facets of context are mainly explored: the count type, which reflects users' multiple altmetrics behaviors, and the user category, which reflects part of a user's background. Based on 5.18 records provided by Altmetric.com, both descriptive statistics and t-test results show a significant difference between the number of posts (NP) and the number of unique users (NUU). For several altmetrics indicators, NP has moderate to low correlation with NUU. User category is found to have a huge impact on altmetrics counts. Analysis of the Twitter user category shows that the general tweet distribution is strongly influenced by public users. Tweets from research users are more correlated with citations than those from any other user category. Moreover, disciplinary differences exist for different user categories. This study sheds light on the unexplored phenomenon of multiple institutional affiliations using scientific publications. Institutional affiliations are important in the organisation and governance of science. Multiple affiliations may alter the traditional framework of academic employment and careers and may require a reappraisal of institutional assessment based on research outcomes of affiliated staff.
Results for authors in three major science and technology nations (Germany, Japan and the UK) and in three fields (biology, chemistry, and engineering) show that multiple affiliations have at least doubled over the past few years. The analysis proposes three major types of multiple affiliations that depend on the structure of the research sector and its international openness. Highly internationalised and higher education-centred affiliations are most common for researchers in the UK whereas Germany and Japan have stronger cross-sector affiliation patterns. International multiple affiliations are, however, still more common in Germany compared to Japan which is characterised by a domestic, cross-sector affiliation distribution. Moreover, multiple affiliation authors are more often found on high impact papers, particularly in the case of authors from Japan and Germany in the fields of biology and chemistry. A forecasting methodology for technology development trends is proposed based on a two-level network model consisting of knowledge-transfer among patents and patent subclasses, with the aim of confronting the increasingly complex challenge of technology investment and management. More specifically, the patents of the "coherent light generators" classification, granted from 1976 to 2014 by examiners of the United States Patent and Trademark Office, are collected, and the first-level citation network is constructed from them. Then, a new approach to assess patent importance from the perspective of topological structure is provided and the second-level citation network, which consists of patent subclasses, is produced with the evaluation results. Moreover, three assessment indices of the subclass citation network are abstracted as impact parameters for technology development trends. Finally, two typical time series models, the Bass and ARIMA models, are utilized and compared for development trend forecasting.
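The Bass model mentioned above has a closed form for cumulative uptake. The parameters below are hypothetical, and applying the curve to a patent subclass's citation accumulation follows the paper's idea only in spirit:

```python
import math

def bass_cumulative(t, p, q, m):
    """Closed-form Bass diffusion curve: cumulative adoptions at time t,
    with innovation coefficient p, imitation coefficient q, and market
    potential m (here reused for technology uptake forecasting)."""
    e = math.exp(-(p + q) * t)
    return m * (1 - e) / (1 + (q / p) * e)

# Hypothetical parameters for a patent subclass's citation uptake:
# weak external (innovation) influence, strong internal (imitation) influence.
p, q, m = 0.03, 0.38, 1000
trajectory = [bass_cumulative(t, p, q, m) for t in range(0, 21)]
```

The curve starts at zero, rises in the familiar S-shape as imitation takes over, and saturates toward the potential m, which is why it suits the trend-forecasting comparison against ARIMA that the abstract describes.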
Based on the results of evolution prediction and network analysis, the highlighted patent subclasses with more development potential are identified, and the correlation between technology development opportunity and the topological structure of the patent citation network is discussed. The present paper takes its place in the stream of studies that analyze the effect of interdisciplinarity on the impact of research output. Unlike previous studies, in this study the interdisciplinarity of the publications is not inferred through their citing or cited references, but rather by identifying the authors' designated fields of research. For this we draw on the scientific classification of Italian academics, and their publications as indexed in the WoS over a 5-year period (2004-2008). We divide the publications into three subsets on the basis of the nature of co-authorship: those papers coauthored with academics from different fields, which show high intensity of inter-field collaboration ("specific" collaboration, occurring in 110 pairings of fields); those papers coauthored with academics who are simply from different, "non-specific" fields; and finally co-authorships within a single field. We then compare the citations of the papers and the impact factor of the publishing journals between the three subsets. The results show significant differences, generally in favor of the interdisciplinary authorships, in only one third (or slightly more) of the cases. The analysis provides the value of the median differences for each pair of publication subsets. After developing independently following World War II, the research systems of East and West Germany reunited at the end of the Cold War, resulting in the Westernization of East German research institutions. Using data from the Web of Science over the 1980-2000 period, this paper analyses the effects of these political changes on the research activity of scholars from East and West Germany before and after the reunification. 
It shows that these groups differ in terms of levels of production, publication language, collaboration patterns and scientific impact and that, unsurprisingly, the scholarly output of the East became much more similar to that of the West after the reunification. At the level of individual researchers, analysis shows that East German researchers who had direct or indirect ties with the West prior to the 1990s were less affected by the reunification, or were perhaps quicker to adapt to this major change, than their colleagues who were more deeply rooted in the Eastern research system. To explore whether there are factors other than count and sentiment that should be incorporated in evaluating research papers with social media mentions, this paper analyses the content of tweets linking to the top 100 papers of 2015 taken from www.altmetric.com, focusing on the goals, functions and features of research. We discuss three basic issues inherent in using tweets for research evaluation: whose tweets can be used to assess a paper, what objects can be evaluated, and how to score the paper according to each tweet. We suggest that tweets written by those involved in publication of the paper in question should not be included in the paper's evaluation. Tweets unrelated to the content of the paper should also be excluded. Because controversies in research are inevitable and difficult to resolve, we suggest omitting somewhat supportive and negative tweets in research evaluation. Logically, neutral tweets (such as those linking to, and excerpts from, papers) express a degree of compliment, agreement, interest, or surprise, albeit less so than the tweets explicitly expressing these sentiments. Recommendation tweets also reflect one or more of these sentiments. Expansion tweets, which are inspired by the papers, reflect the function of research. 
Therefore, we suggest giving a higher weight to praise, agreement, interest, surprise, recommendation and expansion tweets linking to an academic paper than neutral tweets when scoring a paper. Issues related to electronic publishing and social media as learned from tweets are also discussed. We explore if and how Microsoft Academic (MA) could be used for bibliometric analyses. First, we examine the Academic Knowledge API (AK API), an interface to access MA data, and compare it to Google Scholar (GS). Second, we perform a comparative citation analysis of researchers by normalizing data from MA and Scopus. We find that MA offers structured and rich metadata, which facilitates data retrieval, handling and processing. In addition, the AK API allows retrieving frequency distributions of citations. We consider these features to be a major advantage of MA over GS. However, we identify four main limitations regarding the available metadata. First, MA does not provide the document type of a publication. Second, the "fields of study'' are dynamic, too specific and field hierarchies are incoherent. Third, some publications are assigned to incorrect years. Fourth, the metadata of some publications did not include all authors. Nevertheless, we show that an average-based indicator (i.e. the journal normalized citation score; JNCS) as well as a distribution-based indicator (i.e. percentile rank classes; PR classes) can be calculated with relative ease using MA. Hence, normalization of citation counts is feasible with MA. The citation analyses in MA and Scopus yield uniform results. The JNCS and the PR classes are similar in both databases, and, as a consequence, the evaluation of the researchers' publication impact is congruent in MA and Scopus. Given the fast development in the last year, we postulate that MA has the potential to be used for full-fledged bibliometric analyses. 
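The journal normalized citation score (JNCS) computed above is an average-based indicator: a paper's citations divided by the mean citations of papers in the same journal and publication year. A minimal sketch under that common definition (the field names and sample data are illustrative, not the study's):

```python
from collections import defaultdict

def jncs(papers):
    """Journal Normalized Citation Score for each paper.

    Each paper is a dict with 'journal', 'year', and 'citations'.
    JNCS = citations / mean citations of all papers published in
    the same journal in the same year, so 1.0 means average impact
    for that journal-year baseline.
    """
    totals = defaultdict(lambda: [0, 0])  # (journal, year) -> [sum, count]
    for pap in papers:
        key = (pap["journal"], pap["year"])
        totals[key][0] += pap["citations"]
        totals[key][1] += 1
    scores = []
    for pap in papers:
        s, n = totals[(pap["journal"], pap["year"])]
        mean = s / n
        scores.append(pap["citations"] / mean if mean else 0.0)
    return scores

papers = [
    {"journal": "J1", "year": 2015, "citations": 10},
    {"journal": "J1", "year": 2015, "citations": 30},
    {"journal": "J2", "year": 2015, "citations": 4},
]
scores = jncs(papers)  # J1's mean is 20, so its papers score 0.5 and 1.5
```

Because the score only requires citation counts grouped by journal and year, it can be computed from the structured metadata that MA's AK API returns, which is why the authors find normalization feasible there.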
Scientometric studies have, by and large, focused on the features of the hard sciences rather than the soft sciences. Prior research has been highly centered around natural science disciplines and not many studies have dealt with the social sciences. This applies to Africa as well. However, attempts to investigate the features and tendencies in the social sciences are gradually emerging. This is the first paper to explore the social sciences in South Africa, examining the interrelationships between the types of collaboration and the impact of research publications as measured by the count of citations. Extracting Web of Science data from its Social Science Citation Index (from 1956 to present) for sampled years between 1970 and 2015 (n = 4991), the analysis explains citations in terms of the type of collaboration, international partners and subject areas. The highlights of this analysis are that the social sciences in South Africa have certain distinguishing characteristics that determine the production and impact of knowledge. Increased collaboration between researchers working in university, industry, and governmental settings is changing the landscape of academic science. Traditional models of the interaction between these sectors, such as the triple helix concept, draw clear distinctions between academic and non-academic settings and actors. This study surveyed scientists (n = 469) working outside of university settings who published articles indexed in the Web of Science about their modes of collaboration, perceptions about publishing, workplace characteristics, and information sources. We study the association between these variables, and use text analysis to examine the roles, duties, sites, topics, and workplace missions among non-university based authors. Our analysis shows that 72% of authors working in non-university settings who collaborate and publish with other scientists self-identify as academics. 
Furthermore, their work life resembles that of those working in university settings in that the majority report doing fundamental research in government research organizations and laboratories. Contrary to our initial hypothesis, this research suggests that peer-reviewed publications are much more dominated by non-university academics than we previously thought and that collaboration as co-authors on academic publications is not likely to be a primary conduit for the transfer of scientific knowledge between academe and industry. This article discusses the Polish Journal Ranking, which is used in the research evaluation system in Poland. In 2015, the ranking, which represents all disciplines, allocated 17,437 journals into three lists: A, B, and C. The B list constitutes a ranking of Polish journals that are indexed neither in the Web of Science nor the European Reference Index for the Humanities. This ranking was built by evaluating journals in three dimensions: formal, bibliometric, and expert-based. We have analysed data on 2035 Polish journals from the B list. Our study aims to determine how an expert-based evaluation influenced the results of final evaluation. In our study, we used structural equation modelling, which is regression based, and we designed three pairs of theoretical models for three fields of science: (1) humanities, (2) social sciences, and (3) engineering, natural sciences, and medical sciences. Each pair consisted of the full model and the reduced model (i.e., the model without the expert-based evaluation). Our analysis revealed that the multidimensional evaluation of local journals should not rely only on the bibliometric indicators, which are based on the Web of Science or Scopus. Moreover, we have shown that the expert-based evaluation plays a major role in all fields of science. 
We conclude with recommendations that the formal evaluation should be reduced to verifiable parameters and that the expert-based evaluation should be based on common guidelines for the experts. This study examines characteristics of data sharing and data re-use in Genetics and Heredity, where data citation is most common. This study applies an exploratory method because data citation is a relatively new area. The Data Citation Index (DCI) on the Web of Science was selected because DCI provides a single access point to over 500 data repositories worldwide and to over two million data studies and datasets across multiple disciplines and monitors quality research data through a peer review process. We explore data citations for Genetics and Heredity as a case study, examining formal citations recorded in the DCI and, informally, sampling a selection of papers for implicit data citations within publications. Citer-based analysis is conducted in order to account for self-citation in the data citation phenomenon. We explore 148 sampled citing articles in order to identify factors that influence data sharing and data re-use, including references, main text, supplementary data/information, acknowledgments, funding information, author information, and web/author resources. This study is unique in that it relies on a citer-based analysis approach and on analysis of peer-reviewed and published data, data repositories, and the citing articles of highly productive authors, among whom data sharing is most prevalent. This research is intended to provide a methodological and practical contribution to the study of data citation. Circular economy (CE) is a term that has existed since the 1970s and has acquired greater importance in the past few years, partly due to the scarcity of natural resources available in the environment and changes in consumer behavior. 
Cutting-edge technologies such as big data and internet of things (IoT) have the potential to leverage the adoption of CE concepts by organizations and society, becoming more present in our daily lives. Therefore, it is fundamentally important for researchers interested in this subject to understand the status quo of studies being undertaken worldwide and to have an overall picture of the field. We conducted a bibliometric literature review from the Scopus Database over the period of 2006-2015, focusing on the application of big data/IoT in the context of CE. This combined 30,557 CE documents with 32,550 unique big data/IoT studies, resulting in 70 matching publications that went through content and social network analysis with the use of the 'R' statistical tool. We then compared the results to some current industry initiatives. Bibliometric findings indicate that China and the USA are the countries most interested in the area and reveal a context with significant opportunities for research. In addition, large producers of greenhouse gas emissions, such as Brazil and Russia, still lack studies in the area. Also, a disconnection between important industry initiatives and scientific research seems to exist. The results can be useful for institutions and researchers worldwide to understand potential research gaps and to focus future investments/studies in the field. This paper presents an empirical study of the evolutionary patterns of national disciplinary profiles in research using a dataset extracted from Scopus covering publications from 45 nations for the period 1996-2015. Measures of disciplinary specializations and statistical models are employed to examine the distribution of disciplinary specializations across nations, the patterns of structural changes in the world's disciplinary profiles, and the evolutionary patterns of research profiles in individual nations. 
It is found that, while there has been a continuous process of convergence in national research profiles, nations differ greatly in their evolutionary patterns. Changes in national disciplinary profiles are decomposed into the regression effect and the mobility effect, and both effects are analyzed for individual nations. The G7 and BRICS countries are used as cases for in-depth scrutiny. Policy implications based on the findings and directions for future research are discussed. Important journals usually guide the research and development directions in academic circles. Therefore, it is necessary to find the important journals among a number of academic journals. This study presents a model named the multiple-link, mutually reinforced journal-ranking (MLMRJR) model based on the PageRank and the Hyperlink-Induced Topic Search algorithms that considers not only the quantity and quality of citations in intra-networks, but also the mutual reinforcement in inter-networks. First, the multiple links between four intra-networks and three inter-networks of paper, author, and journal are involved simultaneously. Second, a time factor is added to the paper citation network as the weight of the edges to solve the rank bias problem of the PageRank algorithm. Third, the author citation network and the co-authorship network are considered simultaneously. The results of a case study showed that the proposed MLMRJR model can obtain a reasonable journal ranking based on Spearman's and Kendall's ranking correlation coefficients and ROC curve analysis. This study provides a systematic view of this field from the perspective of measuring the prestige of journals, which can help researchers decide where to view publications and publish their papers, and help journal editors and organizations evaluate the quality of other journals and focus on the strengths of their own journals. 
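The PageRank algorithm that the MLMRJR model builds on can be sketched in a few lines. The code below is the plain, unweighted algorithm on a toy journal citation graph, not the MLMRJR model itself, which adds edge weights (such as the time factor) and mutual reinforcement across networks:

```python
def pagerank(graph, damping=0.85, iters=100):
    """Basic PageRank by power iteration on a directed citation graph.

    graph maps each node to the list of nodes it cites. A journal's
    rank is high when it is cited by other highly ranked journals.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, cited in graph.items():
            if cited:
                share = damping * rank[v] / len(cited)
                for u in cited:
                    new[u] += share
            else:  # dangling node: spread its rank evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Toy example: journal C is cited by A, B, and D, so it ranks highest;
# D is cited by no one, so it keeps only the baseline rank.
journals = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(journals)
```

Adding a time factor, as the second step of MLMRJR describes, amounts to replacing the uniform `share` with a per-edge weight that favors recent citations, which counteracts PageRank's bias toward older, citation-accumulating papers.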
A few years ago Clarivate Analytics (formerly Thomson Reuters, provider of the Web of Science database) started evaluating publications in the sciences and social sciences with a view to identifying international highly-cited researchers (HCR) over a publication period of around 10 years. This Letter to the Editor presents the findings of a small study involving the analyses of some personal data (e.g. academic title) relating to the HCR 2015. Try to imagine that a figure, a table or an explanatory box in your main manuscript gets cited, in addition to citations to the main paper. Some scientists would no doubt be ecstatic at this unrealistic opportunity of gathering additional citations. This paper highlights a case in which a text box (Unger and Couzin in Science 312(5770): 40-41, 2006. doi:10.1126/science.312.5770.40) within a larger paper (Couzin and Unger in Science 312(5770): 38-43, 2006. doi:10.1126/science.312.5770.38), as well as the paper itself, are both cited, 33 and 8 times, respectively, according to Clarivate Analytics' (formerly Thomson Reuters) Web of Science. Both papers were published in AAAS' Science. This paper explores details of these citations and shows how four papers between 2007 and 2015 have cited both papers, including the text box. The argument is put forward that the citation of the least divisible units of a paper, in this case a text box, is an unfair citation practice, and since such citations refer to a part of the same paper, the term "nested self-citation" has been coined. Given the attention paid in recent times to citation manipulation, citation rings and inappropriate citations, the risks of nested self-citations, including the skewing of citation counts and the failure to correct potentially misleading information, need to be explored. To date, the journal impact factor (JIF), owned by Thomson Reuters (now Clarivate Analytics), has been the dominant metric in scholarly publishing. 
Hated or loved, the JIF dominated academic publishing for the better part of six decades. However, a rise in the ranks of unscholarly journals, academic corruption and fraud has also been accompanied by a parallel universe of competing metrics, some of which might also be predatory, misleading, or fraudulent, while others yet may in fact be valid. On December 8, 2016, Elsevier B.V. launched a direct competing metric to the JIF, CiteScore (CS). This short communication explores the similarities and differences between JIF and CS. It also explores what this seismic shift in metrics culture might imply for journal readership and authorship. The knowledge workforce is changing: global economic factors, increasing professional specialization, and rapid technological advancements mean that more individuals than ever can be found working in independent, modular, and mobile arrangements. Little is known about professional information practices or actions outside of traditional, centralized offices; however, the dynamic, unconventional, and less stable mobile work context diverges substantially from this model, and presents significant challenges and opportunities for accomplishing work tasks. This article identifies 5 main information practices geared toward mobilizing work, based on in-depth interviews with 31 mobile knowledge workers (MKWs). It then uses these 5 practices as starting points for beginning to delineate the context of mobile knowledge work. We find that the information practices and information contexts of MKWs are mutually constitutive: challenges and opportunities of their work arrangements are what enable the development of practices that continually (re)construct productive spatial, temporal, social, and material contexts for work. 
This article contributes to an empirical understanding of the information practices of an increasingly visible yet understudied population, and to a theoretical understanding of the contemporary mobile knowledge work information context. The objective of this paper is to systematically assess sources' (e.g., journals and proceedings) impact in knowledge trading. While there have been efforts at evaluating different aspects of journal impact, the dimension of knowledge trading is largely absent. To fill the gap, this study employed a set of trading-based indicators, including weighted degree centrality, Shannon entropy, and weighted betweenness centrality, to assess sources' trading impact. These indicators were applied to several time-sliced source-to-source citation networks that comprise 33,634 sources indexed in the Scopus database. The results show that several interdisciplinary sources, such as Nature, PLoS One, Proceedings of the National Academy of Sciences, and Science, and several specialty sources, such as Lancet, Lecture Notes in Computer Science, Journal of the American Chemical Society, Journal of Biological Chemistry, and New England Journal of Medicine, have demonstrated their marked importance in knowledge trading. Furthermore, this study also reveals that, overall, sources have established more trading partners, increased their trading volumes, broadened their trading areas, and diversified their trading contents over the past 15 years from 1997 to 2011. These results inform the understanding of source-level impact assessment and knowledge diffusion. We present the results of running 4 different papers through the automated filtering system used by the open access preprint server "arXiv" to classify papers and implement quality control barriers. 
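One of the trading-based indicators described above, Shannon entropy, measures how broadly a source spreads its citation "trade" across partners. The sketch below is a generic illustration of that idea with hypothetical journals, not the study's exact formulation:

```python
import math

def trading_entropy(partner_volumes):
    """Shannon entropy (in bits) of a source's citation trading volumes.

    partner_volumes maps each trading partner (e.g. another journal)
    to the volume of citations exchanged with it. Higher entropy
    means trade is spread evenly over many partners; 0 means all
    trade flows through a single partner.
    """
    total = sum(partner_volumes.values())
    h = 0.0
    for v in partner_volumes.values():
        if v > 0:
            p = v / total
            h -= p * math.log2(p)
    return h

# Hypothetical sources: one trades with a single partner, one evenly
# with four partners (the maximally diversified case for four partners).
focused = {"Journal A": 100}
balanced = {"Journal A": 25, "Journal B": 25, "Journal C": 25, "Journal D": 25}
```

Under this measure, the finding that sources "diversified their trading contents" over 1997-2011 corresponds to rising entropy values over the time-sliced networks.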
The exercise was carried out in order to assess whether these highly sophisticated, state-of-the-art filters can distinguish between papers that are controversial or have gone past their "sell-by date," and otherwise normal papers. We conclude that not even the arXiv filters, which are otherwise successful in filtering fringe-topic papers, can fully acquire "Domain-Specific Discrimination" and thus distinguish technical papers that are taken seriously by an expert community from those that are not. Finally, we discuss the implications this has for citizen and policy-maker engagement with the Primary Source Knowledge of a technical domain. Information searching in practice is seldom an end in itself. In work, work task (WT) performance forms the context that information searching should serve. Therefore, information retrieval (IR) systems development/evaluation should take the WT context into account. The present paper analyzes how WT features (task complexity and task types) affect information searching in authentic work: the types of information needs, search processes, and search media. We collected data on 22 information professionals in authentic work situations in three organization types: city administration, universities, and companies. The data comprise 286 WTs and 420 search tasks (STs). The data include transaction logs, video recordings, daily questionnaires, interviews, and observation. The data were analyzed quantitatively. Even though the participants used a range of search media, most STs were simple throughout the data, and up to 42% of WTs did not include searching. WT's effects on STs are not straightforward: different WT types react differently to WT complexity. Due to the simplicity of authentic searching, the WT/ST types in interactive IR experiments should be reconsidered. This article reports on a longitudinal analysis of query logs of a web-based case library system during an 8-year period (from 2005 to 2012). 
The analysis studies 3 different information-seeking approaches: keyword searching, browsing, and case-based reasoning (CBR) searching provided by the system by examining the query logs that stretch over 8 years. The longitudinal dimension of this study offers unique possibilities to see how users used the 3 different approaches over time. Various user information-seeking patterns and trends are identified through the query usage pattern analysis and session analysis. The study identified different user groups and found that a majority of the users tend to stick to their favorite information-seeking approach to meet their immediate information needs and do not seem to care whether alternative search options will offer greater benefits. The study also found that return users used CBR searching much more frequently than 1-time users and tend to use more query terms to look for information than 1-time users. We present the first systematic study of the influence of time on user judgements for rankings and relevance grades of web search engine results. The goal of this study is to evaluate the change in user assessment of search results and explore how users' judgements change. To this end, we conducted a large-scale user study with 86 participants who evaluated 2 different queries and 4 diverse result sets twice with an interval of 2 months. To analyze the results we investigate whether 2 types of patterns of user behavior from the theory of categorical thinking hold for the case of evaluation of search results: (a) coarseness and (b) locality. To quantify these patterns we devised 2 new measures of change in user judgements and distinguish between local (when users swap between close ranks and relevance values) and nonlocal changes. Two types of judgements were considered in this study: (a) relevance on a 4-point scale, and (b) ranking on a 10-point scale without ties. 
We found that users tend to change their judgements of the results over time in about 50% of cases for relevance and in 85% of cases for ranking. However, the majority of these changes were local. Prominent news sites on the web provide hundreds of news articles daily. The abundance of news content competing to attract online attention, coupled with the manual effort involved in article selection, necessitates the timely prediction of future popularity of these news articles. The future popularity of a news article can be estimated using signals indicating the article's penetration in social media (e.g., number of tweets) in addition to traditional web analytics (e.g., number of page views). In practice, it is important to make such estimations as early as possible, preferably before the article is made available on the news site (i.e., at cold start). In this paper we perform a study on cold-start news popularity prediction using a collection of 13,319 news articles obtained from Yahoo News, a major news provider. We characterize the popularity of news articles through a set of online metrics and try to predict their values across time using machine learning techniques on a large collection of features obtained from various sources. Our findings indicate that predicting news popularity at cold start is a difficult task, contrary to the findings of a prior work on the same topic. Most articles' popularity may not be accurately anticipated solely on the basis of content features, without having the early-stage popularity values. Both user involvement and system support play important roles in applying search tactics. To apply search tactics in the information retrieval (IR) processes, users make decisions and take actions in the search process, while IR systems assist them by providing different system features. 
After analyzing 61 participants' information searching diaries and questionnaires we identified various types of user involvement and system support in applying different types of search tactics. Based on quantitative analysis, search tactics were classified into 3 groups: user-dominated, system-dominated, and balanced tactics. We further explored types of user involvement and types of system support in applying search tactics from the 3 groups. The findings show that users and systems play major roles in applying user-dominated and system-dominated tactics, respectively. When applying balanced tactics, users and systems must collaborate closely with each other. In this article, we propose a model that illustrates user involvement and system support as they occur in user-dominated tactics, system-dominated tactics, and balanced tactics. Most important, IR system design implications are discussed to facilitate effective and efficient applications of the 3 groups of search tactics. Recent, rapid changes in technology have resulted in a proliferation of choices for music storage and access. Portable, web-enabled music devices are widespread, and listeners now enjoy a plethora of options regarding formats, devices, and access methods. Yet in this mobile music environment, listeners' access and management strategies for music collections are poorly understood, because behaviors surrounding the organization and retrieval of music collections have received little formal study. Our current research seeks to enrich our knowledge of people's music listening and collecting behavior through a series of systematic user studies. In this paper we present our findings from interviews involving 20 adult and 20 teen users of commercial cloud music services. 
Our results contribute to theoretical understandings of users' music information behavior in a time of upheaval in music usage patterns, and more generally, the purposes and meanings users ascribe to personal media collections in cloud-based systems. The findings suggest improvements to the future design of cloud-based music services, as well as to any information systems and services designed for personal media collections, benefiting both commercial entities and listeners. While there is significant progress with policy and a lively debate regarding the potential impact of open access publishing, few studies have examined academics' behavior and attitudes to open access publishing (OAP) in scholarly journals. This article seeks to address this gap through an international and interdisciplinary survey of academics. Issues covered include: use of and intentions regarding OAP, and perceptions regarding advantages and disadvantages of OAP, journal article publication services, peer review, and reuse. Despite reporting engagement in OAP, academics were unsure about their future intentions regarding OAP. Broadly, academics identified the potential for wider circulation as the key advantage of OAP, and were more positive about its benefits than they were negative about its disadvantages. As regards services, rigorous peer review, followed by rapid publication were most valued. Academics reported strong views on reuse of their work; they were relatively happy with noncommercial reuse, but not in favor of commercial reuse, adaptations, and inclusion in anthologies. Comparing science, technology, and medicine with arts, humanities, and social sciences showed a significant difference in attitude on a number of questions, but, in general, the effect size was small, suggesting that attitudes are relatively consistent across the academic community. 
Although gender differences are known to exist in the publishing industry and in reader preferences, there is little public systematic data about them. This article uses evidence from the book-based social website Goodreads to provide a large-scale analysis of 50 major English book genres based on author genders. The results show gender differences in authorship in almost all categories and gender differences in the level of interest in, and ratings of, books in a minority of categories. Perhaps surprisingly in this context, there is not a clear gender-based relationship between the success of an author and their prevalence within a genre. The unexpected, almost universal authorship gender differences should give new impetus to investigations of the importance of gender in fiction, and the success of minority genders in some genres should encourage publishers and librarians to take their work seriously, except perhaps for most male-authored chick-lit. Citation classics are not only highly cited, but also cited during several decades. We explore whether the peaks in the spectrograms generated by Reference Publication Year Spectroscopy (RPYS) indicate such long-term impact by comparing across RPYS for subsequent time intervals. Multi-RPYS enables us to distinguish between short-term citation peaks at the research front that decay within 10 years versus historically constitutive (long-term) citations that function as concept symbols. Using these constitutive citations, one is able to cluster document sets (e.g., journals) in terms of intellectually shared histories. We test this premise by clustering 40 journals in the Web of Science Category of Information and Library Science using multi-RPYS. It follows that RPYS can not only be used for retrieving roots of sets under study (cited), but also for algorithmic historiography of the citing sets. Significant references are historically rooted symbols among other citations that function as currency. 
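An RPYS spectrogram of the kind discussed above is built by counting cited references per reference publication year and subtracting a moving median, so that peak years, those holding citation classics, stand out. A minimal sketch with hypothetical reference years (the window size and deviation form follow common RPYS practice, not necessarily the authors' exact implementation):

```python
from collections import Counter
from statistics import median

def rpys_deviation(reference_years, window=5):
    """Reference Publication Year Spectroscopy deviation series.

    Counts cited references per publication year and subtracts the
    median of a sliding window; strongly positive values flag years
    containing historically important (constitutive) references.
    """
    counts = Counter(reference_years)
    years = range(min(counts), max(counts) + 1)
    series = {y: counts.get(y, 0) for y in years}
    half = window // 2
    deviation = {}
    for y in years:
        window_vals = [series.get(y + k, 0) for k in range(-half, half + 1)]
        deviation[y] = series[y] - median(window_vals)
    return deviation

# Hypothetical cited-reference years with a "citation classic" in 1953.
refs = [1950, 1951, 1952, 1953, 1953, 1953, 1953, 1954, 1955]
dev = rpys_deviation(refs)
```

Multi-RPYS repeats this computation for document sets from subsequent citing-year intervals; a year that peaks in every interval marks a long-term, constitutive reference rather than a research-front citation that decays.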
Using visual analytic systems effectively may incur a steep learning curve for users, especially for those who have little prior knowledge of either using the tool or accomplishing analytic tasks. How do users deal with a steep learning curve over time? Are there particularly problematic aspects of an analytic process? In this article we investigate these questions through an integrative study of the use of CiteSpace, a visual analytic tool for finding trends and patterns in scientific literature. In particular, we analyze millions of interactive events in logs generated by users worldwide over a 14-month period. The key findings are: (i) three levels of proficiency are identified, namely, level 1: low proficiency, level 2: intermediate proficiency, and level 3: high proficiency, and (ii) behavioral patterns at level 3 result from a more engaging interaction with the system, involving a wider variety of events and being characterized by longer state transition paths, whereas behavioral patterns at levels 1 and 2 seem to focus on learning how to use the tool. This study contributes to the development and evaluation of visual analytic systems in realistic settings and provides a valuable addition to the study of interactive visual analytic processes. The aim of this paper is to extend our knowledge about the power-law relationship between citation-based performance and coauthorship patterns in papers in the natural sciences. We analyzed 829,924 articles that received 16,490,346 citations. The number of articles published through coauthorship accounts for 89%. The citation-based performance and coauthorship patterns exhibit a power-law correlation with a scaling exponent of 1.20 +/- 0.07. Citations to a subfield's research articles tended to increase 2^1.20, or 2.30, times each time the subfield doubled its number of coauthored papers. The scaling exponent for the power-law relationship for single-authored papers was 0.85 +/- 0.11. 
The citations to a subfield's single-authored research articles increased 2^0.85, or 1.89, times each time the research area doubled its number of single-authored papers. The Matthew Effect is stronger for coauthored papers than for single-authored ones. In fact, with a scaling exponent <1.0, the impact of single-authored papers exhibits a cumulative disadvantage or inverse Matthew Effect. The past 2 decades have witnessed the emergence of information as a scientific discipline and the growth of information schools around the world. We analyzed the current state of the iSchool community in the U.S. with a special focus on the evolution of the community. We conducted our study from the perspectives of acquiring talent and producing research, including analyses of iSchool faculty members' educational backgrounds, research topics, and the hiring network among iSchools. Applying text mining techniques and social network analysis to data from various sources, our research revealed how the iSchool community gradually built its own identity over time, including the growing number of faculty members who received their doctorates from the field that studies information, the deviation from computer science and library science, the rising emphasis on the intersection of information, technology, and people, and the increasing educational and research homogeneity as a community. These findings suggest that iSchools in the U.S. are evolving into a mature and independent discipline with a more established identity. Citation frequencies are commonly interpreted as measures of quality or impact. Yet, the true nature of citations and their proper interpretation have been the center of a long, but still unresolved, discussion in bibliometrics. 
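The doubling factors quoted in the power-law analysis of coauthorship above follow directly from the scaling exponent: if citation-based performance scales as C = k·N^α with the number of papers N, then doubling N multiplies C by 2^α. A minimal numeric check (variable names are ours):

```python
def doubling_factor(alpha):
    """If citations C scale as C = k * N**alpha with paper count N,
    doubling N multiplies C by 2**alpha."""
    return 2 ** alpha

# Exponents reported for coauthored vs. single-authored papers.
coauthored = doubling_factor(1.20)   # ≈ 2.30: cumulative advantage
single = doubling_factor(0.85)       # < 2.00: cumulative disadvantage
```

A factor above 2 means citations grow faster than paper counts (Matthew Effect); below 2, slower (the inverse Matthew Effect noted for single-authored papers).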
A comparison of 67,578 pairs of studies on the same healthcare topic, with the same publication age (1-15 years), reveals that when one of the studies is selected for citation, it has on average received about three times as many citations as the other study. However, the average citation gap between selected and deselected studies narrows slightly over time, which fits poorly with the name-dropping interpretation and better with the quality-and-impact interpretation. The results demonstrate that authors in the field of healthcare tend to cite highly cited documents when they have a choice. This is more likely caused by differences related to quality than by differences related to the status of the publications cited. Webometrics research methods can be effectively used to measure and analyze information on the web. One topic discussed vehemently online that could benefit from this type of analysis is vaccines. We carried out a study analyzing the web presence of both sides of this debate. We collected a variety of webometric data and analyzed the data both quantitatively and qualitatively. The study found far more anti- than pro-vaccine web domains. The anti and pro sides had similar web visibility as measured by the number of links coming from general websites and tweets. However, the links to the pro domains were of higher quality as measured by PageRank scores. The result from the qualitative content analysis confirmed this finding. The analysis of site ages revealed that the battle between the two sides has a long history and is still ongoing. The web scene was polarized, with either pro or anti views and little neutral ground. The study suggests ways that professional information can be promoted more effectively on the web. The study demonstrates that webometrics analysis is effective in studying online information dissemination. This kind of analysis can be used to study not only health information but other information as well. 
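The webometrics study above compares link quality via PageRank scores. A minimal, self-contained sketch of the iterative PageRank computation on a hypothetical four-domain link graph (the domain names and the 0.85 damping factor are illustrative assumptions, not data from the study):

```python
def pagerank(links, damping=0.85, iters=100):
    """Iterative PageRank on a dict mapping page -> list of outgoing links."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:  # dangling page: distribute its rank evenly
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: two pro- and two anti-vaccine domains.
web = {"pro1": ["pro2"], "pro2": ["pro1"], "anti1": ["pro1"], "anti2": ["pro1"]}
ranks = pagerank(web)
```

Here the pro domains accumulate rank because they receive links, illustrating why link counts and PageRank-weighted quality can tell different stories.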
To understand the current state of a discipline and to discover new knowledge of a certain theme, one builds bibliometric content networks based on the present knowledge entities. However, such networks can vary according to how the data sets relevant to the theme are collected by querying knowledge entities. In this study we classify three different bibliometric content networks. The primary bibliometric network is based on knowledge entities relevant to a keyword of the theme, the secondary network is based on entities associated with the lower concepts of the keyword, and the tertiary network is based on entities influenced by the theme. To explore the content and properties of these networks, we propose a tomographic content analysis that takes a slice-and-dice approach to analyzing the networks. Our findings indicate that the primary network is best suited to understanding the current knowledge on a certain topic, whereas the secondary network is good at discovering new knowledge across fields associated with the topic, and the tertiary network is appropriate for outlining the current knowledge of the topic and relevant studies. Social network sites (SNSs) are growing in popularity and social significance. Although researchers have attempted to explain the effect of SNS use on users' psychological well-being, previous studies have produced inconsistent results. In addition, most previous studies relied on healthy students as participants; other cohorts of SNS users, in particular people living with serious health conditions, have been neglected. In this study, we carried out semistructured interviews with users of the Ovarian Cancer Australia (OCA) Facebook to assess how and in what ways SNS use impacts their psychological well-being. A theoretical model was proposed to develop a better understanding of the relationships between SNS use and the psychological well-being of cancer patients. 
Analysis of data collected through a subsequent quantitative survey confirmed the theoretical model and empirically revealed the extent to which SNS use impacts the psychological well-being of cancer patients. Findings showed that the use of OCA Facebook enhances social support, enriches the experience of social connectedness, develops social presence and learning, and ultimately improves the psychological well-being of cancer patients. The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous work in this field has used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification. This paper presents the findings from a survey study of UK academics and their publishing behaviour. 
The aim of this study is to investigate academics' attitudes towards and practice of open access (OA) publishing. The results are based on a survey study of academics at 12 Russell Group universities, and reflect responses from over 1800 researchers. This study found that whilst most academics support the principle of making knowledge freely available to everyone, the use of OA publishing among UK academics was still limited despite relevant established OA policies. The results suggest that there were differences in the extent of OA practice across universities, academic disciplines, ages, and seniority levels. Academics' use of OA publishing was also related to their awareness of OA policy and OA repositories, their attitudes towards the importance of OA publishing, and their belief in the OA citation advantage. The implications of these findings are relevant to the development of strategies for the implementation of OA policies. Rating scales are used to elicit data about qualitative entities (e.g., research collaboration). This study presents an innovative method for reducing the number of rating scale items without loss of predictability. The area under the receiver operating characteristic curve (AUC ROC) method is used. The presented method reduced the number of rating scale items (variables) to 28.57% (from 21 to 6), making over 70% of the collected data unnecessary. Results were verified by two methods of analysis: the Graded Response Model (GRM) and Confirmatory Factor Analysis (CFA). GRM revealed that the new method differentiates observations of high and middle scores. CFA proved that the reliability of the rating scale was not deteriorated by the scale item reduction. Both statistical analyses evidenced the usefulness of the AUC ROC reduction method. The innovation literature often argues that major inventions arise through the cumulative synthesis of existing components and principles. 
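The AUC ROC reduction method above ranks rating-scale items by how well each one discriminates the criterion, then keeps only the top items. A minimal sketch, assuming hypothetical item responses and a binary criterion; the AUC is computed via the Mann-Whitney statistic (all names and data here are illustrative, not the study's):

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def reduce_items(item_scores, labels, keep):
    """Rank rating-scale items by AUC against the criterion; keep the top ones."""
    ranked = sorted(item_scores, key=lambda k: auc(item_scores[k], labels),
                    reverse=True)
    return ranked[:keep]

# Hypothetical responses: item_a separates the criterion groups, item_b is noise.
labels = [1, 1, 1, 0, 0, 0]
items = {"item_a": [5, 4, 5, 2, 1, 2], "item_b": [3, 1, 2, 3, 2, 3]}
best = reduce_items(items, labels, keep=1)
```

Dropping low-AUC items is what allows the scale to shrink (here from 2 items to 1; in the study from 21 to 6) without losing predictive power.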
An important economic phenomenon associated with this argument is the knowledge spillover. Although increasing attention has been paid to knowledge spillovers as a means to grasp innovation, little is known about their structural characteristics. This study examines the structural patterns of knowledge flow evidenced in patent citations by focusing on two aspects: the reciprocity of citations between technology sectors and the concentration of citing and cited sectors. The results indicate that the knowledge flow tends to be highly reciprocal within pairs of technology sectors instead of having a clear direction, and that there are relatively low inflow and outflow concentrations in most sectors, although there are some exceptions. These results suggest that most technological sectors become both a knowledge provider and a knowledge consumer at the same time, and that they coevolve through reciprocal knowledge exchanges with each other. Drawing upon the periodical publication delay model and the Weibull distribution model, we develop an improved model and conduct an exploratory analysis to characterize patent grant delay and identify the crux of the problem. In order to test the effect of the new model, we perform an experiment based on a database of four technological fields from the United States Patent and Trademark Office. The results show that the new model improves the fitting effect and is suitable for calculating the time delay between patent application and grant. In addition, we apply the improved model in two different technological fields to study the changing rules over the last two decades by comparing the results, and obtain some valuable information. As a theoretical contribution, we deduce the examination probability under steady-state conditions, extend the periodical publication delay model from a negative exponential distribution to a Weibull distribution, and overcome the shortcomings of the original model. 
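The extension above, from a negative exponential to a Weibull delay distribution, can be illustrated with the Weibull CDF, which reduces to the exponential special case when the shape parameter equals 1. A sketch with hypothetical scale and shape parameters (not the values estimated in the study):

```python
from math import exp

def weibull_cdf(t, scale, shape):
    """P(grant delay <= t) under a Weibull model; shape=1 recovers the
    negative exponential special case of the original delay model."""
    return 1.0 - exp(-((t / scale) ** shape))

# Hypothetical parameters: roughly 63% of grants occur within `scale` months.
p24 = weibull_cdf(24, scale=30.0, shape=1.5)        # Weibull fit
p24_exp = weibull_cdf(24, scale=30.0, shape=1.0)    # exponential baseline
```

The extra shape parameter is what lets the improved model bend the delay curve to fit fields where grant lags cluster, instead of forcing the memoryless exponential form.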
This study investigates the relationship between characteristics of the firm's top management team (TMT) and its research and development (R&D) activities. Specifically, this research analyzes how observable characteristics of the TMT, such as functional experiences or educational background, and average tenure affect the firm's proportion of explorative R&D activities. From the perspective of upper-echelon theory, we hypothesize that the TMT's functional experiences with R&D, or science or engineering educational backgrounds, increase the firm's tendency towards explorative R&D. Moreover, we propose that the average tenure of TMT members with innovation-related experiences has positive moderating effects on these relationships. The hypotheses are tested using a dataset containing biographical information of the TMT members, and financial and patent data, of 89 firms in U.S. high-tech industries from 2006 to 2009. Firms' explorative R&D activities are analyzed using data on patent citations, patent classes, and non-patent references. The empirical analysis shows that the top managers' educational background in science or engineering, as well as their previous functional experiences with R&D, have a positive effect on the firm's explorative innovation activities. We also find that the size of these effects increases with a longer tenure of these TMT members. Our findings provide implications related to the effects of organizational characteristics on the establishment of an R&D strategy and highlight the role of TMT members with innovative experiences in directing a firm's R&D activities and outcomes. Patent applicants and examiners do not always have the same point of view when conducting a prior-art search. 
Although several studies have suggested differences between citations by applicants and examiners, the data and range of empirical studies are too incomplete to generalize the characteristics of relationships between citation types and the value of a technology or invention. To overcome this limitation, it is crucial to compare citations by applicants and by examiners in depth, with diverse perspectives and data, to determine the value of patent information for technological innovation. Thus, this paper suggests that differences in the composition of technical information and patent quality (in patent-level investigations), as well as in the locus of the knowledge source and knowledge recentness (in knowledge-level investigations), according to patent citation type (by applicants or examiners), reflect Pavitt's perspective on the nature, impact, and source of technological innovation. We found that the quality of patents cited by applicants is higher than that of those cited by examiners in four industries, excluding a supplier-dominated industry. The citation types are related to the locus of the knowledge source in four industries, excluding the supplier-dominated industry. In particular, the patents cited by examiners tended to be more recently issued in all sectoral fields. This research contributes to confirming the technological value of patents based on the citation behaviors of applicants and examiners through empirical analysis. The results can be utilized to investigate signals or noise in technological innovation and to improve processes or systems of patent examination. In addition, they can help applicants conduct more thorough prior-art searches by comprehending the examiner's perspective toward citations, increasing the probability of patent registration. 
Given the increasing importance of advanced technologies for economic growth and the incremental complexity of research and development management, a novel methodology is proposed in this paper to monitor the evolution trace of innovation sources. This approach focuses on knowledge transfer among technologies using patent cluster analysis. More specifically, a citation network model, consisting of patents in the "Coherent Light Generators" classification, is established with data collected from the United States Patent and Trademark Office. In addition, the dynamical topological structure is investigated to probe into the overview properties and identify key milestones for the expanding citation network from 1976 to 2014. Next, a novel framework for patent clustering is developed to identify knowledge chunks whose internal knowledge flows are dense while cut edges are sparse. Community detection algorithms are compared with different assessment indices based on the citation network, and the selected solution is improved using the optimization objectives of cluster analysis. Then, the dynamical structure of the detected knowledge chunks is investigated and the evolution of innovation sources, identified by k-core decomposition, is monitored to unveil the technology development trace. Finally, the analysis results are discussed and related conclusions are summarized. This article improves approaches for patent cluster analysis and develops a new follow-up investigation methodology for detected knowledge chunks. We find that knowledge chunks not only increase in scale but also integrate during the focal period. Identifying the knowledge chunks that obtain rapid growth in both cluster scale and innovation sources is useful for detecting technology development opportunities. The paradigm of the mobile ecosystem is rapidly changing, especially since the introduction of smart devices. 
New important players are emerging, and the scope of the mobile ecosystem is expanding and encroaching on the technological boundaries of other IT ecosystems. However, our understanding of the mobile ecosystem has been limited given the few existing studies. Therefore, in this paper, we empirically examine the network structure of a mobile ecosystem by measuring the technology knowledge flows between firms based on a patent citation analysis of the mobile industry. We found that platform providers are emerging as the center of knowledge flow activity in the ecosystem. By introducing a new network index, the Network Concentration Index, which measures a firm's knowledge concentration ratio toward subsectors, we also found that each platform provider shows distinctive changing patterns in its technology knowledge network and that each is forming its own sub-network with increased influence on affiliated firms. Emerging technologies are often not part of any official industry, patent, or trademark classification system. Thus, delineating boundaries to measure their early development stage is a nontrivial task. This paper aims to present a methodology to automatically classify patents concerning service robots. We introduce a synergy of a traditional technology identification process, namely keyword extraction and verification by an expert community, with a machine learning algorithm. The result is a novel means of classifying patents that (1) reduces expert bias regarding vested interests in lexical query methods, (2) avoids problems with citation approaches, and (3) facilitates evolutionary changes. Based upon a small core set of worldwide service robotics patent applications, we derive apt n-gram frequency vectors and train a support vector machine, relying only on the titles, abstracts, and IPC categorization of each document. Altering the utilized kernel functions and respective parameters, we reach a recall level of 83% and a precision level of 85%. 
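The n-gram feature construction underlying the patent classifier above can be sketched as follows. For brevity, this illustration scores a query document against class centroids by cosine similarity rather than training an SVM as the paper does; all abstracts shown are hypothetical:

```python
from collections import Counter
from math import sqrt

def ngram_vector(text, n=2):
    """Word n-gram frequency vector from a title/abstract."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Sum of frequency vectors, acting as an (unnormalized) class centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

# Hypothetical seed abstracts for the two classes.
service = [ngram_vector("autonomous service robot for domestic cleaning tasks"),
           ngram_vector("mobile service robot navigation in domestic environments")]
other = [ngram_vector("chemical coating process for metal surfaces")]
query = ngram_vector("domestic cleaning robot with autonomous navigation")
is_service = cosine(query, centroid(service)) > cosine(query, centroid(other))
```

The same vectors could be fed to an SVM with different kernels, which is where the paper's recall/precision tuning happens.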
This paper provides an introduction to multi-source data fusion (MSDF) and comprehensively overviews the ingredients and challenges of MSDF in scientometrics. Drawing on MSDF methods from the sensor and other fields, and considering the features of scientometrics, this paper proposes an application model and procedure for MSDF in scientometrics. The model and procedure can be divided into three parts: data type integration, fusion of data relations, and ensemble clustering. Furthermore, the fusion of data relations can be divided into cross-integration of multi-mode data and matrix fusion of multi-relational data. To obtain a clearer and deeper analysis of the MSDF model, this paper further focuses on the application of MSDF in topic identification based on text analysis of the scientific literature. This paper also discusses the application of MSDF for the exploration of the scientific literature. Finally, the most suitable MSDF methods for different situations are discussed. This paper provides an in-depth analysis of the characteristics of international patent families, including their domestic component. We exploit a relatively under-studied feature of patent families, namely the number of patents covering the same invention within a given jurisdiction. Using this information, we highlight common patterns in the structure of international patent families, which reflect both the patenting strategies of innovators and the peculiarities of the different patent systems. While the literature has extensively used family size, i.e. the number of countries in which a given invention is protected, as a measure of patent value, our results suggest that the number of patent filings in the priority country within a patent family, as well as the timespan between the first and last filings within a family, are other insightful indicators of the value of patented innovations. 
The method of Characteristic Scores and Scales (CSS), previously developed for application at the macro- and meso-level, has been applied to individual author statistics. In particular, two datasets have been used: first, authors with a Thomson Reuters ResearcherID, independently of the field in which they publish, and, second, authors who are active in the field of scientometrics, whether or not they are registered authors. The objective is to find a parameter-free solution for citation-impact assessment at this level of aggregation that is insensitive to possible outliers. As in the case of any statistics, the only limitation is the lower bound, which has been set to 10 for the present analysis. The results demonstrate the usefulness of the CSS method at this level while also pointing to some remarkable statistical properties. This paper contemplates a sequence of steps for connecting the fields of science, technology, and industrial products. A method for linking different classifications (a WoS-IPC-ISIC concordance) is proposed. The ensuing concordance tables inherit the roots of Grupp's perspective on science, technology, product, and market. The study contextualized the linking process, as it can be instrumental for policy planning and technology targeting. The presented method allows us to postulate the potential development of technology in science and industrial products. The proposed method and organized concordance tables are intended as a guiding tool for policy makers to study the prospects of a technology or industry of interest. Two technologies perceived as having high potential, traditional medicine and ICT, pursued by two aspirant economies, Hong Kong and Malaysia, are considered as case studies for the proposed method. The selected cases provide us the context of what technological research is being pursued for both fundamental knowledge and new industries. 
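The CSS method referenced above is parameter-free because its thresholds are self-adjusting: the first threshold is the mean of all citation scores, and each subsequent threshold is the mean of the scores at or above the previous one. A minimal sketch on hypothetical citation counts (the function name and data are ours):

```python
def css_thresholds(citations, k=3):
    """Characteristic Scores and Scales: beta_1 is the mean of all scores,
    beta_{i+1} the mean of the scores at or above beta_i."""
    thresholds = []
    subset = list(citations)
    for _ in range(k):
        if not subset:
            break
        beta = sum(subset) / len(subset)
        thresholds.append(beta)
        subset = [c for c in subset if c >= beta]
    return thresholds

# Hypothetical citation counts for one author's papers.
counts = [0, 1, 1, 2, 3, 5, 8, 20]
betas = css_thresholds(counts)
```

Papers then fall into classes such as "poorly cited" (below beta_1) up to "outstandingly cited" (at or above beta_3), which is what makes the method robust to outliers: a single extreme value shifts the thresholds rather than breaking a fixed cutoff.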
They enable us to understand the context of policy planning and targeting for sectoral and regional innovation systems. While we note the constraints of using joint-publishing and joint-patenting data to study the core competencies of developing economies and their potential for development, we find that the proposed method enables us to highlight the gaps between science and technology and the core competencies of the selected economies, as well as their prospects in terms of technology and product development. The findings provide useful policy implications for further development of the respective cases. This paper is an early attempt at using co-citation analysis to sort out and analyze the development and evolution of a recent hot area, the business model. A dataset of 1498 records published between 1995 and 2015 was collected from the Web of Science database. The empirical results show that the latest hot topics in business model research focus on business model innovation and value creation. In addition, technology-oriented articles and strategy-oriented articles provide some of the main perspectives of business model research. The conclusions and implications presented in this paper will be particularly illuminating for both academic research and enterprise practice. Industrial technology grouping is a common phenomenon that occurs as an industry develops and evolves. However, the research on innovation diffusion has given little attention to the role of industrial technology grouping. This paper extends the prior research to analyze the impact of industrial technology grouping on innovation diffusion within the framework of structural embeddedness. In our empirical study, we selected a sample of patents in the smartphone industry during the 2004-2014 period. 
We used both hierarchical regression analysis and patent citation analysis to explore the impact of industrial technology grouping on innovation diffusion in the two dimensions of clustering and bridging ties, which yielded several valuable results. First, industrial technology grouping is a common phenomenon in the development of industrial technology. Moreover, the dynamic changes of technology clusters are an important driving force shaping the trends and diversity of industrial technology. Second, industrial technology grouping does not have a significant effect on firm innovation diffusion, whereas structural embeddedness directly affects innovation diffusion. Third, industrial technology grouping positively moderates the impact of structural embeddedness on firm innovation diffusion in both dimensions of clustering and bridging ties. This study examines the structure, characteristics, and change of a research collaboration network using co-assignee information from joint patents in Korea. The study was conducted in three stages: data collection, network construction, and network analysis. For network analysis, network topological analysis, node centrality analysis, and block modeling were performed in sequence. The analysis results revealed that the network has small-world properties. The results also showed that while government-sponsored research institutes (GRIs) played a role as a hub and bridge in the network in the early 2000s, universities gradually took their place to play a key role as a hub and bridge in the network. In addition, the block modeling analysis indicated that while, in the early 2000s, GRI-centered network density was shown to be high, the network density became concentrated around universities after 2004, and this tendency intensified after 2008. 
Bearing in mind the lack of empirical studies on inter-organizational research collaboration networks using patent data, this study makes an academic contribution by specifically analyzing the structure and change of research collaboration networks among Korea's major innovative actors. From the policy perspective, it provides important implications for figuring out the effects of the university-industry-GRI (UIG) collaboration policies implemented so far, and can be of assistance in making evidence-based policies to build a more effective UIG collaboration network or establish a new national science and technology innovation system. Many challenges still remain in the processing of explicit technological knowledge documents such as patents. Given the limitations and drawbacks of existing approaches, this research sets out to develop an improved method for searching patent databases and extracting patent information, to increase the efficiency and reliability of the nanotechnology patent information retrieval process and to empirically analyse patent collaboration. A tech-mining method was applied, and the subsequent analysis was performed using Thomson Data Analyser software. The findings show that nations such as Korea and Japan are highly collaborative in sharing technological knowledge across academic and corporate organisations within their national boundaries, and that China presents, in some cases, a great illustration of effective patent collaboration and co-inventorship. This study also analyses key patent strengths by country, organisation, and technology. Beyond explicit reviewer commentary, editors may rely on other metrics when evaluating manuscripts under consideration for publication. One potential, indirect measure of merit may be the ease or difficulty of identifying reviewers willing to review a given paper. We sought to determine whether reviewer decisions to agree or decline to review a manuscript are associated with manuscript acceptance. 
Original Research submissions to "Radiology" from 1/1/2008 to 12/31/2011 were studied. Using Student's t tests, we studied the association between the ratio of the number of reviewers declining to the number of reviewers agreeing to review a manuscript (the "decline:agree ratio") and the editor's decision to accept or reject the manuscript. A subgroup analysis of papers for which all four invited reviewers agreed to review the paper (the "universal agree-to-review" group) was performed. Pearson's correlation was used to study the decline:agree ratio and accepted-manuscript citation rate. The Original Research manuscript acceptance rate at Radiology was 14.5% (780/5375). The decline:agree ratio was similar between accepted and rejected manuscripts (0.87 +/- 0.84 versus 0.90 +/- 0.86, respectively, P = 0.35). "Universal agree-to-review" papers were accepted at similar rates to other papers (15.7% [22/140] versus 14.5% [758/5235], respectively, P = 0.69). Higher decline:agree ratios corresponded to lower manuscript citation rates (r = 0.09, P = 0.048). Our study, based on the lack of correlation between agreement-to-review rate and acceptance rate at Radiology and the direct correlation between agreement-to-review rate and manuscript citation rate, suggests that reviewers may have a preference for manuscripts with greater potential scientific relevance, but that reviewer motivation to agree to review does not include the expectation of manuscript acceptance. A dataset containing 111,616 documents in astronomy and astrophysics (the Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate the data in a previously created model of the full Scopus database. This allows comparisons between using local and global data for community detection, which is done in an accompanying paper. 
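The decline:agree ratio used in the Radiology study above is straightforward to compute per manuscript and compare across outcome groups. A sketch with hypothetical reviewer counts; Welch's t statistic stands in here for the Student's t tests used in the study:

```python
from math import sqrt

def decline_agree_ratio(declined, agreed):
    """Ratio of reviewers declining to reviewers agreeing to review."""
    return declined / agreed

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / sqrt(va / len(a) + vb / len(b))

# Hypothetical (declined, agreed) counts for accepted vs. rejected manuscripts.
accepted = [decline_agree_ratio(d, a) for d, a in [(1, 4), (2, 3), (0, 4)]]
rejected = [decline_agree_ratio(d, a) for d, a in [(2, 4), (1, 3), (3, 2)]]
t = welch_t(accepted, rejected)
```

In the study itself the two groups' ratios were statistically indistinguishable (P = 0.35), which is the basis for the conclusion that agreeing to review does not signal expected acceptance.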
We can begin to answer the question of the extent to which the rest of a large database (a global solution) affects the partitioning of a smaller journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective there seems to be some correspondence between local information and the global cluster solution. However, we also show that the within-Astro-set links are only one-third of the total links that are available to these papers in the full Scopus database. The non-Astro-set links are significant in two ways: (1) in areas where the Astro-set papers are concentrated, related papers from non-astronomy journals are included in clusters with the Astro-set papers, and (2) Astro-set papers that have a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights limitations of the use of journal-based document sets to identify the structure of scientific fields. Document clustering is generally the first step for topic identification. Since many clustering methods operate on the similarities between documents, it is important to build representations of these documents which keep their semantics as much as possible and are also suitable for efficient similarity calculation. As we describe in Koopman et al. (Proceedings of ISSI 2015 Istanbul: 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 3 July, 2015. Bogazici University Printhouse. http://www.issi2015.org/files/downloads/all-papers/1042.pdf, 2015), the metadata of articles in the Astro dataset contribute to a semantic matrix, which uses a vector space to capture the semantics of entities derived from these articles and consequently supports the contextual exploration of these entities in LittleAriadne.
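The entity-level semantic matrix just described can be lifted to article-level vectors in a straightforward way, for example by averaging the vectors of an article's associated entities and comparing articles by cosine similarity. This is only an illustrative sketch under that assumption; all names and numbers are hypothetical, not the Ariadne implementation.

```python
import numpy as np

def article_vector(entity_ids, entity_matrix):
    """Represent an article as the mean of its entities' semantic vectors."""
    return entity_matrix[entity_ids].mean(axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy semantic matrix: 5 entities embedded in a 3-dimensional space.
entities = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
])

a = article_vector([0, 1], entities)   # article associated with entities 0 and 1
b = article_vector([2, 3], entities)   # article associated with entities 2 and 3
sim = cosine(a, b)                     # low: the two articles share no entities
```

With such article vectors in hand, any similarity-based clustering method (K-Means, Louvain on a similarity graph) can be applied directly.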
However, this semantic matrix does not allow us to calculate similarities between articles directly. In this paper, we will describe in detail how we build a semantic representation for an article from the entities that are associated with it. Based on such semantic representations of articles, we apply two standard clustering methods, K-Means and the Louvain community detection algorithm, which leads to our two clustering solutions labelled as OCLC-31 (standing for K-Means) and OCLC-Louvain (standing for Louvain). In this paper, we will give the implementation details and a basic comparison with other clustering solutions that are reported in this special issue. In this paper we use the information theoretic Infomap algorithm (Rosvall and Bergstrom in Proc Natl Acad Sci 105(4):1118-1123, 2008) iteratively in order to cluster the direct citation network of the Astro Data Set (publications in 59 astrophysical journals between 2003 and 2010). We obtain 22 clusters of documents from the giant component of the network that we interpret as constituting 'topics' in the field of astrophysics. Upon investigation of the content of the topics we find a grouping of topics by shared features of their 'journal signature', that is, the journals that are most characteristic for a topic due to their popularity and distinctiveness. These groups of topics match subdisciplines within the field. We generate a cognitive map of the field using a topic affinity network that shows which topics are disproportionately well connected (by citations) to other topics. The topology of the topic affinity network highlights a high-level organization of the field by subdiscipline and observational distance of the research object from Earth. Clustering scientific publications is an important problem in bibliometric research. We demonstrate how two software tools, CitNetExplorer and VOSviewer, can be used to cluster publications and to analyze the resulting clustering solutions.
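As a toy illustration of the community-detection step these studies describe, the sketch below clusters a small citation network with networkx's greedy modularity routine, used here merely as a stand-in for Louvain or Infomap; the graph and paper names are invented.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy direct-citation network: two tightly knit groups of papers
# joined by a single cross-citation. All paper names are illustrative.
edges = [
    ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),   # group A cites internally
    ("p4", "p5"), ("p4", "p6"), ("p5", "p6"),   # group B cites internally
    ("p3", "p4"),                               # one weak bridge between groups
]
G = nx.Graph(edges)  # citation direction ignored for clustering purposes

# Modularity-based community detection recovers the two groups.
communities = [set(c) for c in greedy_modularity_communities(G)]
```

On real data the same call scales to tens of thousands of papers, after which each community can be inspected for its journal signature.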
CitNetExplorer is used to cluster a large set of publications in the field of astronomy and astrophysics. The publications are clustered based on direct citation relations. CitNetExplorer and VOSviewer are used together to analyze the resulting clustering solutions. Both tools use visualizations to support the analysis of the clustering solutions, with CitNetExplorer focusing on the analysis at the level of individual publications and VOSviewer focusing on the analysis at an aggregate level. The demonstration provided in this paper shows how a clustering of publications can be created and analyzed using freely available software tools. Using the approach presented in this paper, bibliometricians are able to carry out sophisticated cluster analyses without the need to have a deep knowledge of clustering techniques and without requiring advanced computer skills. Based on a dataset on Astronomy and Astrophysics, hybrid cluster analyses have been conducted. In order to obtain an optimum solution and to analyse possible issues resulting from the bibliometric methodologies used, we have systematically studied three models and, within these models, two scenarios each. The hybrid clustering was based on a combination of bibliographic coupling and textual similarities using the Louvain method at two resolution levels. The procedure resulted in three clearly hierarchical structures with six and thirteen, seven and thirteen and finally five and eleven clusters, respectively. These structures are analysed with the help of a concordance table. The statistics reflect a high quality of classification. The results of these three models are presented, discussed and compared with each other. For labelling and interpreting clusters, core documents representing the obtained clusters are used. Furthermore, these core documents help depict the internal structure of the complete network and the clusters. 
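The bibliographic coupling underlying the hybrid clustering above measures how many references two papers share; a minimal sketch follows, with a hypothetical normalization by the smaller reference list (not necessarily the normalization used in the study).

```python
from itertools import combinations

# Hypothetical reference lists: paper -> set of cited works.
refs = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r5"},
}

def coupling_strength(p, q, refs):
    """Bibliographic coupling: number of references shared by two papers,
    normalized here (illustratively) by the smaller reference list."""
    shared = refs[p] & refs[q]
    return len(shared) / min(len(refs[p]), len(refs[q]))

# Pairwise coupling strengths for all paper pairs.
pairs = {(p, q): coupling_strength(p, q, refs)
         for p, q in combinations(sorted(refs), 2)}
```

In a hybrid scheme, these citation-based values would then be combined with textual similarities before community detection is applied.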
This work has been done as part of the international project 'Measuring the Diversity of Research' and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science, organised in Berlin in August 2014. In spite of recent advances in field delineation methods, bibliometricians still do not know the extent to which their topic detection algorithms reconstruct 'ground truths', i.e., thematic structures in the scientific literature. In this paper, we demonstrate a new approach to the delineation of thematic structures that attempts to match the algorithm to theoretically derived and empirically observed properties that all thematic structures have in common. We cluster citation links rather than publication nodes, use predominantly local information and search for communities of links starting from seed subgraphs in order to allow for pervasive overlaps of topics. We evaluate sets of links with a new cost function and assume that local minima in the cost landscape correspond to link communities. Because this cost landscape has many local minima we define a valid community as the community with the lowest minimum within a certain range. Since finding all valid communities is impossible for large networks, we designed a memetic algorithm that combines probabilistic evolutionary strategies with deterministic local searches. We apply our approach to a network of about 15,000 Astronomy and Astrophysics papers published in 2010 and their cited sources, and to a network of about 100,000 Astronomy and Astrophysics papers (published 2003-2010) which are linked through direct citations. This paper describes how semantic indexing can help to generate a contextual overview of topics and visually compare clusters of articles. The method was originally developed for an innovative information exploration tool, called Ariadne, which operates on bibliographic databases with tens of millions of records (Koopman et al.
in Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. doi:10.1145/2702613.2732781, 2015b). In this paper, the method behind Ariadne is further developed and applied to the research question of the special issue "Same data, different results": the better understanding of topic (re-)construction by different bibliometric approaches. For the case of the Astro dataset of 111,616 articles in astronomy and astrophysics, a new instantiation of the interactive exploring tool, LittleAriadne, has been created. This paper contributes to the overall challenge to delineate and define topics in two different ways. First, we produce two clustering solutions based on vector representations of articles in a lexical space. These vectors are built on semantic indexing of entities associated with those articles. Second, we discuss how LittleAriadne can be used to browse through the network of topical terms, authors, journals, citations and various cluster solutions of the Astro dataset. More specifically, we treat the assignment of an article to the different clustering solutions as an additional element of its bibliographic record. Keeping the principle of semantic indexing on the level of such an extended list of entities of the bibliographic record, LittleAriadne in turn provides a visualization of the context of a specific clustering solution. It also conveys the similarity of article clusters produced by different algorithms, hence representing a complementary approach to other possible means of comparison. Visualization of literature-related information is common in scientometrics and related fields. Despite this, relatively little work has been done to visualize knowledge organization systems, such as controlled vocabularies or thesauri. In this paper we explore the creation and use of contextual visualizations based on thesauri.
Two different methods are developed for creating maps of thesaurus terms that can then be used as templates or basemaps on which to display the contents of publication sets. The first example maps first-level terms from the Unified Astronomy Thesaurus into a wheel-like (hub and spokes) configuration. This circular map can then be used to show relative positions of clusters of astronomy papers from different cluster solutions based on the thesaurus terms assigned to the papers in the clusters. The second example triangulates the entire Public Library of Science (PLOS) thesaurus onto a global map of science, and then uses the resulting map of thesaurus terms as the basis for an overlay map. This map can be used for several purposes, including mapping of subsets of PLOS content, and the identification of thesaurus terms whose rule bases may need to be changed. After a clustering solution is generated automatically, labelling the resulting clusters becomes important for understanding the results. In this paper, we propose to use a mutual information based method to label clusters of journal articles. Topical terms which have the highest normalised mutual information with a certain cluster are selected to be the labels of the cluster. Discussion of the labelling technique with a domain expert was used as a check that the labels are discriminating not only lexically but also semantically. Based on a common set of topical terms, we also propose to generate lexical fingerprints as a representation of individual clusters. Finally, we visualise and compare these fingerprints of different clusters from either one clustering solution or different ones. This is the last paper in the Synthesis section of this special issue on 'Same Data, Different Results'. We first provide a framework of how to describe and distinguish approaches to topic extraction from bibliographic data of scientific publications.
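The mutual-information-based labelling described above can be sketched as follows. For brevity this computes plain (unnormalised) mutual information between term presence and cluster membership; the corpus and term names are invented.

```python
import math

def mutual_information(x, y):
    """MI between two binary sequences (term presence vs. cluster membership).
    Normalisation (e.g. by term entropy) is omitted for brevity."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Toy corpus: term-presence flags across 6 documents, plus a cluster assignment.
docs_terms = {
    "galaxy":  [1, 1, 1, 0, 0, 0],   # perfectly tracks the cluster
    "protein": [0, 0, 0, 1, 1, 0],   # partially tracks the complement
    "data":    [1, 1, 1, 1, 1, 1],   # uninformative: appears everywhere
}
in_cluster = [1, 1, 1, 0, 0, 0]      # membership in the cluster under study

# Label the cluster with the term having the highest MI with membership.
label = max(docs_terms, key=lambda t: mutual_information(docs_terms[t], in_cluster))
```

A ubiquitous term like "data" gets zero mutual information and is never chosen, which is exactly the discriminating behaviour the labelling method relies on.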
We then compare solutions delivered by the different topic extraction approaches in this special issue, and explore where they agree and differ. This is achieved without reference to a ground truth, since we have to assume the existence of multiple, equally important, valid perspectives and want to avoid bias through the adoption of an ad-hoc yardstick. Instead, we apply different ways to quantitatively and visually compare solutions to explore their commonalities and differences and develop hypotheses about the origin of these differences. We conclude with a discussion of future work needed to develop methods for comparison and validation of topic extraction results, and express our concern about the lack of access to non-proprietary benchmark data sets to support method development in the field of scientometrics. Histories of information help clarify the values and intellectual commitments of the discipline. This study takes a rhetorical history approach to better understand the development of information studies as a discipline. Information studies historians have identified that the Cold War period was critical for the development of information science and consequently of its modern-day incarnations. Due to post-World War II prosperity, the 1960s saw a surge in interest in scientific and technical information. Many from government, education, and private sectors took interest in developing new ways to compete with Soviet science. This interest led to the National Science Foundation (NSF)-funded Georgia Tech conferences of 1961 and 1962, which are analyzed here. I find that concerns about the "science information problem" provided language that was critical for transforming some of the information studies' central concepts. In particular, I find that the idea of an "information scientist" was made possible by national funding for science information. 
I suggest that attending to the discursive traffic between public and disciplinary discourse of information studies can better attune the field to its intellectual commitments. The omnipresence and escalating efficiency of digital, networked information systems, alongside the resulting deluge of digital corpora, apps, software, and data, have coincided with increased concerns in the humanities with new topics and methods of inquiry. In particular, digital humanities (DH), the subfield that has emerged as the site of most of this work, has received growing attention in higher education in recent years. This study seeks to facilitate a better understanding of digital humanities by studying the motivations and practices of digital humanists as information workers in the humanities. To this end, we observe information work through interviews with DH scholars about their work practices and through a survey of DH programs such as graduate degrees, certificates, minors, and training institutes. In this study we focus on how the goals behind methodology (a link between theories and method) surface in everyday DH work practices and in DH curricula in order to investigate whether the critiques that have appeared in relation to DH information work are well founded and to suggest alternative narratives about information work in DH that will help advance the impact of the field in the humanities and beyond. Researchers in information science and related areas have developed various methods for analyzing textual data, such as survey responses. This article describes the application of analysis methods from two distinct fields, one method from interpretive social science and one method from statistical machine learning, to the same survey data. The results show that the two analyses produce some similar and some complementary insights about the phenomenon of interest, in this case, nonuse of social media.
We compare both the processes of conducting these analyses and the results they produce to derive insights about each method's unique advantages and drawbacks, as well as the broader roles that these methods play in the respective fields where they are often used. These insights allow us to make more informed decisions about the tradeoffs in choosing different methods for analyzing textual data. Furthermore, this comparison suggests ways that such methods might be combined in novel and compelling ways. The domain of cultural variations in interpersonal communication is becoming increasingly important in various areas, including human-human interaction (e.g., business settings) and human-computer interaction (e.g., during simulations, or with social robots). User-generated content (UGC) in social media can provide an invaluable source of culturally diverse viewpoints for supporting the understanding of cultural variations. However, discovering and organizing UGC is notoriously challenging and laborious for humans, especially in ill-defined domains such as culture. This calls for computational approaches to automate the UGC sensemaking process by using tagging, linking, and exploring. Semantic technologies allow automated structuring and qualitative analysis of UGC, but are dependent on the availability of an ontology representing the main concepts in a specific domain. For the domain of cultural variations in interpersonal communication, no ontological model exists. This paper presents the first such ontological model, called AMOn+, which defines cultural variations and enables tagging culture-related mentions in textual content. AMOn+ is designed based on a novel interdisciplinary approach that combines theoretical models of culture with crowdsourced knowledge (DBpedia). An evaluation of AMOn+ demonstrated its fitness-for-purpose regarding domain coverage for annotating culture-related concepts mentioned in text corpora. 
This ontology can underpin computational models for making sense of UGC. With the rapid development of the Internet and its applications, growing volumes of documents increasingly become interconnected to form large-scale document networks. Accordingly, topic modeling in a network of documents has been attracting continuous research attention. Most of the existing network-based topic models assume that topics in a document are influenced by its directly linked neighbouring documents in a document network and overlook the potential influence from indirectly linked ones. The existing work also has not carefully modeled variations of such influence among neighboring documents. Recognizing these modeling limitations, this paper introduces a novel Local Context-Aware LDA Model (LC-LDA), which is capable of observing a local context comprising a rich collection of documents that may directly or indirectly influence the topic distributions of a target document. The proposed model can also differentiate the respective influence of each document in the local context on the target document according to both structural and temporal relationships between the two documents. The proposed model is extensively evaluated through multiple document clustering and classification tasks conducted over several large-scale document sets. Evaluation results clearly and consistently demonstrate the effectiveness and superiority of the new model with respect to several state-of-the-art peer models. Evaluation of information during information problem-solving processes already starts when trying to select the appropriate search result on a search engine results page (SERP). Up to now, research has mainly focused on the evaluation of webpages, while the evaluation of SERPs received less attention. Furthermore, task complexity is often not taken into account. 
A within-subjects design was used to study the influence of task complexity on search query formulation, evaluation of search results, and task performance. Three search tasks were used: a fact-finding, cause-effect, and a controversial topic task. To measure perceptual search processes, we used a combination of log files, eye-tracking data, answer forms, and think-aloud protocols. The results reveal that an increase in task complexity results in more search queries and used keywords, more time to formulate search queries, and more considered search results on the SERPs. Furthermore, higher ranked search results were considered more often than lower ranked results. However, not all the results for the most complex task were in line with expectations. These conflicting results can be explained by a lack of prior knowledge and the possible interference of prior attitudes. Although modern search systems require minimal skill for meeting simple information needs, most systems provide weak support for gaining advanced skill; hence, the goal of designing systems that guide searchers in developing expertise. Essential to developing such systems are a description of expert search behavior and an understanding of how it may be acquired. The present study contributes a detailed analysis of the query behavior of 10 students as they completed assigned exercises during a semester-long course on expert search. Detailed query logs were coded for three dimensions of query expression: the information structure searched, the type of query term used, and intent of the query with respect to specificity. Patterns of query formulation were found to evidence a progression of instruction, suggesting that the students gained knowledge of fundamental system-independent constructs that underlie expert search, and that domain-independent search expertise may be defined as the ability to use these constructs. Implications for system design are addressed. 
This paper explores the relationships between natural language lexicons in lexical semantics and thesauri in information retrieval research. These different areas of knowledge have different restrictions on use of vocabulary; thesauri are used only in information search and retrieval contexts, whereas lexicons are mental systems and generally applicable in all domains of life. A set of vocabulary requirements that defines the more concrete characteristics of vocabulary items in the 2 contexts can be derived from this framework: lexicon items have to be learnable, complex, transparent, etc., whereas thesaurus terms must be effective, current and relevant, searchable, etc. The differences in vocabulary properties correlate with 2 other factors, the well-known dimension of Control (deliberate, social activities of building and maintaining vocabularies), and Syntagmatization, which is less known and describes vocabulary items' varying formal preparedness to exit the thesaurus/lexicon, enter into linear syntactic constructions, and, finally, acquire communicative functionality. It is proposed that there is an inverse relationship between Control and Syntagmatization. This study aims to identify different styles of personal digital information categorization based on the mindscape of the categorizers. To collect data, a questionnaire, a diary study, and 2 semistructured interviews were conducted with each of 18 participants. Then a content analysis was used to analyze the data. Based on the analysis of the data, this study identified 3 different types of categorizers: (i) rigid categorizers, (ii) fuzzy categorizers, and (iii) flexible categorizers. This study provides a unique way to understand personal information categorization by showing how it reflects the mindscapes of the categorizers. 
In particular, this study explains why people organize their personal information differently and have different tendencies in developing and maintaining their organizational structures. The findings provide insights into different ways of categorizing personal information and deepen our knowledge of categorization, personal information management, and information behavior. In practice, understanding different types of personal digital information categorization can contribute to the development of systems, tools, and applications that support effective personal digital information categorization. The practice of citation is foundational to the propagation of knowledge and to scientific development, and it is one of the core aspects on which scholarship and scientific publishing rely. Within the broad context of data citation, we focus on the problem of automatically constructing citations for hierarchically structured data. We present the learning to cite framework, which enables the automatic construction of human- and machine-readable citations with different levels of coarseness. The main goal is to reduce the human intervention on data to a minimum and to provide a citation system general enough to work on heterogeneous and complex XML data sets. We describe how this framework can be realized by a system for creating citations to single nodes within an XML data set and, as a use case, show how it can be applied in the context of digital archives. We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real-world environments and that reduces human intervention on data to a minimum. Information and communication technologies (ICTs) provide a distinctive structure of opportunities with the potential to promote political engagement.
However, concerns remain over unequal technological access in our society, as political resources available on the internet empower those with the resources and motivation to take advantage of them, leaving those who are disengaged farther behind. Hence, those who face digital inequalities are not only deprived of the benefits of the so-called Information Society, they are also deprived of exercising their civic rights. To promote political engagement among the marginalized, we analyze different sociotechnical factors that may play a role in promoting their inclusion in future political activities. We employed a survey for marginalized communities to analyze a set of research questions relating to sociotechnical factors. We show that online content creation, digital freedom, and access to the mobile Internet may positively impact political engagement. The development of these factors may not only promote the inclusion of marginalized populations in future political events, but also help to build a more equal society where everyone's voice has a chance to be heard. In the information science literature, recent studies have used patent databases and patent classification information to construct network maps of patent technology classes. In such a patent technology map, almost all pairs of technology classes are connected, whereas most of the connections between them are extremely weak. This observation suggests the possibility of filtering the patent network map by removing weak links. However, removing links may reduce the explanatory power of the network on inventor or organization diversification. The network links may explain the patent portfolio diversification paths of inventors and inventing organizations. We measure the diversification explanatory power of the patent network map, and present a method to objectively choose an optimal tradeoff between explanatory power and removing weak links. 
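The tradeoff between explanatory power and link removal can be illustrated with a deliberately simplified objective: sweep a weight threshold, track how much link weight is retained versus how many links survive, and keep the threshold that maximizes the difference. The scoring function, class names, and weights below are hypothetical stand-ins, not the authors' measure.

```python
# Toy weighted technology-class network: (class_a, class_b) -> link weight.
links = {
    ("optics", "semiconductors"): 0.90,
    ("optics", "imaging"):        0.75,
    ("semiconductors", "memory"): 0.60,
    ("imaging", "chemistry"):     0.08,   # weak link, candidate for removal
    ("memory", "textiles"):       0.03,   # weak link, candidate for removal
}

def tradeoff(threshold, links):
    """Fraction of total weight retained minus fraction of links kept.
    A crude stand-in for 'explanatory power vs. map sparsity'."""
    kept = {k: w for k, w in links.items() if w >= threshold}
    weight_retained = sum(kept.values()) / sum(links.values())
    sparsity_cost = len(kept) / len(links)
    return weight_retained - sparsity_cost

# Sweep thresholds from 0.00 to 1.00 in steps of 0.05; keep the best one.
best = max((t / 100 for t in range(0, 101, 5)),
           key=lambda t: tradeoff(t, links))
```

The point of the sweep is that the threshold is chosen by an explicit objective rather than set arbitrarily, which mirrors the paper's argument against ad hoc filtering.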
We show that this method can remove a degree of arbitrariness compared with previous filtering methods based on arbitrary thresholds, and also identify previous filtering methods that created filters outside the optimal tradeoff. The filtered map aims to aid in network visualization analyses of the technological diversification of inventors, organizations, and other innovation agents, and potential foresight analysis. Such applications to a prolific inventor (Leonard Forbes) and company (Google) are demonstrated. Journal rankings, frequently determined by the journal impact factor or similar indices, are quantitative measures for evaluating a journal's performance in its discipline, which is presently a major research thrust in the bibliometrics field. Recently, text mining was adopted to augment journal ranking-based evaluation with the content analysis of a discipline taking a time-variant factor into consideration. However, previous studies focused mainly on a silo analysis of a discipline using either citation- or content-oriented approaches, and no attempt was made to analyze topical journal ranking and its change over time in a seamless and integrated manner. To address this issue, we propose a journal-time-topic model, an extension of Dirichlet multinomial regression, which we applied to the field of bioinformatics to understand journal contribution to topics in a field and the shift of topic trends. The journal-time-topic model allows us to identify which journals are the major leaders in which topics and the manner in which their topical focus shifts over time. It also helps reveal an interesting, distinct pattern in the journal impact factor of high- and low-ranked journals. The study results shed new light on understanding topic-specific journal rankings and shifts in journals' concentration on a subject. The Internet has triggered transformational change in the dissemination of science in the form of a global transition to open access (OA) publishing.
Heavy investment favoring Gold over Green OA has been associated with increased total publication costs, inequality of opportunity to publish, and concerns about integrity in science reporting. Notwithstanding current fluidity because of ongoing competition for market share between supporters of the major alternative publishing strategies, emerging trends indicate the need for material and human resources to be redirected away from Gold and toward Green OA. Doing so will reduce total publication costs, increase equality of access for authors and readers, and remove the financial incentives that have encouraged poor and corrupt publishing practices. This paper presents a simple parametric statistical approach to comparing different citation-based journal ranking metrics within a single academic field. The mechanism can also be used to compare the same metric across different academic fields. The mechanism operates by selecting an optimal normalization factor and an optimal distributional adjustment for the rank-score curve, both of which are instrumental in making sound intermetric and interfield journal comparisons. Given the complexity of questions studied by academicians, institutions are increasingly encouraging interdisciplinary research to tackle these problems; however, neither the individual-level pathways leading to the pursuit of interdisciplinary research nor the resulting market outcomes have been closely examined. In this study, we focus attention on the individuals who complete interdisciplinary dissertations to ask "who are they and how do they fare after earning the PhD?" Since interdisciplinary research is known to be relatively risky among academics, we examine demographic variables that are known to be associated in other contexts with risk-taking before considering whether interdisciplinarians' outcomes are different upon graduating. 
First among our three main findings, students whose fathers earned a college degree demonstrated a 1.3% higher probability of pursuing interdisciplinary research. Second, the probability that non-citizens pursue interdisciplinary dissertation work is 4.6% higher when compared with US citizens. Third, individuals who complete an interdisciplinary dissertation tend to earn approximately 2% less in the year after graduation; however, mediation analyses show that the decision to become a postdoctoral researcher accounts for the apparent salary penalty. Our findings shed light on the antecedents and near-term consequences for individuals who complete interdisciplinary dissertations and contribute to broader policy debates concerning supports for academic career paths. Scientific research activities cluster in cities or towns. Modern cities can play a crucial role in the national or regional innovation system. Strengthening R&D collaboration between cities can help to integrate various regional innovation systems more fully. Using the cross-sectional co-patent data of the Chinese Patent Database as a proxy for R&D collaboration, this paper investigates the spatial patterns of R&D collaborations between 224 Chinese cities and the major factors that affect cross-city R&D collaborations in China. A spatial interaction model was used to examine how spatial, economic, technological and political factors affect cross-city R&D collaborations. The degree of centrality shows that cross-city collaborative R&D activities mainly occur in favored regions, advanced municipalities and coastal regions. The mean collaboration intensity for intra-provincial cross-city collaborations is 4.74; however, for inter-provincial collaborations, it is 0.69. The econometric findings reveal that spatial, economic, technological and political bias factors have significant effects on the frequency of cross-city R&D collaboration.
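A spatial interaction (gravity) model of this kind can be sketched as a log-linear regression in which collaboration flows scale with city "masses" and decay with distance. The data below are synthetic and the coefficient values are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic city pairs:
#   log(flow) = b0 + b1 * log(mass product) + b2 * log(distance) + noise
n = 200
log_mass = rng.uniform(0.0, 3.0, n)      # log of the two cities' R&D mass product
log_dist = rng.uniform(0.0, 2.0, n)      # log of inter-city distance
true_b = np.array([0.5, 1.2, -0.8])      # intercept, mass elasticity, distance decay
X = np.column_stack([np.ones(n), log_mass, log_dist])
log_flow = X @ true_b + rng.normal(0.0, 0.05, n)

# Ordinary least squares on the log-linear form recovers the elasticities.
beta, *_ = np.linalg.lstsq(X, log_flow, rcond=None)
```

The negative coefficient on log-distance is the distance-decay effect that factors such as high-speed rail connections can attenuate in the empirical specification.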
Specifically, as evidenced by the model coefficient, it is more likely that R&D collaborations occur among cities that are connected by high-speed railways. This research used a cell structure map to visualize technological evolution and showed the developmental trend in a technological field. The basic concept was to organize patents into a map produced by growing cell structures. The map was then disassembled into clusters with similar contexts using the Girvan-Newman algorithm. Next, the continuity between clusters in two snapshots was identified and used as the base for establishing a trajectory in the technology. An analysis of patents in the flaw detection field found that the field was composed of several technological trajectories. Among them, ultrasonic flaw detection, wafer inspection and substrate inspection were relatively larger and more persistent technologies, while infrared thermography defect inspection has been an emerging topic in recent years. It is to be hoped that the map of technology constructed in this research provides insights into the history of technological evolution and helps explain the transition patterns through changes in cluster continuity. It can serve as a reference point for experts who attempt to visualize the mapping of technological development or identify the latest focus of attention. Promoting knowledge diffusion and reducing the delay between scientific research and technology patents are important for achieving success in the highly competitive global environment. This paper studies the time delay between scientific research and technology patents, and focuses on the key components of time in the promotion of knowledge transformation. Based on United States Patent and Trademark Office patent data, we apply periodical citation distribution models to the patent process. 
The results show that our transfer function model is better than others, and is suitable for calculating the delay between basic scientific research activities and technology patents. The ongoing discussion in the bibliometric community about the best similarity measures has led to diverse insights. Although these insights are sometimes contradictory, there is one very consistent conclusion: hybrid measures outperform the application of their singular components. While this initially answers the question as to which similarity measure is best, it also raises issues which have been resolved in part for conventional similarity measures. Given this, in this study we investigate the impact of the right weighting factors, the appropriate level of edge cutting, the performance of first- in contrast to second-order similarities, and the interaction of these three parameters in the context of hybrid similarities. Building upon a dataset of over 8000 articles from the manufacturing engineering field and using different parameter settings, we calculated over 100 similarity matrices. For each matrix we determined several cluster solutions of different resolution levels, ranging from 100 to 1000 clusters, and evaluated them quantitatively with the help of a textual coherence value based on the Jensen-Shannon divergence. We found that second-order hybrid similarity measures calculated with a weighting factor of 0.6 for the citation-based similarity and a reduction to only the strongest values yield the best clustering results. Furthermore, we found the assessed parameters to be highly interdependent; for example, hybrid first-order outperforms second-order when no edge cutting is applied. Given this, our results can serve the bibliometric community as a guideline for the appropriate application of hybrid measures. When the technological development of an enterprise is path-dependent, core technological competencies will develop. 
In addition, core technological competencies promote technological development. Consequently, enterprises should always examine the advantages of their core technological competencies. Under dynamic competition, enterprises should monitor their own performance as well as that of their competitors at all times and adjust their technological strategies accordingly. This study used two patent indices, Patent Share and Revealed Technological Advantage, to measure the internal core technological competencies of manufacturers. It also integrated four other indices, namely: (1) Technology Attractiveness (Relative Growth Rate), (2) growth potential of technologies (Relative Development of Technology Growth Rate), (3) Relative Patent Position, and (4) Revealed Patent Advantages. These were used to analyze the external strengths and weaknesses of the research and development (R&D) portfolios of companies. These two analytical methods can effectively identify the internal core technological competencies and the external advantages of the R&D portfolios of leading companies in the solar photovoltaic (PV) industry. This study also discussed the relationship between the R&D portfolios and core technological competencies of leading solar photovoltaic companies and compared companies with two core technological competencies with those that have a single core technological competence. The study results show that the R&D portfolios of companies engaged in a single, specific technology field have advantages. This study helps improve the quality of technological planning and decision-making of manufacturers, proposes a method of using core technological competencies to analyze the advantages of R&D portfolios, and helps solar PV manufacturers monitor their own core technological competencies as well as those of their competitors and partner companies. 
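Both Patent Share and Revealed Technological Advantage (RTA) can be computed directly from a firm-by-field table of patent counts. A minimal sketch with hypothetical firms and fields (RTA is conventionally the firm's internal share of patents in a field divided by that field's share of all patents; values above 1 indicate relative specialization):

```python
# Hypothetical patent counts: firm -> technology field -> number of patents.
patents = {
    "FirmA": {"cells": 30, "modules": 10},
    "FirmB": {"cells": 10, "modules": 50},
}

def patent_share(firm, field):
    """Firm's share of all patents granted in a given field."""
    field_total = sum(p[field] for p in patents.values())
    return patents[firm][field] / field_total

def rta(firm, field):
    """Revealed Technological Advantage: the firm's internal share of
    patents in the field, relative to the field's share of all patents."""
    firm_total = sum(patents[firm].values())
    field_total = sum(p[field] for p in patents.values())
    grand_total = sum(sum(p.values()) for p in patents.values())
    return (patents[firm][field] / firm_total) / (field_total / grand_total)
```

With these invented numbers, FirmA holds 75% of the "cells" patents and has an RTA of about 1.9 there, which would flag that field as a core technological competency.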
Identifying the characteristics of patent applications that lead to successful grants is an important yet under-investigated topic in the scientometric literature. Using data from financial fraud-related patent applications submitted to the United States Patent and Trademark Office (USPTO), this study aims to determine which factors that inventors can influence relate to successful patent grants. A descriptive statistical model is proposed to estimate the likelihood of a patent document being granted by the USPTO based on a number of explanatory variables. The following factors are among the notable statistically significant determinants for the studied patent sample: number of drawings, drafting aggressiveness, proportion of granted patent prior art references, proportion of web-based non-patent literature references, subclass specialization, and representation by a patent attorney or agent. The implications of these empirical findings are discussed in the context of entrepreneurship. Collaboration is a major factor in knowledge and innovation creation in emerging science-driven industries where the technology is rapidly changing and constantly evolving, such as nanotechnology. The objective of this work is to investigate the role of individual scientists and their collaborations in enhancing knowledge flows, and consequently scientific production. The methodology involves two main phases. First, the data on all the nanotechnology journal publications in Canada were extracted from the SCOPUS database to create the co-authorship network, and statistical data mining techniques were then employed to analyze the scientists' research performance and partnership history. In addition, a questionnaire was sent directly to researchers selected from our database seeking the predominant properties that make a scientist sufficiently attractive to be selected as a research partner. 
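A co-authorship network of the kind described above is built by linking every pair of authors who appear on the same paper. A minimal pure-Python sketch with invented author names (a real study would read the author lists from the SCOPUS export):

```python
from itertools import combinations
from collections import Counter, defaultdict

# Hypothetical papers, each given as its list of authors.
papers = [
    ["Ali", "Beaudry"],
    ["Ali", "Beaudry", "Chen"],
    ["Chen", "Das"],
]

edge_weights = Counter()          # how often each pair co-authored
collaborators = defaultdict(set)  # distinct partners per author

for authors in papers:
    for a, b in combinations(sorted(set(authors)), 2):
        edge_weights[(a, b)] += 1
        collaborators[a].add(b)
        collaborators[b].add(a)

degree = {name: len(partners) for name, partners in collaborators.items()}
```

Here Chen has degree 3 (partners Ali, Beaudry and Das) and the Ali-Beaudry tie has weight 2; centrality measures such as betweenness are then computed on this weighted graph.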
In the second phase, an agent-based model using NetLogo was developed to study the network in its dynamic context, where several factors could be controlled. It was found that scientists in centralized positions in such networks have a considerable positive impact on the knowledge flows, while loyalty and strong connections within a dense local research group negatively affect the knowledge transmission. Star scientists appear to play a substitutive role in the network and are selected when the usual collaborators, i.e., the most famous and most trusted partners, are scarce or missing. Of the existing theoretical formulas for the h-index, those recently suggested by Burrell (J Informetr 7: 774-783, 2013b) and by Bertoli-Barsotti and Lando (J Informetr 9(4): 762-776, 2015) have proved very effective in estimating the actual value of the h-index of Hirsch (Proc Natl Acad Sci USA 102: 16569-16572, 2005), at least at the level of the individual scientist. These approaches lead (or may lead) to two slightly different formulas, being based, respectively, on a "standard" and a "shifted" version of the geometric distribution. In this paper, we review the genesis of these two formulas, which we shall call the "basic" and "improved" Lambert-W formulas for the h-index, and compare their effectiveness with that of a number of instances taken from the well-known Glanzel-Schubert class of models for the h-index (based, instead, on a Paretian model) by means of an empirical study. All the formulas considered in the comparison are "ready-to-use", i.e., functions of simple citation indicators such as: the total number of publications; the total number of citations; the total number of cited papers; the number of citations of the most cited paper. 
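These ready-to-use formulas approximate a quantity that is simple to compute exactly whenever the full citation list is available: the h-index is the largest h such that at least h of the author's papers have at least h citations each. A minimal sketch:

```python
def h_index(citations):
    """Exact h-index: the largest h such that at least h papers
    have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    # For a descending list the condition holds for a prefix, so
    # counting the qualifying ranks gives h directly.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# An author with papers cited 10, 8, 5, 4 and 3 times has h = 4:
# four papers have at least four citations each, but not five with five.
```

The estimation formulas discussed in the paper matter precisely when only aggregate indicators (total publications, total citations, citations of the most cited paper) are available rather than the full citation list.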
The empirical study is based on citation data obtained from two different sets of journals belonging to two different scientific fields: more specifically, 231 journals from the area of "Statistics and Mathematical Methods" and 100 journals from the area of "Economics, Econometrics and Finance", totaling almost 100,000 and 20,000 publications, respectively. The citation data refer to different publication/citation time windows, different types of "citable" documents, and alternative approaches to the analysis of the citation process ("prospective" and "retrospective"). We conclude that, especially in its improved version, the Lambert-W formula for the h-index provides a quite robust and effective ready-to-use rule that should be preferred to other known formulas if one's goal is (simply) to derive a reliable estimate of the h-index. Using social network analysis to examine patenting data available at the USPTO, this paper explores the evolutionary process of the global nanotechnology collaboration network from the perspective of the entry and exit of collaborative organizations (nodes) and the network's preferential attachment process. The results show that the nanotechnology collaboration network evolved through frequent updates of the nodes and their relations (links). Compared with degree centrality and closeness centrality, betweenness centrality of an existing node was a significantly better predictor of preferential attachment. The nodes with higher betweenness centrality were more influential in attracting other nodes. This effect persisted as the network evolved. The results reveal that the core nodes with higher betweenness centrality were mostly large organizations equipped with core technology. They played an important broker role, attracting more organizations into collaboration. Data sets of publication metadata with manually disambiguated author names play an important role in current author name disambiguation (AND) research. 
We review the most important data sets used so far, and compare their respective advantages and shortcomings. From the results of this review, we derive a set of general requirements for future AND data sets. These include both trivial requirements, like absence of errors and preservation of author order, and more substantial ones, like full disambiguation and adequate representation of publications with a small number of authors and highly variable author names. On the basis of these requirements, we create and make publicly available a new AND data set, SCAD-zbMATH. Both the quantitative analysis of this data set and the results of our initial AND experiments with a naive baseline algorithm show the SCAD-zbMATH data set to be considerably different from existing ones. We consider it a useful new resource that will challenge the state of the art in AND and benefit the AND research community. As a typical multi-criteria group decision making (MCGDM) problem, research and development (R&D) project selection involves multiple decision criteria which are formulated by different frames of discernment, and multiple experts who are associated with different weights and reliabilities. The evidential reasoning (ER) rule is a rational and rigorous approach to dealing with such MCGDM problems and can generate comprehensive distributed evaluation outcomes for each R&D project. In this paper, an ER rule based model taking into consideration experts' weights and reliabilities is proposed for R&D project selection. In the proposed approach, a utility based information transformation technique is applied to handle qualitative evaluation criteria with different evaluation grades, and both adaptive weights of criteria and utilities assigned to evaluation grades are introduced into the ER rule based model. A nonlinear optimisation model is developed for the training of weights and utilities. 
A case study with the National Science Foundation of China is conducted to demonstrate how the proposed method can be used to support R&D project selection. Validation data show that the evaluation results become more reliable and consistent with reality when the weights and utilities trained on historical data are used. We propose an improvement over the co-word analysis method based on semantic distance. This combines semantic distance measurements with concept matrices generated from ontologically based concept mapping. Our study suggests that the co-word analysis method based on semantic distance yields better matrix dimensions and clustering results. Despite the method's advantages, it has two limitations: first, it is highly dependent on domain ontology; second, its efficiency and accuracy during the concept mapping process merit further study. Our method optimizes co-word matrix conditions in two respects. First, by applying concept mapping within the labels of the co-word matrix, it combines words at the concept level to reduce matrix dimensions and create a concept matrix that contains more content. Second, it integrates the logical relationships and concept connotations among the studied concepts into a co-word matrix and calculates the semantic distance between concepts based on domain ontology to create the semantic matrix. Several studies have examined the relationships between citation and download data. Some have also analyzed disciplinary differences in the relationships by comparing a few subject areas or a few journals. To gain a deeper understanding of the disciplinary differences, we carried out a comprehensive study investigating the issue in five disciplines: science, engineering, medicine, social sciences, and humanities. We used a systematic method to select fields and journals, ensuring a very broad spectrum and balanced representation of various academic fields. 
A total of 69 fields and 150 journals were included. We collected citation and download data for these journals from the China Academic Journal Network Publishing Database, the largest Chinese academic journal database in the world. We manually filtered out non-research papers such as book reviews and editorials. We analyzed the relationships both at the journal and the paper level. The study found that social sciences and humanities differ from science, engineering, and medicine and that the pattern of differences is consistent across all measures studied. Social sciences and humanities have higher correlations between citations and downloads, higher correlations between downloads per paper and Journal Impact Factor, and higher download-to-citation ratios. The disciplinary differences mean that the accuracy or utility of download data in measuring impact is higher in social sciences and humanities and that download data in those disciplines reflect a broader impact, extending well beyond the impact on citing authors. The objective of this study is to determine the role of gender and faculty rank in explaining variance in individual research impact and productivity for social work doctoral faculty. Research impact and productivity were assessed with the H-Index, which is a widely used citation index measure that assesses the quality and quantity of published research articles. We compared the individual H-Index scores for all doctoral level social work faculty from doctoral programs in the United States (N = 1699). Differences in H-Index means were assessed between genders at each tenure-track faculty rank, and between faculty ranks for each gender. Both gender and faculty rank were associated with differences in scholarly impact and productivity. Although men had higher H-Index scores than women at all faculty ranks, the gender gap was greatest between men and women at the Full Professor level. 
The gender gap was least pronounced at the Associate Professor level, where women's H-Index scores were closer to those of men. Results support previous findings that women in the social sciences have lower H-Index scores than men. The diminished gap between men and women at the Associate Professor level may suggest that women get promoted to Full Professor less frequently than men at comparable career milestones. While the results of this study are consistent with the argument that women face unique barriers to academic promotion and other forms of academic success in social work, these results do not explain any specific barriers that may cause the gender gap. In the first part of our study (Zhang and Glanzel in Scientometrics, 2017) we provided a view of literature ageing based on a synchronous approach. Taking up the ideas of Egghe (Scientometrics 27(2):195-214, 1993) and Glanzel et al. (Scientometrics 109(3):2165-2179, 2016), we extend our study in this second part by applying a diachronous approach on the basis of citing literature. For this purpose we used the Prospective Price Index, which was recently introduced by Glanzel et al. (Scientometrics 109(3):2165-2179, 2016). Finally, we compare the two aspects of literature ageing. In particular, we analyze the correlation between the share of recent references and the share of fast response, and found a generally positive correlation between both aspects at different levels of aggregation (subfields, major fields and the individual paper level). However, the consistency varies across aggregation levels. For example, at the level of subject fields, Chemistry, Biology, and Neuroscience & Behavior rank markedly higher by the Prospective Price Index than by the Price Index, indicating a faster ageing process in the mirror of citations than of references, while Engineering and the Social Sciences show the opposite ageing features. 
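The Price Index is commonly defined as the share of a paper's cited references published no more than five years before the citing paper; the Prospective Price Index mirrors this on the citing side. A minimal sketch of the classic, reference-side version (the five-year window is the conventional choice):

```python
def price_index(citing_year, reference_years, window=5):
    """Share of cited references published within `window` years
    of the citing publication (the classic Price Index)."""
    if not reference_years:
        return 0.0
    recent = sum(1 for y in reference_years if citing_year - y <= window)
    return recent / len(reference_years)

# A 2014 paper citing works from 2013, 2012, 2005 and 1998:
# two of the four references fall within the five-year window.
```

Computed at the paper level, as the study recommends, this accounts for ageing differences between individual documents published in the same journal or (sub-)field.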
At the journal level, we observed a striking divergence between the reference and citation ageing patterns in some cases. Thus several journals proved 'hard' from the perspective of information sources (cited papers) but, at the same time, rather 'soft' in the light of information targets (citing papers). We provide a view of the literature ageing features in the sciences and social sciences at different aggregation levels (major fields, subfields, journals and individual papers) and from different perspectives. Contrary to the widespread belief that scientific literature is becoming obsolete more rapidly, we found that, in general, the share of more recent references was distinctly lower in 2014 than in 1992, which holds for all aggregation levels. As exceptions, the subfields related to Chemistry and the subfield of energy and fuels have shown a clear trend toward citing more recent literature rather than older articles. Particle and nuclear physics and astronomy and astrophysics, the two subfields which rely strongly on e-print archives, have shown a 'polarization' tendency in their reference distributions. Furthermore, we stress that it is very important to measure the Price Index at the paper level to account for differences between documents published in the same journals and (sub-)fields. Facebook has become an object of research in different areas. The present study presents a bibliometric analysis of the scientific literature related to the use of this social network in educational research. To this end, bibliometric techniques were applied in the analysis of scientific articles indexed in the Web of Science Core Collection, from Thomson Reuters, and linked to the research areas of Education/Educational Research. This resulted in the identification of, among other things, developments in scientific production, the most important journals that publish papers on the topic, the main authors and the main articles published in the area. 
The results indicate growth in scientific production in the area from 2008 onwards, point to Computers and Education as the most relevant journal by number of publications (22) and impact factor, and indicate that authors from the United States, Australia, Taiwan, the United Kingdom and South Africa stand out in the construction of knowledge on educational research applying Facebook. Moreover, the ego-network of the Educational Research area shows that this area coexists with other areas of knowledge in the use of social networking, such as Computer Science, Linguistics and Health Sciences, indicating an interdisciplinary and transversal nature across different areas of research. In recent years, China's urbanization has developed very quickly. Many scholars have conducted research on China's urbanization (CUR) and have published a large number of articles. With CUR as a case, we construct a dynamic co-word network to analyze the developmental characteristics of the knowledge system (KS). We draw several conclusions from this research. (1) The development of CUR possesses small-world characteristics and scale-free effects. The co-word network of CUR has grown significantly into a large-scale network. (2) Betweenness centrality and eigenvector centrality of the dynamic co-word networks positively correlate with node degree. The popular nodes connecting with a large number of topics are also the nodes that occur on the critical paths. The popular keywords in CUR also bridge distant clusters of related topics. (3) The clustering coefficients indicate that a number of topics with low degrees tend to relate to adjacent topics more directly to form "conglobation" clusters. The network has a hierarchical clustered structure. The hub keywords play a crucial role in bridging distinct clusters of highly associated keywords and make them form an integrated network. (4) Since 2003, CUR has begun to develop systematically. 
From 1998 to 2015, the hotspots in CUR varied widely, and they are highly correlated with social issues and public concerns. (5) We propose several practical implications for the development of CUR based on the dynamic co-word network measures. Beyond the case of CUR's KS, we hope the versatility of the methods in this research also provides enlightenment for other KS studies. We have compiled information from scientific articles in the ISI Web of Knowledge database (Thomson Reuters) to monitor trends and identify gaps in the use of functional diversity in studies of macroinvertebrates in stream ecosystems. We identified the regions, periodicals, taxa and sets of functional traits that were most frequently used. The use of functional diversity with stream macroinvertebrates has proved to be a recent development, aimed at assessing the negative effects of land use on the community, especially of aquatic insects, and is mainly concentrated in Europe and North America. The meaning of terms used in the functional approach differs from concepts found in earlier studies, which may cause misunderstandings in interpretations and comparisons between articles. In addition to this issue, the absence of species lists and functional values, and the combination of binary or continuous data and different taxonomic levels, also hinder these comparisons. The standardization of terminology has been proposed previously and would simplify the use of functional traits, facilitating understanding and the search for articles. We highlight the need for funding agencies to adopt measures that require researchers to provide their databases after a certain period of time, thus contributing to the creation of databases on other continents, as has been occurring especially in Europe. Such initiatives strengthen research, especially in areas undergoing intense human pressure. 
International collaboration in research is increasingly recognized as an important component of both research and internationalization priorities by higher education institutions. This study analyzed the input-output trends of international research collaboration at five U.S. public universities using quantitative research metrics. We also tested this set of metrics to understand their individual direct relationships with international research collaboration using binary logistic regression. Results showed that international faculty, research funding, research influence, and academic impact were statistically significant (P < 0.05) and can serve as single predictors of international research collaboration for the five universities. The findings should provide international officers and research managers with clear sample data, metrics, and their associations for making judgments and decisions on the value and impact of international research collaborations as they relate to the overall research progress, productivity and research quality of U.S. universities. Scholarly monograph authors are compared to other authors, based on bibliographic data registered in the VABB-SHW database from Flanders (Belgium). Monograph authors are found to be most often established male researchers with high productivity, who are relatively less involved in research collaboration (co-authored publications) than are other authors. There exists a clear divergence between most of the individual social science disciplines, where monograph authors make up a marginal share of all authors, and several humanities disciplines, where shares are up to one fifth. Relatively more female and non-established authors publish monographs in the humanities compared to the social sciences. 
A statistical comparison of productivity points to diverging publication patterns in Flemish SSH research: the group of most productive authors includes both monograph authors who also rely on other book publication types, and other authors who publish mostly journal articles. Women academics publish less frequently than men; they may also be subject to discrimination and gender bias in male-dominated disciplines. Citation metrics are advocated by assessment bodies and advisory agencies to standardise research assessment. We focus on the metrics suggested for assessment and newer metrics that capture additional dimensions of performance. Our data come from a sample of accounting authors from 1958 to 2008. The results suggest that citation metrics that accommodate excess citations, such as the e-index, tend to treat women researchers more favourably, and offer an evaluation of research performance that better reflects the type of research output profile that is more typical for women. This work characterized the research community of supply chain analytics (SCA) with respect to coauthorship, a special kind of collaboration. Coauthorship was characterized in terms of researchers' countries, institutions and individuals, so three different one-mode networks were studied. In addition, the SCA research community is characterized in terms of Supply Chain Management (SCM) research streams. Coauthorship among researchers working on different streams is also analyzed. Metrics that capture the importance of network nodes, such as degree, betweenness and closeness, were studied. This study found intense collaboration between the USA and countries such as China, India, the United Kingdom and Canada. Researchers from Canada and Ireland are better situated (more central) in the network, although they have not published a considerable number of papers. The presence of cliques and the small-world effect were also observed in these networks. 
In terms of research streams, more SCA research was found in the Strategic Management, Technology-focused and Logistics streams. The most common links between research streams are, on the one hand, Technology-focused with both Strategic Management and Logistics and, on the other, Strategic Management with both Logistics and Organizational Behavior. SCA researchers rarely focus on Marketing. This study contributes to the SCA literature by identifying the most central actors in this area and by characterizing the area in terms of SCM research streams. It may also contribute to the development of more focused research incentive programs and collaborations. Skills underlying scientific innovation and discovery generally develop within an academic community, often beginning in a graduate mentor's laboratory. In this paper, a network analysis of doctoral student-dissertation advisor relationships in The Academic Family Tree indicates that the pattern of Nobel laureate mentoring relationships is non-random. Nobel laureates had a greater number of Nobel laureate ancestors, descendants, mentees/grandmentees, and local academic family, supporting the notion that assortative processes occur in the selection of mentors and mentees. Subnetworks composed entirely of Nobel laureates extended across as many as four generations. Several successful mentoring communities in high-level science were identified, as measured by the number of Nobel laureates within the community. These communities centered on Cambridge University in the latter nineteenth century and Columbia University in the early twentieth century. The current practice of building web-based academic networks, extended to include a wider variety of measures of academic success, would allow for the identification of modern successful scientific communities and should be promoted. 
Assorted bibliometric indices have been proposed, leading to ambiguity in choosing the appropriate metric for evaluation. On the other hand, attempts to fit universal distribution patterns to scientific output have not converged to unified conclusions. To this end, we introduce the concept of fractal dimension to further examine the citation curve of an author. The fractal dimension of the citation curve can provide insight into its shape and form, its level of skewness and distance from uniformity, as well as existing publishing patterns, without a priori assumptions on the particular citation distribution. It is shown that the fractal dimension is not correlated with other well-known bibliometric indices. Further, a thorough experimental evaluation of the fractal dimension is presented, using a set of 30,000 computer scientists and more than 9 million publications with over 38 million citations. The distinguishing power of the fractal dimension is investigated when comparing the impact of scientists and when trying to identify award-winning scientists in their respective fields. The aim of the present study is to conduct a bibliometric analysis of the association between the themes 'behavioral finance' and 'financial and managerial decision making' and the cognitive biases 'overconfidence', 'anchoring effect' and 'confirmation bias'. The search for articles was performed in the Web of Science database using EndNote(R) as reference management software and CiteSpace (Chen in Proc Natl Acad Sci 101(suppl 1):5303-5310, 2004; J Am Soc Inf Sci Technol 57(3):359-377, 2006) as bibliometric analysis software. The search yielded 889 articles published between 1990 and 2016, and the results show that the number of studies relating the overconfidence, anchoring and confirmation biases to behavioral finance has been growing over time, mainly from 2008 on. 
Moreover, the results confirmed the importance of Amos Tversky and Daniel Kahneman to this research field. The bias most closely associated with the behavioral finance field in the present study was overconfidence. Confirmation bias had the smallest number of publications and the weakest relation to this field, a fact that opens a promising research avenue. Author self-citation is a practice that has historically been surrounded by controversy. Although the prevalence of self-citations in different scientific fields has been thoroughly analysed, there is a lack of large-scale quantitative research focusing on its usefulness in guiding readers to new relevant scientific knowledge. In this work we empirically address this issue. Using as our main corpus the entire set of PLOS journal research articles, we train a topic discovery model able to capture semantic dissimilarity between pairs of articles. By dividing the pairs of articles involved in intra-PLOS citations into self-citations (articles linked by a citation that share at least one author) and non-self-citations (articles linked by a citation that share no author), we observe the distribution of semantic dissimilarity between citing and cited papers in both groups. We find that the typical semantic distance between articles involved in self-citations is significantly smaller than that observed for articles involved in non-self-citations. Additionally, we find that our results are not driven by the fact that authors tend to specialize in particular areas of research, use specific research methodologies or simply have particular styles of writing. Overall, taking shared content as an indicator of the relevance and pertinence of citations, our results indicate that self-citations are, in general, useful as a mechanism of knowledge diffusion. 
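The semantic-dissimilarity comparison described above can be sketched with topic vectors: each article is represented as a mixture of topic weights, and dissimilarity is one minus their cosine similarity. This is a hypothetical illustration; the study's actual topic model and distance measure are not reproduced here, and the vectors below are invented.

```python
import math

def cosine_distance(u, v):
    """Semantic dissimilarity between two articles represented as
    topic-weight vectors (1 - cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

# hypothetical topic mixtures for a citing and two cited articles
citing = [0.7, 0.2, 0.1]
cited_self = [0.6, 0.3, 0.1]    # self-citation: similar topics
cited_other = [0.1, 0.1, 0.8]   # non-self-citation: distant topics
```

With these invented mixtures the self-cited article lies much closer to the citing one, mirroring the paper's finding that self-citations tend to link semantically closer work.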
While "publish or perish" has been an integral part of academic research in Western countries for several decades, the phenomenon has made its way to Central and Eastern Europe (CEE) only recently. The current paper shows how publishing criteria in the field of economics and business have developed in seven CEE countries since 2000 and how economists have responded by altering their publishing behavior. The research indicates a dichotomous development: on one hand the annual number of Web of Science publications has increased by 317% between 2000 and 2015, economists distribute their works across a wider range of journals than before, they are more cited and the weighted average of impact factors of all journals where they publish has risen by 228%. On the other hand, however, a number of economists have chosen an opposite strategy and publish mostly in local or "predatory" journals. Recommendations for policy makers are provided on how to maximize the benefits and minimize negative impacts of the publishing criteria. Collaboration is one of the key features in scientific research. With more and more knowledge accumulated in each discipline, individual researcher can only be an expert in some specific areas. As such, there are more and more multi-author papers nowadays. Many works in scientometrics have been devoted to analyze the structure of collaboration networks. However, how the collaboration impacts an author's future career is much less studied in the literature. In this paper, we provide empirical evidence with American Physical Society data showing that collaboration with outstanding scientists (measured by their total citation) will significantly improve young researchers' career. Interestingly, this effect is strongly nonlinear and subject to a power function with an exponent < 1. Our work is also meaningful from practical point of view as it could be applied to identifying the potential young researchers. 
Author name disambiguation is an important problem that needs to be resolved in bibliometric analysis or tech mining. Many techniques have been presented; however, most of them require a long run time or additional information. A new method based on semantic fingerprints was presented to disambiguate author names without external data. A manually annotated dataset was built to test the efficiency of the presented method. Experiments using co-author features, institution features, and text fingerprints were conducted respectively. We found that the first two methods had higher precision, but their recall was low, while the text fingerprint method had higher recall and satisfactory precision. Based on these results, we integrated co-author features, institution features, and text fingerprints to provide semantic fingerprints for disambiguating author names, achieving better performance on the F-measure. In earlier studies (e.g. Glanzel and Thijs in Scientometrics, 2017) we have used components of text analysis in combination with link-based techniques to cluster document spaces and to detect emerging research topics on a large scale. Taking up the objectives of evaluative scientometrics, we now attempt to link the textual analysis of small sets of individual scientific papers to evaluative bibliometrics. The objective is, however, quite similar. We focus on the detection of similarities and on monitoring structural changes, but this time on a small scale. We proceed from earlier approaches used in quantitative linguistics applied to bibliometrics (Telcs et al. in Math Soc Sci 10(2):169-178, 1985). In the present pilot study we have selected 18 papers by Andras Schubert, published in three different periods with 6 papers each: 1983-1985, 1993-1998 and 2010-2013. 
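Precision, recall and F-measure for a disambiguation result, as reported in the study above, are often computed pairwise over the clustering; the sketch below uses that common convention (the study's exact evaluation protocol is not shown here, and the toy labels are invented).

```python
from itertools import combinations

def pairwise_prf(true_labels, pred_labels):
    """Pairwise precision/recall/F1 for a clustering, a standard way
    to score author-name disambiguation: a pair of records counts as
    correct if both the truth and the prediction put it together."""
    idx = range(len(true_labels))
    true_pairs = {p for p in combinations(idx, 2)
                  if true_labels[p[0]] == true_labels[p[1]]}
    pred_pairs = {p for p in combinations(idx, 2)
                  if pred_labels[p[0]] == pred_labels[p[1]]}
    tp = len(true_pairs & pred_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# toy example: 4 papers by two real authors; the predictor splits
# one author's papers into two clusters (high precision, low recall)
true = ["A", "A", "A", "B"]
pred = [0, 0, 1, 2]
p, r, f = pairwise_prf(true, pred)
```

This mirrors the trade-off reported in the abstract: over-splitting preserves precision while depressing recall, and the F-measure balances the two.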
The objective is twofold: we first try to detect linguistic regularities in scientometric text by applying a Waring model to the analysis of Schubert's vocabulary on the basis of all words and nouns. The second goal refers to the identification of changes in the vocabulary used over a period of three decades. The main findings are discussed along with future research tasks, which arise from these results in the context of the analysis of the dynamics and emergence of research topics at the micro and nano level. Profiling the technological strategy of different competitors is a key element for companies in a given industry, as well as for technology planners and R&D strategists. The analysis of the patent portfolio of a company, as well as its evolution over time, is of interest to technology analysts and decision makers. However, the need to involve experts in the company's field as well as patent specialists slows down the process. Bibliometric and text mining techniques can support the specialists' interpretation. The present paper offers a step-by-step procedure to analyze the technology strategy of several companies through the analysis of their portfolios' claims, combined with TechMining using a text mining tool. The procedure, complemented with a semantic TRIZ analysis, provides key insights into the technology strategies of some competitors in the field of probiotics for livestock health. The results show interesting shifts in the key probiotic and prebiotic ingredients for which companies claim protection and therefore offer clues about their technology intentions in the life sciences industry in a more dynamic, convenient and simple way. How to evaluate the value of a patent in technological innovation quantitatively and systematically remains a challenge for bibliometrics. 
Traditional indicator systems and weighting approaches mostly lead to "moderation" results; that is, patents ranked in a top list can have merely good-looking values on all indicators rather than distinctive performance on certain individual indicators. Focusing on patents granted by the United States Patent and Trademark Office (USPTO), this paper constructs an entropy-based indicator system to measure their potential in technological innovation. Shannon's entropy is introduced to quantitatively weight indicators, and a collaborative filtering technique is used to iteratively remove negative patents. What remains is a small set of positive patents with potential in technological innovation as the output. A case study with 28,509 USPTO-granted patents with Chinese assignees, covering the period from 1976 to 2014, demonstrates the feasibility and reliability of this method. Since the early 1960s, there has been a growing interest in the emergence and development of new technologies, accompanied by a strong wish from decision makers to govern related processes at the corporate and national levels. One of the key categories that appeared to set up analytical and regulatory frameworks was the 'advanced technology' category. Primarily associated with computer electronics and microelectronics, it soon acquired new meanings derived from a variety of professional discussions, primarily in the social sciences. Later, a new term, 'emerging technologies', appeared to highlight the speed of change in a wide range of promising research areas. This paper focuses on the evolution of academic discussions concerning advanced and emerging technologies in the social sciences literature for the period from 1955 until 2015. In order to identify whether studies in these areas constitute separate research fields, the paper studies the evolution of co-citation networks and the centrality characteristics of transitionary references. 
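Shannon-entropy weighting of patent indicators, as in the entropy-based indicator system above, follows a standard recipe: normalize each indicator column into shares, compute its entropy, and give less uniform (more discriminating) indicators larger weights. A minimal sketch with invented indicator values, not the paper's actual indicator set.

```python
import numpy as np

def entropy_weights(matrix):
    """Weight indicator columns by Shannon entropy.

    Rows are patents, columns are indicators. Columns whose values
    are spread unevenly across patents (low entropy) discriminate
    more between patents and receive larger weights.
    """
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    p = m / m.sum(axis=0)                # share of each patent per indicator
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    e = -plogp.sum(axis=0) / np.log(n)   # entropy normalised to [0, 1]
    d = 1.0 - e                          # degree of divergence
    return d / d.sum()                   # weights summing to 1

# hypothetical indicators for 4 patents: citations, claims, family size
m = [[120, 10, 3],
     [ 10, 12, 4],
     [  5,  9, 3],
     [  2, 11, 2]]
w = entropy_weights(m)
```

Here the highly skewed citation column earns the largest weight, which is exactly the anti-"moderation" behavior the abstract motivates.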
It was shown that social studies of emerging technologies demonstrate better consistency in their background literature. However, an analysis of transitionary references and their centrality characteristics can hardly confirm the existence of separate research fields in either case. The suggested method for identifying and tracking papers mediating ongoing discussions in a selected knowledge network may be helpful in understanding the evolution of weakly conceptualized and growing research areas. Author name disambiguation plays a very important role in individual-based bibliometric analysis and has suffered from a lack of information. Therefore, some researchers have successfully leveraged external web sources to obtain additional evidence. However, the main problem is generally the high cost of extracting data from web pages due to their diverse designs. Considering this challenge, we employed ResearchGate (RG), a social network platform for scholars presenting their publication lists in a structured way. Even though the platform might be imperfect, it can be valuable when used along with traditional approaches for confirmation. To this end, in our first (retrieval) stage we applied a graph-based machine learning approach, connected components (CC), and formed clusters. Then, the data crawled from RG for the same authors were combined with the CC results in stage 2. We observed that 76.40% of the clusters formed by CC were confirmed by the RG data, and they accounted for 68.33% of all citations. Second, a subset was drawn from the dataset by retaining those clusters having at least 10 members to examine the details. This time we additionally employed the Google Custom Search Engine (CSE) API to access authors' web pages as a complementary tool to RG. We observed an F score of 0.95 when CC results were confirmed by RG&CSE. Almost the same success was observed when only the CC approach was applied. 
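The connected-components (CC) retrieval stage above can be sketched as a plain graph traversal: publication records are nodes, an edge links two records that share disambiguation evidence (for example a co-author), and each connected component becomes one candidate author cluster. The records and evidence edges below are hypothetical; the study's real feature set is not reproduced here.

```python
from collections import defaultdict

def connected_components(edges, nodes):
    """Group publication records into clusters: each connected
    component of the evidence graph is one candidate author."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative depth-first traversal
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# hypothetical records by "J. Smith"; edges mark shared co-authors
records = ["p1", "p2", "p3", "p4", "p5"]
evidence = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = connected_components(evidence, records)
```

Clusters produced this way are then what the second stage confirms against external sources such as RG.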
In addition, we observed that the publications identified and confirmed through the external sources were cited to a greater extent than those publications not found in the related external sources. Though promising, the use of external sources still has issues. We have seen that many authors present only a few selected papers on the web. This hampers our procedure, making it unable to obtain the entire publication list. Missing publications affect bibliometric analysis adversely, since all citation data is required; that is, if only the data confirmed via external sources is used, bibliometric indicators will be overestimated. On the other hand, our suggested methodology can potentially decrease the manual work required for individual-based bibliometric analysis. The procedure may also produce more reliable results by confirming cluster members derived from unsupervised grouping methods. This approach might be especially beneficial for large datasets where extensive manual work would otherwise be required. Using a management formula to standardize innovation management can be thought of as deeply contradictory; however, several successful firms in Spain have been certified under the pioneering innovation management standard UNE 166002. This paper analyzes the effects that standardization has on attitudes and values regarding innovation for a sample of firms by text-mining their corporate disclosures. Changes in the relevance of concepts, co-word networks and emotion analysis have been employed to conclude that the effects of certification on corporate behavior regarding innovation coincide with the open innovation and transversalization concepts that UNE 166002 promotes. Databases on scientific publications are a well-known source for complex network analysis. 
The present work focuses on tracking the evolution of collaboration amongst researchers on leishmaniasis, a neglected disease associated with poverty and very common in Brazil, India and many other countries in Latin America, Asia and Africa. Using the SCOPUS and PubMed databases, we identified clusters of publications resulting from research areas and collaboration between countries. Based on the collaboration patterns, areas of research and their evolution over the past 35 years, we combined different methods in order to understand evolution in science. The methods combined descriptive network analysis with lexical analysis of publications, and the collaboration patterns represented by links in the network structure. As attributes, the methods used the country of the authors' publications, MeSH terms, and the collaboration patterns in seven five-year snapshots of collaboration and publication networks. The results show that network analysis metrics can provide evidence of the evolution of collaboration between different research groups within a specific research area and that those areas have subnetworks that influence collaboration structures and focus. Interdisciplinary research has been a focus in academia, and it is beneficial to understand the properties and structure of interdisciplinary research from the viewpoint of bibliometrics. This paper detects distinctions between publication categories and citation categories to measure the interdisciplinarity of individual publications, and then measures the interdisciplinarity of a research system by the average interdisciplinarity of the individual publications taken as elements in that system. The average of all the publications' integration scores and diffusion scores, and the standard deviation (SD), which reflects the variance of the elements' interdisciplinarity in the research system, were then calculated. 
Sixty of the most productive authors from three Web of Science categories (Mathematics, Applied; Computer Science, Artificial Intelligence; and Operations Research and Management Science) were selected as a case to validate our approach. The results showed that measuring the interdisciplinarity of individual elements effectively lessened the impacts caused by some elements with distinctive citation categories on the research system's interdisciplinarity (especially for research systems with large SDs). Furthermore, measuring the distinction between publication categories and citation categories is essential for individual publications' interdisciplinarity when the citation categories do not appear in the categories of the publication itself or the publication has only a single citation category. With rapid advances and diversification in new fields of science and technology, new journals are emerging as venues for the exchange of research methods and findings in these burgeoning communities. These new journals are large in number and, in their early years, it is unclear how central they will be in the fields of science and technology. On one hand, these new journals offer valuable data sources for bibliometric scholars to understand and analyze emerging fields; on the other hand, how to identify important peer-reviewed journals remains a challenge, and one that is essential for funders, key opinion leaders, and evaluators to overcome. To fulfill growing demand, the Web of Science platform, as the world's most trusted research publication and citation index, launched the Emerging Sources Citation Index (ESCI) in November 2015 to extend the universe of journals already included in the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts & Humanities Citation Index. 
This paper profiles ESCI, drawing some comparisons against these three established indexes in terms of two questions: (1) Does ESCI cover more regional journals of significant importance and provide a more balanced distribution of journals? (2) Does ESCI offer earlier visibility of emerging fields and trends through upgraded science overlay maps? The results show that ESCI has a positive effect on research assessment and accelerates communication in the scientific community. However, ESCI does little to improve the marginal position of non-English-speaking countries and regions. In addition, medical science, education research, the social sciences, and the humanities are emerging fields in recent research, reflected by the lower proportion of traditional fundamental disciplines and applied science journals included in ESCI. Furthermore, balancing the selection of journals across different research domains to facilitate cross-disciplinary research still needs further effort. Rapid changes in Science & Technology (S&T), along with breakthroughs in products and services, concern a great number of policy and strategy makers and lead to an ever-increasing amount of Foresight and other types of forward-looking work. At the outset, the purpose of these efforts is to investigate emerging S&T areas, set priorities and inform policies and strategies. However, there is still no clear evidence on the mutual linkage between science and strategy that may be attributed to Foresight and S&T policy making activities. The present paper attempts to test the hypothesis that science and strategy affect each other and that this linkage can be investigated quantitatively. The evidence for the mutual attribution of science and strategy is built on a quantitative trend monitoring process drawing on semantic analysis of large amounts of textual data and text mining tools. 
Based on the proposed methodology, the similarities between science and strategy documents, along with the overlaps between them across a certain period of time, are calculated using the case of the Agriculture and Food sector, and thus the linkages between science and strategy are investigated. This study advances a four-part indicator of technical emergence. In doing so it focuses on a particular class of emergent concepts: those which display the ability to repeatedly maintain an emergent status over multiple time periods. The authors refer to this quality as staying power and argue that concepts which maintain this ability deserve greater attention. The case study considered consists of 15 sub-datasets within the dye-sensitized solar cell framework. In this study the authors consider the impact that technical domain and scale have on the behavior of persistently emergent concepts and test which of these has the greater influence. This paper uses a theoretical-conceptual applied research framework to describe and analyze scientific production in Brazil addressing the topic of the solidarity economy between 2000 and 2014. In order to achieve the objectives of the study, a search was performed in the Scielo database search engine to gather data and select articles that use the phrase "Solidarity Economy". A bibliometric analysis was then carried out with the purpose of assessing the publications' performance over the fifteen-year period. The aspects considered were the authors, language, journals with the greatest number of articles published on the topic, the most frequently used keywords, and the most cited papers. The results of the analysis showed that research addressing the topic is still scarce in Brazil, especially in the field of Production Engineering. 
This paper highlights the need for future studies of the solidarity economy and, moreover, argues that research in the area should be guided by a structured portfolio selection process using the most relevant works in the field. This paper proposes a statistical analysis that captures similarities and differences between classical music composers, with the eventual aim of understanding why particular composers 'sound' different even if their 'lineages' (influence networks) are similar, or why they 'sound' alike if their 'lineages' are different. To do this we use statistical methods and measures of association or similarity (based on the presence/absence of traits such as specific 'ecological' characteristics and personal musical influences) that have been developed in biosystematics, scientometrics, and bibliographic coupling. This paper also represents a first step towards the more ambitious goal of developing an evolutionary model of Western classical music. Research funding is a significant support for the development of scientific research. The inequality of research funding is an intrinsic feature of science, and policy makers have become aware of the over-concentration of funding allocation. Previous studies have tried to use the Gini coefficient to measure this inequality; however, the phenomena of multiple funding sources and funding subdivision have not been deeply discussed and empirically studied, due to limitations on data availability. This paper provides a more accurate analysis of the distributional inequality of research funding, considering all of the funding sources in the funding system and the subdivision of funding to junior researchers within research teams. We aim to determine the influence of these two aspects on the Gini results at the individual level. A dataset with 68,697 project records and 80,380 subproject records from the Chinese Academy of Sciences during the period from 2011 to 2015 is collected to examine the problem. 
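The Gini coefficient used in the funding study above can be computed directly from individual funding amounts with the standard sorted-sum formula; the amounts below are invented for illustration.

```python
def gini(amounts):
    """Gini coefficient of a funding distribution (0 = perfect
    equality, values near 1 = extreme concentration), computed via
    the sorted-sum form of the mean-absolute-difference formula."""
    xs = sorted(amounts)
    n = len(xs)
    total = sum(xs)
    # sum over sorted values of (2i - n + 1) * x_i, 0-indexed
    cum = sum((2 * i - n + 1) * x for i, x in enumerate(xs))
    return cum / (n * total)

# hypothetical researcher-level funding amounts
equal = gini([100, 100, 100, 100])   # 0.0: everyone funded equally
skewed = gini([5, 10, 25, 400])      # most funding held by one person
```

Recomputing the coefficient once with each researcher's pooled amounts from all sources, and once after splitting team grants across subproject members, is the kind of comparison the study makes.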
The empirical results show that (1) the Gini coefficient for a single funding source is biased and may be overestimated or underestimated, and the most common data source, the National Natural Science Foundation of China (NSFC), causes the Gini coefficient to be underestimated; and (2) considering the subdivision of research funding lowers the measured inequality of research funding, with a smaller Gini coefficient, although the decrease is moderate. In this study, three chemistry research themes closely associated with the Nobel Prize are bibliometrically analyzed (Ribozyme, Ozone and Fullerene), as well as a research theme in chemistry not associated with the Nobel Prize (a Nobel snub theme): the Brunauer-Emmett-Teller equation. We analyze, based on an algorithmically constructed publication-level classification system, the evolution of the four themes with respect to publication volume and international collaboration, using two datasets, one of them a subset of highly cited publications, for each considered time period. The focus of the study is on international collaboration, where the co-occurrence of country names in publications is used as a proxy for international collaboration. For all four themes, especially for the Brunauer-Emmett-Teller equation, the publication volumes increase considerably from the earliest period to the later periods. The international collaboration rate shows an increasing trend for each theme. For Ozone, Fullerene and the Brunauer-Emmett-Teller equation, the international collaboration rate tends to be higher for the highly cited publications compared to the full datasets. With regard to the evolution of the number of countries per international publication and per highly cited international publication, the vast majority of the distributions are positively skewed, with a large share of publications with two countries. 
With respect to the last four periods of the study, the concentration on two countries per publication is more pronounced for the Brunauer-Emmett-Teller equation theme than for the three Nobel Prize themes. Big Data is a research field involving a large number of collaborating disciplines. Based on bibliometric data downloaded from the Web of Science, this study applies various social network analysis and visualization tools to examine the structure and patterns of interdisciplinary collaborations, as well as the recently evolving overall pattern. This study presents descriptive statistics of the disciplines involved in publishing Big Data research, and network indicators of the interdisciplinary collaborations among disciplines, interdisciplinary communities, interdisciplinary networks, and changes in discipline communities over time. The findings indicate that the scope of disciplines involved in Big Data research is broad, but that the disciplinary distribution is unbalanced. The overall collaboration among disciplines tends to be concentrated in several key fields. According to the network indicators, Computer Science, Engineering, and Business and Economics are the most important contributors to Big Data research, given their position and role in the research collaboration network. Centering on a few important disciplines, all fields related to Big Data research are aggregated into communities, suggesting related research areas and directions for Big Data research. An ever-changing roster of related disciplines provides support, as illustrated by the evolving graph of communities. Over the previous three decades, the prevalence of cardiovascular diseases (CVD) in Saudi Arabia has increased, and the government has invested significantly in education, healthcare, and research. This study examined the research productivity trends and characterized the types and focus of all CVD research studies from Saudi Arabia. 
Data were extracted from studies published up until December 2015 and indexed in the PubMed database. Study eligibility criteria included: (1) sample selected within Saudi Arabia, and (2) CVD or a risk factor for CVD as an outcome, or (3) patients with CVD as study participants. Bibliometric data and study characteristics were extracted from each study; examples include authorship (number, gender, affiliation), journal, publication year, study location, research design, sample size, sample type (general or patient), sample composition (male or female), and sampling strategy (random or non-random). The analysis included 295 studies that pertained to 19 types of CVD; the most common were coronary artery disease (18%), hypertension (16%), stroke (14%), peripheral artery disease (11%), and congenital heart disease (10%). In the past 30 years, the overall productivity, use of hypothesis-testing designs (i.e. case-control, cohort, or trial), international collaborations, and funding increased incrementally. Experimental designs constituted only 3% of all studies and less than 10% of the hypothesis-testing design studies. The scientific literature from Saudi Arabia addressed many of the CVD types. However, there have been very few experimental studies conducted to date. Funding agencies should consider funding more studies with a hypothesis-testing design. The aim of this paper is to explore the power-law relationship between the publishing size of complex innovation systems and their citation-based impact. We analyzed articles and reviews from the InCites (TM) database published by six complex innovation systems. We found scale-invariant properties in the complex innovation systems analyzed. These properties are evidenced in the power-law correlation between complex innovation systems' citation-based impact and their size, with a scaling exponent α ≈ 1.19 ± 0.01. 
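Under a scale-invariance relation of the form C ∝ S^α, the predicted impact ratio for any change in system size follows immediately from the exponent; a one-line sketch using the α ≈ 1.19 reported above (the function name is illustrative).

```python
def impact_ratio(size_factor, alpha=1.19):
    """Predicted citation-impact ratio when a system's publishing
    size is multiplied by `size_factor`, assuming C ~ S**alpha."""
    return size_factor ** alpha

doubling = impact_ratio(2.0)  # about 2.28 for alpha = 1.19
```

An exponent above 1 means impact grows faster than size, so larger systems are superlinearly rewarded under this fit.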
The results suggest that citations to a complex innovation system tend to increase by a factor of 2^1.19 ≈ 2.28 when the system doubles its size over time. These scale-invariant emergent properties are a common feature of complex innovation systems and can be quantified using the parameters of the scale-invariant correlation. These parameters can be used to formulate measures and models useful for informing public policy about the scale-invariant emergent properties of a complex innovation system, making comparisons of citation impact between complex systems of vastly different sizes, evaluating the citation impact of complex innovation systems according to their size, long-range planning, and elaborating rankings across complex innovation systems. Although research collaboration has been studied extensively, we still lack understanding of the factors stimulating researchers to collaborate with different kinds of research partners, including members of the same research center or group, researchers from the same organization, researchers from other academic and non-academic organizations, and international partners. Here, we provide an explanation of the emergence of diverse collaborative ties. The theoretical framework used for understanding research collaboration couples the scientific and technical human capital embodied in the individual with the social organization and cognitive characteristics of the research field. We analyze survey data collected from Slovenian scientists in four scientific disciplines: mathematics, physics, biotechnology, and sociology. The results show that while individual characteristics and resources are among the strongest predictors of collaboration, very different mechanisms underlie collaboration with different kinds of partners. International collaboration is particularly important for researchers in small national science systems. 
Collaboration with colleagues from various domestic organizations presents a vehicle for resource mobilization. Within organizations, collaboration reflects the elaborate division of labor in laboratories and the high level of competition between research groups. These results hold practical implications for policymakers interested in promoting quality research. This paper proposes a model that can measure the R&D efficiency of each region (DMU) or each production unit while taking inter-DMU competition and inter-subprocess competition into account. The game cross-efficiency concept is introduced into the parallel DEA model. Furthermore, each DMU (subprocess) tries to maximize its own efficiency without harming the cross-efficiency of any of the other DMUs (subprocesses). We develop an algorithm to obtain the best game cross-efficiency scores; these scores have been proved to converge to a Nash equilibrium point. We use the proposed model to measure the R&D efficiency of the 30 provinces of China. The results show that the algorithm converges to a unique cross-efficiency and that our model indeed takes the bargaining power of DMUs and subprocesses into account. Assessing the quality of scientific outputs (i.e. research papers, books and reports) is a challenging issue. In practice, the basic quality of scientific outputs is evaluated by committees or peers (peer review) with general knowledge and competencies. However, their assessment might not comprehensively consider the different dimensions of the quality of scientific outputs. Hence, there is a need to evaluate scientific outputs with additional metrics that cover more aspects of quality after publication, which is the aim of this study. To reach this aim, first, different quality metrics are identified through an extensive literature review. Then a recently developed multi-criteria methodology (the best worst method) is used to find the importance of each quality metric. 
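Once metric weights have been elicited (with the best worst method or otherwise), scoring a paper reduces to a weighted sum of its normalized metric values. A hypothetical sketch; the metric names, weights and values are invented, and the BWM weight-elicitation optimization itself is not reproduced here.

```python
def quality_score(values, weights):
    """Weighted quality of one paper: metric values (normalised to
    [0, 1]) aggregated with the weights elicited for each metric."""
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to 1
    return sum(v * w for v, w in zip(values, weights))

# hypothetical metrics: citations, journal percentile, altmetric reach
weights = [0.5, 0.3, 0.2]
paper = [0.8, 0.9, 0.4]
score = quality_score(paper, weights)  # close to 0.75
```

Ranking a faculty's papers by such scores is the aggregation step the study describes; the substance lies in how the weights are derived.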
Finally, based on the importance of each quality metric and data collected from Scopus, the quality of research papers published by the members of a university faculty is measured. The proposed model provides the opportunity to measure the quality of research papers by considering not only different aspects of quality but also the importance of each quality metric. The proposed model can be used for assessing other scientific outputs as well. This paper contributes to the literature on the factors explaining the regional university production of science and its quality in the field of Food Science and Technology (FS&T). We hypothesize that the regional quantity of science generated by universities is shaped not only by the amount of research and development (R&D) funds, as the mainstream literature suggests, but also by the demand for science at the regional level. Furthermore, given the evolutionary nature of knowledge production, we suggest that the number of publications has a significant effect on the quality of scientific research at the regional level. Drawing on a sample of 48,207 scientific papers in FS&T over the period 1998-2010, we first map and examine the regional distribution of science and its quality across Europe-15. Second, we address our hypotheses by specifying several econometric models to identify the factors affecting the quantity and quality of scientific production. Our results show that the regional demand for FS&T, captured by regional employment in the food and beverage industry, matters for the generation of science. Additionally, our findings support the hypothesis of a positive and significant effect of the production of papers on scientific quality at the regional level. ResearchGate is increasingly used by scholars to upload the full-text of their articles and make them freely available for everyone. 
This study aims to investigate the extent to which ResearchGate members, as authors of journal articles, comply with publishers' copyright policies when they self-archive the full-text of their articles on ResearchGate. A random sample of 500 English journal articles available as full-text on ResearchGate was investigated. 108 articles (21.6%) were open access (OA), published in OA journals or hybrid journals. Of the remaining 392 articles, 61 (15.6%) were preprints, 24 (6.1%) were post-prints and 307 (78.3%) were published (publisher) PDFs. The key finding was that 201 (51.3%) of the 392 non-OA articles infringed copyright and were noncompliant with publishers' policies. While 88.3% of journals allowed some form of self-archiving (SHERPA/RoMEO green, blue or yellow journals), the majority of non-compliant cases (97.5%) occurred when authors self-archived publishers' PDF files (the final published version). This indicates that authors infringe copyright most of the time not because they are not allowed to self-archive, but because they use the wrong version, which might imply a lack of understanding of copyright policies and/or the complexity and diversity of those policies. Although 'knowledge transfer' emerged as a separate field of study at least three decades ago, its academic literature remains rather fragmented. To reduce this complexity, several journals' special issues have attempted to frame the literature in both qualitative and quantitative terms. Although these reviews help bring some order to a flourishing literature, the theoretical background of the knowledge transfer field of study still needs clarification. Who are its foremost scholars? How do they gather in visible or invisible colleges? How far have the scientific communities of this domain evolved over time? Has the knowledge transfer topic gained the status of an independent scientific domain? 
This article aims at shedding light on the knowledge transfer domain by mapping the invisible colleges on which the discipline is based. Drawing evidence from a network analysis of the backward citations of the second generation of knowledge transfer studies, the authors point out that although the entire scientific domain has reached a strongly connected international dimension, it still manifests a persistent fragmentation. The paradoxical presence of a popular scientific domain without a proper independent theoretical body is consequently underlined. Today, governments tackle the science of science policy with quantitative analysis, especially in the USA. This trend promotes the use of objective data by decision-makers in the fields of innovation policy and technology management. These decision-makers seek more reliable research and development at an earlier stage to contribute to the national economy. Under this worldwide trend, there is a high demand for quantitative analysis requiring complex IT skills from governmental officers and non-IT researchers. The purpose of this research is to extract the destinations of currently growing fields of science by tracking citations and to capture signs of state-of-the-art studies. The analysis data are bibliographic data of academic papers retrieved from Web of Science. I provide results for 16 growing topics as of July 2016. Among these topics, I deepen the understanding of the "convolutional neural network" and identify future application candidates. I provide a methodology to discover future seeds of research and development. Additionally, the analysis system used in this research automatically publishes its results and keeps them continuously available. This research will contribute to improving strategies around research and development. 
Following the dynamism in spin-off research, in this study we conduct a structural and longitudinal bibliometric analysis of a sample of 812 articles on spin-offs published in 234 journals included in the ISI Web of Knowledge over a period of three decades. The analyses do not seek to establish a new conceptualization but rather to reveal the intellectual structure of the field, how it has evolved, and the profile of the knowledge network established in three perspectives: corporate, academic and entrepreneurial spin-offs. The diversity involved in the three streams of spin-off research signals substantial differences. Theoretically, transaction costs, agency and the resource-based view have remained a foundation of spin-off research, although research has been driven more by the phenomena than by theory development. The more traditional focus on corporate spin-offs was followed by an emphasis on academic spin-offs and more recently on entrepreneurial spin-offs. This shift has been accompanied by a more business/management theoretical orientation, replacing the more financial and taxation-based perspective underlying corporate spin-offs. This study systematizes the existing stock of knowledge and raises avenues for additional inquiry. We present a bibliometric comparison of publication performance in 226 scientific disciplines in the Web of Science (WoS) for six post-communist EU member states relative to six EU-15 countries of comparable size. We compare not only overall country-level publication counts, but also high-quality publication output, where publication quality is inferred from the journal Article Influence Scores. As of 2010-2014, post-communist countries are still lagging far behind their EU counterparts, with the exception of a few scientific disciplines, mainly in Slovenia. Moreover, research in post-communist countries tends to focus relatively more on quantity than on quality. 
The relative publication performance of post-communist countries in the WoS is strongest in the natural sciences and engineering. Future research is needed to reveal the underlying causes of these performance differences, which may include funding and productivity gaps, the historical legacy of communist ideology, and Web of Science coverage differences. Scientists may encounter many collaborators of different academic ages throughout their careers. Thus, they are required to make essential decisions to commence or end a creative partnership. This process can be influenced by strategic motivations, because young scholars are pursuers while senior scholars are normally attractors of new collaborative opportunities. While previous works have mainly focused on cross-sectional collaboration patterns, this work investigates scientific collaboration networks from scholars' local perspectives based on their academic ages. We aim to harness the power of big scholarly data to investigate scholars' academic-age-aware collaboration patterns. From more than 621,493 scholars and 2,646,941 collaboration records in Physics and Computer Science, we discover several interesting academic-age-aware behaviors. First, in a given time period, the academic age distribution follows a long-tail distribution, where more than 80% of scholars are of young age. Second, as academic age increases, the degree centrality of scholars rises accordingly, which means that senior scholars tend to have more collaborators. Third, based on the collaboration frequency and distribution between scholars of different academic ages, we observe an obvious homophily phenomenon in scientific collaborations. Fourth, scientific collaboration triads mostly consist of beginning scholars. Furthermore, the differences in collaboration patterns between these two fields in terms of academic age are discussed. 
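The degree-centrality-by-academic-age analysis described above can be sketched with a toy computation. The scholars, first-publication years and collaboration records below are hypothetical stand-ins for the Physics and Computer Science data:

```python
from collections import defaultdict

# Hypothetical toy data: first-publication year per scholar, and
# collaboration records as (scholar_a, scholar_b, year) triples.
first_pub = {"A": 2000, "B": 2010, "C": 1995, "D": 2012}
collabs = [("A", "B", 2014), ("A", "C", 2014), ("B", "D", 2014), ("C", "D", 2014)]

def degree_by_academic_age(first_pub, collabs, year):
    """Average number of distinct collaborators (degree centrality in the
    collaboration network), grouped by academic age, i.e. years since
    first publication at the observation year."""
    partners = defaultdict(set)
    for a, b, y in collabs:
        if y <= year:
            partners[a].add(b)
            partners[b].add(a)
    by_age = defaultdict(list)
    for scholar, ps in partners.items():
        by_age[year - first_pub[scholar]].append(len(ps))
    return {age: sum(v) / len(v) for age, v in sorted(by_age.items())}

print(degree_by_academic_age(first_pub, collabs, year=2014))
```

On real data, the abstract's second finding would appear as averages that rise with the age key.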
In this study we examined who tweeted academic articles that had at least one Finnish author or co-author affiliation and that had high altmetric counts on Twitter. In this investigation of national-level altmetrics we chose the most tweeted scientific articles from four broad areas of science (Agricultural, Engineering and Technological Sciences; Medical and Health Sciences; Natural Sciences; Social Sciences and Humanities). By utilizing both quantitative and qualitative methods of analysis, we studied the data using research techniques such as keyword categorization, co-word analysis and content analysis of user profile descriptions. Our results show that, contrary to a random sample of Twitter users, users who tweet academic articles describe themselves more factually and by emphasizing their occupational expertise rather than personal interests. The more field-specific the articles were, the more research-related descriptions dominated the Twitter profile descriptions. We also found that scientific articles were tweeted to promote ideological views, especially in instances where the article represented a topic that divides general opinion. In this study we present a scientometric analysis of the Australian Conference on Human-Computer Interaction (OzCHI) proceedings over the period of a decade (2006-2015). Conference proceedings were manually extracted from the ACM Digital Library and analysed. We observed OzCHI to be a popular conference, attracting both submissions and citations. A group of leading researchers dominated the publication count, followed by a long list of mid-career academics. We observed the themes of Design, Health and Well-being, and Education to be growing in importance. We also observed that full papers were cited significantly more than short papers. We conclude with a reflection on our methodology and a proposal of recommendations for the HCI/OzCHI community in Australia. 
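Co-word analysis, one of the techniques named in the Twitter study above, amounts to counting how often keyword pairs co-occur across documents; the resulting pair counts form the edges of a co-word network. A minimal sketch with hypothetical profile keywords:

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets extracted from Twitter profile descriptions.
profiles = [
    {"researcher", "health", "data"},
    {"researcher", "data", "policy"},
    {"health", "policy"},
]

# Co-word analysis: count how often each keyword pair co-occurs.
cooccurrence = Counter()
for keywords in profiles:
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

for pair, count in cooccurrence.most_common(3):
    print(pair, count)
```

Sorting each keyword set before pairing ensures that ("data", "researcher") and ("researcher", "data") count as the same edge.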
A sleeping beauty in diffusion indicates that certain information, whether an idea or innovation, will experience a hibernation period before it undergoes a sudden spike of popularity, and this pattern is found widely in the citation history of scientific publications. However, in this study, we demonstrate that the sleeping beauty is an interesting but unexceptional phenomenon in information diffusion; more inspiring is that there exist two consecutive sleeping beauties in the entire lifetime of a meme's propagation, which suggests that the information, including scientific topics, search queries or Wikipedia entries, which we call memes, will go unnoticed for a period and suddenly attract some attention, and then it falls asleep again and later wakes up with another unexpected popularity peak. Further exploration of this phenomenon shows that the intervals between the two wake-ups follow an exponential distribution, both the rising and falling stage lengths follow power-law distributions, and the second wake-up tends to reach its peak in a shorter period of time. In addition, the total volumes of the two wake-ups are positively correlated. Taking these findings into consideration, an upgraded Bass model is presented that describes well the diffusion dynamics of memes on different media. Our results can help understand the common mechanism behind the propagation of different memes and are instructive for locating the tipping point in marketing or finding innovative publications in science. In this study, usage counts and times cited from the Web of Science Core Collection (WoS) were collected for articles published in 2013 with Belgian, Israeli and Iranian addresses. We investigated the relations among three indicators related to citation impact, usage counts and co-authorship, respectively. 
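The classic Bass model that the meme study upgrades describes cumulative adoption F via dF/dt = (p + qF)(1 - F), with innovation rate p and imitation rate q. A discrete-time simulation (parameter values are illustrative, not from the study) reproduces the single popularity spike that the two-wake-up variant generalizes:

```python
def bass_adoption(p=0.03, q=0.38, steps=30):
    """Discrete-time Bass diffusion: cumulative adoption F follows
    dF/dt = (p + q*F) * (1 - F), with innovation rate p and
    imitation rate q (illustrative values)."""
    F = 0.0
    curve = []
    for _ in range(steps):
        F += (p + q * F) * (1.0 - F)
        curve.append(F)
    return curve

curve = bass_adoption()
# New adoptions per step rise to a single peak and then decline --
# the one popularity spike that the upgraded model extends to two.
new_adoptions = [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]
peak_step = new_adoptions.index(max(new_adoptions))
print(peak_step, round(curve[-1], 3))
```

Because p + q < 1 here, each increment keeps F strictly below 1, so the cumulative curve is the familiar S-shape.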
In addition, we applied the method of Characteristic Scores and Scales (CSS) to analyse the distributions of citations and usage counts, to further test the relation between usage and citation impact. The results show that citations and usage counts in WoS correlate significantly, especially in the social sciences. However, higher numbers of co-authors are not associated with higher usage counts or citations. Furthermore, the stability of CSS-class distributions substantiates the applicability of CSS in characterising both usage and citation distributions. Distinctly different patterns in citations and usage are observed, but the similarities within citations and usage in these fields are somewhat unexpected. The massive open online course (Mooc) is an educational technology that involves both education and technological innovation. Past investigations of Mooc have used a very limited range of research methodologies for this interdisciplinary research field. Using social network analysis, bibliometrics, text mining and the idea of epidemic models, this work quantitatively measures, analyzes and compares Mooc research papers' concept complexity, knowledge ageing rate and Mooc diffusion patterns at international and country-specific levels. The Small Business Innovation Research (SBIR) program is the primary source of public funding in the United States for research by small firms on new technologies, and the National Institutes of Health (NIH) is a major contributor to that funding agenda. Although previous research has explored the determinants of research success for NIH SBIR projects, little is known about the determinants of project failure. This paper provides important new evidence on the characteristics of NIH SBIR projects that fail. Specifically, we find that firms that have a founder with a business background are less likely to have their funded projects fail. 
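Characteristic Scores and Scales, as applied in the usage/citation study above, partitions a distribution with iterated conditional means: b1 is the mean of all scores, b2 the mean of scores at or above b1, and so on. A sketch with illustrative citation counts (the class labels in the comment follow conventional CSS terminology, not this particular study):

```python
def css_thresholds(citations, k=3):
    """Characteristic Scores and Scales: b1 = mean of all scores,
    b2 = mean of scores >= b1, b3 = mean of scores >= b2, ..."""
    thresholds = []
    pool = list(citations)
    for _ in range(k):
        if not pool:
            break
        mean = sum(pool) / len(pool)
        thresholds.append(mean)
        pool = [c for c in pool if c >= mean]
    return thresholds

cites = [0, 0, 1, 2, 3, 5, 8, 13, 40, 120]
b = css_thresholds(cites)
print([round(x, 1) for x in b])
# Conventionally: papers below b[0] are "poorly cited", between b[0]
# and b[1] "fairly cited", between b[1] and b[2] "remarkably cited",
# and above b[2] "outstandingly cited".
```

Because the thresholds adapt to the data, the resulting class shares can be compared across fields with very different citation levels, which is why the study uses CSS for both usage and citation distributions.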
We also find, after controlling for the endogenous nature of woman-owned firms, that such firms are also less likely to fail. The academic social network site ResearchGate (RG) has its own indicator, the RG Score, for its members. The high-profile nature of the site means that the RG Score may be used for recruitment, promotion and other tasks for which researchers are evaluated. In response, this study investigates whether it is reasonable to employ the RG Score as evidence of scholarly reputation. For this, three different author samples were investigated. An outlier sample includes 104 authors with high values. A Nobel sample comprises 73 Nobel winners from Medicine and Physiology, Chemistry, Physics and Economics (from 1975 to 2015). A longitudinal sample includes weekly data on 4 authors with different RG Scores. The results suggest that high RG Scores are built primarily from activity related to asking and answering questions in the site. In particular, it seems impossible to get a high RG Score solely through publications. Within RG it is possible to distinguish between (passive) academics that interact little in the site and active platform users, who can get high RG Scores through engaging with others inside the site (questions, answers, social networks with influential researchers). Thus, RG Scores should not be mistaken for academic reputation indicators. Scientific research contributes to sustainable economic growth environments. Hence, policy-makers should understand how the different inputs, namely labor and capital, are related to a country's scientific output. This paper addresses this issue by estimating output elasticities for labor and capital using a panel of 31 countries over nine years. Due to the nature of scientific output, we also use spatial econometric models to take into account the spillover effects from the knowledge produced as well as from labor and capital. The results show that the capital elasticity is close to the labor elasticity. 
The results suggest a decreasing return to scale production of scientific output. The spatial model points to negative spillovers from capital expenditure and no spillovers from labor or the scientific output. Collaborations and citations within scientific research grow simultaneously and interact dynamically. Modelling the coevolution between them helps to study many phenomena that can be approached only through combining citation and coauthorship data. A geometric graph for the coevolution is proposed, the mechanism of which synthetically expresses the interactive impacts of authors and papers in a geometrical way. The model is validated against a dataset of papers published on PNAS during 2007-2015. The validation shows the ability to reproduce a range of features observed with citation and coauthorship data combined and separately. Particularly, in the empirical distribution of citations per author there exist two limits, in which the distribution appears as a generalized Poisson and a power-law respectively. Our model successfully reproduces the shape of the distribution, and provides an explanation for how the shape emerges via the decisions of authors. The model also captures the empirically positive correlations between the numbers of authors' papers, citations and collaborators. Citation-based indicators are often used to help evaluate the impact of published medical studies, even though the research has the ultimate goal of improving human wellbeing. One direct way of influencing health outcomes is by guiding physicians and other medical professionals about which drugs to prescribe. A high profile source of this guidance is the AHFS DI Essentials product of the American Society of Health-System Pharmacists, which gives systematic information for drug prescribers. AHFS DI Essentials documents, which are also indexed by Drugs.com, include references to academic studies and the referenced work is therefore helping patients by guiding drug prescribing. 
This article extracts AHFS DI Essentials documents from Drugs.com and assesses whether articles referenced in these information sheets have their value recognised by higher Scopus citation counts. A comparison of mean log-transformed citation counts between articles that are and are not referenced in AHFS DI Essentials shows that AHFS DI Essentials references are more highly cited than average for the publishing journal. This suggests that medical research influencing drug prescribing is more cited than average. Dynamic capabilities currently emerge as a vibrant field of study within the theoretical framework based on resources and strategic management. To this end, and as a complex field of study, we set out to conceptually map this approach. Hence, we carried out a bibliometric study with recourse to co-citations. For the multivariate analysis, we applied cluster analysis and factor analysis. Through the former, we conclude that dynamic capabilities concentrate into five approaches: Digital Capabilities, Knowledge Capabilities, Absorptive Capabilities, Strategic Capabilities and Resources. Factor analysis returns five factors, with two of them concentrated into the same approach: Resources and Capabilities. We also find that the Strategic Capabilities approach spans the remaining three factors and does not constitute a single factor. The understanding of emerging technologies and the analysis of their development pose a great challenge for decision makers, as being able to assess and forecast technological change enables them to make the most of it. There is a whole field of research focused on this area, called technology forecasting, in which bibliometrics plays an important role. Within that framework, this paper presents a forecasting approach focused on a specific field of technology forecasting: research activity related to an emerging technology. 
This approach is based on four research fields (bibliometrics, text mining, time series modelling and time series forecasting) and is structured in five interlinked steps that generate a continuous flow of information. The main milestone is the generation of time series that measure the level of research activity and can be used for forecasting. The usefulness of this approach is shown by applying it to an emerging technology: cloud computing. The results enable the technology to be structured into five main sub-technologies which are characterised through five time series. Time series analysis of the trends related to each sub-technology shows that Privacy and Security has been the most active sub-technology to date in this area and is expected to maintain its level of interest in the near future. Counts of Mendeley readers may give useful evidence about the impact of published research. Although previous studies have found significant positive correlations between counts of Mendeley readers and citation counts for journal articles, it is not known whether this is equally true for conference papers. To fill this gap, Mendeley readership data and Scopus citation counts were extracted for both journal articles and conference papers published in 2011 in four fields for which conferences are important: Computer Science Applications; Computer Software; Building and Construction Engineering; and Industrial and Manufacturing Engineering. Mendeley readership counts correlated moderately with citation counts for both journal articles and conference papers in Computer Science Applications and Computer Software. The correlations were much lower between Mendeley readers and citation counts for conference papers than for journal articles in Building and Construction Engineering and Industrial and Manufacturing Engineering. 
Hence, there seem to be disciplinary differences in the usefulness of Mendeley readership counts as impact indicators for conference papers, even between fields for which conferences are important. Allometric scaling can reflect underlying mechanisms, dynamics and structures in complex systems; examples include typical scaling laws in biology, ecology and urban development. In this work, we study allometric scaling in scientific fields. By performing an analysis of the outputs/inputs of various scientific fields, including the numbers of publications, citations, and references, with respect to the number of authors, we find that in all fields that we have studied thus far, including physics, mathematics and economics, there are allometric scaling laws relating the outputs/inputs and the sizes of scientific fields. Furthermore, the exponents of the scaling relations have remained quite stable over the years. We also find that the deviations of individual subfields from the overall scaling laws are good indicators for ranking subfields independently of their sizes. Collaboration among researchers is an essential component of the scientific process, playing a particularly important role in findings with significant impact. While extensive efforts have been devoted to quantifying and predicting scientific impact, the question of how credit is allocated to coauthors of publications with multiple authors within a complex evolving system remains a long-standing problem in scientometrics. In this paper, we propose a dynamic credit allocation algorithm that captures the coauthors' contribution to a publication as perceived by the scientific community, incorporating a reinforcement mechanism and a power-law temporal relaxation function. The citation data from American Physical Society are used to validate our method. 
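The allometric scaling laws described above take the form Y = c * N**beta, so the exponent beta is the slope of a log-log regression of output Y (publications, citations, references) on field size N (authors). A dependency-free sketch with synthetic, noise-free data:

```python
import math

# Synthetic field sizes (authors) and outputs (publications) following
# Y = 2 * N**1.2, i.e. a clean allometric scaling law.
sizes = [10, 50, 100, 500, 1000, 5000]
outputs = [2 * n ** 1.2 for n in sizes]

def scaling_exponent(x, y):
    """OLS slope of log(y) on log(x): the allometric exponent beta."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

beta = scaling_exponent(sizes, outputs)
print(round(beta, 3))  # recovers the exponent 1.2
```

On real data the fit is noisy, and the study's point is that residuals from this regression (deviations of individual subfields) can rank subfields independently of their sizes.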
We find that the proposed method can significantly outperform the state-of-the-art method in identifying the authors of Nobel-winning papers that are credited for the discovery, independent of their positions in the author list. Furthermore, the proposed methodology also allows us to determine the temporal evolution of credit between coauthors. Finally, the predictive power of our method can be further improved by incorporating the author-list prior appropriately. The quantitative evaluation of the Social Sciences and Humanities (SSH) and the investigation of the existing similarities between SSH and the Life and Hard Sciences (LHS) represent the forefront of scientometrics research. We analyse the scientific production of the universe of Italian academic scholars over the 10-year period 2002-2012, from a national database built by the Italian National Agency for the Evaluation of Universities and Research Institutes. We demonstrate that all Italian scholars of SSH and LHS are equal as far as their publishing habits are concerned: they share the same general law, which is a lognormal. At the same time, however, they are different, because we measured their scientific production with the different indicators required by Italian law; we eliminated the "silent" scholars and obtained different scaling values, a proxy of their productivity rates. Our findings may be useful to further develop indirect quali-quantitative comparative analyses across heterogeneous disciplines and, more broadly, to investigate the generative mechanisms behind the observed empirical regularities. Book reviews have been published in psychology journals since 1900, and possibly before then. Approximately 200 such reviews were published each year until the 1950s, and this number increased to nearly 600 before 1990. However, since then, the number of book reviews in psychology journals has reverted to the current rate of approximately 200 a year. 
Whether or not this can be attributed to the measurement of impact factors is a moot point. During the last decade, several scientometric and bibliometric indices were proposed to quantify the scientific impact of individuals. The h-index was a breakthrough in scientific evaluation, but it suffers from the big-hit problem: once a paper enters the h-core, further citations of h-core articles are not considered in the evaluation. To overcome this limitation of the h-index, the e-index was proposed, but it does not consider the core citation count; it considers only the excess citation count. To overcome this limitation, the EM-index is proposed in this article. The EM-index extends the h-index and e-index using the concept of the multidimensional h-index, and uses all citation counts of h-core articles at multiple levels to quantify the scientific impact of an individual. However, this index does not consider the citations of all publications. To overcome this limitation of the EM-index, a multidimensional extension called the EM'-index is also proposed. To validate the proposed indicators, an experimental analysis was conducted on the publication and citation counts of 82 scientists working in the scientometrics field. In this way, we obtain a more balanced and fine-grained approach to evaluating the scientific impact of an individual as well as to comparing the scientific impact of two different researchers/scientists. The paper asks to what extent the top 200 universities in scientific publishing in four research fields are the same universities that occupy the top 200 positions in global university rankings. In this article, the top 200 lists of universities in scientific publishing are compiled from universities' contribution rates in four fields: biological, physical, social and life sciences. Out of some 4000 academic institutions included in InCites, the paper identifies 437 top publishing universities in at least one field. 
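For concreteness, the two established indices that the EM-index abstract builds on can be computed in a few lines; the EM-index then iterates this kind of computation over the h-core, per the abstract. The sample citation counts are illustrative:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    cs = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cs, start=1):
        if c >= i:
            h = i
    return h

def e_index(citations):
    """e-index (Zhang 2009): e**2 is the excess citation mass of the
    h-core, i.e. the sum of h-core citations minus h**2."""
    cs = sorted(citations, reverse=True)
    h = h_index(citations)
    return (sum(cs[:h]) - h * h) ** 0.5

cites = [33, 30, 20, 15, 7, 6, 5, 4]
print(h_index(cites), round(e_index(cites), 2))
```

Here h = 6 (six papers have at least 6 citations), and the e-index captures the 33 + 30 + 20 + 15 + 7 + 6 - 36 = 75 excess citations that the h-index alone ignores, which is exactly the big-hit problem the abstract describes.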
The paper analyses the extent to which those universities are covered in six global university rankings' top 200 listings: the Academic Ranking of World Universities (ARWU), the Performance Ranking of Scientific Papers for World Universities (NTU), the QS World University Rankings (QS), the Times Higher Education World University Rankings (THE), the University Ranking by Academic Performance (URAP) and the US News and World Report Best Global University Rankings (USNWR). Out of these rankings, URAP, NTU and USNWR are more distinctive than ARWU, QS and THE in covering the 437 top publishing universities in their top 200 listings. Therefore, URAP, NTU and USNWR are utilized in analyzing top publishing universities' chances to be recognized. The paper identifies a total of 64 top publishing universities in at least three out of four fields recognized without exception in the top 200 lists of NTU, URAP and USNWR; of them, 24 reside in the USA, 21 in the EU and 19 in the rest of the world. In addition, the paper identifies other (373) top publishing universities variably recognized by the relevant rankings among the three (in terms of odds), depending on the fields in which the universities reach the top 200; odds range from 0.01 (biological sciences) to 12.00 (life, social and biological sciences). The region of residence of these 373 universities also varies, so that those reaching the top 200 in life sciences reside most frequently in the EU, those reaching the top 200 in social sciences in the USA, and those reaching the top 200 in physical or biological sciences in the rest of the world. It is inevitable that the publish-or-perish paradigm has implications for the quality of published research, because it leads to scientific output being evaluated based on quantity rather than quality. The pressure to continually publish results in the creation of predatory journals acting without quality peer review. 
Moreover, the citation records of papers do not reflect their scientific quality but merely amplify the impact of their quantity. The growth of sophisticated push-button technologies allows for easier preparation of publications while facilitating ready-to-publish data. Articles can thus be compiled merely by combining various measurements, usually without thought to their significance or to what purpose they may serve. Moreover, any deep-rooted theory that contravenes mainstream assumptions is not welcomed, because it challenges often long-established practice. The driving force behind the production of an ever-growing number of scientific papers is the need for authors to be recognised in order to be seriously considered when seeking financial support. Funding and fame are distributed to scientists according to their publication and citation scores. While the number of publications is clearly a quantitative criterion, much hope has been placed on citation analysis, which promised to serve as an adequate measure of genuine scientific value, i.e. of the quality of the scientific work. For nearly a decade, several national exercises have been implemented for assessing Italian research performance, from the viewpoint of universities and other research institutions. The penultimate one, the VQR 2004-2010, which adopted a hybrid evaluation approach based on bibliometric analysis and peer review, suffered heavy criticism at the national and international level. The architecture of the subsequent exercise, the VQR 2011-2014, still in progress, is partly similar to that of the previous one, except for a few presumed improvements. Nevertheless, this other exercise is suffering heavy criticism too. This paper presents a structured discussion of the VQR 2011-2014, collecting and organizing some critical arguments that have emerged so far, and developing them in detail. 
Some of the major vulnerabilities of the VQR 2011-2014 are: (1) the fact that evaluations cover a relatively small fraction of the scientific publications produced by the researchers involved in the evaluation, (2) the incorrect and anachronistic use of journal metrics (i.e., the ISI Impact Factor and similar ones) for assessing individual papers, and (3) conceptually misleading criteria for normalizing and aggregating the bibliometric indicators in use. (C) 2017 Elsevier Ltd. All rights reserved. This study collects the educational backgrounds of 14,310 full professors from the top 48 universities in the United States. The aim is to analyze the role of foreign education in the training of academics in the United States. The analysis has two parts. In the first part, we identify the countries where the professors obtained their education. We note some concentrations in the provision of undergraduate education. For example, Greece provides more undergraduate degrees to professors than the whole continents of South America or Africa. Moreover, we show that most of the foreign-educated professors obtained their undergraduate education in high-income countries. In the second part, we compute the ratio of foreign-educated professors by type of university and by the academic field in which they currently work. We show that the ratio of foreign-educated academics does not vary with public ownership of the university or the ranking of the university. However, the ratio of foreign-educated professors varies significantly among academic fields. (C) 2017 Elsevier Ltd. All rights reserved. In the context of research collaboration and co-authorship, we studied scholars' scientific achievements and success, based on their collection of shared publications. 
By means of a novel regression model, which exploits the two-mode structure of co-authorship, we translated paper scientific impact into author professional achievement, to simultaneously account for the effect of paper properties (access status, funding bodies, etc.) as well as author demographic and behavioral characteristics (gender, nationality) on academic success and impact. After a detailed analysis of the proposed statistical procedure, we illustrated our approach with an empirical analysis of a co-authorship network based on 1007 scientific articles. (C) 2017 Elsevier Ltd. All rights reserved. Since few universities can afford to be excellent in all subject areas, university administrators face the difficult decision of selecting areas for strategic investment. While the past decade has seen a proliferation of university ranking systems, several aspects in the design of most ranking systems make them inappropriate to benchmark performance in a way that supports formulation of effective institutional research strategy. To support strategic decision making, universities require research benchmarking data that is sufficiently fine-grained to show variation among specific research areas and identify focused areas of excellence; is objective and verifiable; and provides meaningful comparisons across the diversity of national higher education environments. This paper describes the Global Research Benchmarking System (GRBS) which satisfies these requirements by providing fine-grained objective data to internationally benchmark university research performance in over 250 areas of Science and Technology. We provide analyses of research performance at country and university levels, using the diversity of indicators in GRBS to examine distributions of research quality in countries and universities as well as to contrast university research performance from volume and quality perspectives. 
A comparison of the GRBS results with those of the three predominant ranking systems shows how GRBS is able to identify pockets of excellence within universities that are overlooked by the more traditional aggregate-level approaches. (C) 2017 Elsevier Ltd. All rights reserved. Research papers not only involve author collaboration networks but also relate to knowledge networks. Previous research claims that a paper's citations are related to the node attributes of its authors in collaboration networks. We further propose that a paper's citations can also be affected by the node attributes of its knowledge elements in knowledge networks. In this study, we develop a new method to construct the knowledge network using article keywords. Further, we explore the antecedents of paper citations from both the collaboration and knowledge network perspectives. Using wind energy paper data (16,351 records) collected from the WoS (Web of Science) and JCR (Journal Citation Reports) databases, we construct two distinct networks and empirically examine the hypothesized relationships between the node attributes of the two networks and a paper's citations, which fills a gap in prior studies and should inspire related work. We have the following key findings: in the collaboration network, the structural holes of authors have positive but non-significant effects on a paper's citations, while authors' centrality has an inverted-U effect on a paper's citations; in the knowledge network, the structural holes of knowledge elements are positively related to a paper's citations, and the centrality of knowledge elements has an inverted-U relationship with a paper's citations. (C) 2017 Elsevier Ltd. All rights reserved. The Scientific and Technological Research Council of Turkey (Tubitak) gives individual researchers subsidies for their publications. Researchers are free to use these publication subsidies as pocket money. 
The publication subsidy given to a researcher for an article is inversely proportional to the number of authors of the article. That is, a researcher who publishes an article receives X/N Turkish Lira (TL), where X is the subsidy amount assigned to the journal in which the article is published and N is the number of authors. In this paper, we use the 250 TL rule to see whether publication subsidies affect the behavior of researchers. The rule states that no subsidy is given to any of the authors of an article if X/N is smaller than 250 TL. We use this discontinuity to provide evidence that Turkish researchers limit their number of co-authors in order to receive publication subsidies. (C) 2017 Elsevier Ltd. All rights reserved. The paper addresses the issue of the transatlantic gap in research excellence between Europe and the USA by examining the performance of individual universities. It introduces a notion of leadership in research excellence by combining a subjective definition of excellence with an objective one. It applies this definition to a novel dataset disaggregated into 251 Subject Categories, covering the 2007-2010 period, based on Scopus data. The paper shows that European universities are able to show excellence in only a few disciplinary areas each, while US universities are able to excel across the board. It explains this difference in terms of institutional differences in the recruitment processes and governance of universities. It discusses the European model of distributed excellence in terms of the recent rise of input competition. (C) 2017 Published by Elsevier Ltd. The recently proposed Euclidean index offers a novel approach to measuring the citation impact of academic authors, in particular as an alternative to the h-index. We test whether the index provides new, robust information not covered by existing bibliometric indicators, and discuss the measurement scale and the degree of distinction between analytical units that the index offers. 
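As an illustration of the two indicators being compared, the following is a minimal Python sketch; the citation profile is invented, and the Euclidean index is taken to be the Euclidean norm of the author's citation vector, with the h-index computed for contrast.

```python
import math

def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    cs = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(cs, start=1) if c >= rank)

def euclidean_index(citations):
    """Euclidean norm of the citation vector (square root of the sum
    of squared per-paper citation counts)."""
    return math.sqrt(sum(c * c for c in citations))

profile = [30, 12, 7, 4, 1, 0]  # hypothetical per-paper citation counts
print(h_index(profile))          # -> 4
print(euclidean_index(profile))  # sqrt(1110), roughly 33.3
```

Note how the Euclidean index is dominated by the most-cited paper (30 citations contribute 900 of the 1110 squared total), whereas the h-index ignores everything above the threshold; this difference in sensitivity is one reason the two indicators can rank authors differently.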
We find that the Euclidean index does not outperform existing indicators in these respects and that the main application of the index would be solely for ranking, which is not seen as a recommended practice. (C) 2017 Elsevier Ltd. All rights reserved. Local altmetrics are currently an integral part of the altmetrics landscape. This paper aims to investigate the characteristics of microblog altmetrics from the Chinese microblog platform Weibo, to shed light on cultural differences and to draw attention to local altmetrics in developing countries. The analysis is based on 4.4 million records provided by Altmetric.com, collected from March 2014 to July 2015. It is found that Weibo users discuss global science more actively than several international altmetrics sources. Statistical results show strong evidence of the immediacy advantage of metrics based on Weibo, as well as Twitter and general altmetrics, over citations. Distributions of Weibo altmetrics at the article, source and discipline levels are highly skewed. Overall, compared with Twitter, Weibo altmetrics present similar distributions, with some minor variations. To better understand how and why Weibo users discuss global scientific articles, the top weiboed articles, sources and disciplines are identified and further explored. Our content analysis shows that the common motivation of scientific weibos is to disseminate or discuss articles because they are interesting, surprising, academically useful or practically useful. The conclusion of an article is the most frequently mentioned element in scientific weibos. In addition, unlike Twitter users, Weibo users have a preference for traditional prestigious journals. (C) 2017 Elsevier Ltd. All rights reserved. This study aims to identify variables and indicators that substantiate the development of rules focusing on the structural analysis of scientific articles. 
Variables and indicators for structural analysis are derived from hypotheses deduced from editorials in important scientific journals. To exemplify and test the indicators, a structural analysis was conducted of 108 scientific articles published in important journals in the field of Management. The hypotheses were mostly tested in accordance with the idea of estimation statistics. The approach developed for the structural analysis of the network of texts innovates by employing network analysis indicators (indegree and outdegree). For this purpose, a text matrix is employed, built by identifying and encoding cross-references between sections and subsections of each article under study. For the context in question, the field of Management, twelve rules were developed. The interpretations of the possible values of the indicators, expressed in the form of rules, serve as directives for less experienced scholars preparing their scientific articles, and as a source of information to support activities concerning the classification and analysis of scientific articles. (C) 2017 Elsevier Ltd. All rights reserved. Science is becoming increasingly interdisciplinary, giving rise to more diversity in areas of expertise. In such a complex environment, the participation of authors has become more specialized, hampering the task of evaluating authors according to their contributions. While some metrics have been adapted to account for the order (or rank) of authors in a paper, many journals now require a description of each author's specific roles in the publication. Surprisingly, the investigation of the relationships between credited contributions and authors' ranks has been limited to a few studies. Here we analyze this kind of data and show, quantitatively, that the regularity of authorship contributions decreases with the number of authors of a paper. 
Furthermore, we found that the rank of authors and their roles in papers follow three general patterns according to the nature of their contributions: (i) the total contribution increases with author's rank; (ii) the total contribution decreases with author's rank; and (iii) the total contribution is symmetric, with most of the contributions being made by the first and last authors. This was accomplished by collecting and analyzing the data retrieved from PLoS One and by devising a measurement of the effective number of authors in a paper. The analysis of such patterns confirms that some aspects of author ranking are in accordance with the expected convention, such as the first and last authors being more likely to contribute more diversely to a scientific work. Conversely, the analysis also revealed that authors in the intermediary positions of the rank contribute more in specific roles, such as collecting data. This indicates that an unbiased evaluation of researchers must take into account the distinct types of scientific contributions. (C) 2017 Elsevier Ltd. All rights reserved. Patent citation analysis is considered a useful tool for technology impact analysis. However, the outcomes of previous methods do not provide a fair reflection of a technology's future prospects, since they are based on deterministic approaches that assume future trends will remain the same as those in the past. As a remedy, we propose a Hawkes process-based patent citation analysis method to assess the future technological impact and uncertainty of a technology in a time period of interest, employing the future citation counts of the relevant patents as a quantitative proxy. For this, we construct a citation interval matrix from the United States Patent and Trademark Office (USPTO) database, and employ a Hawkes process - a special case of path-dependent stochastic processes - as a method for patent citation forecasting. 
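A minimal sketch of a Hawkes conditional intensity with an exponential kernel may help make the self-excitation idea concrete; the parameters and citation times below are hypothetical, and the paper's citation interval matrix and estimation procedure are not reproduced here.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.5):
    """Conditional intensity of a Hawkes process with exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    Each past citation briefly raises the rate of further citations
    (self-excitation); the boost then decays back toward the baseline mu.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

citations = [0.5, 1.1, 1.3]              # hypothetical citation times (years)
print(hawkes_intensity(1.4, citations))  # elevated just after a burst
print(hawkes_intensity(5.0, citations))  # nearly back to the baseline mu
```

Forecasting future citation counts then amounts to simulating (or integrating) this intensity forward in time, which is what gives the approach its stochastic, path-dependent character compared with deterministic trend extrapolation.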
Specifically, the Hawkes process models the idiosyncratic and dynamic behaviours of a technology's evolution and obsolescence by increasing the likelihood of a subsequent citation after each citation event (i.e., self-excitation) and letting that likelihood decay naturally back towards the initial level. A case study of patents on molecular amplification diagnosis technology shows that our method outperforms previous deterministic approaches in terms of accuracy and practicality. (C) 2017 Elsevier Ltd. All rights reserved. When comparing the average citation impact of research groups, universities and countries, field normalisation reduces the influence of discipline and time. Confidence intervals for these indicators can help with attempts to infer whether differences between sets of publications are due to chance factors. Although both bootstrapping and formulae have been proposed for these intervals, their accuracy is unknown. In response, this article uses simulated data to systematically compare the accuracy of confidence limits in the simplest possible case: a single field and year. The results suggest that the MNLCS (Mean Normalised Log-transformed Citation Score) confidence interval formula is conservative for large groups but almost always safe, whereas bootstrap MNLCS confidence intervals tend to be accurate but can be unsafe for smaller world or group sample sizes. In contrast, bootstrap MNCS (Mean Normalised Citation Score) confidence intervals can be very unsafe, although their accuracy increases with sample size. (C) 2017 Elsevier Ltd. All rights reserved. The Peer Reputation (PR) metric was recently proposed in the literature to judge a researcher's contribution through the quality of the venue in which the researcher's work is published. PR, proposed by Nelakuditi et al., ties the selectivity of a publication venue to the reputation of the first author's institution. 
By computing PR for a percentage of the papers accepted at a conference or in a journal, a more solid indicator of a venue's selectivity than the paper Acceptance Ratio (AR) can be derived. In recent work we explained the reasons why we agree that PR offers substantial information that is missing from AR; however, we also pointed out several limitations of the metric. These limitations make PR inadequate, if used on its own, for giving a solid evaluation of a researcher's contribution. In this work, we present our own approach for judging the quality of a Computer Science/Computer Engineering conference venue, and thus, implicitly, the potential quality of a paper accepted at that conference. Driven by our previous findings on the adequacy of PR, as well as our belief that an institution does not necessarily "make" a researcher, we propose a Conference Classification Approach (CCA) that takes into account a number of metrics and factors in addition to PR, namely the paper's impact and the authors' h-indexes. We present and discuss our results, based on data gathered from close to 3000 papers from 12 top-tier Computer Science/Computer Engineering conferences belonging to different research fields. To evaluate CCA, we compare our conference rankings against multiple publicly available rankings based on evaluations from the Computer Science/Computer Engineering community, and we show that our approach achieves a very comparable classification. (C) 2017 Elsevier Ltd. All rights reserved. The ever-growing number of venues publishing academic work makes it difficult for researchers to identify venues that publish data and research most in line with their scholarly interests. A solution is therefore needed whereby researchers can identify information dissemination pathways in order both to access and to contribute to an existing body of knowledge. 
In this study, we present a system to recommend scholarly venues rated in terms of relevance to a given researcher's current scholarly pursuits and interests. We collected our data from an academic social network and modeled researchers' scholarly reading behavior in order to propose a new, adaptive implicit rating technique for venues. We present a way to recommend relevant, specialized scholarly venues using these implicit ratings, one that can provide quick results even for new researchers without a publication history and for emerging scholarly venues that do not yet have an impact factor. We performed a large-scale experiment with real data to evaluate the current scholarly recommendation system and showed that our proposed system achieves better results than the baseline. The results provide important up-to-the-minute signals that, compared with post-publication usage-based metrics, represent a closer reflection of a researcher's interests. (C) 2017 Elsevier Ltd. All rights reserved. Academic genealogy can be defined as the study of intellectual heritage undertaken through the relationship between a professor (advisor/mentor) and a student (advisee); on the basis of these ties, it establishes a social framework that is generally represented by an academic genealogy graph. Obtaining relevant knowledge from academic genealogy graphs makes it possible to analyse the academic training of scientific communities, and to discover ancestors or forebears who had special skills and talents. The use of metrics for characterizing this kind of graph is an active form of knowledge extraction. In this paper, we set out a formal definition of a metric called the 'genealogical index', which can be used to assess how far researchers have influenced the field through advisor-advisee relationships. This metric is based on the bibliometric h-index, and its definition can be broadened to measure the effect of researchers on several generations of scientists. 
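A genealogy-based analogue of the h-index can be sketched as follows; this is an assumed reading of the metric (an h-index rule applied to the number of Ph.D.s each advisee went on to supervise), and the paper's formal definition may differ in detail. The example tree is invented.

```python
def h_like(values):
    """Largest k such that at least k of the values are >= k (h-index rule)."""
    vs = sorted(values, reverse=True)
    return sum(1 for rank, v in enumerate(vs, start=1) if v >= rank)

def genealogical_index(advisees):
    """Sketch of a first-order genealogical index: a researcher has index g
    if g of their advisees each supervised at least g Ph.D.s of their own.
    `advisees` maps advisee name -> number of Ph.D.s that advisee supervised.
    """
    return h_like(list(advisees.values()))

tree = {"a": 5, "b": 3, "c": 3, "d": 1, "e": 0}  # hypothetical advisee counts
print(genealogical_index(tree))  # -> 3
```

Extending the index to several generations would apply the same rule recursively, replacing each advisee's raw supervision count with that advisee's own genealogical index.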
A case study is employed that includes an academic genealogy graph consisting of more than 190,000 Ph.D.s registered in the Mathematics Genealogy Project. Additionally, we compare the genealogical indices obtained for Fields Medal and Wolf Prize winners, and find that the latter have had a greater impact than the former. (C) 2017 Elsevier Ltd. All rights reserved. We study the problem of determining the cognitive distance between the publication portfolios of two units. In this article we provide a systematic overview of five different methods (a benchmark Euclidean distance approach, distance between barycenters in two and in three dimensions, distance between similarity-adapted publication vectors, and weighted cosine similarity) for determining cognitive distances using publication records. We present a theoretical comparison as well as a small empirical case study. The results of this case study are not conclusive, but we have, mainly on logical grounds, a small preference for the method based on similarity-adapted publication vectors. (C) 2017 Elsevier Ltd. All rights reserved. In this work, we extend our previous work on largeness tracing among physicists to other fields, namely mathematics, economics and biomedical science. Overall, the results confirm our previous discovery, indicating that scientists in all these fields trace large topics. Surprisingly, however, researchers in mathematics seem more likely to trace large topics than those in the other fields. We also find that, on average, papers in top journals are less largeness-driven. We compare researchers from the USA, Germany, Japan and China and find that Chinese researchers exhibit consistently larger exponents, indicating that in all these fields Chinese researchers trace large topics more strongly than others. 
Further correlation analyses between the degree of largeness tracing and the numbers of authors, affiliations and references per paper reveal positive correlations: papers with more authors, affiliations or references are likely to be more largeness-driven. There are, however, several interesting and noteworthy exceptions: in economics, papers with more references are not necessarily more largeness-driven, and the same is true for papers with more authors in biomedical science. We believe that these empirical discoveries may be valuable to science policy-makers. (C) 2017 Elsevier Ltd. All rights reserved. A central question in the science of science concerns how time affects citations. Despite long-standing interest and its broad impact, we lack systematic answers to this simple yet fundamental question. By reviewing and classifying prior studies from the past 50 years, we find a significant lack of consensus in the literature, primarily due to the coexistence of retrospective and prospective approaches to measuring citation age distributions. These two approaches have been pursued in parallel, with no known connections between them. Here we develop a new theoretical framework that not only allows us to connect the two approaches through precise mathematical relationships, but also helps us reconcile the interplay between the temporal decay of citations and the growth of science, uncovering new functional forms characterizing citation age distributions. We find that the retrospective distribution follows a lognormal distribution with an exponential cutoff, while the prospective distribution is governed by the interplay between a lognormal distribution and the growth in the number of references. Most interestingly, the two approaches can be connected once rescaled by the growth of publications and citations. We further validate our framework using both large-scale citation datasets and analytical models capturing citation dynamics. 
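Schematically, a lognormal density with an exponential cutoff, of the kind reported above for the retrospective citation age distribution, can be written as (an illustrative parameterization, not the paper's fitted form):

```latex
f_R(t) \;\propto\; \frac{1}{t\,\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(\ln t - \mu)^2}{2\sigma^2}\right)\, e^{-\lambda t},
  \qquad t > 0,
```

where $t$ is the age of a cited reference, $\mu$ and $\sigma$ set the location and spread of the lognormal peak, and $\lambda$ controls the exponential suppression of very old citations imposed by the finite (and growing) body of literature.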
Together, this paper presents a comprehensive analysis of the time dimension of science, representing a new empirical and theoretical basis for future studies in this area. (C) 2017 Elsevier Ltd. All rights reserved. This paper focuses on the evaluation of research institutions in terms of size-independent indicators. There are well-known procedures in this context, such as what we call additive rules, which provide an evaluation of the impact of any research unit in a scientific field based upon a partition of the field's citations into ordered categories, along with some external weighting system to weigh those categories. We introduce here a new ranking procedure that is not an additive rule - the HV procedure, after Herrero & Villar (2013) - and compare it with those conventional evaluation rules within a common setting. Given a set of ordered categories, the HV procedure measures the performance of the different research units in terms of the relative probability of getting more citations. The HV method also provides a complete, transitive and cardinal evaluation, without resorting to any external weighting scheme. Using a large dataset of publications in 22 scientific fields assigned to 40 countries, we compare the performance of several additive rules - the Relative Citation Rate, four percentile-based ranking procedures, and two average-based high-impact indicators - and the corresponding HV procedures under the same set of ordered categories. The comparisons take into account re-rankings and differences in outcome variability, measured by the coefficient of variation, the range, and the ratio between the maximum and minimum index values. (C) 2017 Elsevier Ltd. All rights reserved. In this study, we examine academic ranking and inequality in library and information science (LIS) using a faculty hiring network of 643 faculty members from 44 LIS schools in the United States. 
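The core idea behind the HV procedure described above, comparing units by the probability that a randomly drawn paper from one falls in a higher citation category than one from the other, can be sketched as follows; the category shares are invented, and the published rule differs in how pairwise probabilities are aggregated into a full ranking.

```python
def prob_outranks(pa, pb):
    """Probability that a paper drawn from unit A falls in a strictly higher
    citation category than one drawn from unit B, with ties split evenly.
    pa, pb: probability distributions over ordered categories (low -> high),
    each summing to 1. A sketch in the spirit of the HV procedure.
    """
    win = sum(pa[i] * pb[j] for i in range(len(pa)) for j in range(i))
    tie = sum(pa[i] * pb[i] for i in range(len(pa)))
    return win + 0.5 * tie

a = [0.5, 0.3, 0.2]  # hypothetical shares across three ordered categories
b = [0.7, 0.2, 0.1]
print(round(prob_outranks(a, b), 3))  # -> 0.605
```

Because the comparison is between the two distributions themselves, no external weights for the categories are needed, which is the property the abstract emphasizes.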
We employ four groups of measures to study academic ranking: adjacency, placement and hiring, distance-based measures, and hubs and authorities. Among these measures, the closeness and hub measures have the highest correlation with the U.S. News ranking (r = 0.78). We study academic inequality using four distinct methods - downward/upward placement, the Lorenz curve, cliques, and egocentric networks of LIS schools - and find that academic inequality exists in the LIS community. We show that the percentage of downward placement (68%) is much higher than that of upward placement (22%); meanwhile, 20% of the 30 LIS schools that have doctoral programs produced nearly 60% of all LIS faculty, with a Gini coefficient of 0.53. We also find cliques of highly ranked schools and a core/periphery structure that distinguishes LIS schools of different ranks. Overall, LIS faculty hiring networks have considerable value for deriving credible academic rankings and revealing faculty exchange within the field. (C) 2017 Elsevier Ltd. All rights reserved. Business research has established itself largely in six disciplines: Accounting, Marketing, Organizational Behavior and Management, Finance, Management Science and Operations Research, and Management Information Systems. The knowledge flows among these six disciplines, and the factors that drive knowledge diffusion, are important considerations. Quantitative analyses of a large dataset containing over 400,000 journal-to-journal citations for business journals published between 1997 and 2009 reveal important patterns of knowledge diffusion in business research. Cross-disciplinary knowledge diffusion is discipline-dependent and is converging to a similar level of diversity. 
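The Gini coefficient used earlier to quantify placement inequality is a standard concentration measure and can be computed directly from per-school placement counts; the counts below are hypothetical, not the study's data.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts (0 = perfect
    equality, 1 = maximal concentration), via the rank-weighted-sum form
    of the mean-absolute-difference formula."""
    xs = sorted(counts)                                  # ascending order
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

placements = [60, 20, 10, 6, 4]  # hypothetical faculty placements per school
print(round(gini(placements), 3))  # -> 0.504
```

A value near 0.5, as in the study, indicates that a small core of schools supplies most of the faculty, which is exactly what the Lorenz curve visualizes.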
Aside from other factors such as articles published in the journal and the number of classifications, we find that journal quality, as measured by inclusion in the UT Dallas top journal list, has a significant effect on cross-disciplinary knowledge flows. We also offer some potential explanations for the effect of this formalized measure of quality. (C) 2017 Elsevier Ltd. All rights reserved. Although heterogeneous networks are ubiquitous, a precise definition is lacking, in our opinion. We submit a definition of network heterogeneity and elaborate on a resulting approach to its measurement. This measure is denoted as HE. As an illustration HE is applied to examples from the fields of informetrics and neurosciences. Yet, it is pointed out that our definition has universal applicability. (C) 2017 Elsevier Ltd. All rights reserved. Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spam and ham (i.e., nonspam) labels, to understand spamming activities on Twitter. The primary focus of this paper is to analyze various aspects of spam on Twitter based on hashtags, tweet contents, and user profiles, which are useful for both tweet-level and user-level spam detection. First, we compare the usage of hashtags in spam and ham tweets based on frequency, position, orthography, and co-occurrence. Second, for content-based analysis, we analyze the variations in word usage, metadata, and near-duplicate tweets. Third, for user-based analysis, we investigate user profile information. In our study, we validate that spammers use popular hashtags to promote their tweets. We also observe differences in the usage of words in spam and ham tweets. 
Spam tweets are more likely to be emphasized using exclamation points and capitalized words. Furthermore, we observe that spammers use multiple accounts to post near-duplicate tweets to promote their services and products. Unlike spammers, legitimate users are likely to provide more information, such as their locations and personal descriptions, in their profiles. In summary, this study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming. Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short-term visiting histories or plans. Capturing users' short-term activities could benefit many applications by providing the right context at the right time and location. In this paper we are interested in extracting locations mentioned in tweets at a fine-grained granularity, with temporal awareness. Specifically, we recognize the points-of-interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine-grained location. Our proposed framework, named TS-Petar (Two-Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two-stage time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, as commonly observed in Foursquare check-ins. The time-aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time-aware POI extraction task. 
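To illustrate how the BILOU labeling schema mentioned above encodes a POI mention for a sequence tagger such as a CRF, here is a minimal sketch; the helper function, the example tweet, and the POI name are hypothetical, and the paper's actual feature extraction is not reproduced.

```python
def bilou_tags(tokens, mention):
    """Tag a token sequence with the BILOU scheme (Beginning, Inside, Last,
    Outside, Unit) for a single known POI mention, as one might encode
    training labels for a CRF-based tagger."""
    tags = ["O"] * len(tokens)
    m = len(mention)
    for start in range(len(tokens) - m + 1):
        if tokens[start:start + m] == mention:
            if m == 1:
                tags[start] = "U-POI"        # single-token mention
            else:
                tags[start] = "B-POI"        # first token of the mention
                for k in range(start + 1, start + m - 1):
                    tags[k] = "I-POI"        # interior tokens
                tags[start + m - 1] = "L-POI"  # last token of the mention
    return tags

tweet = "lunch at marina bay sands then home".split()
print(bilou_tags(tweet, ["marina", "bay", "sands"]))
# -> ['O', 'O', 'B-POI', 'I-POI', 'L-POI', 'O', 'O']
```

The richer BILOU inventory (compared with plainer schemas) gives the tagger explicit boundary labels, which is generally helpful when multi-token POI names such as informal Foursquare abbreviations must be segmented precisely.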
Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS-Petar against several strong baseline methods. The experimental results demonstrate that the two-stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency. Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our insight workflow, which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work. Virtual reference services provide opportunities for library patrons to produce requests of reference librarians through quasi-synchronous computer-mediated exchanges in which requests and deliverables are produced as online textual objects. 
Text postings only become the actions they perform, such as an information request or deliverable, through the recipient's work of reading. Text postings thus are designed for their recipients and are built in ways that instruct particular readings. In this paper, I show that patron requests are interactional achievements co-constituted by librarians and patrons through the exchange of text postings that are designed to be seen as requests. The Reference and User Services Association offers guidelines for online interactions between librarians and patrons. However, such guidelines provide only general recommendations by which librarians may overcome difficulties in identifying the specific information needs of patrons. I examine actual chat logs of virtual reference interactions and describe how librarians engage with patrons to co-construct actionable requests to specify and fulfill patron information needs. Conversation analytic methods are used to identify the way texts are produced to instruct recipients in the ways they are to be read and how these texts serve, through reading's work, as an analysis of the actions prior texts perform. Critics worry that algorithmic filtering could lead to overly polished, homogeneous web experiences. Serendipity, in turn, has been touted as an antidote. Yet, the desirability of serendipity could vary by context, as users may be more or less receptive depending on the services they employ. We propose a nomological model of online serendipity experiences, conceptualizing both cognitive and behavioral antecedents. Based on a survey of 1,173 German Internet users, we conduct structural equation modeling and multigroup analyses to differentiate the antecedents and effects of serendipity across three distinct contexts: online shopping, information services, and social networking sites. 
Our findings confirm that antecedents and outcomes of digital serendipity vary by context, with serendipity only being associated with user satisfaction in the context of social network sites. There has been little research on task complexity and difficulty in music information retrieval (MIR), whereas many studies in the text retrieval domain have found that task complexity and difficulty have significant effects on user effectiveness. This study aimed to bridge the gap by exploring i) the relationship between task complexity and difficulty; ii) factors affecting task difficulty; and iii) the relationship between task difficulty, task complexity, and user search behaviors in MIR. An empirical user experiment was conducted with 51 participants and a novel MIR system. The participants searched for 6 topics across 3 complexity levels. The results revealed that i) perceived task difficulty in music search is influenced by task complexity, user background, system affordances, and task uncertainty and enjoyability; and ii) perceived task difficulty in MIR is significantly correlated with effectiveness metrics such as the number of songs found, number of clicks, and task completion time. The findings have implications for the design of music search tasks (in research) or use cases (in system development) as well as future MIR systems that can detect task difficulty based on user effectiveness metrics. The purpose of this research was to establish an upper bound on finding answers to health-related questions in MedlinePlus and other online resources. Seven reference librarians tested a set of protocols to determine whether it was possible to use the types and foci of the questions extracted from customer requests submitted to the National Library of Medicine to find authoritative answers to these questions. Librarians tested the protocols manually to determine if the process was sufficiently robust and accurate to later automate. 
Results indicated that the extracted terms provide enough information to find authoritative answers for about 60% of questions and that certain question types are more likely to result in authoritative answers than others. The question corpus and analysis performed for this project will inform automatic question answering systems, and could lead to suggestions for new content to include in MedlinePlus. This approach can serve as an example to researchers interested in methods of evaluating question answering tools and the contents of online databases. Attribute-based approaches have recently attracted much attention in visual recognition tasks. These approaches describe images by using semantic attributes as the mid-level feature. However, low recognition accuracy remains the biggest barrier limiting their practical application. In this paper, we propose a novel framework termed Boosting Attribute Recognition (BAR) for the image recognition task. Our framework stems from matrix factorization, and can explore latent relationships from the aspect of attribute and image simultaneously. Furthermore, to apply our framework in large-scale visual recognition tasks, we present both offline and online learning implementations of the proposed framework. Extensive experiments on 3 data sets demonstrate that our framework achieves sound attribute-recognition accuracy. In this article, we present a new clustering algorithm for Person Name Disambiguation in web search results. The algorithm groups web results according to the individuals they refer to. The best state-of-the-art approaches require training data in order to learn thresholds for deciding when to group the webpages. However, the ambiguity level of person names on the web cannot be estimated in advance, and the results of those methods depend strongly on the thresholds obtained from the training collections. 
We present the concept of adaptive threshold, which avoids the need for a prior supervised learning process, depending only on the content of the compared documents to decide whether they refer to the same person. We evaluated our approach using three datasets, achieving results close to those obtained by the most successful algorithms in the state of the art that require such a learning process, and outperforming the algorithms that do not require it. This paper looks at 10 years of reviews in a multidisciplinary journal, The Journal of Artificial Societies and Social Simulation (JASSS), which is the flagship journal of social simulation. We measured referee behavior and referees' agreement. We found that the disciplinary background and the academic status of the referee have an influence on the report time, the type of recommendation and the acceptance of the reviewing task. Referees from the humanities tend to be more generous in their recommendations than other referees, especially economists and environmental scientists. Second, we found that senior researchers are harsher in their judgments than junior researchers, and the latter accept requests to review more often and are faster in reporting. Finally, we found that articles that had been refereed and recommended for publication by a multidisciplinary set of referees were subsequently more likely to receive citations than those that had been reviewed by referees from the same discipline. Our results show that common standards of evaluation can be established even in multidisciplinary communities. New types of digital data, tools, and methods, for instance those that cross academic disciplines and domains, those that feature teams instead of single scholars, and those that involve individuals from outside the academy, enable new forms of scholarship and teaching in digital humanities. 
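The adaptive-threshold idea described earlier in this section can be sketched as follows. The paper's actual formula is not given in the abstract, so the adaptation rule below (scaling a base threshold by how much vocabulary the two documents could plausibly share) is an illustrative assumption, not the published method.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def same_person(doc_a, doc_b, base=0.5):
    """Decide co-reference with a threshold derived from the documents
    themselves rather than learned from training data (the key idea)."""
    a, b = doc_a.lower().split(), doc_b.lower().split()
    # adapt: short, vocabulary-poor pairs get a lower threshold, since
    # even true matches can share only a few terms (illustrative rule)
    capacity = min(len(set(a)), len(set(b))) / max(len(set(a)), len(set(b)))
    return cosine(a, b) >= base * capacity
```

For example, two snippets about the same physicist ("john smith physics quantum optics laser" vs. "john smith quantum physics optics experiments") clear the adapted threshold, while a snippet about a footballer of the same name does not.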
Such scholarship promotes reuse of digital data, provokes new research questions, and cultivates new audiences. Digital curation, the process of managing a trusted body of information for current and future use, helps maximize the value of research in digital humanities. Based on semistructured interviews, this naturalistic case study explores the creation, use, storage, and planned reuse of data by 45 interviewees involved with 19 Office of Digital Humanities Start-Up Grant (SUG) projects. Interviewees grappled with challenges surrounding data, collaboration and communication, planning and project management, awareness and outreach, resources, and technology. Overall, this study explores the existing digital curation practices and needs of scholars engaged in innovative digital humanities work and discerns how closely these practices and needs align with the digital curation literature. This brief communication analyzes the statistics and methods Lotka used to derive his inverse square law of scientific productivity from the standpoint of modern theory. It finds that he violated the norms of this theory by severely truncating his data on the right. It also shows that, by basing the derivation of his law on this very method, Lotka himself played an important role in establishing the commonly used method of identifying power-law behavior by the R² fit of a regression line on a log-log plot, a method that modern theory considers unreliable. One of the first bibliometric laws appeared in Alfred J. Lotka's 1926 examination of author productivity in chemistry and physics. The result was a productivity distribution described by a power law. In this paper, Lotka's original data on author productivity in chemistry are reconsidered. We define a discrete power law with exponential cutoff, test Lotka's data, and compare the fit to the discrete power law. 
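The two distributions mentioned above can be written out explicitly. Assuming the standard parameterization (the paper's exact notation is not given in the abstract), Lotka's inverse square law and the discrete power law with exponential cutoff are:

```latex
% Lotka's inverse square law: the number of authors with k papers
f(k) \;=\; \frac{C}{k^{2}}, \qquad k = 1, 2, 3, \dots

% Discrete power law with exponential cutoff (standard form), where the
% polylogarithm Li_alpha normalizes the distribution over k = 1, 2, ...
P(k) \;=\; \frac{k^{-\alpha}\, e^{-k/\kappa}}{\operatorname{Li}_{\alpha}\!\bigl(e^{-1/\kappa}\bigr)}, \qquad k = 1, 2, 3, \dots
```

The cutoff parameter \(\kappa\) suppresses the heavy tail: for \(k \ll \kappa\) the distribution behaves like a pure power law, while very prolific authors (\(k \gtrsim \kappa\)) become exponentially rare.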
Over the past two decades, the number of international cooperative patents has grown significantly with the development of globalization. Although previous studies have almost exclusively used econometric methods, we argue that international cooperative patents change over time and that the relevant actors are dynamic, connected, and coexist to determine a country's position in the international network of such patents. Using a latent growth curve model, this study examines, from a covariance-structure perspective, different dimensions of the growth trajectory of countries' centrality in the international cooperative patent network. Additionally, past studies have rarely examined whether technological concentration affects national technology innovation capacities; here, we integrated technological concentration into the estimation model. The results indicated that knowledge stock has a positive effect on the evolutionary growth rate of international cooperative patent network centrality. Moreover, technological concentration was found to strengthen the effect of knowledge stock on network centrality. An experimental map was produced to illustrate the interrelationships of the dimensions, which may be used as a reference by the government. A growing number of researchers are exploring the use of citation relationships such as direct citation, bibliographic coupling, and co-citation for information retrieval in scientific databases and digital libraries. In this paper, I propose a method of ranking the relevance of citation-based search results to a set of key, or seed, papers by measuring the number of citation relationships they share with those key papers. I tested the method against 23 published systematic reviews and found that the method retrieved 87% of the studies included in these reviews. 
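The ranking idea just described can be sketched in a few lines: score each retrieved document by the citation relationships (direct citation, bibliographic coupling, co-citation) it shares with the seed papers, then sort. The data structure and the one-point-per-relationship-type counting rule below are illustrative assumptions, not the paper's exact scoring.

```python
def relevance_scores(candidates, seeds, cites):
    """Rank candidates by citation relationships shared with seed papers.

    cites: dict mapping a paper id to the set of paper ids it cites.
    For each (candidate, seed) pair, one point each for: a direct
    citation in either direction, bibliographic coupling (a shared
    reference), and co-citation (cited together by a third paper).
    """
    cited_by = {}
    for p, refs in cites.items():
        for r in refs:
            cited_by.setdefault(r, set()).add(p)
    scores = {}
    for d in candidates:
        score = 0
        for k in seeds:
            if k in cites.get(d, set()) or d in cites.get(k, set()):
                score += 1  # direct citation
            if cites.get(d, set()) & cites.get(k, set()):
                score += 1  # bibliographic coupling
            if cited_by.get(d, set()) & cited_by.get(k, set()):
                score += 1  # co-citation
        scores[d] = score
    return sorted(candidates, key=lambda d: -scores[d])

# toy example: A cites seed S1, shares a reference with it, and is
# co-cited with it by P; B only shares a reference with S1
cites = {"A": {"S1", "X"}, "B": {"X"}, "S1": {"X"}, "P": {"A", "S1"}}
print(relevance_scores(["B", "A"], ["S1"], cites))  # → ['A', 'B']
```

Thresholding such scores is what yields the highly enriched subset of results the study reports.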
The relevance ranking approach identified a subset of the citation search results that comprised 27% of the total documents retrieved by the method, and 7% of the documents retrieved by these reviews, but that contained 75% of the studies included in these reviews. Additional testing suggested that the method may be less appropriate for reviews that combine literature in ways that are not reflected in the literature itself. These results suggest that this ranking method could be useful in a range of information retrieval contexts. Individuals often appear with multiple names when considering large bibliographic datasets, giving rise to the synonym ambiguity problem. Although most related works focus on resolving name ambiguities, this work focuses on classifying and characterizing multiple name usage patterns, the root cause of such ambiguity. By examining real bibliographic datasets, we identify and classify patterns of multiple name usage by individuals, which can be interpreted as name change, rare name usage, and name co-appearance. In particular, we propose a methodology to classify name usage patterns through a supervised classification task and show that the different classes are robust (across datasets) and exhibit significantly different properties. We show that the collaboration network structure emerging around nodes corresponding to ambiguous names from different name usage patterns has strikingly different characteristics, such as the common neighborhood and degree evolution. We believe such differences in network structure and in name usage patterns can be leveraged to design more efficient name disambiguation algorithms that target the synonym problem. This paper investigates how scholars in the digital humanities use Twitter for informal scholarly communication. 
In particular, the paper observes the hosting of an annual conference over a number of years by one association in order to see whether there was a change in the network configuration structure, the influential scholars in the network, the information sources, and the tweet contents. Annual conferences held by the Association of Internet Researchers over 3 years are used for data collection. According to our results, while the Twitter communication network grew larger, the basic form of the network configuration remained stable as a Tight Crowd structure and the core influential people changed little. Analyses of information sources and content found topic changes in each year but consistency in the kinds of information sources and content. The application of Lean Philosophy in a healthcare environment is still a relatively new field of research, but there already exists a considerable amount of literature on the topic. This article bibliometrically structures, analyzes and interprets the data of articles on Lean Healthcare from 2002 through 2015. The databases used for the analysis are the main sources for citation data available: Elsevier's Scopus and Thomson Reuters' Web of Science. The scientific contribution of this article lies in structuring the literature on Lean Healthcare and summarizing future research proposals. Thereby it encourages the review and consolidation of existing directions in this field of research and the exploration of new ones. With respect to the article's applied contribution, it helps hospitals and professionals that want to apply Lean Healthcare to deal with uncertainties and challenges before or during its implementation by granting easy access to the literature. It also allows adapting suitable findings of major publications for practical implementation with the goal of improving the patient experience. As results of this bibliometric analysis, it was found that J.E. Bassham and D.I. 
Ben-Tovim share the first place as the most cited authors in Lean Healthcare, and Linköpings Universitet is the most cited institution. With respect to countries, the United States ranks first. The most cited article is "Trends and approaches in lean healthcare" by L. Brandão de Souza, which was published in the most cited journal, namely Leadership in Health Services. Four main tendencies for future research possibilities in Lean Healthcare were identified: evaluate the implementation; amplify basic knowledge; investigate challenges and success stories; expand the focus. Drawing on more than half a million granted patents in all technological sectors filed at the EPO between 1998 and 2012, we gather information on 300,000 inscriptions recording changes in their ownership at the EPO and top national registers (France, Germany, Switzerland, Spain). After grouping parallel legal events in different European validation countries, we find that more than 30% of EPO patents in all fields change ownership at least once. For the field of medical technologies, we exploit additional information on corporate structures to further distinguish between "intragroup" and "bare" changes of ownership. Our results indicate that more than two-thirds of the transfers happen "intragroup", that is, between related corporate entities. Using regression analyses, we finally explore and discuss relations between the occurrence and types of these patent transfers and proxies for patent quality. Scientific communities are clusters of researchers and play important roles in modern science. Studying different forms of scientific communities that either physically or virtually exist is a feasible way to disclose underlying mechanisms of science. From the perspective of complex networks, topology-based communities and topic-based communities reflect scientific collaboration and topical features of science, respectively. 
However, the two features are not isolated but intertwined in scientific practice. This study proposes an approach to detect Topical Scientific Communities (TSCs) with both topology and topic features by applying machine learning techniques and network theory. As an example, the TSCs of the informetrics field are detected, and the characteristics of these TSCs are then analyzed. It is shown that collaboration patterns on the topic level can be revealed by analyzing the static network structure and dynamics of TSCs. Furthermore, cross-topic collaborations at multiple levels can be investigated through TSCs. In addition, TSCs can effectively organize researchers in terms of productivity. Future work will further explore and generalize characteristics of TSCs, and the applications of TSCs to other tasks of studying science. Since Lawrence in 2001 proposed the open access (OA) citation advantage, the potential benefit of OA in relation to citation impact has been discussed in depth. The methodology to test this postulate ranges from comparing the impact factors of OA journals versus traditional ones, to comparing citations of OA versus non-OA articles published in the same non-OA journals. However, conclusions are not entirely consistent among fields, and two possible explanations have been suggested in those fields where a citation advantage has been observed for OA: the early view and the selection bias postulates. In this study, a longitudinal and multidisciplinary analysis of the gold OA citation advantage is developed. All research articles in all journals for all subject categories in the multidisciplinary database Web of Science are considered. A total of 1,138,392 articles-60,566 (5.3%) OA articles and 1,077,826 (94.7%) non-OA articles-published in 2009 are analysed. The citation window considered goes from 2009 to 2014, and data are aggregated for the 249 disciplines (subject categories). 
At the journal level, we also study the evolution of journal impact factors for OA and non-OA journals in those disciplines whose OA prevalence is highest (top 36 subject categories). As the main conclusion, there is no generalizable gold OA citation advantage at either the article or the journal level. This study describes the Danish publication reward system (BFI), investigates whether its built-in incentives have had an effect on publication behavior at the University of Southern Denmark, and discusses the possible future implications for researcher incentives should universities wish to measure BFI on the individual level. We analyzed publication data from the university CRIS system (Pure) and from SciVal. Several studies indicate that co-authored scholarly journal articles attract more citations than single-author articles. The reasons for this are not clear; however, research collaboration across institutions and countries is commonly accepted in the research community and among university managements as one way of increasing a researcher's and an institution's reputation and impact. The BFI system is designed to reward scholarly publication activity at Danish universities, especially publication in international journals of high status. However, we find that the built-in incentives leave the researcher and his or her institution with a dilemma: if the researchers optimize their performance by forming author groups with external collaborators, the optimal way of doing so for the researchers is not the optimal way seen from the perspective of the university. Our analysis shows that the typical article has 6.5 authors, two of which are internal, and that this has remained stable since the introduction of the BFI. However, there is variation across the disciplines. 
While 'the Arts and Humanities' and 'the Social Sciences' seem to compose author groups in a way which does not optimize the performance of the institution, both 'Health' and 'the Natural Sciences' seem to optimize according to criteria other than those specified in the BFI. Publication hit lists of authors, institutes, scientific disciplines, etc. within scientific databases like Web of Science or Scopus are often used as a basis for scientometric analyses and evaluations of these authors, institutes, etc. However, such information services do not necessarily cover all publications of an author. The purpose of this article is to introduce a re-interpreted scientometric indicator called "visibility," which is the share of the number of an author's publications on a certain information service relative to the author's entire œuvre based upon his or her presumably complete personal publication list. To demonstrate how the indicator works, scientific publications (from 2001 to 2015) of the information scientists Blaise Cronin (N = 167) and Wolfgang G. Stock (N = 152) were collected and compared with their publication counts in the scientific information services ACM, ECONIS, Google Scholar, IEEE Xplore, Infodata eDepot, LISTA, Scopus, and Web of Science, as well as the social media services Mendeley and ResearchGate. For almost all information services, the visibility amounts to less than 50%. The introduced indicator represents a more realistic view of an author's visibility in databases than the currently applied absolute number of hits in those databases. Researchers tend to cite highly cited articles, but how these highly cited articles influence the citing articles has been underexplored. This paper investigates how one highly cited essay, Hirsch's "h-index" article (H-article) published in 2005, has been cited by other articles. Content-based citation analysis is applied to trace the dynamics of the article's impact changes from 2006 to 2014. 
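The "visibility" indicator defined above is simply a coverage share. A minimal sketch, using the study's œuvre sizes (Cronin N = 167, Stock N = 152) but made-up hit counts, since the per-service figures are not given in the abstract:

```python
def visibility(hits_in_service, oeuvre_size):
    """Share (%) of an author's complete publication list that a given
    information service covers; contrast with a raw hit count."""
    return 100.0 * hits_in_service / oeuvre_size

# hypothetical hit counts for one service, against the real oeuvre sizes
print(round(visibility(70, 167), 1))   # hypothetical: 70 of Cronin's 167
print(round(visibility(76, 152), 1))   # hypothetical: 76 of Stock's 152
```

Two authors with the same raw hit count can thus have quite different visibility, which is why the share is the more realistic measure.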
The findings confirm that citation context captures the changing impact of the H-article over time in several ways. In the first two years, the average number of citation mentions of the H-article increased, but it then declined with fluctuation until 2014. In contrast with citation mentions, the average citation count stayed the same. The distribution of citation locations over time also indicates the three phases of the H-article we propose in this study: "Discussion," "Reputation," and "Adoption." Based on their locations in the citing articles and their roles in different periods, topics of citation context shifted gradually as an increasing number of other articles were co-mentioned with the H-article in the same sentences. These outcomes show that the impact of the H-article manifests in various ways within the content of the citing articles and continued to shift over nine years, information that is not captured by traditional means of citation analysis, which do not weight citation impacts over time. This study attempts to analyse the relationship between the peer-review activity of scholars registered in Publons and their research performance as reflected in Google Scholar. Using a scientometric approach, this work explores correlations between peer-review measures and bibliometric indicators. In addition, decision trees are used to explore which researchers (according to discipline, academic status and gender) make the most reviews and which of them accept the most papers, assuming that these are reasonable proxies for reviewing quality. Results show that there is a weak correlation between bibliometric indicators and peer-review activity. The decision tree analysis suggests that established male academics make the most reviews, while young female scholars are the most demanding reviewers. These results could help editors to select good reviewers as well as opening a new source of data for scientometric analyses. 
According to data from the Scopus publication database, as analyzed in several recent studies, more than 70,000 papers have been published in the area of Software Engineering (SE) since the late 1960s. According to our recent work, 43% of those papers have received no citations at all. Since citations are the most commonly used metric for measuring research (academic) impact, these figures raise questions (doubts) about the (non-existing) impact of such a large set of papers. It is a reality that typical academic reward systems encourage researchers to publish more papers and do not place a major emphasis on research impact. To shed light on the issue of volume (quantity) versus citation-based impact of SE research papers, we conduct and report in this paper a quantitative bibliometrics assessment in four aspects: (1) quantity versus impact of different paper types (e.g., conference versus journal papers), (2) ratios of uncited (non-impactful) papers, (3) quantity versus impact of papers originating from different countries, and (4) quantity versus impact of papers by each of the top-10 authors (in terms of number of papers). To achieve the above objective, we conducted a quantitative exploratory bibliometrics assessment, comprising four research questions, to assess quantity versus impact of SE papers with respect to the aspects discussed above. We extracted the data through a systematic, automated and repeatable process from the Scopus paper database, which we also used in two previous papers. Our results show that the distribution of SE publications has a major inequality in terms of impact overall, and also when categorized in terms of the above four aspects. The situation in the SE literature is similar to other areas of science as studied by previous bibliometrics studies. 
Also, among our results is the fact that journal articles and conference papers have been cited 12.6 and 3.6 times on average, respectively, confirming the expectation that journal articles have more impact, in general, than conference papers. Also, papers originating from English-speaking countries have in general more visibility and impact (and consequently citations) compared to papers originating from non-English-speaking countries. Our results have implications for the improvement of academic reward systems, which nowadays mainly encourage researchers to publish more papers and usually neglect research impact. Also, our results can help researchers in non-English-speaking countries consider improvements to increase the research impact of their upcoming papers. This study explores how the citation of open access (OA) journal articles occurs by analyzing the impact of certain journal characteristics, namely, whether the journal is OA and whether its country of publication is the same as the affiliation of a paper's author. As the language of a paper is an important factor contributing to paper citations, this study uses papers in English. The analysis covered publications from 77 countries from 2010 to 2012, comprising 19,530 journals and 3,215,742 papers without duplication. The results showed that papers published in OA and international journals were cited in more countries than those in non-OA and domestic journals, and a higher percentage of these were cited by foreign countries. From these findings, it was determined that the more widely accessible OA journals were effectively being accessed by researchers from multiple countries. However, of the top 10% most cited papers in international journals, a higher percentage came from non-OA compared to OA journals. Among domestic journals, no such difference was found. Papers published in non-OA international journals were most cited in foreign countries with a large number of published papers. 
Hence, the effect of OA's expanded accessibility, while having an apparent effect on heightening the interest of foreign readership, has a limited impact in terms of increasing citations. A bibliometric analysis based on the Web of Science was carried out to provide a critical analysis of the literature on atmospheric pollution sources during 2006-2015. The results showed that particulate matter represented the core of this research field, and methods of source apportionment such as positive matrix factorization were the mainstream techniques. Five clusters were identified from the keywords network, with central nodes of particulate matter, traffic, heavy metals, elemental carbon and renewable energy, respectively. The hotspots and their relationships were illustrated to describe the characteristics of this research field. The USA and China took the leading positions; however, their research emphases were health effects and characteristics of pollutants, respectively. International collaboration was mostly conducted within Asia-Pacific countries and EU countries. For journals, Atmospheric Environment was the most productive during the study period, while Environmental Science and Technology had the highest impact factor in 2015. This study provides an effective approach to obtaining a general knowledge of atmospheric pollution sources and supports a deeper understanding of future research directions. Multiple studies report that male scholars cite publications of male authors more often than their female colleagues do-and vice versa. This gender homophily in citations points to a fragmentation of science along gender boundaries. However, it is not yet clear whether it is actually (perceived) gender characteristics or structural conditions related to gender that are causing the heightened citation frequency of same-sex authors. 
A bibliometric study of the two leading German communication science journals, Publizistik and Medien & Kommunikationswissenschaft, was employed to further analyze the causes of the phenomenon. As scholars tend to primarily cite sources from their own area of research, differences between male and female scholars regarding their engagement in certain research fields become relevant. It was thus hypothesized that the research subject might mediate the relationship between the citing and cited authors' genders. A first analysis based on n = 917 papers published in the period from 1970 to 2009 confirmed the expected gender differences regarding research activity in certain fields. Subsequently, structural equation modeling was employed to test the suggested mediation model. Results show the expected mediation to be a complementary one, indicating that gender homophily in citations is partly due to topical boundaries. While there are alternative explanations for the remaining direct effect, it may suggest that a fragmentation of science along gender boundaries is indeed an issue that communication science must face. In an earlier paper we identified three 'sleeping beauties' in Psychology, that is, three important papers that were not cited by others for many years before much later becoming citation classics. In this paper we identify the 'princes' that alerted psychologists to these 'sleeping beauties', and we show how new computer-based techniques now help us to locate princes as well as sleeping beauties. This paper aims to determine the existence of differential characteristics between monographic special issues and regular non-monographic issues published by psychology journals according to different bibliometric indicators. The materials studied consisted of a total of 1120 articles published in 10 Ibero-American psychology journals included in the JCRs from 2013 to 2015. The number of monographic articles was 286 and the number of non-monographic works was 834. 
The results indicate that articles published in monographic special issues receive a higher number of citations and that their publication times are shorter. A greater presence of journal committee members as authors of the papers in monographic special issues was also observed, and the number of authors per paper was lower compared to articles published in non-monographic issues. In conclusion, publishing papers in monographic special issues rather than non-monographic issues in the reviewed journals has some advantages for both journals and authors, such as greater international visibility and shorter publication times. The Keeling curve has become a chemical landmark, whereas the papers by Charles David Keeling about the underlying carbon dioxide measurements are not cited as often as might be expected given the importance of his contribution. In this bibliometric study, we analyze Keeling's papers as a case study of the under-citedness of climate change publications. Three possible reasons for the under-citedness of Keeling's papers are discussed: (1) the discourse on global cooling at the start of Keeling's measurement program, (2) the underestimation of what is often seen as "routine science", and (3) the number of implicit/informal citations at the expense of explicit/formal (reference-based) citations. These reasons may have contributed more or less to the slow reception and the under-citedness of Keeling's seminal works. Gender studies (GS) has been challenged on epistemological grounds. Here, we compare samples of peer-reviewed academic journal publications written by GS authors and authors from closely related disciplines in the social sciences. The material consisted of 2805 statements from 36 peer-reviewed journal articles, sampled from the Swedish Gender Studies List, which covers > 12,000 publications. 
Each statement was coded as expressing a lack of any of three aspects of objectivity: bias, normativity, or political activism, or as considering any of four realms of explanation for the behaviours or phenomena under study: biology/genetics, individual/group differences, environment/culture, or societal institutions. Statements in GS publications expressed bias and normativity, but not political activism, to a greater extent. They also considered cultural, environmental, social, and societal realms of explanation to a greater extent, and biological and individual-differences explanations to a lesser extent. Using bibliometric data for the evaluation of the research of institutions and individuals is becoming increasingly common. Bibliometric evaluations across disciplines require that the data be normalized to the field, because fields differ greatly in their citation processes. Generally, the major bibliographic databases such as Web of Science (WoS) and Scopus are used for this, but they have the disadvantage of limited coverage in the social sciences and humanities. Coverage in Google Scholar (GS) is much better, but GS has less reliable data and fewer bibliometric tools. This paper tests a method for GS normalization developed by Bornmann et al. (J Assoc Inf Sci Technol 67:2778-2789, 2016) on an alternative set of data involving journal papers, book chapters and conference papers. The results show that GS normalization is possible, although at the moment it requires extensive manual involvement in generating and validating the data. A comparison of the normalized results for journal papers with WoS data shows a high degree of convergent validity. ResearchGate has launched its own citation index by extracting citations from documents uploaded to the site and reporting citation counts on article profile pages. Since authors may upload preprints to ResearchGate, it may use these to provide early impact evidence for new papers. 
This article assesses whether the number of citations found for recent articles is comparable to other citation indexes, using 2675 recently published library and information science articles. The results show that in March 2017, ResearchGate found fewer citations than Google Scholar did, but more than both Web of Science and Scopus. This held true for the dataset overall and for the six largest journals in it. ResearchGate correlated most strongly with Google Scholar citations, suggesting that ResearchGate is not predominantly tapping a fundamentally different source of data than Google Scholar. Nevertheless, preprint sharing in ResearchGate is substantial enough for authors to take seriously. Human computation games (HCGs) harness human intelligence through enjoyable gameplay to address computational problems that are beyond the power of computer programs but trivial for humans. With the popularity of crowdsourcing, different types of HCGs have been developed using various gameplay mechanics to attract online users to contribute outputs. Two commonly used mechanics are collaboration and competition. Yet there is little research examining whether HCGs perform better than nongame applications in terms of motivations and perceptions. Thus, this study investigates the effects of collaborative and competitive mechanics on intrinsic motivation and perceived output quality in mobile content sharing HCGs. Using a within-subjects experiment, 160 participants were recruited from 2 local universities. The findings suggest that the nongame application was perceived to yield better quality output than both HCGs, but the latter offered a greater satisfaction of motivational needs, which may motivate individuals to continue playing them. Taken together, the present findings inform researchers and designers of HCGs that games could serve as a motivator to encourage participation. 
However, the usefulness of HCGs may depend on how effectively one can manage the entertainment-output generation duality of such games. This article concludes by presenting implications, limitations, and future research directions. Product information labels can help users understand complex information, leading them to make better decisions. One area where consumers are particularly prone to costly decision-making errors is long-term saving, which requires understanding of complex concepts such as uncertainty and trade-offs. Although most people are poorly equipped to deal with such concepts, interactive design can potentially help users make better decisions. We developed an interactive information label to assist consumers with retirement saving decision-making. To evaluate it, we exposed 450 users to one of four user interface conditions in a retirement saving simulator where they made 35 yearly decisions under changing circumstances. We found that users were significantly better able to reach their goals with the information label. Furthermore, users who interacted with the label made better decisions than those who were presented with a static information label. Lastly, we found the label particularly effective in helping novice savers. Web search engines act as gatekeepers when people search for information online. Research has shown that search engine users seem to trust the search engines' ranking uncritically and mostly select top-ranked results. This study further examines search engine users' selection behavior. Drawing from the credibility and information research literature, we test whether the presence or absence of certain credibility cues influences the selection probability of search engine results. 
In an observational study, participants (N = 247) completed two information research tasks on preset search engine results pages, on which three credibility cues (source reputation, message neutrality, and social recommendations) as well as the search result ranking were systematically varied. The results of our study confirm the significance of the ranking. Of the three credibility cues, only reputation had an additional effect on selection probabilities. Personal characteristics (prior knowledge about the researched issues, search engine usage patterns, etc.) did not influence the preference for search results linked with certain credibility cues. These findings are discussed in light of situational and contextual characteristics (e.g., involvement, low-cost scenarios). The accurate suggestion of interesting friends arises as a crucial issue in recommendation systems. The selection of friends or followees responds to several reasons whose importance might differ according to the characteristics and preferences of each user. Furthermore, those preferences might also change over time. Consequently, understanding how friends or followees are selected emerges as a key design factor of strategies for personalized recommendations. In this work, we argue that the criteria for recommending followees need to be adapted and combined according to each user's behavior, preferences, and characteristics. A method is proposed for adapting such criteria to the characteristics of the previously selected followees. Moreover, the criteria can evolve over time to adapt to changes in user behavior and to broaden the diversity of recommended potential followees based on novelty. Experimental evaluation showed that the proposed method improved precision relative to static criteria-weighting strategies and traditional rank-aggregation techniques. This study explores the main determinants of social network adoption at the country level. 
We use the technology-organization-environment (TOE) framework to investigate factors influencing social network adoption. The authors use cross-sectional data from 130 countries. The results indicate that social network adoption, at the country level, is positively influenced by technological maturity, public readiness, and the sophistication of information and communication technology law. Technological, organizational, and environmental factors altogether accounted for 67% of the variance in social network adoption. These findings provide a first insight into the usage of social network sites at the country level, as well as the main factors that influence public adoption. Implications for research and practice are discussed. Query reformulations can occur multiple times in a session, and queries observed in the same session tend to be related to each other. Due to the interdependent nature of queries in a session, it has been challenging to analyze query reformulation data while controlling for possible dependencies among queries. This study proposes a multilevel modeling approach in an attempt to analyze the effects of contextual factors and system features on types of query reformulation, as well as the relationship between types of query reformulation and search performance, within a single research model. The results revealed that system features and users' educational background significantly influence users' query reformulation behaviors. Also, types of query reformulation had a significant impact on search performance. The main contribution of this study lies in its adoption of the multilevel modeling method to analyze query reformulation behavior while considering the nested structure of search session data. 
Multilevel analysis enables us to design an extensible research model that includes both session-level and action-level factors, which provides a more extended understanding of the relationships among factors that influence query reformulation behavior and search performance. The multilevel modeling used in this study has practical implications for future query reformulation studies. Music mood recognition (MMR) has attracted much attention in music information retrieval research, yet there are few MMR studies that focus on non-Western music. In addition, little has been done to connect the 2 most widely adopted music mood representation models: categorical and dimensional. To bridge these gaps, we constructed a new data set consisting of 818 Chinese Pop (C-Pop) songs, 3 complete sets of mood annotations in both representations, as well as audio features corresponding to 5 distinct categories of musical characteristics. The mood space of C-Pop songs was analyzed and compared to that of Western Pop songs. We also explored the relationship between categorical and dimensional annotations, and the results revealed that one set of annotations could be reliably predicted by the other. Classification and regression experiments were conducted on the data set, providing benchmarks for future research on MMR of non-Western music. Based on these analyses, we reflect on and discuss the implications of the findings for MMR research. Visitors to museums and other cultural heritage sites encounter a wealth of exhibits in a variety of subject areas, but can explore only a small number of them. Moreover, there typically exists rich complementary information that can be delivered to the visitor about exhibits of interest, but only a fraction of this information can be consumed during the limited time of the visit. Recommender systems may help visitors to cope with this information overload. 
Ideally, the recommender system of choice should model user preferences, as well as background knowledge about the museum's environment, considering aspects of physical and thematic relevancy. We propose a personalized graph-based recommender framework, representing rating history and background multi-facet information jointly as a relational graph. A random walk measure is applied to rank available complementary multimedia presentations by their relevancy to a visitor's profile, integrating the various dimensions. We report the results of experiments conducted using authentic data collected at the Hecht Museum. An evaluation of multiple graph variants, compared with several popular and state-of-the-art recommendation methods, indicates the advantages of the graph-based approach. Whereas traditional science maps emphasize citation statistics and static relationships, this paper presents a term-based method to identify and visualize the evolutionary pathways of scientific topics in a series of time slices. First, we create a data preprocessing model for accurate term cleaning, consolidating, and clustering. Then we construct a simulated data streaming function and introduce a learning process to train a relationship identification function to adapt to changing environments in real time, where relationships of topic evolution, fusion, death, and novelty are identified. The main result of the method is a map of scientific evolutionary pathways. The visual routines provide a way to indicate the interactions among scientific subjects, and a version in a series of time slices helps further illustrate such evolutionary pathways in detail. The detailed outline offers sufficient statistical information to delve into scientific topics and routines, and then helps uncover meaningful insights with the assistance of expert knowledge. 
This empirical study focuses on scientific proposals granted by the United States National Science Foundation, and demonstrates the method's feasibility and reliability. Our method could be widely applied to a range of science, technology, and innovation policy research, and offer insight into the evolutionary pathways of scientific activities. The high adoption of smart mobile devices among consumers provides an opportunity for e-commerce retailers to increase their sales by recommending real-time, personalized coupons to consumers that take into account their specific contextual situation. Although context-aware recommender systems (CARS) have been widely analyzed, personalized pricing or discount optimization in recommender systems to improve recommendation accuracy and commercial KPIs has hardly been researched. This article studies how to model user-item personalized discount sensitivity and incorporate it into a real-time contextual recommender system in such a way that it can be integrated into a commercial service. We propose a novel approach for modeling context-aware user-item personalized discount sensitivity in a sparse data scenario and present a new CARS algorithm that combines coclustering and random forest classification (CBRF) to incorporate the personalized discount sensitivity. We conducted an experimental study with real consumers and mobile discount coupons to evaluate our solution. We compared the CBRF algorithm to the widely used context-aware matrix factorization (CAMF) algorithm. The experimental results suggest that incorporating personalized discount sensitivity significantly improves consumption prediction accuracy and that the suggested CBRF algorithm provides better prediction results for this use case. The prevalence of sites in which users can contribute content increases ordinary citizens' participation in emerging forms of knowledge sharing. 
This article investigates the practices associated with the roles of participants who actively contribute to the coproduction of knowledge in three online communities, and how these roles differ in controversial and noncontroversial threads. The Measles, Mumps, and Rubella (MMR) vaccine was selected as a contentious scientific topic because of persistent belief in an alleged link between the vaccine and autism. Contributions to three online communities that engage mothers with young children were analyzed to identify participant roles. No consistent roles were evident in noncontroversial threads, but the role of mediator consistently appeared in controversial threads in all three communities. This study helps to articulate the roles played in online communities that engage in knowledge collaboration. The variety of roles in online communities has implications for both the study and the design of information technologies. Patents sometimes cite webpages either as general background to the problem being addressed or to identify prior publications that limit the scope of the patent granted. Counts of the number of patents citing an organization's website may therefore provide an indicator of its technological capacity or relevance. This article introduces methods to extract URL citations from patents and evaluates the usefulness of counts of patent web citations as a technology indicator. An analysis of patents citing 200 US universities or 177 UK universities found computer science and engineering departments to be frequently cited, as well as research-related webpages, such as Wikipedia, YouTube, or the Internet Archive. Overall, however, patent URL citations seem to be frequent enough to be useful for ranking major US and the top few UK universities if popular hosted subdomains are filtered out, but the hit count estimates on the first search engine results page should not be relied upon for accuracy. 
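As a rough illustration of the kind of URL-citation counting the patent study describes (the authors' actual extraction pipeline is not reproduced here), a minimal sketch might pull URLs out of patent text with a regular expression, resolve each to its host, and filter out popular hosted subdomains before counting. The patent texts, the regex, and the filter list below are all illustrative assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple URL matcher; stops at whitespace and common punctuation delimiters.
URL_RE = re.compile(r'https?://[^\s,;)"]+', re.IGNORECASE)

# Hypothetical filter of popular hosted subdomains (not the study's list).
HOSTED = {"sites.google.com", "wordpress.com"}

def patent_url_citations(patent_texts):
    """Count cited web hosts across a set of patent texts."""
    counts = Counter()
    for text in patent_texts:
        for url in URL_RE.findall(text):
            host = urlparse(url).netloc.lower()
            if host and host not in HOSTED:
                counts[host] += 1
    return counts

# Hypothetical patent citation snippets.
patents = [
    "See http://www.cs.example.edu/paper.html and https://en.wikipedia.org/wiki/Foo.",
    "Prior art: http://www.cs.example.edu/tool, http://sites.google.com/view/x.",
]
counts = patent_url_citations(patents)
```

Counting at the host level rather than the full URL is what makes the resulting tallies usable as a per-organization indicator, which is the unit of analysis the abstract describes.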
Big Science and cross-disciplinary collaborations have reshaped the intellectual structure of research areas. A number of works have tried to uncover this hidden intellectual structure by analyzing citation contexts. However, none of them have analyzed citation contexts by document logical structure, such as sections. The two major goals of this study are to find characteristics of authors who are highly cited section-wise and to identify the differences in section-wise author networks. This study uses 29,158 research articles culled from the ACL Anthology, which hosts articles on computational linguistics and natural language processing. We find that the distribution of citations across sections is skewed and that different sets of highly cited authors share distinct academic characteristics, according to their citation locations. Furthermore, the author networks based on citation context similarity reveal that the intellectual structure of a domain differs across sections. SlideShare is a free social website that aims to help users distribute and find presentations. Owned by LinkedIn since 2012, it targets a professional audience but may give value to scholarship through creating a long-term record of the content of talks. This article tests this hypothesis by analyzing sets of general and scholarly related SlideShare documents using content and citation analysis and popularity statistics reported on the site. The results suggest that academics, students, and teachers are a minority of SlideShare uploaders, especially since 2010, with most documents not being directly related to scholarship or teaching. About two thirds of uploaded SlideShare documents are presentation slides, with the remainder often being files associated with presentations or video recordings of talks. SlideShare is therefore a presentation-centered site with a predominantly professional user base. 
Although a minority of the uploaded SlideShare documents are cited by, or cite, academic publications, probably too few articles are cited by SlideShare to consider extracting SlideShare citations for research evaluation. Nevertheless, scholars should consider SlideShare to be a potential source of academic and nonacademic information, particularly in library and information science, education, and business. Although peer-review and citation counts are commonly used to help assess the scholarly impact of published research, informal reader feedback might also be exploited to help assess the wider impacts of books, such as their educational or cultural value. The social website Goodreads seems to be a reasonable source for this purpose because it includes a large number of book reviews and ratings by many users inside and outside of academia. To check this, Goodreads book metrics were compared with different book-based impact indicators for 15,928 academic books across broad fields. Goodreads engagements were numerous enough in the arts (85% of books had at least one), humanities (80%), and social sciences (67%) for use as a source of impact evidence. Low and moderate correlations between Goodreads book metrics and scholarly or non-scholarly indicators suggest that reader feedback in Goodreads reflects the many purposes of books rather than a single type of impact. Although Goodreads book metrics can be manipulated, they could be used guardedly by academics, authors, and publishers in evaluations. Although news stories target the general public and are sometimes inaccurate, they can serve as sources of real-world information for researchers. This article investigates the extent to which academics exploit journalism using content and citation analyses of online BBC News stories cited by Scopus articles. A total of 27,234 Scopus-indexed publications have cited at least one BBC News story, with a steady annual increase. 
Citations were more likely from the arts and humanities (2.8% of publications in 2015) and social sciences (1.5%) than from medicine (0.1%) and science (<0.1%). Surprisingly, half of the sampled Scopus-cited science and technology (53%) and medicine and health (47%) stories were based on academic research, rather than otherwise unpublished information, suggesting that researchers have chosen a lower-quality secondary source for their citations. Nevertheless, the BBC News stories that were most frequently cited by Scopus, Google Books, and Wikipedia introduced new information from many different topics, including politics, business, economics, statistics, and reports about events. Thus, news stories are mediating real-world knowledge into the academic domain, a potential cause for concern. The paper examines the research performance of European universities in a disaggregated way, using a large array of indicators from Scopus publications, including indicators of volume (number of articles; number of citations) and indicators of quality (percentage of publications in top 10% and top 25% SNIP journals; percentage of citations from top 10% and top 25% journals). These indicators are considered dependent variables in a multi-level estimation framework, in which research performance in a scientific area depends on variables at the level of the university and at the level of the external regional environment. The area examined is Medicine, for the 2007-2010 period. The paper exploits for the first time the integration of publication data with the census of European universities (ETER). A large number of hypotheses are tested and discussed. The objective of this research is to elaborate new criteria for evaluating the significance of the research results achieved by scientific teams. It is known that the h-index (Hirsch index) is used to evaluate scientific organizations as well as individual scientific workers. 
On the one hand, such a scientometric indicator as the "h-index of a scientific organization" objectively reflects the organization's scientific potential. On the other hand, it does not always adequately reflect the significance that the results of a scientific team's research activity have for the scientific megaenvironment (the scientific community). The i-index has an even greater disadvantage, being principally limited by the size of a scientific team, although the h-index is also dependent on the number of publications. Without diminishing the significance of the traditional parameters for monitoring the research activity of scientific organizations, including institutions of higher education, the authors stress the necessity of using not only the traditional indicators but also other parameters reflecting the significance of a scientific team's research results for the scientific community. It should also not be forgotten that a scientific team is a social system whose functioning is not limited to the "sum" of individual scientific workers' activities. The authors suggest new criteria for the significance of the research activity of scientific teams, which are suited to specific uses and hence should be applied with great caution; it is most appropriate to use the authors' criteria for analyzing the dynamics of the research activity of scientific teams (following the principle "Compare yourself with yesterday's yourself"). The authors' proposed citation-based indicators make it possible to evaluate the true significance of the research activity of a scientific team for the scientific community; while defining and justifying the new criteria, the authors also took into consideration the topicality of such a problem as the struggle against the self-citation effect (in a wider context, the problem of combating the artificial "improvement" of scientometric indicators). 
The methodological basis of the research is formed by the systems, metasystem, probabilistic-statistical, synergetic, sociological, and qualimetric approaches. The research methods are the analysis of the problem situation, the analysis of the scientific literature and of best practices in research activity management at institutions of higher education (benchmarking), cognitive, structural-functional, and mathematical modelling, the methods of graph, set, and relation theory, the methods of qualimetry (the theory of latent variables), and the methods of probability theory and mathematical statistics. Concerns regarding the high level of research and development (R&D) expenditure on military technology have prompted many nations to pursue a dual-use regime in military R&D. However, the value of dual-use military technology has not yet been quantitatively investigated. We explore whether military technology with a higher level of duality has been more valuable than that with a lower level of duality. We assume that patents on valuable military technology were renewed until the end of their term. We retrieve military patents from the United States Patent and Trademark Office during 1976-2014 based on their International Patent Classification (IPC) of F41 or F42. Then, we propose three indicators to assess their duality level. The first indicator is based on the determination of whether the patented technology is utilizable in both the military and the civilian sectors using its IPC. For the second indicator, we estimate the potential for convergence of a patented technology with various technological fields using the degree centrality of the IPC's co-occurrence network. The third indicator is based on the ratio of forward citations from the civilian sector to the total number of forward citations, as a measurement of technology diffusion toward the civilian sector. 
Using logistic regression, we found that the first two indicators are positively associated with the patent renewal decision, while the last indicator is nonsignificant. The effects of the two significant indicators suggest that military technologies are more valuable when the technology itself can be used in various sectors, including the civilian sector, and can be converged with technologies in different fields. However, the nonsignificant influence of the third variable suggests that the relation between patent value and diffusion effects toward subsequent inventions is not confined to the civilian sector. Our findings provide evidence of the impact of dual-use policies in military R&D. Technologies play an important role in the survival and development of enterprises. Understanding and monitoring the core technological components (e.g., technology process, operation method, function) of a technology is an important issue for researchers in developing R&D policy and managing product competitiveness. However, it is difficult to identify core technological components from a mass of terms, and we may experience difficulties in describing complete technical details and understanding term-based results. This paper proposes a Subject-Action-Object (SAO)-based method, in which (1) a syntax-based approach is constructed to extract the SAO structures describing the function, relationship, and operation in specified topics; (2) a systematic method is built to extract and screen technological components from SAOs; and (3) we propose a "relevance indicator" to calculate the relevance of the technological components to requirements, and finally identify core technological components based on this indicator. Based on considerations of requirements and novelty, the core technological components identified have great market potential and can be useful in monitoring and forecasting new technologies. 
An empirical study of graphene is performed to demonstrate the proposed method. The resulting knowledge may hold interest for R&D management and corporate technology strategies in practice. Technology licensing is viewed as the key factor for activating sleeping patents. This study re-examines the relationship between firm size and technology licensing activity. The empirical results show that there is a U-shaped relationship between firm size and technology licensing. However, this U-shaped relationship appears only in markets with high competition, which confirms the moderating role of technology competition in the relationship between firm size and technology licensing. Chinese firms lag behind developed countries in terms of licensing strategies; e.g., Chinese firms have fewer cross-licensed patents. China's export-oriented firms show a relatively more positive licensing propensity compared with non-export-oriented firms, while large, small, and medium-sized firms do not show essentially different willingness to license out their patents. China's state-owned firms are less likely to license out their patents than private firms. Policy implications are presented at the end of this study. Our objective was to study the relationship between the design and content of randomized clinical trials (RCTs) and the subsequent number of citations in the medical literature and attention in online news and social media. We studied RCTs published during 2014 in five highly cited medical journals. This was a retrospective review focusing on characteristics of the individual trials and measures of citation and lay media attention. Primary outcome measures included citation count and Altmetric scores (a composite score measuring attention in news, blogs, Twitter, and Facebook). Two hundred and forty-two RCTs were included in the final analysis. 
Trial characteristics that were positive predictors of citation count included investigation of Hepatitis C treatment (r = 0.35, p < 0.001), private funding (r = 0.24, p < 0.001), mortality-related endpoint (r = 0.22, p < 0.001), and research setting within the United States (r = 0.13, p < 0.001). The trial characteristic that positively predicted Altmetric score was the population size potentially affected (r = 0.39, p < 0.001). The only negative predictor of citation count was the size of the population potentially affected (r = -0.21, p < 0.001). Negative predictors of the Altmetric score included investigation of Hepatitis C treatment (r = -0.21, p < 0.001) and private funding (r = -0.13, p < 0.001). While correlation magnitudes were weak, the predictors of biomedical literature citation and non-academic media coverage were different. These predictors may affect editorial decisions and, given the rising influence of health journalism, further study is warranted. The aim of this paper is to explore the power-law relationship between the degree centrality of countries and their citation-based performance in Management Information Systems (MIS) research. We analyzed 27,662 articles that received 127,974 citations. The distribution of citation-based performance follows a power law with an exponent of -2.46 ± 0.05. The distribution of the degree centrality of countries follows a power law with an exponent of -2.26 ± 0.24. Citation-based performance and degree centrality exhibited a power-law correlation with a scaling exponent of 1.22 ± 0.04. Citations to the articles of a country in MIS tend to increase 2^1.22, or about 2.33, times each time the country doubles its degree centrality in the international collaborative network. Policies that encourage a country to increase its degree centrality in a collaboration network can therefore disproportionately increase the impact of its research. 
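The scaling relation reported above reduces to simple arithmetic: if citation-based performance scales with degree centrality k as k^1.22, then growing k by some factor multiplies expected performance by that factor raised to 1.22. A minimal sketch (the function name is an illustrative assumption; only the exponent comes from the study):

```python
# Sketch of the power-law scaling described above: citation-based
# performance is assumed proportional to k**alpha, where k is a country's
# degree centrality and alpha = 1.22 is the fitted scaling exponent.

def performance_multiplier(factor: float, alpha: float = 1.22) -> float:
    """Multiplicative change in performance when centrality grows by `factor`."""
    return factor ** alpha

# Doubling degree centrality multiplies expected citations by ~2.33.
doubling_gain = performance_multiplier(2.0)
print(round(doubling_gain, 2))  # ≈ 2.33
```

The same function shows the disproportionality the abstract notes: any growth factor above 1 yields a performance gain larger than the growth itself, because the exponent exceeds 1.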
Taking advantage of easy access to rich and massive scholarly data, more and more researchers are focusing on analyzing and utilizing scholarly big data. Among these studies, evaluating the scientific impact of scholars is of significant importance. Measuring the scientific impact of scholars can not only provide a basis for applications for academic foundations and awards, but also shed light on research directions for scholars. Currently, citation-based methods and network-based metrics are the most commonly used ways to evaluate scientific impact. However, these approaches ignore several important factors, namely the dynamics of citations and the initial qualities of different articles. To alleviate these shortcomings, we propose a Time-aware Ranking algorithm (TRank) to evaluate the impact of scholars. Because scholars are persistently concerned with academic innovation, the TRank algorithm gives more credit to newly published papers, as well as their references, according to representative time functions. Our method also combines the merits of random-walk algorithms and heterogeneous network topology, i.e. the mutual influences among different scholarly entities in heterogeneous academic networks. To validate the suitable time function for the TRank algorithm and explore its performance, we conduct experiments on two real datasets: (1) the Digital Bibliography and Library Project, and (2) the American Physical Society. The experimental results demonstrate that our algorithm outperforms other methods both in selecting outstanding scholars and in evaluating the overall impact of scholars. Biological collections are sources of knowledge, particularly critical to understanding life when they house specimens from megadiverse countries. However, the scientific value of biological collections is usually unknown because of the lack of an explicit link between knowledge and specimens. 
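The time-weighting idea behind a TRank-style score can be sketched in a few lines: each paper's citation credit is discounted by the paper's age, so recent work earns more. This is only an illustration of the decay principle; the decay rate `lam` and the linear weighting are assumptions, not the time functions or random-walk machinery of TRank itself:

```python
import math

# Hedged sketch of a time-aware citation score: citation credit is
# discounted exponentially in the age of the cited paper, so recently
# published work earns more credit per citation. The decay rate `lam`
# is an illustrative assumption.

def time_aware_score(n_citations: int, pub_year: int,
                     now: int = 2017, lam: float = 0.2) -> float:
    """Citation count discounted by the paper's age at year `now`."""
    age = now - pub_year
    return n_citations * math.exp(-lam * age)

# Under this weighting, a 2015 paper with 5 citations outscores a
# 2005 paper with 8 citations.
recent = time_aware_score(5, pub_year=2015)
older = time_aware_score(8, pub_year=2005)
print(recent > older)  # True
```

A full TRank-style ranking would propagate such weights through a heterogeneous author-paper-venue network via random walks; the decay term above is the piece that makes the walk time-aware.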
Here we compiled 628 papers from 152 journals that used collection objects from the Colecciones Biológicas del Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Colombia (IAvH-CB) as sources. The compilation was largely based on expert knowledge. However, to assess the performance of our method we compared our results with those obtained by conducting automatic searches in academic databases. We calculated different metrics and depicted geographical, taxonomic, and bibliometric trends. We found that the geographic coverage of the IAvH-CB objects used in the studies is largely regional or national. Taxonomically, we found records of 176 families in 61 orders of taxa, but there is large variation in the number of studies among different groups. The bibliometric analyses indicated that there is a growing trend in the number of publications and citations over the years, and that the citation number, as well as the h-index of this set of papers, is comparable to the knowledge produced by major researchers in Colombia and of similar magnitude to the production of relatively small or medium-sized collections in the USA. The compilation method performed well, with broad coverage and an omission rate below 8% compared with automated searches. However, we conclude that the two approaches, expert knowledge and automated searches, are complementary. The IAvH-CB is a massive source of scientific knowledge about Colombian biodiversity and is instrumental for documenting basic issues about taxa in the country. We analyze the scientific productivity of the public state universities located outside the metropolitan area of Mexico City through the outputs of researchers who have had the benefit of being "New Full Time Professors" (NFTP) under the "Faculty Improvement Program" (PRODEP). This program was designed by the Mexican government with special emphasis on supporting the creation of new full-time positions in these universities. 
A good proportion of professors hired as NFTP in the universities mentioned earlier became members of the national system of researchers (SNI, Sistema Nacional de Investigadores). As a consequence of these two public policies, SNI and PRODEP, our results show that research groups at these universities are reducing their science gap with respect to the stronger research institutions located in Mexico City, in terms of outputs published in mainstream journals with high impact factors. We found that engineering is the area that benefited most from the PRODEP program in the period 2010-2015. The field of dissemination and implementation (D&I) research in health has grown considerably in the past decade. Despite the potential for advancing the science, limited research has focused on mapping the field. We administered an online survey to individuals in the D&I field to assess participants' demographics and expertise, as well as engagement with journals and conferences, publications, and grants. A combined roster-nomination method was used to collect data on participants' advice networks and collaboration networks; participants' motivations for choosing collaborators were also assessed. Frequency and descriptive statistics were used to characterize the overall sample; network metrics were used to characterize both networks. Among a sub-sample of respondents who were researchers, regression analyses identified predictors of two metrics of academic performance (i.e., publications and funded grants). A total of 421 individuals completed the survey, representing a 30.75% response rate among eligible individuals. Most participants were White (n = 343), female (n = 284, 67.4%), and identified as researchers (n = 340, 81%). Both the advice and the collaboration networks displayed characteristics of a small-world network. 
The most important motivations for selecting collaborators were aligned with advancing the science (i.e., prior collaboration, strong reputation, and being a good collaborator) rather than relying on human proclivities for homophily, proximity, and friendship. Among a sub-sample of 295 researchers, expertise (an individual predictor), status (advice network), and connectedness (collaboration network) were significant predictors of both metrics of academic performance. Network-based interventions can enhance collaboration and productivity; future research is needed to leverage these data to advance the field. The article evaluates 48 approaches to leadership, for the first time, for the period 1965-2016 using bibliometric methods. Our analysis combines four parameters: the number of publications, citations, self-citations, and dispersion of literature (a new parameter). The question of which approach to leadership is most influential is addressed. We argue that the interplay among the four attributes observed in this study shows that transformational leadership is the most influential and popular approach in leadership studies today. However, there remains the question of whether the excessive self-citations in this approach to leadership play a role in its augmented visibility. We investigate whether, and to what extent, scientists tend to diversify their research activity, and whether this tendency varies according to their disciplinary area. We analyze the nature of research diversification along three dimensions: extent of diversification, intensity of diversification, and degree of relatedness of the topics in which researchers diversify. For this purpose we propose three bibliometric indicators, based on the disciplinary placement of the scientific output of individual scientists. 
The empirical investigation shows that the extent of diversification is lowest for scientists in Mathematics and highest in Chemistry; intensity of diversification is lowest in Earth sciences and highest in Industrial and information engineering; and degree of relatedness is lowest in Earth sciences and highest in Chemistry. In this study, the evolution of the connected health concept is analysed and visualized to investigate the ever-tightening relationship between health and technology, as well as emerging possibilities regarding the delivery of healthcare services. A scientometric analysis was undertaken to investigate the trends and evolutionary relations between health and information systems through queries in the Web of Science database using terms related to health and information systems. To understand the evolutionary relation between different concepts, scientometric analyses were conducted within five-year intervals using the VantagePoint, SciMAT, and CiteSpace II software. Consequently, the main stream of publications related to the connected health concept, matching the telemedicine cluster, was determined. All other developments in health and technologies were discussed around this main stream across the years. The trends obtained through the analysis provide insights into the future of the healthcare-technology relationship, particularly the rising importance of privacy and personalized care, along with mobile networks and mobile infrastructure. Synthetic biology is an emerging domain that combines biological and engineering concepts and has seen rapid growth in research, innovation, and policy interest in recent years. This paper contributes to efforts to delineate this emerging domain by presenting a newly constructed bibliometric definition of synthetic biology. 
Our approach starts from a core set of papers in synthetic biology, using procedures to obtain benchmark synthetic biology publication records, extract keywords from these benchmark records, and refine the keywords, supplemented with articles published in dedicated synthetic biology journals. We compare our search strategy with other recent bibliometric approaches to defining synthetic biology, using a common source of publication data for the period from 2000 to 2015. The paper details the rapid growth and international spread of research in synthetic biology in recent years, demonstrates that diverse research disciplines are contributing to the multidisciplinary development of synthetic biology research, and visualizes this by profiling synthetic biology research on the map of science. We further show the roles of a relatively concentrated set of research sponsors in funding the growth and trajectories of synthetic biology. In addition to discussing these analyses, the paper notes limitations and suggests lines for further work. This paper is a result of the WOW project (Wind power On Wikipedia), which forms part of the SAPIENS (Scientometric Analyses of the Productivity and Impact of Eco-economy of Spain) project (Sanz-Casado et al. in Scientometrics 95(1):197-224, 2013). WOW is designed to observe the relationship between scholarly publications and societal impact or visibility through the mentions of scholarly papers (journal articles, books and conference proceedings papers) in the English version of Wikipedia. 
We determine (1) the share of scientific papers, from a specific set defined by Wind Power research in the Web of Science (WoS) 2006-2015, that are included in Wikipedia entries, named data set A; (2) the distribution of scientific papers in Wikipedia entries on Wind Power, named data set B, captured via the three categories for the topic Wind Power in the Wikipedia Portal: Wind Power, Wind turbines and Wind farms; and (3) the distributions of document types in the two wiki entry data sets' reference lists. In parallel, the paper aims to design and test indicators that measure societal impact and R&D properties of Wikipedia, such as a wiki reference focus measure and a density measure of reference types in wiki entries. The study is based on Web mining techniques and purpose-built software that extracts a range of different types of Wikipedia references from data sets A and B. Findings show that in data set A 25.4% of the wiki references are academic, with a density of 17.62 academic records detected per wiki entry. However, only 0.62% of the original WoS records on Wind Power are also found as wiki references, implying that the direct societal impact of Wind Power research through Wikipedia is extremely small. In the second Wikipedia set on Wind Power (data set B), the presence of scientific papers is even more insignificant (10.6%; density: 3.08; WoS paper percentage: 0.26%). Notwithstanding, Wikipedia can be used as a tool to inform about the transfer from scholarly publications to popular and non-peer-reviewed publications, such as Web pages (news, blogs), popular magazines (science/technology) and research reports. Non-scholarly wiki reference types account for 74.6% of the wiki references in data set A and almost 90% in data set B. 
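The share and density indicators described above reduce to two simple ratios over the harvested reference lists. A minimal sketch (the function names and the counts in the example are illustrative assumptions, not the study's raw data):

```python
# Sketch of the two wiki-reference indicators described above:
# share  = academic references as a percentage of all wiki references;
# density = academic references per wiki entry.
# The counts below are illustrative only.

def reference_share(academic_refs: int, total_refs: int) -> float:
    """Percentage of wiki references that are academic."""
    return 100.0 * academic_refs / total_refs

def reference_density(academic_refs: int, n_entries: int) -> float:
    """Average number of academic references per wiki entry."""
    return academic_refs / n_entries

# Illustrative counts: 254 academic refs out of 1000 total, in 15 entries.
print(reference_share(254, 1000))               # 25.4 (%)
print(round(reference_density(254, 15), 2))     # refs per entry
```

The third reported figure, the WoS paper percentage, is the analogous ratio taken over the original WoS record set rather than over the wiki reference lists.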
Interestingly, the few WoS articles in wiki entries on Wind Power received on average 34.3 citations during the period 2006-2015, whereas WoS Wind Power publications not mentioned in wiki entries received on average only 5.9 citations. Owing to the scarcity of Wind Power research papers in Wikipedia, it cannot be applied as a direct source in the evaluation of Wind Power research. This is in line with other recent studies regarding other subject areas. However, our analysis presents and discusses six supplementary indirect indicators for research evaluation, based on the publication types found in the wiki entry reference lists: share of (WoS) records; density; and reference focus, plus popular science knowledge export, non-scholarly knowledge export and academic knowledge export. The same indicators are direct measures of Wikipedia reference properties. Knowledge organization systems (KOS) have long been established as a tool to represent organized interpretations of knowledge structures. Various types of KOS, such as discipline trees for research projects, subject categories for research publications, and classification schemes for research patents, have been constructed and widely used in R&D contexts. However, incompatible KOS, together with information proliferation in the Big Data era, impose great challenges for effective research management. How to facilitate interoperability among heterogeneous research information sources is an important problem to be solved. KOS mapping methods were proposed to provide alternative subject access by establishing equivalence or partial-equivalence relationships between classes in different KOS, but they suffered from "lagging mapping" and "deficient mapping". This research proposes a social network approach that leverages online social platform information (i.e., research activities and social activities) for KOS mapping. 
The underlying assumption behind the approach is that "two classes/terms in different KOS are related if their corresponding research objects are connected to similar researchers". The social network approach leverages social network analysis, instead of semantic and structural analysis of metadata information, to calculate the degree of mapping. The approach has been implemented on the largest research social platform in China and successfully realizes mapping between KOS for research management purposes. We conducted mapping between the National Natural Science Foundation of China discipline tree and the Web of Science subject categories in this study to examine the performance of the social network approach. This paper offers an overview of the bibliometric study of the domain of library and information science (LIS), with the aim of giving a multidisciplinary perspective on its topical boundaries, main areas and research tendencies. Based on a retrospective and selective search, we obtained the bibliographical references (title and abstract) of academic production on LIS in the LISA database for the period 1978-2014, which runs to 92,705 documents. Using the statistical technique of topic modeling, we apply latent Dirichlet allocation in order to identify the main topics and categories in the corpus of documents analyzed. The quantitative results reveal the existence of 19 important topics, which can be grouped into four main areas: processes, information technology, library, and specific areas of information application. Prior research shows that article reader counts (i.e. saves) on the online reference manager Mendeley correlate with future citations. There are currently no evidence-based distribution strategies that have been shown to increase article saves on Mendeley. 
We conducted a 4-week randomized controlled trial to examine how promotion of article links in a novel online cross-publisher distribution channel (TrendMD) affects article saves on Mendeley. Four hundred articles published in the Journal of Medical Internet Research were randomized to either the TrendMD arm (n = 200) or the control arm (n = 200) of the study. Our primary outcome compares the 4-week mean Mendeley saves of articles randomized to TrendMD versus control. Articles randomized to TrendMD showed a 77% increase in article saves on Mendeley relative to control. The difference in mean Mendeley saves for TrendMD articles versus control was 2.7, 95% CI (2.63, 2.77), and statistically significant (p < 0.01). There was a positive correlation between pageviews driven by TrendMD and article saves on Mendeley (Spearman's rho = 0.60). This is the first randomized controlled trial to show how an online cross-publisher distribution channel (TrendMD) enhances article saves on Mendeley. While replication and further study are needed, these data suggest that cross-publisher article recommendations via TrendMD may enhance citations of scholarly articles. In this paper, we propose a methodology to detect latent referential articles through a universal, citation-based investigation. We discuss articles' dynamic vitality performance, concealed in their citation distributions, in order to understand the mechanisms that govern which articles are likely to be referenced in the future. Articles have diverse vitality performances, expressed in the number of citations obtained in different time periods. Through an examination of the correlation between articles' future citation counts and their past citations, we establish the optimal time period over which to forecast articles' future referential possibilities. The results show that the latest 2 years is the optimal time period. 
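The correlation test underlying the 2-year window result can be sketched with a rank correlation between each article's recent citations and its subsequent citations. A minimal, self-contained Spearman implementation on toy counts (the data are illustrative, not from the study; ties are assumed absent for simplicity):

```python
# Hedged sketch of the correlation analysis described above: rank-correlate
# each article's citations in the most recent 2-year window with its
# citations in a later period. Toy counts only; assumes no tied values.

def spearman(xs, ys):
    """Spearman's rho for tie-free samples, via the shortcut
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) on the ranks."""
    n = len(xs)
    rx, ry = [0] * n, [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: xs[i])):
        rx[i] = r
    for r, i in enumerate(sorted(range(n), key=lambda i: ys[i])):
        ry[i] = r
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

recent_2yr = [12, 3, 7, 25, 1]   # citations in the latest 2 years (toy)
future     = [10, 2, 8, 30, 0]   # citations in the following year (toy)
rho = spearman(recent_2yr, future)
```

Repeating this computation for windows of different lengths, and picking the window with the highest rho, is the shape of the optimization that selects the 2-year period.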
In other words, the correlation between articles' future citation counts and their past citation counts reaches a maximum for the most recent 2-year period. Articles with a higher vitality performance in the most recent 2 years have a higher probability of being cited as references in the future. These results not only help in understanding the mechanisms generating references, but also provide an additional indicator for decision makers to evaluate the academic performance of individuals according to their citation performance in the latest 2 years. The Cooperative Patent Classification (CPC) system, recently developed cooperatively by the European and US Patent Offices, provides a new basis for mapping patents and for portfolio analysis. CPC replaces the International Patent Classifications (IPC) of the World Intellectual Property Organization. In this study, we update our routines, previously based on IPC, for CPC and use the occasion to rethink various parameter choices. The new maps are significantly different from the previous ones, although this may not always be obvious on visual inspection. We provide nested maps online and a routine for generating portfolio overlays on the maps; a new tool is provided for "difference maps" between the patent portfolios of organizations or firms. This is illustrated by comparing the portfolios of patents granted to two competing firms, Novartis and MSD, in 2016. Furthermore, the data are organized for the purpose of statistical analysis. In striving for academic relevance and recognition, editors exert a significant influence on a journal's mission and content. We examine how characteristics of editors, in particular the diversity of editorial teams, are related to journal impact. Our sample comprises 2244 editors who were affiliated with 645 volumes of 138 business and management journals. 
Using multi-level modeling, we relate editorial team characteristics to journal impact as reflected in three widely used measures: Five-year Impact Factor, SCImago Journal Rank, and Google Scholar h5-index. Results show that multiple editorships and editors' affiliation with institutions of high reputation are positively related to journal impact, while the length of editors' terms is negatively associated with impact scores. Surprisingly, we find that the diversity of editorial teams in terms of gender and nationality is largely unrelated to journal impact. Our study extends the scarce knowledge on editorial teams and their relevance to journal impact by integrating different strands of literature and studying several demographic factors simultaneously. Results indicate that an editorial team's scientific achievement is more decisive for journal impact than team diversity. The study has useful implications for the composition of editorial teams. The essential function of scientific disciplines is to generate knowledge using the scientific method. In a context characterised by acknowledgement of methodological complementarity and the existence of innumerable mechanisms for production (articles, theses...) and dissemination (databases, platforms...), production is a measure of any given discipline's normative development and projection. As a result of this situation, in Pedagogy, and more specifically in Educational Theory, we can witness a growing concern with systematising knowledge in a complex and dynamic context based on reference frameworks. Therefore, this article aims to describe scientific production in Educational Theory, with the intention of discovering the creation and dissemination processes involved in theses retrieved from the Spanish doctoral theses database TESEO (2001-2015). A descriptive-comparative census study was carried out, with a total of n = 229 sampling units. 
The results show the following: thematic axes, which categorise thesis content into groups, are present; dissemination is influenced by the non-specific use of keywords, as well as by the area's organisation and structure in different universities; and there are no statistically significant differences in terms of author gender. In conclusion, we can identify an ever greater intention to provide the area with more scientific content but little concern with increasing this content's visibility in databases such as TESEO. All in all, research production in Spain relating to Educational Theory is a good representative sample of researchers' efforts to project their interest in discovering, understanding and improving education. Therefore, special care should be taken when compiling production data in databases. Impact factors are commonly used to assess journals' relevance. This implies a simplified view of science as a single-stage linear process. As a consequence, a few top-tier journals are one-sidedly favored as outlets, such that submissions to top-tier journals explode whereas other journals are short of submissions. Consequently, the often-claimed gap between research and practical application in application-oriented disciplines such as business administration is not narrowing but becoming entrenched. A more complete view of the scientific system is needed to fully capture journals' contributions to the development of a discipline. Simple citation measures, such as citation counts, are commonly used to evaluate scientific work. There are many known dangers of mis- or over-interpretation of such simple data, and this paper adds to this discussion by developing an alternative way of interpreting a discipline based on the positions and roles of journals in their wider network. Specifically, we employ ideas from the network-analytic approach. Relative positions allow direct comparison between different fields. 
Similarly, the approach provides a better understanding of the diffusion process of knowledge, as it differentiates positions in the knowledge creation process. We demonstrate how different modes of social capital create different patterns of action that require a multidimensional evaluation of scientific research. We explore different types of social capital and the intertwined relational structures of actors to compare journals with different bibliometric profiles. Ultimately, we develop a multi-dimensional evaluation of actor roles based upon multiple indicators, and we test this approach by classifying management journals based on their bibliometric environment. This paper is devoted to the challenges of measuring, analyzing and visualizing the research capacity of a university. We identify the related methodological issues, propose solutions and apply these solutions to a complex analysis of the research potential of three departments of a Russian university. First, we briefly review the current literature on different aspects of the analysis of the research capacity of a university. The next step is a discussion of the key challenges faced when analyzing the publication activity of a university. Further, we discuss the opportunities offered by, and the limitations of, using the Web of Science and Scopus databases to determine the research capabilities of universities. In the empirical section of the paper, we analyse the research capacity of university departments and individual employees using simple yet illustrative tools of bibliometric analysis. We also make recommendations for university administrative personnel, which can be derived from our analysis. This study assesses the knowledge-building dynamics of emerging technologies, their participating country-level actors, and their interrelations. We examine research on induced pluripotent stem (iPS) cells, a recently discovered stem cell species. 
Compared to other studies, our approach conflates the totality of publications and patents of a field, and their references, into single "techno-scientific networks" across intellectual bases (IB) and research fronts (RF). Diverse mapping approaches (co-citation, direct citation, and bibliographic coupling networks) are used, driven by the problems tackled by iPS cell researchers. Besides studying the field of iPS cells as a whole, we assessed the roles of relevant countries in terms of "knowledge exploration," "knowledge nurturing," "knowledge exploitation," and cognitive content. The results show that a fifth of the nodes in the IB and of the edges in the RF interconnect science (S) and technology (T). The S and T domains tell different yet complementary stories: S overstresses upstream activities, while T captures the increasingly influential role of application domains and general technologies. Both S and T reflect the path-dependent nature of iPS cells in embryonic stem cell technologies. Building on the feedback between IB and RF, we examine the dominating role of the United States. Japan, the pioneer, falls behind in quantity, yet its global influence remains intact. New entrants, such as China, are advancing rapidly, yet, cognitively, the bulk of their efforts are still upstream. Our study demonstrates the need for bibliometric assessment studies to account for S&T co-evolution. The multiple-data-source, integrated bibliometric approaches of this study are initial efforts in this direction. Editorial boards of cardiology journals are analyzed to find patterns of power positions in cardiology publications. The study covers the meso (journal), macro (national) and micro (individual researcher) levels. It was confirmed that editors are prominent members of the research community as far as both their publication productivity and their citation impact are concerned. 
It was also confirmed that the publication and citation excellence of the editors, and even more so of the Editors-in-Chief, is positively correlated with the standing of the journal as measured, e.g., by the impact factor. Debate and controversy concerning the issue of climate change generally hinder and obstruct social and governmental action on this issue. This paper analyses the scientific background, i.e. the reference list, of the IPCC Fifth Assessment Report "The Physical Science Basis" and of an alternative climate change report by a US think tank institute, "Climate Change Reconsidered II: Physical Science". We compared these two reports to the antecedent reports from 2007 (IPCC AR4 WGI) and 2009 (Climate Change Reconsidered). For the purposes of the study, we developed a database containing all the references collected from the four reports. The bibliometric analysis focused on the distribution of references among peer-reviewed scientific journals and the most frequently cited lead authors, creating the basis for evaluating the reports' different scientific emphases. Our findings underline that there is still no convergence between the scientific literature of the IPCC and that of the contrarian reports; however, the remarkable quantitative development on both sides and the qualitative progress of the IPCC report allow us to draw somewhat surprising conclusions in the context of climate change science. Contrary to expectations, controversy is beneficial to the science of climate change, as it fosters the review process on both sides of the debate. A two-step cluster analysis was performed on the absolute downloads and the relative downloads of Chinese journal papers published between 2006 and 2008. Four patterns were identified from the perspective of absolute downloads; the first three patterns can be expressed as power functions, signifying their evident aging trends, although this does not apply to pattern 4. 
Two patterns were identified from the perspective of relative downloads, and both present power distributions with minor differences in the speed of decline. Furthermore, we delved into the relationships between total downloads and article features in the various patterns and found that there are only weak correlations between total downloads and title length, number of authors, and number of keywords. However, there are moderate to high correlations between initial downloads (defined as downloads made during the first year after publication) and total downloads, suggesting that it is possible to forecast total downloads from initial downloads. Additionally, it was found that the total downloads of highly downloaded papers have no correlation with article features. Social media platforms exert a growing influence on academic communication with the development of Web 2.0+ technology. WeChat (WeiXin in Chinese) is a social media tool that academics can use to publicly communicate about research developments and findings. The development of WeChat not only enriches the set of social media tools but also provides a new perspective and method for the evaluation of academic influence. Traditional metrics, which are based on citation methods of evaluating and filtering articles, do not necessarily provide immediate or reproducible feedback. As a popular social media tool, WeChat makes possible a novel and promising approach to impact evaluation that supplements traditional citation metrics. WeChat can capture not only the social influence of non-academic findings but also the academic impact of scholarship. The proposed WeChat index may be a useful and timely metric for measuring the uptake of research findings by the public in real time. In short, WeChat greatly supplements offline methods of research dissemination and networking. 
The study of acknowledgments as a source of data on research funding is gaining ground in science, as exemplified by the present inclusion of this information in bibliographic databases such as Web of Science (WoS). The objective of this paper is to explore the completeness and accuracy of WoS in extracting and processing funding acknowledgment data. To this end, a random sample of articles published in eight thematic areas, selected from the scientific output of Spain in 2014, is analyzed. Funding information that appears in original articles is recorded by WoS in the "funding text" field, but is also extracted to the "funding agency" and "grant number" subfields. In the extraction process, some funding information was lost in 12% of the articles, and the distribution of agencies and grant numbers by subfield was not always consistent. In addition, funding support is often incompletely reported by authors: in about half of the articles we studied, the country of origin of the funder or the grant number(s) were not mentioned. We propose and discuss the need to develop more detailed guidelines on how to acknowledge funding. More accurate documentation of funding sources in published articles would benefit researchers, funders and journals, and enhance the reliability and usefulness of studies on funding acknowledgments. Bibliometrics is widely used as an evaluation tool to assist prospective R&D decision-making. In the UK, for example, the National Institute for Health Research (NIHR) has employed bibliometric analysis alongside wider information in several awarding panels for major funding schemes. In this paper, we examine various aspects of the use of bibliometric information by members of these award selection panels, based on interviews with ten panel members from three NIHR panels, alongside analysis of the information provided to those panels. 
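As a rough illustration of the agency/grant-number extraction discussed above, the toy parser below splits a funding statement in which grant numbers follow the agency name in square brackets. The bracket convention and the example string are assumptions for illustration only, not the actual WoS extraction algorithm:

```python
import re

def parse_funding(text):
    """Split one funding statement into (agency, [grant numbers]).

    Toy heuristic: grant numbers appear in square brackets after the agency.
    """
    m = re.match(r"\s*(?P<agency>[^\[]+?)\s*\[(?P<grants>[^\]]+)\]", text)
    if not m:
        return text.strip(), []  # no grant numbers reported
    grants = [g.strip() for g in m.group("grants").split(",")]
    return m.group("agency").strip(), grants

# Hypothetical funding statement, purely for illustration
agency, grants = parse_funding(
    "Spanish Ministry of Economy and Competitiveness [CSO2014-0001, CSO2014-0002]"
)
```

A statement with no bracketed part, such as one that omits the grant number entirely, falls through to the no-grants branch, which mirrors the incomplete reporting the study observed.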
The aim of the work is to determine what influence bibliometrics has on their decision-making, to see which types of bibliometric measures they find more and less useful, and to identify the challenges they face when using these data. We find that panel members broadly support the use of bibliometrics in panel decision-making, and that the data are primarily used in the initial individual assessment of candidates, playing a smaller role in the selection panel meeting. Panel members felt that the most useful measures of performance are normalised citation scores and the number or proportion of papers in the most highly cited X% (e.g. 5%, 10%) for the field. Panel members expressed concerns about the comparability of bibliometrics between fields, but the discussion suggested this largely reflects a lack of understanding of bibliometric techniques, confirming that effective background information is important. Based on the evidence around panel behaviour and concerns, we set out guidance on providing bibliometrics to research funding panels. The present study extends prior analysis of the link between winning the Nobel Prize in Economic Sciences and winning the John Bates Clark Medal, arguably the two most prestigious awards in the discipline, by examining the length of time between bestowal of the two awards using a right-censored tobit model. In doing so, we find that an increase of one standard deviation in the impact of a Clark Medal winner's research portfolio is enough to reduce the time span between the two honors by almost 10 years, an important consideration given that the more prestigious award, the Nobel Prize, is not conferred posthumously. Furthermore, our results suggest that institutional affiliation is also important in reducing the time between bestowals. In this regard, affiliation with any one of three institutions, namely Princeton, Stanford and Chicago, appears to reduce this time span significantly, by 10 or more years. 
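A right-censored tobit of the kind used in the award-timing analysis can be sketched by maximum likelihood: observations at the censoring threshold contribute a survival term, uncensored ones a density term. The simulated data, coefficients, and threshold below are illustrative assumptions, not the study's specification:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y, upper):
    """Negative log-likelihood of a right-censored tobit model.

    Observations with y >= upper are treated as censored: we only know
    the latent value exceeds the threshold.
    """
    *beta, log_sigma = params
    sigma = np.exp(log_sigma)
    xb = X @ np.asarray(beta)
    censored = y >= upper
    ll = np.where(
        censored,
        norm.logsf((upper - xb) / sigma),           # P(y* > upper)
        norm.logpdf((y - xb) / sigma) - log_sigma,  # density of observed y
    )
    return -ll.sum()

# Simulate censored "waiting times" and recover the slope (illustrative)
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_latent = X @ np.array([10.0, -3.0]) + rng.normal(scale=2.0, size=n)
upper = 12.0
y = np.minimum(y_latent, upper)

res = minimize(
    tobit_negloglik,
    x0=[y.mean(), 0.0, np.log(y.std())],
    args=(X, y, upper),
    method="BFGS",
)
b0, b1 = res.x[:2]
```

With enough data the MLE recovers the latent-scale coefficients (here roughly 10 and -3) despite the truncation of the observed outcome, which is the property that makes the tobit preferable to ordinary least squares on censored durations.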
The Ministry of Science and Technology (MOST) of China has set forth ambitious goals, as part of its Citation Impact Upgrading Plan (CIUP), to fortify the standing of Chinese academics as well as Chinese academic journals. At present, MOST primarily considers the Clarivate Analytics journal impact factor (JIF), a proprietary scientometric measure, as an indicator of "quality". Academic publishing is, however, starting to move away from metrics such as the JIF that can be gamed and that do not truly reflect the academic worth of individual scientists or of journals. Metrics such as altmetrics, which show a paper's popularity on social media, or a greater balance of metrics to buffer the monopolized impact of the JIF on metrics-based reward systems, may be issues that China and MOST need to consider as global academic publishing tends towards a state of open science in which open access journals that reach a wider audience may have greater value than journals with a high JIF. Not only are China's academics well funded by the state, but the Chinese academic market is also highly coveted by publishers and other parties interested in advancing their academic or commercial interests. Given the current fluid and rapidly evolving state of academic publishing, the fairly rigid JIF-based reward system in place in China at the moment, and a recent spate of academic misconduct cases involving Chinese researchers, this letter offers some suggestions on the need for China to rethink its policies regarding what factors influence academic rewards. We propose the excellence shift, which shows universities' ability to produce highly cited papers measured against their basic academic research efficiency. To demonstrate our approach, we use data from 50 US universities. The use of authoritative chemical resources by scientists is an important first step toward finding reliable and credible information for supporting and validating research results. 
Given the vast number of commercially and freely available online resources used for searching chemical and physical information, traditional ready-reference print resources such as the CRC Handbook of Chemistry and Physics (known as the "Rubber Bible") and The Merck Index (known as the "Chemist's Bible") may no longer be regarded as suitable or useful for looking up information, and hence may no longer be required for purchase by academic institutions. To investigate this hypothesis, a study was undertaken to examine the usage and impact of these resources through citations in scholarly articles. The 'Cited Reference Search' of the Web of Science database was used to search, collect, and analyze article citations from the Science Citation Index Expanded to the CRC Handbook of Chemistry and Physics and The Merck Index between the years 2002 and 2015 inclusive. The distribution of article citations to these chemical resources was analyzed by document type, research field, country, affiliation, and journal. The yearly counts of article citations to these chemical resources were further compared to those of Wikipedia. In this letter, we study an open participation model of peer review in which potential reviewers choose whether to review a manuscript, at a cost, without a formal invitation. The outcome is a compromise among the reviewers' recommendations. We show that the equilibrium number of reviewers in public peer review is small, that their recommendations are extreme, and that the outcome is likely to be random when the compromise is the median of the reviewers' recommendations. We investigate the coverage of Microsoft Academic (MA) just over a year after its re-launch. 
First, we provide a detailed comparison for the first author's record across the four major data sources: Google Scholar (GS), MA, Scopus and Web of Science (WoS), and show that for the most important academic publications, journal articles and books, GS and MA display very similar publication and citation coverage, leaving both Scopus and WoS far behind, especially in terms of citation counts. A second, large-scale comparison for 145 academics across the five main disciplinary areas confirms that citation coverage for GS and MA is quite similar for four of the five disciplines. MA citation coverage in the Humanities is still substantially lower than GS coverage, reflecting MA's lower coverage of non-journal publications. However, we shouldn't forget that MA coverage for the Humanities still dwarfs coverage for this discipline in Scopus and WoS. It would be desirable for other researchers to verify our findings with different samples before drawing a definitive conclusion about MA coverage. However, based on our current findings we suggest that, only one year after its re-launch, MA is rapidly becoming the data source of choice; it appears to combine the comprehensive coverage across disciplines displayed by GS with the more structured approach to data presentation typical of Scopus and WoS. The Phoenix seems ready to leave the nest, all set to begin its adult life in research evaluation.