Utilizing Named Entity Recognition to Identify Unique Actors

by Jerome M. Hendricks

My dissertation work explores the actions of intermediary firms in periods of rapid technological change. As our economy has become increasingly geared toward knowledge sectors (Powell and Snellman 2004), intermediary firms take on an increasingly important role in establishing markets and developing consumer relationships with products and services. Using the independent record store as a case of an intermediary operating in a rapidly changing market, I argue that certain actors can collectively alter the symbolic meaning of goods and services to enable their survival. To make this argument, I have collected over 2,400 music industry and media documents from 1992-2012 in order to track changes in strategy and field understanding over time. In a recent paper, I was able to establish changes in organizational understandings empirically by comparing the discourse of independent record stores before and after drastic technological innovation. Since then, I have been investigating ways I might test these findings by tracking types of music retail firms and their strategies and understandings over the twenty-year period. In the discussion that follows, I will share my experience looking to computer science technologies for new ways to identify and track actors and ideas in large data sets.

Early coding of my data utilized an ethnographic content analysis (ECA) approach in Atlas.ti, a computer-assisted qualitative data analysis software (CAQDAS) package. ECA combines the quantitative emphasis of structured data collection with descriptive information that informs the context in which meanings emerge (Altheide and Schneider 2013). I found this approach very useful and I could conceivably use it to answer other questions I have. However, from a practical standpoint, large data sets like mine take a long time to code and analyze this way. On the advice of one of my committee members, Dr. John Mohr, I looked to new ways that social scientists have incorporated data mining software to extract meanings from large data sets. I was immediately intrigued by the growing body of literature that utilizes topic modeling as a procedure for coding text into meaningful categories of word clusters associated within and across documents. For examples and analysis of this approach, refer to the special issue of Poetics on topic models and the cultural sciences (volume 41, issue 6). While this approach to extracting “topics” would appear to offer a compelling way to uncover the strategies and understandings I am interested in, it cannot isolate the types of actors associated with each topic. Without that step, my unit of analysis shifts from the types of organizations to the data sources themselves.

In the same issue of Poetics mentioned above, Mohr and Bogdanov (2013) point to other compatible data mining strategies that can be combined with topic modeling to obtain an even closer view of meanings in texts. Specifically, Mohr and colleagues (2013) analyze the discursive style of the state utilizing a series of natural language processing (NLP), semantic parsing, and topic modeling procedures. This approach allows the authors to identify significant actors, determine their actions in texts, and consider the context in which these actions take place. A subfield of NLP called named entity recognition (NER) offers the most promise for identifying different people, places, organizations, and other miscellaneous human artifacts in texts. Because these tools require expertise in computer programming, I contacted Dr. Dan Roth at the University of Illinois at Urbana-Champaign to inquire further about NER and its applications. From his demo page, you can try a variety of different NLP procedures on sample texts, including the NER tagger that will be the central focus of the remainder of this discussion.

Before discussing the specific approach that Dr. Roth, his assistant Chase Duncan, and I have taken, it is important to consider a few challenges with the NER tagger due to the unique nature of my data set. First, while my data set is large, it isn’t particularly massive. Currently, these tools are better suited for open exploration of hundreds of thousands of documents. So while my practical concerns over time management and the size of my data set are real, my data set is rather small relative to the concerns of computer scientists. With a data set in the thousands, accuracy becomes central as there is much less margin for error. In other words, we’ll simply have fewer opportunities to capture target organizations. This leads to a second concern: independent record stores are a somewhat unique entity and can be easily overlooked or misclassified by the NER tagger. In the Mohr et al. (2013) paper discussed above, the entities of interest are relatively well-known (multi-national organizations, nations, geo-political entities, and so on) and can be easily verified. Rather than finding references to the “United States” or “Afghanistan”, we are looking for “Dave’s Records” or “Bucket O’Blood Books & Records.” While both procedures require a certain amount of training to improve the tagger’s performance, I am unaware of a complete historical record of independent record stores that can be utilized for training purposes. More information on training an NER tagger for social science purposes can be found here.

Despite these challenges, our team is confident that we can train the NER tagger to perform at a high level on the somewhat unique entities we aim to identify. The first step in training the tagger requires a separate data set that includes a variety of unique and standard music store names (from “Permanent Records” to “Musicland” to “Best Buy”) and mirrors the “messiness” of media data, such as links to other news stories, advertisements, and so on. To date, I have compiled 100 articles not used in the original data set for testing the NER tagger. To assist the tagger in identifying stores, we will incorporate Dr. Roth’s Wikifier tool, which utilizes Wikipedia as an authoritative source for resolving identities. While many small stores will not be listed on Wikipedia, this will help us increase the accuracy of identifying large chain retailers and popular independent record stores throughout our data. As Dr. Roth and his colleagues have noted previously (Godby et al. 2010), other authority sources have the potential to increase the effectiveness of resolving identity issues. To this end, utilizing various online databases of independent record stores (e.g. recordstoreday.com, vinylhunt.com, or goingthruvinyl.com) may also be useful in training the NER tagger. Once the modified version of the NER tagger is complete, we will be able to test our trained tagger on this separate data set and compare our results with human classification to assess accuracy and prepare our tool for the original “large” data set.
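The comparison with human classification described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tagger itself: a simple authority-list lookup stands in for the trained NER model, and the store list, the sample sentence, the human labels, and the function names are all hypothetical.

```python
import re

# Hypothetical authority list; a real one might be compiled from store directories.
STORE_NAMES = ["Dave's Records", "Permanent Records", "Musicland", "Best Buy"]


def normalize(text):
    """Lowercase and unify curly apostrophes so name variants compare equal."""
    return text.replace("\u2019", "'").lower()


def find_stores(text, names=STORE_NAMES):
    """Return the set of authority-list names that appear in the text."""
    haystack = normalize(text)
    found = set()
    for name in names:
        # Require non-word characters (or string edges) around the match.
        pattern = r"(?<!\w)" + re.escape(normalize(name)) + r"(?!\w)"
        if re.search(pattern, haystack):
            found.add(name)
    return found


def precision_recall(predicted, gold):
    """Compare predicted names with human-coded names for one document."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Illustrative article text and human coding (assumptions, not real data).
article = "Shoppers left Best Buy and headed to Dave\u2019s Records and Reckless Records."
human_labels = {"Best Buy", "Dave's Records", "Reckless Records"}

predicted = find_stores(article)
precision, recall = precision_recall(predicted, human_labels)
print(sorted(predicted), precision, round(recall, 2))
```

In this toy case every name the lookup returns is correct, but it misses the one store absent from the authority list. That gap is the reason a trained tagger, rather than a fixed list, is needed for small independent stores.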

While it is entirely likely that my research will utilize some of the data mining software tools already familiar to social science research, our attempts to adapt the NER tagger to unique actors have significant implications for content analysis in social science research. In terms of data set size, our ability to train the NER tool more specifically will provide smaller projects with a level of accuracy otherwise only attainable through manual coding procedures. In conjunction with other data mining procedures, such accuracy can allow for hypothesis testing as well as exploratory work. By standardizing these tagging procedures and training processes, the transferability among similar situations may suggest some generalizability of results. And, in light of the cooperative efforts that have brought us this far, prospects for software packages that are more accessible to social scientists, not unlike many topic modeling packages, also seem possible. Though these implications may be little more than conjecture on my part at this point, the prospects for developing procedures that contribute to new approaches to content analysis are exciting. I look forward to reporting the testing results as they become available and assessing the possibilities of NER tagging when actor identities are relatively unique.

References

Altheide, David L., and Christopher J. Schneider. 2012. Qualitative Media Analysis. Second edition. Los Angeles: SAGE Publications, Inc.

Godby, Carol Jean, Patricia Hswe, Larry Jackson, Judith Klavans, Lev Ratinov, and Dan Roth. 2010. “Who’s Who in Your Digital Collection: Developing a Tool for Name Disambiguation and Identity Resolution.” Journal of the Chicago Colloquium on Digital Humanities and Computer Science 1 (2).

Mohr, John W., and Petko Bogdanov. 2013. “Introduction—Topic Models: What They Are and Why They Matter.” Poetics 41 (6). Topic Models and the Cultural Sciences: 545–69.

Mohr, John W., Robin Wagner-Pacifici, Ronald L. Breiger, and Petko Bogdanov. 2013. “Graphing the Grammar of Motives in National Security Strategies: Cultural Interpretation, Automated Text Analysis and the Drama of Global Politics.” Poetics 41 (6). Topic Models and the Cultural Sciences: 670–700.

Powell, Walter W., and Kaisa Snellman. 2004. “The Knowledge Economy.” Annual Review of Sociology 30: 199–220.

Posted in method, org soc, tech

Tactics, causes, and theory: Who gets to define what counts as “activism”?

by Carla Ilten

Recently, I gave a presentation at Theorizing the Web 14 in New York. It was entitled: “Activism 2.0: The Politics and Business of Platforms Built for Social Change.” When the presentation got sorted into a panel called “(Ref)user: Movements of Resistance,” I knew I would have to do some good justification work. The audience would expect stories of grassroots uprising, of feminist resistance and of e-bandits (as my co-panelists offered); hopefully heroic, but certainly not for the dark side of the force! The cases I presented, Sparked and Kiva, two platforms that organize micro-volunteering and micro-lending, fall short of heroic, high-stakes activism. No, worse: what they organize in an extremely efficient, online-only way rubs “social change” the wrong way for some of us. Yet, as I argued, the two platforms exhibit what newer social movement theory on online activism (Earl and Kimport 2011) would consider the most cutting-edge “internet-enabled” organization of participation – low cost, high outcome mobilization! Not least, the people who engage in these micro-actions feel that they are making a difference of some sort. I tried to highlight this finding – I will call it a contradiction for want of a more nuanced analysis at this point – as the starting point of a research journey. Not completely successfully, as the following exchange on Twitter after my talk shows:

[Image: Twitter conversation following the talk]

As a political and interested human being, of course I have (strong!) preferences with regard to causes and visions – I make normative judgments about what activism I find legitimate and desirable. As a sociologist, though, I can and must analytically distinguish between activism and causes, or between movement tactics and movement goals. What is more, when the people whose activities I observe use those terms to describe what they’re doing, then I have to take them seriously, all the while making my own, possibly counter-, analysis. That is indeed what I presented in my talk: a project that started out looking for activism online, found something that looked similar but different, and concluded that it was not the wrong case, but… a case of what? This is where things get interesting.

Posted in conferences, soc mov, tech, Uncategorized

Inspired or Civic: The Value of Art in Various Settings

by Michael De Anda Muñiz

This past week I was able to attend a 96 Acres event (96 Acres is an art project on the Cook County Jail) at the Museum of Contemporary Art, a Latina artist’s job talk for a university position, and a steering committee meeting for 96 Acres. It was a week that exemplified my main interest in Latina social justice artists. In one week, Latina artists interacted with a major art institution, an academic institution, and community members in separate settings. Each brought them into contact with very different audiences, and in each setting 96 Acres was one of, if not the, main focus.

The event at the museum was “an intervention”. This means that the museum had all its galleries open, and individuals involved in 96 Acres did performance art throughout the museum. Therefore, visitors were able to see the museum’s collections but were also confronted by performance pieces throughout the night. For example, as visitors looked at paintings, performers would engage with them in various ways, such as by reciting spoken word poetry or repeating phrases and words. Additionally, as visitors moved throughout the space, they had to contend with the bodies of performers. Some lay on the stairs repeating words and phrases. Others stood in the middle of a gallery. The university job talk involved a Latina artist presenting her artwork and her pedagogical philosophy to an art department. She shared information about her personal background, her artistic approaches and themes, her pedagogical approaches, and her current work. Then, she took questions from current faculty. Lastly, I attended the steering committee meeting for 96 Acres. The lead artist, a graduate student, a high school art teacher, and two high school students attended this meeting. The central topic of the meeting was possible summer projects and installations.

Value and worth are among the major concepts that came up during my observations. In the various settings, individuals had different understandings of the value and worth of artwork. At the museum, visitors had several reactions to the performances. Some visitors were uncomfortable and confused about what 96 Acres was about. Others saw the performances as nothing beyond performance and did not seem to connect them to critiques of the criminal legal system and incarceration. The audience at the job talk seemed more interested in the social critiques behind the art. They asked a lot of questions about how the art connects to social issues, the philosophy that motivates the art, and art as an educational tool. The steering committee focused on the process behind the projects. They asked who should be involved in the planning of projects, what form they should take, what topics they should cover, and where they should be located.

In all of these cases, actors judged art and 96 Acres using different “orders of worth” (Boltanski and Thévenot 1999). Some museum visitors judged the project on an inspired order of worth. For them, 96 Acres was only worth their time and money if they felt it was creative and incited some emotional response. Steering committee members and job talk audience members used a civic order of worth. They judged 96 Acres based on its commitment to equality and justice. For them, 96 Acres was only worth their time if it resulted in projects aimed at bringing about positive changes for those affected by the jail and criminal legal system. Interestingly, even those using the same order of worth – civic – had some disagreement about what “justice” and “equality” looked like and how to get there. Latina artists must deal with all these competing orders of worth. They must maintain the cooperation and support of actors with conflicting orders of worth. This is one of the more interesting parts of Latina social justice artists’ work.

Posted in art, econ soc

American Federation of Teachers Conference Highlights Commercialization of Higher Ed

by Lydia Hou

The American Federation of Teachers 2014 Conference was held April 11-13 in Baltimore, Maryland, offering numerous panels and keynote speakers exploring concerns and advancements in higher education in the United States. Notable individuals who presented research and policy work at the conference included Charlie Eaton of Berkeley, Erica Smiley of Jobs with Justice, Sara Goldrick-Rab of University of Wisconsin-Madison, and Tressie McMillan Cottom of Emory University, among many others. All of them addressed core themes of the AFT conference, including the commercialization of higher education examined at various levels. I attended this conference as a representative of the University of Illinois at Chicago’s Graduate Employee’s Organization IFT/AFT Local 6297 (AFL-CIO). The considerations raised included strategies to uncover the influence of Wall Street on institutions of higher education, the impact of for-profit universities on the changing structures of traditional colleges and universities, and the influence of corporate industry on undergraduate and graduate students’ experiences with debt.

One particular theme was the transition of organizational structure in colleges and universities to incorporate corporatized efficiency as much as possible – fewer faculty would have influence over university decisions or policy, profit would be a signal of success, and curriculum would be transferred online as much as possible in order to take the “human” aspect out of teaching due to the costs (in money, time, and otherwise) that are attached to those who instruct college students. Many attested that corporations that sponsor various features of universities – banks, sports, etc. – are affecting the quality of education as profits are increasingly prioritized.

Many important questions are being raised in the context of commercializing higher education – perhaps most importantly, is higher education a public or private good, and can it be considered a human right? As we look to the future of higher education and the importance of providing opportunities for mobility and equality among future generations, one must question the transition of education to a corporate structure. Higher education is now viewed as necessary in the same way that primary education is, and it needs equivalent structural support, which it now seeks through corporate sponsorship. Moving toward a future of more corporate influence and higher student debt on college and university campuses has the potential to increase disparities as fewer and fewer individuals are able to access higher education that should be viewed as a human right rather than a social privilege.

Posted in econ soc, Uncategorized

Cultural Patrimony versus Monetary Debt: What are “Public Goods” Worth?

by Claire Smith

What is the value of art? Does it have different value when it is under public ownership? Art is something that defies standard valuation metrics of commodities as it holds value beyond a market-driven price. Symbolic meaning is a fundamental aspect of prices, particularly in the case of such things as art where price as an evaluation metric is taboo; art has meaning and value beyond its monetary worth. When art is held publicly, there is an additional dimension of value – that of the public good. What is the public good and what is it worth? Who decides the answers to these questions? The bankruptcy of Detroit and the potential sale of the Detroit Institute of Arts collection illustrate the ambiguity and tension in determining the value of a collection of art. At stake are the dire financial situation of the city, the creditors, the pensioners, art lovers, donors of the art, and the public good, and the parties involved all have unique perspectives and stakes in the final outcome of the debate over the fate of the DIA collection. The incommensurability of the value of cultural patrimony and the monetary value of the debt leads to a lack of coordination of valuations; there is no single quality by which you can compare the value of paying down Detroit’s debt and the value the art holds for the collective good of the people of Detroit. How do you compare the pensioners’ interests in receiving the pensions they were promised to the symbolic value of public art and the cultural heritage of a city? The process by which that decision is made raises questions: what exactly is the “common good”, and how are relations of power and inequality manifested through such decisions? Whose art is this and what is its worth? Is it worth paying down the debt? Is it worth boosting the financial stability of the pensioners? Is it valuable because of its potential to foster economic revitalization, or is it valuable for something more, such as a symbolic, commonly held, unpriceable good? How these decisions are made and how value is assigned and negotiated are important, relational questions, and Detroit serves as an illustrative example because so much is at stake – a renowned art collection with historical roots in the city, the urban distresses of the contemporary city, and the complex interconnectedness between them.

Posted in art, econ soc, Uncategorized

Chicago Ethnography Conference 2014: elite parties, youth empowerment, and lots of economic relational work

by Carla Ilten

The Organizational Dynamics course we are currently working through at UIC has introduced me to a mind-blowing range of theories around value, both economic and social, and relationality. Not surprisingly, this is the lens through which I saw just about everything I encountered at the 16th Annual Chicago Ethnography Conference that was held at Northwestern University this past Saturday, March 15.

In the first keynote, Nina Eliasoph talked about “Rendering Invisible Dilemmas Visible.” In her ethnographic study of organizations whose goal it is to empower disadvantaged youth, she discovered that actors had to juggle different values and identities depending on whom they faced. The dilemma was particularly salient for the needy, to-be-empowered youth who had to perform both categories at the same time: when funding was applied for, youth were presented as needy. When awards were received for engagement, youth were (and requested to be) presented as empowered. The simultaneity of “problem” and “solution” was solved through the earmarking of monies (funding versus awards) as well as through the relational (identity) work of the involved youth. The teenagers oscillated between object and subject status in this relation with the environment: either as objects of the organizations’ work, or as empowered, entrepreneurial agents in volunteering.

The second keynote was fascinatingly entertaining – Ashley Mears picked her empirical field wisely and apparently had quite a bit of fun while analyzing elite “bottle parties” all over the world. The theoretical fruits of studying the performance circuits that underlie the organization of those parties are no less exciting. In what she calls a “relational approach to ownership,” Mears shows how “Party Girls” not only have bodily capital, as Bourdieusian sociology would have it, but circulate as bodily (partily?) capital enjoyed by super-rich men in the process of elite conspicuous consumption. Girl capital is administered by a group of intermediaries, who also translate the exchange from monetary (the club-intermediary link) to in-kind (the intermediary-Girl link). This translation helps actors maintain the precarious boundary between this specific form of performance – let’s call it party work – and taboo sex work. Scrumptious dinners, drinks and fun are acceptable compensation for supplying feminine decoration, whereas cold, hard money would defile the actors in the exchange. Again, actors walk the tightrope of objectification as well: Party Girls are circulated as embodied capital for someone else’s profit, but in order to be legitimate, the whole operation requires Girls to perform as voluntarily partying subjects.

Trades involving bodies require much relational work and the drawing of boundaries through distinct practices. Whether needy youth become the capital of an empowerment organization, or model-like girls the capital of elite parties, a great deal of fine tuning is required to navigate the morality of these economic exchanges.

Posted in conferences, econ soc, Uncategorized

XXXIV Sunbelt Conference: Picking up new methods on the beach

by Tünde Cserpes

The perks of academia: the XXXIV Sunbelt Social Networks Conference of the International Network for Social Network Analysis (INSNA) made me leave the Chicago winter to spend a week in sunny St Pete Beach, Florida. The conference organizers even went so far as to encourage us to take a break during lunch time and “take a swim, go wind surfing, or do other sunbelty things”. This was my second time presenting at this conference. Last year we were in Hamburg, Germany, under much less welcoming weather conditions.

Sunbelt is a relatively long conference: it ran from Tuesday through Sunday. Besides having regular and poster sessions, the organizers made sure that there were opportunities to pick up new analytic skills (hence the workshop sessions stretching up to one and a half days long) and to socialize. Jeffrey C. Johnson from East Carolina University delivered the keynote speech on Thursday. Each evening, a hospitality suite facilitated the mingling of participants.

I am sure there are many ways in which people go about participating in conferences. Personally, here is my approach: I always skim the whole program ahead of time, circling presentations I am interested in. I also e-mail those scholars whom I really want to meet and ask questions. This year, I met some fellow graduate students and familiar professors as well. I think it is nice to make arrangements beforehand because, in my experience, I am often busy with other things during the conference.

The “short” program for this year’s Sunbelt was about 30 pages long. Sunbelt has a nice tradition which encourages ‘session hopping’. After listening to a presentation in a panel, you can leave the room and go to another panel. The drawback is that your presenting time stays at 20 minutes even if someone is missing from the program. There was a great variety of sessions, ranging from the deepest technicalities of using ERGMs, stochastic actor-oriented modeling, and treating missing network data to mixed-methods and historical approaches. Moreover, there were also panels on substantive issues such as political structures, organizational dynamics, market formations, and collective action. I have also seen some great examples of mixing cultural sociology with text analytic methods. Apparently, natural language processing has set foot on social network land as well.

The multi-disciplinary nature of Sunbelt does teach you patience and open-mindedness. You will eventually run across presentations which conclude, albeit in a statistically sophisticated way, that the social factor is important (Ta dah!). On the other hand, I have seen presentations which made the hair on the back of statisticians’ necks stand up when the social scientist used OLS regression to analyze interdependent data. Overall, Sunbelt is a great place to pick up new methods, find allies from other fields with whom you can collaborate in the future, and expand your theoretical horizons in ways you might never have dreamed of. Sunbelt is the antithesis of the idea that social network analysis is a purely methodological field. Methodology is indeed an important aspect, but from what I have seen so far, those who remain at the center of the field are able to interrogate their data using these new analytic techniques and, in addition, make a theoretically interesting contribution to their field of study.

P.S.: At the conference you had a chance to buy a “Can’t we all just get along” t-shirt featuring the co-citation network of social networks and network science. If you want to replicate the results, no problem: the corresponding data file is on the flash drive you got at registration.

Can’t we all just get along

Posted in conferences, SNA, Uncategorized