Semantic plugin: AlchemyAPI

A new plugin is available for Gephi that utilizes the power of natural language processing (NLP) software to analyze text documents and visualize their contents. The plug-in was created by AlchemyAPI (alchemyapi.com), and utilizes the AlchemyAPI REST service to semantically process a web page or text file and show all the subjects of the text (people, places and things, known collectively as named entities) as nodes in Gephi.

 


Graph of the American Revolution Wikipedia entry.

The plug-in is a powerful tool for distilling dense, unstructured text into easy-to-understand graphs. Each extracted entity has a relevance attribute, which measures how pertinent the subject is to the source text, and a count attribute, which gives the number of times the subject is named in the source text. Both attributes can be used to drive the visualization.
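As a minimal sketch of how the relevance attribute could drive node size, the helper below maps a score linearly onto a size range. It assumes relevance scores fall in [0, 1]; `node_size`, the size range and the entity values are all hypothetical illustrations, not the plug-in's API or real AlchemyAPI output.

```python
def node_size(relevance: float, min_size: float = 5.0, max_size: float = 40.0) -> float:
    """Linearly map a relevance score (assumed to be in [0, 1]) to a node size."""
    r = min(max(relevance, 0.0), 1.0)  # clamp scores outside the assumed range
    return min_size + r * (max_size - min_size)

# Hypothetical entities: (name, relevance, count); the numbers are made up.
entities = [("Britain", 0.80, 52), ("France", 0.55, 30), ("Thomas Paine", 0.35, 6)]
for name, relevance, count in entities:
    print(f"{name}: size={node_size(relevance):.1f}, count={count}")
```

A linear mapping keeps the visual ordering identical to the relevance ordering; the count attribute could be mapped the same way, or used for label size instead.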

Once installed, the plug-in can be accessed through the File->Generate->Semantic Analysis menu. As an example, we’ll examine the Wikipedia entry for the American Revolution. To make a graph from this article, enter its URL into the Semantic Analysis dialog box. The plug-in will extract over 350 people, places, and things from the Wikipedia page. You can use this data to create a word-cloud-style visualization of the article, like the one above.

If subtype analysis is enabled, you can also visualize the types and subtypes of named entities. For example, the nodes in the image below were extracted from a recent news article. They represent Dmitry Medvedev and his ontological classifications. The edges from Medvedev’s node identify him as a Person, Politician, and President (classifications he shares with Mahmoud Ahmadinejad). A complete list of the subtypes AlchemyAPI returns can be found at http://www.alchemyapi.com/api/entity/types.html.


Detail of named entity subtypes

The plug-in can also visualize the connections between multiple text documents. Each document node is connected to the entities it mentions, so entities shared by several texts link their documents together, a powerful way of discovering recurring themes within an archive. For example, the picture below shows the connections shared between the Wikipedia pages for the American Revolution and the French Revolution: common entities like ‘France’, ‘Britain’, and ‘Thomas Paine’ are linked to both articles.


Graph of connections between the American and French Revolution Wikipedia entries.

As more documents are added to the graph, a web of entities forms. The relevance and count of connected entities increase with the number of documents that mention them.
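The multi-document behaviour can be sketched as a bipartite graph: one edge per document/entity pair, with per-entity totals accumulated across documents. This is a minimal illustration, not the plug-in's code; the extraction results are made-up numbers, and summing relevance scores is an assumption (the plug-in may combine them differently).

```python
from collections import defaultdict

# Hypothetical extraction results: document -> {entity: (relevance, count)}.
# The numbers are invented for illustration; they are not real AlchemyAPI output.
docs = {
    "American Revolution": {
        "Britain": (0.80, 52), "France": (0.55, 30),
        "Thomas Paine": (0.35, 6), "George Washington": (0.90, 45),
    },
    "French Revolution": {
        "France": (0.95, 120), "Britain": (0.40, 18),
        "Thomas Paine": (0.30, 4), "Maximilien Robespierre": (0.85, 40),
    },
}

def build_graph(docs):
    """Return document->entity edges and per-entity totals across all documents."""
    edges = []
    totals = defaultdict(lambda: {"relevance": 0.0, "count": 0, "documents": 0})
    for doc, entities in docs.items():
        for entity, (relevance, count) in entities.items():
            edges.append((doc, entity))       # one edge per document/entity pair
            entry = totals[entity]
            entry["relevance"] += relevance   # assumption: relevance is summed
            entry["count"] += count
            entry["documents"] += 1
    return edges, dict(totals)

edges, totals = build_graph(docs)
shared = sorted(e for e, t in totals.items() if t["documents"] > 1)
print(shared)  # entities that link the two documents together
```

Entities mentioned by only one document hang off that document node, while shared entities become the bridges between documents.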

We hope you use this plug-in to make the data in your text more accessible. If you have any questions or suggestions for the makers of this plug-in, please leave them in the comments section.

Our thanks to the Gephi team for their remarkable visualization program, and all the documentation and help that made this plug-in possible.
Graph of espn.com front page and linked articles.

Shaun Roach

Download the Gephi plugin for AlchemyAPI here, or find it in your Gephi plug-in center.

Graph visualization on the web with Gephi and Seadragon

The project takes another big step forward, bringing dynamic graph exploration to the web in one click from Gephi with the Seadragon Web Export plugin.

Mathieu Bastian and Julian Bilcke worked on a Seadragon export plugin that exports large graph images directly from Gephi, ready to put on the web. Seadragon is pure JavaScript and works in all modern browsers. Because it uses image tiles (like Google Maps), there is no limit on graph size.

To install the plugin, open Gephi and go to the Plugin Center (Tools > Plugin). You can also download the plugin archive manually or get the source code.


Sample: the Diseasome network dataset, exported directly from Gephi.

Communicating about (large) graphs is currently limited because it is not easy to put them on the web. Graph visualization has much the same aims as other types of visualization and needs powerful web support. We have been thinking about the best way to do this for a long time and found that there is no perfect solution. We need efficiency, interactivity and portability at the same time. How simple the system is to build and hack on also matters, as we want developers to be able to improve it easily.

Comparing technologies, we found that Seadragon is the best short-term solution, with minimum effort and maximum results. It still has a serious limitation, however: interactivity. For the moment, you cannot search for or click on nodes. But since it is JavaScript, I don’t see any hurdles to adding these features in the future; help is needed.

The table below summarizes our conclusions on the technologies we are considering. We are eager to discuss it on the forum. As performance is the most important requirement, WebGL is a serious candidate, but development would require time and resources. We plan to start a WebGL visualization engine prototype next summer, for Google Summer of Code 2011, and we would like to discuss specifications with anyone interested and build it together.

Portability Efficiency Effort Interactivity
Flash
Java2D/Processing
Canvas (Processing.js/RaphaelJS)
WebGL
Seadragon
Figure: Comparing technologies able to display networks on the web.

How to use the plugin?

Install the plugin from Gephi: go to “Tools > Plugin” and find Seadragon Web Export. After you restart Gephi, the plugin appears in the Export menu. Load a sample network and try the plugin. Go to the Preview tab to configure rendering settings such as colors, labels and edges.


Export directly from the Gephi Export menu

The settings ask for a valid directory to export the files to and the size of the canvas. The bigger the canvas, the further you can zoom in, but the longer it takes to generate and to load.


Export settings, configure the size of the image
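To see why a bigger canvas costs more, here is a rough tile count for a zoomable image pyramid. It assumes the usual Deep Zoom convention (each level halves the one above it, down to 1 pixel) and 256-pixel tiles without overlap; the exporter's actual tile size and overlap may differ, and `pyramid`/`total_tiles` are hypothetical helpers.

```python
import math

TILE_SIZE = 256  # assumption: typical tile edge in pixels, no overlap

def pyramid(width, height, tile_size=TILE_SIZE):
    """Return (level, width, height, tiles) for each level of a Deep Zoom-style pyramid."""
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # the full-size image sits at max_level
        w, h = math.ceil(width / scale), math.ceil(height / scale)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((level, w, h, tiles))
    return levels

def total_tiles(width, height, tile_size=TILE_SIZE):
    """Total number of tile images across all pyramid levels."""
    return sum(level[3] for level in pyramid(width, height, tile_size))

for side in (2048, 4096, 8192):
    print(f"{side}x{side} canvas: {total_tiles(side, side)} tiles")
```

Under these assumptions, doubling the canvas side adds one zoom level and roughly quadruples the number of tiles to generate, which matches the trade-off described above.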

Note that results saved on the local hard drive can’t be viewed in Chrome, due to a bug. Run Chrome with the “--allow-file-access-from-files” option to make it work.

Kudos to Microsoft Live Labs for this great library, released under the Ms-PL open source license. Thank you to Franck Cuny for the CPAN Explorer project that inspired this plugin. Other interesting projects are GEXF Explorer, a Flash-based dynamic widget, and gexf4js, which loads GEXF files into Protovis.

GSoC 2010 mid-term: Direct Social Networks Import

Yi Du

This summer, six students are working on Gephi as part of Google Summer of Code. They are contributing new features that will be integrated into version 0.8, to be released later this year.

Yi Du is adding the Direct Social Networks Import module, which provides several kinds of importers such as Email, Twitter and Facebook. This article briefly introduces some of the importers, along with several samples.

The ability to import any kind of structured data and build a network from it is essential for users. This step is often missing and requires time and scripting skills, even though tools and libraries able to read and parse all types of data already exist. Moreover, it has never been so easy to quickly access meaningful datasets online.

Email importer

Email is a simple and widely used communication tool, yet many people know little about how it works. To some extent, analyzing emails can help people better understand their relationships with others. In our email importer, each email address is represented as a node. If two email addresses share the same display name, an option lets the user decide whether to treat them as a single node or as two different nodes. An edge is built from A to B whenever there is an email from A to B, and another option lets the user decide whether Cc and Bcc recipients also produce edges.
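The node and edge rules above can be sketched as follows. This is a minimal illustration, not the importer's implementation: the message tuples, the option names (`include_cc`, `include_bcc`, `merge_by_display_name`) and their defaults are hypothetical, and counting repeated emails as an edge weight is an added assumption.

```python
from collections import Counter

# Each message: (sender, to, cc, bcc); each address is a (display_name, email) pair.
# Option names and defaults are hypothetical, chosen to mirror the options described above.
def build_email_edges(messages, include_cc=True, include_bcc=False,
                      merge_by_display_name=False):
    def node_id(address):
        name, email = address
        if merge_by_display_name and name.strip():
            return name.strip().lower()   # same display name -> same node
        return email.strip().lower()      # otherwise one node per address

    edges = Counter()  # (sender, recipient) -> number of emails, used as edge weight
    for sender, to, cc, bcc in messages:
        recipients = list(to)
        if include_cc:
            recipients += cc
        if include_bcc:
            recipients += bcc
        for recipient in recipients:
            edges[(node_id(sender), node_id(recipient))] += 1
    return edges

messages = [
    (("Alice", "alice@work.example"), [("Bob", "bob@example.org")], [("Carol", "carol@example.org")], []),
    (("Alice", "alice@home.example"), [("Bob", "bob@example.org")], [], []),
]
print(build_email_edges(messages, merge_by_display_name=True))
```

With `merge_by_display_name=True`, Alice's two addresses collapse into one node, so her two emails to Bob become a single edge of weight 2.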

We provide two ways to import emails. The first retrieves emails one by one from an email server (POP3 or IMAP). The second reads emails from local files or a folder. This raises a problem: different email clients may use different file formats. Fortunately, our importer has an easy-to-extend API, as well as a default implementation for EML files. EML is a standard format and can be obtained from Thunderbird, Outlook and Gmail, with tools like Gmail Backup.
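Reading a folder of EML files needs nothing beyond Python's standard `email` package. The sketch below is an illustration of the idea, not the importer's code; `read_eml_folder` is a hypothetical helper. It reads only the From, To and Cc headers, since Bcc is often absent from delivered messages.

```python
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses
from pathlib import Path

def read_eml_folder(folder):
    """Yield (sender, to, cc) lists of (display_name, email) pairs for each .eml file."""
    parser = BytesParser(policy=policy.default)
    for path in sorted(Path(folder).glob("*.eml")):
        with path.open("rb") as fh:
            msg = parser.parse(fh)
        # get_all returns every occurrence of a header (or the default if absent);
        # getaddresses splits them into (display_name, email) pairs.
        sender = getaddresses(msg.get_all("From", []))
        to = getaddresses(msg.get_all("To", []))
        cc = getaddresses(msg.get_all("Cc", []))
        yield sender, to, cc
```

Each yielded triple can then feed whatever node and edge rules the user selected; supporting another client's format would only mean writing another reader that yields the same shape.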

The sample below illustrates the email importer’s output (2000 emails with the EGO filter, 700 nodes, 1300 edges).
Figures: (a) the EGO graph; (b) the graph of nodes with in-degree greater than 0; (c) the graph after modularity clustering; (d) the subgraph with the highest Modularity count; (e) the hottest group.

Twitter importers

Twitter is a very popular social network. Users send and receive short messages, usually called tweets, and can follow the people and topics they are interested in. Twitter networks have been popularized by NodeXL, which has a similar feature. See this nice gallery.

We provide two kinds of networks: “Twitter Search Network” and “Twitter User Network”.

The Twitter Search Network analyzes people who search for or mention similar keywords. Each Twitter user is represented as a node, and there are three kinds of edge construction:

  • Replies-to relationship: if A replies to B in a searched tweet, an edge from A to B is added.
  • Mentions relationship: if A mentions B in a searched tweet, an edge from A to B is added.
  • Followers relationship: if A follows B in the constructed graph, an edge from A to B is added.
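The three edge rules can be sketched over a small set of tweets. This is a minimal illustration under stated assumptions: the tweet records (`author`, `text`, optional `in_reply_to`) and the list of follow pairs are hypothetical inputs, not Twitter API field names, and mentions are detected with a simple `@name` regular expression.

```python
import re

MENTION = re.compile(r"@(\w+)")  # simple @username matcher (an approximation)

def search_network(tweets, follows=()):
    """Build users and typed edges (source, target, kind) from searched tweets."""
    users, edges = set(), set()
    for tweet in tweets:
        author = tweet["author"]
        users.add(author)
        reply_to = tweet.get("in_reply_to")
        if reply_to:
            edges.add((author, reply_to, "replies-to"))
            users.add(reply_to)
        for mentioned in MENTION.findall(tweet["text"]):
            if mentioned != author:
                edges.add((author, mentioned, "mentions"))
                users.add(mentioned)
    # Follower edges are only added between users already in the constructed graph.
    for a, b in follows:
        if a in users and b in users:
            edges.add((a, b, "follows"))
    return users, edges

tweets = [
    {"author": "alice", "text": "@bob great talk on #gephi", "in_reply_to": "bob"},
    {"author": "carol", "text": "Trying #gephi with @alice and @dave"},
]
users, edges = search_network(tweets, follows=[("bob", "alice"), ("eve", "alice")])
```

Keeping the edge kind in each tuple lets a reply and a mention between the same two users coexist as distinct edges, which could later become an edge attribute in Gephi.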

The second network we provide is the “Twitter User Network”. It analyzes people who follow each other to show the relationships between Twitter users. By default, an edge from A to B is added if A follows B anywhere in the graph. We provide three options for vertex construction:

  • Persons followed by the user: if the searched user A follows B, B is added as a vertex.
  • Persons following the user: if A follows the searched user B, A is added as a vertex.
  • Both: combines the two options above.
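The vertex options and the default edge rule can be sketched like this. The `following` dictionary (user to the set of users they follow) and the mode names are hypothetical stand-ins for data fetched from Twitter; this is not the importer's code.

```python
def user_network(seed, following, mode="both"):
    """Build (nodes, edges) around a searched user; mode is 'followed', 'followers' or 'both'."""
    if mode not in ("followed", "followers", "both"):
        raise ValueError(f"unknown mode: {mode}")
    nodes = {seed}
    if mode in ("followed", "both"):
        nodes |= following.get(seed, set())                         # people the user follows
    if mode in ("followers", "both"):
        nodes |= {u for u, out in following.items() if seed in out}  # people following the user
    # Default edge rule: A -> B whenever A follows B and both are in the graph.
    edges = {(a, b) for a in nodes for b in following.get(a, set()) if b in nodes}
    return nodes, edges

following = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "dave": {"alice", "carol"},
    "carol": set(),
}
nodes, edges = user_network("alice", following, mode="both")
```

Choosing vertices first and then adding every follow relation among them is what lets edges appear between two non-seed users, such as dave following carol here.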

The interfaces of the two importers are shown below.
Figures: (a) Twitter User Network importer interface; (b) Twitter Search Network importer interface.

New York Times importer

The New York Times is an American daily newspaper founded and continuously published in New York City. It offers developers a series of APIs covering news and social networks, such as the Article Search API and the Best Seller API.

We provide two kinds of social network importers in Gephi: “Article Network” and “TimesPeople Network”. The Article Network analyzes articles selected with specific filters (date, facets, etc.). The user chooses which attribute constructs the edges: for example, if date is chosen, an edge is built between any two articles with the same date. TimesPeople is a social network for Times readers, similar to Facebook, and the TimesPeople Network lets us analyze the relationships between its members.
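The shared-attribute rule of the Article Network can be sketched by grouping articles on the chosen attribute and connecting every pair within a group. The article dictionaries and their `id`/`date` keys are hypothetical, not NYT API response fields.

```python
from collections import defaultdict
from itertools import combinations

def article_edges(articles, key):
    """Connect every pair of articles that share the same value for `key`."""
    groups = defaultdict(list)
    for article in articles:
        value = article.get(key)
        if value is not None:          # articles missing the attribute stay unconnected
            groups[value].append(article["id"])
    edges = set()
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            edges.add((a, b))          # undirected: each pair stored once, in sorted order
    return edges

articles = [
    {"id": "a1", "date": "2010-07-01"},
    {"id": "a2", "date": "2010-07-01"},
    {"id": "a3", "date": "2010-07-02"},
    {"id": "a4", "date": "2010-07-01"},
]
edges = article_edges(articles, "date")
```

Each group of n articles contributes n(n-1)/2 edges, so a very common value (such as a busy publication date) produces a dense, fully connected cluster.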

The interfaces of the NYT Article Network and TimesPeople Network importers are shown below:
Figures: (a) NYT Article Network importer interface; (b) NYT TimesPeople Network importer interface.

Figures: display of the TimesPeople network.

Conclusion and future work

In this article, we introduced several importers: Email, Twitter and NYT. With these importers, users can import the data they want and analyze it, finding the hottest group, the relationships among their friends, the most closely related author for a facet, and other important information.
By the end of GSoC, we will have four major importers: Email, Twitter, NYT and Facebook. Twitter will have “Twitter User Network” and “Twitter Search Network”, NYT will have “NYT article search network” and “NYT TimesPeople Network”, and Facebook will have “Facebook Friends Network” and “Facebook Group Network”. Besides adding the Facebook importer, we will also optimize the importers’ UI and make them more user-friendly.

Yi Du

Map Geocoded data with Gephi

Combining network and geographic data has fantastic potential that has not yet been fully revealed. Alexis Jacomy, a student member of the Gephi community, just released a new plugin named GeoLayout, which aims to bridge this gap. Congratulations!

The plugin uses latitude/longitude coordinates to set node positions on the network. Several projections are available, including Mercator, which is used by Google Maps and other online services.
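For reference, the standard spherical Mercator projection maps longitude λ linearly to x = λ and latitude φ to y = ln(tan(π/4 + φ/2)). The sketch below implements that textbook formula; it is not GeoLayout's source, and the plugin may scale, offset or flip the result differently. The latitude clamp at about ±85.05° is the usual Web Mercator cut-off, since y diverges at the poles.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator cut-off; the projection diverges at the poles

def mercator(lat_deg, lon_deg, radius=1.0):
    """Project latitude/longitude in degrees to spherical Mercator (x, y)."""
    lat = math.radians(max(-MAX_LAT, min(MAX_LAT, lat_deg)))
    x = radius * math.radians(lon_deg)
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))  # y grows northward
    return x, y
```

Because y stretches rapidly toward the poles, high-latitude airports end up farther apart vertically than their ground distance suggests, which is the familiar Mercator distortion.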

The plugin is available from the Gephi Plugin Center. The author is looking for feedback, so please visit the plugin page.

I wanted to try it with the classic USA Airline Routes network dataset and describe the experience.

Install Plugin

In Gephi, go to the Tools menu and then Plugins. In the Available Plugins tab, check GeoLayout and click Install. Once the plugin is installed, you are asked to restart Gephi. Click OK.


Open Dataset

Download the airlines-sample.gexf dataset (Save As…) and open it with Gephi.

The network is an undirected graph with 235 nodes and 1297 edges. Each node has two additional attributes, latitude and longitude, expressed in degrees.

You should see the opened graph like this.

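If you want to check the node coordinates outside Gephi, a GEXF file can be read with Python's standard XML parser. This is a minimal sketch: it assumes the attributes are declared with the titles `latitude` and `longitude` (the actual airlines-sample.gexf may use other titles or ids), ignores XML namespaces so it works across GEXF versions, and `node_coordinates` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace, e.g. '{http://www.gexf.net/1.1draft}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]

def node_coordinates(gexf_text, lat_title="latitude", lon_title="longitude"):
    """Return {node_id: (lat, lon)} for nodes that carry both attributes."""
    root = ET.fromstring(gexf_text)
    # Map attribute ids to their declared titles (from the <attributes> section).
    titles = {el.get("id"): el.get("title") for el in root.iter() if local(el.tag) == "attribute"}
    coords = {}
    for node in root.iter():
        if local(node.tag) != "node":
            continue
        values = {titles.get(av.get("for"), av.get("for")): av.get("value")
                  for av in node.iter() if local(av.tag) == "attvalue"}
        if lat_title in values and lon_title in values:
            coords[node.get("id")] = (float(values[lat_title]), float(values[lon_title]))
    return coords
```

For example, `node_coordinates(open("airlines-sample.gexf", encoding="utf-8").read())` should return one (latitude, longitude) pair per airport, provided the attribute titles match the assumed ones.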

Use GeoLayout

Go to the Layout module, choose Geo Layout from the list, and click the Run button.


Result

You can see the result immediately. You can now proceed to analysis and aesthetic refinement. Please visit the Quick Start Tutorial for a step-by-step introduction to Gephi.
