User Guide for the Patent Co-Inventor Network Visualization Tool

Purpose

The purpose of the Patent Co-Inventor Network Visualization Tool is to visualize a particular type of connection among inventors: those who have filed a patent together, and therefore have worked closely together on at least one project. There are a number of reasons this could be useful:

  • You might want to know which companies were major players in a technological field, in an intuitive, visual way.
  • You might start with an inventor and want to know who she collaborated with at an important juncture in her career. Was she a loner? Immersed in a dense web of inventors working on similar topics? At the periphery of the field, or a key figure connecting several networks?
  • You might be curious about ways that a field (or inventor's career) changed over time. Comparing several social networks representing different periods can reveal these changes.

The goal is to be able to see around a given technological field in an intuitive way, quickly and easily.




Getting Started

Inventor Name search will look up the inventor IDs from our database for any given name. This will erase the default values the first time you do it. Any additional times you use the search, it will simply add the new search results onto the old ones.

Inventors will likely have multiple IDs, because our algorithm tries to keep track of separate inventors with identical names. If it looks like this search has found the wrong person, simply remove the unwanted inventor ID from the comma-separated values box before pressing 'Generate.'

Patent Class Search (Sampled) finds everyone patenting in a particular technology class (within the chosen year), using US Patent and Trademark Office technology classifications. These can be found at http://www.uspto.gov/web/patents/classification/ or via the Classification section of a Google Patent page. Because this will be a huge number of patents for many fields, the tool then chooses a random sample of these inventors. The random sample serves as the starting point for finding co-inventors.

Patent Sub-Class Search finds everyone patenting in a particular technology sub-class. This also will often create a very large diagram, so be sure to limit the time frame as much as possible.

Each of these options is described in more detail below.




The Data

The data underlying this visualization is the Fung Institute for Engineering Leadership's patent database, which draws and processes data from the United States Patent and Trademark Office (USPTO).

The data covers 1976 to the present. Before 1976, the USPTO did not require the same type of data from patent applicants, so that year is the practical starting point. Google Patents goes back further, but its earlier data relies on scanned documents and becomes much less usable further back in time.

The Fung Institute dataset has been processed and disambiguated, so in theory, each individual inventor should have one unique ID assigned. This means you can differentiate John Smith from Wisconsin and John Smith from Arkansas as two separate people, but still know that John Smith from Wisconsin is the same person after he moves to Nebraska. In practice, this process is still imperfect, but it nevertheless makes this data set uniquely powerful at tracking the movement of inventors.

A full, technical description of the data can be found on the following page: Description of Data

Bulk downloads of this data for research purposes are also possible: Bulk Downloads


The Social Network Diagrams

How it Works

The tool, in theory, can map any number of layers of co-inventors' co-inventors' co-inventors' co-inventors' ... In practice, we limit this to 3 generations, as described below, because the results get unwieldy quickly.

We start from an initial set of inventors, using the unique inventor IDs in our database. (In practice, each inventor may have multiple IDs, since names are not always uniform in the database, despite efforts to disambiguate the results.)
We will call these people the "seed inventors."

When you first load the page, there are default seed inventor IDs already entered.

 

One "generation" (co-inventors):

The tool finds all patents filed by the seed inventors within the time frame entered. For example, John Doe might have filed for patents 1002043 ("Better mousetrap") and 43421110 ("Best mousetrap") during this window. If the inventors listed on "Best mousetrap" are John Doe, Thomas Jefferson, and Albert Einstein, the network will create three connections:

John Doe: Thomas Jefferson
John Doe: Albert Einstein
Thomas Jefferson: Albert Einstein

It repeats this process for each patent. Note that it does not (at this point) care whether Jefferson and Einstein co-invented on one patent or one hundred.
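
The pairing step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the function name coinventor_edges and the inventor list for "Better mousetrap" are hypothetical.

```python
from itertools import combinations

def coinventor_edges(patents):
    """Return the set of unordered inventor pairs who share at least one patent."""
    edges = set()
    for inventors in patents.values():
        # Every pair of inventors listed on the same patent gets one connection.
        # Sorting makes (a, b) and (b, a) collapse into a single edge.
        for a, b in combinations(sorted(set(inventors)), 2):
            edges.add((a, b))
    return edges

patents = {
    "Best mousetrap": ["John Doe", "Thomas Jefferson", "Albert Einstein"],
    # Hypothetical inventor list, used only to show that repeats are ignored.
    "Better mousetrap": ["John Doe", "Albert Einstein"],
}
edges = coinventor_edges(patents)
for pair in sorted(edges):
    print(": ".join(pair))
```

Because the edges are kept in a set, Doe and Einstein are connected once even though they appear together on two patents.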

The following diagram consists of 7 patents filed by a total of 13 inventors:

You can see everyone with whom our seed inventor (Chenming Hu) filed a patent as a co-inventor in this window (1998-2000), and also the relationships among these other patentees.
On one patent, for example, the inventors were Chenming Hu, Nathan W Cheung, and Xiang Lu.


Two "generations" (co-co-inventors):

At two generations, we take things a step further. Now the tool treats all of the names it found in Step 1 above (one "generation") as seed inventors, and runs through the process again. It finds all of the patents they applied for in the chosen time frame, then finds all inventors affiliated with each of those patents.
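
One way to picture this repeated expansion is the rough sketch below. It assumes each patent is simply a list of inventor names already filtered to the chosen time frame; expand_network is a hypothetical name, not a function of the tool. As a shortcut, it only re-searches the inventors found in the previous round, which yields the same set of people as re-running the search on everyone.

```python
def expand_network(seeds, patents, generations):
    """Grow the inventor set by repeatedly adding co-inventors of the newest names."""
    if not 1 <= generations <= 3:
        raise ValueError("generations must be 1, 2, or 3")
    known = set(seeds)      # everyone in the network so far
    frontier = set(seeds)   # inventors whose patents still need to be searched
    for _ in range(generations):
        found = set()
        for inventors in patents:
            team = set(inventors)
            if team & frontier:   # a frontier inventor is on this patent
                found |= team
        frontier = found - known  # only newly discovered inventors seed the next round
        known |= found
        if not frontier:
            break
    return known

# Hypothetical chain of patents: A-B, B-C, C-D, D-E.
patents = [["A", "B"], ["B", "C"], ["C", "D"], ["D", "E"]]
print(sorted(expand_network({"A"}, patents, 2)))
```

With seed A, one generation reaches B, two generations reach C, and three reach D; E would require a fourth generation.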

The initial picture is still the same, but expanded out another degree of co-inventorship.

We're beginning to see that there were entire networks of co-inventors out there, but that they have very little in common – other than their affiliation with Chenming Hu, our seed inventor.


 

Three "generations" (co-co-co-inventors):

By the time we get to three generations, the number of inventors being mapped can get very large indeed.

 




Interpreting the Network Diagrams

The size of the dot representing an inventor is proportional to the number of times that the inventor's patents have been cited by other patents as "prior art." Thus, an inventor who had 10 patents, each of them cited by 10 different future patents, will be the same size as an inventor with one patent cited by 100 others. By far, the most common number of future cites a patent will receive is 0.
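
The arithmetic behind that comparison can be checked with a short sketch. The function name total_citations is hypothetical, and how the tool converts the total into an on-screen size is not specified here.

```python
def total_citations(citations_per_patent):
    """Sum the forward citations across all of an inventor's patents."""
    return sum(citations_per_patent)

# Ten patents cited 10 times each, versus one patent cited 100 times.
steady = total_citations([10] * 10)
one_hit = total_citations([100])
uncited = total_citations([0, 0, 0])  # the most common case: no citations at all
print(steady, one_hit, uncited)
```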

The color of each dot is determined by the assignee on the inventor's most recent patent (within the chosen dates).

The 10 assignees represented most often in the final data are automatically assigned different colors. Each inventor's dot is then assigned the color of this most recent assignee, as a proxy for affiliation.
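
A minimal sketch of this coloring rule follows. Several details are assumptions made for illustration: the palette, the fallback color for assignees outside the top 10, and counting "most often" by each inventor's most recent assignee.

```python
from collections import Counter

# Hypothetical palette of 10 colors; the tool's real colors are not documented here.
PALETTE = ["red", "blue", "green", "orange", "purple",
           "brown", "pink", "olive", "cyan", "gold"]
DEFAULT_COLOR = "gray"  # assumed color for assignees outside the top 10

def assign_colors(latest_assignee):
    """Map each inventor to the color of their most recent assignee."""
    counts = Counter(latest_assignee.values())
    top = [name for name, _ in counts.most_common(len(PALETTE))]
    color_of = dict(zip(top, PALETTE))
    return {inventor: color_of.get(assignee, DEFAULT_COLOR)
            for inventor, assignee in latest_assignee.items()}

# Hypothetical data: firms 0-9 each have two inventors, firm 10 has only one.
latest = {}
for firm in range(10):
    latest[f"inventor{firm}a"] = f"firm{firm}"
    latest[f"inventor{firm}b"] = f"firm{firm}"
latest["inventor10"] = "firm10"
colors = assign_colors(latest)
```

Here the two inventors at each of the ten larger firms share a color, while the lone inventor at firm10 falls back to the default.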

The network treats each inventor like a charged particle, repelling all others to which it is not linked. That means that it can take some time for the network to sort itself out into a coherent picture.

If it seems that the image is unnecessarily complex, you can "untangle" the results manually. To do this, simply use your mouse to drag a specific inventor in the direction of your choice. Connected inventors will be drawn along.




Using the Tool

Option 1: Search by Inventor Name

Ignore the default settings if you wish to create a map for a new inventor (or inventors). Simply enter a first name and last name into the "Search by Inventor Name" box and press 'Search.'

The first time you search for a name, it will replace the defaults. Any subsequent names you search for will be ADDED to earlier results. Thus, if you want to create a network diagram using three inventors as the seed inventors, simply search for them one after another.

When you search for a name, the tool will do its best to find all possible variations on the name (i.e., both Chen Ming and Chenming). If one of the names it finds seems incorrect, just delete the corresponding Inventor ID from that box before you generate the social network.
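
As a rough illustration of how spelling variants like these can be matched, the sketch below compares names after stripping spaces, punctuation, and capitalization. This is an assumed, simplified approach with hypothetical function names; the tool's actual matching logic may differ.

```python
import re

def squash(text):
    """Lowercase the text and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def name_key(first, last):
    """Build a comparison key that ignores spacing, hyphens, and case."""
    return (squash(first), squash(last))

same = name_key("Chen Ming", "Hu") == name_key("Chenming", "Hu")
different = name_key("Chenming", "Hu") == name_key("Chenning", "Hu")
print(same, different)
```

A key like this treats "Chen Ming" and "Chenming" as the same name, but it cannot tell apart two different people who share a name, which is why the Inventor ID box remains editable.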

Once you're ready, press "Generate" and you're set!

Option 2: Patent Class Search

See: What are "patent classes" anyway?

Since mapping every inventor of every patent in a given class would be overwhelming (both in computing resources to render and human ability to interpret the result), we must find ways to cut the size down to a reasonable level. (See: My visualization didn't load! It got halfway through and stopped.). This mode offers one form of compromise between the "big picture" of a technology class and the constraints of effective visualization.

It takes a random sample of 10 patents per month (120 for a year) for a chosen year within a chosen patent class, then uses the inventors on these patents as the initial (or "seed") inventors for the visualization.
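
The sampling step might look something like the following sketch. It assumes patents arrive as (patent ID, month) pairs for the chosen class and year; the function name and the handling of months with fewer than 10 patents are assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_seed_patents(patents, per_month=10, rng=None):
    """Pick up to `per_month` random patents from each month."""
    rng = rng or random.Random()
    by_month = defaultdict(list)
    for patent_id, month in patents:
        by_month[month].append(patent_id)
    sample = []
    for month in sorted(by_month):
        ids = by_month[month]
        # Assumption: a month with fewer patents than requested contributes all of them.
        sample.extend(rng.sample(ids, min(per_month, len(ids))))
    return sample

# Hypothetical class with 25 patents in each of the 12 months.
patents = [(f"P{month}-{i}", month) for month in range(1, 13) for i in range(25)]
sample = sample_seed_patents(patents, rng=random.Random(0))
print(len(sample))
```

Passing a seeded random.Random makes a run repeatable; without it, each call draws a different sample, which mirrors the varying results described for this mode.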

The end result, then, will vary somewhat each time you press "Generate." We encourage users to run the same settings 2-3 times to see if you get more or less consistent results. Depending on the density of patenting within that patent class, how often inventors move between firms, and the average size of patenting teams in the industry, you might find that you get very disconnected results, or you might find a tight web. This tool is likely most useful for showing change in an industry across time, or giving a rough snapshot, rather than a definitive picture of an industry.

Option 3: Search by Patent Sub-class

See: What are "patent classes" anyway?

As with the sampled patent class search, this is an alternate way of keeping the amount of data to a reasonable level while giving a meaningful broader perspective. In this case, instead of looking across an entire patent class, it looks across every inventor filing in a patent sub-class. Sub-classes are quite specific, but we can see every patent in that area within a given time frame.

As you will notice, this usually generates many seed inventors, sometimes hundreds even for a given year. This is why we recommend only 1-2 generations and a short time frame. (See: My visualization didn't load! It got halfway through and stopped.)

This option generates images something like the following, which maps the semiconductor industry from 1998-2000 (2 generations):




FAQs

Why are there multiple IDs listed for my inventor?

One of the strengths of this database is that the inventors have been "disambiguated," meaning we can tell apart two people named John Smith. However, this process inevitably will have some errors on the side of over-splitting (thinking there are 3 John Smiths) or under-splitting (combining two different people). The database errs on the side of over-splitting, which means that in some cases inventors will have 2-3 (or rarely, more) unique ID signifiers.

When conducting an Inventor Name search, all results for that name will show up. You can click "Sample Patent" to see if this is, indeed, the inventor you mean. If not, you can manually delete the related identifier from the box to the right before pressing Generate.



My visualization didn't load! It got halfway through and stopped.

In principle, this tool can visualize a network of any size, but in practice there are real limits. Once the tool approaches 2,000-3,000 inventors, rendering becomes much, much slower, and the web server sometimes stalls. This can result in you ending up on "processing.php" but with no link to your visualization at the bottom of the page.

If this happens, it is worth retrying once or twice (sometimes once the tool has cached some data, it can get a bit further), but generally you will need to limit the breadth of your search. You can try a shorter time span, fewer starting inventors, or both.

Currently, "date applied" mode sometimes runs slower than "date granted" mode.



What are "patent classes" anyway?

By "patent class," we mean the "technology class" that a US Patent and Trademark Office examiner has assigned to this patent. Each patent receives one or more major class designations. For example, a semiconductor might be class 438 ("Semiconductor Device Manufacturing: Process") and 257 ("Active Solid-State Devices (e.g., Transistors, Solid-State Diodes)").
Within a main technology class, the patent will then receive several sub-class designations that are much more specific. Thus, you might find patent subclass 438/151: "Having insulated gate" (http://www.uspto.gov/web/patents/classification/uspc438/defs438.htm#C438S151000).

You can find the patent classification assigned to a patent on its Google Patent page. For example:

https://www.google.com/patents/US6413802

The number of patents filed per year in a given sub-class will vary widely across technologies, but might be as high as hundreds or thousands.



Why sample the patent class? Doesn't that have a big influence on results?

Since mapping every inventor of every patent in a given class would be overwhelming (both in computing resources to render and human ability to interpret the result), we must find ways to cut the size down to a reasonable level. Sampled results certainly need to be taken in context, but you can generate several versions of the same settings and compare results for yourself. Our experimentation found a high degree of similarity in the end results for most industries and years we tested.



My diagram seems to be cut off in the window!

The tool attempts to guess the appropriate size of render.php, the page that shows your final visualization, but sometimes it guesses wrong. In that case, you can easily adjust the settings in the URL.

Your URL will look something like this: http://patentnetwork.berkeley.edu/inventors/render.php?screen_width=1957.45&screen_height=1957.45&charge=7000&mode=regular

To change the screen size, simply edit those numbers and press "enter." For example, if the screen is far too big, we might change the url to: http://patentnetwork.berkeley.edu/inventors/render.php?screen_width=300&screen_height=400&charge=7000&mode=regular

As you can see, "screen_width=XXXX" and "screen_height=XXXX" are the values we changed.

In the same way, you can adjust the "charge" between the inventors, which decides how far apart each inventor will be.
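
If you would rather adjust these values with a script than by hand, the sketch below rewrites the query string using Python's standard urllib.parse module. The helper name set_render_params is hypothetical; the parameter names are taken from the example URL above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def set_render_params(url, **changes):
    """Return the URL with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # Existing keys keep their position; only their values change.
    params.update({key: str(value) for key, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))

url = ("http://patentnetwork.berkeley.edu/inventors/render.php"
       "?screen_width=1957.45&screen_height=1957.45&charge=7000&mode=regular")
resized = set_render_params(url, screen_width=300, screen_height=400)
print(resized)
```

The same helper can be called with charge=... to change the spacing between inventors.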



Why does my inventor have such a small network?

This program captures one important but narrow form of social connection: inventors who have together filed for a patent since the 1970s. Two inventors being connected almost certainly means they worked together closely, but two inventors NOT being connected does not mean a lack of a connection. Conferences, trade publications, scientific journal articles, graduate school connections - all of these are very important social ties that are not directly displayed through this tool. Thus, an inventor might be enormously historically important yet not show up on these diagrams if they never filed patents.

Other possibilities include issues with the formalities of patent filing. Your inventor might have filed one patent under John Smith, another under John P. Smith, another under John Patrick Smith, and another under J P Smith. These will not always show up as one inventor, despite our disambiguation efforts. This would splinter the resulting diagram.



Why can't I map something to 4 degrees of connection or larger?

For the sake of conserving resources, this tool has a limit of 3 generations of co-inventor relationships. If you wish to generate larger diagrams for scholarly or educational purposes, you can contact the tool's creators.