Patent Co-Inventor Network Visualization Tool
User Guide for the Patent Co-Inventor Network Visualization Tool
The purpose of the Patent Co-Inventor Network Visualization Tool is to visualize a particular type of connection among inventors: those who have filed a patent together, and therefore have worked closely together on at least one project. There are a number of reasons this could be useful:
The goal is to be able to see around a given technological field in an intuitive way, quickly and easily.
Inventor Name search will look up the inventor IDs from our database for any given name. This will erase the default values the first time you do it. Any additional times you use the search, it will simply add the new search results onto the old ones.
Inventors will likely have multiple IDs, because our algorithm tries to keep track of separate inventors with identical names. If it looks like this search has found the wrong person, simply remove the unwanted inventor ID from the comma-separated values box before pressing 'Generate.'
Patent Class Search (Sampled) finds everyone patenting in a particular technology class (within the chosen year), using US Patent and Trademark Office technology classifications. These can be found at http://www.uspto.gov/web/patents/classification/ or via the Classification section of a Google Patent page. Because this can be a huge number of patents in many fields, the tool then takes a random sample of these inventors. That random sample serves as the starting point for finding co-inventors.
Patent Sub-Class Search finds everyone patenting in a particular technology sub-class. This also will often create a very large diagram, so be sure to limit the time frame as much as possible.
Each of these options is described in more detail below.
The data underlying this visualization is the Fung Institute for Engineering Leadership's patent database, which draws and processes data from the United States Patent and Trademark Office (USPTO).
The data covers 1976 to the present. The USPTO did not require the same type of data from patent applicants before 1976, so that year is the standard starting point. Google Patents goes back further, but its earlier data relies upon scanned documents and becomes much less usable further back in time.
The Fung Institute dataset has been processed and disambiguated, so in theory, each individual inventor should have one unique ID assigned. This means you can differentiate John Smith from Wisconsin and John Smith from Arkansas as two separate people, but still know that John Smith from Wisconsin is the same person after he moves to Nebraska. In practice, this process is still imperfect, but it nevertheless makes this data set uniquely powerful at tracking the movement of inventors.
A full technical description of the data can be found at the following page: Description of Data
Bulk downloads of this data for research purposes are also possible: Bulk Downloads
How it Works
The tool, in theory, can map any number of layers of co-inventors' co-inventors' co-inventors' co-inventors' .... In practice, we limit it to 3 generations, as described below, because the results get unwieldy quickly.
We start from an initial set of inventors, using the unique inventor IDs in our database. (In practice, each inventor may have multiple IDs, since names are not always uniform in the database, despite efforts to disambiguate the results.)
When you first load the page, there are default seed inventor IDs already entered.
One "generation" (co-inventors):
The tool finds all patents filed by the seed inventors within the time frame entered. For example, John Doe might have filed for patents 1002043 ("Better mousetrap") and 43421110 ("Best mousetrap") during this window. If the inventors listed on "Best mousetrap" are John Doe, Thomas Jefferson, and Albert Einstein, the network will create three connections:
John Doe: Thomas Jefferson
John Doe: Albert Einstein
Thomas Jefferson: Albert Einstein
It repeats this process for each patent. Note that it does not (at this point) care whether Jefferson and Einstein co-invented on one patent or one hundred.
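The pairing step above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name and the dictionary layout are assumptions, and the inventor list for "Better mousetrap" is made up to show that repeated collaborations collapse into a single connection.

```python
from itertools import combinations

def co_inventor_edges(patents):
    """Return the set of unordered inventor pairs that share at least one patent."""
    edges = set()
    for inventors in patents.values():
        # Every pair of inventors listed on the same patent is one connection.
        for a, b in combinations(sorted(set(inventors)), 2):
            edges.add((a, b))  # a set ignores how many patents a pair shares
    return edges

# Hypothetical data: the second inventor list is invented for illustration.
patents = {
    "Best mousetrap": ["John Doe", "Thomas Jefferson", "Albert Einstein"],
    "Better mousetrap": ["John Doe", "Thomas Jefferson"],
}
edges = co_inventor_edges(patents)
for a, b in sorted(edges):
    print(f"{a}: {b}")
```

Although Doe and Jefferson appear together on both patents, they contribute only one connection.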
The following diagram consists of 7 patents filed by a total of 13 inventors:
You can see everyone with whom our seed inventor (Chenming Hu) filed a patent as a co-inventor in this window (1998-2000), and also the relationships among these other patentees.
Two "generations" (co-co-inventors):
At two generations, we take things a step further. Now the tool treats all of the names it found in Step 1 above (one "generation") as seed inventors, and runs through the process again. It finds all of the patents they applied for in the chosen time frame, then finds all inventors affiliated with each of those patents.
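As a rough sketch of this repeated expansion, the Python below treats each round's newly found inventors as the seeds for the next round. The function name, the single-letter inventor IDs, and the list-of-lists data layout are all hypothetical, and filtering by time frame is left out for brevity.

```python
def expand_generations(seeds, patents, generations):
    """Collect inventors reachable within `generations` rounds of co-inventorship."""
    known = set(seeds)
    frontier = set(seeds)
    for _ in range(generations):
        found = set()
        for inventors in patents:
            # A patent counts if any inventor from the current frontier is on it.
            if frontier & set(inventors):
                found.update(inventors)
        # Re-running from only the new names gives the same result as re-running
        # from all names, because earlier names were already processed.
        frontier = found - known
        known |= found
    return known

# Hypothetical chain of patents: A-B, B-C, C-D, D-E.
patents = [["A", "B"], ["B", "C"], ["C", "D"], ["D", "E"]]
one_gen = expand_generations(["A"], patents, 1)
two_gen = expand_generations(["A"], patents, 2)
three_gen = expand_generations(["A"], patents, 3)
```

Each extra generation reaches one step further along the chain, which is why the inventor count can grow so quickly in a densely connected field.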
The initial picture is still the same, but expanded out another degree of co-inventorship.
We're beginning to see that there were entire networks of co-inventors out there, but that they have very little in common – other than their affiliation with Chenming Hu, our seed inventor.
Three "generations" (co-co-co-inventors):
By the time we get to three generations, the number of inventors being mapped can get very large indeed.
The size of the dot representing an inventor is proportional to the number of times that the inventor's patents have been cited by other patents as "prior art." Thus, an inventor who had 10 patents, each of them cited by 10 different future patents, will be the same size as an inventor with one patent cited by 100 others. By far, the most common number of future cites a patent will receive is 0.
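The arithmetic behind that comparison is just a sum of citations over an inventor's patents. A minimal sketch, with a hypothetical function name (how the total is then scaled to pixels is not described here):

```python
def total_citations(citations_per_patent):
    """Sum the future citations received across all of an inventor's patents."""
    return sum(citations_per_patent)

prolific = total_citations([10] * 10)   # 10 patents, each cited 10 times
single_hit = total_citations([100])     # 1 patent cited 100 times
print(prolific, single_hit)
```

Both inventors end up with the same total, so their dots would be drawn at the same size.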
The color of each dot is determined by the assignee on the inventor's most recent patent (within the chosen dates).
The 10 assignees represented most often in the final data are automatically assigned different colors. Each inventor's dot is then assigned the color of this most recent assignee, as a proxy for affiliation.
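One way to picture this coloring rule is the sketch below. It is an assumption-laden illustration: the function name, the palette, the assignee names, and especially the fallback color for assignees outside the top group are not taken from the tool. The example uses a two-color palette to keep the data small; the tool itself uses the top 10.

```python
from collections import Counter

def color_by_assignee(latest_assignee, palette, other="gray"):
    """Map each inventor to a color based on their most recent assignee."""
    counts = Counter(latest_assignee.values())
    # The most frequently seen assignees each get their own palette color.
    top = [name for name, _ in counts.most_common(len(palette))]
    color_of = dict(zip(top, palette))
    # Assumption: everyone else shares one fallback color.
    return {inv: color_of.get(assignee, other)
            for inv, assignee in latest_assignee.items()}

# Hypothetical inventors and assignees.
latest_assignee = {
    "inv1": "Acme", "inv2": "Acme", "inv3": "Acme",
    "inv4": "Globex", "inv5": "Globex",
    "inv6": "Initech",
}
colors = color_by_assignee(latest_assignee, palette=["red", "blue"])
```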
The network treats each inventor like a charged particle, repelling all others to which it is not linked. That means that it can take some time for the network to sort itself out into a coherent picture.
If it seems that the image is unnecessarily complex, you can "untangle" the results manually. To do this, simply use your mouse to drag a specific inventor in the direction of your choice. Connected inventors will be drawn along.
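For intuition about the charged-particle behavior, here is a toy single iteration in Python. It follows the description above (unlinked inventors push each other apart) and nothing more: the inverse-square force, the step size, and all names are assumptions, and real force-directed layouts usually add further forces such as springs along links.

```python
import math

def repulsion_step(positions, edges, charge=7000.0, step=0.01):
    """Move each node away from every node it is not linked to (one iteration)."""
    linked = {frozenset(e) for e in edges}
    moved = {}
    for a, (ax, ay) in positions.items():
        fx = fy = 0.0
        for b, (bx, by) in positions.items():
            if a == b or frozenset((a, b)) in linked:
                continue  # linked pairs do not repel in this toy model
            dx, dy = ax - bx, ay - by
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            force = charge / dist ** 2         # assumed inverse-square push
            fx += force * dx / dist
            fy += force * dy / dist
        moved[a] = (ax + step * fx, ay + step * fy)
    return moved

# Hypothetical layout: A and B are co-inventors, C is linked to neither.
positions = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
edges = [("A", "B")]
after = repulsion_step(positions, edges)
```

Repeating such a step many times is what makes the picture drift before it settles, which is why the network can take a while to sort itself out.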
Ignore the default settings if you wish to create a map for a new inventor (or inventors). Simply enter a first name and last name into the "Search by Inventor Name" box and press 'Search.'
The first time you search for a name, it will replace the defaults. Any subsequent names you search for will be ADDED to earlier results. Thus, if you want to create a network diagram using three inventors as the seed inventors, simply search for them one after another.
When you search for a name, the tool will do its best to find all possible variations on the name (i.e., both Chen Ming and Chenming). If one of the names it finds seems incorrect, just delete the corresponding Inventor ID from that box before you generate the social network.
Once you're ready, press "Generate" and you're set!
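The behavior of the inventor ID box described above (first search replaces the defaults, later searches add to it, unwanted IDs can be deleted) can be modeled with a small Python class. This is a sketch of the described behavior only; the class, its methods, and the ID strings are hypothetical, and skipping duplicate IDs is an assumption.

```python
class SeedBox:
    """Toy model of the comma-separated inventor ID box."""

    def __init__(self, defaults):
        self.ids = list(defaults)
        self._has_searched = False

    def add_search_results(self, found_ids):
        if not self._has_searched:
            self.ids = []              # the first search replaces the defaults
            self._has_searched = True
        for inventor_id in found_ids:
            if inventor_id not in self.ids:   # assumption: duplicates are skipped
                self.ids.append(inventor_id)  # later searches are added on

    def remove(self, inventor_id):
        self.ids.remove(inventor_id)   # delete an ID that is the wrong person

    def as_csv(self):
        return ",".join(self.ids)

box = SeedBox(["default-1", "default-2"])
box.add_search_results(["hu-1", "hu-2"])   # first search
box.add_search_results(["smith-1"])        # second search
box.remove("hu-2")
seed_csv = box.as_csv()
```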
Since mapping every inventor of every patent in a given class would be overwhelming (both in computing resources to render and human ability to interpret the result), we must find ways to cut the size down to a reasonable level. (See: My visualization didn't load! It got halfway through and stopped.) This mode offers one form of compromise between the "big picture" of a technology class and the constraints of effective visualization.
It takes a random sampling of 10 samples per month (120 for a year) for a chosen year within a chosen patent class, then uses the inventors on these patents as the initial (or "seed") inventors for the visualization.
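A sketch of this per-month sampling in Python follows. The function name and the (patent ID, month) data layout are hypothetical, and taking every patent from a month that has fewer than 10 is an assumption, since that case is not described.

```python
import random
from collections import defaultdict

def sample_seed_patents(patents, per_month=10, rng=None):
    """Pick up to `per_month` patents at random from each month."""
    rng = rng or random.Random()
    by_month = defaultdict(list)
    for patent_id, month in patents:
        by_month[month].append(patent_id)
    sample = []
    for month in sorted(by_month):
        pool = by_month[month]
        # Assumption: a month with fewer patents than the quota contributes all of them.
        sample.extend(rng.sample(pool, min(per_month, len(pool))))
    return sample

# Hypothetical class with 25 patents in each of the 12 months.
patents = [(f"P{month:02d}-{i}", month) for month in range(1, 13) for i in range(25)]
seed_patents = sample_seed_patents(patents)
print(len(seed_patents))
```

Because no fixed random seed is used, each call returns a different sample, which mirrors why the diagram changes between runs.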
The end result, then, will vary somewhat each time you press "Generate." We encourage users to run the same settings 2-3 times to see if you get more or less consistent results. Depending on the density of patenting within that patent class, how often inventors move between firms, and the average size of patenting teams in the industry, you might find that you get very disconnected results, or you might find a tight web. This tool is likely most useful for showing change in an industry across time, or giving a rough snapshot, rather than a definitive picture of an industry.
As with the sampled patent class search, this is an alternate way of keeping the amount of data to a reasonable level while giving a meaningful broader perspective. In this case, instead of looking across an entire patent class, it looks across every inventor filing in a patent sub-class. Sub-classes are quite specific, but we can see every patent in that area within a given time frame.
As you will notice, this usually generates many seed inventors, sometimes hundreds even for a given year. This is why we recommend only 1-2 generations and a short time frame. (See: My visualization didn't load! It got halfway through and stopped.)
This option generates images something like the following, which maps the semiconductor industry from 1998-2000 (2 generations):
One of the strengths of this database is that the inventors have been "disambiguated," meaning we can tell apart two people named John Smith. However, this process inevitably will have some errors on the side of over-splitting (thinking there are 3 John Smiths) or under-splitting (combining two different people). The database errs on the side of over-splitting, which means that in some cases inventors will have 2-3 (or rarely, more) unique ID signifiers.
When conducting an Inventor Name search, all results for that name will show up. You can click "Sample Patent" to see if this is, indeed, the inventor you mean. If not, you can manually delete the related identifier from the box to the right before pressing Generate.
In principle, this tool can visualize a network of any size, but in practice there are real limits. Once the tool approaches 2,000-3,000 inventors, rendering becomes much slower, and the web server sometimes stalls. You may then end up on "processing.php" with no link to your visualization at the bottom of the page.
If this happens, it is worth retrying once or twice (sometimes once the tool has cached some data, it can get a bit further), but generally you will need to limit the breadth of your search. You can try a shorter time span, fewer starting inventors, or both.
Currently, "date applied" mode sometimes runs slower than "date granted" mode.
By "patent class," we mean the "technology class" that a US Patent and Trademark Office examiner has assigned to this patent. Each patent receives one or more major class designations. For example, a semiconductor patent might be assigned to classes 438 ("Semiconductor Device Manufacturing: Process") and 257 ("Active Solid-State Devices (e.g., Transistors, Solid-State Diodes)").
You can find the patent classification assigned to a patent on its Google Patent page. For example:
The number of patents filed per year in a given sub-class will vary widely across technologies, but might be as high as hundreds or thousands.
Since mapping every inventor of every patent in a given class would be overwhelming (both in computing resources to render and human ability to interpret the result), we must find ways to cut the size down to a reasonable level. Sampled results certainly need to be taken in context, but you can generate several versions of the same settings and compare results for yourself. Our experimentation found a high degree of similarity in the end results for most industries and years we tested.
The tool attempts to guess the appropriate size of render.php, the page that shows your final visualization, but sometimes it guesses wrong. In that case, you can easily adjust the settings in the URL.
Your URL will look something like this: http://patentnetwork.berkeley.edu/inventors/render.php?screen_width=1957.45&screen_height=1957.45&charge=7000&mode=regular
To change the screen size, simply edit those numbers and press "enter." For example, if the screen is far too big, we might change the URL to: http://patentnetwork.berkeley.edu/inventors/render.php?screen_width=300&screen_height=400&charge=7000&mode=regular
As you can see, "screen_width=XXXX" and "screen_height=XXXX" are the values we changed.
In the same way, you can adjust the "charge" between the inventors, which decides how far apart each inventor will be.
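If you prefer to build the adjusted URL programmatically rather than editing it by hand, a short Python sketch using the standard library looks like this. The helper name is hypothetical; the parameter names are the ones visible in the example URL above, and the charge value of 3000 is an arbitrary illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_params(url, **changes):
    """Return `url` with the given query parameters replaced."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))   # keeps the original parameter order
    query.update({key: str(value) for key, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))

url = ("http://patentnetwork.berkeley.edu/inventors/render.php"
       "?screen_width=1957.45&screen_height=1957.45&charge=7000&mode=regular")
smaller = with_params(url, screen_width=300, screen_height=400)
spread_out = with_params(url, charge=3000)  # arbitrary example value
print(smaller)
```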
This program captures one important but narrow form of social connection: inventors who have together filed for a patent since the 1970s. Two inventors being connected almost certainly means they worked together closely, but two inventors NOT being connected does not mean a lack of a connection. Conferences, trade publications, scientific journal articles, graduate school connections - all of these are very important social ties that are not directly displayed through this tool. Thus, an inventor might be enormously historically important yet not show up on these diagrams if they never filed patents.
Other possibilities include issues with the formalities of patent filing. Your inventor might have filed one patent under John Smith, another under John P. Smith, another under John Patrick Smith, and another under J P Smith. These will not always show up as one inventor, despite our disambiguation efforts. This would splinter the resulting diagram.
For the sake of conserving resources, this tool has a limit of 3 generations of co-inventor relationships. If you wish to generate larger diagrams for scholarly or educational purposes, you can contact the tool's creators.