Investigative Journalism

 

From http://www.tcij.org/training-material/car Also look at http://www.tcij.org/links

How to find media email addresses

 

CAR

Computer-assisted reporting (CAR) remains one of the biggest advances in the past 20 years for investigative reporting. This section has information about how CAR can assist you in your work.

CAR covers two main areas: data mining and online research. We have also included sections on anonymity, as protecting your identity and data online is becoming more important.

data mining
Excel 1
Access
SQL – to come

online investigating
finding people
advanced search
finding hidden documents
finding website owners
automated web browsing
anonymity

published stories

All of the following stories came from analysing data produced by Freedom of Information Act enquiries.

Elena Egawhary (front page) in The Guardian, July 2007.

Heather Brooke in The Times, December 2007.

The Fire Brigades Union in Metro, February 2008.

finding people

Whether you are looking for whistleblowers or experts in esoteric fields, there are a number of methods that can both improve accuracy and save time for investigators.

This section has information about the ‘hidden web’ and those subscription, free, and non-indexed sources (including directories and archives) that will help you to find contributors online.

Bespoke people finders

These are the most useful tools if you have the name of the person you are looking for but want more information, such as contact details or articles they’ve written.

192.com www.192.com (subscription)

Or try one of the free (albeit less robust) alternatives:

123 people www.123people.com
Pipl www.pipl.com
Yasni www.yasni.com
Yoname www.yoname.com

Search engine functions (in Google)

Below are some of the advanced functions available to the searcher. For more detailed ways of searching, including by document type, see newsgathering.

The domain function: site:
When using the domain function there are three elements involved. If you were looking for an academic expert you would use the following:

  • subject term/s

  • the term connecting the subject to his/her profession (ie expert, department, professor etc.)

  • the domain function: site: (e.g. site:ac.uk for UK universities)

Compare:
“solvent abuse” professor site:ac.uk
With:
expert solvent abuse

And compare:
fraud ~data professor site:.ac.uk
With:
expert data fraud

Specifying the subject, level of expertise and limiting the search to academic urls can make finding experts much quicker.
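
The three elements above can also be assembled by a short script when you need to run many similar searches. The following is a minimal Python sketch and is not part of the original course notes: the function names expert_query and google_search_url are hypothetical, and it assumes Google's usual https://www.google.com/search?q= address layout.

```python
from urllib.parse import urlencode


def expert_query(subject, role, domain):
    """Combine a subject phrase, a profession term and a site: restriction."""
    # The subject is quoted so that it is searched as an exact phrase.
    return f'"{subject}" {role} site:{domain}'


def google_search_url(query):
    """Return a Google search address for the query (assumed URL layout)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = expert_query("solvent abuse", "professor", "ac.uk")
    print(query)
    print(google_search_url(query))
```

Changing the domain argument (for example to org.uk) reuses the same pattern for pressure groups and other non-commercial bodies.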

A full list of top-level domains can be found at NORID domains.

Google’s cached option will show you the page as it was when Google last indexed it, so you won’t miss out on your terms if the page has since changed.

You can also use the domain function to find local pressure groups, nimby groups, associations and non-commercial bodies. For example, you might be looking for pressure groups opposed to the building of phone masts.
Compare:
group “telephone masts” opposed
With:
group “telephone masts” opposed site:.org.uk

You can also use the domain function to find discussions (and hence contributors) in Facebook, and other social networks.
See:
“I worked” “lehman brothers” site:facebook.com/topic

You can only do this through Google, not within the Facebook search. See the Slewfootsnoop blog for more information.

Finding contributors via social networks
Facebook is one of the most popular social networking sites. Many of its users are interested in international social and political issues, and some are experts in their field – the site contains groups based on themes and issues from around the world. Try searching
ecology society site:facebook.com

Likewise, Myspace has similar groups – try searching their groups for alternative energy.

Other social networks are popular in different parts of the world. For example, if you have a Google account and are interested in finding contributors from South America – give Orkut a try. It’s very popular in Brazil and India.

Likewise, Badoo is more popular in the rest of Europe than in the UK, and it is also trying to break into the Russian social network market.

But perhaps the best place to start is amongst those services which allow online communities to create their own social networks. Ning is a good example of this.

If you are looking for professional communities then LinkedIn is probably a good place to start. See the Slewfootsnoop blog for a comparison between LinkedIn and Facebook for finding people.

It may be possible to find contributors and potentially useful actuality from photo-sharing network Flickr – try this tag-search for local pollution.

Technorati is currently the best known search engine for blogs. An alternative is Google’s Advanced Blog Search.

You can also use the advanced search on Twitter.

Contributor finding in pre-web 2.0 sources
Try using Google scholar to keep up-to-date on the latest academic findings and experts in your field.

Amazon advanced search is also a great place to find experts around a subject matter.

And don’t discount the various forums, and boards people use to express themselves, and flag up issues worth investigating – you can even create your own search engine to track people who contribute to different online forums.

Contributor Finding Online – Murray Dick – July 2009

useful links
ProfNet
A database of communications professionals and PR people.

advanced search

This section outlines how you can make your searching more accurate. It is taken from notes and lectures by Murray Dick.

narrowing searches
You can tighten your search results using the following:
AND: (implicit)
OR: blair wmd OR weapons
NOT: rangers -qpr
phrase search: which is the “richest bank in the UK” (try with and without quotes).

Wildcard: Google doesn’t support the wildcard in the way it is conventionally used in other search engines – MSN, Yahoo or Exalead – it uses automatic stemming.

Nevertheless, you can use a * in phrase-searching. Google treats the * as a placeholder for a word or more than one word, where you want to do an expansive search. For example, “corruption in the * industry” expert can help you find experts in corruption in different fields.

The plus sign (+) allows you to stop Google from stemming your words – if you are interested in a word in a particular case. It can also be applied to stop Google finding references to certain words that link to (rather than feature in) the pages you are searching from, when viewing cached content. Lastly, it can be applied to media sources, allowing you to search stories about a specific company in Google News.

Synonyms (~) for example: ~marriage will find references to love, marriage, romance etc.

It’s worth bearing in mind that other engines offer an even broader range of search operators. Exalead, for example, permits atleast and proximity searching. Their atleast function allows the searcher to find pages that feature a term prominently, which can be useful when you are searching for backgrounders on people or issues.

The proximity search function allows the searcher to find terms which occur close to each other, which can be useful when trying to unearth connections between people and events in the news.
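
The idea behind proximity searching can be illustrated on text you have already downloaded. The following is a minimal Python sketch of the principle only; it is not how Exalead or any other search engine implements the feature, and the function name within_proximity and the default gap of three words are assumptions made for illustration.

```python
import re


def within_proximity(text, term_a, term_b, max_gap=3):
    """Return True if the two terms occur within max_gap words of each other."""
    # Split the text into lower-case words, ignoring punctuation.
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    # Compare every occurrence of one term with every occurrence of the other.
    return any(abs(a - b) <= max_gap for a in positions_a for b in positions_b)
```

Here the distance is counted as the difference in word positions, so two adjacent words are one word apart.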

This API search allows proximity searching in Google results, albeit only where the terms you wish to find are no more than three words apart. There’s more information about how API searching can help in journalistic research on the Slewfootsnoop blog.

the occurrences function: intitle:
This is used for finding reliable backgrounders. However, bear in mind that standards in metadata vary widely. Think about what is included in professional sites’ web page titles. For example, if you want to find background information (analysis, not news, professional not amateur) about Somalia’s troubled political history:
Compare:
somalia crisis background
With:
somalia crisis intitle:Q&A

Instead of background you could try: depth, comment, analysis or brief.

You can search for these terms in the url using the following search:
inurl:somalia analysis

searching through documents
By specifying the type of file you want to search within you can tighten your search even further. Financial information is more likely to be held in an Excel spreadsheet than a web page, so limiting the search to this type of file produces more accurate results:
Compare:
house prices Greenwich
With:
“house prices” greenwich 2007 filetype:xls
Also try switching format to Powerpoint (filetype:ppt) for finding experts – they are likely to have demonstrated their expertise in presentations.

languages
You can also make use of the language selector in advanced search when looking for articles.
Compare:
scudetto “silvio berlusconi” (with and without filter switched to English).

links
This is how you find out who is linking to a site, which can highlight bias or partisanship. In Google advanced search go to ‘date, usage rights, numeric range’ and copy the url of the site you are checking where it says ‘find pages that link to the page’. Other useful tools for doing this are Back Link Watch and iwebtool.

You can find out more about searching for hidden documents elsewhere on this site.

Contributor Finding Online – Murray Dick – July 2009

useful links
Search Engine Watch
Provides data and ratings on the different search engines.

Startpage
Claims to be the world’s most private search engine as it does not record your IP address.

A9
Searches e-commerce websites

Internet Archive
Also known as the Wayback Machine, this is a digital library of web sites as they used to be.

Research clinic
Features links, tools and study material for professional researchers. The site accompanies courses delivered by the BBC’s Internet research specialist, Paul Myers.

finding hidden documents

hidden documents

People often leave sensitive and confidential information online. Sometimes this information is carelessly deposited in forums and social networking websites.

At other times, information is made available because of system errors or carelessly configured servers. In any case, search engines find it – and so can you, if you know how to look for it.

Video feeds from private security cameras, confidential medical records, personal resignation letters, and a host of other information have been unintentionally left for you to find.

Useful search terms
define:imbroglio – gives you the meaning of a word
link:www.tcij.org – lists the pages that link to a site
phonebook:smith – lists the Smiths in the US phonebook
site:www.bbc.co.uk iraq – restricts the search to the site given
castle ~glossary – provides a glossary of the given word
site:gov – search is limited to government sites

Advanced searching
The Google guide of advanced search techniques (pdf) will provide more detailed ways of searching for information and you can find more information on search techniques on the web, including YouTube, by searching for “Google hacking” or “Google hacks”. However, be very careful what you download as the documents could have viruses – to be safe, open any documents in Microsoft Writer or Open Office.

The searches listed below are of the most interest to journalists.

filetype:pdf
filetype:xls
filetype:doc
filetype:txt

Using this search you can look within a specific type of document for information. For example, you are more likely to find financial information by looking through Excel files. You can use any file extension to narrow your search – jpg, wav, ppt, avi etc.

Combine search terms to narrow results further: filetype:xls “house prices” and specify the location with +London.

Or exclude terms: for example, filetype:doc “security plan” -guidelines will exclude guidelines from the results. Adding site:gov will restrict the search to government sites.
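
Combining these operators by hand is error prone when you run many variations of the same search. The following is a minimal Python sketch of one way to compose such a search string; the function name build_query and its parameters are hypothetical and are not part of the original training material.

```python
def build_query(terms, filetype=None, site=None, exclude=()):
    """Compose a search string from plain terms and optional operators."""
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")
    for term in terms:
        # Multi-word terms are quoted so that they are matched as a phrase.
        parts.append(f'"{term}"' if " " in term else term)
    # A leading minus sign excludes a word from the results.
    parts.extend(f"-{word}" for word in exclude)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)
```

For example, build_query(["security plan"], filetype="doc", exclude=["guidelines"], site="gov") produces the government-site search described above.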

Security cameras
Typing “inurl:viewerframe mode motion” into a Google search will give a list of security cameras that can be viewed.

This page is based on a training session by Mike Schrenk – November 2009.

 

finding website owners
   

website owners


This page looks at what information you can find from a website address. It is taken from notes from a course by Mike Schrenk.

domain name system
Through the DNS (domain name system) you can find information such as who a web url is registered to and when it was registered. You can do this using the IP (internet protocol) address of the site you want information about.

What is my IP will tell you your IP address.

Network Tools: by typing the url of a website under look-up you can find the IP address and some basic information, such as where the site is registered.

For example, www.bbc.co.uk has the IP address 212.58.253.68. You can then use one of the regional internet registries (below) to find out more information.

Domain Tools is another site for finding out about a site based on the url. By using the ‘Who is’ function, you can find out who owns the site, the contact details and when the ownership expires etc.

Look up Server offers another option to find information on a DNS, as do All Who Is and Reverse DNS lookup.
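
The same look-ups can be made without a website by asking your own computer's DNS resolver. The following is a minimal Python sketch using the standard socket module; it is an illustration and not part of the original course, and the function names lookup_ip and reverse_lookup are hypothetical. The answers depend on your network and on the resolver you use, so results may differ from those shown by the sites above.

```python
import socket


def lookup_ip(hostname):
    """Resolve a hostname to an IPv4 address using the system resolver."""
    return socket.gethostbyname(hostname)


def reverse_lookup(ip_address):
    """Return the hostname registered for an IP address, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except (socket.herror, socket.gaierror):
        # No reverse DNS record exists, or the resolver could not be reached.
        return None
```

With a network connection, lookup_ip("www.bbc.co.uk") returns the site's current IP address, and passing that address to reverse_lookup shows the hostname registered for it, if any.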

regional internet registries
IP addresses are registered regionally and also provide useful information:
Afrinic – Africa
Apnic – Asia Pacific
Ripe – Europe, the Middle East and parts of Central Asia
Lacnic – Latin America and Caribbean
Arin – North America
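
Each of these registries also answers plain WHOIS queries, which can be sent from a script. The following is a minimal Python sketch, not part of the original course: the function name whois_lookup is hypothetical, the server hostnames are an assumption added for illustration and should be checked against each registry's own documentation, and the query is the simplest form of the WHOIS protocol (a line of text sent to TCP port 43).

```python
import socket

# Assumed WHOIS hosts for the five registries listed above.
RIR_WHOIS_SERVERS = {
    "afrinic": "whois.afrinic.net",
    "apnic": "whois.apnic.net",
    "ripe": "whois.ripe.net",
    "lacnic": "whois.lacnic.net",
    "arin": "whois.arin.net",
}


def whois_lookup(ip_address, registry="ripe", timeout=10):
    """Send a plain WHOIS query (TCP port 43) and return the text reply."""
    server = RIR_WHOIS_SERVERS[registry.lower()]
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        # A WHOIS request is the query followed by a carriage return and line feed.
        conn.sendall(ip_address.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

With a network connection, whois_lookup("212.58.253.68", "ripe") returns the registration record as text. If the address belongs to another region, the reply normally says which registry to ask instead.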

This page is based on a training session by Mike Schrenk – November 2009.

automated web browsing

This handout is a supplement to the full presentation given by Mike Schrenk at the cij summer school 2009.

The full presentation is available at: http://www.schrenk.com/cij

Online research often requires repetitive downloading of web pages. That process, along with extracting information found on websites, is tedious and error prone. Screen scraping and iMacros allow journalists to automate the process of computer-aided research.

screen scraper

A screen scraper is software that conducts automated browsing activities on the internet. A primary purpose of a screen scraper is to extract information from websites.
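
To make the extraction step concrete, the following is a minimal Python sketch that pulls every link out of a page's HTML using only the standard library. It is an illustration rather than the tool used in the presentation, and the names LinkExtractor and extract_links are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being read.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in an HTML string, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

In practice the HTML string would come from a downloaded page, for example one read with urllib.request.urlopen, and the same approach can be adapted to collect table cells or headings instead of links.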

iMacros

iMacros is a browser plug-in that lets you write ‘macros’, which are ‘pseudo’ programming tools that allow the automation of standard programs (like browsers).

iMacros is available for Internet Explorer and Firefox. I have had better results with Firefox and highly recommend its use over Internet Explorer.

Location for iMacros download (for Firefox)
https://addons.mozilla.org/en-US/firefox/addon/3863

initiating iMacros

The iMacros button in Firefox is located in the browser tool bar next to the url.

resources

Firefox download page https://addons.mozilla.org/en-US/firefox/addon/3863

iMacros home page http://www.iopus.com/

iMacros command reference http://wiki.imacros.net/Command_Reference

iMacros user forums http://forum.iopus.com/

Demo website http://www.schrenk.com/cij/imacros_demo.php

command reference

The following is a list of all available iMacros commands. Each command has zero or more parameters. If parameters can be omitted, they are enclosed by square brackets. If several choices are possible for the same parameter, they appear in brackets and are separated by the | character. Integer numbers are denoted by the letters n or m; all other names denote a series of characters (strings).

The ' character indicates a comment. If a line starts with ', everything after the ' is ignored. Typically this is used for comments or to disable specific parts of a macro.
Note: a macro cannot have empty lines, as an empty line indicates the end of the macro. So every line in the macro must have at least the comment symbol.

ADD result_var added_value
Adds a value to a variable.

BACK
Opens the previously visited web page.

CLEAR
Clears browser cache and cookies on the hard drive.

CLICK X=n Y=m [CONTENT=some_content]
“Clicks” on the element at the specified X/Y coordinates.

CMDLINE variable default_value
Sets the variable to a value retrieved from the command line.

DISCONNECT
Disconnects the current dial-up connection.

EXTRACT POS=[R]n TYPE=(TXT|HREF|TITLE|ALT) ATTR=Anchor*
Extracts data from websites.

FILEDELETE NAME=file_name
Deletes a file.

FILTER TYPE=IMAGES STATUS=(ON|OFF)
Filters web site elements. Currently the support for filtering is experimental. If you need any other data filtered, please let us know what kind of filter you would like to see added.

FRAME F=n
Directs all following TAG or EXTRACT commands to the specified frame.

IMAGECLICK IMAGE=image_file CONFIDENCE=n [CONTENT=some_content]
Sends a WINCLICK command to the specified image.

IMAGESEARCH IMAGE=image_file CONFIDENCE=n
Searches for the input image specified via the IMAGE attribute.

ONCERTIFICATEDIALOG C=n
Selects the client side certificate from a dialog.

ONDIALOG POS=n BUTTON=(YES|NO|CANCEL) [CONTENT=some_content]
Handles JavaScript dialogs.

ONDOWNLOAD FOLDER=folder_name FILE=file_name
Handles download dialogs.

ONERRORDIALOG BUTTON=(YES|NO) CONTINUE=(YES|NO)
Handles error dialogs.

ONLOGIN USER=username PASSWORD=password
Handles login dialogs.

ONSECURITYDIALOG BUTTON=(YES|NO) CONTINUE=(YES|NO)
Handles security dialogs.

ONWEBPAGEDIALOG KEYS=some_keys
Handles web page dialogs.

PRINT
Prints the current browser window.

PROMPT prompt_text variable_name [default_value]
Displays a popup to ask for a value. This value is stored in the variable.

PROXY ADDRESS=proxy_URL:port [BYPASS=page_name]
Connects to a proxy server to run the current macro.

REDIAL ISP
Redials a connection.

REFRESH
Refreshes (Reloads) current browser window.

SAVEAS TYPE=(CPL|MHT|HTM|TXT|EXTRACT|BMP) FOLDER=folder_name FILE=file_name
Saves information to a file.

SET variable_name variable_value
Assigns values to built-in variables.

SIZE X=n Y=m
Resizes the iMacros Browser Window.

STOPWATCH ID=id
Measures the time between two STOPWATCH commands with the same ID.

TAB T=(n|OPEN|CLOSE|CLOSEALLOTHERS)
Sets focus on the tab with number n.

TAG POS=n TYPE=type [FORM=form] ATTR=attr [CONTENT=some_content]
Selects a webpage element.

URL GOTO=some_URL
Navigates to a URL in the currently active tab.

VERSION BUILD=4213805
Specifies the version of iMacros that created this macro.

WAIT SECONDS=(n|#DOWNLOADCOMPLETE#)
Waits for a specific time.
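
When the same steps must be repeated for a long list of pages, the macro text itself can be generated by a script. The following is a minimal Python sketch that writes a macro using only the URL GOTO, WAIT and SAVEAS commands listed above. The function name build_macro, the folder name and the page_N file naming are hypothetical, and the sketch assumes that the URLs and folder name contain no spaces.

```python
def build_macro(urls, folder, wait_seconds=2):
    """Return iMacros macro text that visits each URL and saves the page."""
    # Lines starting with ' are comments in the macro language.
    lines = ["' Generated macro: visit each page and save it as HTML"]
    for number, url in enumerate(urls, start=1):
        lines.append(f"URL GOTO={url}")
        lines.append(f"WAIT SECONDS={wait_seconds}")
        lines.append(f"SAVEAS TYPE=HTM FOLDER={folder} FILE=page_{number}")
    # A macro cannot contain empty lines, so join with single newlines only.
    return "\n".join(lines)
```

The returned text can be saved to a file and loaded into iMacros; joining the lines with single newlines respects the rule that an empty line ends the macro.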

anonymity


The ability to remain anonymous can liberate journalists and facilitate research that would otherwise be impossible. The internet provides unique opportunities to conduct serious research while protecting your identity.

Anonymity becomes more important as regimes place added restrictions on journalists’ ability to speak freely. Regardless of the measures governments take, however, journalists are still able to publish stories through the use of “proxies”.

Why anonymity is useful to journalists

Hiding your identity while doing research
Anonymous browsing techniques may protect your identity and thereby provide greater access while conducting research.

Allowing you to perform repetitive research
Anonymity may protect you when using automated or repetitive research tools.

Pretending that you are somewhere else
With certain techniques, you can conduct research while appearing to be doing so from another country.

Protecting your sources
Your sources may use anonymity techniques to either protect their identity or to make their story possible.

Defeating national digital defenses
Anonymity techniques can defeat national firewalls and get information out to the rest of the world.

Rights to anonymity
Nations have varying views of anonymity and anonymous use of the internet. In the United States, the Supreme Court has ruled repeatedly that the right to anonymous free speech is protected by the First Amendment. A much-cited 1995 Supreme Court ruling in McIntyre v Ohio Elections Commission reads:

“Protections for anonymous speech are vital to democratic discourse. Allowing dissenters to shield their identities frees them to express critical, minority views… Anonymity is a shield from the tyranny of the majority… It thus exemplifies the purpose behind the Bill of Rights, and of the First Amendment in particular: to protect unpopular individuals from retaliation… at the hand of an intolerant society.”

A number of nations tightly control access to websites and other online resources.

An introduction to the internet

Your IP address may identify:
Your country (location)
Your organisation, through reverse DNS look-ups www.lookupserver.com
Possibly you!

Anonymous email
Anonymous email is possible via a product called Nyms, which allows the creation of disposable email addresses via the Nyms network.

Proxies
Proxies act as intermediaries and protect your identity. There are different types of proxies from different sources:

Open proxies are servers that, either intentionally or because of misconfiguration, allow people to connect through their network and assume one of their network IP addresses. Open proxies are best avoided.

An example of a website that lists open proxies is
www.xroxy.com/proxylist.htm

There are also commercial proxies that do a better job, for example Anonymizer.
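
Scripts can be routed through a proxy in the same way as a browser. The following is a minimal Python sketch using the standard urllib library; the function name opener_via_proxy is hypothetical, and the proxy address must be supplied by you, as none is given in the original notes.

```python
import urllib.request


def opener_via_proxy(proxy_address):
    """Build a urllib opener that sends HTTP and HTTPS requests through a proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_address, "https": proxy_address}
    )
    return urllib.request.build_opener(handler)
```

For example, opener_via_proxy("http://127.0.0.1:8080").open("http://example.org/") would fetch the page through a proxy listening on that placeholder address, so the website sees the proxy's IP address rather than yours.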

Tor
Another proxy alternative is the Tor project. Tor is the proxy network that facilitates journalism from some of the most hostile environments in the world. It is free software and an open network that helps protect against traffic analysis, a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security.

Tor was originally developed for the US Navy for the primary purpose of protecting government communications. Today, it is used every day for a wide variety of purposes by the military, journalists, law enforcement officers, activists, and many others.

It protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location. Tor works with many of your existing applications, including web browsers, instant messaging clients, remote login, and other applications based on the TCP protocol.

Installing Tor
Tor for Firefox
Optimising Tor in Firefox
Installing a proxy in your browser

From the Tor site
How Tor works
Getting up to speed on Tor’s past, present, and future
Download Tor

This page is based on Mike Schrenk’s talk at the CIJ Summer School – July 2009.

useful links
Pretty Good Privacy
Computer program that encrypts files and documents on hard drives. Can be used for emails.

Scramdisk
Computer program that encrypts hard drives.

How to Find Media Email Addresses
Is exactly what it says.