No data retrieval, no data to use
Data retrieval is the first step: getting the raw material to work on. Data is widely available on the Internet, and every day you and everybody else generate more of it. So do (non-profit) organizations and governments.
So let’s use this data! But where and how do you get it? Just type the subject into your favorite search engine and you will end up with a collection of links to data in raw or ready-to-use formats.
Nowadays more and more organizations offer datasets on their sites in a wide variety of formats. Well-known formats are downloadable PDF reports, plain text and tables on HTML pages, video, audio and images, downloadable spreadsheets, and Application Programming Interfaces (APIs).
How to get the data?
Getting your hands on the data can be as easy as browsing a single web page, or as complex as using spiders to grab content from millions of web pages.
- Download prepared data sets.
  - For example, check the Guardian, which maintains a fine collection of links to great downloads.
- Collect by doing your own research.
  - Search and browse the internet and collect facts on the subject you are looking for.
- Use available Application Programming Interfaces (APIs).
  - Connect to an API and fetch the data with an ad hoc query or a real-time connection.
- Scrape the web.
  - Automatically fetch the contents of one page or of millions of web pages using (custom) software.
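
The programmatic options above (downloading a prepared file, querying an API, scraping a page) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a recommended toolchain: the helper names (`fetch_text`, `parse_csv`, `parse_api_response`, `scrape_links`) are made up for this example, the download is assumed to be UTF-8 encoded, and the API is assumed to return JSON.

```python
import csv
import io
import json
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_csv(text):
    """Prepared data set: turn CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_api_response(text):
    """API: decode a JSON response body (assumes the API speaks JSON)."""
    return json.loads(text)


class LinkCollector(HTMLParser):
    """Scraping: collect the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape_links(html):
    """Return all link targets on an HTML page, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

In practice each parser would be fed the result of `fetch_text(...)`, for example `parse_csv(fetch_text("https://example.org/data.csv"))`, where the URL is a placeholder. A spider that covers millions of pages is essentially `scrape_links` applied repeatedly to the links it discovers, plus rate limiting and respect for each site's terms of use.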