A technical, examiner-aligned guide for students in Economics, Psychology, Biology, Business, and World Studies.

Data collection is the backbone of a strong Extended Essay. Examiners consistently reward essays that demonstrate rigorous methodology, clear justification of data sources, and the ability to link evidence to argument. Weak or poorly justified data leads to superficial analysis and lower marks—even if the writing is polished.

Below is a subject-specific guide on how to plan, gather, evaluate, and integrate data, with technical considerations and concrete examples.

1. Start With a Methodology That Fits Your Research Question

A strong EE begins with a research question (RQ) that implicitly defines:

  • the type of data needed
  • the units of analysis
  • the variables
  • the methodology

Example RQs and their data implications:

Before collecting anything, the student must explicitly define:

  • Independent variable
  • Dependent variable
  • Measurement instruments
  • Data collection window / sampling frame

This forms the foundation of a credible methodology section.

2. Primary vs Secondary Data: When to Use What

A. Primary Data: High Control, High Validity

Particularly relevant for: Psychology, Biology, ESS, some World Studies.

Primary Data Methods:

  • Experiments
  • Field measurements
  • Surveys (with rigorous design)
  • Structured observations
  • Interviews (with coding schemes)

Example (Psychology):

A working-memory experiment might collect:

  • n = 30 participants
  • 2 conditions (Sleep-deprived vs control)
  • DV = Digit Span score
  • Statistical test = Independent-samples t-test

Technical Note:
The IB expects validated psychological instruments (e.g., the Digit Span Task, the Stroop Test, or n-back tasks), not questions invented by the student.
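
The comparison in the psychology example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed IB procedure: the scores are invented, the function name independent_t is hypothetical, and it assumes equal variances in both groups (Student's version of the test). It returns only the t statistic and degrees of freedom; the p-value would still need to be looked up in a t table or computed with a statistics package.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's independent-samples t statistic (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical Digit Span scores, 15 participants per condition (n = 30)
control = [7, 6, 8, 7, 6, 7, 8, 6, 7, 7, 8, 6, 7, 7, 6]
sleep_deprived = [5, 6, 5, 4, 6, 5, 6, 5, 4, 5, 6, 5, 5, 6, 5]

t_stat, df = independent_t(control, sleep_deprived)
print(f"t({df}) = {t_stat:.2f}")
```

In the essay itself, the reported statistic should be accompanied by a justification of why this test suits the design (two separate groups, one numerical DV).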

Example (Biology):

For a nitrate concentration experiment:

  • IV: Nitrate concentration (0, 5, 10, 20 mg/L)
  • DV: Relative Growth Rate (RGR)
  • Controlled variables: temperature, light intensity, pH, nutrient composition
  • 3–5 replicates per condition

Technical Note:
Data must be quantitative, repeatable, and include raw data tables.
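
As a sketch of how the DV above might be computed from raw measurements, the snippet below assumes the commonly used definition RGR = (ln W2 − ln W1) / (t2 − t1), where W1 and W2 are the start and end masses. The masses, the 7-day window, and the function name relative_growth_rate are all hypothetical; the source does not specify the organism or the measurement interval.

```python
import math
import statistics

def relative_growth_rate(mass_start, mass_end, days):
    """RGR = (ln(mass_end) - ln(mass_start)) / days."""
    if mass_start <= 0 or mass_end <= 0 or days <= 0:
        raise ValueError("masses and days must be positive")
    return (math.log(mass_end) - math.log(mass_start)) / days

# Hypothetical replicates: (start mass in g, end mass in g) over 7 days,
# keyed by nitrate concentration in mg/L
replicates_by_nitrate = {
    0: [(0.50, 0.62), (0.51, 0.60), (0.49, 0.61)],
    10: [(0.50, 0.95), (0.52, 1.01), (0.50, 0.98)],
}

for nitrate_mg_per_l, pairs in replicates_by_nitrate.items():
    rgr_values = [relative_growth_rate(start, end, days=7) for start, end in pairs]
    print(nitrate_mg_per_l, "mg/L: mean RGR =", round(statistics.mean(rgr_values), 4), "per day")
```

Keeping the per-replicate values, rather than only the mean, is what makes the raw data table possible later.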

B. Secondary Data: High Reliability, High Breadth

Useful for: Economics, Business, Global Politics, World Studies.

Strong Secondary Data Sources:

  • Government databases (DOS, MTI, IMF, World Bank)
  • Academic journals (Google Scholar, JSTOR)
  • Health datasets (MOH, CDC, WHO)
  • Company annual reports
  • Market reports (Statista, Euromonitor)

Example (Economics):

For a policy impact question:

  • Yearly sugar import volumes
  • Retail sugar prices
  • Household expenditure survey
  • Soda sales data
  • Nutri-Grade compliance reports

Possible techniques:

  • Price elasticity estimation
  • Before–after analysis
  • Regression (if appropriate, but must be justified thoroughly)
  • Trend decomposition

Technical Note:
IB examiners favour students who justify why each dataset is appropriate and discuss its limitations and biases (e.g., proxy variables).
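
To illustrate the first technique in the list above, here is a minimal sketch of a price elasticity of demand (PED) estimate using the midpoint (arc) method. The prices and quantities are invented, and the function name midpoint_ped is hypothetical. Note that a two-point estimate like this attributes the whole change in quantity to price, which is exactly the kind of limitation the essay should discuss.

```python
def midpoint_ped(price_before, price_after, quantity_before, quantity_after):
    """Arc elasticity: % change in quantity / % change in price, using midpoints."""
    pct_change_quantity = (quantity_after - quantity_before) / (
        (quantity_before + quantity_after) / 2
    )
    pct_change_price = (price_after - price_before) / (
        (price_before + price_after) / 2
    )
    if pct_change_price == 0:
        raise ValueError("price did not change, so PED is undefined")
    return pct_change_quantity / pct_change_price

# Hypothetical figures: price rises from 10 to 12, quantity falls from 100 to 80
ped = midpoint_ped(10, 12, 100, 80)
print(f"PED = {ped:.2f}")
```

A magnitude above 1 would suggest price-elastic demand over that range, but only under the assumption that nothing else changed between the two observations.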

Example (Business Management):

RQ on whether R&D spending improved product innovation:

  • R&D expenditure from annual reports
  • Number of product launches
  • Innovation awards
  • Revenue breakdown by category
  • Patent filings

Possible analyses:

  • Correlation analysis
  • R&D intensity ratios
  • Trend analysis
  • Gross margin changes associated with new products
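
The first two analyses above can be sketched as follows. All figures are invented for illustration, and the helper names rd_intensity and pearson_r are hypothetical. R&D intensity is taken here as R&D expenditure divided by revenue, which is a common definition but should be stated explicitly in the essay.

```python
import math

def rd_intensity(rd_spend, revenue):
    """R&D expenditure as a fraction of revenue."""
    return rd_spend / revenue

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(ss_x * ss_y)

# Hypothetical five-year series taken from annual reports
rd_spend_millions = [40, 45, 52, 60, 66]
revenue_millions = [800, 850, 900, 980, 1050]
product_launches = [3, 4, 4, 6, 7]

intensities = [rd_intensity(rd, rev) for rd, rev in zip(rd_spend_millions, revenue_millions)]
print("R&D intensity by year:", [round(i, 3) for i in intensities])
print("r(R&D spend, launches) =", round(pearson_r(rd_spend_millions, product_launches), 2))
```

With only a handful of yearly observations, a high correlation is suggestive at best and does not show that R&D spending caused the launches.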

3. Technical Criteria for “IB-Quality” Data

The IB implicitly grades data quality based on the following:

A. Validity

Data must measure the specific concept intended.

Example:
Using “Instagram likes” as a proxy for “consumer demand” is invalid for a Business EE, because likes measure online engagement rather than willingness or ability to buy.

B. Reliability

Data should be consistent and replicable.

  • In Biology → repeated trials
  • In Psychology → standardised tests
  • In Economics → official statistics
  • In Business → audited financial statements

C. Representativeness

Especially important for survey-based EEs.

Example:
A survey of 20 classmates is not representative of the consumption patterns of Singapore’s entire youth population.

You must define:

  • sample size
  • sampling method
  • justification of sample demographics
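
One way to make the sample-size point concrete is a rough margin-of-error calculation for a survey proportion. The sketch below uses the standard approximation z × sqrt(p(1 − p) / n) with z = 1.96 for 95% confidence. It assumes a simple random sample, which a group of classmates is not, so the true uncertainty for a convenience sample is likely worse than this figure suggests. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (20, 100, 400):
    print(f"n = {n}: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

The calculation addresses sample size only; the sampling method and demographics still need their own justification.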

D. Ethics

Required for human studies (Psychology, World Studies, Economics surveys).

  • informed consent
  • anonymity
  • no minors without parental approval
  • no deception (Psychology rule)
  • minimal risk procedures

The EE must include a brief ethical statement.

4. Data Collection Techniques by Subject

A. Economics

Best datasets:

  • time-series macro indicators
  • cross-sectional survey data
  • price and quantity data
  • regulatory or policy documents
  • company market share records

Advanced techniques:

  • QD–QP trend analysis
  • Demand curve approximations
  • Calculating PED, XED, or YED
  • Interrupted time series (before/after policy)
  • Simple regressions (only when justified properly)

Example:
Analysing the impact of ERP rates on vehicle flow using LTA traffic volume data.
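
A simple before/after comparison for an example like this could be sketched as below. The traffic counts are invented placeholders, not LTA data, and the function name percent_change_in_mean is hypothetical. This is deliberately the simplest version of the idea: it compares mean values on either side of a policy change and does not control for trends, seasonality, or other events in the same period.

```python
import statistics

def percent_change_in_mean(before, after):
    """Percentage change in the mean from the 'before' period to the 'after' period."""
    mean_before = statistics.mean(before)
    mean_after = statistics.mean(after)
    return (mean_after - mean_before) / mean_before * 100

# Hypothetical monthly vehicle counts (thousands) around a rate change
vehicles_before = [212, 208, 215, 210, 214, 211]
vehicles_after = [198, 201, 195, 199, 197, 200]

change = percent_change_in_mean(vehicles_before, vehicles_after)
print(f"Mean vehicle flow changed by {change:.1f}%")
```

The same function works for any before/after pair of series, provided both cover comparable periods.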

B. Psychology

Common designs:

  • experimental (most common)
  • correlational (with validated instruments)
  • quasi-experimental

Data must be numerical and statistically analysable:

  • reaction times
  • memory scores
  • Likert scales (validated only)

Avoid:

  • personality quizzes written by the student
  • unvalidated “stress tests”
  • extremely small sample sizes (e.g., <15)

C. Biology

Data collection must be scientifically rigorous:

  • replicate each experiment 3–5 times
  • record environmental conditions
  • present raw data with uncertainties
  • calculate means and standard deviations
  • use statistical tests (t-test, chi-square, ANOVA where appropriate)
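
The descriptive statistics in the list above can be computed with Python's standard library. The readings below are invented, and the function name summarise_replicates is hypothetical. The sketch uses the sample standard deviation (n − 1 in the denominator), which is the usual choice for a small number of replicates, and also reports the standard error of the mean as one possible basis for error bars.

```python
import math
import statistics

def summarise_replicates(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem

# Hypothetical replicate readings for one experimental condition
readings = [0.42, 0.45, 0.40, 0.44, 0.43]
mean, sd, sem = summarise_replicates(readings)
print(f"mean = {mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

Whichever measure of spread is plotted, the figure caption should say which one it is.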

Example:
Measuring chlorophyll concentration using a spectrophotometer and converting absorbance values using the Beer–Lambert law.
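
The conversion step can be sketched from the Beer–Lambert law, A = ε × l × c, rearranged to c = A / (ε × l). The molar absorptivity and absorbance values below are placeholders for illustration only, not real chlorophyll constants; the actual coefficient depends on the pigment, solvent, and wavelength and must come from a cited source. The function name concentration_from_absorbance is hypothetical.

```python
def concentration_from_absorbance(absorbance, molar_absorptivity, path_length_cm=1.0):
    """Beer-Lambert law rearranged: c = A / (epsilon * l)."""
    if molar_absorptivity <= 0 or path_length_cm <= 0:
        raise ValueError("molar absorptivity and path length must be positive")
    return absorbance / (molar_absorptivity * path_length_cm)

# Placeholder values: epsilon in L mol^-1 cm^-1, standard 1 cm cuvette
EPSILON_PLACEHOLDER = 100.0
absorbance_readings = [0.25, 0.50, 0.75]

for a in absorbance_readings:
    c = concentration_from_absorbance(a, EPSILON_PLACEHOLDER)
    print(f"A = {a:.2f} -> c = {c:.4f} mol/L")
```

The linear relationship holds only within the spectrophotometer's reliable absorbance range, so very concentrated samples may need dilution before measurement.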

D. Business Management

Students rely mostly on secondary data:

  • financial ratios
  • market share
  • growth rates
  • innovation indexes
  • operational KPIs

Good EEs show multi-angled analysis:

  • investor reports
  • SWOT analysis, but only when each point is supported by evidence
  • competitive benchmarking
  • cost-benefit interpretation backed by numerical data

E. World Studies

Mixed-method research is common:

  • policy analysis
  • environmental data
  • epidemiological statistics
  • interviews with experts
  • cross-city comparisons

Example:
Comparing PM2.5 levels pre- and post-congestion charge in Singapore vs London, using the WHO Air Quality Database.

5. How Much Data Is “Enough”?

IB examiners prefer:

  • Deep analysis of a moderate dataset
    over
  • Superficial analysis of a massive dataset

Rule of thumb:

  • 3–5 strong datasets (Economics, Business)
  • n ≥ 20 participants (Psychology)
  • 3–5 experimental conditions × at least 3 replicates each (Biology)

6. Presenting Your Data Correctly

Technical checklist:
✔ clear units
✔ axis labels and legends
✔ error bars where relevant
✔ raw data tables in the appendix
✔ identification of anomalies
✔ justified statistical tests
✔ integrated commentary (NOT isolated charts)

Conclusion

High-quality data collection transforms the EE from a descriptive essay into a rigorous academic investigation. When students:

  • choose the correct methodology,
  • justify every dataset,
  • demonstrate statistical or economic reasoning, and
  • acknowledge ethical and methodological limitations,

they consistently score in the top bands.