Read123456789: reading research papers
(For a video lecture on this page, see here.)
As technologies change, technologists need to continually update their technical knowledge. The problem is that keeping up with the latest research is very hard: working through technical papers is slow, demanding work. For example, if you ask new graduate students to read ten papers in a particular sub-field:
- It can take a full day to read the first paper.
- But after reading ten papers, they can do it much faster.
Since reading is so important, the rest of this page offers:
The assignment can be used two ways:
- For a fast-path assignment for newbies, lecturers could assign
- For a more advanced and longer assignment, lecturers could assign parts
How to Read Papers, Faster
There are four keys to reading papers faster:
- Rhetorical Strategies: Understanding the rhetorical strategies taken by the authors.
- Terminology: Having a working background knowledge of the half-dozen key terms in a paper.
- Context: Experts can read papers faster when they know of other work in the field and can place this new paper into the context of other work.
- Feature extraction: Experts are fast at any task because they know what to look for and what can be skipped over. This is true for many tasks, including reading:
- Experts do not read entire papers, word for word.
- Rather, they hunt and peck, looking for certain key features (which we number below as 1 to 19).
Feature extraction, details
- Which can be exploded into various parts...
- ... any of which might be repurposed in other areas.
To put that another way, we should not read papers but we should survey them, to
- Map out their internal structure
- Find and extract whatever parts might be useful.
Of course, once we find the (little) bits that we really want to use, then we might spend hours/days struggling to understand those (small) parts. But otherwise, we need to read over papers, not through them.
Here is a list of what we might find within a paper:
|1.Motivational statements||Reports, challenge statements, or lists of open issues that prompt an analysis.|
|2.Hypotheses||Expected effects in some area.|
|3.Checklists||Used to design the analysis (see also the Checklist Manifesto).|
|4.Related Work||Comprehensive, annotated, and insightful (e.g. showing the development or open areas in a field).|
|5.Study instruments||e.g. surveys, interview scripts, etc.|
|6.Statistical tests||Mathematical tools to analyze results (along with some notes explaining why or when this test is necessary).|
|7.Commentary||About the scripts used in the analysis.|
|8.Informative visualizations||e.g. Sparklines http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msgid=0001OR|
|9.Baseline results||Results against which new work can be compared.|
|10.Sampling procedures||e.g. "How did you choose the projects you studied?"|
|11.Patterns||Describing best practices for performing this kind of analysis.|
|Anti-patterns||Describing cautionary tales of "gotchas" to avoid when doing this kind of work.|
|12.Negative results||Anti-patterns, backed up by empirical results.|
|13.Tutorial materials||Guides to help newcomers become proficient in the area. Some of these tutorial materials may be generated by the researcher and others may be collected from other sources.|
|14.New results||Guidance on how to best handle future problems.|
|15.Future work||Based on the results, speculations about open issues or future issues that might become the motivation for the next round of research.|
Here is a list of items that are usually too large to add to a paper, but which a paper might list as an external resource:
|16.Data||Used in an analysis; either raw from a project or some derived product.|
|17.Scripts||Used to perform the analysis (the main analysis or the subsequent statistical tests or visualizations), e.g. the Python Sparklines generator or code for a fast a12 test. Scripts can also implement some of the patterns identified by the paper.|
|18.Sample models||Models that can generate exemplar data, or which offer an executable form of current hypotheses. Or, these models could be a set of standard problems everyone shares (e.g. the verification community and optimization community have libraries of standard models, or models ported from commercial apps, that they all use to baseline their tools).|
|19.Delivery tools||Things that let other people automatically rerun the analysis, e.g. (a) configuration management files that can build the system/paper from raw material and/or update the relevant files using some package manager; (b) virtual machines containing all the above scripts, data, etc., pre-configured such that a newcomer can automatically rerun the old analysis.|
Rhetorical Strategies: details
Parts of a Paper
The following notes on "parts of a paper" are taken from the excellent notes by Tim Sheard and Todd Leen.
When reading a paper, take care to note:
- Items 1 to 19, listed above.
- Comments on:
- The thesis being investigated
- The contribution
- The method of investigation
- The “power” of the results
- The applicability of the results
- Summary of the technical development
- Details of any examples
So on a first pass through a paper, skim to find:
- The abstract (to determine relevance and the kind of paper);
- Pictures, tables, graphs, and diagrams (just to get the big picture);
- Any of the items 1 to 19 listed above;
- References (do you recognize them?)
Swales' Three-Move Model
The following notes on "Swales' Three-Move Model" are taken from the excellent notes by James Luberda.
The following is based upon an empirically-derived model of how “real-world” research article introductions commonly proceed:
- Note that it is not a set of rules, but rather something of a guide as to what readers of research articles and academic essays are likely to expect (and find), a set of patterns in introductions that facilitate their reading and comprehension.
- You might think of each “move” below as a kind of verbal action—a “move” a writer will make to have a particular effect on the reader.
Move 1 Establishing a territory
- In this opening move, the writer may do one or more of the following to broadly sketch out where the subject of his/her essay falls—the “big picture”
- Point out the importance of the general subject
- Make generalizations about the subject
- Review items of previous research
Move 2 Establishing a niche
- In this move, the writer then indicates to the reader the particular area of the broader subject that the essay will deal with. This can be done using one or more of the following:
- Make a counter-claim, i.e. assert something contrary to expectations
- Indicate a gap in the existing research/thinking
- Raise a question about existing research/thinking
- Suggest the essay is continuing a tradition, i.e. it is following in the footsteps of previous research/thinking
Move 3 Occupying the niche
- In this move, the writer then sketches out exactly what this particular essay will accomplish in relation to Move 2, and gives the reader a sense of how the essay will proceed. In general, each of the steps below will appear in this move, in order:
- Step 1: Outline the purpose of the essay, or state the research that was pursued
- Step 2: State the principal findings of the essay—what the reader can expect the essay/research will have accomplished for them by the time they get to the end
- Step 3: Indicate, roughly, the structure of the essay—what will appear in it and in what order
Exercises In Reading Faster
Note that, at first, it will take hours to read one paper. However, after a couple of papers, your reading will speed up dramatically. So do not be discouraged if, at first, this is ridiculously slow.
Part1: Learn Historical Context
In the following, anything shown in italics is explained below.
- One: Find a highly cited paper from the automated software engineering literature
- Find some source of highly cited papers
- Do not review any paper from your own institution (so, fear not, you don't have to review the lecturer's paper)
- For students of general software engineering, start with the International Conference on Software Engineering
- For students of automated software engineering, start with the International Conference on Automated SE
- Pick any 2011 paper and summarize some of its parts.
- Two, Three, Four, Five: Explore context, backwards
- Find four papers in One's reference list
- That date from 2008 to 2010
- That are highest cited (note that recent papers have fewer citations than older papers).
- Walk them backwards in time, summarizing some of their parts
- By summarize parts we mean write 500 to 1000 words of text:
- Starting with a clear reference to the paper.
- e.g. Tim Menzies, Burak Turhan, Ayse Bener, Gregory Gay, Bojan Cukic, and Yue Jiang. 2008. Implications of ceiling effects in defect predictors. In Proceedings of the 4th international workshop on Predictor models in software engineering (PROMISE '08).
- Write down the four most important keywords in the paper, plus a two line definition of each.
- Label them ii1, ii2, ii3, ii4
- Offer very brief notes on any four of the items listed as 1 to 19 (above).
- Label them iii1, iii2, iii3, iii4
- Write down three ways the paper could be improved.
- Label them iv1, iv2, iv3
- For Two, Three, Four, etc., also comment on the connection to the other papers.
- Do you know how long 1000 words is? About as long as this page. So you want to write something half this size.
- Your goal is to be able to generate such a summary in thirty minutes:
- It is unlikely you will reach this goal until after you have read numerous papers.
- To find highest cited papers, look up items from the reference list of the week1 paper in scholar.google.com (or dl.acm.org/ or ieeexplore.ieee.org) and count their citations. For example, looking up "Mining metrics to predict component failures" in scholar.google.com produces a search result for that paper.
At the bottom of that result, you can see Cited by 527. If you click there, you find the many papers, published since the first paper, that cite it.
Google Scholar lists these roughly from most to least cited (so the most cited papers tend to be shown at the top). So,
- To find the highest cited papers that cite the week1 paper, look up your week1 paper in scholar.google.com (or dl.acm.org/ or ieeexplore.ieee.org), click Cited by, and compare the citation counts of the papers listed there.
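The selection rule used in this exercise (keep papers inside a date window, then take the most cited) can be sketched in a few lines of Python. This is only an illustration: the `Paper` record, the `top_cited` helper, and the placeholder entries are all hypothetical, and the citation counts are assumed to have been copied by hand from the "Cited by" figures in the search results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    year: int
    citations: int  # "Cited by" count, copied by hand from the search result


def top_cited(papers, first_year, last_year, how_many):
    """Return the `how_many` most-cited papers published in [first_year, last_year]."""
    in_window = [p for p in papers if first_year <= p.year <= last_year]
    return sorted(in_window, key=lambda p: p.citations, reverse=True)[:how_many]


# Placeholder entries only: these are not real papers or real citation counts.
reference_list = [
    Paper("Placeholder A", 2009, 120),
    Paper("Placeholder B", 2007, 900),
    Paper("Placeholder C", 2010, 45),
    Paper("Placeholder D", 2008, 310),
    Paper("Placeholder E", 2009, 15),
    Paper("Placeholder F", 2010, 200),
]

# Papers Two to Five: the four most-cited references dated 2008 to 2010.
for paper in top_cited(reference_list, 2008, 2010, 4):
    print(paper.year, paper.citations, paper.title)
```

Because recent papers have had less time to collect citations, it may be fairer to compare counts within a single year than across the whole window.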
Part2: Identify reusable data
- Six: For any paper in the above sequence, report any reusable data.
- To report any reusable data, try to fill in the form here. Hand in either:
- A page showing what you entered into those fields
- Or an explanation of why your kind of paper does not generate data of the kind that can be entered here.
Part3: Track advances.
- Seven, Eight, Nine: Explore context, forwards
- Find three papers that cite the One paper
- That date from 2012 to 2015
- That are highly cited (note that recent papers have fewer citations than older papers).
- Walk them forwards in time, summarizing some of their parts
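Every summary in Parts 1 and 3 follows the same format: 500 to 1000 words, four keywords labelled ii1 to ii4, four items labelled iii1 to iii4, and three improvements labelled iv1 to iv3. A minimal self-check for a draft might look like the sketch below. The `check_summary` helper is hypothetical, and it only tests length and the presence of labels, not the quality of what is written under them.

```python
import re

# Labels required by the summary format:
# four keywords (ii), four items from the 1 to 19 list (iii), three improvements (iv).
REQUIRED_LABELS = (
    [f"ii{n}" for n in range(1, 5)]
    + [f"iii{n}" for n in range(1, 5)]
    + [f"iv{n}" for n in range(1, 4)]
)


def check_summary(text, min_words=500, max_words=1000):
    """Return a list of problems found in a draft summary (an empty list means none found)."""
    problems = []
    word_count = len(text.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} is outside {min_words} to {max_words}")
    for label in REQUIRED_LABELS:
        # \b stops "ii1" from matching inside "iii1"
        if not re.search(rf"\b{label}\b", text):
            problems.append(f"missing label {label}")
    return problems


# A deliberately incomplete draft, to show the kind of report produced.
draft = "ii1 first keyword and its definition. iii1 notes on one item."
for problem in check_summary(draft):
    print(problem)
```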
Part4 (one big essay)
Take all the above and summarize the progression of research, 2008 to 2015, on some automated software engineering issue.
- 10 pages, 2 columns, using the Word or LaTeX formats shown in this page.
- Include at least 20 references, eight of which you studied above while the others are related work (or, indeed, far flung work that you think should be connected to your eight but, so far, no one has done so).
- Mention as many as possible of items listed 1 to 19, above.
Note that, for this essay, the keyword definitions you generated above will become the core of your related work section.
For full marks:
- Throughout your text, comment on how eight of these nine papers improved (failed to improve, ignored, extended, refined) the issues mentioned in an early paper.
- End with your own recommendations of the path from here. Mention the issues that are now retired, that no one has retired, that someone should retire, or that no one should even try to retire.
Note: if the papers you studied above proved to be dull, feel free to start again with some other 2011 paper from here. Note that, by the time you get to Part4, it will take you less than a day to work through eight papers (it may even just take you one afternoon).