Final project
Overview
The course project involves independent work on a topic selected from a menu of project themes. These projects leave freedom for creativity within constraints designed to support learning and fair evaluation.
Logistics
- Teams of at most 2 people are strongly encouraged, in which case you should outline in the final report who did what.
- Submit on Canvas a PDF document of at most 4 pages for a 1-person project, and 6 pages for a 2-person project, excluding references and appendices. The appendix can have arbitrary length but may not be read in detail during grading.
- Both the proposal and the submitted project should contain a link to a public git repository containing the source code for generating the report and figures. For team projects, make sure each team member commits their changes using their own account to document the contributions of each team member.
Project timeline
See schedule.
Core guideline
Every project should contain a component where Bayesian inference is applied to a real problem and a real dataset. This is important: unless you have received explicit permission in the proposal phase to deviate from this, you may be severely penalized if you do not do this!
In addition, each team should pick one of the “project themes” listed below, exploring topics that build on and go beyond what we will cover during the course. If two project proposals are too similar, I reserve the right to assign changes to one of the projects (typically the one with the latest proposal submission date).
After selection of a project theme, you should start hunting for a few real-world candidate datasets appropriate for the selected theme. Two potential datasets should be listed in the proposal. The final report should analyze at least one real dataset.
Some resources:
- Vanderbilt Biostatistics datasets
- TidyTuesday
- Inter-university Consortium for Political and Social Research
- WHO mortality data
- The World Bank Data
- More…
You should not pick a dataset that has already been analyzed using the same approach as yours. Provide references for the closest analyses of the same data and explain how they differ from yours.
Rubric for the project proposal
- Basic requirements
- Team is identified.
- The proposal identifies which of the project themes it will address.
- There is a working link to a public repo containing commits from all team members.
- Two real-world candidate datasets appropriate for the selected theme are clearly described (e.g. a URL, showing the structure or head of a dataframe).
- A short summary of potential approaches to tackle the project theme.
- If it is a team project, the proposal contains a short plan for ensuring the two team members will contribute roughly equally.
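As an illustration of what "showing the structure or head of a dataframe" could look like in a proposal, here is a minimal sketch using only the Python standard library (with pandas, `df.head()` would serve the same purpose). The helper name `preview` and the toy data, including its column names and values, are made up for this example and do not come from any of the listed resources.

```python
import csv
import io
from itertools import islice

# Made-up placeholder data, standing in for a real downloaded CSV file.
TOY_CSV = """region,year,count
A,2000,10
B,2000,7
A,2001,12
"""


def preview(csv_text, n=5):
    """Return (column names, first n rows) of a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(islice(reader, n))  # read at most n data rows
    return reader.fieldnames, rows


if __name__ == "__main__":
    columns, rows = preview(TOY_CSV, n=2)
    print("columns:", columns)
    for row in rows:
        print(row)
```

A short listing like this, together with the dataset URL, is usually enough for a reader to judge whether the data fit the chosen theme.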
As long as the team submits a reasonable project proposal, I will give full marks (5/5), along with some feedback. I will remove one point if the git requirement is not fully satisfied. A late submission within 2 days will receive a maximum grade of 4/5, and 0/5 after this grace period. Details of the project can change after submission of the proposal. Larger changes are allowed, but only with my permission, so they should not be raised at the last minute, i.e. they should be discussed well before the last lecture.
Total: 5%
Rubric for the final report
- Basic requirements (5%)
- The report fits within the prescribed page limits.
- There is a working link to a public repo containing commits from all team members.
- The report follows best practices of technical writing.
- If it is a team project, a short paragraph clearly explains and quantifies the contributions of each team member.
- Problem formulation. (15%) The report clearly describes:
- a real-world inference task/problem,
- succinct but sufficient context (e.g. biological terminology) needed to understand the problem,
- the key modelling/methodological challenge, clearly associating it with one of the items under “menu of project themes” above.
- Literature review. (10%) Relevant literature is cited and properly summarized.
- Data analysis (40%)
- A Bayesian model is precisely described (e.g. using the .. ~ .. notation)
- Implementation code in the appendix (e.g. using Stan)
- Prior choice is motivated. If appropriate, several choices are compared or sensitivity analysis is performed.
- Critical evaluation of the posterior approximation. An appropriate combination of diagnostics, synthetic datasets and other validation strategies.
- Project theme: methodological/theoretical aspect of the project (20%)
- Is the approach sound?
- Creative?
- Discussion (5%)
- Does the report describe key limitations?
Total: 95%
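To make two of the data analysis items above more concrete, here is a minimal sketch of a model stated in the .. ~ .. notation together with a synthetic-data check. The model is an assumption chosen purely for illustration: theta ~ Normal(0, 1) and y_i | theta ~ Normal(theta, 2^2), for which the posterior is available in closed form. The function names, the number of replications and the 90% level are likewise hypothetical choices. In a real project the exact posterior would be replaced by the output of your approximation method (e.g. samples from Stan).

```python
import random
from statistics import NormalDist


def posterior(ys, prior_mean=0.0, prior_sd=1.0, noise_sd=2.0):
    """Exact posterior of theta for the conjugate normal-normal model."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = len(ys) / noise_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(ys) / noise_sd**2)
    return NormalDist(post_mean, post_var**0.5)


def coverage(n_reps=2000, n_obs=10, level=0.9, seed=1):
    """Fraction of synthetic datasets whose credible interval contains the true theta."""
    rng = random.Random(seed)
    prior_mean, prior_sd, noise_sd = 0.0, 1.0, 2.0
    hits = 0
    for _ in range(n_reps):
        # Draw a "true" parameter from the prior, then a dataset given it.
        theta = rng.gauss(prior_mean, prior_sd)
        ys = [rng.gauss(theta, noise_sd) for _ in range(n_obs)]
        post = posterior(ys, prior_mean, prior_sd, noise_sd)
        # Equal-tailed credible interval at the requested level.
        lo = post.inv_cdf((1 - level) / 2)
        hi = post.inv_cdf((1 + level) / 2)
        hits += lo <= theta <= hi
    return hits / n_reps


if __name__ == "__main__":
    print(f"empirical coverage of 90% intervals: {coverage():.3f}")
```

When parameters are drawn from the prior and data from the likelihood, credible intervals from a correct posterior should contain the true parameter at roughly the nominal rate. A large gap between empirical and nominal coverage suggests a problem with the posterior approximation or the implementation.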