WGCNA run for female lncRNA. Link to code.

Step by step workflow…

  • Use subsetted count matrices for males and females from the previous DESeq2 workflow
  • Create metadata data frames for ID, sex, and treatment: combined, male, female
  • Round count data to integers and filter out genes with row sums of 0
    • 4750 lncRNAs filtered down to 4552 at this step
  • pOverA setting: Basically using this to choose lncRNAs that are expressed in 50% of samples (since there are two groups control and treatment 50/50). Setting to 0 seems important since there are 2 treatments, meaning some samples would be at 0 if the lncRNA is only environmentally sensitive.
    • Filtered from 4552 to 4391 assuming 1 out of all individuals expressing at least 5 counts is an appropriate metric.
  • Make sure metadata frame and count matrix names match and make sure sample ID headings match the metadata frame
  • Create a DESeqDataSet design from gene count matrix and labels
  • log-transform the data using a variance stabilizing transforamtion (VST). This is only for visualization purposes. Essentially, this is roughly similar to putting the data on the log2 scale. It will deal with the sampling variability of low counts by calculating within-group variability (if blind=FALSE).
  • Calculate the size factors of our samples. This is a rough estimate of how many reads each sample contains compared to the others. Size factors need to be less than 4.
    • All size factors are less than 4
  • VST transformation
  • Plot heatmap to sample distances image
  • PLot PCA by treatment image
  • Transpose filtered gene count matrix so gene IDs are rown and sample IDs are columns
  • Check for genes and samples with too many missing values with goodSamplesGenes. There shouldn’t be any because we performed pre-filtering.
    • TRUE
  • Skipped outlier removal step because we expect large differences in expression since we want to capture as many lncRNAs as possible in this run.
  • Choosing a soft threshold power (β). is the number to which the co-expression similarity is raised to calculate adjacency. The function pickSoftThreshold performs a network topology analysis. The user chooses a set of candidate powers.
  • Use pickSoftThrshold and plot results… image
    • Choose a soft threshold value at the earliest point in the curve where you see plataeu from the initial curve.
    • Chose 7
  • Use soft power value to calculate co-expression adjacency and create a topologinal overlap matrix of similarity.
  • Cluster lncRNA expression data using the topological overlap matrix (TOM)
  • Form distance matrix from TOM dissimilarity and plot genet tree. image
    • Clustering looks good
  • Identify modules of lncRNAs with similar expression patterns.
  • Minimum module size set to 5 image
  • Plot module similarity based on eigengene value image
  • Merge modules with >85% eigengene similarity and plot image
  • Save merged modules
    • 238 modules total
Color Count
aliceblue 7
antiquewhite 7
antiquewhite1 8
antiquewhite2 14
antiquewhite3 6
antiquewhite4 11
aquamarine 5
aquamarine1 5
bisque2 5
bisque3 6
bisque4 18
black 334
blanchedalmond 6
blue 616
blue1 7
blue2 10
blue3 8
blue4 9
blueviolet 9
brown1 8
brown2 10
brown3 7
brown4 13
burlywood 6
burlywood1 6
burlywood2 5
chartreuse4 5
chocolate 5
chocolate1 6
chocolate2 7
chocolate3 7
chocolate4 8
coral 9
coral1 11
coral2 11
coral3 9
coral4 15
cornflowerblue 7
cornsilk 7
cornsilk2 6
cyan 35
cyan1 5
cyan4 5
darkgoldenrod 13
darkgoldenrod1 6
darkgoldenrod3 15
darkgoldenrod4 7
darkgreen 58
darkgrey 21
darkmagenta 16
darkolivegreen 16
darkolivegreen1 8
darkolivegreen2 9
darkolivegreen4 10
darkorange 222
darkorange2 13
darkorchid3 5
darkorchid4 5
darkred 23
darksalmon 6
darkseagreen 7
darkseagreen1 7
darkseagreen2 8
darkseagreen3 9
darkseagreen4 11
darkslateblue 13
darkturquoise 22
darkviolet 10
deeppink 9
deeppink1 8
deeppink2 7
deepskyblue 6
deepskyblue4 6
dodgerblue1 5
dodgerblue2 5
dodgerblue3 6
firebrick 7
firebrick2 8
firebrick3 9
firebrick4 10
goldenrod3 5
goldenrod4 10
green1 6
green2 7
green3 7
green4 8
grey60 29
honeydew 9
honeydew1 29
hotpink3 5
hotpink4 6
indianred 6
indianred1 7
indianred2 8
indianred3 9
  • Prepare trait data. Data has to be numeric, so I will substitute time_points and type for numeric values. Make a dataframe that has a column for treatment and a row for samples. This will allow for correlations between mean eigengenes and treatment. image
  • Correlate traits with eigengenes. Calculate correlations between ME’s and treatment. Calculate kME values (module membership).
  • Generate a complex heatmap of module-trait relationships.
    • heatmap found here
  • Examine mean module eigengene values for modules of interest (those that are significant from the complex heatmap) image
  • Plot mean eigen values by treatment. image

This process allows you to examine modules by treatment, so we look at expression differences more closely. Next step is to use this process on male lncRNA and then male and female mRNA to correlate between lncRNA and mRNA.