To generate a list of genes with reliable neurite localization for the first N-zip library, only genes with significant enrichment (adjusted P < 0.05) in the analyzed primary neuronal datasets were considered as candidates. This selection was further restricted to genes for which an enrichment value could be calculated in our PCN system (log2FC not NA). From these candidates, genes were chosen if they showed (1) significant enrichment in at least four datasets; (2) median log2FC > 1; and (3) either mean log2FC > 1 or a positive log2FC value in all datasets. Genes with significant enrichment in at least five datasets were also chosen if they had median log2FC > 1, mean log2FC > 1 or a positive log2FC value in all datasets.
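The selection rules above can be sketched as a small filter. This is only an illustrative sketch, not the code used in the study: the function name `passes_selection` and the input layout (one list of log2FC values and one list of adjusted P values per gene, with `None` for missing entries) are hypothetical. It also assumes that "significant enrichment" means adjusted P < 0.05 together with a positive log2FC, and that "all datasets" refers to the datasets in which the gene was measured.

```python
from statistics import mean, median

ALPHA = 0.05  # adjusted P cutoff stated in the text


def passes_selection(log2fc, padj, pcn_log2fc):
    """Return True if a gene meets either selection rule.

    log2fc, padj: per-dataset values for one gene (None where missing).
    pcn_log2fc: enrichment in the PCN system (None stands in for NA).
    """
    # Keep only datasets in which the gene has both values.
    pairs = [(fc, p) for fc, p in zip(log2fc, padj)
             if fc is not None and p is not None]
    # An enrichment value must be available in the PCN system.
    if pcn_log2fc is None or not pairs:
        return False

    values = [fc for fc, _ in pairs]
    # Assumption: significant enrichment = padj < ALPHA and log2FC > 0.
    n_sig = sum(1 for fc, p in pairs if p < ALPHA and fc > 0)
    median_ok = median(values) > 1
    mean_ok = mean(values) > 1
    all_positive = all(fc > 0 for fc in values)

    # Rule 1: at least four significant datasets, median > 1,
    # and either mean > 1 or positive in all datasets.
    rule_1 = n_sig >= 4 and median_ok and (mean_ok or all_positive)
    # Rule 2: at least five significant datasets and any one of the three.
    rule_2 = n_sig >= 5 and (median_ok or mean_ok or all_positive)
    return rule_1 or rule_2
```

Under these assumptions, a gene that is weakly but consistently enriched in five datasets passes through the second rule, whereas a gene that is significant in only four datasets must also clear the median threshold.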
This initial unbiased set of genes was then manually refined by (1) excluding genes encoded by the mitochondrial genome as well as some genes with annotated nuclear or mitochondrial function (Pola1, Ezh2, Smc4, Cenpb, Pink1 and Ncl); (2) adding genes with a known zipcode or neurite localization sequence (Camk2a (ref. 94), Actb (ref. 5), Bdnf (ref. 9), Arc (refs. 11,95), Cdc42 (ref. 25), Map2 (ref. 96) and Bc1 (ref. 97)); (3) adding genes that showed localization in non-primary (ref. 31) and in-house datasets as well as in our PCN and fewer other primary datasets (Rab13, Net1, Hmgn5, 2410006H16Rik, Pfdn5, Tagln2, Pfdn1 and Cryab); and (4) restricting the genes encoding ribosomal proteins and translation factors to a smaller subset with sufficiently large 3′ UTRs (Rplp2, Rpl12, Rpl39, Rpl37, Rpl14, Rps28, Rpsa, Rps24, Rps23, Rps18, Eef1b2, Eef1a1 and Eef1g).