Freebase

        Discussions on lukeschubert

        1.  

          Proteins & Genes - Mouse & Human

          also posted to
          • Sorcin,
          • SRI,
          • Protein,
          • Gene,
          • Biology
          12 posts, latest post: tfmorris, 3 days ago
          1. tfmorris Top Contributor Freebase Experts
            Oct 14, 2009
            tfmorris says:

            I'm just learning all this stuff, but I'm confused about how the proteins and genes are being modeled in relation to each other. When I look at the Sorcin gene and the Sorcin protein, they don't seem to mesh very well. I understand that genes are typically given the same name as their associated protein, even though they're two different things.

            • Aren't the NCBI ID (/biology/gene/ncbi_id) and the Entrez ID (/biology/protein/entrez_gene_id) the same thing? Consistent terminology (and documentation!) would help make this clear.

            • Why does the protein link to the mouse gene (only)?

            • Why is the linkage done via an external identifier, rather than directly?

            I know Luke has been working on related areas (particularly cleaning up duplicates), so I've copied him on the discussion. Are there others doing active work in this space?

            1. tfmorris Top Contributor Freebase Experts
              Oct 14, 2009
              tfmorris says:

              I forgot one:

              • Should the gene symbols and alternate symbols (e.g. SRI for Sorcin) be added to the aliases list to make them easier to find and duplicates less likely to be created?
            2. lukeschubert Freebase Experts
              Oct 16, 2009
              lukeschubert says:

              I agree with all these points. In particular, I think we need a direct linkage between proteins and genes.

              I don't know if we've captured any mouse genes inside Freebase, though that information is present in Wikipedia and probably in NCBI - I think we should! Maybe creating a "Mouse genome" topic and "Mouse chromosome x" topics would be a good start ... So while Wikipedia conflates human genes and mouse genes, I think we should keep them separate.

            3. tfmorris Top Contributor Freebase Experts
              Oct 21, 2009
              tfmorris says:

              Dan (druderman) apparently tried to reply too, but his response got trashed by the system :-(

              https://bugs.freebase.com/browse/FREEBASE-1127

              As far as mouse genes go, the Sorcin protein's Entrez Gene ID, 109552, is actually a mouse gene, so there are already mouse genes in Freebase (although this sounds like a bug, since the schema says these should be human proteins).

              Wikipedia has a single page for all three (protein, human gene, mouse gene), but it lists the mouse and human gene IDs separately, so they're not really conflated. The Wikipedia page was changed (yesterday!) to say that Sorcin is the protein and SRI is the gene, but Entrez (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=6717) gives Sorcin as the "Official Full Name" of the gene and SRI as its symbol. The mouse gene (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=109552) uses the same terms.
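              If the protein's /biology/protein/entrez_gene_id and the gene's /biology/gene/ncbi_id both hold Entrez Gene IDs (the question raised at the top of the thread), one URL pattern serves either. A minimal Python sketch that rebuilds the detail-view links above from a bare ID, assuming the 2009-era sites/entrez URL scheme used in this post (NCBI may no longer serve it); the helper name entrez_gene_url is hypothetical:

```python
from urllib.parse import urlencode

# 2009-era Entrez query interface, as used in the links in this thread (assumption: may no longer resolve).
ENTREZ_BASE = "http://www.ncbi.nlm.nih.gov/sites/entrez"


def entrez_gene_url(gene_id: int) -> str:
    """Build an Entrez Gene detail-view URL for a numeric Entrez Gene ID."""
    # Insertion order of the dict is preserved, so the parameters come out
    # in the same order as in the hand-written links above.
    params = {"Db": "gene", "Cmd": "ShowDetailView", "TermToSearch": str(gene_id)}
    return f"{ENTREZ_BASE}?{urlencode(params)}"


# Human gene 6717 and mouse gene 109552, the two IDs cited in this post.
for label, gene_id in [("human", 6717), ("mouse", 109552)]:
    print(label, entrez_gene_url(gene_id))
```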

              Dan - if you're still listening and you want to email me a reply (@gmail), I'll make sure it gets posted.

            4. tfmorris Top Contributor Freebase Experts
              Oct 21, 2009
              tfmorris says:

              Dan (/user/druderman) emailed the following to post here:

              I agree with keeping the genes for different organisms separate. The data model that appeals to me most is one of collecting scientific statements rather than amassing "facts". So, for example, in a particular human genome build a given genomic locus (given by chromosome, strand, and base range) is identified as a given named gene. This annotation may someday change (e.g. it may turn out to be a pseudogene). Similarly, someone may have defined an orthology between genomic loci in two different organisms, thus equating them with a single gene name. Again, this is subject to change. What's nice about Freebase is that you can define a compound value type which links these inter-organismic loci as an "Orthology" (a Freebase Type one would create), and, importantly, that orthology would have attached to it some data about who said it. Note that not everyone may agree on the same set of orthologs! So keeping track of who said what is important.

              What we will end up with over time is a dynamic view of the genome and its annotations as they change over the years. This way we can mine knowledge across time rather than just referring to the current view of the "truth".
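              A minimal Python sketch of the attributed "Orthology" CVT Dan describes, under a hypothetical shape: each statement links two genomic loci (organism, genome build, chromosome, strand, base range), names the gene, and records who made the assertion and when. All class, field, and function names are illustrative, not an existing Freebase schema. Filtering by source shows why "who said what" matters: different sources can yield different ortholog sets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenomicLocus:
    """Hypothetical locus: a base range on one strand of one chromosome in one genome build."""
    organism: str
    build: str       # genome build the coordinates refer to
    chromosome: str
    strand: str      # "+" or "-"
    start: int       # first base (inclusive)
    end: int         # last base (inclusive)


@dataclass(frozen=True)
class Orthology:
    """Hypothetical CVT: one attributed statement that two loci are orthologous."""
    locus_a: GenomicLocus
    locus_b: GenomicLocus
    gene_name: str
    asserted_by: str  # who made the statement
    asserted_on: str  # when it was made (ISO date string)


def orthologs_of(locus: GenomicLocus, statements: List[Orthology],
                 source: Optional[str] = None) -> List[GenomicLocus]:
    """Return loci asserted orthologous to `locus`, optionally only those from one source."""
    result = []
    for s in statements:
        if source is not None and s.asserted_by != source:
            continue  # skip statements made by other sources
        if s.locus_a == locus:
            result.append(s.locus_b)
        elif s.locus_b == locus:
            result.append(s.locus_a)
    return result
```

              Because statements are kept rather than overwritten, a later reannotation (say, a locus turning out to be a pseudogene) would be a new statement alongside the old one, which is what makes the "view across time" possible.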

            5. tfmorris Top Contributor Freebase Experts
              Oct 21, 2009
              tfmorris says:

              Darn embedded markup! That was just supposed to be a dashed line separating my preface, not a giant heading.

              I love Dan's model. It aligns with thoughts I've had about expressing research conclusions (his "scientific statements"). One key piece, I think, is to develop a stronger citation model and practices to link the "in Freebase" with the "outside Freebase."

            6. lukeschubert Freebase Experts
              5 days ago
              lukeschubert says:

              Why don't we start with adding a property to Gene of "Protein encoded" and the corresponding (reciprocal) property to Protein of "Encoded by"? (I assume that each Gene encodes one Protein but each Protein could be encoded by multiple Genes, e.g. mouse and human.)

              I do like the idea of an Orthology type - can we work on that later?
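              A sketch of the proposed reciprocal pair, under the assumption stated in the post (each gene encodes one protein; a protein may be encoded by several genes, e.g. mouse and human). The hypothetical link() helper writes both sides at once so that "Protein encoded" and "Encoded by" cannot disagree:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through the back-references
class Protein:
    name: str
    # "Encoded by": reciprocal of Gene.protein_encoded (repr=False avoids printing cycles).
    encoded_by: List["Gene"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Gene:
    name: str
    organism: str
    # "Protein encoded": at most one protein per gene (assumption from the post).
    protein_encoded: Optional[Protein] = None


def link(gene: Gene, protein: Protein) -> None:
    """Set both directions of the gene/protein relationship together."""
    old = gene.protein_encoded
    if old is not None and old is not protein:
        old.encoded_by.remove(gene)  # drop the stale reciprocal entry
    gene.protein_encoded = protein
    if gene not in protein.encoded_by:
        protein.encoded_by.append(gene)


sorcin = Protein("Sorcin")
human_sri = Gene("SRI", "Homo sapiens")
mouse_sri = Gene("SRI", "Mus musculus")
link(human_sri, sorcin)
link(mouse_sri, sorcin)
print([g.organism for g in sorcin.encoded_by])  # ['Homo sapiens', 'Mus musculus']
```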

            7. druderman Top Contributor
              4 days ago
              druderman says:

              Might be best to define a compound value type for this linkage so it is clear how the correspondence between gene and protein was arrived at. I have some experience with this so I'm happy to help with the schema.

              What information are you thinking of using to relate protein to gene?

              Dan

            8. lukeschubert Freebase Experts
              4 days ago
              lukeschubert says:

              The Entrez Gene database seemed like a good starting point.

              If you want to suggest a schema, Dan, that would be great.

            9. druderman Top Contributor
              4 days ago
              druderman says:

              Entrez Gene sounds good. Looks like the NCBI ID that I placed with each gene is the same as Entrez Gene (please correct me if I'm wrong). Can you point me to the online data source you'd like to use which maps between mouse and human?

              As for a schema, in the big picture I'd like to eventually include genomic locations of exons, mRNA transcripts, and their corresponding protein products, but that's longer term. For now we might simply create a CVT that links a protein entry to a gene entry and is annotated as an identity based on the Entrez Gene ID, so the CVT would have the gene at one end and the protein at the other. We could either define a general CVT for linking gene to protein plus a more specific CVT for an Entrez Gene ID link, or just have a single gene-to-protein CVT with a flag inside it that explains the link (e.g. some text, like "Entrez Gene ID link").
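              A sketch of the simpler option (a single gene-to-protein CVT with a text flag explaining the link), assuming, as Dan does, that a gene's NCBI ID is the same number as a protein's Entrez Gene ID. GeneProteinLink, links_from_entrez, and the topic identifiers are hypothetical; 6717 and 109552 are the human and mouse gene IDs cited earlier in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GeneProteinLink:
    """Hypothetical CVT joining one gene to one protein, with the basis for the link."""
    gene_id: str    # gene topic identifier
    protein_id: str  # protein topic identifier
    evidence: str   # free-text flag explaining how the link was made


def links_from_entrez(genes: Dict[str, str], proteins: Dict[str, str]) -> List[GeneProteinLink]:
    """Link every gene whose NCBI ID equals a protein's Entrez Gene ID.

    genes:    gene topic id -> NCBI (assumed Entrez Gene) ID
    proteins: protein topic id -> Entrez Gene ID
    """
    by_entrez: Dict[str, List[str]] = {}
    for gid, ncbi in genes.items():
        by_entrez.setdefault(ncbi, []).append(gid)
    links = []
    for pid, entrez in proteins.items():
        for gid in by_entrez.get(entrez, []):
            links.append(GeneProteinLink(gid, pid, "Entrez Gene ID link"))
    return links


# Hypothetical topic ids; Sorcin's stored Entrez Gene ID is the mouse gene, as noted above.
genes = {"gene-sri-human": "6717", "gene-sri-mouse": "109552"}
proteins = {"protein-sorcin": "109552"}
for link in links_from_entrez(genes, proteins):
    print(link)
```

              With the data as it stands, only the mouse gene gets linked, which matches the observation earlier in the thread that the protein points at the mouse gene only.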

              Other ideas?

              Dan

            10. lukeschubert Freebase Experts
              3 days ago
              lukeschubert says:

              Sounds good to me. (I always prefer starting with the simpler scheme and then building up gradually to the more complex.)

              I don't know of any data sources that map between mouse and human (except for Wikipedia!) - I was hoping that the linkage would fall out of the data.

            11. tfmorris Top Contributor Freebase Experts
              3 days ago
              tfmorris says:

              Homologene is supposed to show cross-species gene connections, I think. Here's the entry for the human SRI gene:

              http://www.ncbi.nlm.nih.gov/sites/entrez?Db=homologene&Cmd=Retrieve&list_uids=37736&log$=seqview_homolog

              At the general level, the proteins produced by these genes have the same name:

              Human gene: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=6717#geneGeneral%20protein%20info
              Mouse gene: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=109552#geneGeneral%20protein%20info

              but it looks like at the detailed level even the proteins are tracked separately by species. The UniRef link below shows a cluster of 18 "different" proteins across 11 species:

              http://www.uniprot.org/uniref/?query=member%3aP30626+identity:0.9

              I'm not sure what level this stuff needs to be modeled at. Also, the more I look at the various databases that are available, the more I wonder what value Freebase would add to the ecosystem. What gaps need to be filled?

        2.  

          Number of items in review queue

          3 posts, latest post: lukeschubert, Oct 26, 2009
          1. lukeschubert Freebase Experts
            Oct 24, 2009
            lukeschubert says:

            Can be found by running this query: http://www.freebase.com/app/queryeditor/?q={%20%22type%22%3A%20%22%2Fpipeline%2Ftask%22%2C%20%22status%22%3A%20%22open%22%2C%20%22return%22%3A%20%22count%22%20}

            Query Editor: { "type": "/pipeline/task", "status": "open", "return": "count" }
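            The query-editor link simply carries the same MQL query, percent-encoded, in its q parameter. A small Python check that decodes it (Freebase and its query editor have since been shut down, so the URL itself no longer resolves):

```python
import json
from urllib.parse import parse_qs, urlsplit

url = ("http://www.freebase.com/app/queryeditor/?q={%20%22type%22%3A%20%22%2Fpipeline%2Ftask"
       "%22%2C%20%22status%22%3A%20%22open%22%2C%20%22return%22%3A%20%22count%22%20}")

# parse_qs percent-decodes each parameter value (%20 -> space, %22 -> ", %2F -> /, ...).
q = parse_qs(urlsplit(url).query)["q"][0]
query = json.loads(q)
print(query)  # {'type': '/pipeline/task', 'status': 'open', 'return': 'count'}
```

            In MQL, "return": "count" asks for the number of matching objects rather than the objects themselves; here that is the number of open /pipeline/task entries in the review queue.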

            (thanks to pak21)

            1. pak21 Freebase Experts
              Oct 24, 2009
              pak21 says:

              Or it's listed at the top of the expert hub :-)

            2. lukeschubert Freebase Experts
              Oct 26, 2009
              lukeschubert says:

              I should check the expert hub more often ...

        3.  

          Query for NCBI Taxon ID

          2 posts, latest post: lukeschubert, Jul 27, 2009
          1. lukeschubert Freebase Experts
            Jul 27, 2009
            lukeschubert says:

            Sample query here.

            1. lukeschubert Freebase Experts
              Jul 27, 2009
              lukeschubert says:

              I started at the beginning ...

        4.  

          Australian Senators - members

          1 post, latest post: lukeschubert, Dec 1, 2008
          1. lukeschubert Freebase Experts
            Dec 1, 2008
            lukeschubert says:

            A brief list can be found here.

        5.  

          NCBI link

          1 post, latest post: lukeschubert, Jul 31, 2008
          1. lukeschubert Freebase Experts
            Jul 31, 2008
            lukeschubert says:

            Sample NCBI link here:

            http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=7742

             

        6.  

          Tshirt

          1 post, latest post: skud, Jun 12, 2008
          1. skud Metaweb Staff
            Jun 12, 2008
            skud says:

            Hey, I'd like to send you a Freebase tshirt in recognition of your contributions here.  Could you email me your tshirt size and postal address to kirrily@metaweb.com?  Thanks!

        7.  

          Thanks for the Australian libraries

          5 posts, latest post: lukeschubert, May 19, 2008
          1. skud Metaweb Staff
            Apr 17, 2008
            skud says:

            Hey, thanks for the Australian library adds today.  I was just adding SLNSW and SLVIC and noticed yours.  Wonder if we can find a list of local libraries or library systems to import?

            1. lukeschubert Freebase Experts
              Apr 18, 2008
              lukeschubert says:

              No problem :)  I've started with those already in Freebase from Wikipedia, but was going to move on to other SA ones soon.

              Not sure if there's one list for all of Australia.  I just found this for South Australian libraries; this only allows a map-based search over all Australian libraries ... 

              How's this for Victorian libraries? 

            2. skud Metaweb Staff
              Apr 18, 2008
              skud says:

              Yeah I did find that Victorian search. 

              I think it would be best to enter library systems and build out branches from there; when I went to enter "Richmond Library" (my local one when I was in Melbourne) I found that there were lots of similarly named ones in the US, and without any properties to distinguish mine from the others, it would just be an orphaned topic with little use.

              http://www.libraries.vic.gov.au/librarylocator/az.cgi?listServices=1 looks like a not-bad list of public library *systems* in Vic.

              You know about the list import tool, right? Please tell me you do. It would make this way easier.

               

            3. doconnor Top Contributor Freebase Experts
              May 18, 2008
              doconnor says:

              Hey Luke, want to help me flesh out the SA libraries with more information?

              By the way, are you http://www.perl.com/pub/au/Schubert_Luke ?

            4. lukeschubert Freebase Experts
              May 19, 2008
              lukeschubert says:

              Sure.  Any libraries in particular?  Last time I checked there were still a few missing ...

              Yep, that's me.

        8.  

          And hi, btw

          2 posts, latest post: lukeschubert, Sep 17, 2007
          1. skud Metaweb Staff
            Sep 9, 2007
            skud says:

            Just thought I'd introduce myself as a fellow Australian Freebaser. Do you IM at all? Would love to have someone close to my timezone to natter with about Freebase stuff.

            K.

            1. lukeschubert Freebase Experts
              Sep 17, 2007
              lukeschubert says:

              Hi - yes, always good to see more Australians here.

              I'll drop you an email - not sure if I want my email address on Freebase yet. :)


        ©2009  Metaweb