
    Accessing raw data produced by Softagram

    August 15, 2020 by Ville Laitila

    Commits data

    The git/commits.jsonl.zip file is produced as a by-product of the analysis. It contains one JSON array per line, such as this:

    ["2019-08-13 21:37 +0900", "deadbeef...", "project/repo", ["beefdead"], 1565699847, 1565699847, "dev@acme.org", "Firstname Developer", [["modify", 2]], 1, "commit summary", "full commit msg", [["file_changed.md", [2, ""]]]]

    The fields of each row in commits.jsonl.zip, in column order, with real-life example values:

    • "2019-08-13 21:37 +0900": datetime string of the commit author timestamp
    • "90ec7996cc4bc1a96410c1794965b8c5e1479f37": commit SHA-1 from Git
    • "databrickskoalas/koalas": project/repo name (repo name as in Git)
    • ["82e2e410817dc1728f97038f193d823f615d0d6a"]: list of parent commit SHA-1s
    • 1565699847: Git author timestamp of the commit (Unix time in seconds)
    • 1565699847: Git committer timestamp of the commit (Unix time in seconds)
    • "developer-name@gmail.com": author of the commit as in the Git log (email)
    • "Developer Name": author of the commit as in the Git log (display name)
    • [["modify", 2]]: how many lines were modified/created
    • 1: how many files were changed/created
    • "correct pip installation command (#642)": commit message summary line
    • "correct pip installation command (#642)\n\n": full commit message
    • [["CONTRIBUTING.md", [2, ""]]]: list of each modified/created/removed file with the number of lines modified in it; [2, ""] means 2 lines modified.
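As a sketch of how the field list above maps onto code, each row can be wrapped in a named record so that later code refers to fields by name rather than by position. The `CommitRow` type, its field names, and the helper functions are hypothetical and simply follow the column order listed above; they are not a Softagram API. The example also checks, on the real-life values above, that the datetime string in the first column matches the numeric author timestamp once the string's UTC offset is applied.

```python
import json
from datetime import datetime
from typing import Any, List, NamedTuple


class CommitRow(NamedTuple):
    """One row of commits.jsonl in the documented column order (illustrative names)."""
    author_time_str: str      # e.g. "2019-08-13 21:37 +0900"
    sha: str                  # commit SHA-1
    repo: str                 # "project/repo"
    parents: List[str]        # parent commit SHA-1s
    author_ts: int            # Git author timestamp, Unix seconds
    committer_ts: int         # Git committer timestamp, Unix seconds
    author_email: str
    author_name: str
    line_changes: List[Any]   # e.g. [["modify", 2]]
    files_changed: int
    summary: str
    message: str
    file_changes: List[Any]   # e.g. [["CONTRIBUTING.md", [2, ""]]]


def parse_row(line: str) -> CommitRow:
    """Parse one JSON line, ignoring any columns appended after the known ones."""
    values = json.loads(line)
    return CommitRow(*values[: len(CommitRow._fields)])


def author_local_time(row: CommitRow) -> datetime:
    """Express the numeric author timestamp in the UTC offset of the first column."""
    offset = datetime.strptime(row.author_time_str, "%Y-%m-%d %H:%M %z").tzinfo
    return datetime.fromtimestamp(row.author_ts, tz=offset)


sample = json.dumps([
    "2019-08-13 21:37 +0900", "90ec7996cc4bc1a96410c1794965b8c5e1479f37",
    "databrickskoalas/koalas", ["82e2e410817dc1728f97038f193d823f615d0d6a"],
    1565699847, 1565699847, "developer-name@gmail.com", "Developer Name",
    [["modify", 2]], 1, "correct pip installation command (#642)",
    "correct pip installation command (#642)\n\n",
    [["CONTRIBUTING.md", [2, ""]]],
])
row = parse_row(sample)
print(author_local_time(row).strftime("%Y-%m-%d %H:%M %z"))  # 2019-08-13 21:37 +0900
```

Slicing to the number of known fields in `parse_row` keeps the parser working if new columns are appended to the end of each row, as the format allows.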


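    The column order is fixed: if the format is extended, new columns will be appended to the end of each row. The rows are not guaranteed to be in chronological order, but because the timestamp string is the first column, an unzipped file is easy to sort with cat and sort. Because that string includes a UTC offset, a plain text sort orders rows by local wall-clock time, which is not strictly chronological when authors are in different time zones.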
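For strict chronological order, one option is to sort on a numeric timestamp column instead of the datetime string. A minimal sketch, assuming the rows have already been read as text lines from the unzipped file; the function name `sort_chronologically` is hypothetical, and it uses the fifth column, described above as the Git author timestamp. The two demo rows are shortened to their first six columns.

```python
import json
from typing import Iterable, List

# Fifth column (index 4): Git author timestamp in Unix seconds, per the field list.
AUTHOR_TS_INDEX = 4


def sort_chronologically(lines: Iterable[str]) -> List[list]:
    """Parse JSON rows and order them by the numeric author timestamp."""
    rows = [json.loads(line) for line in lines if line.strip()]
    return sorted(rows, key=lambda row: row[AUTHOR_TS_INDEX])


# Two shortened rows whose datetime strings sort the "wrong" way as text:
# 21:37 +0900 is 12:37 UTC, which is earlier than 14:00 +0000.
lines = [
    '["2019-08-13 14:00 +0000", "b", "p/r", [], 1565704800, 1565704800]',
    '["2019-08-13 21:37 +0900", "a", "p/r", [], 1565699847, 1565699847]',
]
by_string = sorted(json.loads(line) for line in lines)
print([row[1] for row in by_string])                    # ['b', 'a']
print([row[1] for row in sort_chronologically(lines)])  # ['a', 'b']
```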

    Example of how to handle it in Python:

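    import json

    # Path to the unzipped commits file (this file name is an assumption)
    fname = "commits.jsonl"
    with open(fname, encoding="utf-8") as f:
        for line in f:
            commit_entry = json.loads(line)
            # Use only the first 13 columns: new columns may be appended in the future
            (commit_time, sha, repo_path, parents_sha_list, commit_committed_date,
             commit_authored_date, commit_author_email, commit_author_name, changed_lines,
             changed_files, commit_summary, commit_message, commit_impact) = commit_entry[:13]
            # Do something with the above fields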
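Because the data ships as a zip archive, it can also be streamed without extracting it first. The sketch below assumes that every member ending in `.jsonl` inside the archive holds commit rows (the member name is not documented here), and the helper names are hypothetical. It tallies commits and modified lines per author email, taking the per-file line count from the first element of each [2, ""] pair in the last column.

```python
import io
import json
import zipfile
from collections import Counter
from typing import BinaryIO, Iterable, Iterator, Tuple, Union


def iter_commit_rows(zip_source: Union[str, BinaryIO]) -> Iterator[list]:
    """Stream parsed rows from every .jsonl member of the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Assumption: commit rows are stored in member(s) ending in .jsonl.
            if not name.endswith(".jsonl"):
                continue
            with archive.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if line.strip():  # skip blank lines
                        yield json.loads(line)


def per_author_stats(rows: Iterable[list]) -> Tuple[Counter, Counter]:
    """Count commits and modified lines per author email (the seventh column)."""
    commits: Counter = Counter()
    lines_modified: Counter = Counter()
    for row in rows:
        email, file_changes = row[6], row[12]
        commits[email] += 1
        # Each entry looks like ["CONTRIBUTING.md", [2, ""]]; the 2 is lines modified.
        lines_modified[email] += sum(entry[1][0] for entry in file_changes)
    return commits, lines_modified
```

For example, `commits, lines = per_author_stats(iter_commit_rows("git/commits.jsonl.zip"))` followed by `commits.most_common(10)` lists the ten authors with the most commits.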

    We have large aggregated commit datasets of open-source projects and will offer them free of charge for research purposes. If you are interested, contact us through the chat on our web page.

    Other datasets

    This help article will be extended as needed. Please contact us if you have special raw data needs.

    Originally published at help.softagram.com

    in Guides
    # Softagram Analyzer

    Designed for companies

    We are a team of passionate people whose goal is to improve everyone's life through disruptive products. We build great products to solve your business problems. Our products are designed for small to medium-sized companies willing to optimize their performance.

    Contact us

    Softagram Oy
    Ketolanperäntie 469
    90450 Kempele

    • +358504836173
    • info@softagram.com
    Copyright © Softagram Oy