
    Accessing raw data produced by Softagram

    Reference
    15 August 2020, by Ville Laitila

    Commits data

    git/commits.jsonl.zip is produced as a by-product of the analysis. It contains JSON rows such as this:

    ["2019-08-13 21:37 +0900", "deadbeef...", "project/repo", ["beefdead"], 1565699847, 1565699847, "dev@acme.org", "Firstname Developer", [["modify", 2]], 1, "commit summary", "full commit msg", [["file_changed.md", [2, ""]]]]

    Explanations of the data fields in commits.jsonl.zip, with real-life examples:

    "2019-08-13 21:37 +0900",   Datetime string of the commit author timestamp
    "90ec7996cc4bc1a96410c1794965b8c5e1479f37",    Commit SHA1 from Git
    "databrickskoalas/koalas",    Project/repo name (repo name as in Git)
    ["82e2e410817dc1728f97038f193d823f615d0d6a"],   List of parent commit SHA1s
    1565699847,  Git author timestamp of the commit
    1565699847,  Git committer timestamp of the commit
    "developer-name@gmail.com",   Author of the commit as in Git log (email)
    "Developer Name",    Author of the commit as in Git log (display name)
    [["modify", 2]],     Information about how many lines were modified/created
    1,                               How many files were changed/created
    "correct pip installation command (#642)",    Commit message summary line
    "correct pip installation command (#642)\n\n",  Full commit message
    [["CONTRIBUTING.md", [2, ""]]]    List of each modified/created/removed file, with the number of lines modified in each of them. [2, ""] means 2 lines modified.
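
    The field order above can be captured in a small helper so that columns are read by name instead of by position. The following is only a sketch and not part of Softagram's tooling: CommitRow and parse_commit_row are hypothetical names, and the attribute names are chosen here for illustration. It assumes each row has at least the 13 columns listed above and ignores any columns appended later.

```python
import json
from typing import NamedTuple


class CommitRow(NamedTuple):
    """Hypothetical container for the 13 documented columns, in file order."""
    commit_time: str
    sha: str
    repo_path: str
    parents_sha_list: list
    authored_timestamp: int
    committed_timestamp: int
    author_email: str
    author_name: str
    changed_lines: list
    changed_files: int
    summary: str
    message: str
    file_changes: list


def parse_commit_row(line: str) -> CommitRow:
    """Parse one JSON line, ignoring any columns appended after the 13th."""
    values = json.loads(line)
    return CommitRow(*values[:len(CommitRow._fields)])


# The real-life example row from the field list above.
sample = (
    '["2019-08-13 21:37 +0900", "90ec7996cc4bc1a96410c1794965b8c5e1479f37", '
    '"databrickskoalas/koalas", ["82e2e410817dc1728f97038f193d823f615d0d6a"], '
    '1565699847, 1565699847, "developer-name@gmail.com", "Developer Name", '
    '[["modify", 2]], 1, "correct pip installation command (#642)", '
    '"correct pip installation command (#642)\\n\\n", '
    '[["CONTRIBUTING.md", [2, ""]]]]'
)
row = parse_commit_row(sample)
print(row.repo_path, row.changed_files, row.file_changes[0][0])
# prints: databrickskoalas/koalas 1 CONTRIBUTING.md
```

    Slicing to the known number of fields keeps the helper working if more columns are appended to the file in the future.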


    The column order is stable: if the file content is extended, new columns are added to the end. The lines in the file are not guaranteed to be in chronological order, but because the datetime string is the first column, the extracted file is easy to sort with cat and sort. Note that this orders rows by the local time string, so when commits come from different UTC offsets, the numeric timestamp columns give a more reliable ordering.
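
    As an alternative to sorting the text lines, rows can be ordered in Python by the numeric author timestamp (the fifth column), which does not depend on the UTC offset in the datetime string. This is a minimal sketch: sort_commits_by_author_time is a hypothetical name, and the two sample rows are made up and shortened to six columns for brevity.

```python
import json


def sort_commits_by_author_time(lines):
    """Return parsed rows ordered by the epoch author timestamp (column index 4)."""
    rows = [json.loads(line) for line in lines if line.strip()]
    return sorted(rows, key=lambda row: row[4])


# Made-up rows: "bbb" has the earlier time string but the later epoch
# timestamp, because the two commits use different UTC offsets.
lines = [
    '["2019-08-13 14:00 +0000", "bbb", "project/repo", [], 1565704800, 1565704800]',
    '["2019-08-13 21:37 +0900", "aaa", "project/repo", [], 1565699847, 1565699847]',
]
ordered = sort_commits_by_author_time(lines)
print([row[1] for row in ordered])  # prints ['aaa', 'bbb']
```

    A plain text sort of the same two lines would put "bbb" first, since it compares the local time strings only.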

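    Example of how to handle it in Python

    import json

    fname = "commits.jsonl"  # assumed name of the file extracted from commits.jsonl.zip
    with open(fname, encoding="utf-8") as f:  # UTF-8 is an assumption
        for line in f:
            commit_entry = json.loads(line)
            # Take the first 13 columns only, since new columns may be appended later.
            # Timestamps follow the field list: author first, then committer.
            (commit_time, sha, repo_path, parents_sha_list,
             commit_authored_date, commit_committed_date,
             commit_author_email, commit_author_name, changed_lines, changed_files,
             commit_summary, commit_message, commit_impact) = commit_entry[:13]
            # Do something with the above fields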
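
    The example above reads from a plain file. Because the data is delivered as a zip archive, the rows can also be read directly with the standard zipfile module, without extracting first. This is a sketch under assumptions: the name of the member inside the archive is not documented here, so the code simply reads the first member, UTF-8 encoding is assumed, and iter_commit_rows is a hypothetical helper name. The demonstration writes a tiny throwaway archive so that the block runs on its own.

```python
import io
import json
import os
import tempfile
import zipfile


def iter_commit_rows(zip_path):
    """Yield parsed rows from the first member of a commits.jsonl.zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]  # assumption: the archive holds one JSONL file
        with archive.open(member) as raw:
            # Wrap the binary stream so that it can be iterated line by line as text.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.strip():
                    yield json.loads(line)


# Demonstration with a made-up archive; real data would come from the
# git/commits.jsonl.zip file produced by the analysis.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "commits.jsonl.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("commits.jsonl", '["2019-08-13 21:37 +0900", "deadbeef"]\n')
    rows = list(iter_commit_rows(demo_zip))
print(rows[0][1])  # prints deadbeef
```

    Reading through a generator keeps memory use low, which matters for large aggregated datasets.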

    We have large aggregated commit datasets of open source projects and offer them free of charge for research purposes. If you are interested, contact us through the web page chat.

    Other data sets

    This help article will be extended as needed. Please contact us if you have special raw data needs.

    Originally published at help.softagram.com
